Play-by-play data from the 2020 NBA season shows identifiable instances of coaches overreacting to foul trouble. Only a small percentage of players foul out of games, because coaches intentionally keep them out of situations where they could foul out, even though the only penalty for fouling out is disqualification from the rest of the game. Coaches do this so they can have their best players available late in the game, when the outcome is most clear, but they risk sitting a player for too long and putting their team at a big disadvantage.
Using the machine learning technique K-Means clustering, players were grouped into three clusters based on in-game statistics like points per minute, fouls per minute, rebounds per minute, and other traditional basketball metrics. Those three groups were classified as:
- Reliant Starters
- Big Men
- Role Players
These clusters gave a better overall picture of which players are most similar to each other. With that information, foul rates can be evaluated more accurately for each player type. The foul rate became the main component in predicting the probability of fouling out of the game using the Poisson distribution. Predicting the likelihood of a player fouling out in the next X minutes of game play can give a coach a better understanding of the risks involved with leaving a player in the game or taking them out. With this understanding, coaches can maximize the amount of time a player is on the court without exposing the player to an unacceptable risk of fouling out.
There is an unwritten rule in the NBA that if a player gets into foul trouble you must take him out immediately without a second thought. Right there on the spot. It doesn’t matter if it’s your best player or the last player off the bench.
Unlike baseball, hockey, and football, basketball limits the number of fouls a player can commit in a game. The rule itself makes sense. A player who fouls everyone every time down the court would stop the pace of play and put others in physical danger. With the upper limit of six fouls in a game, players with four or five fouls become more hesitant to play tough defense for fear of being disqualified.
Common wisdom suggests the coach of a team wants to use their best players as much as possible, but they’re also faced with the possibility of losing one or more of them due to the foul out rule. Coaches will try to prevent this from happening by substituting out a player if he gets too many fouls early in the game to make sure he’s available in the final, pivotal moments.
But is this a legitimate fear? Do the minutes at the end of the game mean more than minutes in the second quarter? Are players typically known for not being safe enough when they have a higher amount of fouls? There’s no penalty for a player being ejected from a game due to fouling out, so why do coaches treat it more sensitively than any other aspect of the game?
This article hopes to show that the danger of foul trouble is blown out of proportion, and to examine the repercussions when a player spends more time on the bench because of a danger that is not even real.
As always, code to my program and analytics can be found here: https://github.com/anchorP34/NBA-Analysis
Where did this data come from?
This data came from play by play data of each game in the 2020 season. All data was web scraped from https://www.basketball-reference.com. Here is an example image of how the data was ingested:
As with most data, there are some assumptions that had to be made due to inconsistencies with data.
- There are times where a player was playing in a game, a quarter ended, and then they were substituted into the game. We are going to assume they were substituted out during the break between quarters.
- There are times where a player was not playing in a game, a quarter ended, and then they had an event in the game (scored points, made an assist, got a rebound, etc). We are going to assume they were substituted in during the break between quarters.
This analysis also focused only on the first four quarters of every game. Some games went into overtime, but fouls in overtime are not comparable across games: not every game goes into overtime, while every game has at least four quarters.
Establishing Some Ground Rules
What is foul trouble?
A player is considered to be in foul trouble under the following circumstances:
- Has at least two fouls in the first quarter of a game
- Has at least three fouls in the first half of a game
- Has at least four fouls in the first three quarters of a game
- Has five fouls in the fourth quarter
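The four rules above can be written as a small lookup. This is only a sketch of the definition as stated here: the names `FOUL_TROUBLE_THRESHOLD` and `in_foul_trouble` are made up for illustration and are not taken from the linked repository.

```python
# Minimum foul count that counts as "foul trouble" in each quarter,
# copied from the four rules listed above.
FOUL_TROUBLE_THRESHOLD = {1: 2, 2: 3, 3: 4, 4: 5}


def in_foul_trouble(quarter: int, fouls: int) -> bool:
    """Return True if a player with `fouls` fouls is in foul trouble in `quarter`."""
    if quarter not in FOUL_TROUBLE_THRESHOLD:
        raise ValueError("only regulation quarters 1-4 are covered")
    # A sixth foul disqualifies the player, so it is no longer "trouble".
    return FOUL_TROUBLE_THRESHOLD[quarter] <= fouls < 6


# Two fouls is foul trouble in the 1st quarter but not in the 2nd.
first_quarter = in_foul_trouble(quarter=1, fouls=2)
second_quarter = in_foul_trouble(quarter=2, fouls=2)
```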
Is there a way to establish whether coaches actually behave this way? One way to determine whether coaches act differently once a player is in foul trouble is to see what percentage of players are taken out of the game immediately after they commit a foul. The chart below breaks down the percentage of players removed from a game immediately after a foul, depending on whether they were in foul trouble and the quarter in which it occurred.
This bar chart shows the percentage of time a player gets immediately taken out after a foul. Blue bars represent players who were already in foul trouble or entered it with that foul; red bars represent players who fouled but were not in foul trouble. There is clearly a difference in size between the red and blue bars. Players who are in foul trouble are substituted out of the game at much higher rates than players who are not. Almost 70% of players who are in foul trouble in the 2nd quarter are immediately taken out of the game. This shows that coaches believe there is a strategic advantage to taking a player out when they are in foul trouble, but how often are players actually at risk of fouling out?
How often does a player foul out?
There might be high urgency to prevent players from fouling out, but is the paranoia legitimate? There would be cause for concern if six fouls in one game were too few for players to be comfortable. If that was the case, we would see a high frequency of players foul out over the course of the season. The figure below shows that is not the case.
The chart above shows the percentage of starting players who fouled out given the quarter and number of fouls they had. For example, players who had two fouls in the 2nd quarter ended up fouling out of the game 3% of the time. If fouling out of a game was something to worry about we should see high percentages as the number of fouls increases, but that is not the case. Players who had five fouls in the last quarter only fouled out 16% of the time. The odds are still in a player’s favor if they are in the last quarter with five fouls in the game. So far, there seems to be no reason for coaches to succumb to the fear a player is going to foul out.
If you’re curious who in the world got four fouls in the 1st quarter, check out Russell Westbrook’s not so glorious start to a game here: https://www.basketball-reference.com/boxscores/pbp/201911130HOU.html
Assignment of Players
Not all NBA players are created equal. There are five positions on the court, but over the years the NBA has seen five distinct types of players transform into players who can fit any position. We’re now seeing big men, who used to just sit under the basket, pull up and shoot from 25 feet away from the hoop. A game where the fundamental idea used to be working the ball inside has shifted to seeing how often you can launch the ball from the three point line. Point guards now play isolation basketball with the intention of getting fouled to shoot free throws. It’s a whole new game.
If there used to be five distinct types of players, one for each position on the court, how many “types” of players are there now? Using a K-Means clustering machine learning algorithm, it was determined that three groupings explained the most about the players without creating too many overly specific groups. The Elbow Method used to make this decision is described in the appendix.
There were some data transformations that needed to happen as well. Instead of looking at data aggregations from the game level, it was changed to summarizations at the minute level. The reasoning behind that is some players might be impactful role players but do not get the same minutes as some of the starters. It’s assumed the more minutes for a player, the more points, rebounds, and assists they will produce so breaking it down at the minute level makes it a more equal playing field.
The data also needed to be normalized. It can be hard to tell what counts as bad, good, or great when looking at different metrics. Is 0.8 points per minute good? To make things easier, all metrics were scaled to represent percentiles on a scale from 0 to 1. If a player was in the 90th percentile of a category, they would score 0.90. The higher the percentile, the more that player was a leader in that category. Note that this can be good or bad: 99th percentile in points is great, 99th percentile in fouls per minute is awful.
Certain players needed to be filtered out of the dataset as well. This dataset only considered players who played at least 28,800 seconds (480 minutes) during the year, which is equivalent to 10 full games' worth of playing time. This was done to filter out players who might have only come in for small amounts of garbage time or could have been injured for most of the year.
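The two transformations described above, per-minute rates followed by percentile scaling, can be sketched in a few lines. The player totals are made-up numbers, and the percentile definition used here (the share of players at or below a value) is an assumption, since the exact formula behind the article's percentiles is not stated.

```python
from bisect import bisect_right


def per_minute(total: float, minutes: float) -> float:
    """Convert a season total into a per-minute rate."""
    return total / minutes


def percentile_ranks(values):
    """Scale each value to the fraction of values that are <= it (between 0 and 1)."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]


# Hypothetical season totals: name -> (points, minutes played)
players = {"A": (1200, 2000), "B": (300, 600), "C": (450, 1500), "D": (100, 500)}

rates = {name: per_minute(pts, mins) for name, (pts, mins) in players.items()}
ranks = dict(zip(rates, percentile_ranks(list(rates.values()))))
```

Player B scores far fewer total points than player A but ranks just behind on a per-minute basis, which is the reason for moving to minute-level summaries.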
The following chart shows the different metrics that were used in grouping players together, the group they fell into (1, 2, or 3), and the median percentile the group of players had.
The three clusters of players have an identity associated with each of them.
- Cluster 1: Reliant Starters. These players start 89% of their games, have high scoring per minute, and are successful shooters from three point range. There are 105 players in this grouping.
- Cluster 2: Big Men. These players have high field goal percentage and high two point percentages to go along with their high rebounds per minute. There are 71 players in this grouping.
- Cluster 3: Role Players. These players have percentiles that are mostly around the 30th and never go above the 60th. They come off the bench and fill the gaps when necessary. There are 178 players in this grouping.
Fouls Per Minute Meets the Poisson Distribution
Now that players have been assigned to different clusters, what can that tell us about how coaches should decide whether to keep players in a game or take them out based on the number of fouls they have? The main idea is to take the expected number of minutes a player would play on any given night and that player's fouls per minute, and use them to predict the probability they foul out of the game. If the probability of them fouling out of the game is below a threshold, they should be allowed to stay in the game. If it is above the threshold, they should be kept out of the game until the probability drops below the threshold.
For example, let’s say LeBron James typically plays 32 of the 48 minutes in an NBA game. After eight minutes of playing in the first quarter, he has two fouls, which would put him in foul trouble. James averages one foul per ten minutes played. Instead of defaulting to the notion that players should be removed due to foul trouble, the question should shift to “What is the likelihood of getting four more fouls in the next 24 minutes?” In addition, the coach has decided he doesn’t want the probability of LeBron fouling out to be higher than 20% this early in the game.
The traditional statistics approach to solving problems along the lines of “How many times will this event happen in a specified period of time?” relates to the Poisson distribution. Because the Poisson distribution is discrete, its formula is a probability mass function (PMF), which looks like this:
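In its standard textbook form, with X the number of events in the period, x a particular count, and λ the expected number of events in that period, the formula reads:

```latex
P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \dots
```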
In this case, the X in the formula would be the number of fouls until LeBron fouls out and λ would be the average number of fouls per minute multiplied by the number of minutes remaining in the play period. Going back to our previous example, James averaged one foul per ten minutes played and there are only 24 minutes (32 minutes typically played minus eight already played) remaining in the game, so we would expect him to get 2.4 fouls (1 / 10 * 24) during that time. Rewriting the equation to see the likelihood of getting four fouls in that time frame would look like this:
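Substituting the numbers from the example (λ = 2.4 and four fouls) into the standard Poisson formula gives the following. The intermediate values are rounded:

```latex
P(X = 4) = \frac{2.4^{4}\, e^{-2.4}}{4!}
         = \frac{33.1776 \times 0.0907}{24}
         \approx 0.1254
```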
The probability of LeBron getting four fouls in the next 24 minutes he plays is 12.54%. This is where the threshold comes into play. Since the coach’s threshold is 20%, it can be determined LeBron can continue to play at his normal time slot without fear of fouling out in the game.
However, that is the probability that LeBron gets exactly four fouls. It is easier to interpret “What is the likelihood LeBron fouls out of the game?” Another way to ask the same question is “What is the likelihood LeBron doesn’t foul out of the game?” By calculating the probability LeBron doesn’t foul out of the game, we can take the complement (1 minus the probability) to calculate the likelihood of fouling out. Taking the same example, the calculation transforms to this:
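With the same λ = 2.4, and "fouling out" meaning four or more additional fouls, the complement works out as follows (terms rounded):

```latex
P(X \ge 4) = 1 - P(X \le 3)
           = 1 - \sum_{x=0}^{3} \frac{2.4^{x}\, e^{-2.4}}{x!}
           = 1 - e^{-2.4}\left(1 + 2.4 + 2.88 + 2.304\right)
           \approx 1 - 0.779 = 0.221
```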
The likelihood that LeBron gets zero to three fouls in the remaining 24 minutes is 77.9%, meaning the probability he gets more than three fouls is 22.1%. This time, with the threshold of 20%, LeBron should head to the bench to lower his likelihood of fouling out. For the reasons the Poisson distribution was used, see the appendix.
This strategy can be used continuously throughout the game to evaluate a player's probability of fouling out. For a more accurate estimate of a player's expected fouls, the average fouls per minute can be selected based on the cluster the player was assigned to. The distribution of fouls per minute for each cluster is shown in the box plot below.
Players in Clusters 1 and 3 have similar distributions to each other, but players in Cluster 2 have a higher rate of fouls per minute. The average fouls per minute for each cluster is:
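The same calculation can be sketched in Python using only the standard library. The function names and the `should_bench` helper are hypothetical, introduced here for illustration; the inputs are the LeBron example from the text.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """Probability of exactly x events when lam events are expected."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


def prob_foul_out(fouls_so_far: int, fouls_per_minute: float,
                  minutes_left: float, limit: int = 6) -> float:
    """Probability of reaching `limit` fouls in the minutes left to play."""
    fouls_needed = limit - fouls_so_far
    if fouls_needed <= 0:
        return 1.0  # already disqualified
    lam = fouls_per_minute * minutes_left
    # Complement of "commits fewer than fouls_needed more fouls".
    return 1.0 - sum(poisson_pmf(x, lam) for x in range(fouls_needed))


def should_bench(probability: float, threshold: float) -> bool:
    """Bench the player when the foul-out risk exceeds the coach's threshold."""
    return probability > threshold


# Two fouls so far, one foul per ten minutes, 24 minutes left to play.
exactly_four = poisson_pmf(4, 2.4)
foul_out = prob_foul_out(fouls_so_far=2, fouls_per_minute=0.1, minutes_left=24)
bench = should_bench(foul_out, threshold=0.20)
```

Here `exactly_four` is about 0.125 and `foul_out` is about 0.221, matching the two figures worked out in the text, so `bench` is true at a 20% threshold.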
- Cluster 1: 0.07 fouls per minute
- Cluster 2: 0.11 fouls per minute
- Cluster 3: 0.08 fouls per minute
These values put together the final part of the Poisson probability function for calculating the likelihood a player will get X more fouls in the remaining time he’s expected to play. This can be applied to all fouls that occurred during the season to get a final understanding of when the risk becomes too high for a player to remain in the game.
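One way to turn these cluster rates into a substitution rule is to ask how many more minutes a player could play before the foul-out probability crosses the coach's threshold. The sketch below assumes the three cluster averages listed above and whole-minute steps; `max_safe_minutes` is a hypothetical helper, not a function from the linked repository.

```python
import math

# Average fouls per minute for each cluster, from the list above.
CLUSTER_FOUL_RATE = {1: 0.07, 2: 0.11, 3: 0.08}


def prob_foul_out(fouls_so_far: int, cluster: int,
                  minutes_to_play: float, limit: int = 6) -> float:
    """Poisson probability of reaching `limit` fouls in `minutes_to_play`."""
    needed = limit - fouls_so_far
    lam = CLUSTER_FOUL_RATE[cluster] * minutes_to_play
    stays_in = sum(lam ** x * math.exp(-lam) / math.factorial(x)
                   for x in range(needed))
    return 1.0 - stays_in


def max_safe_minutes(fouls_so_far: int, cluster: int,
                     threshold: float, horizon: int = 48) -> int:
    """Largest whole number of minutes with foul-out risk <= threshold."""
    safe = 0
    for minutes in range(1, horizon + 1):
        # The risk only grows with more minutes, so stop at the first breach.
        if prob_foul_out(fouls_so_far, cluster, minutes) > threshold:
            break
        safe = minutes
    return safe


# A Reliant Starter and a Big Man, each with two fouls and a 20% threshold.
starter_minutes = max_safe_minutes(fouls_so_far=2, cluster=1, threshold=0.20)
big_man_minutes = max_safe_minutes(fouls_so_far=2, cluster=2, threshold=0.20)
```

Because Cluster 2 fouls at a higher rate, `big_man_minutes` comes out lower than `starter_minutes` for the same foul count and threshold.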
When A Player Should Actually Be Pulled
We’ve talked a lot about why coaches are too quick to take out their players to prevent fouling out, but we haven’t answered when it is wise to take players out of the game. Using foul data from the 2020 season and assigning players to their cluster, the visualization below walks through the likelihood of a player fouling out of a game in different scenarios.
This chart represents the likelihood of fouling out of a game if a player were to play X more minutes. For example, the blue dashed line at 20 minutes shows that a Cluster 2 player with four fouls has a 65% likelihood of fouling out. This does not mean there are 20 minutes left in the game; it represents 20 minutes of playing time for the player. A vertical line marks the length of each quarter.
Cluster 1 is a solid line, Cluster 2 is a dashed line, and Cluster 3 is a dotted line. Each color grouping has Cluster 1 on the bottom and Cluster 2 on the top. This can be most easily seen in the purple grouping. The chart follows common wisdom: the fewer fouls a player has and the fewer minutes they have left to play, the less likely they are to foul out of the game.
Focusing on Cluster 1 (the solid, colored lines), if a player gets two fouls in the first quarter there is still less than a 40% chance they foul out if they were to play every second for the rest of the game. Even if the coach never takes the player out, the odds still favor the player not fouling out. This also holds true with three fouls in the second quarter and four fouls in the third quarter. In other words, a Cluster 1 player who is in foul trouble in the first three quarters could play every second of the remainder of the game and still be more likely than not to avoid fouling out. This group of players should rarely be taken out due to foul trouble and should instead be taken out at their regular intervals for rest. This matters because Cluster 1 represents the Reliant Starters, who are most likely the best players on the team.
Cluster 2, the Big Men group, needs to be more cautious about foul management because it is the group most at risk of fouling out, but there is still room for optimism. For example, a Cluster 2 player who has three fouls at halftime and plays the rest of the game without coming out has a 50% probability of fouling out. However, an early 3rd quarter foul pushes that probability over 70%, so the swings can become large.
Cluster 3, the Role Players, falls between Clusters 1 and 2. Since these players most likely come off the bench, they are expected to have less playing time than the other clusters and can therefore be read further to the right on the Minutes Remaining axis. Their probabilities are consistently low, and they should rarely be in danger of fouling out.
There should be no reason why there are so few games where a player fouls out. Coaches seem to be in constant fear of an outcome that has no real punishment. The idea that time at the end of the game is more valuable than at any other point in the game is an illusion. Coaches want their best players on the court at the end of the game because the outcome is easier to see then than at any other point, but sacrificing those players' minutes early in the game puts teams in a worse position at the end of the game. Understanding foul trouble management lets coaches avoid hitting the panic button too early and put their team in the best position to win the game.
Yes, there are more factors that can go into foul trouble than just the number of fouls and the desired remaining playing time. Topics like who the team is playing, home and away, opposing players, and even the referees calling the game can make a difference in how fouls are called. From a high level overview, the Poisson distribution illuminated the focal points where coaches should be aggressive or hesitant on leaving their best players in. Too often, analytics have a reputation of giving reasons to overthink situational game play rather than give the most appropriate answer. The results of this analysis should provide coaches a reason to keep players in when they are traditionally considered to be in foul trouble instead of giving a reason to take them out. If the idea is to get the maximum number of minutes out of the best players, foul-outs should be far more common. There’s no bonus given for getting on the team bus after the game with one or more of your fouls still in your back pocket.
The Elbow method was used to determine the number of clusters to pick for the K-Means clustering algorithm. This method is a common way to choose the number of clusters, based on the total within sum of squares (TWSS) between points and their centroid. As K increases, the TWSS becomes smaller and smaller. The idea is to choose the smallest value of K beyond which further increases no longer reduce the TWSS by much. In this case, 3 was the number chosen.
The Poisson distribution has some properties that make it a great choice for calculating probabilities with foul trouble.
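To make the TWSS quantity concrete, here is a minimal sketch that computes it for a given cluster assignment. It does not run K-Means itself: the four points and their labels are hand-picked, hypothetical values chosen so the drop in TWSS from one cluster to two is easy to follow.

```python
from collections import defaultdict


def total_within_ss(points, labels):
    """Sum of squared distances from each point to its cluster's centroid."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)

    total = 0.0
    for members in groups.values():
        dims = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members)
                    for d in range(dims)]
        total += sum((p[d] - centroid[d]) ** 2
                     for p in members for d in range(dims))
    return total


# Two obvious groups, ten units apart on the first axis.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

twss_k1 = total_within_ss(points, [0, 0, 0, 0])  # everything in one cluster
twss_k2 = total_within_ss(points, [0, 0, 1, 1])  # split into the two groups
```

Going from one cluster to two cuts the TWSS from 104 to 4. In an elbow plot, K is chosen at the point where adding another cluster stops producing drops of that size.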
- It focuses on distributions with rates (number of car accidents a month, number of lightning strikes in a storm per hour, etc).
- The underlying Poisson process is memoryless. It doesn’t matter how long it’s been since the last event; the probability depends only on the rate and the time remaining. This fits foul trouble well because the 2nd foul for a player shouldn’t have an impact on their 3rd foul (excluding intentional fouls at the end of the game).
- The Poisson distribution works with discrete events. You can’t have 3.4 fouls in a game, you can only have discrete values.
One other idea that was not used in this analysis is that the Poisson distribution is related to another distribution, the Exponential distribution. The Exponential distribution models the time between consecutive events. For example, what is the probability of there being more than 5 minutes between fouls? This could give more information later in the game about when the next foul is likely to occur.
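As a rough illustration of that idea, the sketch below uses the Exponential distribution with the same one-foul-per-ten-minutes rate as the LeBron example. It was not part of the original analysis, and the function names are hypothetical.

```python
import math


def prob_gap_exceeds(minutes: float, fouls_per_minute: float) -> float:
    """Probability that more than `minutes` of play pass before the next foul."""
    return math.exp(-fouls_per_minute * minutes)


def prob_next_foul_within(minutes: float, fouls_per_minute: float) -> float:
    """Probability that the next foul happens within `minutes` of play."""
    return 1.0 - prob_gap_exceeds(minutes, fouls_per_minute)


# One foul per ten minutes of play, looking five minutes ahead.
gap_over_five = prob_gap_exceeds(5, 0.1)
foul_within_five = prob_next_foul_within(5, 0.1)
```

With these inputs, the chance of going more than five minutes without a foul is about 61%, and the chance of a foul within five minutes is about 39%.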