The NFL is a unique beast. And when I say that, I am not referring to the style of the game, the number of players on the field at any given moment, etc. I am referring to the number of games in a season. It's something we NFL fans consider part of the football chaos. After all, fewer games means each game carries a tremendous amount of weight.
And that is painfully apparent when looking at playoff odds.
Every year, the sports media comment on the steep uphill battle teams face if they start 0-3. That piqued my curiosity: what are the odds of making the playoffs at any given point in the season?
Data
Finding week-by-week game data for every NFL season of the Super Bowl era was surprisingly difficult. Websites like Pro-Football-Reference are good options, but they now block web scrapers. In fact, if you hit the maximum number of requests while scraping PFR, you are advised to use an API like NatStat instead. So away I went.
I had never heard of NatStat but was decently impressed with its wide range of sports league datasets. API pricing was fair-ish ($8/month for access to 1 league and 500 calls/hour), and for this analysis I only needed NFL data.
But of course, no dataset is perfect and engineering is needed. The NatStat API v3 has play-by-play data, game results data, and more (like venues, betting lines, etc.). In this analysis, I was just focused on game results.
When designing the data processing, I essentially needed 2 components: each team's record game by game (or week by week), and whether that team made the playoffs at the end of the season. The first challenge was that the games data returned results for the entire season, playoffs included. This threw off the aggregation, because records exceeded the regular season game count (for example, 17 games in 2024). Great.
From there, I knew I needed a dictionary of the number of regular season games per season (source: Wikipedia).
Seasons | Number of regular season games per team |
---|---|
1966 | 14 games (15 weeks, odd number of teams) |
1967–1977 | 14 games (14 weeks) |
1978–1981 | 16 games (16 weeks) |
1982 | 9 games (17 weeks, strike) |
1983–1986 | 16 games (16 weeks) |
1987 | 15 games (16 weeks, strike) |
1988–1989 | 16 games (16 weeks) |
1990–1992 | 16 games (17 weeks) |
1993 | 16 games (18 weeks, additional bye week) |
1994–2000 | 16 games (17 weeks) |
2001 | 16 games (18 weeks, September 11 attacks) |
2002–2020 | 16 games (17 weeks) |
2021–present | 17 games (18 weeks) |
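In Python (assumed here, since the analysis used Seaborn and dataframes), that dictionary might be built as in the sketch below. It is a minimal sketch, not the original code: `SEASON_RANGES` and `build_games_per_season` are hypothetical names, 2024 stands in for "present", and only the games column is transcribed because week counts don't affect the game cap.

```python
# Regular-season games per team, transcribed from the table above.
# Rows with the same game count (1988-2020) are collapsed into one range.
SEASON_RANGES = [
    (1966, 1977, 14),
    (1978, 1981, 16),
    (1982, 1982, 9),   # strike-shortened
    (1983, 1986, 16),
    (1987, 1987, 15),  # strike-shortened
    (1988, 2020, 16),
    (2021, None, 17),  # None = "present", i.e. open-ended
]


def build_games_per_season(last_season: int) -> dict[int, int]:
    """Expand the ranges into a {season: games per team} dictionary."""
    games = {}
    for first, last, count in SEASON_RANGES:
        end = last_season if last is None else min(last, last_season)
        for season in range(first, end + 1):
            games[season] = count
    return games


REGULAR_SEASON_GAMES = build_games_per_season(2024)
```

Keeping the ranges compact makes them easy to check against the source table, while the expanded dictionary turns each lookup into a plain `REGULAR_SEASON_GAMES[season]`.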
So I turned the records that exceeded the season game counts to my advantage: if a team's record exceeded the regular season game count for a given year, that team must have played at least one playoff game, so it clearly made the playoffs.
That gave me one piece of the puzzle: which teams made the playoffs each year. The next piece was each team's record, week by week.
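That check boils down to counting each team's games in a season and comparing the count against the cap. A minimal sketch, assuming a hypothetical simplified feed of `(season, home_team, away_team)` tuples that includes playoff games but no preseason games:

```python
from collections import Counter


def find_playoff_teams(games, regular_season_games):
    """Return the set of (season, team) pairs that played past the regular season.

    `games` is an iterable of (season, home_team, away_team) tuples covering a
    full season, playoffs included (hypothetical shape); `regular_season_games`
    maps season -> regular-season games per team.
    """
    played = Counter()
    for season, home, away in games:
        # Every game counts once for each participant.
        played[(season, home)] += 1
        played[(season, away)] += 1
    return {
        (season, team)
        for (season, team), count in played.items()
        if count > regular_season_games[season]
    }
```

Every playoff team plays at least one postseason game, so every playoff team clears the cap. The flip side is that a team whose only playoff game is missing from the feed would slip through, so gaps in the raw data can undercount playoff teams.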
Side Note / Pain In The A** Warning: The games data JSON responses were a little weird, to say the least. Some data was missing (randomly), and ties were handled inconsistently: some came back as "nulls", while others had one of the two teams picked as the winner (?). Both issues took manual adjustments. The 500 calls per hour also created limitations, and, something not mentioned earlier, responses were capped at 100 records (WHY, WHY, WHY). To get around both limits, I had to create a 30-day sliding window across each year, starting at an arbitrary date in the offseason (4/1/YYYY in this case). As a result, some windows contained no games, and the API returned an error for them. To handle that, a series of try-excepts were needed (not ideal, but it works).
Ok, with that out of the way, back to calculating records over time. Once the individual games were captured, aggregating them game by game was fairly straightforward. I looped over the games data and did 2 things: created a new row in a dataframe per game per team, and omitted any games that exceeded that season's regular season game count. I then joined the playoff dataframe to the game-by-game dataframe. The result was this table:
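The windowing and error handling could be sketched as follows. This is a sketch under stated assumptions: `fetch_window(start, end)` is a hypothetical wrapper around the NatStat call (its real endpoint and parameters are not shown here), and the pause is simply 3600 / 500 = 7.2 seconds per call to stay under 500 calls/hour. Presumably 30 days works because such a window spans parts of at most five game weeks (no more than 80 games at 16 per week), which stays under the 100-record cap.

```python
import time
from datetime import date, timedelta

MAX_CALLS_PER_HOUR = 500
PAUSE_SECONDS = 3600 / MAX_CALLS_PER_HOUR  # 7.2 s between calls stays under the cap


def season_windows(season: int, days: int = 30):
    """Yield inclusive (start, end) date pairs from April 1 of `season`
    through March 31 of the next year, each at most `days` long."""
    start = date(season, 4, 1)
    last = date(season + 1, 3, 31)
    while start <= last:
        end = min(start + timedelta(days=days - 1), last)
        yield start, end
        start = end + timedelta(days=1)


def fetch_season(season: int, fetch_window, pause: float = PAUSE_SECONDS) -> list:
    """Call the hypothetical `fetch_window(start, end)` once per window.

    Windows that fail (e.g. offseason windows with no games, which the API
    answers with an error) are skipped instead of aborting the whole season.
    """
    games = []
    for start, end in season_windows(season):
        try:
            games.extend(fetch_window(start, end))
        except Exception as exc:  # broad on purpose: the API's error type is unknown
            print(f"Skipped {start} to {end}: {exc}")
        time.sleep(pause)  # throttle to respect the hourly call limit
    return games
```

Catching a broad `Exception` mirrors the "not ideal but it works" try-except approach; narrowing it to the API client's actual error type would keep real bugs from being silently skipped.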
Season | Team | Final Wins | Final Losses | Final Ties | Playoffs? | Game (0-indexed) | Game-by-Game Wins | Game-by-Game Losses | Game-by-Game Ties | Record String |
---|---|---|---|---|---|---|---|---|---|---|
2008 | PIT | 14 | 5 | 0 | y | 0 | 1 | 0 | 0 | 1 - 0 - 0 |
2008 | PIT | 14 | 5 | 0 | y | 1 | 2 | 0 | 0 | 2 - 0 - 0 |
2008 | PIT | 14 | 5 | 0 | y | 2 | 2 | 1 | 0 | 2 - 1 - 0 |
2008 | PIT | 14 | 5 | 0 | y | 3 | 3 | 1 | 0 | 3 - 1 - 0 |
2008 | PIT | 14 | 5 | 0 | y | 4 | 4 | 1 | 0 | 4 - 1 - 0 |
2008 | PIT | 14 | 5 | 0 | y | 5 | 5 | 1 | 0 | 5 - 1 - 0 |
2008 | PIT | 14 | 5 | 0 | y | 6 | 5 | 2 | 0 | 5 - 2 - 0 |
2008 | PIT | 14 | 5 | 0 | y | 7 | 6 | 2 | 0 | 6 - 2 - 0 |
2008 | PIT | 14 | 5 | 0 | y | 8 | 6 | 3 | 0 | 6 - 3 - 0 |
2008 | PIT | 14 | 5 | 0 | y | 9 | 7 | 3 | 0 | 7 - 3 - 0 |
2008 | PIT | 14 | 5 | 0 | y | 10 | 8 | 3 | 0 | 8 - 3 - 0 |
2008 | PIT | 14 | 5 | 0 | y | 11 | 9 | 3 | 0 | 9 - 3 - 0 |
2008 | PIT | 14 | 5 | 0 | y | 12 | 10 | 3 | 0 | 10 - 3 - 0 |
2008 | PIT | 14 | 5 | 0 | y | 13 | 11 | 3 | 0 | 11 - 3 - 0 |
2008 | PIT | 14 | 5 | 0 | y | 14 | 11 | 4 | 0 | 11 - 4 - 0 |
2008 | PIT | 14 | 5 | 0 | y | 15 | 11 | 5 | 0 | 11 - 5 - 0 |
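The aggregation and join could be expressed in pandas roughly as follows. This is a sketch under assumptions, not the original pipeline: it expects a hypothetical `games` dataframe with `season`, `date`, `home`, `away`, `home_score`, and `away_score` columns, plus a hypothetical `playoff_teams` dataframe with one row per playoff team-season. It also derives win/loss/tie from the final score rather than from a winner field, so a game with a missing score counts as none of the three and would still need a manual fix. The Final W/L/T columns are left out for brevity.

```python
import pandas as pd


def game_by_game_records(games: pd.DataFrame, regular_season_games: dict,
                         playoff_teams: pd.DataFrame) -> pd.DataFrame:
    """One row per team per regular-season game, with a cumulative record."""
    cols = ["season", "date", "team", "pf", "pa"]
    # Split each game into a home-team row and an away-team row.
    home = games.rename(columns={"home": "team", "home_score": "pf", "away_score": "pa"})[cols]
    away = games.rename(columns={"away": "team", "away_score": "pf", "home_score": "pa"})[cols]
    rows = pd.concat([home, away], ignore_index=True)

    # Result from the final score (points for vs. points against).
    rows["win"] = (rows["pf"] > rows["pa"]).astype(int)
    rows["loss"] = (rows["pf"] < rows["pa"]).astype(int)
    rows["tie"] = (rows["pf"] == rows["pa"]).astype(int)

    # Number each team's games in date order (0-indexed) and drop playoff games.
    keys = ["season", "team"]
    rows = rows.sort_values(keys + ["date"]).reset_index(drop=True)
    rows["game"] = rows.groupby(keys).cumcount()
    rows = rows[rows["game"] < rows["season"].map(regular_season_games)].copy()

    # Running totals within each team-season.
    for col, total in [("win", "wins"), ("loss", "losses"), ("tie", "ties")]:
        rows[total] = rows.groupby(keys)[col].cumsum()
    rows["record"] = (rows["wins"].astype(str) + " - " + rows["losses"].astype(str)
                      + " - " + rows["ties"].astype(str))

    # Left join: a team-season is a playoff team only if it appears in playoff_teams.
    out = rows.merge(playoff_teams[keys].drop_duplicates(), on=keys,
                     how="left", indicator=True)
    out["playoffs"] = out["_merge"] == "both"
    return out[["season", "team", "playoffs", "game", "wins", "losses", "ties", "record"]]
```

Numbering with `cumcount()` makes `game` 0-indexed like the table above, so `game < cap` keeps exactly the first `cap` games, which are the regular-season ones since playoff games come last.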
Results
The rest of the process was straightforward (as in any analytics project, 90% of the work is data engineering). I used Seaborn to generate a heatmap of the records. Before creating it, I sort of speculated about what it might look like, especially along the matrix diagonal. I initially assumed that any .500 record would translate to roughly a 50% chance of making the playoffs. Interestingly, that assumption (and a few others) turned out to be wrong.
There are a lot of interesting observations in the historical data.
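A heatmap like that might be built by pivoting the game-by-game table into a losses-by-wins grid of playoff rates. A minimal sketch, assuming hypothetical `wins`, `losses`, `ties`, and boolean `playoffs` columns and, as a simplifying choice, dropping any record that includes a tie:

```python
import pandas as pd


def playoff_odds_grid(records: pd.DataFrame) -> pd.DataFrame:
    """Share of team-seasons at each W-L record that made the playoffs."""
    tie_free = records[records["ties"] == 0]
    return tie_free.assign(made=tie_free["playoffs"].astype(float)).pivot_table(
        index="losses", columns="wins", values="made", aggfunc="mean"
    )


def plot_playoff_odds(grid: pd.DataFrame):
    # Imported here so the grid can be computed without plotting libraries installed.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(10, 8))
    sns.heatmap(grid, annot=True, fmt=".0%", cmap="RdYlGn", vmin=0, vmax=1, ax=ax)
    ax.set_xlabel("Wins")
    ax.set_ylabel("Losses")
    return fig
```

A team's record only moves forward, so each team-season passes through a given W-L cell at most once. That makes each cell's mean the share of team-seasons that were ever at that record and went on to make the playoffs.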
To most fans, dropping to 0-2 seems recoverable. But shockingly, 0-2 teams have historically made the playoffs only 10% of the time.
The "50-50 line" is neither at the .500 diagonal nor linear. It tracks records 1 win above .500, then splits into a "Y". That makes sense: as the season winds down, there are fewer probabilistic paths a team can follow.
For teams at 2-2 right now, dropping to 2-3 is a 25% swing. It's not a crazy idea to start considering selling, especially once other factors like injuries and even momentum are taken into account.
The "point of no return" is around 5 games under .500 (e.g., 0-5 or 1-6). The first team to get there could be the Jaguars this week.
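A figure like the 0-2 number is a conditional frequency: among team-seasons that were ever 0-2, the share that made the playoffs. A hypothetical helper, assuming the same `wins`/`losses`/`ties`/`playoffs` columns, that also returns the sample size (rare records rest on few team-seasons) might look like:

```python
import pandas as pd


def playoff_odds(records: pd.DataFrame, wins: int, losses: int, ties: int = 0):
    """Return (share that made the playoffs, number of team-seasons) for a record."""
    at_record = records[
        (records["wins"] == wins)
        & (records["losses"] == losses)
        & (records["ties"] == ties)
    ]
    if at_record.empty:
        return float("nan"), 0  # no team has ever held this record
    return float(at_record["playoffs"].mean()), len(at_record)
```

For example, `playoff_odds(records, 0, 2)` would give the 0-2 share along with how many team-seasons it is based on.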
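One way (of several) to trace the 50-50 line is to find, for each number of decided games, the fewest wins at which at least half of historical teams made the playoffs. This sketch assumes the same hypothetical `wins`/`losses`/`ties`/`playoffs` columns and drops tied records:

```python
import pandas as pd


def breakeven_records(records: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """For each number of decided games played, return the record with the
    fewest wins whose historical playoff rate is at least `threshold`."""
    tie_free = records[records["ties"] == 0]
    odds = (
        tie_free.assign(made=tie_free["playoffs"].astype(float))
        .groupby(["wins", "losses"], as_index=False)["made"]
        .mean()
    )
    odds["played"] = odds["wins"] + odds["losses"]
    above = odds[odds["made"] >= threshold]
    # Row with the minimum win count among qualifying records, per games played.
    first = above.loc[above.groupby("played")["wins"].idxmin()]
    first = first.assign(games_over_500=first["wins"] - first["losses"])
    return first.set_index("played")[["wins", "losses", "made", "games_over_500"]]
```

The `games_over_500` column (wins minus losses) makes it easy to compare the line against the .500 diagonal, where it would be 0. Cells backed by only a handful of team-seasons can make the curve noisy.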
All in all, this was a fun exercise that can be updated year over year. It will be part of a longer series looking into the factors that impact playoff odds. More to come...