"Once the fans of history get an idea of the people he beat, then they will get a better perspective of him, I’m sure. He’s got an All-Star list of victories." - Mike Tyson on Evander Holyfield (from the Chasing Tyson documentary).
Evander Holyfield is one of the greatest boxers of the 1990s. Unlike many boxers of that time, Holyfield wasn't a flashy fighter. However, he does boast an impressive roster of defeated opponents: Riddick Bowe, George Foreman, Larry Holmes, Michael Moorer, John Ruiz, and Mike Tyson. Ranking a boxer by the quality of the opponents he defeated is very similar to how Google ranks websites with the PageRank algorithm. PageRank ranks a website by the sites that link to it. An incoming link from a more prominent website carries more weight than a link from an obscure website. The parallel to boxing is clear: defeating a higher-ranked boxer carries more weight than defeating a lower-ranked boxer.
As a collaboration with Dr. Tien Chih, we decided to modify PageRank for boxing and apply it to all world title fight boxers for the five major leagues (IBO, IBF, WBA, WBC, WBO) using the world title fights themselves as the "links" for the algorithm. We scraped BoxRec.com for world title fights using Scrapy, cleaned the data, and applied our algorithm to produce a ranking.
PageRank is a well-known algorithm with many great explanations. Our modification accounts for draws and for the outcomes of multiple matches through additional weightings of the adjacency matrix. An in-depth guide to our process can be found in a preprint of our article currently under peer review. There is also an Inside Science article about our ranking of the champion boxers from the 1990s.
The general idea is to use linear algebra to let the network rank itself. In our implementation, each boxer is given an "eigenvector score," with the highest score of 1 given to our top-ranked boxer, Wladimir Klitschko.
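To make the idea concrete, here is a minimal sketch of a PageRank-style power iteration in plain Python. It is an illustration rather than the project code: the function name `pagerank_scores`, the damping factor of 0.85, the fixed iteration count, and the toy fights are all assumptions. Edges point from loser to winner with positive weights, and the final scores are divided by the largest one so that the top boxer scores 1.

```python
def pagerank_scores(edges, damping=0.85, iterations=100):
    """Rank nodes from weighted directed edges {(loser, winner): weight}.

    Assumes every weight is positive. Returns scores scaled so that the
    top node has a score of 1.0. The damping value is an assumption.
    """
    nodes = sorted({name for pair in edges for name in pair})

    # Total outgoing weight per node (the weight of that boxer's losses and draws).
    out_weight = {name: 0.0 for name in nodes}
    for (loser, _winner), weight in edges.items():
        out_weight[loser] += weight

    n = len(nodes)
    score = {name: 1.0 / n for name in nodes}
    for _ in range(iterations):
        # Nodes with no outgoing edges spread their score evenly over everyone.
        dangling = sum(score[name] for name in nodes if out_weight[name] == 0.0)
        new = {name: (1.0 - damping) / n + damping * dangling / n for name in nodes}
        # Each loser passes score to its winners in proportion to edge weight.
        for (loser, winner), weight in edges.items():
            new[winner] += damping * score[loser] * weight / out_weight[loser]
        score = new

    top = max(score.values())
    return {name: value / top for name, value in score.items()}


# Hypothetical toy network: A beat B, A beat C, and B beat C.
toy_edges = {("B", "A"): 1.0, ("C", "A"): 1.0, ("C", "B"): 1.0}
scores = pagerank_scores(toy_edges)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

With these three toy fights, A comes out on top, followed by B and then C.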
Why PageRank?
There are many ways to rank boxers. Some rankings are point systems like at BoxRec.com. Many others are qualitative, obtuse, or involve hidden formulas. We wanted a powerful, simple, and open ranking system. The code for our ranking system is on our GitHub project page.
Other simple ranking systems tend to do poorly in boxing. Win/Loss ratios make a bad measure as most champion boxers have extremely high ratios. In fact, 6 of the top 10 ranked boxers at BoxRec.com are undefeated. This is complicated by the low number of match-ups in boxing. Of BoxRec.com's top 10 boxers, Manny Pacquiao boasts the highest number of match-ups at 67 over the past 22 years.
Even the decision of who can fight in a title match-up is decided by opaque ranking systems. As Yuen Yiu of Inside Science puts it, "Each of these systems has their own specific point trading mechanism, such as how much a knockout win is worth, or how inactivity affects one's points, each intentionally or unintentionally favoring certain kinds of boxers over others. In addition, these point trading systems are often opaque to the fans and produce controversial results."
In our implementation, we treated each win the same. Draws are worth half a win to both boxers, and losses do not detract from the defeated boxer. A ranking system that punishes losses encourages elite boxers to avoid risky fights and take only safe opponents. By not punishing losses, our ranking system has the benefit of rewarding elite boxers who fight up-and-coming opponents with less time in the sport.
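The weighting rules above can be sketched as a small function that turns fight records into weighted loser-to-winner edges. The record format, the function name `build_edges`, and the choice to sum weights over repeated match-ups are assumptions made for this example, and the fights listed are made up.

```python
from collections import defaultdict


def build_edges(fights):
    """Turn (boxer_a, boxer_b, result) records into weighted edges.

    result is "a" if boxer_a won, "b" if boxer_b won, or "draw".
    Edges point from loser to winner. Repeated match-ups are summed,
    which is an assumption of this sketch.
    """
    edges = defaultdict(float)
    for boxer_a, boxer_b, result in fights:
        if result == "a":
            edges[(boxer_b, boxer_a)] += 1.0  # every win counts the same
        elif result == "b":
            edges[(boxer_a, boxer_b)] += 1.0
        elif result == "draw":
            # A draw counts as half a win for each boxer.
            edges[(boxer_a, boxer_b)] += 0.5
            edges[(boxer_b, boxer_a)] += 0.5
        else:
            raise ValueError(f"unknown result: {result!r}")
    return dict(edges)


# Hypothetical records, not real fight data.
fights = [("A", "B", "a"), ("A", "B", "draw"), ("C", "A", "b")]
edges = build_edges(fights)
print(edges)
```

No negative weight is ever recorded, so a loss only adds credit to the winner.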
The PageRank system also rewards consistency over luck. A win against a top-rated boxer will elevate an up-and-coming boxer in our ranking, but not necessarily above the top-rated boxer he fought. Boxing victories are often not transitive: three boxers will often defeat each other in a cycle, as in the game Rock, Paper, Scissors. Still, a win against a top-rated boxer would elevate the up-and-coming boxer over boxers who defeat lesser opponents. This would allow the up-and-coming boxer to obtain better fights and maybe eventually overtake a top-rated boxer.
However, one drawback to the PageRank ranking rewarding consistency over a few outstanding performances is that the ranking does not explicitly identify rising stars. So when viewing the results, it is important to realize the model is not predicting the outcome of future matches. The model is analyzing the history of matches that have taken place and rewarding the boxers appropriately.
The Rankings
Below are the top 20 boxers by our algorithm with the latest scrape on June 6th, 2017. You can get an in-depth view of the rankings including rankings by weight class and boxer debut year in our interactive tool.
The Top 20 Boxers
Note: Our algorithm only considers world title fight matches. Rankings may change with the inclusion of all match-ups.

Ranking | Boxer Name | Alias | Weight Class | Eigenvector Score |
---|---|---|---|---|
1 | Wladimir Klitschko | Dr Steelhammer | heavyweight | 1.00 |
2 | Floyd Mayweather Jr | Money / Pretty Boy | welterweight | 0.97 |
3 | Erik Morales | El Terrible | super bantamweight | 0.96 |
4 | Ricardo Lopez | Finito | minimumweight | 0.92 |
5 | Rosendo Alvarez | El Bufalo | minimumweight | 0.89 |
6 | Oscar De La Hoya | Golden Boy | welterweight | 0.83 |
7 | Thomas Hearns | Hitman / Motor City Cobra | super welterweight | 0.70 |
8 | Manny Pacquiao | Pac Man | welterweight | 0.67 |
9 | Evander Holyfield | The Real Deal | heavyweight | 0.61 |
10 | Lennox Lewis | The Lion | heavyweight | 0.61 |
11 | Marco Antonio Barrera | Baby Faced Assassin | super bantamweight | 0.57 |
12 | Andre Ward | S.O.G. | light heavyweight | 0.54 |
13 | Pongsaklek Wonjongkam | Pongsaklek Sitkanongsak | flyweight | 0.54 |
14 | Keith Thurman | One Time | welterweight | 0.52 |
15 | Sugar Ray Leonard | Sugar | welterweight | 0.51 |
16 | Danny Garcia | Swift | welterweight | 0.50 |
17 | Humberto Gonzalez | Chiquita | light flyweight | 0.48 |
18 | Shane Mosley | Sugar | welterweight | 0.47 |
19 | Jhonny Gonzalez | | super featherweight | 0.46 |
20 | Terence Crawford | Hunter / Bud | super lightweight | 0.45 |
While the ranking is topped by a heavyweight champion, it is surprising that heavyweight champions do not dominate the top-ranked fighters. In the top 100 boxers, heavyweight and welterweight champions are over-represented while minimumweight, middleweight, and bantamweight champions are under-represented. However, this over-representation is not as extreme when scaled by the total number of world title fight boxers in each weight class.
The two graphs below show the raw count of boxers in the top 100 and the percentage over- or under-representation of each weight class. If the top 100 boxers were a perfectly representative sample of the world title fight boxers, each weight class would have a baseline of 3.01% of its boxers represented. The percentage difference shown in the second graph is the difference between the actual percentage representation and the baseline representation of 3.01%.
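As a rough sketch of that calculation, the snippet below computes the baseline and each class's difference from it. The counts are invented for illustration and use a top 20 instead of a top 100. The function name `representation` is hypothetical, and reporting the difference in percentage points is an assumption.

```python
def representation(top_counts, class_totals, top_n):
    """Return (baseline, differences), both in percent.

    baseline: share of all boxers that fit into the top list.
    differences: per weight class, the actual share of that class in the
    top list minus the baseline (percentage points, an assumption).
    """
    total_boxers = sum(class_totals.values())
    baseline = 100.0 * top_n / total_boxers
    differences = {}
    for weight_class, class_size in class_totals.items():
        actual = 100.0 * top_counts.get(weight_class, 0) / class_size
        differences[weight_class] = actual - baseline
    return baseline, differences


# Hypothetical counts, not the real data.
class_totals = {"heavyweight": 200, "welterweight": 300, "flyweight": 500}
top_counts = {"heavyweight": 12, "welterweight": 6, "flyweight": 2}
baseline, differences = representation(top_counts, class_totals, top_n=20)
print(baseline, differences)
```

In this toy data the baseline is 2%, so a class with 6% of its boxers in the top list sits 4 points above the baseline.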
Below is an interactive graph of the top 100 boxers. We only consider world title fight matches. As a result, many canonically famous matches are absent from the graph. The links are directed from loser to winner, and the size of each node reflects its eigenvector score.
You can see from the graph that the boxing network mostly breaks up into weight classes. Wladimir Klitschko dominates the heavyweight class while Floyd Mayweather Jr dominates the welterweight class.
The boxing network also contains insights into what the ranking sees in the top 100 boxers. Below is a chart showing the distribution of the number of world title fighters defeated for the top 100 boxers versus all boxers. For example, nearly 60% of all title fight boxers have not defeated another boxer in a world title match.
There are many insights within this chart. The top 100 boxers win world title fights more consistently, with a much higher variation in the number of opponents they defeat. This behavior should be expected from our ranking, as it values consistent wins. We also see from the chart that there are six top 100 boxers who have only won a single world title fight. A famous example of a top 100 boxer with only a single world title match victory is Tyson Fury, ranked 90th, who defeated Wladimir Klitschko in 2015. There is also a boxer who defeated 21 opponents but did not make it to the top 100, Anucha Phothong.
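A distribution like the one in this chart could be computed along these lines. This is a sketch under assumptions: the function name `defeated_distribution` and the input format are hypothetical, the fights are made up, and each opponent is counted once no matter how many times he was beaten.

```python
from collections import Counter


def defeated_distribution(wins, boxers):
    """Share of boxers by number of distinct opponents defeated.

    wins: iterable of (winner, loser) pairs.
    boxers: every boxer to include, even those with no wins.
    """
    defeated = {boxer: set() for boxer in boxers}
    for winner, loser in wins:
        if winner in defeated:
            defeated[winner].add(loser)  # a set ignores repeat wins
    counts = Counter(len(opponents) for opponents in defeated.values())
    total = len(defeated)
    return {k: counts[k] / total for k in sorted(counts)}


# Hypothetical data: A beat B twice and C once, and B beat C.
boxers = ["A", "B", "C", "D", "E"]
wins = [("A", "B"), ("A", "C"), ("A", "B"), ("B", "C")]
distribution = defeated_distribution(wins, boxers)
print(distribution)
```

Running the same function once over all boxers and once over only the top 100 would give the two series to compare.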
Summary
There is no universally accepted ranking system in any sport. Each model will emphasize different characteristics, and it is difficult to make quantitative systems align perfectly with our qualitative judgments. PageRank provides the benefits of being simple while powerful. It allows the boxer network to rank itself, and by not punishing losses, incentivizes champions to keep fighting. It values consistency but also rewards defeating tough opponents.
While we focused on boxing, PageRank-based ranking algorithms can be a powerful tool for other sports where match-ups are sparse. Similar rankings have been used in football and table tennis. We hope this provides a valuable method for your sports rankings toolbox!
The project GitHub repository can be found here.