Since 1985, the 5th seed has beaten the 12th seed in the first round of the NCAA tournament 65 times in 96 games (a win percentage of 65/96, or about 67.7%). Considering how “experts” seed teams, one could expect that the number 5 seed would beat the number 12 seed… but not all of the time. If we give the seeds weight, then the experts would predict that the number five seed would lose 5/16 times (this is giving them the benefit of the doubt) and win 11/16 times in their matchups. That is an expected win percentage of 68.75%. Is the difference significant? It could be.
<WARNING... The rest of this argument makes use of Statistics>
Calculating a z score, we find that z = (observed value - expected value)/(standard deviation of the expected value). The observed and expected values have to be in the same units as the standard deviation, so I will work in number of wins rather than win percentage.
Observed value = 65 wins
Expected value = 96*.6875 = 66 wins
The standard deviation of the expected number of wins can be calculated using the binomial formula (s = sqrt(np(1-p))), since our data only considers wins versus losses and 96 games provides a large enough sample to satisfy the conditions of the normal approximation.
Thus, s = sqrt(96*.6875*(1-.6875)) = 4.541475
And z = (65-66)/4.541475 = -0.2202
Now let's set up a hypothesis test.
The null hypothesis would reflect no difference between the observed and the expected win probability, i.e., p = p0. That is, p = .6875. The alternative hypothesis would indicate that p < p0, or p < .6875.
To be sure, let's set our level of significance rather conservatively, at say 5%.
To determine significance, we need only compare the observed z = -0.2202 with the 5% critical value z* = -1.645 from a standard z table. Because -0.2202 is not farther than -1.645 from zero (not by a long shot!), we cannot reject the null hypothesis at the 5% level (or anything close to it). The data are consistent with the seed-weighted prediction.
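For anyone who wants to check the arithmetic, here is a minimal Python sketch of the same test. It works in counts of wins so that the numerator and the binomial standard deviation are in the same units. The function name binomial_z_test is made up for illustration, and the p-value comes from the normal approximation rather than an exact binomial calculation.

```python
import math

def binomial_z_test(wins, games, p0):
    """Lower-tail z test for a binomial count using the normal approximation.

    Hypothetical helper: wins and games are the observed record,
    p0 is the win probability under the null hypothesis.
    """
    expected = games * p0                      # expected number of wins
    sd = math.sqrt(games * p0 * (1 - p0))      # binomial standard deviation
    z = (wins - expected) / sd
    # Standard normal CDF evaluated at z gives the one-sided p-value.
    p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return z, p_value

# 65 wins in 96 games against the seed-weighted 11/16 prediction.
z, p_value = binomial_z_test(wins=65, games=96, p0=11 / 16)
print(f"z = {z:.4f}, one-sided p-value = {p_value:.2f}")
```

A one-sided p-value around 0.41 is nowhere near the 5% cutoff, which is the same conclusion as comparing z with z* = -1.645.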
Furthermore, to those who would make the argument, “there is always a 5/12 upset in every tournament”, I respond with, “is that really surprising?” Take an expected win percentage of approximately 69%, and assume that one 5/12 game does not have an effect on (that is… is independent of) any other 5/12 game in the tournament. Then the probability that all four 5/12 matchups have 5 as the winner over 12 is (.6875)^4 = .2234, or about 22% of the time… restated, 78% of the time at least one of the 12 seeds will win in the first round.
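The same calculation as a short Python sketch, assuming the four 5/12 games are independent and each 5 seed wins with the seed-weighted probability of 11/16. The function name is made up for illustration.

```python
def prob_at_least_one_upset(p_favorite_wins, games=4):
    """P(at least one upset), assuming independent games with the same win probability."""
    return 1 - p_favorite_wins ** games

p5 = 11 / 16                                   # seed-weighted chance a 5 seed wins
all_fives_win = p5 ** 4                        # no 12 seed wins in any of the four games
at_least_one_twelve = prob_at_least_one_upset(p5)

print(f"all four 5 seeds win:      {all_fives_win:.4f}")
print(f"at least one 12 seed wins: {at_least_one_twelve:.4f}")
```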
Add on the normal fluctuations of a low number of samples (by law of large numbers standards) and a recent “streak” of #12 wins is easily explained.
Very cool. This makes an interesting argument for talking about how well basketball can be modeled by these statistics since no 16 seed has beaten a 1 seed, even though the numbers say it should happen every so often.
Making the jump from seed weights to probability is definitely a questionable part of my argument. Given that there have been 67 tournaments (starting in 1939), and the seed weighting would have us expect 1 upset out of every 16 games between a one seed and a sixteen seed... we should have seen an upset by now... but then again maybe not. If I flip a coin four times it is unlikely I get 4 heads, but not impossible... or even highly unlikely: (.5)^4 = 6.25%.
But you will also note that I considered the weighted probabilities to be generous... and lowering these probabilities only strengthens my argument with regard to the five-twelve myth :)
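To put a rough number on "we should have seen an upset by now", here is a small Python sketch. The 96-game count is my assumption, not a figure from the comment above: it takes four 1/16 games per tournament over the same window the post uses for its 96 5/12 games. It also assumes the games are independent and that a 16 seed really does win 1 time in 16.

```python
def prob_no_upset(p_upset, games):
    """P(zero upsets in a run of independent games with the same upset probability)."""
    return (1 - p_upset) ** games

four_heads = 0.5 ** 4                          # the coin-flip comparison: 6.25%

# Assumption: 96 one-versus-sixteen games, each with a 1/16 upset chance.
no_16_over_1 = prob_no_upset(p_upset=1 / 16, games=96)

print(f"four heads in a row:           {four_heads:.4f}")
print(f"no 16-over-1 upset in 96 games: {no_16_over_1:.4f}")
```

Under those assumptions a run with no 16-over-1 upset comes out far less likely than four heads in a row, which fits the point that the 1/16 weight is generous to the lower seed.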
Interesting to look at. I think most 16 seeds these days are winners of small-time conferences. The argument could be made that they could be ranked differently because they are not the 60-64th best teams out there. It just adds a bit to the conversation, especially for sports fans who may not always feel comfortable talking in math class.