Some Odds for Comparison
|Event|Odds|Probability|
|---|---|---|
|Being Struck by Lightning in Your Lifetime|1/12,000|8.3×10^-5|
|Being Struck by Lightning within a Year|1/1,000,000|1×10^-6|
|Being Struck by a Meteorite in Your Lifetime|1/1,600,000|6.3×10^-7|
|Having Identical Quadruplets|1/15,000,000|6.7×10^-8|
|Being Killed by a Vending Machine|1/112,000,000|8.9×10^-9|
- "How Dangerous is Lightning?" - National Weather Service
- Brian Clark Howard, "What Are the Odds a Meteorite Could Kill You?", National Geographic, 9 Feb 2016
- Andrew Carter, "15 Things More Likely to Happen than Winning Mega Millions", The Daily Beast, 30 Mar 2012
Some Technical Details
This calculation is performed using a binomial distribution, which treats the pool of voters as infinite and fluid. Most significantly, it does not take into account that the people surveyed are themselves part of the voting population. A hypergeometric distribution would be ideal, since it treats the number of voters as finite and accounts for the fact that the voters surveyed are part of the voter population. However, the hypergeometric distribution is a much harder calculation. Testing has shown that for a surveyed population of a little more than 800, results are good to at least 2 significant digits for voter populations all the way down to 50,000 and even somewhat below that. This is more than sufficient accuracy for any statewide election. This calculation may be updated to use a hypergeometric distribution for small voter populations in order to make this tool useful for small local elections.
Binomial Distribution Calculation
If we know that the portion of the population voting for Measure I is 10%, then we can surmise that the odds of selecting a person and finding that they support Measure I are also 10% = 0.1. Surveying two people and finding one for and one against, in that order, would have odds of 10% × 90% = 0.1 × 0.9 = 0.09 (9%). Finding a supporters of the bill and b non-supporters, in one particular order, would have odds of $0.1^a \cdot 0.9^b = 0.1^a (1-0.1)^b$. If we define the odds as p, then we get odds of $p^a (1-p)^b$.
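As a small illustration of the expression above, here is a minimal Python sketch. The function name `sequence_likelihood` is made up for this example; it evaluates the odds of one specific sequence of answers and does not count the different orders in which the same answers could arrive.

```python
def sequence_likelihood(p: float, a: int, b: int) -> float:
    """Odds of one specific sequence of a supporters and b non-supporters."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return p**a * (1.0 - p)**b


# The two-person example from the text: one supporter, one non-supporter, p = 0.1.
one_each = sequence_likelihood(0.1, 1, 1)
print(one_each)
```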
For the purposes of a survey, the surveyor knows what they found, a and b, but they don't know the actual odds, p. Naively, they might just say that the percentage of people supporting the bill is a/(a+b). However, we know that this is not a very exact answer.
Consider the case where the surveyor asks only one person their opinion, and that person supports Measure I. Using the naive method above, we would then conclude that nearly everyone probably supports Measure I. However, we can quickly surmise that this is not likely to be true based on just one opinion.
Instead, we calculate the expected value of p, also known as the expectation value. The calculation is very much like the calculation of a center of mass. Effectively, we add up every possible probability, each multiplied by the effective probability that we would see the observed survey results if that probability were the true one. Then we divide by the sum of all those effective probabilities. Since the probability could be anything from 0 to 1, this adding up is not a simple sum, but rather an integral. The equation comes out as follows:

$$\langle p \rangle = \frac{\int_0^1 p \cdot p^a (1-p)^b \, dp}{\int_0^1 p^a (1-p)^b \, dp}$$
Compare that to the equation for center of mass:

$$x_{cm} = \frac{\int_{x_i}^{x_f} x \, m(x) \, dx}{\int_{x_i}^{x_f} m(x) \, dx}$$
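To see what the expected value does to the one-person survey discussed earlier, the weighted average can be approximated numerically. This is only a sketch: the function name `expected_support` and the step count are choices made for this example, and a simple midpoint rule stands in for the integral.

```python
def expected_support(a: int, b: int, steps: int = 100_000) -> float:
    """Midpoint-rule estimate of the expected p, weighting each p by p**a * (1 - p)**b."""
    width = 1.0 / steps
    weighted_sum = 0.0
    weight_sum = 0.0
    for i in range(steps):
        p = (i + 0.5) * width          # midpoint of the i-th slice of [0, 1]
        weight = p**a * (1.0 - p)**b   # effective probability of the survey result
        weighted_sum += p * weight
        weight_sum += weight
    return weighted_sum / weight_sum


# One person surveyed, and that person supports the measure.
single_yes = expected_support(1, 0)
print(round(single_yes, 4))
```

For a single supporter the result comes out near two thirds rather than the 100% suggested by the naive a/(a+b).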
If we want to find out the portion of the mass that is in a specific range of x, we could calculate it as follows:
$$\frac{\int_{\text{range of interest}} m(x) \, dx}{\int_{x_i}^{x_f} m(x) \, dx}$$

The numerator integrates over the range of interest; the denominator integrates over the entire range, $x_i$ to $x_f$.
Similarly, we can find the odds that the actual probability is in the range $p_i$ to $p_j$ by calculating the following:

$$\frac{\int_{p_i}^{p_j} p^a (1-p)^b \, dp}{\int_0^1 p^a (1-p)^b \, dp}$$

The numerator integrates from $p_i$ to $p_j$; the denominator integrates over the entire range, 0 to 1.
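For small whole-number a and b, this ratio can be evaluated exactly, which makes it easy to check by hand. The sketch below is one way to do that, not the method used by the tool itself: it expands (1-p)^b with the binomial theorem and integrates term by term. The names `antiderivative` and `interval_probability` are invented for this example.

```python
from fractions import Fraction
from math import comb


def antiderivative(x: Fraction, a: int, b: int) -> Fraction:
    """Integral of p**a * (1 - p)**b from 0 to x, expanding (1 - p)**b term by term."""
    return sum(
        Fraction((-1) ** k * comb(b, k), a + k + 1) * x ** (a + k + 1)
        for k in range(b + 1)
    )


def interval_probability(a: int, b: int, lo, hi) -> Fraction:
    """Share of the total weight that lies between lo and hi."""
    lo, hi = Fraction(lo), Fraction(hi)
    numerator = antiderivative(hi, a, b) - antiderivative(lo, a, b)
    denominator = antiderivative(Fraction(1), a, b)
    return numerator / denominator


# One supporter, no opponents: odds that the true p is below one half.
print(interval_probability(1, 0, 0, Fraction(1, 2)))
```

The alternating sum grows unwieldy for survey-sized a and b, so this is only suitable for checking small cases.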
In our case, we want to know the odds that the result is within 1 vote of dead even. So, we define the range as (0.5-1/v) to (0.5+1/v), where v is the total number of voters. (Note: This range is not quite right, and it has something to do with how the binomial distribution is not quite right for problems involving finite numbers of discrete units, such as people voting, but we will refine this range later.)
The denominator, with the full integral from 0 to 1, as it happens, simplifies to a!b!/(a+b+1)! so we end up with:
$$\frac{(a+b+1)!}{a! \, b!} \int_{0.5-1/v}^{0.5+1/v} p^a (1-p)^b \, dp$$
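The simplification of the denominator is the standard Beta function identity. Written out for whole-number a and b, with B and Γ denoting the usual Beta and Gamma functions (notation not used elsewhere on this page):

$$\int_0^1 p^a (1-p)^b \, dp = B(a+1, b+1) = \frac{\Gamma(a+1) \, \Gamma(b+1)}{\Gamma(a+b+2)} = \frac{a! \, b!}{(a+b+1)!}$$

As a quick check with a = b = 1:

$$\int_0^1 p(1-p) \, dp = \frac{1}{2} - \frac{1}{3} = \frac{1}{6}, \qquad \frac{1! \, 1!}{3!} = \frac{1}{6}$$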
Now, we can take the remaining integral and note that 1/v is really very tiny. Over the range of this integral, we can expect that the value of the function being integrated does not change significantly, or more importantly, that the slope of the function does not change significantly. So we will just calculate the value at p=0.5 and multiply by the width of the range, 2/v, to get the value of this integral. Hence we have:

$$\frac{(a+b+1)!}{a! \, b!} \cdot 0.5^a \, (1-0.5)^b \cdot \frac{2}{v} = \frac{(a+b+1)!}{a! \, b!} \cdot \frac{2}{2^{a+b} \, v}$$
This is what we will calculate, with some minor modification regarding the range, as noted above.
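The claim that a single value at p=0.5 times 2/v is close enough to the integral can be checked numerically. The sketch below compares the two for one set of made-up numbers (42 supporters, 40 non-supporters, 50,000 voters); the function names and the step count are assumptions of this example.

```python
def narrow_range_integral(a: int, b: int, v: int, steps: int = 1000) -> float:
    """Midpoint-rule integral of p**a * (1 - p)**b over 0.5 - 1/v to 0.5 + 1/v."""
    lower = 0.5 - 1.0 / v
    width = (2.0 / v) / steps
    total = 0.0
    for i in range(steps):
        p = lower + (i + 0.5) * width
        total += p**a * (1.0 - p)**b * width
    return total


def single_point_shortcut(a: int, b: int, v: int) -> float:
    """Value of the function at p = 0.5 multiplied by the width of the range, 2/v."""
    return 0.5 ** (a + b) * 2.0 / v


a, b, v = 42, 40, 50_000  # hypothetical survey counts and voter total
integral = narrow_range_integral(a, b, v)
shortcut = single_point_shortcut(a, b, v)
relative_gap = abs(shortcut - integral) / integral
print(relative_gap)
```

The factorial prefactor is left out because it multiplies both quantities equally and cancels in the comparison.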
Insights from the Hypergeometric Distribution
Cranking the numbers with a hypergeometric distribution, rather than the slightly less apropos binomial distribution, gives results that are very similar and very different at the same time, but in rather expected ways. Basically, if the number of people voting is even, then the odds of making a difference are about 50% smaller than if the number of people voting is odd.
This is expected because we have defined the range of significance as being values within one vote of dead even (50% voting for candidate A and 50% for candidate B). The binomial distribution is really most appropriate for sampling from an infinite pool, and thus it is a continuous distribution, so we integrated from 0.5-1/V to 0.5+1/V (effectively (0.5*V-1)/V to (0.5*V+1)/V), regardless of whether 0.5*V even makes sense. (Any voter tally should be a whole number, because voters are quantized.)

The hypergeometric distribution, however, is more appropriate for finite sets, and as such, the calculation involves a (really enormous) summation rather than an integration. When the total number of voters, V (excluding oneself), is odd, a dead even result is not possible, so there are exactly two states in which one's vote would make a difference. When V is even, there is only one state of significance. Dead even is the only significant state, because any deviation away from dead even will result in one or the other candidate being ahead by 2. Since the odds of each state vary little in this tiny range, 50% fewer states comes out to a 50% smaller probability of being significant.
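The odd/even effect can be reproduced on a small scale. The sketch below is not the calculation used by this tool, whose hypergeometric formula is not given here. It assumes that every possible number of supporters K among the V other voters is equally likely before the survey, so that after the survey each K is weighted by C(K, a) × C(V−K, b). The function name and the example numbers are invented for illustration.

```python
from fractions import Fraction
from math import comb


def decisive_probability(a: int, b: int, others: int) -> Fraction:
    """Chance that one extra vote matters, given a survey of a + b of the other voters.

    Assumes (for this sketch only) a uniform prior on the supporter count K.
    """
    if a + b > others:
        raise ValueError("cannot survey more people than there are voters")
    # Sum over K = 0..others of C(K, a) * C(others - K, b), in closed form.
    total_weight = comb(others + 1, a + b + 1)
    if others % 2 == 0:
        decisive_counts = [others // 2]                            # exact tie among the others
    else:
        decisive_counts = [(others - 1) // 2, (others + 1) // 2]   # one-vote margin either way
    weight = sum(comb(k, a) * comb(others - k, b) for k in decisive_counts)
    return Fraction(weight, total_weight)


even_case = decisive_probability(5, 5, 1000)
odd_case = decisive_probability(5, 5, 1001)
ratio = float(even_case / odd_case)
print(ratio)
```

With these numbers the even case comes out at roughly half the odd case, in line with the one-state versus two-state argument.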
Practically speaking, the uncertainty in V means that odd and even values of V are just about equally likely for most elections, so we can simply average the two cases and estimate the effective odds as being 25% smaller than the calculated odds for an odd V, or 50% larger than the calculated odds for an even V.
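Spelled out, with $P_{odd}$ and $P_{even}$ as shorthand (introduced only for this illustration) for the odds calculated with an odd and an even V, and using $P_{even} \approx 0.5 \, P_{odd}$:

$$\frac{P_{odd} + P_{even}}{2} \approx \frac{P_{odd} + 0.5 \, P_{odd}}{2} = 0.75 \, P_{odd} = 1.5 \, P_{even}$$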
The odds calculated using the final equation derived from the binomial distribution match the larger value (odd V) to 2 significant digits. This makes sense, considering that the range V/2 ± 1 that was integrated over covers 2 effective votes/states, whereas for even V, the range would need to cover only one state. Rather than making the distinction, we will multiply by 0.75 in order to account for the uncertainty in V as described above.
The calculation based on the hypergeometric distribution is much slower for normal uses. However, an implementation based on the hypergeometric distribution may eventually be added for smaller values of V, since this is where the results start to deviate significantly, and the calculation should be more tractable for smaller V.
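The final calculation now becomes:

$$0.75 \cdot \frac{(a+b+1)!}{a! \, b!} \cdot \frac{2}{2^{a+b} \, V} = \frac{(a+b+1)!}{a! \, b!} \cdot \frac{1.5}{2^{a+b} \, V}$$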
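Putting the pieces described above together, a minimal Python sketch of the final calculation might look like the following. It is not the tool's actual source. The function name `decisive_vote_odds` and the example numbers are hypothetical, and `math.lgamma` is used in place of direct factorials only so that survey sizes around 800 do not overflow floating-point arithmetic.

```python
import math


def decisive_vote_odds(a: int, b: int, voters: int) -> float:
    """Binomial estimate of being within one vote of a tie, scaled by 0.75.

    a, b   -- surveyed supporters and non-supporters
    voters -- total number of voters (V)
    """
    # log of (a+b+1)! / (a! b!) * 0.5**(a+b), using lgamma(n + 1) = log(n!)
    log_density = (
        math.lgamma(a + b + 2)
        - math.lgamma(a + 1)
        - math.lgamma(b + 1)
        + (a + b) * math.log(0.5)
    )
    # Width of the range is 2/V; 0.75 averages the odd-V and even-V cases.
    return 0.75 * math.exp(log_density) * 2.0 / voters


# Hypothetical survey: 420 for, 400 against, 5 million voters.
example = decisive_vote_odds(420, 400, 5_000_000)
print(example)
```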