A STATISTICAL ANALYSIS OF THE |
The Rules of the Game
Twice a week, on Tuesday and Friday, beginning May 18, 2001, a machine in Austin, Texas is used to select four balls numbered from 1 to 35, followed by one bonus ball that is also numbered from 1 to 35. Lottery players attempt to pre-select the winning numbers to be awarded various amounts of money.
Each Two-Step playslip has five places called playboards. Each playboard contains two boxes with the numbers one through thirty-five in each. Four numbers are selected from the top box and one bonus number is selected from the bottom box on any or all of the playboards. Provision is made for these numbers to be entered into more than one drawing by marking a multi-draw number from two to ten. On the playslip, it says players can win in the following ways:
The overall odds of winning for each playboard played are 1 in 32.
Probability of Winning or Losing
The probabilities of the preceding events occurring are calculated as follows. The probability of selecting all five numbers correctly is 1/C(35,4) times [C(4,4) times C(31,0)] times 1/35 (the probability of selecting the bonus ball, because the bonus ball is always selected from 35 balls after the first four balls are drawn), which is 1/1,832,600 or approximately .000000545672833. The probability of selecting the first four correctly but not the bonus ball is 1/C(35,4) times [C(4,4) times C(31,0)] times 34/35 (the probability of not selecting the bonus ball), which is 1/53,900 or approximately .000018552875190. The probability of selecting three of the first four correctly and the bonus ball is 1/C(35,4) times [C(4,3) times C(31,1)] times 1/35, which is about 1/14,779 or approximately .000067663429945. The probability of selecting three of the first four correctly but not the bonus ball is 1/C(35,4) times [C(4,3) times C(31,1)] times 34/35, which is about 1/434 or approximately .002300556516275. The probability of selecting two of the first four correctly and the bonus ball is 1/C(35,4) times [C(4,2) times C(31,2)] times 1/35, which is about 1/657 or approximately .001522427191958. The probability of selecting only one of the first four correctly and the bonus ball is 1/C(35,4) times [C(4,1) times C(31,3)] times 1/35, which is about 1/102 or approximately .009811197407544.
Finally, the probability of selecting only the bonus ball correctly is 1/C(35,4) times [C(4,0) times C(31,4)] times 1/35, which is about 1/58 or approximately .017169594764709. The sum of these probabilities is approximately .030890537858454, which is approximately 1/32, the probability of winning anything.
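The seven calculations above all follow one pattern, so they can be checked with a few lines of Python. This is only a sketch: the function name `tier_probability` and the list `WINNING_TIERS` are illustrative names introduced here, and exact fractions are used so that no rounding enters until the values are printed.

```python
from fractions import Fraction
from math import comb

def tier_probability(matches: int, bonus: bool) -> Fraction:
    """Exact chance of matching `matches` of the four main numbers,
    with the bonus ball matched (bonus=True) or missed (bonus=False)."""
    # Hypergeometric part: choose `matches` of the 4 winning numbers
    # and the remaining picks from the 31 non-winning numbers.
    main = Fraction(comb(4, matches) * comb(31, 4 - matches), comb(35, 4))
    # The bonus ball is drawn separately from its own set of 35.
    bonus_part = Fraction(1, 35) if bonus else Fraction(34, 35)
    return main * bonus_part

# The seven prize tiers from the text: (main numbers matched, bonus matched?)
WINNING_TIERS = [(4, True), (4, False), (3, True), (3, False),
                 (2, True), (1, True), (0, True)]

p_win = sum(tier_probability(m, b) for m, b in WINNING_TIERS)

for m, b in WINNING_TIERS:
    p = tier_probability(m, b)
    label = f"{m} of 4" + (" plus bonus" if b else "")
    print(f"{label:18s} {float(p):.12f}  (about 1 in {float(1 / p):,.0f})")
print(f"{'any prize':18s} {float(p_win):.12f}  (about 1 in {float(1 / p_win):.1f})")
```

Working with `Fraction` keeps every intermediate value exact, which makes it easy to confirm results such as 1/1,832,600 and 1/53,900 without worrying about floating-point error.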
The probability of losing is also interesting to calculate. The probability that none of the numbers will be chosen is 1/C(35,4) times [C(4,0) times C(31,4)] times 34/35 which is approximately .583766222000122. The probability that exactly one of the first four numbers will be chosen and not the bonus ball is 1/C(35,4) times [C(4,1) times C(31,3)] times 34/35 which is approximately .333580702543259. Finally, the probability that exactly two of the first four numbers will be chosen but not the bonus ball is 1/C(35,4) times [C(4,2) times C(31,2)] times 34/35 which is approximately .051762524992228.
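The probability of losing is the sum of these numbers, which is approximately .969109449535608. This agrees to seven decimal places with 1 - .030890537858454 = .969109462141546, the complement of the probability of winning anything at all; the small discrepancy comes from rounding in the computed values.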
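As a sanity check on the losing cases, the ten possible outcomes (zero to four main numbers matched, with or without the bonus ball) should have probabilities that sum to exactly 1. The sketch below assumes the same counting model as the text; `outcome_probability` and `LOSING_OUTCOMES` are illustrative names, not part of the original analysis.

```python
from fractions import Fraction
from math import comb

def outcome_probability(matches: int, bonus: bool) -> Fraction:
    """Exact probability of one outcome: `matches` main numbers correct,
    bonus ball matched or missed."""
    main = Fraction(comb(4, matches) * comb(31, 4 - matches), comb(35, 4))
    return main * (Fraction(1, 35) if bonus else Fraction(34, 35))

# The three losing outcomes named in the text: 0, 1 or 2 main numbers
# matched, in each case without the bonus ball.
LOSING_OUTCOMES = [0, 1, 2]
p_lose = sum(outcome_probability(m, False) for m in LOSING_OUTCOMES)

# All ten outcomes together must account for every possible ticket.
p_all = sum(outcome_probability(m, b)
            for m in range(5) for b in (True, False))

print("P(lose)     =", p_lose, "=", float(p_lose))
print("1 - P(lose) =", 1 - p_lose, "=", float(1 - p_lose))
print("all outcomes sum to 1:", p_all == 1)
```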
Odds Versus Probability
The playslips state the odds of winning. However, as we have seen above, the numbers actually printed on the playslips are the probabilities of winning. Odds and probabilities can be quite different. If p is the probability of winning an event, then 1 - p is the probability of losing that event. The odds of winning that event are the probability of winning divided by the probability of losing, or p/(1 - p). Suppose the probability of winning an event were 1/3. Then the probability of losing the event would be 2/3, so the odds of winning that event would be (1/3)/(2/3) = 1/2, which is quite different from the probability of winning. Fortunately, when the probability of winning an event is very small, the probability and the odds are nearly equal. In the case of the Texas Two-Step Lottery, we have the following:
ODDS VERSUS PROBABILITIES

| Event | Odds | Probability | Difference |
|---|---|---|---|
| Matching 4 of 4 plus BB | 0.000000545673131 | 0.000000545672833 | 0.000000000000298 |
| Matching 4 of 4 | 0.000018553219405 | 0.000018552875190 | 0.000000000344216 |
| Matching 3 of 4 plus BB | 0.000067668008595 | 0.000067663429945 | 0.000000004578650 |
| Matching 3 of 4 | 0.002305861280469 | 0.002300556516275 | 0.000005304764194 |
| Matching 2 of 4 plus BB | 0.001524748510551 | 0.001522427191958 | 0.000002321318593 |
| Matching 1 of 4 plus BB | 0.009908410781718 | 0.009811197407544 | 0.000097213374174 |
| Matching BB | 0.017469539681772 | 0.017169594764709 | 0.000299944917062 |
The difference column would seem to indicate that there would be no problem using the terms odds and probabilities interchangeably when discussing the Texas Two-Step Lottery.
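The conversion between probability and odds described in this section is a one-line formula in each direction. A minimal sketch follows; the function names are illustrative, and the sample probabilities are simply values quoted earlier in the text.

```python
def odds_from_probability(p: float) -> float:
    """Odds in favour of an event: P(win) / P(lose) = p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)

def probability_from_odds(odds: float) -> float:
    """Inverse conversion: p = odds / (1 + odds)."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)

# The 1/3 example from the text, then two lottery probabilities.
for label, p in [("example, p = 1/3", 1 / 3),
                 ("matching BB only", 0.017169594764709),
                 ("matching 4 of 4 plus BB", 0.000000545672833)]:
    odds = odds_from_probability(p)
    print(f"{label:24s} p = {p:.15f}  odds = {odds:.15f}  diff = {odds - p:.15f}")
```

The difference between the two quantities is p²/(1 - p), which is why it shrinks so quickly as p gets small.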
Randomness of the Lottery
The most important property of any lottery is that the numbers be chosen randomly. In order to test the Texas Two-Step numbers, the following measures were used: frequency of the numbers chosen, the mean, standard deviation and the Chi square test.
Frequency of Numbers Chosen
Theoretically, the probability P(x) that any given number x will be one of the first four drawn is:

$$P(x) = \frac{C(1,1)\,C(34,3)}{C(35,4)} = \frac{4}{35},$$

which is the hypergeometric probability formula. So the number of times we expect x to occur in n drawings is n times 4/35. Since there have been 100 drawings at the time of this writing, x should have occurred 100 times 4/35, or about 11.43 times. Compare this theoretical frequency with the actual frequencies given in the following table:
FREQUENCY OF OCCURRENCE OF |

| Number | Frequency | Number | Frequency | Number | Frequency |
|---|---|---|---|---|---|
| 1 | 8 | 14 | 10 | 27 | 11 |
| 2 | 13 | 15 | 8 | 28 | 16 |
| 3 | 22 | 16 | 16 | 29 | 9 |
| 4 | 12 | 17 | 18 | 30 | 12 |
| 5 | 10 | 18 | 20 | 31 | 8 |
| 6 | 11 | 19 | 9 | 32 | 11 |
| 7 | 6 | 20 | 11 | 33 | 13 |
| 8 | 11 | 21 | 8 | 34 | 9 |
| 9 | 11 | 22 | 12 | 35 | 11 |
| 10 | 10 | 23 | 13 | | |
| 11 | 13 | 24 | 8 | | |
| 12 | 10 | 25 | 14 | | |
| 13 | 5 | 26 | 11 | | |
Theoretically, the probability P(x) that any given number x will be the bonus number is:

$$P(x) = \frac{C(1,1)\,C(34,0)}{C(35,1)} = \frac{1}{35},$$

which again is the hypergeometric probability formula. So the number of times we expect x to occur in n drawings is n times 1/35. Since there have been 100 drawings at the time of this writing, x should have occurred 100 times 1/35, or about 2.86 times. Compare this theoretical frequency with the actual frequencies given in the following table:
FREQUENCY OF OCCURRENCE OF |

| Number | Frequency | Number | Frequency | Number | Frequency |
|---|---|---|---|---|---|
| 1 | 2 | 14 | 1 | 27 | 0 |
| 2 | 3 | 15 | 3 | 28 | 3 |
| 3 | 5 | 16 | 4 | 29 | 2 |
| 4 | 1 | 17 | 3 | 30 | 1 |
| 5 | 5 | 18 | 2 | 31 | 2 |
| 6 | 5 | 19 | 4 | 32 | 6 |
| 7 | 2 | 20 | 5 | 33 | 5 |
| 8 | 0 | 21 | 2 | 34 | 2 |
| 9 | 3 | 22 | 3 | 35 | 5 |
| 10 | 2 | 23 | 2 | | |
| 11 | 4 | 24 | 3 | | |
| 12 | 1 | 25 | 4 | | |
| 13 | 4 | 26 | 1 | | |
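The two expected frequencies quoted in this section, 11.43 for the first four numbers and 2.86 for the bonus number, both come from the same hypergeometric calculation. The sketch below is one way to reproduce them; `inclusion_probability` is a hypothetical helper name, and it assumes each drawing takes balls without replacement from a pool of 35.

```python
from fractions import Fraction
from math import comb

def inclusion_probability(pool: int, drawn: int) -> Fraction:
    """Hypergeometric probability that one specific ball is among
    `drawn` balls taken without replacement from `pool` balls."""
    # Choose the specific ball (1 way), then the other drawn - 1 balls
    # from the remaining pool - 1.
    return Fraction(comb(1, 1) * comb(pool - 1, drawn - 1), comb(pool, drawn))

DRAWINGS = 100  # number of drawings at the time of the analysis

p_main = inclusion_probability(35, 4)    # first four numbers
p_bonus = inclusion_probability(35, 1)   # bonus number

expected_main = DRAWINGS * p_main
expected_bonus = DRAWINGS * p_bonus

print(f"P(main)  = {p_main},  expected in {DRAWINGS} drawings: {float(expected_main):.2f}")
print(f"P(bonus) = {p_bonus}, expected in {DRAWINGS} drawings: {float(expected_bonus):.2f}")
```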
Mean, Standard Deviation and Distribution of Numbers Chosen
If the machine is choosing the numbers randomly, the average number chosen from the numbers 1 to 35 should be 18 and the standard deviation should be 10.247 (the value given by the sample formula, with divisor n - 1, applied to the numbers 1 through 35; the population formula, with divisor n, gives about 10.10). The actual average number chosen by the Texas Two-Step Lottery machine for the first four balls is 17.85 and the actual standard deviation is 10.12. The actual average number chosen by the Texas Two-Step Lottery machine for the Bonus Balls is 18.23 and the actual standard deviation is 10.55.
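The theoretical mean and standard deviation for the numbers 1 to 35 can be checked with Python's standard `statistics` module. This sketch computes the standard deviation both ways, since the figure 10.247 corresponds to the sample formula (divisor n - 1) rather than the population formula (divisor n).

```python
import statistics

# The 35 equally likely ball numbers.
numbers = list(range(1, 36))

mean = statistics.mean(numbers)
population_sd = statistics.pstdev(numbers)  # divisor n:     sqrt(102)
sample_sd = statistics.stdev(numbers)       # divisor n - 1: sqrt(105)

print(f"mean                       = {mean}")
print(f"population std. deviation  = {population_sd:.3f}")
print(f"sample std. deviation      = {sample_sd:.3f}")
```

Either figure is a reasonable benchmark for the observed values of 10.12 and 10.55, as long as the same formula is used for the observed data.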
For the Chi square test on the first four numbers, the 35 Texas Two-Step numbers were grouped into 7 intervals containing five numbers each as follows: 1, 2, 3, 4 and 5; 6, 7, 8, 9 and 10; 11, 12, 13, 14 and 15 and so on up to 31, 32, 33, 34, and 35. The following table shows the total number of times the numbers occurred in their respective intervals:
DISTRIBUTION OF THE FIRST FOUR NUMBERS

| Interval | Occurrences |
|---|---|
| 1 to 5 | 65 |
| 6 to 10 | 49 |
| 11 to 15 | 46 |
| 16 to 20 | 74 |
| 21 to 25 | 55 |
| 26 to 30 | 59 |
| 31 to 35 | 52 |
Because 400 numbers have been chosen so far and there are seven intervals, the expected number of occurrences in each interval is 400/7 = 57.14. Note that this number is also five times the expected occurrence of each number found earlier, 4/35 times 100.
The Chi square test can now be run on the data in the intervals for 100 drawings as follows:
$$
\begin{aligned}
\chi^2 &= \frac{(65 - 57.14)^2}{57.14} + \frac{(49 - 57.14)^2}{57.14} + \frac{(46 - 57.14)^2}{57.14} + \frac{(74 - 57.14)^2}{57.14} \\
&\quad + \frac{(55 - 57.14)^2}{57.14} + \frac{(59 - 57.14)^2}{57.14} + \frac{(52 - 57.14)^2}{57.14} \\
&= 9.99.
\end{aligned}
$$
According to a table of critical values of Chi square [1], with 6 degrees of freedom the Chi square value needs to be at least 10.645 to indicate non-randomness with a probability of at least .9. Since 9.99 is below 10.645, it cannot be concluded after 100 drawings that the number selections are non-random with an error of 10% or less.
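The Chi square computation for the first four numbers can be reproduced in a few lines. This is a sketch under the same assumption as the text, namely that every interval has the same expected count; the function name `chi_square_uniform` is illustrative, and the critical value 10.645 is taken from the text rather than computed.

```python
def chi_square_uniform(observed):
    """Chi square statistic for observed counts against a uniform
    expectation (every category has the same expected count)."""
    total = sum(observed)
    expected = total / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

# Interval counts for the first four numbers (1-5, 6-10, ..., 31-35).
MAIN_COUNTS = [65, 49, 46, 74, 55, 59, 52]

# Critical value quoted in the text for the 10% level (7 intervals,
# so 6 degrees of freedom).
CRITICAL_VALUE_10_PERCENT = 10.645

statistic = chi_square_uniform(MAIN_COUNTS)
reject_randomness = statistic >= CRITICAL_VALUE_10_PERCENT

print(f"chi square = {statistic:.2f}")
print("reject randomness at the 10% level:", reject_randomness)
```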
For the Chi square test on the bonus numbers, the 35 Texas Two-Step numbers were again grouped into 7 intervals containing five numbers each as follows: 1, 2, 3, 4 and 5; 6, 7, 8, 9 and 10; 11, 12, 13, 14 and 15 and so on up to 31, 32, 33, 34, and 35. The following table shows the total number of times the numbers occurred in their respective intervals:
DISTRIBUTION OF THE BONUS NUMBERS

| Interval | Occurrences |
|---|---|
| 1 to 5 | 16 |
| 6 to 10 | 12 |
| 11 to 15 | 13 |
| 16 to 20 | 18 |
| 21 to 25 | 14 |
| 26 to 30 | 7 |
| 31 to 35 | 20 |
Because 100 bonus numbers have been chosen so far and there are seven intervals, the expected number of occurrences in each interval is 100/7 = 14.29. Note that this number is also five times the expected occurrence of each bonus number found earlier, 100 times 1/35.
The Chi square test can now be run on the data in the intervals for 100 drawings as follows:
$$
\begin{aligned}
\chi^2 &= \frac{(16 - 14.29)^2}{14.29} + \frac{(12 - 14.29)^2}{14.29} + \frac{(13 - 14.29)^2}{14.29} + \frac{(18 - 14.29)^2}{14.29} \\
&\quad + \frac{(14 - 14.29)^2}{14.29} + \frac{(7 - 14.29)^2}{14.29} + \frac{(20 - 14.29)^2}{14.29} \\
&= 7.66.
\end{aligned}
$$
According to a table of critical values of Chi square [1], with 6 degrees of freedom the Chi square value needs to be at least 10.645 to indicate non-randomness with a probability of at least .9. Since 7.66 is below 10.645, it cannot be concluded after 100 drawings that the bonus number selections are non-random with an error of 10% or less.
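When every interval has the same expected count, the Chi square sum can be verified by hand with a shortcut. The notation here is introduced only for this check and does not appear in the original analysis: O_i is the observed count in interval i, k is the number of intervals, N is the total count, and E = N/k is the common expected count.

$$
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E)^2}{E}
       = \frac{1}{E}\sum_{i=1}^{k} O_i^2 - 2N + kE
       = \frac{k}{N}\sum_{i=1}^{k} O_i^2 - N .
$$

For the bonus numbers, k = 7, N = 100 and the squared counts sum to 256 + 144 + 169 + 324 + 196 + 49 + 400 = 1538, so

$$
\chi^2 = \frac{7}{100}\cdot 1538 - 100 = 107.66 - 100 = 7.66 .
$$

The same shortcut applied to the first four numbers (N = 400, squared counts summing to 23428) gives 409.99 - 400 = 9.99, in agreement with the value found earlier.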
It will be interesting to keep track of the number behavior for the new Texas Two-Step Lottery as its numbers are drawn to see if a similar analysis shows non-randomness in the number selection process.
1. Weimer, Richard C., Statistics, Second Edition. William C. Brown, Publishers, Dubuque, IA, 1993, p 731.
2. Lamb, Jr., John F., Huffstutler, Ron, Brock, Archie and Aslan, Farhad (Bill). "A Statistical Analysis of the Texas Lottery," Texas Mathematics Teacher, Vol. XLI (1) January, 1994.