Why You Shouldn't Use Mathhammer: A Guide to Statistics in Warhammer 40k
This article is going to be half rant, half informational, and hopefully, partially educational. My goal is not to eradicate mathhammer, but to help tone it down. The reason, which I hope to argue convincingly, is that mathhammer, as it is often used on the internets, is a very poor representation of the game world.
As all of us are aware, Warhammer 40k (and most other wargames) is played with six-sided dice, or D6 for short. The D6 is used in one of two ways in Warhammer. It is either a boolean (yes-no) outcome determiner, such as when you roll to hit in combat, or a random number generator, such as when you roll for leadership. In either case, a single D6 follows a specific statistical distribution: the discrete uniform distribution on one through six, or D6 ~ U(1,6) for short. The special thing about the uniform distribution is that all outcomes are equally likely (given a fair die). It should be no surprise that rolling a 1 is just as likely as rolling a 6, no matter how much we wish otherwise. We also know the expected value (the average value, or mean) and the variance (the spread) of the D6. For a discrete uniform distribution, the mean is (b+a)/2, where b is the highest value and a is the lowest value the distribution can take, which gives us a mean of 3.5. The variance is (((b-a+1)^2)-1)/12, which for the D6 is 35/12, or roughly 2.92.
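If you would rather check those two numbers than take the formulas on faith, a few lines of Python can enumerate the six faces directly. This is a minimal sketch assuming a fair die; it uses exact fractions so the results come out as 7/2 and 35/12 rather than rounded decimals.

```python
from fractions import Fraction

# Every face of a fair D6 is equally likely: 1/6 each.
faces = range(1, 7)
prob = Fraction(1, 6)

mean = sum(prob * f for f in faces)                      # E[D6]
variance = sum(prob * (f - mean) ** 2 for f in faces)    # Var[D6]

# The closed forms from the text, with a = 1 (lowest face) and b = 6 (highest face).
a, b = 1, 6
assert mean == Fraction(a + b, 2)
assert variance == Fraction((b - a + 1) ** 2 - 1, 12)

print(f"mean = {float(mean)}, variance = {float(variance):.2f}")
```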
There is something important to take away from the distribution of a D6. For starters, you should notice that no matter how hard you try, you will never roll a 3.5 on a D6. The average value tells you what the average of all the rolls would be if you rolled the die an infinite number of times, not, and I cannot emphasize this enough, what will happen each time you roll it. We all inherently know this, but we are often misled by expected values when mathhammer is involved. How confidently we can predict a single roll depends on the variance (and skewness) of the distribution. So for the D6, while 3.5 is the average value, who says that 3 or 4 isn't more likely than any other value? Say we didn't know that it was a uniform distribution: we could estimate (poorly, since that rule of thumb comes from the normal distribution) that 68% of the outcomes would lie within one standard deviation (the square root of the variance, about 1.71) of the mean. So in this case, 68% of the dice rolls should be between 1.79 and 5.21, which means there is a 68% chance you'll roll a 2, 3, 4 or 5. Compare this to the true chance, 4 in 6 or about 67%, and it isn't that different. But it is equally likely that you'll roll a 1, 2, 5 or 6 and completely miss the middle two values. Thus there is nothing special about 3 and 4 at all.
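The one-standard-deviation argument can also be checked by brute force. The Python sketch below, assuming a fair D6, builds the naive "mean plus or minus one standard deviation" band, lists which faces fall inside it, and compares that chance with a set of four faces that skips 3 and 4 entirely.

```python
import math

mean = 3.5
sd = math.sqrt(35 / 12)              # standard deviation of a D6, about 1.71

low, high = mean - sd, mean + sd     # the naive "within one standard deviation" band
inside = [f for f in range(1, 7) if low <= f <= high]
p_inside = len(inside) / 6

# A different set of four faces that avoids the two middle values entirely.
no_middle = [1, 2, 5, 6]
p_no_middle = len(no_middle) / 6

print(f"band {low:.2f} to {high:.2f} covers faces {inside}: {p_inside:.1%}")
print(f"faces {no_middle}: {p_no_middle:.1%}")
```

Both lines report the same chance, which is the point: the band around the mean is no more likely than any other four faces.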
Let us first consider the easier of the two uses for the D6: the random number generator. The most common time we use this is when we roll for leadership tests, but it is also used to determine vehicle damage, the number of attacks for models with random attacks and the number of inches in a scatter. Let us look at leadership tests in particular. An Imperial Guardsman has a standard leadership of 7, so in order for this guardsman to hold his ground, he must roll 2D6 and get a total less than or equal to 7. So what is the distribution of the sum of 2D6? The easiest approach is to count the possible outcomes (36 equally likely ordered pairs), as below:
- 2: 1,1
- 3: 1,2 - 2,1
- 4: 1,3 - 2,2 - 3,1
- 5: 1,4 - 2,3 - 3,2 - 4,1
- 6: 1,5 - 2,4 - 3,3 - 4,2 - 5,1
- 7: 1,6 - 2,5 - 3,4 - 4,3 - 5,2 - 6,1
- 8: 2,6 - 3,5 - 4,4 - 5,3 - 6,2
- 9: 3,6 - 4,5 - 5,4 - 6,3
- 10: 4,6 - 5,5 - 6,4
- 11: 5,6 - 6,5
- 12: 6,6
Ld | Chance to Roll Exactly Ld | Chance to Pass (Roll <= Ld) |
2 | 2.8% | 2.8% |
3 | 5.6% | 8.3% |
4 | 8.3% | 16.7% |
5 | 11.1% | 27.8% |
6 | 13.9% | 41.7% |
7 | 16.7% | 58.3% |
8 | 13.9% | 72.2% |
9 | 11.1% | 83.3% |
10 | 8.3% | 91.7% |
11 | 5.6% | 97.2% |
12 | 2.8% | 100.0% |
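The counting above is easy to hand off to a computer. This minimal Python sketch, assuming two fair dice, enumerates all 36 ordered pairs, tallies them by total, and prints the chance of rolling exactly each Ld value alongside the running total that gives the chance to pass; it should reproduce the table above.

```python
from collections import Counter
from itertools import product

# Count every ordered pair of faces by its total (36 equally likely outcomes).
totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))

cumulative = 0
rows = {}
for ld in range(2, 13):
    cumulative += totals[ld]
    # Chance to roll exactly ld, and chance to roll ld or less (a pass).
    exact, passing = totals[ld] / 36, cumulative / 36
    rows[ld] = (exact, passing)
    print(f"{ld:>2} | {exact:6.1%} | {passing:6.1%} |")
```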
For our leadership 7 guardsman, he only has a 58.3% chance of holding his ground when put into a stressful situation. That is why we attach sergeants (Ld 8 = 72.2% success) and commissars (Ld 9 = 83.3% success). How do we deal with rerolls, though? When we reroll failures we dive a little deeper into statistical theory, and we begin to encounter conditional probabilities. The chance of passing the reroll is the same as the chance of passing the original roll, but we only get to make the reroll if the original roll failed; if we pass the first time, we never take the reroll at all. So P[Reroll Passed | First Roll Failed] = P[Pass], and the chance of failing and then passing the reroll is P[Fail]P[Pass], which for leadership 7 is (58.3%)(100% - 58.3%) = 24.3%. Add this to the probability that you passed the first roll and you get the probability that you pass with a reroll: 58.3% + 24.3% = 82.6%. Put simply, the chance you'll pass with a reroll is the chance you pass the first roll, plus the chance you fail the first roll times the chance you pass the second. That is P[Pass] + P[Fail]P[Pass], which is the same as 1 - P[Fail]^2. These are shown below:
Ld | Chance to Roll Exactly Ld | Chance to Pass (with Reroll) |
2 | 2.8% | 5.5% |
3 | 5.6% | 16.0% |
4 | 8.3% | 30.6% |
5 | 11.1% | 47.8% |
6 | 13.9% | 66.0% |
7 | 16.7% | 82.6% |
8 | 13.9% | 92.3% |
9 | 11.1% | 97.2% |
10 | 8.3% | 99.3% |
11 | 5.6% | 99.9% |
12 | 2.8% | 100.0% |
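The reroll rule is a one-liner once the single-roll pass chance is known. In this Python sketch the names `pass_chance` and `pass_with_reroll` are purely illustrative; it assumes a failed test may be rerolled exactly once and that the reroll is independent of the first roll. It prints the single-roll pass chance next to the pass chance with a reroll.

```python
from fractions import Fraction
from itertools import product


def pass_chance(ld: int) -> Fraction:
    """Chance that 2D6 rolls ld or less (a passed Leadership test)."""
    passes = sum(1 for a, b in product(range(1, 7), repeat=2) if a + b <= ld)
    return Fraction(passes, 36)


def pass_with_reroll(ld: int) -> Fraction:
    """Pass the first roll, or fail it and then pass the single reroll."""
    p = pass_chance(ld)
    return p + (1 - p) * p          # equivalently 1 - (1 - p) ** 2


for ld in range(2, 13):
    print(f"{ld:>2} | {float(pass_chance(ld)):6.1%} | {float(pass_with_reroll(ld)):6.1%} |")
```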
So what is the point I want to get across here? This is an acceptable use of mathhammer because the probabilities describe one-off events, and nothing says that you'll pass x leadership tests per game. There is no such thing as passing 0.72 leadership tests per leadership test, but you can say that a leadership 8 unit will, in the long run, pass 72% of its leadership tests. The first is mathematical blasphemy, and you should be executed by the Mathematical Inquisition for uttering such words.
Unfortunately, this is not the extent of the horrors uttered by mathhammer enthusiasts. The worst is when two units duke it out in successive shooting rounds. I'll give an example:
A unit of 5 Space Marines starts shooting at a unit of 10 guardsmen. They are in rapid-fire range, so they get 10 bolter shots, which hit 6.67 times and wound 4.4 guardsmen, who get no saves. The remaining 5.6 guardsmen shoot 11.1 shots, hit 5.6 times and wound 1.9 times, and the marines lose 0.6 to failed saves. The remaining 4.4 marines shoot 8.8 shots, hitting 5.8 times and wounding 3.9 times, which the guard get no saves against. This leaves 1.7 guardsmen, etc. etc. etc. This is so incredibly wrong that the author was summarily executed at his desk. I won't even get into issues such as why the hell the nearby Leman Russ didn't kill the marines, or why the guard didn't charge into combat, where they would take fewer casualties since marine close combat attacks don't ignore their armor. I know people do this for ease of calculation, and as a quick and dirty comparison of two units in a combat situation, but it doesn't tell me anything of value when I am looking at this situation on the table.
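For reference, here is that averages-only bookkeeping written out in Python. It assumes the usual rolls behind the quoted numbers (marines hit on 3+ and wound guardsmen on 3+, guardsmen hit on 4+ and wound marines on 5+, marines save on 3+, the guard get no save, two shots per model), and it happily produces fractional models, which is exactly the problem. Note that the guardsmen left after the first volley come out at about 5.6.

```python
# Expected-value bookkeeping: every quantity is an average, so models come out fractional.
MARINE_HIT, MARINE_WOUND = 2 / 3, 2 / 3     # assumed: BS4 hits on 3+, S4 vs T3 wounds on 3+
GUARD_HIT, GUARD_WOUND = 1 / 2, 1 / 3       # assumed: BS3 hits on 4+, S3 vs T4 wounds on 5+
MARINE_FAILS_SAVE = 1 / 3                   # assumed: 3+ armour save fails on a 1 or 2
SHOTS_PER_MODEL = 2                         # rapid-fire range

marines, guard = 5.0, 10.0
for volley in range(3):
    if volley % 2 == 0:   # marines shoot; the guard get no save
        guard -= marines * SHOTS_PER_MODEL * MARINE_HIT * MARINE_WOUND
    else:                 # guard shoot; the marines get their armour save
        marines -= guard * SHOTS_PER_MODEL * GUARD_HIT * GUARD_WOUND * MARINE_FAILS_SAVE
    print(f"after volley {volley + 1}: {marines:.2f} marines, {guard:.2f} guardsmen")
```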
Let me explain. Remember that the average value of a distribution tells you only one thing: the average outcome over an infinite number of trials. So if I played an infinite number of games in which this situation happened, I would expect to see these numbers on average (except for the fact that there is no such thing as 4.4 marines, and the binomial distribution only describes whole numbers of shots and casualties). What I want to know when I look at this situation are two things: what is the chance that I'll lose, and how many turns can I expect to hold back the marines? The most practical way to answer the first question is by running simulations.
Ideally we want to run as many simulations as possible, which is a job best done by a computer. In each simulated round we roll, at random, exactly how many casualties each side takes, and we keep going until one side is wiped out. You can do this in real life as well, but a computer can do it thousands of times in the time it takes a human to do it once. This is called a Monte Carlo experiment (see Wikipedia). You can try it very quickly by rolling dice, say five times, like I just did. In my experiment the marines won every time, and lost an average of 1.2 marines. But how confident am I in these numbers? 0%. If I did this each morning before I left for work, which took about 15 minutes, I would have 1825 samples after a year, and I am willing to bet that the guard would win at least once in there. My bet is that the guard would win a respectable number of times, somewhere around 5 to 10%, because even in my experiment this afternoon, the guard nearly won. The issue is that I am only human, and there are only 24 hours in a day. A computer can run the 1825 simulations in less time than it took me to do 5, and the more times the experiment is run, the more accurate the estimates become. Monte Carlo experiments can also do things that are impossible with simple mathhammer averages, such as finding the average number of hits under a scattering blast marker for a given arrangement of models in a squad.
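A Monte Carlo version of the marines-versus-guard fight might look like the Python sketch below. It is a deliberately simplified model and every rule in it is an assumption: all 10 guardsmen fire two lasgun shots, all 5 marines fire two bolter shots, the marines shoot first, volleys alternate until one side is gone, each unsaved wound removes one model, and there is no morale, cover, assault or outside help. The function names are hypothetical.

```python
import random


def roll_successes(dice: int, need: int, rng: random.Random) -> int:
    """Roll `dice` D6 and count how many show `need` or more."""
    return sum(1 for _ in range(dice) if rng.randint(1, 6) >= need)


def duel(rng: random.Random) -> tuple[bool, int, int]:
    """Fight until one side is wiped out; return (guard_won, marines_lost, marine_volleys)."""
    marines, guard, volleys = 5, 10, 0
    while True:
        # Marine volley: 2 shots each, hit on 3+, wound on 3+, guard get no save.
        hits = roll_successes(2 * marines, 3, rng)
        guard -= roll_successes(hits, 3, rng)
        volleys += 1
        if guard <= 0:
            return False, 5 - marines, volleys
        # Guard volley: 2 shots each, hit on 4+, wound on 5+, marines save on 3+.
        hits = roll_successes(2 * guard, 4, rng)
        wounds = roll_successes(hits, 5, rng)
        saves = roll_successes(wounds, 3, rng)
        marines -= wounds - saves
        if marines <= 0:
            return True, 5, volleys


def simulate(trials: int = 20_000, seed: int = 40_000) -> tuple[float, float, float]:
    """Estimate P(guard win), average marines lost, and average marine volleys per marine win."""
    rng = random.Random(seed)
    results = [duel(rng) for _ in range(trials)]
    guard_wins = sum(1 for won, _, _ in results if won)
    avg_lost = sum(lost for _, lost, _ in results) / trials
    marine_win_volleys = [v for won, _, v in results if not won]
    avg_volleys = sum(marine_win_volleys) / max(1, len(marine_win_volleys))
    return guard_wins / trials, avg_lost, avg_volleys
```

Calling `simulate()` returns the estimated chance that the guard win, the average number of marines lost, and the average number of marine volleys needed to wipe out the guard in the games the marines win. Changing `seed` or `trials` gives a feel for how much the estimates wobble from run to run.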
I have one last gripe against mathhammer.
Say you have a unit of ten guardsmen in rapid-fire range of two units of Space Marines. One is a Terminator assault squad of 3 guys, and the other is a tactical squad of 5 guys. The guardsmen have 6 lasguns, 1 laspistol, 1 plasma gun and 1 heavy bolter. The game is kill points, so I need to decide which unit I have the best chance of wiping out this turn.
The most common tactic I see is to calculate the average kills of each weapon and figure out which target gives the highest average. So the 13 las shots (12 from the rapid-firing lasguns, 1 from the laspistol) will kill, on average, 0.72 power armored marines or 0.36 terminators. The plasma gun will kill, on average, 0.83 tactical marines or 0.56 terminators. And the heavy bolter will kill, on average, 0.33 power armored marines or 0.17 terminators. Adding these all together, you will kill, on average, 1.89 tactical marines or 1.08 terminators.
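These averages can be reproduced by multiplying shots by a per-shot kill chance. The per-shot chances below are assumptions inferred from the quoted averages: lasguns and the heavy bolter hit on 4+; lasguns wound on 5+, the heavy bolter on 3+ and the plasma gun on 2+; marines fail their 3+ save a third of the time, terminators fail their 2+ save a sixth of the time, the plasma gun ignores power armor, and against it terminators fall back on a 5+ invulnerable save. The sketch also prints what fraction of each squad those averages represent.

```python
from fractions import Fraction as F

# Assumed per-shot kill chances (hit x wound x failed save), chosen to match the averages above.
# Each entry: (shots, chance vs a tactical marine, chance vs a terminator).
weapons = {
    "lasguns + laspistol": (13, F(1, 2) * F(1, 3) * F(1, 3), F(1, 2) * F(1, 3) * F(1, 6)),
    "plasma gun": (2, F(1, 2) * F(5, 6), F(1, 2) * F(5, 6) * F(2, 3)),  # AP2: no armour save
    "heavy bolter": (3, F(1, 2) * F(2, 3) * F(1, 3), F(1, 2) * F(2, 3) * F(1, 6)),
}

# Expected kills add up across weapons: E[X + Y + Z] = E[X] + E[Y] + E[Z].
avg_tactical = sum(shots * p for shots, p, _ in weapons.values())
avg_terminator = sum(shots * q for shots, _, q in weapons.values())

print(f"average kills: {float(avg_tactical):.2f} tactical, {float(avg_terminator):.2f} terminators")
print(f"share of squad: {float(avg_tactical / 5):.0%} tactical, {float(avg_terminator / 3):.0%} terminators")
```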
This doesn't actually say anything about which is my better option. Yes, I kill more tactical marines on average, and even a slightly higher proportion of that squad (about 38% of the tactical marines versus 36% of the terminators). It is mathematically true that adding the averages together gives you the overall average. But that isn't the answer I am looking for. What I am looking for is whether the probability that I'll cause 5 or more unsaved wounds on the tactical marines is less than or greater than the probability that I'll cause 3 or more unsaved wounds on the terminators.
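For each weapon you can build yourself a probability mass function. The number of marines killed by a group of 13 las shots follows a binomial distribution with n = 13 and p = (1/2)(1/3)(1/3) = 1/18 (to hit, to wound, and a failed save), so q = 1 - p = 17/18. The probability of exactly x kills is P[X=x] = C(13,x) p^x q^(13-x), where C(13,x) is the number of ways to choose which x of the 13 shots kill. This gives: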
x | P[X=x] | P[X >=x] |
0 | 47.6% | 100% |
1 | 36.4% | 52.4% |
2 | 12.8% | 16.1% |
3 | 2.8% | 3.2% |
4 | 0.4% | 0.5% |
5 | 0.04% | 0.05% |
6 | 0.003% | 0.004% |
7 | < 0.001% | < 0.001% |
8 | < 0.001% | < 0.001% |
9 | < 0.001% | < 0.001% |
10 | < 0.001% | < 0.001% |
11 | < 0.001% | < 0.001% |
12 | < 0.001% | < 0.001% |
13 | < 0.001% | < 0.001% |
You can do a similar thing for the plasma gun and the heavy bolter. First, the plasma gun (2 shots, each killing a marine with p = (1/2)(5/6) = 5/12, since it ignores power armor):
y | P[Y=y] | P[Y >=y] |
0 | 34% | 100% |
1 | 48.6% | 66% |
2 | 17.4% | 17.4% |
Then the heavy bolter (3 shots, p = (1/2)(2/3)(1/3) = 1/9):
z | P[Z=z] | P[Z >=z] |
0 | 70.2% | 100% |
1 | 26.3% | 29.8% |
2 | 3.3% | 3.4% |
3 | 0.1% | 0.1% |
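All three tables come from the same binomial formula, P[K=k] = C(n,k) p^k q^(n-k). The Python sketch below rebuilds them; the per-shot kill chances for the plasma gun (5/12) and the heavy bolter (1/9) are inferred from the average kills quoted earlier, and every shot is assumed to be independent.

```python
from math import comb


def binomial_pmf(n: int, p: float) -> list[float]:
    """P[K = k] for k = 0..n, where K counts kills from n independent shots."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def survival(pmf: list[float]) -> list[float]:
    """P[K >= k] for each k, found by summing the tail of the mass function."""
    return [sum(pmf[k:]) for k in range(len(pmf))]


# Per-shot chances to kill a tactical marine (assumed, see above).
las = binomial_pmf(13, 1 / 18)
plasma = binomial_pmf(2, 5 / 12)
heavy_bolter = binomial_pmf(3, 1 / 9)

for name, pmf in [("las", las), ("plasma gun", plasma), ("heavy bolter", heavy_bolter)]:
    print(name)
    for k, (p_eq, p_ge) in enumerate(zip(pmf, survival(pmf))):
        print(f"{k:>3} | {p_eq:8.3%} | {p_ge:8.3%} |")
```

Printing three decimal places shows, for example, that exactly 6 las kills comes out at about 0.003% rather than below 0.001%, and that the heavy bolter's chance of at least one kill is about 29.8%.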
What we are interested in, in the end, is the probability that x+y+z >= 5. We have to use some probability rules. For x+y+z to be at least 5, if x >= 5 then y and z can be anything (greater than or equal to 0). So P[X >= 5]*P[Y >= 0]*P[Z >= 0] = 0.05%. That is one way to reach 5. But if X = 4, then Y and Z only need to combine for at least 1: either Y >= 1 with any Z, or Y = 0 and Z >= 1. You find all of these cases (X = 3 with Y+Z >= 2, and so on down to X = 0), multiply the probabilities within each case, and add the cases together to get the probability that x+y+z >= 5, which in this case is about 2.4%.
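Written as a formula, that case-by-case sum looks like the first line below, taking P[Z >= m] = 1 whenever m <= 0 and assuming, as the tables already do, that the three weapons' kills are independent. The second line is a sketch of how the same idea extends to any number of weapons W_1, ..., W_k: each new weapon's mass function is convolved with the running total S.

```latex
\begin{aligned}
P[X+Y+Z \ge 5] &= \sum_{x=0}^{13} \sum_{y=0}^{2} P[X=x]\,P[Y=y]\,P[Z \ge 5-x-y], \\
P[S_j = t] &= \sum_{s=0}^{t} P[S_{j-1}=s]\,P[W_j = t-s], \qquad S_1 = W_1,\; S_j = S_{j-1} + W_j.
\end{aligned}
```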
Do the same thing for the terminators and you'll find that the probability of killing the entire squad is about 8%. Neither is impressive, but in theory shooting the terminators is the better option than shooting the tactical marines: even though on average you'll kill fewer terminators, in any one-off game you have a higher chance of wiping out the whole squad and winning a kill point.
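This Python sketch automates that convolution for any number of weapons. It treats each weapon as a binomial (shots, per-shot kill chance) pair and assumes kills from different weapons are independent; the per-shot chances are the values inferred from the quoted averages (1/18, 5/12 and 1/9 against tactical marines; 1/36, 5/18 and 1/18 against terminators). `chance_of_at_least` is a hypothetical helper name.

```python
from math import comb


def binomial_pmf(n: int, p: float) -> list[float]:
    """P[K = k] for k = 0..n kills from n independent shots."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(a: list[float], b: list[float]) -> list[float]:
    """Mass function of the sum of two independent counts."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out


def chance_of_at_least(kills_needed: int, weapons: list[tuple[int, float]]) -> float:
    """P[total kills >= kills_needed] for any number of (shots, per-shot kill chance) weapons."""
    total = [1.0]                       # no weapons yet: zero kills with certainty
    for shots, p in weapons:
        total = convolve(total, binomial_pmf(shots, p))
    return sum(total[kills_needed:])


# Assumed per-shot kill chances: lasguns + laspistol, plasma gun, heavy bolter.
tactical = [(13, 1 / 18), (2, 5 / 12), (3, 1 / 9)]
terminators = [(13, 1 / 36), (2, 5 / 18), (3, 1 / 18)]

p_tac = chance_of_at_least(5, tactical)
p_term = chance_of_at_least(3, terminators)
print(f"wipe 5 tactical marines: {p_tac:.1%}, wipe 3 terminators: {p_term:.1%}")
```

With these inputs it prints roughly 2.4% for wiping the tactical squad and 7.8% for the terminators. Adding a fourth weapon is just one more `(shots, p)` pair in the list.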
Thus, in conclusion, I'd like to see more responsible use of mathhammer on the tubes. I know the math can be daunting at times, but using Excel to do the difficult math can make your life a billion times easier. I hope that more people with an interest in mathematics will be willing to do some of the difficult calculations needed to solve most of these problems in the future.
(If anyone can help write a formula for the x+y+z probability mass function I would greatly appreciate it. It would be equally awesome if there was a way to generalize to any number of weapons. Thanks!)