## False Variables and Simpson’s Paradox

Last weekend I attended a day of lectures as part of my MA course. The focus of the day was barriers to learning, and it was quite intensive. Part of the day involved looking at the statistics behind various claims and seeing how they relate to the development of children. The lecturer mentioned the idea that a false variable can skew one’s ideas: it can look as though one thing is having an effect when in reality something else is responsible.

This idea of false variables is one that has been “following” me around recently. The first book I read this year was “The Simpsons and their Mathematical Secrets” by Simon Singh. In the book he discusses “Simpson’s Paradox”. The example he uses is the US government vote on the American Civil Rights Act of 1964. In the north, 94% of Democrats voted for the act compared to 85% of Republicans. In the south, 7% of Democrats voted for it, and 0% of Republicans did. However, overall 80% of Republicans voted for the act, compared to 61% of Democrats. The reversal happens because a far larger share of the Democrats than of the Republicans represented the south, where support was low in both parties. This example is great for showing Simpson’s Paradox and really emphasises the fact that stats can be deceptive. The worrying thing is that the same stats can be presented to show that a higher proportion of Democrats supported the bill in both the north and the south, or that a higher proportion of Republicans supported the bill overall. Both sides can legitimately make these claims, and hence really confuse the electorate. The fact of the matter is that the real variable was geography: feelings towards the bill differed largely due to attitudes in the north versus attitudes in the south, rather than political allegiance.
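To see the arithmetic behind the reversal, here is a rough sketch in Python. The raw yes/no counts are not given in the post; they are the commonly cited House roll-call figures and should be treated as an assumption here. The function names are made up for illustration.

```python
from fractions import Fraction

# ASSUMPTION: commonly cited House roll-call counts as (yes, no).
# These numbers are not in the post itself.
votes = {
    "north": {"Democrat": (145, 9), "Republican": (138, 24)},
    "south": {"Democrat": (7, 87), "Republican": (0, 10)},
}


def support(yes, no):
    """Share of votes cast in favour, as an exact fraction."""
    return Fraction(yes, yes + no)


def regional_support(votes):
    """Support for the act by region and party."""
    return {
        region: {party: support(*counts) for party, counts in parties.items()}
        for region, parties in votes.items()
    }


def overall_support(votes):
    """Support by party after adding the regions together."""
    totals = {}
    for parties in votes.values():
        for party, (yes, no) in parties.items():
            y, n = totals.get(party, (0, 0))
            totals[party] = (y + yes, n + no)
    return {party: support(*counts) for party, counts in totals.items()}


regional = regional_support(votes)
overall = overall_support(votes)

for region, parties in regional.items():
    for party, share in parties.items():
        print(f"{region:5} {party:10} {float(share):.0%}")
for party, share in overall.items():
    print(f"all   {party:10} {float(share):.0%}")
```

With these counts the Democrats lead within each region, yet the Republicans lead overall, because 94 of the 248 Democratic votes came from the south compared with only 10 of the 172 Republican votes.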

Simpson’s Paradox also appeared at school recently. A Teach Firster in our department was planning a lesson on probability and asked me if I knew “that thing where you have a higher probability of picking one colour in each bag of balls, but if you put them all into the same bag you get a higher probability of the other.” This produced a rather interesting discussion about Simpson’s Paradox. No one else in the department was familiar with it, and they all found it pretty interesting. We both then included it in our lessons. The question involved blue and red bins of coloured counters: in each of two separate cases you had a higher probability of picking a black counter from the blue bin, but once the counters from both cases were combined, bin by bin, the higher probability came from the red bin.
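For anyone who wants to try the counters question, here is a minimal sketch with made-up numbers. These are not the counts we used in the lesson; they are hypothetical values chosen so that the reversal shows up, and the helper names are mine.

```python
from fractions import Fraction

# HYPOTHETICAL counts: (black counters, total counters) in each bin.
case_one = {"blue": (5, 11), "red": (3, 7)}
case_two = {"blue": (6, 9), "red": (9, 14)}


def p_black(black, total):
    """Exact probability of drawing a black counter from a bin."""
    return Fraction(black, total)


def better_bin(bins):
    """Name of the bin with the higher probability of drawing black."""
    return max(bins, key=lambda name: p_black(*bins[name]))


def combine(*cases):
    """Tip the blue bins together and the red bins together."""
    combined = {}
    for case in cases:
        for name, (black, total) in case.items():
            b, t = combined.get(name, (0, 0))
            combined[name] = (b + black, t + total)
    return combined


combined = combine(case_one, case_two)
print(better_bin(case_one), better_bin(case_two), better_bin(combined))
# prints: blue blue red
```

The blue bin wins in each case (5/11 beats 3/7, and 6/9 beats 9/14), but after combining, the red bin wins with 12/21 against 11/20. The red bin gets most of its counters from the second case, where black is more likely in both bins.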

The example of this false variable situation given in our lecture was that of breastfeeding. The stats suggest that breastfed children go on to better academic achievement. But if you drill down into the stats you see that there is a far higher proportion of breastfeeding mothers in the “middle class” than in the “working class”, and that academic achievement may be more down to socio-economic status than to the breastfeeding itself. This could be due to a plethora of reasons, which may include: a higher level of education among the parents, enabling them to provide more support for learning at home; a higher household income, which may enable private tuition if a child is falling behind; or even that more working class families are reliant on shift work, longer days and multiple jobs, leaving them less time to spend with their children to aid their development. This is clearly a complex issue, and it highlights the fact that when reading anything that includes statistics you have to ask yourself, “does the author have an agenda, and are they twisting the facts to suit it?”