
Posts Tagged ‘statistics’

It’s election time again

April 21, 2015 6 comments

It’s election time again, and we all know what that means….. BAD GRAPHS!!!! Last year, in the run up to the locals, we had some classics and this time is no different.

Let’s start with a positive, an example for others to look at:

image

Amazing, isn’t it? Despite the fact that he’s running for parliament, Jamie Hanley (@jamiehanley), the Labour candidate for Pudsey, has managed to buck the trend and include a fully correct bar chart!! Dave Gale (@reflectivemaths) did point out that technically the claim “Can’t win here” is invalid, and should in fact read “statistically extremely unlikely to win here” or something similar. But that doesn’t take away from the excellent bar chart.

Now for some crimes:

Exhibit A

image

image

These two came through my door, both from Greg Mulholland (@gregmulholland1), the Liberal Democrat defending the constituency. The first error, I’m sure you’ll notice, is that the numbers are different on the two leaflets despite them claiming to be from the same set of election results! The second error is that the tiny gap between the Lib Dem and the Labour bars is supposed to signify around 700 people (on one, 800 on the other!) and the massive gap between the Labour and Conservative bars is supposed to be 1200 on the top and 1600 on the bottom. So the biggest it should be is twice the gap between Lib Dems and Labour, yet as you can see it is considerably more than this. The third error is that the aforementioned gap between the Labour and the Conservative bars is supposed to signify between one fifth and one third of the total size of the Conservative bar, but as you can see it is actually considerably bigger on both charts. The final error is the one Dave mentioned above: the claim “Can’t win here” is technically wrong. If one of my pupils turned this in I’d be fuming!

Here’s what they should look like:

image

image

Exhibit B

image

This classic was sent to me the other day and is from a Liberal Democrat leaflet in Bristol West, where Stephen Williams is the candidate and is trying for re-election. This is a terrible example. Firstly, the gap between Lib Dems and Labour is, as the numbers state, 10% and the gap between Labour and Conservative is 20%, so the gaps between the bars should reflect this and the Labour bar should be considerably nearer the Conservative bar. However, this is clearly not the case in this picture. Next, the gap between the Labour and Conservative bars should be less than half the size of the Conservative bar. It isn’t! And again, Dave will be screaming about that all too familiar slogan. Another terrible effort. Here’s what it should look like:

image

Exhibit C

image

This was sent to me by a friend. It’s from literature relating to the local elections that are currently being held and is from the Lib Dem candidate (Martin Hughes) in the Horsforth Ward of Leeds City Council. Error 1: as you can see, numbers-wise the difference between Lib Dem votes and Conservative votes is 157 and the difference between Conservative votes and Labour votes is 216. These numbers are fairly similar, but the gap on the bar chart suggests the Tory to Labour difference is roughly 4 times the Lib Dem to Tory difference. And 216 is very definitely not 4 times 157. Error 2: the 605 vote difference between Labour and UKIP is shown by a gap that is much bigger than the entire UKIP bar, which represents 1059, considerably more votes! And again, there’s that slogan. Another terrible bar chart, and here’s the correct version:

image
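For anyone who wants to check the arithmetic on Exhibit C, here is a rough Python sketch using only the figures quoted above. The variable names are my own, and I am assuming a correctly drawn chart has every bar starting at zero with its height proportional to the vote count.

```python
# Figures quoted in the post for the Horsforth leaflet (Exhibit C).
# Variable names are illustrative, not taken from the leaflet.
gap_libdem_tory = 157   # Lib Dem votes minus Conservative votes
gap_tory_labour = 216   # Conservative votes minus Labour votes
gap_labour_ukip = 605   # Labour votes minus UKIP votes
ukip_votes = 1059       # votes the whole UKIP bar represents

# On a chart drawn to scale, gaps between bar tops scale with vote differences.
gap_ratio = gap_tory_labour / gap_libdem_tory
print(f"Tory-Labour gap should be {gap_ratio:.2f} times the Lib Dem-Tory gap")

# The Labour-UKIP gap should be a fraction of the UKIP bar, not bigger than it.
gap_vs_ukip_bar = gap_labour_ukip / ukip_votes
print(f"Labour-UKIP gap should be {gap_vs_ukip_bar:.2f} times the UKIP bar")
```

So the second gap should be a bit over a third bigger than the first, nowhere near 4 times, and the Labour to UKIP gap should be a little over half the height of the UKIP bar.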

Exhibit D

image

Greg again. It’s no wonder that Colin Beveridge (@icecolbeveridge) has started referring to this sort of graph as a “Mulholland”: every leaflet that comes has another crime against statistics contained in its midst. Perhaps Greg should read Colin’s book? This one is particularly telling. Firstly, we have a question of validity: the data is ten years old. There has been a general election since then, so why use 2005 data? It fits the narrative. The leaflet is aimed at trying to convince Conservative voters to vote tactically, and the information from the last election would tell them that actually the Conservative candidate came second. As you can see above, though, the landscape has changed, and the aggregated results (whichever version is true!) in the last local elections do suggest a similar landscape to 2005. Why hasn’t he used those again? I can only conjecture that it is because he wanted to use percentages: in a general election turnout is much higher, so somehow it seems closer? Secondly, there is a 4% difference between Labour and Lib Dems and a 6% difference between Labour and Conservative. This should show as a similar difference in bar heights (the Labour to Conservative difference should be 1.5 times the Labour to Lib Dem difference), but as you can see the difference is nothing of the sort and 6% appears to be ten times the size of 4%. Then there’s the Tory bar, apparently representing 27% yet smaller in size than the aforementioned 6% gap. And to top it off, that slogan again! Here it is, but correct:

image
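The two ratios for Exhibit D can be checked in a couple of lines. This is only a sketch using the percentages quoted in the post (a 4 point gap, a 6 point gap and a 27% Tory share); the names are mine and exact fractions are used purely to avoid rounding.

```python
from fractions import Fraction

# Percentage-point figures quoted in the post for Exhibit D.
gap_labour_libdem = 4   # difference between Labour and Lib Dem shares
gap_labour_tory = 6     # difference between Labour and Conservative shares
tory_share = 27         # share the Conservative bar represents

# How much bigger the second gap should look than the first.
gap_ratio = Fraction(gap_labour_tory, gap_labour_libdem)

# How big the 6 point gap should look next to the whole Conservative bar.
gap_vs_tory_bar = Fraction(gap_labour_tory, tory_share)

print(f"Gap ratio: {gap_ratio} (about {float(gap_ratio):.1f} times, not 10 times)")
print(f"Gap as a fraction of the Tory bar: {gap_vs_tory_bar} (about {float(gap_vs_tory_bar):.2f})")
```

In other words, drawn to scale the 6% gap would be less than a quarter of the height of the Tory bar, rather than taller than it.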

Exhibit E

image

Here we have a superb example of a terrible, misleading bar chart. It’s from Liberal Democrat literature in Lewisham East (where the candidate is Julia Fletcher). As you can see from the graphic, the Labour bar is largest, followed by the Lib Dems, who are fairly close, then the Tories, who are much further behind. But hang on, look at those percentages! The Labour bar represents 43%, the Lib Dem bar 28% and the Conservative bar 23%, so the Lib Dem bar should be much nearer the Conservative bar than the Labour bar! And the Lib Dem and Tory bars should be around two thirds and just over half the size of the Labour one respectively. And again we see that slogan. Another shocking effort, and here’s another correct version:

image
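You don’t even need a charting website to see the right shape. Here is a minimal Python sketch that prints text bars for the Exhibit E percentages, one character per percentage point and every bar starting from zero. The function name `text_bars` and the layout are my own invention, not anything from the leaflet.

```python
# Percentages quoted in the post for Exhibit E (Lewisham East).
shares = {"Labour": 43, "Lib Dem": 28, "Conservative": 23}


def text_bars(values, scale=1):
    """Return one text bar per entry, `scale` characters per unit, all starting at zero."""
    lines = []
    for label, value in values.items():
        bar = "#" * round(value * scale)
        lines.append(f"{label:<13}{bar} {value}%")
    return lines


for line in text_bars(shares):
    print(line)

# Each bar's height relative to the tallest (Labour) bar.
relative_to_labour = {party: share / shares["Labour"] for party, share in shares.items()}
for party, ratio in relative_to_labour.items():
    print(f"{party}: {ratio:.2f} of the Labour bar")
```

Because the bar length is just the value times a fixed scale, the gaps between bars cannot be stretched or squashed to suit a narrative.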

Exhibit F

As I’ve been writing this post I’ve just received this link to this crime against statistics!

image

And I’m sure you’ll agree it’s immense. This one is from a Lib Dem leaflet in Wantage, where Alex Meredith is the candidate. They have tried to get by on a technicality, using a broken blue bar to show that the Tory share is broken, but that negates entirely the need to include the bars at all. They are there to be a visual representation of the proportion of votes, but using a broken bar makes a mockery of this, as the amended heights suggest a close race when actually the blue bar should be almost double the size of the yellow. Even with this technicality the chart is still wrong, as the red bar should be half the size of the yellow bar but in fact it’s less than one third. And yet again, Dave’s favourite slogan has reared its head. A terrible effort; here’s how it should look:

image

While writing this post I asked people to send me any bar chart crimes that had come through their doors on election material. I particularly asked for non-Lib Dem ones, yet the masses that came in all seemed to be theirs. I am beginning to worry that no one in one of the parties that has held government office for the last 5 years can grasp a basic bar chart. I’d be seriously worried if year sevens were making these mistakes. What’s worse is that there are tons of free websites that will do it for you; I used meta-chart for these. Or perhaps it’s just their policy to use misleading bar charts on their flyers. I am always on the lookout for crimes against statistics, so please do send me any you find, be they election related or not.

Stem and Leaf – is there a point?

July 25, 2014 3 comments

Stem and leaf diagrams, or “Those leafy stem things”, as one of my former pupils used to call them, have long been an annoyance of mine. I’d never heard of them until I was brushing up on the GCSE syllabus ahead of my PGCE and when I did come across them I couldn’t see anything that they brought to the party that couldn’t better be shown using alternative methods.

You can imagine my feelings then as the KS3, 4 and now 5 curricula jettisoned them, meaning the end was in sight for the need to teach them. I let my feelings on this be known in my recent post around the new A level curriculum and this led to further discussion around them on Twitter. Then Jo Morgan (@mathsjem) wrote this fantastic piece which supports their place in a classroom and gives some great activities to use in teaching them.

It got me thinking, are my feelings unfounded? Should I be writing off stem and leaf diagrams? I’ve long been an advocate of maths for maths’ sake, see this defence of circle theorems for one example, so why is this feeling not the same for stem and leaf?

Perhaps it’s that it falls under the banner of “stats”, a very applied area of maths. This suggests that there should be an application associated with it. The use mentioned in Jo’s blog for bus and train timetables is the best example I’ve seen, but I think a normal timetable will be easier to read for the majority of people, as the majority of folk aren’t familiar with stem and leaf. Hannah (@missradders) suggested that they were used a lot in baseball, but I can’t see any reason that they would be better than a bar chart or a boxplot.

Colin Wright (@ColinTheMathmo) suggested during the Twitter discussion that they could be used to build understanding around data, even though they are no use for any real data sets, which would be far too big. Jo also uses this idea in her defence, saying they could provide a good introduction to the ideas of skew, quartiles and outliers. I can see this argument, but I still think there are better, more visual and less convoluted ways to introduce these to pupils, such as the aforementioned bar charts and box plots along with scattergraphs and a host of other data presentation methods (but not pie charts, they’re just as bad, if not worse! But that’s a topic for another day!).

I really enjoyed Jo’s post, if you haven’t read it I would advise you do. It made me think and look hard at my views. In the end though, I still see no need in stem and leaf diagrams and will be glad to see the back of them. If you have opinions either way I would love to hear them, especially if you have further real life uses!

A probability puzzle

May 4, 2014 4 comments

“Colin and Dave are playing a game. Colin has a probability of 0.2 of hitting the target with any given shot; Dave has a probability of 0.3. Whoever hits the target first, wins. Colin goes first; what is his probability of winning?”

Yesterday I listened to the latest edition of “Wrong, but useful” (@wrongbutuseful), and the above is the puzzle set by the co-host Dave “The king of stats” Gale (@reflectivemaths).

It was pretty late on in the evening, but I decided to have a quick attempt at the puzzle nonetheless.

image

Here you can see my back of the envelope workings, complete with the word “frustum” on the top of the envelope as I had added an extra r into it in my recent post. You will also see that approaching midnight on a Saturday after a day at the NTEN-RESEARCHED-YORK conference is not the most ideal time for solving maths puzzles! My thinking was fairly valid, I think, but my tired brain has made an absolute ton of mistakes! (If you haven’t spotted them, go and have a look before reading on!)

When I looked at it this morning the first thing that jumped out at me was “0.14×0.028 does NOT equal 0.0364.” Then I thought, “ffs, 10/43 is very definitely not 0.43!” This was closely followed by: “and why have you used 0.14 as r? The 0.2 is the probability he hits, and if he hits the game is over, Cav you ploker!”

So I had another go:

image

I think this is right now. (Although do feel free to correct me if you spot another error!) The probability Colin wins on the first go is 0.2; on his second go it is the product of him missing (0.8), Dave missing (0.7), then him hitting (0.2), so 0.8×0.7×0.2. As the sequence goes on you are multiplying by 0.8 and 0.7 each time, so it gives rise to a geometric sequence with first term (a) as 0.2 and common ratio (r) as 0.56.

image

The total probability of Colin winning is the sum of the probabilities of him winning each time. This is because the probability he wins is the probability he wins on his first go OR his second go, OR his third etc. The game goes on until someone wins, so is potentially infinite, thus we need to sum the series infinitely.

image

As the series has a common ratio of 0.56, which is less than one, we can sum the series to infinity using s = a/(1-r), which gives 0.2/(1-0.56) = 0.2/0.44 = 5/11. Thus the probability Colin wins is 5/11, or 0.45 recurring.
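For anyone who would rather not trust my midnight arithmetic, here is a small Python sketch that checks the answer two ways: exactly, with the sum to infinity formula, and numerically, by adding up the first 50 terms of the series. The names `p_colin`, `p_dave` and `partial_sum` are my own, and exact fractions are used so there is no rounding in the first check.

```python
from fractions import Fraction

p_colin = Fraction(1, 5)    # Colin hits with probability 0.2
p_dave = Fraction(3, 10)    # Dave hits with probability 0.3

a = p_colin                          # first term: Colin hits on his first shot
r = (1 - p_colin) * (1 - p_dave)     # common ratio: both miss, 0.8 x 0.7 = 0.56

# Sum to infinity of a geometric series, valid because |r| < 1.
exact = a / (1 - r)


def partial_sum(n_terms):
    """Probability that Colin wins within his first n_terms shots."""
    return sum(a * r**k for k in range(n_terms))


print(f"Exact: {exact} (about {float(exact):.4f})")
print(f"First 50 terms: {float(partial_sum(50)):.10f}")
```

The partial sums creep up towards 5/11 very quickly, since each extra round only adds a term 0.56 times the size of the one before.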

What does it mean?

April 29, 2014 3 comments

Today my year 11s were busy revising ahead of tomorrow’s mock exam and one of them started singing the averages song. You know the one:

“Mean is average, mean is average, mode is most, mode is most, median’s the middle, median’s the middle, range high low, range high low.”

This got me thinking about the words we use. I’ve always disliked this song as a mnemonic as it encourages people to think of the mean as the “average” when actually the mode and the median are averages too. The median in particular is a very useful one and we need pupils to understand the distinction. I have been very impressed in recent staff meetings to hear the principal, an English teacher by trade, use the term “national median” rather than “national average”!

As I was thinking about this, though, I had the sudden realisation that I should also be feeling the same way about the term “mean”! Granted, at GCSE level we only talk about one mean, the arithmetic mean, but that doesn’t mean the geometric mean doesn’t exist. (Nor the root mean square nor harmonic mean for that matter! Other means are available)
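To see that these really are different numbers, here is a quick sketch using Python’s standard `statistics` module (`geometric_mean` needs Python 3.8 or later). The data set is made up purely for illustration, and the root mean square is computed by hand.

```python
import math
import statistics

data = [2, 4, 8]  # made-up values, purely for illustration

arithmetic = statistics.mean(data)            # sum divided by count
geometric = statistics.geometric_mean(data)   # nth root of the product
harmonic = statistics.harmonic_mean(data)     # count divided by sum of reciprocals
root_mean_square = math.sqrt(statistics.mean([x * x for x in data]))

print(f"Arithmetic mean:  {arithmetic:.4f}")
print(f"Geometric mean:   {geometric:.4f}")
print(f"Harmonic mean:    {harmonic:.4f}")
print(f"Root mean square: {root_mean_square:.4f}")
```

For positive data that isn’t all the same value, these always come out in the order harmonic, geometric, arithmetic, root mean square, so “the mean” really does depend on which one you mean.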

This is a hypocrisy in the way we treat certain words. I’m not the only maths teacher who dislikes the way mean and average have become synonymous. But no one has ever mentioned that the word arithmetic is missing from the term every time we use it.

I worry that we may be setting students who go on to further study statistics up for confusion in the future by simply referring to the arithmetic mean as the mean.

Have you ever used the term arithmetic mean, or even geometric mean, with your students? Have you shared my worry? Or do you think I’m being overly pedantic and it doesn’t matter? I’d love to hear your opinion.

Statistical Deception

April 21, 2014 6 comments

When teaching and talking about statistics I always emphasise the need to be careful what you believe and to always ask yourself “what agenda does the person presenting this data have?”

I’ve written before about how stats can legitimately be manipulated to serve different points of view, especially when there are false variables at work. But recently I’ve noticed a darker art in statistical manipulation, one that is, at its heart, lying.

We are less than six weeks away from local elections now, and it is becoming silly season for party political leaflets coming through our letterboxes. Now we all know that the political parties will present data in a way that makes them look better, they are trying to win your vote after all, but we would expect them not to lie, and for the data to be accurate and presented correctly. Unfortunately, however, this is not always the case:

Exhibit A

image

This popped up a number of times in my Twitter feed from a variety of sources. I believe it is from a Lib Dem leaflet in Manchester. As you can see, they have presented a bar chart with proportions labelled as percentages. The first screaming error is that the red bar and the orange bar are massively different heights, yet are both emblazoned with the label 39%. The second glaring error is that the percentages add up to more than 100%. The first implies that either the Lib Dems are deliberately trying to mislead voters into thinking they are in a stronger position in the ward than they are, or that they don’t realise that 39% is equal to 39%. I’m not sure which is worse?!

Here’s an Excel interpretation of what the graphs should look like:

Manc

Exhibit B

image

This graph came through my door in the Leeds North West parliamentary constituency. The first thing that caught my eye was that, although the gap in votes between the Lib Dems and Labour is almost the same as the gap between Labour and the Conservatives, one gap between the bars was almost five times as big as the other, which would imply almost five times the difference in votes! An obvious fallacy. Either it’s a deliberate attempt to mislead, or they can’t draw a bar chart. If it’s the latter, do we want them in charge of our local authority budgets?! (or the entire economy for that matter!!)

Something else that struck me as deceiving, although this time mathematically correct at least, was the choice of data. This was a leaflet issued in the run-up to a local election, and the data set used was from the last local election. Why, then, is the data that of the parliamentary constituency rather than the council ward? The ward makes up around a quarter of the constituency, and the vote share in the ward is radically different to that of the constituency. The sitting councillor is Conservative and sits on a huge majority, and the Lib Dem candidate last time out came third. To issue a leaflet in the run-up to a local election which implies the Conservatives can’t win in a ward where they have a large majority, and to back it up with local election data for a parliamentary constituency, is deliberately deceptive and misleading.

Here’s an Excel interpretation of what this one should look like:

LeedsNW

Exhibit C

image

This one comes from “across the pond” and is another which went viral. It seemed to appear constantly for a few days everywhere I looked. If you are still wondering what’s wrong with it, take a little look at those numbers down the left hand side…. See it? The y axis goes upwards to zero! Drew Barker (@twentythree) made this version, which gives a much better picture as to what’s going on.

image

I can’t wait to see what my classes make of these!

NB I haven’t “selected” these graphs as an attack on the Lib Dems; it’s just that they are the only party who have sent me a leaflet with incorrect maths. I’ll gladly expose any of the other parties if they do the same. I do collect these, so if you spot anything similar, do send it to me!
