Ziad Shihab

How to Lie to Yourself and Others With Statistics

Misusing statistics is one of the most powerful ways to lie. Normally we teach you how to avoid misinterpreting statistics, but knowing how numbers are manipulated can help you spot when it happens. To that end, we're going to show you how to make data say whatever the hell you want, to back up any wrong idea you have.

This post is part of our Evil Week series at Lifehacker, where we look at the dark side of getting things done. Sometimes evil is justified, and other times, knowing evil means knowing how to beat it. Want more? Check out our evil week tag page.

Gather Sample Data That Adds Bias to Your Findings

The first step to building statistics is determining what you want to analyze. Statisticians refer to this as the "population." Then you define a subset of that data to collect that, when analyzed, should be representative of the population as a whole. The larger and more accurate the sample, the more precise your conclusions can be.

Of course, there are a few big ways to screw up this type of statistical sampling, either by accident or intentionally. If the sample data you gather is bad, you'll end up with false conclusions no matter what. There are a lot of ways you can mess up your data, but here are a few of the big ones:

Self-Selection Bias: This type of bias occurs when the people or data you're studying voluntarily put themselves into a group that isn't representative of your whole population. For example, when we ask our readers questions like "What's your favorite texting app?" we only get responses from people who choose to read Lifehacker.
The results of an informal poll like this likely won't be representative of the population at large, because all our readers are smarter, funnier, and more attractive than the average person.

Convenience Sampling: This bias occurs when a study analyzes whatever data it has available instead of trying to find representative data. For example, a cable news network might poll its viewers about a political candidate. Without polling people who watch other networks (or don't watch TV at all), it's impossible to say that the results of the poll would represent reality.

Non-Response Bias: This happens when some people in a chosen set don't respond to a statistical survey, causing the answers to shift. For example, if a survey on sexual activity asked "Have you ever cheated on your spouse?" some people may not want to admit to infidelity, making it look like cheating is rarer than it is.

Open-Access Polls: These types of polls allow anyone to submit answers and, in many cases, don't even verify that people submit an answer only once. While common, they're fundamentally biased because they don't attempt to control the input in any meaningful way. For example, online polls that just ask you to click your preferred option fall under this bias. While they can be fun and useful, they're not good at objectively proving a point.

These are just some of the many, many ways that a sample can be biased. If you want to create a misleading impression, well, pick your poison. For example, open-access polls on websites can be used to "prove" that whichever candidate you like best won a debate, or that Undertale is the best game of all time. The beauty of sampling biases is that someone, somewhere is taking an unscientific poll that will say anything you want. So just Google around until you find an unscientific poll you like, or heck, create your own.

Choose the Analysis That Supports Your Ideas

Anscombe's quartet shows four different charts that have nearly the exact same statistical summaries.
Since statistics use numbers, it's easy to assume they're hard proof of the ideas they claim to support. In reality, the math behind statistics is complex, and analyzing it improperly can yield different, or even entirely contradictory, conclusions. If you want to twist a statistic to suit your needs, fudge the math.

To demonstrate the flaws in analyzing data, statistician Francis Anscombe created Anscombe's quartet (diagrammed above). It consists of four datasets that, when viewed on a chart, show wildly different trends. The X1 chart shows a basic scatter plot with an upward trend. X2 shows a curved trend that was going up but is now heading downward. X3 shows a smaller upward trend, but with one outlier on the Y axis. X4 shows data that's perfectly flat on the X axis, save for one outlier that's super high on both axes.

Here's where it gets crazy. For all four of these charts, the following statements are true:

The average x value is 9 for each dataset
The average y value is 7.50 for each dataset
The variance for x is 11, and the variance for y is 4.12
The correlation between x and y is 0.816 for each dataset

If you only saw this data in text form, you might think all four situations were identical. For example, say you had a chart like X1 that showed men's salaries at your company over the years, and one like X2 showing salaries for women over the same time at the same company. If you showed only the text, you'd see they made the same average salary! However, if you showed the charts, people would see that women's salaries were trending downward for some reason.

Anscombe suggested that to avoid misleading people, you should always visualize your data before drawing conclusions, and be aware of how outliers influence the analysis. It's hard to miss an outlier on a properly graphed chart, but outliers can have a massive yet invisible effect in text. Of course, if your goal is to mislead people, you can just skip this step.
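The four statements above are easy to check yourself. Here's a minimal sketch in Python's standard library, using the quartet's values as published in Anscombe's 1973 paper (the numbers come from that paper, not from this article):

```python
from statistics import mean, variance

# Anscombe's quartet, as published (datasets I-III share the same x values):
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

def corr(xs, ys):
    """Pearson correlation coefficient, computed from the definition."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

for xs, ys in [(x123, y1), (x123, y2), (x123, y3), (x4, y4)]:
    print(round(mean(xs), 2), round(mean(ys), 2),
          round(variance(xs), 2), round(variance(ys), 2),
          round(corr(xs, ys), 3))
# All four rows print (nearly) identical summaries: mean x = 9, mean y ~ 7.5,
# variance x = 11, variance y ~ 4.1, correlation ~ 0.816 -- even though the
# four scatter plots look nothing alike.
```

Plot any one of these datasets, though, and the identical-looking numbers fall apart immediately.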
Make Charts That Only Emphasize Your Pre-Conceived Conclusion

Most people don't have the time to do their own statistical analysis, so they rely on you to show them charts that summarize your conclusions. If you create your charts properly, they should suggest ideas that correspond to reality. If you want to screw them up, you can emphasize the data you like best.

One of the most famous, hilariously inaccurate charts in recent memory came from a member of Congress in a meeting regarding Planned Parenthood. During this meeting, Rep. Jason Chaffetz (R-Utah) attempted to argue that Planned Parenthood's abortion services had risen since 2006, while its cancer services had declined over the same time period. This is the chart he used to demonstrate this:

This is one of the worst charts I've ever seen. And it was presented to the House Oversight Committee.

At first glance, it looks like abortions have skyrocketed while cancer services have dropped dramatically. We can thank several flaws in this chart for that conclusion:

There's no label on the Y axis. While the lower X axis is labeled with years, the Y axis has no label at all. Is it number of procedures? Amount of money spent on procedures? Who knows! You don't have to.

The Y axis scales are all wrong. In addition to the missing label, the scale of the Y axis is all wrong. The red line's final data point is 327,000, which is inexplicably higher on the chart than the pink line's final data point of 935,573. Technically each line is going in the right direction, but the scaling is all kinds of wrong.

It lacks context. These data points (such as they are) only suggest what happened, not why it happened. For example, in 2009 the U.S. Preventive Services Task Force updated its recommendation to get mammogram screenings every two years, instead of its previous suggestion of every year. This could account for the decrease in cancer screenings.
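That second flaw is easy to quantify: on any single, shared Y axis, 935,573 must plot higher than 327,000. Crossing lines like the chart's are only possible if each line quietly gets its own scale. Here's a rough sketch of that trick; the per-line axis maxima are invented for illustration, and only the two endpoint figures come from the chart itself:

```python
# The 2013 endpoints from the slide: 327,000 abortions (red line)
# vs. 935,573 cancer screenings (pink line).
abortions_2013 = 327_000
screenings_2013 = 935_573

def to_pixels(value, y_max, chart_height=400):
    """Map a data value to a height in pixels on a chart whose y-axis tops out at y_max."""
    return value / y_max * chart_height

# An honest chart uses ONE shared y-axis, so the larger value always plots higher.
shared_max = 2_100_000  # hypothetical shared axis maximum, big enough for both series
honest_red = to_pixels(abortions_2013, shared_max)
honest_pink = to_pixels(screenings_2013, shared_max)

# The misleading chart behaves as if each line had its own scale, chosen so the
# red line ends above the pink one (these per-line maxima are made up for the demo):
sneaky_red = to_pixels(abortions_2013, y_max=400_000)
sneaky_pink = to_pixels(screenings_2013, y_max=2_000_000)

print(honest_pink > honest_red)   # True: 935,573 plots above 327,000 on a shared axis
print(sneaky_red > sneaky_pink)   # True: with separate scales, 327,000 "beats" 935,573
```

Without a labeled Y axis, a reader has no way to tell that the two lines are on different scales, which is exactly why the missing label matters.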
Most charts aren't quite this flagrantly wrong, but it's a great example of how to mislead by simply leaving out a few key elements of a chart. News site Quartz showed what this chart would look like if it were represented properly (note: 2008 data was not provided, and is thus missing from the chart):

This is much more accurate. If you're into that sort of thing.

On this scale, the rise in abortion procedures is relatively flat, while cancer screenings have gone down. However, since individual data points are shown, we can see that the decline began right around 2009, just like we predicted. This is how you accurately present information in its proper context! So, if you want to mislead people, all it takes is a little chart fudging. Leave off your labels, manipulate the axes a bit, and you too can trick people into thinking you have a better point than you do.

Obscure Your Sources At All Costs

The easier it is to see your sources, the easier it is for other people to verify or disprove your conclusions. If your conclusions can be verified, then by all means let people see your data and how you got there. However, if your goal is to mislead people, never let anyone find out how you came to the conclusions you did.

With proper sourcing, every single person who mentions a piece of data includes a reference to the source. News sites should link to the studies or research they're quoting (not articles about the studies). Researchers may not show their entire data set, but the source of a study should answer some basic questions:

How was the data gathered? Did you call people on the phone? Stop them outside the mall? Was it a Twitter poll? The method you use to gather your data might point to (or rule out) sampling bias.

When was the data collected? When did you collect the data, and how long did it take to gather? Reports can get outdated fast, and trends can change over time. Including the time frame that data comes from can say a lot about the conclusions you draw.

Who collected the data?
The person or group that collects data may provide hints about how trustworthy that data is. A tobacco company study claiming cigarettes are safe might not be correct unless someone else can verify it.

Who was asked? Particularly in the area of surveys and polls, it's important to know who was questioned. If a politician only polls people who are already friendly to them, they won't get data that represents the population as a whole.

Sourcing isn't just used to avoid bias, but to give others the chance to verify your claims. It opens your data, your methods, and your conclusions up to criticism. It lets others try to poke holes in your ideas. If your conclusions can't stand up to criticism, then they fall apart. The most accurate statistics are the ones that others can see and corroborate with their own research. However, if your goal is to mislead yourself or someone else, don't bother sharing your sources. In fact, your best defense is to just say "Look it up!" and walk away. No one can disprove that.

Illustration by Angelica Alzona. Photos by Wikimedia Commons, Americans United For Life, and Quartz.

via Lifehacker (http://lifehacker.com), October 25, 2016