<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Class template</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link href="css/bootstrap.min.css" rel="stylesheet">
<link href="css/custom.css" rel="stylesheet">
</head>
<body class="markdown github">
<header class="navbar-inverse navbar-fixed-top">
<div class="container">
<nav role="navigation">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a href="index.html" class="navbar-brand">J298 Data Journalism</a>
</div> <!-- /.navbar-header -->
<!-- Collect the nav links, forms, and other content for toggling -->
<div class="collapse navbar-collapse" id="bs-example-navbar-collapse-1">
<ul class="nav navbar-nav">
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Class notes<b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="week1.html">What is data?</a></li>
<li><a href="week2.html">Types of stories</a></li>
<li><a href="week3.html">Working with spreadsheets</a></li>
<li><a href="week4.html">Acquiring, cleaning, and formatting data</a></li>
<li><a href="week5.html">R, RStudio, and the tidyverse</a></li>
<li><a href="week6.html">Data journalism in the tidyverse</a></li>
<li><a href="week7.html">Don't let the data lie to you</a></li>
<li><a href="week8.html">Databases and SQL</a></li>
<li><a href="week9.html">Finding stories using maps</a></li>
<li><a href="week10.html">Maps meet databases</a></li>
<li><a href="week11.html">More fun with R</a></li>
<li><a href="week12.html">R practice</a></li>
<li><a href="week13.html">PostGIS practice</a></li>
<li><a href="week14.html">More fun with R</a></li>
</ul>
</li>
<li><a href="software.html">Software</a></li>
<li><a href="datasets.html">Data</a></li>
<li><a href="questions.html">If you get stuck</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Email instructors<b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="mailto:p.aldhous@gmail.com">Peter Aldhous</a></li>
<li><a href="mailto:abh@berkeley.edu">Amanda Hickman</a></li>
</ul>
</li>
</ul>
</div><!-- /.navbar-collapse -->
</nav>
</div> <!-- /.container -->
</header>
<div class="container all">
<h2 id="week-7-march-1-2018">Week 7 | March 1, 2018</h2>
<p><em>Instructor: Amanda Hickman</em></p>
<h1 id="don-t-let-the-data-lie-to-you">Don't Let the Data Lie To You</h1>
<h2 id="dataset-of-the-week-10-min-">Dataset of The Week (10 Min)</h2>
<p>Presenting: Sarah El Safty and Josh Slowiczek</p>
<h2 id="lies-data-tells">Lies Data Tells</h2>
<h3 id="some-people-want-to-lie-">Some people want to lie.</h3>
<ul>
<li><a href="https://qz.com/122921/the-chart-tim-cook-doesnt-want-you-to-see/">https://qz.com/122921/the-chart-tim-cook-doesnt-want-you-to-see/</a> -- QZ on a very disingenuous Apple chart.</li>
</ul>
<p>You’re often going to find yourself working with numbers that were given to you by a source who has a vested interest in how your story turns out. Ask lots of questions. Be skeptical.</p>
<h3 id="all-data-lies-">All data lies.</h3>
<p>What do you think is the fastest way to reduce the number of unsolved rape cases in your precinct? </p>
<ul>
<li><a href="https://www.eastbayexpress.com/oakland/too-many-rapes-dismissed/Content?oid=12633555">https://www.eastbayexpress.com/oakland/too-many-rapes-dismissed/Content?oid=12633555</a></li>
<li><a href="https://www.buzzfeed.com/alexcampbell/unfounded">https://www.buzzfeed.com/alexcampbell/unfounded</a></li>
<li><a href="http://www.pulitzer.org/winners/t-christian-miller-propublica-and-ken-armstrong-marshall-projec">http://www.pulitzer.org/winners/t-christian-miller-propublica-and-ken-armstrong-marshall-projec</a></li>
</ul>
<p>If there's a meaningful reward for moving the numbers, there's a real incentive to move the numbers without changing the underlying issue at all.</p>
<p>And if you see a hospital that advertises high surgical success rates, does that mean they have the best surgeons? Or that they only take easy cases?</p>
<p>VA hospitals are addicted to metrics, and those metrics almost always turn out to be gameable, often in ways that make problems worse.</p>
<ul>
<li><a href="https://www.nytimes.com/2014/05/29/us/va-report-confirms-improper-waiting-lists-at-phoenix-center.html">https://www.nytimes.com/2014/05/29/us/va-report-confirms-improper-waiting-lists-at-phoenix-center.html</a></li>
<li><a href="https://www.wnyc.org/story/manipulating-metrics-veterans-health-care-system">https://www.wnyc.org/story/manipulating-metrics-veterans-health-care-system</a></li>
<li><a href="https://www.nytimes.com/2018/01/01/us/at-veterans-hospital-in-oregon-a-push-for-better-ratings-puts-patients-at-risk-doctors-say.html">https://www.nytimes.com/2018/01/01/us/at-veterans-hospital-in-oregon-a-push-for-better-ratings-puts-patients-at-risk-doctors-say.html</a></li>
</ul>
<p>90% of fetuses diagnosed with Down Syndrome are aborted. </p>
<h3 id="all-data-has-context-">All data has context.</h3>
<p>It is made by people. People take shortcuts. They interpret things and make calls.</p>
<ul>
<li><p>The NYT found that <a href="http://www.nytimes.com/2013/09/29/us/children-and-guns-the-hidden-toll.html">accidental gun deaths</a> are often classified as homicides or suicides. One core takeaway is that it is up to the coroner to make a call, and they don't always make the call you'd expect. One result is that accidental gun deaths are wildly undercounted.</p>
</li>
<li><p>Kansas: <a href="https://splinternews.com/how-an-internet-mapping-glitch-turned-a-random-kansas-f-1793856052">https://splinternews.com/how-an-internet-mapping-glitch-turned-a-random-kansas-f-1793856052</a> / <a href="https://splinternews.com/this-is-the-new-digital-center-of-the-united-states-1793856143">https://splinternews.com/this-is-the-new-digital-center-of-the-united-states-1793856143</a> / <a href="https://source.opennews.org/articles/distrust-your-data/">https://source.opennews.org/articles/distrust-your-data/</a> (esp <a href="https://www.buzzfeed.com/ryanhatesthis/who-watches-more-porn-republicans-or-democrats">https://www.buzzfeed.com/ryanhatesthis/who-watches-more-porn-republicans-or-democrats</a> and <a href="https://www.vox.com/2014/4/21/5636040/whats-the-matter-with-kansas-and-porn">https://www.vox.com/2014/4/21/5636040/whats-the-matter-with-kansas-and-porn</a> )</p>
</li>
<li><p><strong>The Fires (Joe Flood)</strong> -- The Fires is one of a few excellent accounts of the famous burning Bronx of the 1970s. One thing Flood talks about, which others have also talked about, is the direct relationship between radical cutbacks in FDNY station funding in the Bronx and the rise in fires -- maybe the Bronx was burning because the infrastructure to put fires out had been decimated. RAND's statistical team analyzed <em>predicted</em> citywide response times and determined that the city could safely close a lot of Bronx firehouses -- without acknowledging that those firehouses were some of the busiest in the city and often weren't responding to fires in their immediate vicinity because they were already out fighting fires elsewhere. So a very "pure" data-driven approach conveniently rationalized shuttering a lot of stations in poor areas.</p>
<ul>
<li><a href="https://nypost.com/2010/05/16/why-the-bronx-burned/">Why The Bronx Burned</a> (NY Post, May 2010)</li>
<li><a href="https://www.goodreads.com/book/show/7906964-the-fires">Goodreads on The Fires</a></li>
<li><a href="https://citylimits.org/2010/06/04/reviews-a-city-on-fire/">Reviews: A City on Fire</a> (City Limits, June 2010)</li>
</ul>
</li>
<li><p><a href="https://fivethirtyeight.com/features/we-used-broadband-data-we-shouldnt-have-heres-what-went-wrong/">538 had to retract a story on broadband reach</a> because they didn't understand the data they were working from.</p>
</li>
</ul>
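<p>Using IP addresses as a proxy for location will give you a ton of hits in Kansas, because geolocation databases assign addresses they can't place more precisely to a default point near the center of the country.</p>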
<ul>
<li>Years ago, the Chicago Sun-Times ran a story about exceptionally <a href="https://web.archive.org/web/20130303021058/http://www.suntimes.com/opinions/letters/18515250-474/story-misses-the-mark-on-cta-crime.html">high crime rates at CTA stations</a>, but the "crime spike" was turnstile jumping. (<a href="http://wbezdata.tumblr.com/post/44257873024/cta-sun-times-get-in-data-fight">WBEZ</a>)</li>
</ul>
<h3 id="all-data-has-biases-">All data has biases.</h3>
<p>I've talked about this before, so I won't dwell on it, but 311 calls are not a random sample of lived experiences.</p>
<ul>
<li><p><a href="https://nextcity.org/daily/entry/who-is-most-likely-dial-311">https://nextcity.org/daily/entry/who-is-most-likely-dial-311</a> </p>
</li>
<li><p>This is a classic in statistics texts, and a textbook case of Simpson's paradox: in the 1970s, researchers looked at admissions rates at UC Berkeley and found that women were far, far more likely to be rejected. A closer examination revealed that women were more likely to apply to more competitive programs, so department by department, there wasn’t evidence of discrimination. <a href="http://vudlab.com/simpsons/">http://vudlab.com/simpsons/</a></p>
<p>“The only real way to maybe avoid this and similar potential land mines in a statistical model is to thoroughly understand your subject area and to be skeptical of any result.” John Perry @ AJC</p>
</li>
</ul>
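<p>The admissions example above is easier to believe once you run the arithmetic yourself. Here is a minimal sketch in Python. The two departments and every count in it are invented for illustration (they are not the real Berkeley figures), and the function names are just placeholders. It computes admission rates department by department and then for the pooled totals.</p>

```python
# Hypothetical numbers, invented for illustration only.
# Each department maps a group to (applicants, admitted).
applications = {
    "less competitive": {"men": (800, 480), "women": (200, 130)},
    "more competitive": {"men": (200, 40), "women": (800, 200)},
}


def admit_rate(applicants, admitted):
    """Share of applicants who were admitted."""
    return admitted / applicants


def rates_by_department(data):
    """Admission rate for each group inside each department."""
    return {
        dept: {group: admit_rate(*counts) for group, counts in groups.items()}
        for dept, groups in data.items()
    }


def overall_rates(data):
    """Admission rate for each group after pooling all departments."""
    totals = {}
    for groups in data.values():
        for group, (applicants, admitted) in groups.items():
            seen_applicants, seen_admitted = totals.get(group, (0, 0))
            totals[group] = (seen_applicants + applicants, seen_admitted + admitted)
    return {group: admit_rate(*counts) for group, counts in totals.items()}


by_dept = rates_by_department(applications)
overall = overall_rates(applications)

for dept, rates in by_dept.items():
    print(dept, {group: f"{rate:.0%}" for group, rate in rates.items()})
print("all departments", {group: f"{rate:.0%}" for group, rate in overall.items()})
```

<p>In this made-up data, women are admitted at a higher rate than men in both departments, but at a lower rate overall, because most of the women applied to the department that rejects most applicants. The pooled number and the per-department numbers are both "true"; they answer different questions.</p>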
<p>This is closely related to the ecological fallacy: if I tell you that states with more foreign-born residents have more wealthy households, what’s your next question? (Are foreign-born people more likely to be wealthy? No.) An older study showed that there was a positive state-by-state correlation between literacy and foreign-born populations: areas with high immigrant populations were likely to be more literate. What that correlation doesn’t tell you is whether immigrants themselves are more likely to be literate.</p>
<p><a href="http://blog.statwing.com/the-ecological-fallacy/">http://blog.statwing.com/the-ecological-fallacy/</a> <a href="http://andrewgelman.com/2013/02/03/heuristics-for-identifying-ecological-fallacies/">http://andrewgelman.com/2013/02/03/heuristics-for-identifying-ecological-fallacies/</a></p>
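<p>To see how a state-level pattern can point the opposite way from the individual-level one, here is a rough Python sketch. The three states and all of their rates are invented for illustration, and the helper names are placeholders. Each state gets a foreign-born share plus separate literacy rates for native-born and foreign-born residents, and the code compares the state-by-state correlation with the gap inside each state.</p>

```python
from math import sqrt

# Hypothetical states, invented for illustration only.
# Each value is (foreign-born share, native-born literacy, foreign-born literacy).
states = {
    "State A": (0.05, 0.80, 0.60),
    "State B": (0.15, 0.90, 0.70),
    "State C": (0.25, 0.98, 0.78),
}


def overall_literacy(share, native_rate, foreign_rate):
    """Literacy rate for the whole state, weighted by population share."""
    return (1 - share) * native_rate + share * foreign_rate


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


shares = [values[0] for values in states.values()]
literacy = [overall_literacy(*values) for values in states.values()]

# What you can see in aggregate data: one number per state.
state_level_r = pearson(shares, literacy)

# What aggregate data hides: the foreign-born minus native-born gap in each state.
gaps = {name: foreign - native for name, (share, native, foreign) in states.items()}

print(f"state-level correlation: {state_level_r:.2f}")
for name, gap in gaps.items():
    print(f"{name}: literacy gap {gap:+.0%}")
```

<p>With these invented numbers the state-level correlation is strongly positive, yet in every state the foreign-born literacy rate is lower than the native-born rate. The aggregate pattern comes from where people live, not from who they are, which is why the next question always has to be about the individuals.</p>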
<p>Question order changes how people respond to questions. This is over a decade old now, but in 2003, Americans were more likely to say they supported civil unions if they had already been asked whether they supported gay marriage.</p>
<ul>
<li><a href="http://www.pewresearch.org/methodology/u-s-survey-research/questionnaire-design/#question-order">http://www.pewresearch.org/methodology/u-s-survey-research/questionnaire-design/#question-order</a></li>
</ul>
<p>Pew has a lot of great research about survey design.</p>
<p><a href="https://xkcd.com/552/"><img alt="I used to think correlation was causation but then I took a statistics course." src="https://imgs.xkcd.com/comics/correlation.png"></a></p>
<ul>
<li><a href="http://tylervigen.com/spurious-correlations">Spurious Correlations</a></li>
<li><a href="https://github.com/amandabee/CUNY-data-storytelling/blob/master/lecture%20notes/skepticism.md">More Notes</a></li>
<li><a href="https://archives.cjr.org/cover_story/dark_shadows.php">How you tell the story matters</a></li>
<li>Newsrooms cherry pick data, too. <a href="https://www.theatlantic.com/business/archive/2012/09/whats-really-eating-the-family-budget-it-aint-smartphones/262918/">https://www.theatlantic.com/business/archive/2012/09/whats-really-eating-the-family-budget-it-aint-smartphones/262918/</a></li>
</ul>
<h2 id="where-do-you-find-data-">Where do you find data?</h2>
<p>Who is stuck? Let's brainstorm getting unstuck.</p>
<h2 id="sql-bingo">SQL Bingo</h2>
<a href="https://docs.google.com/presentation/d/1qsd1hZsd6U6b0sZJCoshFvjXnGcVy9yqafNlWe3SsB4/edit#slide=id.g344ba11d90_0_20">Slides</a> |
<a href="https://github.com/amandabee/workshops/tree/master/2018/sqlbingo">Source</a>
<h2 id="next-week-10-min-">Next week (10 min)</h2>
<ul>
<li>No data preso next week; Caron and Carlos are up 3/15.</li>
<li>We're meeting on Tuesday 3/6 in the same room.</li>
</ul>
</div> <!-- /.container all -->
<script src="https://code.jquery.com/jquery.min.js"></script>
<script src="js/bootstrap.min.js"></script>
</body>
</html>