Lies, Damned Lies and Big Data
by Russel Neiss
Nary a day goes by without yet another response to what seems to be an endless stream of reactions to the Pew Study. And this week, JTA reported on a $750,000 initiative by the New York Federation for twelve New York-based congregations to promote “more sophisticated data use among synagogues.” Ostensibly, these and countless other “data-driven” initiatives, going back to the establishment of the Bureau of Jewish Social Research in 1919, have existed to facilitate and support the work of Jewish educators and communal professionals and to help them engage in the holy work they do. But I wonder if our fetishizing of these figures is doing more harm than good.
Robert McNamara was appointed U.S. secretary of defense as the Vietnam conflict began to rear its ugly head in the 1960s. A former professor at the Harvard Business School and an executive of the Ford Motor Company, he believed that only through the application of statistical data could decision makers understand a complex problem and make the right choices. And so, faced with an intractable conflict in Vietnam and the nebulous goal of protecting South Vietnam from the communists, McNamara sought a metric that would allow him to measure U.S. progress in the war, and the “body count” was born. So long as the number of dead Viet Cong kept rising faster than the number of dead U.S. soldiers, McNamara could claim success, since “data doesn’t lie.” It didn’t matter that the generals on the ground thought the metric was garbage, nor that some of the raw numbers were exaggerated by troops trying to curry favor. Once the figures were codified in McNamara’s spreadsheet, they became like halacha l’moshe m’sinai (law given to Moses at Sinai).
The misuse and abuse of data by McNamara and others should serve as a cautionary tale about the limits of what we can glean from the large-scale data projects currently in use in our own community. The underlying data can be faulty; it can be biased; it can be misanalyzed and used to mislead. But most notably, it can fail to capture what it claims to quantify.
We tend to collect data on the things that are easiest to measure, which are not necessarily the most important ones. So in schools we’ll measure test scores, student/faculty ratios, attendance, etc., but we often won’t solicit feedback from parents or students about their own experiences. Pew (and NJPS) ask respondents whether they “fasted for all or part of Yom Kippur,” as if that were somehow indicative of how meaningful they find the holiday. We can use A/B testing to determine which email subject line yields a higher open rate. But if you use a BuzzFeed- or Upworthy-style headline for your end-of-year mailer because the data shows more people will open it, you might end up looking foolish, as your message is diminished by the lack of seriousness people often associate with those publications (and your data is unlikely to tell you that).
Yes, there are many things the data is good at. It is important to have broad outlines of what our community looks like and to make evidence-based decisions in a rational way. But we should not become so fixated on the data, and so obsessed with its power and promise, that we fail to appreciate its inherent flaws and its capacity to mislead.