Dataclysm by Christian Rudder
Broadway Books, 2014, 320 pages
Nonfiction (Data Science)
Despite not being terribly old, Dataclysm by Christian Rudder already has two editions with two different subtitles. The eBook edition I read is subtitled Love, Sex, Race, and Identity, while another version goes with Who We Are (When We Think No One Is Looking). I note this because I think the marketing strategy is interesting, especially as the two subtitles are so different and imply completely different things about the content of the book. I’ll also point out that, while I (and many others) pronounce “data” as “date-uh,” the title demands the pronunciation “dah-tah” to reap the rewards of the, er, pun.
Christian Rudder, as a founder of OkCupid, has access to an extraordinary amount of data (dahtah, I remind myself). While discussing the habits of OkCupid users, Rudder broadens the implications he finds there to the whole of society – at least in America and sometimes beyond. The problem with this, and Rudder does admit it, is that his data is not representative of any actual population aside from the population that uses OkCupid. Despite his acknowledgement (and a brief chapter on users who are not WASP-y men), Rudder often writes as if the conclusions he draws from this population can be applied across any and all spectrums. I may only have a minor in Psychology (and, yeah, I avoided the stats class – oops), but even I know that is poor scholarship.
Admittedly, the book is a work of popular nonfiction, so I suppose some might argue it doesn’t matter, as long as it’s interesting and somewhat informative. But the problem with presenting your research as comprehensive, with only a few nods to its limits, is that people will treat it as comprehensive, and that can do some serious damage to how society ultimately operates. The effect is insidious, yes, but impactful nonetheless. There’s also a piece Rudder never really acknowledges: only one kind of person will use OkCupid or other dating sites, namely the kind of person who will use OkCupid or other dating sites. Meanwhile, Rudder takes the information from this particular dataset and applies much of it to the American population at large. Surely there are, at least sometimes, fundamental differences between the people who are willing or choose to use dating sites and those who are not. What, for example, about technophobes?
All this said, if you’re a straight, probably-middle-class, white person living in America, you might find a good deal of this book insightful about not only others but yourself. Rudder has an accessible writing style that makes even complicated data structures, theories, and concepts easy for the layperson to grasp. He does a pretty excellent job explaining the various graphs he uses, some of which were in formats totally new to me, which was exciting (though I made the mistake of reading this on a black-and-white Kindle, which made some interpretation challenging – get the print, if you can). What’s more, he explains them in an order that makes sense and doesn’t bog the reader down with details. Instead, he explains the essentials, points out a few especially interesting details, and leaves the rest (with some encouragement) for you to coax out yourself with careful examination of the graph.
He’s funny, too, though perhaps overly self-deprecating in some parts. In one passage he provides a picture of his adolescent self and pulls no punches about it. Rudder relishes his nerdiness, which, as it happens, is trendy right now, so more power to him. Regardless of his approach, the humor itself adds another layer of accessibility to an otherwise often-inaccessible, but increasingly in-demand and important, subject.
Ultimately, the content in Dataclysm can’t begin to cover the actual topic at hand. As many a teacher and professor has told me, the subject is too broad; narrow it down. Rudder might have done well with this somewhat-nebulous topic if he’d gone more in-depth and written something lengthier, though that would likely have taken away from the book’s readability and popular appeal. Smart readers will recognize that there’s a great deal of complexity behind each statement that Rudder chooses to avoid, but I’m again torn between feeling this is to the reader’s detriment and feeling the book wouldn’t have such wide appeal if he did go into greater detail.
Dataclysm is a great introduction to the world of data. As someone who primarily lives outside of the data world, I came away understanding a great deal more about it than I had previously (despite numerous explanations of various data theories and structures from my ever-patient data scientist boyfriend). The organization, for the most part, makes sense, and the concrete examples Rudder offers illustrate his points well. If data is something you want to “get into” but don’t know where to start, maybe start here and then move on to something a little more challenging and in-depth.
❤❤❤❤ out of ❤❤❤❤❤