Culturomics and Google’s Ngram Viewer: More Noise?

The other day, a few minutes of wilfing led us to Technium’s post on Google’s latest project, the Ngram Viewer. Is Google making us stupid again? But this is serious stuff, as evidenced by the Ngram Viewer’s introduction in last December’s Science. The Ngram Viewer is a tool that lets users search for keywords in a corpus of millions of digitized books and plot the results quantitatively. So what? A TED video helps explain the development and potential uses. Commentary on the Science article, and on the claims made in the TED video, questions the usefulness of the Google project.
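For readers wondering what it means to “quantitatively plot” a keyword, here is a minimal sketch of the general idea: for each year, count how often a word appears and divide by the total number of words from that year. This is only our guess at the shape of the calculation, not Google’s actual method; the function name `relative_frequency` and the tiny `toy_corpus` are invented for illustration.

```python
from collections import Counter


def relative_frequency(texts_by_year, word):
    """Return {year: share of that year's words that equal `word`}."""
    freqs = {}
    for year, texts in sorted(texts_by_year.items()):
        # Crude tokenizer: split on whitespace, trim punctuation, lowercase.
        tokens = [t.strip('.,;:!?"\'()').lower() for text in texts for t in text.split()]
        tokens = [t for t in tokens if t]
        counts = Counter(tokens)
        freqs[year] = counts[word.lower()] / len(tokens) if tokens else 0.0
    return freqs


# Invented toy "corpus" (an assumption for illustration, not real book data).
toy_corpus = {
    1950: ["Silence fell over the hall.", "The silence was broken by noise."],
    1970: ["Noise filled the street.", "More noise, less silence, more noise."],
}

silence = relative_frequency(toy_corpus, "silence")
noise = relative_frequency(toy_corpus, "noise")

for year in sorted(toy_corpus):
    print(year, round(silence[year], 3), round(noise[year], 3))
```

Dividing by the yearly total matters because far more books survive from some years than from others; raw counts alone would mostly show the growth of publishing rather than the fortunes of a word.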

Is the Ngram Viewer an electronic Tower of Babel? We’re not sure. What are its implications and its practical uses? It appears to be an interesting tool for cultural anthropology. The corpus contains “over 500 billion words,” and “cannot be read by a human.” But anyone can access it at the Culturomics site. In the Science paper, “Quantitative Analysis of Culture Using Millions of Digitized Books,” the authors provide this takeaway: “Cultural change guides the concepts we discuss (such as ‘slavery’). Linguistic change – which, of course, has cultural roots – affects the words we use for those concepts (‘the Great War’ vs. ‘World War I’). In this paper, we will examine both linguistic changes, such as changes in the lexicon and grammar; and cultural phenomena, such as how we remember people and events.”

Closing the paper is a concise definition of culturomics with a touching comment on its limitations: “Culturomics is the application of high-throughput data collection and analysis to the study of human culture. Books are a beginning, but we must also incorporate newspapers (29), manuscripts (30), maps (31), artwork (32), and a myriad of other human creations (33, 34). Of course, many voices – already lost to time – lie forever beyond our reach.” (Not to mention the trunk of writing, molding in our basement for over twenty years, that we finally threw out – the poems were beginning to crawl out of the trunk, climb up the basement stairs, and haunt our dreams.) The Science paper concludes with examples of how culturomics might be used as “a new type of evidence in the humanities.” Yet some of the paper’s conclusions seem obvious: “People, too, rise to prominence, only to be forgotten.” Surely the idea that “One generation passeth away, and another generation cometh” is not a new one. But the authors’ discussion of the impact of censorship is interesting. In any case, the humanities currently need all the help they can get.

We played around a bit with the Ngram Viewer. In one experiment, we plotted “silence” against “noise,” and found that “noise” overtook “silence” around 1961, even though 1961 is the year Wesleyan first published Silence, by John Cage. Cage would have enjoyed the Ngram Viewer. Our Ngram Viewer chart plotting “silence” and “noise” is shown below: