Half a Trillion Words 

Google Ngram chart

It's been called "Angry Birds for Nerds," garnering over a million hits on the day of its release. If you haven't played with Google's latest data-visualization tool, the "Ngram Viewer," I heartily recommend you set aside five minutes of computer time. Or five hours. Or five days. Warning: it's as addictive as catapulting birds. (books.google.com/ngrams/)

Over the past six years, Google has digitized over 15 million books, word for word. A subset of these -- 5.2 million books going back to 1500, comprising 500 billion words -- forms the Ngram Viewer's dataset. Whatever complicated magic went into creating the program, the user experience is simplicity itself. In a nutshell, the viewer shows the frequency with which words or short phrases occur in those 5.2 million books, year by year. The output is a simple graph where the x-axis shows the years, defaulting to the period 1800 to 2000. The y-axis shows what percentage of all the words (or phrases of the same length) printed in a given year your search term accounts for, by language, defaulting to English.
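For the curious, here is a minimal sketch of the kind of arithmetic behind such a graph. It is not Google's code: the function name yearly_frequency and the three-line toy corpus are made up for illustration, and it assumes frequency means a word's share of all the words printed in a given year.

```python
from collections import Counter


def yearly_frequency(corpus, term):
    """Return {year: percentage of that year's words that equal `term`}.

    `corpus` is an iterable of (year, text) pairs. Both the name and the
    input shape are hypothetical, chosen only for this illustration.
    """
    totals = Counter()  # total words seen per year
    hits = Counter()    # occurrences of `term` per year
    target = term.lower()
    for year, text in corpus:
        words = text.lower().split()
        totals[year] += len(words)
        hits[year] += words.count(target)
    # Skip years with no words to avoid dividing by zero.
    return {
        year: 100.0 * hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year]
    }


# A made-up toy corpus standing in for millions of scanned books.
toy_corpus = [
    (1900, "men and women and men"),
    (1900, "the men sailed"),
    (1950, "women and men voted"),
]

for word in ("men", "women"):
    print(word, yearly_frequency(toy_corpus, word))
```

Plot those percentages against the years and you have a crude version of the viewer's curves. The real thing also handles phrases of several words and smooths the lines, which this sketch leaves out.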

Sounds pretty basic, yes? But this simple idea, if the ambitions of its inventors are fulfilled, will lead to a new science they have dubbed culturomics, "what becomes possible when you apply very high-turbo analysis to questions in the humanities." Here are a few sample findings of the viewer:

Written references to celebrities faded twice as quickly in the mid-20th century as they did in the early 19th. Think 7 1/2 minutes of fame in the future.

Not just celebrities. Apparently, we forget the past in general sooner than we used to. For example, the viewer shows more references to the year 1900 in the decades following it than to 1950 in the decades following that year.

Censorship shows up loud and clear. Take the contrast between English and German books referencing Jewish artist Marc Chagall. After 1933 (when the Nazis came to power), references to Chagall in German books plummet to zero by 1940, while English books for that period follow a "normal" curve for artists.

Compare "women" and "men" to see (presumably) the effect of feminism. The former term barely rates prior to 1970, but surpasses the word "men" in 1986 -- only to be overtaken by "men" in 2001. Any theories?

Fifty years ago, British scientist-novelist C.P. Snow bemoaned the gap between the "two cultures" of the sciences and the humanities. Google's Ngram Viewer is a giant step toward rapprochement, a meta-tool that anyone can use to compare and analyze social trends, cultural tendencies, fashion crazes -- virtually anything that's been written about -- and then come up with their own theories about why the curves plot the way they do. Send me your discoveries, and I'll feature them in a future column.

Barry Evans (barryevans9@yahoo.com) needs to get a life after spending far too much time playing with this toy. His travel photos are on display at Ramone's, Old Town Eureka, through November.

 


© 2014 The North Coast Journal Weekly
