Tom Sawyer Character StreamGraph

By: Jeff Clark    Date: Tue, 17 Jun 2008

The above image is a StreamGraph for the book The Adventures of Tom Sawyer, by Mark Twain. Click on it to see a larger version. It seems to do a pretty good job of communicating the ebb and flow of the various characters throughout the book. The Mississippi River figures prominently in the story, so a stream-like representation of the text seems appropriate.

I have adapted the StreamGraph code used to create the various Twitter Topic Streams so I can create StreamGraphs from arbitrary text documents. The document is split into 25 equal-sized segments, and words are counted within each segment. These segments are used in place of time along the horizontal axis of the StreamGraph. This document StreamGraph again focuses on capitalized words but ignores a few common ones like 'Mr' and 'Mrs'. I'm also using a longer format for the graph and showing two labels for each word series: one on the left half of the graph and one on the right. The difference in label size for the same word shows whether it was used more frequently in the first or second half of the document. In the 'Tom Sawyer' graphic above you can clearly see that both 'Ben' and 'Mary' are more prominent in the first half of the text but that 'Huck' is more common in the second half.
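
For readers who want to try the segmenting step themselves, here is a minimal Python sketch of the idea. It is not the code used to make the graphic above. The function names, the tokenizing regex, and the contents of the ignore list beyond 'Mr' and 'Mrs' are all assumptions made for illustration. It also counts any word that starts with a capital letter, so sentence-initial words like 'The' will show up unless they are added to the ignore list.

```python
import re
from collections import Counter

# Assumed ignore list; the post only names 'Mr' and 'Mrs' as examples.
IGNORED = {"Mr", "Mrs"}


def segment_counts(text, n_segments=25, ignored=IGNORED):
    """Split text into n_segments roughly equal runs of words and
    count the capitalized words inside each run."""
    words = re.findall(r"[A-Za-z]+", text)
    total = len(words)
    counts = []
    for i in range(n_segments):
        # Integer boundaries keep the segments as equal as possible.
        start = i * total // n_segments
        end = (i + 1) * total // n_segments
        segment = words[start:end]
        counts.append(
            Counter(w for w in segment if w[0].isupper() and w not in ignored)
        )
    return counts


def half_totals(counts, word):
    """Total occurrences of word in the first and second halves of the
    segment list, as a rough stand-in for the two label sizes."""
    mid = len(counts) // 2
    first = sum(c[word] for c in counts[:mid])
    second = sum(c[word] for c in counts[mid:])
    return first, second
```

Each Counter in the returned list corresponds to one position along the horizontal axis. Calling half_totals(counts, 'Huck') on the full text would give the two numbers behind the left and right label sizes, under the assumption that label size simply tracks the count in each half.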

 

