Clustered Word Clouds

By: Jeff Clark    Date: Fri, 10 Oct 2008

Tag clouds have become a commonplace method of illustrating the popular words used in a text and do a good job of communicating the gist of what the text is about. The tag cloud below was generated by the wonderful Wordle from the text of 'I Have a Dream' by Martin Luther King Jr. Many people familiar with the famous speech would likely recognize it from this cloud of words. Wordle provides lots of options to control colors, fonts, and the style of layout and produces an excellent result.

I Have a Dream - Martin Luther King Jr.

One critical drawback of tag clouds is that the words are scrambled (or sometimes positioned strictly by frequency), so one cannot tell from the cloud which words were actually used together in the original text. One powerful line from the speech is: "little black boys and black girls will be able to join hands with little white boys and white girls as sisters and brothers." The key words from this sentence appear in the tag cloud above, but they are so disconnected visually that the meaning is lost.

I think we can do better. I developed Word Association Clouds a few months back to allow the use of the familiar tag-cloud-style layout to navigate related words in a text. Why not use word 'relatedness' to control positioning in a tag cloud layout? My first attempt is below.

I partitioned the words into clusters based on how often they were used near each other in the text. I then positioned the words in the largest clusters near each other and used color to emphasize the structure. It's a bit tricky to position them with an algorithm so that the groups stay together and the overall layout stays compact, so there are a few more gaps than I'd like. Overall, though, I think it came out pretty well.
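For readers who want to experiment, here is a minimal sketch of the partitioning step only (not the layout). It is not the code used for the image above. It assumes that "near each other" means falling within a small sliding window of tokens, and it swaps in a very simple grouping rule: any two words that co-occur at least `min_count` times end up in the same cluster. The function names and the `window` and `min_count` parameters are hypothetical, chosen for illustration.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=5):
    """Count how often each unordered pair of distinct words appears
    within a span of `window` consecutive tokens (an assumed notion of
    'near each other')."""
    counts = Counter()
    for i, word in enumerate(tokens):
        # Look ahead at the next window-1 tokens.
        for other in tokens[i + 1 : i + window]:
            if other != word:
                counts[frozenset((word, other))] += 1
    return counts


def cluster_words(tokens, window=5, min_count=2):
    """Group words into clusters by linking every pair that co-occurs
    at least `min_count` times (connected components via union-find).
    Returns a list of sets, largest cluster first."""
    counts = cooccurrence_counts(tokens, window)
    parent = {w: w for w in set(tokens)}

    def find(w):
        # Follow parent links to the cluster root, compressing the path.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for pair, n in counts.items():
        if n >= min_count:
            a, b = tuple(pair)
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for w in parent:
        clusters.setdefault(find(w), set()).add(w)
    return sorted(clusters.values(), key=len, reverse=True)


# Toy input; a real run would tokenize the full text and drop stop words.
tokens = ("black boys black girls white boys white girls "
          "freedom ring freedom ring").split()
clusters = cluster_words(tokens, window=2, min_count=2)
```

Linking on a fixed threshold is crude: on a long text a few very common words can chain everything into one giant cluster, so a real implementation would probably filter stop words first or use a proper clustering method on the co-occurrence counts.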

