News Spectrum

By: Jeff Clark    Date: Tue, 13 May 2008

Introducing News Spectrum! It is a visualization of the words used for two topics in the latest results from Google News. One topic is coloured blue, the other red, and the associated words are coloured and positioned based on how strongly they are associated with the two topics. Click on any word to see the related Google News results.

This is a generalization of my recent Obama McCain News Spectrum that allows you to enter your own terms of interest. Press the 'Enter' key to generate the spectrum after entering your words. The layout algorithm has also been improved to minimize the number of overlapping words. Give News Spectrum a try! As always, feedback is welcome.

Thanks to Google News for the data, Processing.org for the tools, and Chris Harrison for the inspiration behind the design.

News Spectrum (static image)

Obama McCain News Spectrum

By: Jeff Clark    Date: Mon, 12 May 2008

I was thinking about the Word Association Spectrums created by Chris Harrison and thought it might be interesting to create something similar using live data. I've come up with a little application that gets the latest Google News results for two terms of interest and generates a word spectrum based on the words found in the results. I removed stop words in order to highlight the words more likely to be of interest. It's an obvious drawback that there are often many overlapping words that are hard to decipher, but it's kind of fun to play with nevertheless. This initial version shows a news spectrum related to the terms 'Obama' and 'McCain'.
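To make the idea concrete, here is a minimal Python sketch of how words from two sets of news results could be placed along a spectrum. It is not the application's code: the stop-word list, the function names, and the simple count-ratio scoring are all assumptions chosen for illustration.

```python
from collections import Counter
import re

# Hypothetical, tiny stop-word list; a real list would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is", "that", "with"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def spectrum_positions(texts_a, texts_b, topic_a, topic_b):
    """Return {word: position}, where 0.0 means the word appears only in
    topic A's results and 1.0 means only in topic B's results."""
    # Skip stop words and the two topic terms themselves.
    skip = STOP_WORDS | {topic_a.lower(), topic_b.lower()}
    counts_a = Counter(w for t in texts_a for w in tokenize(t) if w not in skip)
    counts_b = Counter(w for t in texts_b for w in tokenize(t) if w not in skip)
    positions = {}
    for word in set(counts_a) | set(counts_b):
        a, b = counts_a[word], counts_b[word]
        positions[word] = b / (a + b)  # assumed scoring: share of uses under topic B
    return positions
```

The position could then drive both the horizontal placement and the blend between the two topic colours.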

Obama McCain News Spectrum (static image)


(More...)

Disease Gene Map

By: Jeff Clark    Date: Fri, 09 May 2008

The New York Times has published an interesting interactive diagram depicting the relationship between various diseases and the genes that are known to affect them. The large circle in the image below is zoomed in on one part of the diagram. [via FlowingData]

Word Association Spectrums

By: Jeff Clark    Date: Fri, 09 May 2008

Chris Harrison has a wonderful collection of visualizations, one of which I featured recently in More Color Name Graphics.

Chris recently posted a set of beautiful Word Association Spectrums based on an extremely large dataset from Google containing word bigram distributions. The example shown below is for the words 'war' and 'peace'. The horizontal position of each word indicates whether it more frequently follows 'war' or 'peace' in the analyzed text. So the word 'memorial' is positioned very close to the left (at the bottom) because the bigram 'war memorial' occurs much more often (normalized by overall counts) than 'peace memorial'. The vertical position is random.
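One plausible way to turn bigram counts into a horizontal position is sketched below in Python. The exact formula Chris used isn't given here, so the normalization (dividing each bigram count by the total count for its pole) and the function name are assumptions.

```python
def bigram_position(follow_a, follow_b, total_a, total_b):
    """Position in [0, 1]: 0.0 = follows only word A, 1.0 = follows only word B.

    follow_a / follow_b: how often the word follows A / B.
    total_a / total_b: overall bigram counts starting with A / B,
    used to normalize so that a more common pole does not dominate.
    """
    rate_a = follow_a / total_a
    rate_b = follow_b / total_b
    if rate_a + rate_b == 0:
        return 0.5  # the word follows neither pole; place it in the middle
    return rate_b / (rate_a + rate_b)
```

With made-up counts where 'memorial' follows 'war' nine times as often as it follows 'peace' (after normalizing), the word lands about a tenth of the way from the 'war' end.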

My own Document Contrast Diagrams also stretch out words along a horizontal axis based on the strength of association between two poles. My diagrams try to express a lot more information as well - probably too much. Chris's Word Association Spectrums carry less information. This simplicity allows for a much more elegant design. He has generated spectrums for other interesting word pairs like 'kids:adults', 'good:evil', and 'american:chinese'. I might like to see versions that don't show the common prepositions so that the nouns, verbs, and adjectives stand out more.

Word Association Spectrum for War and Peace (click to visit Chris Harrison's Post)

May 6th Speech Contrast Diagram

By: Jeff Clark    Date: Wed, 07 May 2008

I took the speeches delivered by both Obama and Clinton last night after the May 6th primary results and used them to build a Document Contrast Diagram. See the link for a description of how to interpret the diagram.

May 6th Primary Speech Contrast Diagram (click to see larger version)

May 6th Primary Speech Cloud Comparison

By: Jeff Clark    Date: Wed, 07 May 2008

I have taken the speeches delivered by both Obama and Clinton last night after the May 6th primary results and used them to build a Document Cloud Comparison. It shows which words were used together by each speaker using linked word clouds. A static image is shown below for references to the word 'change' to give you a flavour but the real fun comes with exploring the interactive application.

If you enter a blank focus string in the application it shows a standard word cloud and colors words that are unique to one speaker or the other. The top words used by Obama and not by Clinton include 'side', 'down', 'government', 'values', 'yes', 'lead', 'life', 'kind', 'trust', and 'united'. Those used by Clinton uniquely include 'keep', 'feel', 'journey', 'working', 'invisible', 'west', and 'story'.
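Finding the top words unique to one speaker can be sketched in a few lines of Python. This is only an illustration under simple assumptions (lower-cased alphabetic tokens, no stop-word removal); the function name is hypothetical and the application itself may work differently.

```python
from collections import Counter
import re


def unique_top_words(text_a, text_b, n=10):
    """Return the n most frequent words in text_a that never occur in text_b."""
    words_a = Counter(re.findall(r"[a-z]+", text_a.lower()))
    words_b = set(re.findall(r"[a-z]+", text_b.lower()))
    # Keep only the words speaker B never used.
    only_a = Counter({w: c for w, c in words_a.items() if w not in words_b})
    return [w for w, _ in only_a.most_common(n)]
```

Calling it twice with the arguments swapped gives the unique words for each speaker.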

'change' Associations and References (static image)

Give it a try yourself. The application is written in Java so you may have to wait a few seconds for it to start up.


(More...)

State of the Union Sentence Bars

By: Jeff Clark    Date: Thu, 01 May 2008

As I pointed out in my last post, Directed Sentence Drawings generated from a text make it extremely difficult to see in what order the various topics were discussed. A simple bar for each sentence, drawn in the order the sentences occur in the text and coloured by topic, would be much better in most respects. I've built a graphic to show what I mean. I have also added the most frequent topic words for each set of 10 consecutive sentences.
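The per-block word labels could be computed along these lines. This Python sketch assumes the topic vocabulary is available as a plain set of lower-case words; the function name and parameters are hypothetical.

```python
from collections import Counter
import re


def top_words_per_block(sentences, topic_words, block_size=10, n=3):
    """For each run of block_size consecutive sentences, return the n most
    frequent words that belong to topic_words (a set of topic vocabulary)."""
    summaries = []
    for start in range(0, len(sentences), block_size):
        block = sentences[start:start + block_size]
        counts = Counter(
            w
            for s in block
            for w in re.findall(r"[a-z]+", s.lower())
            if w in topic_words
        )
        summaries.append([w for w, _ in counts.most_common(n)])
    return summaries
```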

State of the Union - Sentence Bars with Topic Colours (click to see larger version)


Directed Sentence Drawings

By: Jeff Clark    Date: Thu, 01 May 2008

In my post earlier today about Sentence Drawings I mentioned that the overall shape of the graphic doesn't really express anything useful. I have come up with a variation on the idea that tries to address this.

In the sentence drawings produced by Stephanie Posavec or David Sparks each line segment is turned 90 degrees to the right relative to the previous one. This makes the overall shape highly sensitive to minor variations in the text which is why the overall shape doesn't carry much meaning - it's almost random.
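A small Python sketch shows why the shape is so sensitive. It computes the vertices of a right-turning sentence drawing from a list of sentence lengths; the starting direction and the function name are arbitrary choices for illustration.

```python
def sentence_drawing_points(sentence_lengths):
    """Return the polyline vertices for a sentence drawing: one segment per
    sentence, length proportional to word count, each turned 90 degrees to
    the right of the previous one."""
    x, y = 0, 0
    dx, dy = 1, 0  # start heading east (an arbitrary choice)
    points = [(x, y)]
    for length in sentence_lengths:
        x, y = x + dx * length, y + dy * length
        points.append((x, y))
        dx, dy = dy, -dx  # rotate the heading 90 degrees clockwise (y axis up)
    return points
```

Lengths of 3, 2, 3, 2 trace a closed rectangle, while changing the third length to 4 moves every vertex after it.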

I call my diagrams Directed Sentence Drawings because the direction of each line segment is a function of its topic. As before, each sentence is assigned a topic or remains neutral based on the vocabulary it contains. I place a neutral point in the middle of the diagram and four other topic points form a diamond shape around it (see below). For the State of the Union diagrams produced below I used the four topics Government, Domestic, Economy, and Security. The algorithm is as follows:

  1. start at the neutral point
  2. find the topic for the sentence and use it to set the color for the line
  3. draw the line from the current position towards the topic that it is about
  4. the length of the line is proportional to the length of the sentence
  5. if the line is continuing in the same direction as the last segment, draw a small circle at the starting point
  6. if the line is reversing direction, use a small arc to shift it over so it doesn't overlay the previous segment
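The steps above can be sketched in Python as follows. This computes segment geometry only, with no rendering. Several details are assumptions rather than anything stated in the post: which topic sits at which corner of the diamond, the `scale` factor, treating a neutral sentence as heading back towards the neutral point, and skipping a sentence when the current position already coincides with its target.

```python
import math

# Assumed layout: neutral point at the origin, four topics on a diamond.
# Which topic sits at which corner is a guess, not taken from the post.
TOPIC_POINTS = {
    "neutral": (0.0, 0.0),
    "Government": (0.0, 1.0),
    "Domestic": (1.0, 0.0),
    "Economy": (0.0, -1.0),
    "Security": (-1.0, 0.0),
}


def directed_segments(sentences, scale=0.01):
    """sentences: list of (topic, word_count) pairs in text order.
    Returns a list of dicts with start, end, topic and a marker:
    'circle' when the heading repeats, 'arc' when it reverses, else None."""
    x, y = TOPIC_POINTS["neutral"]                 # step 1: start at the neutral point
    prev_dir = None
    segments = []
    for topic, word_count in sentences:
        tx, ty = TOPIC_POINTS[topic]               # step 2: the topic also picks the colour
        dist = math.hypot(tx - x, ty - y)
        if dist == 0:
            continue                               # assumption: no direction, so skip
        ux, uy = (tx - x) / dist, (ty - y) / dist  # step 3: unit vector towards the topic
        length = scale * word_count                # step 4: length follows sentence length
        marker = None
        if prev_dir is not None:
            dot = ux * prev_dir[0] + uy * prev_dir[1]
            if dot > 0.999:
                marker = "circle"                  # step 5: same direction as last segment
            elif dot < -0.999:
                marker = "arc"                     # step 6: reversing direction
        end = (x + ux * length, y + uy * length)
        segments.append({"start": (x, y), "end": end, "topic": topic, "marker": marker})
        x, y = end
        prev_dir = (ux, uy)
    return segments
```

A drawing routine would then colour each segment by its topic, add a small circle at the start of 'circle' segments, and offset 'arc' segments slightly so they don't overlay the previous one.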

The diagram immediately below is constructed from the State of the Union Address for the year 2000. It shows there were many sentences about both Domestic and Economic issues, a fair number concerning Government and fewer about Security. The dominant colours give this away but also the overall shape makes it obvious. There is a greater density of lines near the Domestic and Economic topic nodes.

Directed Sentence Drawing for SOTU 2000

This next diagram is for the SOTU of 2001, the first delivered by George W. Bush. It's obvious that it was much shorter, had even less discussion of Security issues than Clinton's in 2000, and also not much sustained discussion about Domestic issues.

Directed Sentence Drawing for SOTU 2001

The SOTU for 2002 was delivered after 9/11 and clearly shows that Security has become the predominant concern.

Directed Sentence Drawing for SOTU 2002

This last diagram is for the SOTU of 2008 and shows that Security is still very topical but that Economic and Governmental issues are starting to recapture attention.


(More...)

More Sentence Drawings

By: Jeff Clark    Date: Thu, 01 May 2008

I posted a few weeks back on Stephanie Posavec's interesting graphics constructed from the text of Kerouac’s On the Road. One of her pieces featured Sentence Drawings that were generated using each sentence in sequence with line segments coloured to reflect the topic and sized based on the length of the sentence.

David Sparks has constructed a set of similar sentence drawings for the State of the Union addresses delivered by Bush over his 8 years in office.

David Sparks's Sentence Drawing for SOTU 2008 (click to see graphic with all 8 addresses delivered by Bush)

I find these interesting to look at. However, the dominant visual feature is the overall shape of the graphic and I don't think it really expresses anything useful.

More Color Name Graphics

By: Jeff Clark    Date: Fri, 25 Apr 2008

Dolores Labs has posted an update on how people have used their color name data in various ways. They linked to my own Color Names Explorer - thank you very much! Their post is called Color flowers, networks, photos, and even 3D and has several more interesting views of this data. The one that really caught my eye was by Chris Harrison, who created a flower-like image by rendering the names in their associated color and varying the position by hue along the radius. I don't think many of these images, including my own, are particularly useful, but they sure are interesting to look at!

Chris Harrison's Color Name Flower (click to see larger version in original article)

 

Color Name Flower Closeup

Portfolio

By: Jeff Clark    Date: Sat, 19 Apr 2008

There is a new Portfolio link available from all pages on my weblog. It links to a simple index of my most interesting or useful applications and gives a pretty good idea of the kinds of things I like to create.

I'm currently available for data analysis or visualization projects if anybody is interested in working together. I live near Toronto, Canada but I'm open to projects done remotely. I would be happy with creative projects that vary in size from a few days to a few months of work. Send me an email if you are interested.

Pennsylvanian Debate Word Cloud Comparison

By: Jeff Clark    Date: Thu, 17 Apr 2008

I have taken the words spoken by both Obama and Clinton during the Pennsylvanian Democratic debate held on April 16th, 2008 and constructed from them a Document Cloud Comparison. Basically, it lets you see which words were used together by each speaker using linked word clouds. A few static images are shown below to give you a flavour but the real fun comes with exploring the interactive application.

If you enter a blank focus string in the application it shows a standard word cloud and colors words that are unique to one speaker or the other. The top words used by Obama and not by Clinton include 'politics', 'decade', 'election', 'economic', 'somehow', 'generation', 'mission', 'forward', and 'problem'. Those used by Clinton uniquely include 'york', 'begin', 'world', 'best', 'support', 'administration', 'police', and 'hope'.

'Country' Associations and References (static image)

'jobs' Associations and References (static image)

Give it a try yourself. The application is written in Java so you may have to wait a few seconds for it to start up.


(More...)

Pennsylvanian Debate Comparison

By: Jeff Clark    Date: Thu, 17 Apr 2008

I have taken the words spoken by both Obama and Clinton during the Pennsylvanian Democratic debate held on April 16th, 2008 and constructed from them a Document Contrast Diagram. See the link for a description of how to interpret the diagram.

It shows that they spoke roughly the same number of words, with Obama speaking slightly more. Both were slightly positive in overall emotional tone, with some areas of negativity related to guns and security for Clinton and taxes for Obama. There was a great deal of overlap in the words used by the two speakers, with the words 'kind', 'Democrats', 'important', 'country', 'make', 'work', 'president', 'can', 'take', 'right', and 'guns' being frequently used by both. 'Know' was used a lot by both but more often by Clinton. They both spoke each other's names much more than their own, but Obama used Clinton's name more often than the reverse.

Key words used frequently and uniquely or much more often by Obama included 'true', 'statement', 'economic', 'issues', 'election', 'confident', 'George', 'American', 'policy', 'politics', 'income', 'change', 'General', 'ideas', 'Chicago', and 'individuals'. Words used frequently and uniquely or much more often by Clinton included 'decisions', 'stay', 'withdraw', 'Iran', 'failed', 'begin', 'world', 'military', 'best', 'York', 'administration', 'Philadelphia', 'impose', 'order', 'police', and 'oil'.

Pennsylvanian Debate Contrast Diagram (click to see larger version)

Pennsylvanian Democratic Debate

By: Jeff Clark    Date: Thu, 17 Apr 2008

I added the transcript for the Pennsylvanian Democratic debate held on April 16, 2008 to the interactive Transcript Analyzer. The image below is smaller (and blurrier) than the view in the application but gives a rough idea of what was discussed by which candidate and when. Here are the primary topics covered in order:

  1. mixed introductory comments (jobs + health + foreign + policy)
  2. guns + religion
  3. wright + remarks
  4. foreign + policy + iraq + iran
  5. tax + economy + jobs
  6. guns + ban
  7. mixed closing comments (jobs + health + foreign + policy)

Notable by their absence were the words 'immigration' and 'nafta'.

Democrat Debate - Apr 16th, 2008 (click for interactive application)

One small refinement was made to the application. The counts and bars for the various words will now also include simple plural variations. So references to 'jobs' will also include 'job', and references to 'gun' would also include 'guns'.
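A minimal Python sketch of this kind of plural folding is shown below. It handles only a trailing 's', and the function name is hypothetical; the Transcript Analyzer's actual rule may differ.

```python
from collections import Counter


def count_with_plural(word, counts):
    """Count of word plus its simple plural or singular variant (trailing 's')."""
    variant = word[:-1] if word.endswith("s") else word + "s"
    return counts.get(word, 0) + counts.get(variant, 0)


# Example with made-up words: 'job'/'jobs' and 'gun'/'guns' share their totals.
counts = Counter("jobs job guns gun gun tax".split())
jobs_total = count_with_plural("jobs", counts)
gun_total = count_with_plural("gun", counts)
```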

Give the Transcript Analyzer a try yourself and, as always, feedback is welcome!

Stephanie Posavec

By: Jeff Clark    Date: Sun, 06 Apr 2008

One of the areas I have been exploring here on Neoformix is the notion of constructing graphics in an algorithmic fashion from textual data. The site NOTCOT has just published an article on some interesting work by Stephanie Posavec that explores this same idea. She has constructed a number of different works based on the text of Kerouac’s On the Road. From NOTCOT's article:

The maps visually represent the rhythm and structure of Kerouac’s literary space, creating works that are not only gorgeous from the point of view of graphic design, but also exhibit scientific rigor and precision in their formulation: meticulous scouring the surface of the text, highlighting and noting sentence length, prosody and themes, Posavec’s approach to the text is not unlike that of a surveyor.

Here are a few images that will give you a taste and a rough idea of what they mean. Although definitely more on the artistic side of information visualization, I like these images and the ideas behind them a great deal.





Obama/Clinton Economic Speech Contrast Diagram

By: Jeff Clark    Date: Fri, 28 Mar 2008

Recently both Clinton and Obama delivered speeches related to the economy. Clinton's was more focussed specifically on the housing crisis. I took the text of Clinton's Halting the Housing Crisis and Obama's Renewing the American Economy and created a Document Contrast Diagram.

It clearly shows that they were about the same length, both slightly positive in overall emotional tone but Clinton's text varied more in tone. The large blue word circles for 'mortgage', 'housing', 'crisis', 'families', 'foreclosure' show the primary topic of interest for Clinton. Obama's mostly unique key terms were 'American', 'financial', 'risk', 'system', 'regulatory', and 'institutions'. The blue segments in the middle of Obama's speech show that he used words in that section more strongly associated with Clinton overall. This is where he discussed the housing crisis.

Obama/Clinton Economic Speech Contrast Diagram (click to see larger version)

Color Names Explorer

By: Jeff Clark    Date: Thu, 27 Mar 2008

Dolores Labs recently did an interesting experiment where they showed many people samples of colors and asked them what they should be called. They posted a graphic that showed the color names that people used for the various colors.

Dolores Labs' Color Name Cloud (click to see larger version in original article)

They also posted the raw data for other people to play with. Martin Wattenberg at IBM Research took the data and created a much more beautiful graphic. Nathan at FlowingData discusses the design differences in the post A Little Bit of Design Goes a Long Way With Infographics.

Wattenberg's Version of the Color Name Cloud (click to see larger version in original article)

I decided to try my hand at building a simple interactive 3D explorer for the data as well. I combined entries with the same name and found the average RGB values. The frequency count was used to highlight the more common names by scaling the size of the text in a manner likely similar to that used by Wattenberg. I then plotted the names in 3D using the red (x), green (y), and blue (z) components of the color value.
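The merging step can be sketched in Python like this. The input format (name plus an RGB tuple per response) and the case-insensitive name matching are assumptions, and the function name is hypothetical.

```python
from collections import defaultdict


def average_colors(entries):
    """entries: iterable of (name, (r, g, b)) pairs, one per survey response.
    Returns {name: (count, (avg_r, avg_g, avg_b))}."""
    sums = defaultdict(lambda: [0, 0, 0, 0])  # running r, g, b totals and a count
    for name, (r, g, b) in entries:
        totals = sums[name.strip().lower()]   # assumption: names match case-insensitively
        totals[0] += r
        totals[1] += g
        totals[2] += b
        totals[3] += 1
    return {
        name: (n, (r / n, g / n, b / n))
        for name, (r, g, b, n) in sums.items()
    }
```

The averaged (r, g, b) triple gives the 3D plotting position directly, and the count can be used to scale the text size.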

Color Name Cloud - initial view



Color Name Cloud - zoomed in view

The initial view is similar to Wattenberg's but not spaced out as nicely. My version also suffers from the fact that the size of the name depends on both frequency of use and how much blue the color happens to contain since the more blue a color has the closer it is drawn to the front of the display.

You can try out the color name explorer below. Can you find the shade somebody called 'baby poop'?


(More...)

Ontario Budget Speech 2007-2008 Contrast Diagram

By: Jeff Clark    Date: Wed, 26 Mar 2008

I'm a proud citizen of Canada and have decided to include a bit more analysis of Canadian-themed data and text in the future.

Yesterday the 2008 Ontario budget speech was delivered, outlining the government's priorities for the coming year. I have constructed a Document Contrast Diagram from the text of the 2007 Ontario Budget Speech and the 2008 Ontario Budget Speech.

Document Contrast Diagram for 2007/2008 Ontario budget Speeches (click to see larger version)

My first post on Document Contrast Diagrams will give some guidance on how to interpret the image. Here are a few things I noticed that are illustrated by the diagram. You may have to view the larger version to see some of these details.

  1. The 2007 speech was slightly longer.
  2. Overall, both speeches had a positive emotional tone.
  3. Some of the primary words common to both speeches were 'Ontario', 'years', 'Speaker', 'health', 'today', 'improve', 'municipalities', 'care'
  4. Common words used a bit more often in 2007 include 'Budget', 'economic', 'rates', 'province', 'support', 'provide'
  5. Common words used a bit more often in 2008 include 'business', 'tax', 'plan', 'help', 'communities', 'continue', 'public'
  6. Words used much more often in 2007 include 'child', 'children', 'families', 'means', 'reassessment', 'surpluses', 'reserve', 'greenbelt', 'clean', 'car'
  7. Words used much more often in 2008 include 'invest', 'jobs', 'government', 'students', 'school', 'grants', 'skills', 'training', 'build', 'create', 'infrastructure', 'partner', 'Toronto'
  8. The 2007 speech had no segments of strong negative emotional tone.
  9. The 2008 speech had a couple of segments of moderately negative tone - one associated with 'jobs' and the other with 'funding'.

Super Tuesday Contrast Diagrams

By: Jeff Clark    Date: Tue, 25 Mar 2008

The image below shows the Document Contrast Diagram from the remarks made by both Clinton and Obama after the Super Tuesday primaries on Feb 5th.

Document Contrast Diagram for Clinton/Obama Super Tuesday Remarks (click to see larger version)

My first post on Document Contrast Diagrams will give some guidance on how to interpret the image. Here are a few things I noticed that are illustrated by the diagram. You may have to view the larger version to see some of these details.

  1. The two segment columns show that Obama's speech was longer - it had roughly 40% more words.
  2. There was a pretty strong difference in the vocabulary used. There are lots of large word circles that are coloured strongly red or blue.
  3. There were many common words as well. Some of the most frequently used words that were used about the same number of times by both speakers are: 'Thank', 'mortgage', 'voted', 'states', 'year', 'President', 'war', 'deserve', 'health', 'across', 'challenges', and 'young'.
  4. Words used frequently and primarily or only by Clinton include: 'America', 'day', 'voice', 'opportunity', 'world', 'life', 'country', 'child', and 'nation'.
  5. Words used frequently and primarily or only by Obama include: 'Washington', 'time', 'different', 'can', 'change', 'cannot', 'Yes', and 'boys'.
  6. The emotional tone varied more in Obama's speech than in Clinton's.
  7. The segment with the most negative tone in Obama's speech occurred around the middle and was related to 'Bush'.
  8. The segment with the most negative tone in Clinton's speech occurred near the end and was related to 'war'.
  9. Overall, both speeches had a positive emotional tone.

Document Contrast Diagrams

By: Jeff Clark    Date: Thu, 20 Mar 2008

A Document Contrast Diagram is a visual summary of the content of two text documents that illustrates shared words, words that are unique to one document or the other, word frequency, relative size of the two documents, distribution of emotional tone within the documents, related words based on co-occurrence, and the most common word in each document segment. Have a look below at the Document Contrast Diagram for the 2007 and 2008 US State of the Union (SOTU) Addresses. If you wish, you can click on the image to see a larger version.

I'm hoping that much of the following is reasonably intuitive but here are a number of points regarding interpretation:


(More...)
