Spot is an interactive real-time Twitter visualization that uses a particle metaphor to represent tweets. The tweet particles are called spots and get organized in various configurations to illustrate information about the topic of interest.
Spot has an entry field at the lower-left corner where you can type any valid Twitter search query. The latest 200 tweets will be gathered and used for the visualization. Note that Twitter search results only go back about a week so a search for a rare topic may only return a few. When you enter a query the URL is changed so you can easily bookmark it or send it to someone. The query brainpicker gives you a display something like this:
At the top left, next to the logo, are five icons to access the different views. The first is called Group mode and is shown above. Basically, tweets that share a lot of the same words are grouped together inside larger circles. Tweets are often grouped because they are retweets of the same original content but this doesn't have to be the case. They may be tweets from different people that don't even know each other but happen to be discussing the same thing. The intent is to show quickly the most popular things people are saying about a particular topic. Tweets that are more unique are placed in the phyllotaxy spiral to the right.
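The grouping idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the Jaccard word-overlap measure, the 0.5 threshold, and the names word_set, jaccard, and group_tweets are hypothetical and are not taken from Spot's actual code.

```python
import re

def word_set(text):
    """Lower-case a tweet and return its set of words."""
    return set(re.findall(r"[a-z0-9@#']+", text.lower()))

def jaccard(a, b):
    """Fraction of words two sets share (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def group_tweets(tweets, threshold=0.5):
    """Greedily put each tweet in the first group whose seed tweet is similar enough."""
    groups = []  # each group is a list of (text, words); the first entry is the seed
    for text in tweets:
        words = word_set(text)
        for group in groups:
            if jaccard(words, group[0][1]) >= threshold:
                group.append((text, words))
                break
        else:
            # No existing group was close enough, so this tweet starts its own.
            groups.append([(text, words)])
    return [[text for text, _ in group] for group in groups]
```

Groups that end up with a single tweet would correspond to the more unique tweets placed in the spiral, while larger groups would be drawn inside the larger circles.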
All the tweet spots show an image of the sender and at any time can be clicked on to see the tweet details. Clicking on the text of an open tweet will show the original in another browser window. Click on the background or an open tweet spot to close it or you can directly click on another spot.
Here is a complete list of the views and what they show:
The Word View, again for the query brainpicker:
The string 'brainpicker' matches the wonderful Twitter account by Maria Popova and the results shown above are mainly retweets of or discussions about the tweets she has sent. You can also do a search for @brainpicker including the @ sign to see the latest tweets sent from that account. This uses the standard Twitter API to get the data and so can go back farther in time. The Word View for this query clearly shows the Brainpicker focus on books, reading, writing, art, and maps.
You can also retrieve the latest tweets from a Twitter list. Here is an example for a list I created by analyzing who was on various lists created about data visualization. In the search field enter @Top100in/datavis and you should get something like this for the User View:
I was inspired to create this when playing with the wonderful Twitter visualization called Revisit by Moritz Stefaner. Another influence was the Stamen work on Digg swarm which is no longer active but there is a video. My academic background in physics makes it natural for me to think in terms of interacting particles.
This application was created with the wonderful Processing.js which is the javascript-based extension of the Processing tool I have used in the past. Thanks to Ben Fry, Casey Reas, John Resig, David Humphrey and the other people in the Centre for Development of Open Technology at Seneca College. Thanks also to Jim Bumgardner for the excellent tutorial on phyllotaxy spirals and to The Noun Project for four of the icons. Thanks also of course to Twitter and all the people who fill it with great content!
Performance is pretty good with the Chrome browser, and decent in Firefox and Safari. It will not work in Internet Explorer (except perhaps the new IE 9). It seems to work reasonably well on the newer iPads although the search field is currently broken in that environment. The application will go out and get new tweets periodically. For popular queries the analysis and display of those tweets will often cause noticeable lag.
Here is a Multiscale Mosaic of Obama created from hundreds of pictures taken during his time in office.
The Van Gogh Portrait Mosaics were fun but I wanted to try an example that uses photographs as opposed to paintings. I settled on a portrait of Obama because of the widespread availability of photographs of him that are free of copyright restrictions. The subimages for this design are taken from the White House's Flickr photostream and seem to have been primarily taken by Pete Souza. I downloaded the 1000 most 'interesting' photos from the stream and used those as input to my process. I also manually selected and hand-centered about 10 interesting regions from these images to augment the set.
Here is a close-up showing the detail near the eye and nose.
Here are four mosaic portraits of Vincent Van Gogh. The primary images and all the various component tiles are regions of paintings by Van Gogh.



A few more details on the multiscale mosaic process can be found in the post Multiscale Mosaics. The portrait images are all from WikiMedia Commons. The other Van Gogh paintings came from here. I created these by writing custom code in Processing.
I have been further refining my multiscale mosaic technique in search of the overriding goal of reconstructing an image from sub-images in such a way that balances the clarity of the large target image and the sub-images. I have tried out lots of ideas and the ones that seem to have the most potential for creating interesting multiscale mosaics are:
I have used a cropped region of Vincent Van Gogh's painting Self-Portrait With Grey Felt Hat as my target image while developing these ideas. The sub-images are sections of Van Gogh paintings. Most are the central squares of the paintings, and a few are manually selected square regions that focus on some interesting detail.
These techniques do seem capable of producing interesting mosaic images that can carry meaning at multiple visual scales.
The post Mona Mosaics showed a number of ways to segment a flat surface and build mosaics by filling regions with the average colour for that region in some underlying image. Here is another example of the same technique but this time using a Phyllotaxy spiral, sometimes called a Fibonacci spiral. It's an arrangement commonly found in plant growth - for example in the Sunflower.
Jim Bumgardner has an excellent tutorial where he develops the idea and gives code for producing the pattern and several variations. I'm using something based on his Example 10 code to produce the mosaic below from a simple radial gradient. I love the swirling spirals in opposite directions found in the pattern.
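For readers who want to try the pattern, here is a minimal Python sketch of the standard phyllotaxy construction: point n sits at an angle of n times the golden angle and at a radius proportional to the square root of n. The function name and the spacing parameter are my own assumptions, and this is not a copy of the tutorial's Example 10 code.

```python
import math

# The golden angle, about 137.5 degrees, expressed in radians.
GOLDEN_ANGLE = math.pi * (3.0 - math.sqrt(5.0))

def phyllotaxy_points(count, spacing=1.0):
    """Return (x, y) positions for `count` points on a phyllotaxy spiral."""
    points = []
    for n in range(count):
        radius = spacing * math.sqrt(n)   # square root keeps the point density even
        angle = n * GOLDEN_ANGLE          # each point turns by the golden angle
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points
```

Each returned position could then be drawn as a circle filled with the average colour of the underlying image at that location.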
And of course we must apply it to the Mona Lisa image as well.
In the previous posts Mona Mosaics, Recursive Mona, and Blended Mona I played around with some ideas for reconstructing the famous Mona Lisa image in different ways. One of the things I did was to build up the image from smaller versions of itself. I was using simple image tinting and blending to get reasonable results.
This time I'm going to select sub-images from a set of pictures and use those to build the large image. This has been done for many years now and there are various tools to support it but I thought it would be interesting to try it myself. For this test rendering I'm using a small set of 23 images related to pizza. For simplicity they are all square images so they map well to the square regions determined by my algorithm. The algorithm selects the best-matching sub-image for each region and if the match isn't very good then it sub-divides the region and tries again at a smaller scale. This version uses blending to try and balance clarity of both the sub-images and the global picture.
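The match-then-subdivide step can be sketched roughly as follows in Python. To keep it short this makes several simplifying assumptions that are not from my actual process: the image is a greyscale grid of numbers, each sub-image is reduced to a single brightness value, the region size is a power of two, and the names match_error, build_mosaic, max_error, and min_size are hypothetical.

```python
def match_error(image, x, y, size, tile_value):
    """Mean absolute difference between a square region and a flat tile brightness."""
    total = 0
    for row in image[y:y + size]:
        for pixel in row[x:x + size]:
            total += abs(pixel - tile_value)
    return total / (size * size)

def build_mosaic(image, tiles, x, y, size, max_error=20, min_size=1):
    """Return a list of (x, y, size, tile_name) placements covering the region."""
    # Pick the tile that matches this region best.
    best = min(tiles, key=lambda name: match_error(image, x, y, size, tiles[name]))
    error = match_error(image, x, y, size, tiles[best])
    if error <= max_error or size <= min_size:
        return [(x, y, size, best)]
    # The match is poor, so split into four quadrants and try again at a smaller scale.
    half = size // 2
    placements = []
    for dy in (0, half):
        for dx in (0, half):
            placements += build_mosaic(image, tiles, x + dx, y + dy, half,
                                       max_error, min_size)
    return placements
```

A real version would compare the region against the pixels of each scaled sub-image rather than one brightness value, but the recursion has the same shape.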
For purposes of comparison here is the same image with no blending applied. You can see the sub-images more clearly but the overall image is only vaguely defined. This could be improved by using smaller sub-image pixels or a larger collection of sub-images to choose from.
The previous post, Recursive Mona, showed an image of the Mona Lisa constructed from smaller versions of itself. One of the things I don't like about that image, and most other 'photographic mosaic' type images, is that the grid structure controlling the sub-images is so visually prominent. Using multiple scales as I did helps to some degree but the regularity detracts from the overall image.
I've tried to improve this by breaking down the squares that require a more detailed rendering into subsquares in a more varied fashion. There are now 5 or 6 different splitting algorithms used to get the sub-components. This reduces the number of places where you see large numbers of consecutive tiles with the same geometry.
Another technique I've tried out is to blend the sub-images into the overall image at their edges. This tends to smooth out the edges between adjacent sub-images so it looks more natural and also has the effect of strengthening the overall global image. Here is Mona again with both of these techniques applied.
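One possible way to express edge blending is a weight that is zero in the interior of a sub-image and rises toward its border, used to mix the sub-image pixel with the target image pixel. The Python below is a sketch under that assumption only; the linear falloff, the border width, and the function names are hypothetical and may differ from what produced the image.

```python
def edge_weight(px, py, size, border):
    """Blend weight for a pixel in a square tile: 0.0 inside, rising to 1.0 at the edge."""
    edge_distance = min(px, py, size - 1 - px, size - 1 - py)
    if edge_distance >= border:
        return 0.0
    return 1.0 - edge_distance / border

def blend(tile_value, target_value, weight):
    """Mix a sub-image pixel with the target image pixel; weight 1.0 means all target."""
    return tile_value * (1.0 - weight) + target_value * weight
```

With a 10 pixel tile and a 2 pixel border, the outermost ring of pixels would show only the target image, the next ring an even mix, and everything further in the untouched sub-image.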
One of the ideas presented in Mona Mosaics was to break down an image into square areas at different scales where the colour doesn't vary much. A natural extension of this is to redraw a tinted version of the original image inside each square. Repeat a few times and you get a version of the starting image built recursively from smaller and smaller versions of itself. Here is an example of the concept applied again to the Mona Lisa.
Here are a few iconic faces that I have reconstructed with triangles. Source images came from 100+ Portraits of Iconic People of All Time. The faces are Che Guevara, Salvador Dali, and Audrey Hepburn.



A couple of years ago I explored reconstructing images based on Delaunay triangulation and Voronoi decomposition. Inspired by the work of Jonathan Puckey and Andy Gilmore, I've revisited the idea of rebuilding images using some geometry-based simplification.
The source image for all these examples is the Mona Lisa. The first rendering is a simple square grid where the colour of each square is the average colour in that region of the underlying image. By using a smaller grid size one can obviously get more detail than is shown here.
The image beside it is much more interesting. I start by looking at large square regions to see how much the colour varies. If it is fairly consistent then that implies there is less detail in that region and I can draw it as a simple large square. If the colour variation is higher than some threshold I look at the smaller subsquares and repeat the process recursively until some lower size is reached. This gives us a version of the image that has smaller more detailed squares where the image varies a lot and larger blocks of colour elsewhere.
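The recursive process just described can be sketched in Python as below. It is a simplified illustration rather than the code behind the image: it works on a greyscale grid of numbers, measures variation as the difference between the brightest and darkest pixel, and assumes the region size is a power of two. The names decompose, threshold, and min_size are my own for this example.

```python
def region_pixels(image, x, y, size):
    """Collect the pixel values inside a square region."""
    return [p for row in image[y:y + size] for p in row[x:x + size]]

def decompose(image, x, y, size, threshold=30, min_size=1):
    """Return (x, y, size, average) squares, splitting where the region varies a lot."""
    pixels = region_pixels(image, x, y, size)
    spread = max(pixels) - min(pixels)
    if spread <= threshold or size <= min_size:
        # Consistent enough (or too small to split): draw one flat square.
        return [(x, y, size, sum(pixels) / len(pixels))]
    half = size // 2
    squares = []
    for dy in (0, half):
        for dx in (0, half):
            squares += decompose(image, x + dx, y + dy, half, threshold, min_size)
    return squares
```

Lowering the threshold produces more small squares; raising min_size stops the recursion earlier and keeps the blocks chunkier.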

Images 3 and 4 are similar but use triangular regions rather than squares. Another wrinkle which I added to the recursive process is to define a location on the base image that shows the 'center of attention'. I then vary the colour consistency threshold based on distance from that point. This allows for manually defining, to a limited degree, where the regenerated image will be more detailed. For these examples I used a point in the middle of the Mona Lisa's face.
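A distance-dependent threshold of this kind could look something like the following Python sketch. The linear form and the base and gain values are assumptions chosen for illustration, not the settings used for these images.

```python
import math

def local_threshold(cx, cy, focus, base=10.0, gain=0.5):
    """Colour-variation threshold that grows with distance from the focus point.

    (cx, cy) is the centre of the region being tested and `focus` is the
    (x, y) 'center of attention'. A low threshold near the focus forces more
    subdivision there, so that part of the image gets more detail.
    """
    distance = math.hypot(cx - focus[0], cy - focus[1])
    return base + gain * distance
```

The recursive splitting test would then compare each region's colour variation against local_threshold for that region instead of a single global value.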

The next 2 versions use circular regions which don't fill all the space so a background colour shows through.

These 2 fill the background of each circle with the average colour of that region and this gives a much more pleasing result.

This last image uses a recursive triangle decomposition as well but the sub-triangles are defined in a more varied fashion.
Edward Tufte defines Sparklines as intense, simple, word-sized graphics, that should also be high-resolution. They are a very useful technique, especially when combined with the idea of small multiples.
I generated the example below based on the results of the 2011 Major League Soccer regular season. In this case, a whisker-style sparkline was generated for each team to show the complete Win-Loss-Tie sequence for the season. A small upward blue bar shows a win, a grey bar in the middle a tie, and a downward red bar is, of course, a loss.
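As a rough text-only approximation of the whisker encoding, the Python sketch below turns a result sequence into three rows: marks on the top row for wins, on the middle row for ties, and on the bottom row for losses. The 'W', 'T', 'L' letter codes and the function name are assumptions made for this example, and the real graphic uses coloured bars rather than characters.

```python
def whisker_rows(results):
    """Render a W/T/L sequence as three text rows: wins on top, ties in the middle, losses below."""
    top = "".join("|" if r == "W" else " " for r in results)
    middle = "".join("-" if r == "T" else " " for r in results)
    bottom = "".join("|" if r == "L" else " " for r in results)
    return [top, middle, bottom]

# Example with a made-up five-game sequence.
for row in whisker_rows("WWTLW"):
    print(row)
```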
The graphic succinctly illustrates how each team did over the season. A few interesting tidbits:
Here are a couple of portraits done with a simple radial scan technique. Arc segments are drawn that are coloured by sampling an image source.
I created some print graphics for Live Magazine back in February. I enjoyed the project a great deal and would be very happy to tackle more print projects. Send me an email at web1@neoformix.com if you are interested.
The graphic shows a streamgraph illustrating the top selling automobiles in the UK from 1973 until 2010. The various series were sorted to group the same brands together as much as possible and to add the newer brands to the outside of the graph.
I used custom code created with Processing to create vector output in PDF format and then fine-tuned the graphics with Adobe Illustrator.
I made a couple of minor changes to the Neoformix.com website. The first was that I removed the Google ads. They made virtually no money and cluttered up the display unnecessarily. The second change was that I added a 'Tweet' button at the bottom of every article page to make it easier to share my content on Twitter.
I've created two new Word Portraits titled War and Peace. Both the template images of Hitler and Gandhi are from the wonderful Wikimedia Commons.
I experimented a bit with adding a more 3D impression to the image by using a tool to bring forward the brighter parts of the image. This was done more for the Gandhi image since the highlighted parts of Hitler didn't correspond very well to depth. The tool I used was DeepImage by Daniel Hawkes.
It has been very gratifying to see the interest in my recently launched Tweet Topic Explorer. In the week since it was made available there have been posts about it on Infosthetics, FlowingData, Cool Infographics, and many other places. It has also had over 1,200 tweets sent about it. Thank you everyone for trying it out and telling your friends!
Much of the initial attention came from people in Europe looking at non-English accounts. The tool was enhanced a few days after launch to ignore stop words in German, Italian, Spanish, French, and Dutch. It's not a perfect implementation and of course misses many common languages but it does make the tool more useful for many more people.
Another request for improvement that I was able to deliver was the capacity to analyze the tweets from Twitter Lists. You can now enter a list name in the field to see a Word Cluster Diagram for the latest tweets from the people on the list. The volume of tweets on a list is usually pretty high so the last 800 tweets (which is how many are used by the tool) will not go very far back in time. When using the Tweet Topic Explorer with a list the tweets on the right are enhanced to include the account and icon for the author of each tweet.
Here is the result for the Twitter List @Top100In/DataVis:
And here are a few others without the tweet list shown. @mashable/marketing:
And @Scobleizer/iphone-and-ipad:
One problem I face on a daily basis is to decide for a given Twitter account whether I want to follow it or not. I consider many factors when making the decision such as language of their tweets, frequency, whether they interact on twitter with other people I admire, or if I have some personal or geographic connection with them. But the most critical factor for me is whether they tweet about things that match my interests. Sometimes you can get a hint about this by looking at their short one line twitter bio but the best way is usually to scan their latest tweets.
I have created a new tool to help see which topics a person tweets about most often. It also shows the other Twitter users that are mentioned most frequently in their tweets. I call it the Tweet Topic Explorer. I'm using the recently described Word Cluster Diagrams to show the most frequently used words in their tweets and how they are grouped together. This example below is for my own account, @JeffClark, and shows one word cluster containing twitter, data, visualization, list, venn, and streamgraph. Another group has word, cloud, shaped, post, etc. It's a bit hard to see in this small image but there is a cluster about Toronto, where I live, and mentions of run, marathon, and soccer. Also, there are bubbles for some of the people on Twitter I mention the most often: @flowingdata, @eagereyes, @blprnt, @moritz_stefaner, @dougpete.
For all these images below you can click on them to go to a live version of the tool.
Here is another example showing the full tool. This one is for one of my favourite accounts to follow, @brainpicker, by Maria Popova. In this case the word 'book' has been highlighted with a click and the list to the right shows the tweets that contain the word. The words in the tweet list are coloured if they appear in the word cluster diagram. Clicking a different word bubble will select that word instead. You can click on any twitter @ID in the tweet list to load the data for that account. The tool is currently configured to load the last 800 tweets. For my account this goes back a couple of years in time but for more prolific tweeters it may only span a few weeks. The entry field at the lower left lets you explore the tweets for any twitter user.
Here are a few more examples of the word cluster diagrams generated from some twitter accounts. @acarvin is doing an extraordinary job of covering the events in the Middle East.
A few years back I introduced the idea of Clustered Word Clouds which use word size to indicate frequency but also use positioning and word colour to group words together that were highly correlated in the text. It works reasonably well I think. See the example below:
I've come up with a new variation on this idea that tries to improve a couple of things. In many word clouds, including those generated by Wordle and my clustered clouds, the font size of each word is proportional to the word frequency. This has the effect that words with many letters (for example 'indisposed') cover a much greater area than a word with fewer letters (say 'ill') if they have the same word count. Some word clouds are constructed so that the area of the word is proportional to the word count rather than font height. This often has the opposite effect of unnaturally emphasizing words with fewer letters. My new design uses solid circles of colour whose area is proportional to the count. I think they may do a slightly better job of giving the proper visual emphasis to the words.
By using larger blocks of colour I think it's also easier to visually distinguish the groups in a clustered cloud. I'm calling this new variation a 'Word Cluster Diagram'. The one below is for the same text as the older style above but the clustering algorithm and stop word list are a bit different so they aren't directly comparable. I think it has some promise although it's not as space efficient as using the words on their own.
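Making the area of a circle proportional to the count means the radius has to grow with the square root of the count, which the short Python sketch below illustrates. The function name and the area_per_count scale factor are assumptions for the example.

```python
import math

def bubble_radius(count, area_per_count=100.0):
    """Radius of a circle whose area is count * area_per_count."""
    return math.sqrt(count * area_per_count / math.pi)

# A word used four times gets four times the area, so only twice the radius.
ratio = bubble_radius(4) / bubble_radius(1)
```

Scaling the radius directly by the count would instead exaggerate frequent words, since the area would then grow with the square of the count.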
Five years ago today, I published my first entry on Neoformix.com. I wasn't really sure if anyone would pay attention. You have, and for that I thank you all. Thanks especially to everyone who has written about my work or passed it along to your friends.
Except for the first few months, virtually all the images, interactive applications, and analysis presented on this blog were created using code I wrote with Processing. Thanks very much to Casey Reas, Ben Fry, and the community around that wonderful tool. Thanks to all the amazing researchers, coders, artists, and designers that have most directly influenced my work, especially: Ben Shneiderman, Martin Wattenberg, Fernanda Viégas, Ben Fry, Casey Reas, Chris Harrison, Nathan Yau, Lee Byron, Moritz Stefaner, Jonathan Feinberg, Gui Borchet, Jer Thorp, Robert Kosara, Andrew Vande Moere, Manuel Lima, Frederik Vanhoutte, Mario Klingemann, Robert Hodgin, and Tom Carden.
I've selected images from a few representative posts from the past five years. Click on the image to visit the respective post. Thanks again everyone and I'm looking forward to what the next five years will bring!










I have been collecting tweets containing the words 'love' and 'hate' for a couple of years now and decided to analyze them to see what could be discovered. It was a fun project that I finished just in time for Valentine's Day. I hope you love it!
For the data I chose to use every tenth tweet containing the word 'love' and every tenth tweet containing the word 'hate' from all of 2010. This yielded 658,391 love tweets and 503,489 hate tweets. Incidentally, this means there were roughly 6.5 million tweets last year containing 'love' and about 5 million containing 'hate'.
The first set of diagrams in the graphic show the love/hate ratio for various sets of related words. Basically, I counted the number of times a word appeared together with 'love' and together with 'hate'. A simple percentage of 'love' associations out of the total gives a basic measure of sentiment - let's call it the Love Quotient ;) A value near 100% means the word is used almost exclusively with 'love' and never with 'hate' and the graph will show hearts all the way to the right side. Each full heart represents 5% over the 50% neutral point so, for example, 'amazon' has six and a bit hearts showing so its Love Quotient is about 82%.
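The arithmetic behind the Love Quotient and the heart count can be written out as a small Python sketch. The function names are made up for this example, and the counts of 82 and 18 are illustrative numbers chosen to reproduce the 82% figure, not the real data for any word.

```python
def love_quotient(love_count, hate_count):
    """Percentage of a word's associations that are with 'love'."""
    return 100.0 * love_count / (love_count + hate_count)

def hearts(quotient):
    """Number of hearts: one full heart for every 5% above the 50% neutral point."""
    return (quotient - 50.0) / 5.0

# Illustrative counts only: 82 'love' associations and 18 'hate' associations.
quotient = love_quotient(82, 18)   # 82.0
heart_count = hearts(quotient)     # about 6.4, i.e. six and a bit hearts
```

A quotient below 50% gives a negative value here; how such words are drawn is not covered by this sketch.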
Using simple word association is a pretty crude measure of sentiment. It obviously would be fooled by a sarcastic tweet like: Ugg - liver and onions again. Don't you just love the food in the cafeteria? Even so, by looking at large quantities of data it seems to give reasonable results in many cases. The data definitely settles the age-old question: pie > cake!
The diagram with all the photos is actually a Treemap. Surprisingly, this is the first treemap to appear on Neoformix since my second post back in April of 2006 about The Map of the Market. This one shows the people who were mentioned most frequently with the word 'love'. It's dominated by celebrities, mostly singers who appeal to young teenagers.
The StreamGraph shows how the word 'love' was used together with various sports over the course of 2010. The term 'football' combines references to both American football and international football (soccer). You can see the peak in June for the World Cup and peaks for both hockey and skating during the Winter Olympics in February.
Text analysis and creation of the various graphics was done with custom code created in Processing. The Treemap diagram used the Treemap library created by Benjamin B. Bederson and Martin Wattenberg. Thanks!