Problems With Topic Flowers

By: Jeff Clark    Date: Sat, 26 Aug 2006

The Topic Flower concept has been fun to play with for the last little while. The idea of automatically transforming words into a highly visual form that illustrates specific features of the text is intriguing to me. However, I'm not sure this particular visualization would be practical for most real-world problems.

Here are some of the problems I see with this form:

  1. Low information density - I'm showing the top three topics using colours and two other features of the text using petal shape and the amount of 'hair'. That's five attributes shown using quite a large image. There are lots of ways we could show the same amount of information that would be much more compact. It would be difficult to show more than 20 or so Topic Flowers in the same visual space on a computer display. This makes them impractical to use in many situations.
  2. Over-reliance on colour - Can you tell the difference between a small amount of orange or red on the edges of petals? What about people who are colour-blind?
  3. Some information-carrying features are too subtle - The number of little hairs on a Topic Flower is difficult or impossible to estimate in smaller images.
  4. Some strong visual features of a Topic Flower carry no information - The number of petals for each level is random, and this has a large visual impact on the flower. Two Topic Flowers could have quite different shapes even if the texts they were based on were almost identical. This random feature also interacts with the 'petal shape', which is supposed to carry useful information: rounded petals are supposed to mean something about the text, but how round the petals look also depends on a random number. The randomness muddies the interpretation.
  5. The visual attributes only show qualitative differences - I'm measuring specific values for certain features of the text but only showing which topics rank highest, not the actual measured values.
  6. The images take too long to generate.
  7. I'm sure there are other weaknesses that I've missed as well.

Many of these problems could be addressed with a better implementation, but I suspect some of them are fundamental.
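
As a rough illustration of the kind of fix that might help with problems 4 and 5, the petal count could be derived from a measured value instead of a random number. The sketch below is not the actual Topic Flower code. It assumes each topic has a score between 0 and 1, and the function names and the petal range of 4 to 12 are made up for the example.

```python
def petal_count(score, min_petals=4, max_petals=12):
    """Map a topic score in [0, 1] to a whole number of petals.

    The [0, 1] score range and the petal limits are assumptions
    made for this sketch, not values from the real implementation.
    """
    # Clamp so that out-of-range scores cannot produce absurd flowers.
    clamped = max(0.0, min(1.0, score))
    return min_petals + round(clamped * (max_petals - min_petals))


def flower_petals(topic_scores):
    """Return (topic, petal count) pairs for the three strongest topics."""
    top = sorted(topic_scores.items(), key=lambda kv: kv[1], reverse=True)[:3]
    return [(topic, petal_count(score)) for topic, score in top]


if __name__ == "__main__":
    # Hypothetical scores for one piece of text.
    scores = {"politics": 0.9, "sports": 0.1, "science": 0.5, "music": 0.3}
    print(flower_petals(scores))
```

With a mapping like this, two nearly identical texts would get nearly identical scores and therefore the same or very similar petal counts, and the number of petals would say something about the measured value rather than nothing at all.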


