Back To The Classics

I got my hands on Edward Tufte’s The Visual Display of Quantitative Information, and I would like to take a moment to appreciate the physical copy of the book. I got a Kindle about a year and a half ago, and most of my reading over the last year has been e-books. It is nice to feel paper between your fingers again! By now, readers of this blog know that I am not a designer but rather a quantitative person; most of my coursework is in the sciences. Whenever I talk about my data visualization work and interests, Tufte’s name comes up time and time again. While there have been newer books on visualization, I wanted to make sure that I read Tufte’s seminal work.
Tufte makes a point that has been hammered into me by this stage of my visualization career: garbage in, garbage out. A graphic is meant to reveal something about the data. If you want the reader to find individual data points, then the display you should use is a table (the head of my lab calls these ‘display items’; we are only allowed a certain number of display items when we publish in a journal). But what if the comparison you are making does not make logical sense? Take the classic example of shark attacks being correlated with ice cream sales. Why is this interesting? To me, such a comparison is just noise, yet scientists do it ALL THE TIME. Maybe because what advances our careers is publishing that next big Nature, Cell, or Science paper, we draw absurd correlations and use them as permission to launch extensive investigations to establish causality.
Tufte shows the graphic below. I cannot think of a logical way to link solar radiation to stock prices. Maybe agricultural output that correlates with solar radiation leads to stimulation of economies in the Northern hemisphere? When the theory presented by a graphic is too convoluted for an apt reader to grasp, maybe the theory isn’t correct.
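The ice cream and shark attack pairing is usually explained by a shared driver: both go up in hot weather, when more people buy ice cream and more people swim. A minimal sketch in Python, using entirely made-up numbers (the temperature range, coefficients, and noise levels are hypothetical), shows how such a confounder produces a strong correlation between two quantities that never influence each other, and how that correlation mostly disappears once temperature is accounted for.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def residuals(ys, xs):
    """What is left of ys after an ordinary least-squares line fit on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def simulate(days=365, seed=0):
    """Hypothetical daily data: temperature drives both series independently."""
    rng = random.Random(seed)
    temps, ice_cream, sharks = [], [], []
    for _ in range(days):
        t = rng.uniform(5, 35)  # made-up daily temperature in deg C
        temps.append(t)
        # Each series depends only on temperature plus its own noise;
        # neither series appears in the formula for the other.
        ice_cream.append(50 + 10 * t + rng.gauss(0, 40))
        sharks.append(2 + 0.3 * t + rng.gauss(0, 1.5))
    return temps, ice_cream, sharks


if __name__ == "__main__":
    temps, ice, sharks = simulate()
    # Raw correlation looks impressive...
    print("ice cream vs sharks:", round(pearson(ice, sharks), 2))
    # ...but correlating the residuals (i.e. controlling for temperature)
    # leaves almost nothing.
    print(
        "after removing temperature:",
        round(pearson(residuals(ice, temps), residuals(sharks, temps)), 2),
    )
```

Correlating the regression residuals is one simple way to compute a partial correlation; it only removes a linear dependence on the one confounder you thought to measure, which is exactly why a graphic built on an unexplained correlation deserves skepticism.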
I wanted to highlight an interesting explanatory data visualization that Tufte calls a time-space graphic. The author is trying to show the life history of the Japanese beetle: the horizontal axis represents time and the vertical axis represents position relative to ground level. If I were communicating the same data, my instinct would be to use something like the chart below. However, my version is much more abstract than the graphic that was published.
Why was my inclination to represent the different stages as points? I think it is because when I look at a dataset, I am not actively thinking about where the points in my spreadsheet come from. They are just points to me. Herein lies the challenge of data visualization. There are infinitely many ways to visualize a given dataset. How do we keep our brains from steering us towards the same type of visualization even when it is not appropriate for the data?