Enrico Bertini - Dataviz Beginner’s Toolkit #1: Books and Other Resources

Another valuable post I found via the blog fellinlovewithdata

One of the main goals of this blog, other than challenging the status quo with reflections at the intersection between academics and practitioners, is to help people become data visualization experts. It’s not rare for me to receive emails from people who are enthusiastic about visualization but have little guidance about how to become an expert.

I have posted a few articles in the past with this specific goal, but I realized that they are too scattered and not organized in a way that makes them an organic resource for readers.

For this reason, I decided to create a series specifically designed to help those of you guys who are excited about visualization but really don’t know how and where to start. The series is meant to be part of a permanent collection in FILWD and it’s my first serious attempt to respond to my own call to action: “When will we decide to provide lots of value?”.

Introducing the series

The Data Visualization Beginner’s Toolkit will function as an orientation guide for people who need guidance in finding the right resources to become data visualization experts. In the guide I will not be teaching visualization directly, nothing technical or theoretical about it (I have plans for this later), but I will show you the resources and one path.

Having such a guide is particularly important today because data visualization is really just like a jungle. There are plenty of opinions, blog posts, research papers, consumerist visualizations, books, etc., and it’s very hard to separate the wheat from the chaff.

When reading this series please keep in mind that this is my very personal view and, as such, is limited to my own experience. Also, whatever list I propose is certainly neither unique nor exhaustive. If you are looking for an exhaustive list of resources I highly recommend Andy Kirk’s collection of data visualization resources.

Here is a tentative list of topics I am planning to cover in the series (subject to changes):

Books and Other Resources
Programming Languages and Tools
Sources of Good Examples
Research Papers
University Courses

Please, if there is anything else you would like to be covered, let me know! Send me a message or add a comment below.

Books about Visualization

There is a reason why I start the series with a list of books: if you don’t know the basics of data visualization you will always be an amateur. And what’s worse, visualization experts will notice it and will not take your work seriously.

Also, orienting yourself in the mess we have right now might prove discouraging and prone to errors. If you type “data visualization” in Amazon the result is a disaster, believe me.

Finally, even if you end up picking up very good books, it is definitely possible they are not the right ones given the amount of knowledge and expertise you currently have. Here I suggest the following path (in order).

Show Me the Numbers: Designing Tables and Graphs to Enlighten (To acquire solid foundations). This book teaches the basics of visualization by using only tables and simple charts. You won’t find fancy and colorful visualizations, only scatter plots, bar charts and stuff like that. But that’s the way to go! If you understand the basics then it’s a lot easier to spot the limitations of basic graphs and go beyond them. Plus the book contains the best summary of visual perception applied to visualization I know. It really is a true gem. Don’t make the mistake of being attracted by fancy stuff and skipping the basics; start here and you will have very solid foundations.

Readings in Information Visualization: Using Vision to Think (Chapter 1 only) (To go beyond simple charts). Once you understand how charts work and you have learned the basics of visual perception, you are ready to explore fancier stuff. Yet you need some guidance on how to explore the huge data visualization space. The first chapter of this book is the best self-contained piece of work I know. It provides all that is needed to start thinking more creatively, but in a structured manner, about advanced visualizations. The book also has a strong emphasis on interaction, which is important. If you want to go beyond the first chapter, fine, but the book itself is a collection of papers and many of them are totally outdated. But wait a moment, this doesn’t mean you cannot find useful material there! In the collection you can find fundamental papers that are totally worth a read: the work of Jacques Bertin above all.

The Visual Display of Quantitative Information and the rest of Tufte’s books (To learn what “graphical excellence” is). People go crazy for Tufte’s books and I understand why: they are totally beautiful, the cover, the format, the colors, the content, everything. But regardless of their beauty, I have always thought it’s really hard to learn something out of them; they require you to think really deeply about what you see. Basically they are “just” a collection of images. The Visual Display of Quantitative Information is the first one and is the only one I truly recommend because it gives more guidance than the others. The others are wonderful but you will have to spend more time on them to translate their content into design practices.

Information Visualization: Perception for Design (To know what happens in our brain when we see a visualization). If you have read all the books cited above, congratulations! You have learned a lot. Now, information visualization is deeply rooted in visual perception and cognition. If you want to master the art of visualization, at some point you will have to know these basics, especially if you aspire to design innovative visualizations that fit people’s needs. This book starts from the very basics of human vision (e.g., how the eyes work) up to how we think with visualizations. It’s a tough read but it’s totally worth it. You will have to spend quite some time thinking about how these theories apply to your specific projects, but believe me, it’s a true investment. I experienced countless situations where a visualization design problem was deeply rooted in one of the issues discussed in this book. You will find yourself referring back to it all the time.

More Books about Visualization

Important: Are the books not mentioned above bad or not worth it? Absolutely not.

It is important to consider two factors: (1) there are several books I have never read or even skimmed through which might provide some additional value to you; (2) there are extremely valuable books I’ve not included just because they are either too advanced or don’t fit the progression of readings I am proposing here. Please keep in mind: I suggest you read these books in the order I gave above.

A few additional books that come to mind and deserve at least a short mention are:

Any other book written by Stephen Few.
Any other book written by Edward Tufte.
The statistics-flavored and super-classic Visualizing Data and The Elements of Graphing Data by William Cleveland.
The monumental Semiology of Graphics by Jacques Bertin, which I did not include because it is still hard to get, despite a new edition having come out, and because it’s a really hard read for non-experts.
The extremely beautiful and information rich Visual Language for Designers by Connie Malamed, which I did not include because I haven’t finished reading it yet.
The deep and dense How Maps Work by Alan MacEachren, which despite the title teaches visualization and makes you think deeply about it.
The little-known gem Designing Visual Interfaces by Kevin Mullet and Darrell Sano, which teaches aesthetics in a functional and systematic manner.

Books NOT about Visualization

It’s important to acknowledge that not all the knowledge a visualization expert needs comes from data visualization books. I have no intention to write another long list of related disciplines’ books, but it’s important for you to know that a good data visualization expert may have strong foundations in areas such as: statistics and data mining, data management and manipulation, human-computer interaction and cognitive science.

I don’t want to scare you: you can start doing visualization without these, but little by little you likely will find yourself digging more into these areas.

Also, let me stress the importance of human-computer interaction and related areas. While the rest is normally acquired, at least on a superficial level, by using the various technologies you encounter along the way, human-computer interaction has a less technical flavor and you might not learn anything about it unless you seek it out.

Knowing how people reason and interact with user interfaces is a crucial skill, the real differentiator, that you’d better acquire if you want to become a pro. I cannot stress this point enough. Visualization, like any other user interface, happens in people’s minds, not in the computer! And if you want to design great ones you’d better learn how people’s minds work.

There is only one book I feel like suggesting as a starting point: the brilliant, super-practical, and freely-available Task-Centered User Interface Design.

Other Learning Resources

Unfortunately, other than the books I mentioned above, there are not many other sources from which you can really learn something. But luckily there are a few notable exceptions! Tamara Munzner and Jeff Heer, top researchers in the field, share the material of their courses freely on the web and you should not miss them for any reason:

1) Tamara Munzner’s InfoVis Course Slides at University of British Columbia
2) Jeff Heer’s InfoVis Course Slides at Stanford University

These are university courses, with a specific target, but I cannot think of a more carefully and better organized set of information covering the whole theory and practice of information visualization. What is really unique in these courses and their material is the way this information is organized. Information visualization is still a young discipline and nobody really agrees yet on the content and order to use when teaching it. These two courses found in my opinion the perfect balance between coverage and organization.

Another great source for learning data visualization is Stephen Few’s articles and white papers, which teach a whole lot of fundamental data visualization skills in his usual concise and effective style.

Can I get all the knowledge I need with these books? No.

And there are two main reasons. First of all, one of the biggest and most surprising gaps I see in the current literature is a book that teaches systematically how to design a visualization from scratch. I am really surprised. Ben Fry in his Visualizing Data has a few elements of it, but since the book also essentially teaches how to use Processing, the whole thing is a bit too diluted. Apart from that, I am not aware of any book that fills this gap (please let me know in case you know one).

A second issue is that there is no book that can really teach you to be a great data visualization designer. The only way to become an expert is to actually design your own stuff and iterate over and over on it until you perfect your skills. Studying and reflecting is important, but doing is equally, if not more, important. The two things complement and enrich each other.

Conclusion

That’s all folks. I really hope this series will be useful to you. Let me know what you think and if it helps. Also, I’d really love it if you could enrich it with your suggestions. You can write comments below or send me a message on Twitter at @FILWD.

Please do not forget to share this with your friends or people who might benefit from it. Its main purpose is to let you guys become better data visualization experts. Help me to spread the word around.



Enrico Bertini - The Data Visualization Beginner’s Toolkit #2: Visualization Tools

This is a valuable post I found via the blog fellinlovewithdata.
It’s a guide to the tools most useful when starting out in dataviz and some context around them.

Staple Data Visualization Tools

Staple data visualization tools are tools with which you cannot go wrong. These are the tools I feel confident to suggest, especially if you are starting out. Of course, this list is very personal and you might find other tools you like. As I said above, if you are in love with a tool go with it. But if you don’t know where to start this list is a very safe bet.

Processing

Processing is the mother of all data visualization environments. Ben Fry and Casey Reas created it in 2001, out of their work at MIT, to help data designers create visualization sketches. Today it is one of the most established tools I can think of, maybe the most established. It has a huge user base and it has been used for every conceivable data visualization project (a lot for artistic purposes but for “serious” stuff too). The library is based on Java and this means that in order to use it you would need to learn at least bits of it. But given the handy functions Processing provides, this could also be considered a gentle introduction to the language itself.

If you are willing to write code, and you want total freedom in terms of design and a solid platform, I cannot think of anything better than Processing. You just need to download the software (it is totally free), take a look at the amazing learning material, and start writing code.

Processing does not have a rich set of user interface widgets, but frankly I don’t think this is too limiting a factor. Interaction can be very smooth, and if you need high performance you can always use OpenGL, which is nicely integrated into the library. If you want to generate output for the web you can also use processing.js, which generates browser-readable JavaScript code.

Big Pluses: totally free, lots of learning material, very flexible, lots of examples, can be extended with any java library available, can generate many kinds of output, can afford high performance through the OpenGL integration.
Few Minuses: it takes learning a new language if you don’t know Java, need to write code even for very simple charts, limited support for advanced user interface components, not conceived for the web.
Notable Examples: any project from Ben Fry | amazing “serious” bio-applications from Miriah Meyer.

R

If you have never heard of R, you are in trouble. I think there’s no way for a data professional to ignore it today. R is a programming language and environment and it is the de facto standard for anything concerning data crunching; visualization included. R is not a visualization tool, it is much much more. It comes with a standard and comprehensive library of data manipulation and statistical functions, plus a huge set of ever growing libraries available on the web.

Data visualization can be done by writing very simple statements with the standard graphics library it comes equipped with or with any of the additional libraries people use, like the fantastic ggplot2.

Normally people use it through the standard console, where you write your statements to process data and generate graphics. While R certainly requires programming skills, technically you don’t necessarily need to write full programs; rather, you need to write a few statements in the console. But the difference may become blurred.

If you are not too inclined to learning a full programming language like Java, going with R could be a good compromise. The big plus of learning R is that with a single tool you are able to cover the full data manipulation and transformation pipeline, which is not true with other tools mentioned here. Plus, knowing R for data manipulation is a terrific skill you would need anyway.

On the downside, R gives you less flexibility in generating exactly the visualization you have in mind, if you are thinking of anything too fancy. Also, as far as I know, it is extremely limited if you want to generate custom interactive visualizations. As far as I know, R is best for generating static charts out of your data.

It’s worth noting that several people post-process the charts generated with R in programs like Illustrator to make the whole output a bit prettier (check out Visualize This from Nathan Yau if you want to know more). But don’t worry, I have seen people doing incredible things with R and I am sure you can do the same with a bit of practice.

Big Pluses: the most established tool for data manipulation in the world, integrated statistical and data manipulation functions, can handle very big data, huge library for additional functions, huge community, good visualization defaults.
Some Minuses: need to write statements in a console to “draw” visualizations, not as flexible as a general-purpose programming language.

D3

D3 is the creation of Mike Bostock and Jeff Heer from Stanford. Its primary feature is to permit the creation of complex interactive data visualizations through very compact code that can be delivered through a web browser. It is based on javascript and svg and provides a number of handy functions that make constructing visualizations a lot easier.

Some of you might be surprised to see such a young technology included in my list of staple visualization tools, but D3 is not as new as you might think. Jeff Heer and Mike Bostock (later) are top-class researchers and they have been developing visualization libraries for a long time, always pushing the technology further (Prefuse, Flare, Protovis, D3). D3 in particular was born from the ashes of Protovis, a first attempt to create a visualization library in JavaScript.

A data visualization language that lets you design custom visualizations with a few lines of code, at the right level of abstraction yet powerful, with very good performance, and specifically designed to run directly on the web, is something that is going to stay with us for a while and it deserves a lot of consideration.

D3 already has aficionados everywhere who just love the technology, and the documentation is pretty amazing. Also, people have started showing off examples here and there, so learning from others won’t be a problem.

If you are inclined to web programming, you like JavaScript (I personally have a strong aversion to it), and are familiar with web technologies like CSS and SVG, D3 could be just the right choice for you. I don’t have any experience with it but all my geek visualization friends are super-excited about it and they swear it is the best data visualization technology ever created.

Big Pluses: visualizations delivered directly through a web browser, compact code, good community size and excellent documentation.
Some Minuses: the code is a bit tricky and takes some getting used to, it is not as widespread as other technologies (but this is going to change soon), it might be discontinued in the future the same way Protovis was.
Notable Examples: Jan Willem Tulp’s Urban Water | D3 Examples Page

Tableau

Finally an advanced data visualization tool that non-programmers can use! Let me say it right away: Tableau is one of the biggest things to have happened in visualization in recent years and I love it. It lets you load and display data in a matter of seconds simply by dragging data fields into the view and pushing a few buttons here and there.

What is striking about Tableau is that, while it is not as flexible as a programming language, it allows for pretty sophisticated visualization designs. Also, thanks to its powerful interface it is possible to explore a very large number of designs in a snap.

It takes some time to get used to its internal model and mechanisms, but once you understand how it works it is incredibly fast and powerful. I have been using it for a while and it amazes me how easy it is to go from one view to another, which is especially important in the early stages of a visualization project.

Sure, the level of customization you can achieve with alternatives based on programming is not reachable with Tableau but you can do pretty sophisticated things and I cannot think of a single better tool if you decide not to write code.

Other features I love about Tableau are the possibility to export static and interactive dashboards and the ease with which it loads a very large number of data formats.

There is one huge catch, however: Tableau is not free and it’s quite expensive. However, you can still use Tableau Public, a somewhat limited version of Tableau that is free and devised to create visualizations that go directly on the web. I know a lot of people who are using Tableau only through the public version and they seem to be happy with it.

Big pluses: can create visualizations in a snap, very easy to explore many alternative views of the same data, does not require programming, very large user base.
Some minuses: not as flexible as using a programming language, it’s expensive, takes some time to understand how it works.
Notable examples: Tableau Software’s visual gallery | Clearly and Simply’s Tableau Posts

Excel

Excel?! Yes, Excel. You might be surprised to see it in the list of staple data visualization tools. It took me a long time to decide whether to include it or not. I consulted with trusted friends and pondered over it for a while, and I came to the conclusion that it deserves its own spot.

Why?

Because Excel is a standard and it’s everywhere. Plus, people have been doing pretty amazing stuff with it.

If you happen to work in an organization of any kind, chances are Excel is what everyone uses and trusts (I have seen it everywhere, especially working with my fellow biologists). This means that this is the material you have to work with, whether you like it or not. People are naturally skeptical about changes (and for a good reason!) so they won’t like you introducing a new technology just because you want to spread the data visualization wisdom.

Plus, Excel is a pretty amazing piece of software, which probably unfairly inherited the overall bad light Microsoft products have. Being able to use Excel to draw effective charts can be a tremendous asset for you; with the advantage of using an almost universal platform.

The main and biggest problem with Excel is getting rid of the defaults. They are crap, a perfect gallery of junk charts. But once you learn how to bypass them you are in the realm of effective and advanced charts. You don’t believe me? Take a look at what Jorge Camoes and Jon Peltier are able to do with it. And hey, if you want to learn something about Excel be sure to read their web sites from top to bottom.

I think the choice of whether to invest in Excel or not is very much dependent on your situation. If you are totally free and independent, it might not be the right choice, but if you expect to work within the constraints of your organization or with clients in the BI area or similar, being able to work in the context of their comfort tool can be a huge advantage.

Big pluses: universal platform, everybody understands Excel, practically free, easy to go from data to chart, integrated with the spreadsheet functionalities.
Some minuses: the defaults are crap, harder to go beyond standard charts, slow with big data.
Notable examples: anything from Excel Charts gurus Jorge Camoes and Jon Peltier.
(Note: if you are new to this series, the DVBTK doesn’t teach you how to do visualization. Rather it is meant to help people find a less chaotic and more effective path towards the acquisition of the necessary skills to become a data visualization pro. To know more, make sure to read the introduction to the series first.)

The DVBTK #1 introduced books and study material to make sure you acquire the right knowledge in the right order. Studying is the first step and there’s no level of practice that can substitute for it.

That said, it is extremely important to realize that good visualization cannot happen without practice. It’s not only that practice is a necessary complement to theory, but also that you will understand the theory only once you apply it for real.

But if you want to do visualization you need some tools, right? Right. And again the web is a jungle and you might have trouble understanding which tool is right for you. You probably have heard a thousand names and acronyms but you cannot really decide; there are too many choices and too little guidance.

Here is the guidance. In the following, I propose a number of rules and factors you need to take into account when choosing a visualization tool. Furthermore I introduce a number of “staple visualization tools”: established tools which you can make great visualizations with.

And there is more to come!

I felt you needed to know more about each tool, so I decided to interview (at least) one data visualization professional with proven and long-lasting experience with it. Be sure not to miss these interviews; I will be posting them over the next weeks. And of course be sure to send your remarks or questions in the comments below, so that I will be able to address them in the upcoming posts.

Golden Rules of Visualization Tools

First of all you need some fundamental rules.

Rule #1: No tool will turn you into a pro. I think I stressed this point already in the past but it’s worth going over it again. Given the rapid development of visualization technology you might be tempted to adopt the latest technology thinking that it will turn you into a pro. This is not the case. There is no tool that can make you a pro, unless you develop your theoretical and design skills accordingly and organically. A great visualization designer is a great designer regardless of the tool of choice. It’s basically the same as photography. The latest digital SLR may take crisper shots but it won’t turn you into the next Ansel Adams.

Rule #2: First learn one single tool very well. Again, given the vast number of choices you may make and the endless production of new technologies, you might be tempted to go after all of them. Don’t get me wrong, experimentation and exploration are great, but what you need first is a tool that makes you feel at home, a safe place where you know you can always express yourself regardless of the complexity of the idea you have in mind. Choose one tool (see below how) and learn it very well first; you won’t regret it.

Rule #3: Choose tools you are totally in love with. Don’t choose a tool because it’s cool and everybody uses it; choose the one that makes you feel great, the one you can have an affair with. People give their best with tools when they are totally in love with them and just cannot stop exploring all their capabilities. If a tool doesn’t click, if you don’t crave to use it (at least at the beginning), it’s a bad sign; move on to the next one.

Let’s clear this up now: do you need to be a programmer?

Damn it! I was almost going to take the safe route and write down a politically-correct and well-balanced answer but … sincerely? Yes, I think you need to be able to write code. I mean, of course you can get away without coding, and below I propose tools which do not require you to write code, but why the hell do you want to limit yourself to such an extent?

I get asked this question quite often and I came to the conclusion that the cost-benefit ratio is so skewed that I cannot see a reason not to code. And the reason is not only in the benefit part of the ratio but also, and more importantly, in the cost. If you are scared by code it’s time for you to realize that writing code is nothing special and it’s not too difficult either. We all learned to write essays at school, and writing good ones is much more difficult than writing a few lines of code.

A large segment of our culture promoted this view that writing code (together with science and engineering in general) is the sole right of engineers and geeks. Hey you know what? I am terrible at technical things and yet I managed to get a PhD in Computer Engineering and I can write with code the things I have in mind. If I can do it, you can do it.

You don’t need to become a software engineer. The most complex stuff comes when you want to design and develop full applications with lots of interaction and many interconnected modules. But in most cases this is not what you are required to do, and in any case you can always acquire more advanced skills once you find that you need them.

So, choose a language, grab a copy of a good tutorial or book, and learn to code. And hey, why not learn it by doing visualization?! Some of the tools outlined below are just perfect for this purpose (especially Processing and its sketchbook approach). That’s a win-win situation.

How to choose the “right” tool

There is no absolute “right” tool. The best tool is the one you can do great things with, the one you love. However, there are a number of factors to keep in mind when making your choice.

  • Maturity. Is the tool one of the latest and coolest technologies on the market, with an uncertain future, or has it been used consistently and with success for quite some time? It’s not a strict rule, but if you bet on the latest technology chances are it will be abandoned in the future. This is especially true for visualization, where technology is evolving very rapidly. When in doubt, go for the proven and trusted.
  • Community. If your tool doesn’t have a large and stable community of enthusiastic visualization people, it’s a bad sign. Every great tool has a big community and a community is the most important factor in learning. It doesn’t matter how good the documentation is, you are going to need some help (and inspiration) from others.
  • Documentation. That’s a very relevant and critical one. Good documentation is notoriously rare. To some extent a good community can alleviate the problems due to limited or bad documentation, but you don’t want to wait for a reply in a forum to move on in your project, especially at its very early stage.
  • Examples. There are two main reasons why examples are important. First, you can use examples as a reality check: if people are not producing great visualizations with your tool of choice there must be a reason. Second, having great examples around you is a perfect method to learn fast. Learning by example is extremely powerful and should always be used in conjunction with more structured material. I know people who learn only through examples and they are great!
  • Cognitive Fit. I cannot stress this one enough. You have to choose the best tool for YOU and this is a little bit like buying a suit: you have to feel comfortable and cool with it. If not, it’s not for you. The best tools are those with a low “friction factor”, that is, it is natural and easy for you to translate your ideas into pictures.
  • Target Platform. Not all tools are created equal in the way they produce their output. Some are specifically targeted to the web, some allow easy conversion to static documents, some allow for the creation of full desktop applications. You’d better make sure to clarify what kind of output you want to produce before making a decision.
  • Interaction and Performance. If you want to create interactive visualizations you have to make sure the tool you select allows for rich interaction. Also, when large data is involved you have to make sure your environment performs smoothly.
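
One way to make these factors concrete is to rate each candidate tool on every factor and compare weighted scores. The following is only a rough sketch in Python, not a method from the original post: the function names, the 1-5 rating scale, the weights, and the two tools ("tool_a" and "tool_b") are all hypothetical placeholders, and the ratings are made-up numbers rather than assessments of any real tool. The extra weight on cognitive fit is just one possible reading of the emphasis given to that factor in the list.

```python
# Factor names follow the list above; everything else is illustrative.
FACTORS = [
    "maturity", "community", "documentation", "examples",
    "cognitive_fit", "target_platform", "interaction_performance",
]


def score_tool(ratings, weights):
    """Return the weighted average of a tool's 1-5 ratings over all factors."""
    total_weight = sum(weights[f] for f in FACTORS)
    return sum(ratings[f] * weights[f] for f in FACTORS) / total_weight


def rank_tools(candidates, weights):
    """Return tool names sorted from highest to lowest weighted score."""
    return sorted(
        candidates,
        key=lambda name: score_tool(candidates[name], weights),
        reverse=True,
    )


# Hypothetical weights: cognitive fit counts three times as much as the rest.
weights = {f: 1 for f in FACTORS}
weights["cognitive_fit"] = 3

# Placeholder ratings for two made-up tools (not real assessments).
candidates = {
    "tool_a": {
        "maturity": 5, "community": 5, "documentation": 4, "examples": 4,
        "cognitive_fit": 2, "target_platform": 3, "interaction_performance": 3,
    },
    "tool_b": {
        "maturity": 3, "community": 3, "documentation": 3, "examples": 4,
        "cognitive_fit": 5, "target_platform": 4, "interaction_performance": 3,
    },
}

ranking = rank_tools(candidates, weights)
print(ranking)
```

With these placeholder numbers the more mature tool loses to the one that fits better, simply because of the chosen weights. Changing the weights to match your own priorities is the whole point of the exercise; the numbers themselves prove nothing.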

There is more to come: interviews are on the way!

I hope the information I provided above will be sufficient to make a well-reasoned decision. In any case there is more material to come: for each tool I conducted at least one interview with a real expert who has a proven track record of successful visualizations with the target environment. Stay tuned! I will be posting them in the upcoming weeks.

This series is meant to help you guys, so whatever doubt or question you have, feel free to ask by writing a comment below or sending a message on twitter or writing me an email directly. And please, if you find this post and the series useful don’t forget to share it with your friends. Thanks!

Take care,
Enrico.



Research Methods in Art - David Cross on Vimeo

The beginning part of this video is really useful for the presentation I have to give at work. It gives an insight into research and how best to go about this within the MA. The image below was also explained and I will attempt to use this to explain my process.

Throughout this process I’m meant to be analysing the way I’m learning and remembering not to go back and forward without method or purpose, as that can be really easy to do, so I’m posting this to remind me of that fact!

The video on Vimeo



Wolfram Alpha - and other personal informatics tools

A great tool to create a Facebook report!
I found it here amongst other personal informatics collection tools
Shame it looks like I have to go pro to be able to download it to use any time but still very handy to know about.

I don’t use Facebook much anymore and I thought it might be interesting to look back at my usage of it and how and when I did use it at the time it was most popular in my life.

It might be interesting to see what was going on and why I was active when I was.
I might also look into using Facebook again but in another capacity… hmm I doubt it! I just feel I’ve outgrown it… The only use of it might be for family or friends who are abroad but I’d rather call, email or whatsapp those people…
Will investigate. It seems such a rich platform for data that it seems a shame not to use it! LOL! Not a good enough reason!


