As you know, we are fans (as well as purveyors) of data visualization. What you might not know is that we are also fans of politics, and our latest data visualization Politilines is a nice blending of the two.
After watching a few of the Republican Primary debates we were compelled to try to make sense of what everyone was actually saying. To do this, we decided to take a look at the words that were most often spoken during the debates, and then map them to both the major issues and the candidates who said them.
A look at our process
We started by gathering the debate transcripts from the American Presidency Project at UCSB and parsing them.
We split each transcript into sections, one for every question asked of the candidates, which let us tag each answer with the relevant issues.
The questions were categorized by hand for each debate, and each question could be associated with one or more issues. We discarded any answers that were not related to an issue, such as a candidate asking for more time or other casual conversation.
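For readers who like to see the shape of the data, here is a minimal Python sketch of how segmented, hand-tagged transcripts could be represented. The class and function names (`Segment`, `Answer`, `answers_by_issue`) are hypothetical and chosen for illustration; this is not the code behind Politilines.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Answer:
    candidate: str  # who spoke
    text: str       # what they said


@dataclass
class Segment:
    """One question from a debate plus the answers that followed it."""
    question: str
    issues: Set[str] = field(default_factory=set)        # assigned by hand
    answers: List[Answer] = field(default_factory=list)


def issue_related(segments: List[Segment]) -> List[Segment]:
    """Drop segments that were not tagged with any issue."""
    return [seg for seg in segments if seg.issues]


def answers_by_issue(segments: List[Segment]) -> Dict[str, List[Answer]]:
    """Group answers under every issue their question was tagged with."""
    index: Dict[str, List[Answer]] = {}
    for seg in issue_related(segments):
        for issue in seg.issues:
            index.setdefault(issue, []).extend(seg.answers)
    return index
```

A segment tagged with two issues contributes its answers to both, which matches the idea that a single question can touch more than one topic.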
After all of the questions were categorized, we removed any words that failed to give insight into the identified issue topics, such as “the”, “although”, “while” and other common stop words.
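As an illustration, stop-word filtering followed by a word count might look something like the Python sketch below. The stop-word list is a tiny sample we made up for the example, and the tokenizer is deliberately simple; a real list would be much longer.

```python
import re
from collections import Counter

# Tiny illustrative sample, not the list used for Politilines.
STOP_WORDS = {"the", "although", "while", "a", "an", "and", "of", "to", "we", "is"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def content_word_counts(text, stop_words=STOP_WORDS):
    """Count how often each non-stop word appears in the text."""
    return Counter(word for word in tokenize(text) if word not in stop_words)
```

Calling `content_word_counts(answer_text).most_common(10)` would then give the ten most frequent remaining words for one answer.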
During our initial exploration we simply displayed the words that were spoken most often, and used that as the basis of the visualization. While that was interesting, it lacked content and context. When a candidate said “foreign” we weren’t able to identify whether they were talking about foreign people, or foreign aid, or something else.
To make the tool more useful, we then analyzed the transcripts by looking for common groups of words, like “United States”, “foreign aid”, “Federal Reserve” and others. This method gave us much more insight into what the candidates were saying about specific issues. It revealed groupings like “9 percent” and “health care.”
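One simple way to find common groups of words is to count adjacent word pairs (bigrams) and keep the ones that repeat. The Python sketch below assumes that approach; the function name, the `min_count` threshold and the stop-word sample are all hypothetical, and the source does not say which technique was actually used.

```python
import re
from collections import Counter

# Tiny illustrative sample of stop words.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "we"}


def common_phrases(text, min_count=2, stop_words=STOP_WORDS):
    """Return two-word phrases that occur at least min_count times.

    Pairs containing a stop word are skipped. For simplicity, pairs are
    allowed to span sentence boundaries.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    pairs = Counter(
        (first, second)
        for first, second in zip(words, words[1:])
        if first not in stop_words and second not in stop_words
    )
    return {" ".join(pair): n for pair, n in pairs.items() if n >= min_count}
```

Raising `min_count` trims one-off pairings, so that only phrases a candidate keeps returning to survive.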
This is our Beta release of Politilines. In future releases we hope to add context for the frequency of the words spoken by each candidate, an analysis of the word’s intent, the ability to remove candidates or words, and a way to explore the most common words and issues in all of the debates.
The result right now is a pleasant little tool that allows you to draw connections in ways that watching in real time, or reading a transcript, might not reveal. Check it out for yourself. We hope you feel the same way.
Bonus tech note: Politilines was developed in HTML5.
We’d love to know what you think. Please drop us a line, or leave a comment below. If you have a suggestion for a feature to incorporate into a future release, don’t be shy. We’d love to hear it.