NDITech Adventures: Mining Big Data For Public Political Sentiment

Leveraging Multiple Public Content Sources on the Internet with Crimson Hexagon Allows Us to Gauge Public Opinion
The United Nations estimates that more than 2.7 billion people will be online this year. What if there were a way to leverage the power of 2.7 billion people to tell their stories about how they feel about political topics? NDI recently received a grant from Crimson Hexagon to use their ForSight platform to delve deep into the public Internet to get answers to complex political questions. The platform is based on mathematical algorithms developed by a team of academics at Harvard. As we roll out Crimson Hexagon's platform for specific programs we run, we wanted to test it on a topic we are keenly interested in. So we decided to find out more about the global discussion around "Technology and Democracy."
 
ForSight uses a powerful supervised machine learning algorithm developed by Harvard University’s Institute for Quantitative Social Science. It allows users to examine a huge volume of public information on the Internet such as tweets and Weibo posts, public Facebook posts, forums, YouTube, news, blogs, comments, and reviews -- all in nearly real time. ForSight identifies content by keyword associations, and users "train" the algorithm by sorting a small set of the found content into specific categories. Once trained, the platform scans the Internet and its historical database for content that matches the user-defined criteria and categorizes it according to the user’s schema. Because ForSight allows users to go back in time, it is very useful for learning more about the context that preceded events or other political and social phenomena.
 
Our test case was intended to help us better understand the "Tech4Dem" world. We wanted to know more about the nature of the conversation around technology and democracy and whether our field of work was viewed positively or negatively. To examine technology and democracy, we first needed to identify specific keywords that would pull in the right content for our research. We made the term “democracy” mandatory and the terms “cyber,” “technology,” “digital,” “networked,” and “connected” optional, with the requirement that at least one of the optional keywords appear in each piece of content analyzed.
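To make the keyword rule concrete, here is a minimal Python sketch of the same logic: one mandatory term plus at least one optional term. This is only an illustration and not how ForSight itself matches content; the function names and the choice of case-insensitive, whole-word matching are our own assumptions.

```python
import re

# Keywords taken from the search described above.
MANDATORY = {"democracy"}
OPTIONAL = {"cyber", "technology", "digital", "networked", "connected"}


def tokenize(text):
    """Lowercase the text and return its set of whole words.

    Assumption: whole-word, case-insensitive matching. Under this rule
    "cyber" does not match "cybersecurity".
    """
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def matches_query(text):
    """Return True if text has every mandatory term and at least one optional term."""
    words = tokenize(text)
    return MANDATORY <= words and bool(OPTIONAL & words)


if __name__ == "__main__":
    # Hypothetical snippets, written only for this example.
    for snippet in [
        "Technology can strengthen democracy.",
        "Democracy matters everywhere.",
        "Digital tools are spreading fast.",
    ]:
        print(matches_query(snippet), "-", snippet)
```

A looser rule, such as prefix matching, would pull in more content at the cost of more irrelevant results, which is part of why we kept a separate category for irrelevant content.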
 
We then sorted a small subset of what this keyword search yielded into categories that we defined: “positive,” “neutral,” and “negative.” We also had a category for content that was pulled in but was irrelevant to our question. We "trained" our monitor by assigning twenty pieces of content to each of our categories, teaching the machine-learning algorithm how to categorize content. The monitor has now been running for three months, but we expanded the search to go back to the beginning of the year.
  
  
The graph above shows public opinion as expressed through the available sources that Crimson Hexagon scans on the Internet from May 31 to August 23. A few things are noteworthy. First, we noticed a disproportionate amount of negativity associated with technology and democracy in June, which continued throughout much of the month. As it turns out, this time frame coincided with Edward Snowden's revelations of the extent of NSA surveillance. Many of the articles, comments, tweets, and posts lamented the negative influence technology was having on democracy as a result of extensive state surveillance.
 
Next, we wondered whether the NSA revelations would have a continued effect on public sentiment associated with the terms technology and democracy. The proportion of negative to positive content began to balance out and eventually swayed in favor of positive, with slight interruptions as additional information about the NSA was published.
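For readers who want to track this kind of shift themselves, the sketch below shows one simple way to compute the daily share of each sentiment category from already-categorized content. It is a rough illustration under our own assumptions, not ForSight's method: the function name, the input format of (day, label) pairs, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict

SENTIMENTS = ("positive", "neutral", "negative")


def daily_sentiment_shares(items):
    """Return {day: {sentiment: share}} from an iterable of (day, label) pairs.

    Items labeled "irrelevant" are left out, so the shares for each day
    sum to 1 across positive, neutral, and negative.
    """
    counts = defaultdict(Counter)
    for day, label in items:
        if label in SENTIMENTS:
            counts[day][label] += 1

    shares = {}
    for day, counter in counts.items():
        total = sum(counter.values())
        shares[day] = {s: counter[s] / total for s in SENTIMENTS}
    return shares


if __name__ == "__main__":
    # Made-up labels for illustration only; these are not our actual results.
    sample = [
        ("day 1", "negative"), ("day 1", "negative"),
        ("day 1", "negative"), ("day 1", "positive"),
        ("day 1", "irrelevant"),
        ("day 2", "positive"), ("day 2", "neutral"),
    ]
    for day, share in daily_sentiment_shares(sample).items():
        print(day, share)
```

Comparing the negative and positive shares day by day is what makes a swing like the one in June visible, and it avoids being misled by days that simply have more content overall.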
 
The image to the left shows the topics of conversation across more than 1,000 pieces of content. There is definitely a section of topics concerned with technology’s negative effects on democracy, particularly in the United States, yet there were numerous other topics being talked about as well. 
 
Since our preliminary look at technology and democracy was not limited to the United States, we also wanted to understand who was talking about technology and democracy and where they were talking. The image below shows an intensity map of the conversation around the world. We are, of course, only looking for content in English, but it is clear that this is indeed a global conversation. 
While this was a rudimentary test of the platform, we are now diving much more deeply into how we can use ForSight in our work on political participation, good governance, and democracy support worldwide. On a hopeful note, we are seeing that although events can substantially affect the conversation around our work, the general trend appears to be positive. This makes us optimistic that as the conversation around technology and democracy continues around the world, we will find new and innovative ways for people to leverage technology tools for social and civic empowerment.