Adam Gartenberg's Blog

Business Analytics and Optimization, IBM and Social Marketing

Hadoop and "Twitalytics"

One of the announcements we made at IOD EMEA this week was around "big data" and support for Hadoop.  A few bloggers and I had a chance earlier this week to sit down with Marie Wallace, from the IBM Content Analytics Marketing team, who showed us a demo of one way to put this to work.

Marie has been applying content analytics to social analytics, and so described herself as a consumer of this technology. It was obvious that she was excited about its implications.

She demonstrated for us "Twitalytics," which, as the name implies, is analytics on Twitter content "made easy."  It combines IBM Content Analytics technology, Hadoop and BigSheets (IBM technology that basically allows you to work with the data as if it were a spreadsheet... just one that can accommodate 100+ million rows of data).

The goal of the initiative she demonstrated for us was to be able to answer a question - say, "What products do people want to buy?"  (You could picture this being asked by a consumer goods or electronics manufacturer trying to decide what products to build next, or a retailer trying to determine which hot product they needed to make sure they had in stock for the holiday season.)

Once you've got your question and selected a data source (in this case, she picked Twitter, but it could also be several sources combined), it's time to consolidate the data and run the analysis model. This is where Hadoop comes in: it splits the work across many machines running in parallel, so results can be churned out very quickly.
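To make the Hadoop step a little more concrete, here is a minimal sketch of the map-and-reduce pattern it is built around, written in plain Python. This is not the Twitalytics implementation and it does not use Hadoop's API; the sample tweets, the `INTENT` pattern and the function names are all made up for illustration, and a real content analytics model would be far richer than one regular expression.

```python
import re
from collections import Counter

# Hypothetical purchase-intent pattern (an assumption for this sketch):
# matches phrases like "want a new laptop" and captures the product word.
INTENT = re.compile(
    r"\b(?:want|need|buying|getting)\s+(?:a|an|the)\s+(?:new\s+)?(\w+)",
    re.IGNORECASE,
)

# Invented sample data, standing in for a real Twitter feed.
SAMPLE_TWEETS = [
    "I really want a new laptop for school",
    "Need a phone that lasts all day",
    "Thinking of buying a laptop this weekend",
    "Lovely weather today",
]


def map_tweet(tweet):
    """Map step: emit a (product, 1) pair for each intent phrase in one tweet."""
    return [(match.group(1).lower(), 1) for match in INTENT.finditer(tweet)]


def reduce_counts(pairs):
    """Reduce step: add up the counts emitted for each product."""
    counts = Counter()
    for product, n in pairs:
        counts[product] += n
    return counts


def run_job(tweets):
    """Run map then reduce in a single process.

    Each tweet is mapped independently of the others, which is what lets a
    framework like Hadoop spread the map step across many machines.
    """
    pairs = [pair for tweet in tweets for pair in map_tweet(tweet)]
    return reduce_counts(pairs)


if __name__ == "__main__":
    for product, count in run_job(SAMPLE_TWEETS).most_common():
        print(product, count)
```

The point of the sketch is the shape of the work rather than the pattern matching: because no tweet depends on any other during the map step, adding machines shortens the run.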

The resulting data is then brought into BigSheets, where it can be manipulated and visualized (via Many Eyes) and exported into Content Analytics for further analysis. (The results in this case might indicate a particular model of mobile phone, or a new laptop, etc. that is in high demand.)  Once developed, models like these can be set up to run automatically against new content as it comes in (Twitter never sleeps, after all... well, almost never), and it is also very easy to iterate and refine the models as you go.

Marie explained that the value of this approach comes in many forms:

  • It's very quick to build - the entire demo she showed us was built in just two days.
  • It's very quick to run (Content Analytics lets you rapidly model, while Hadoop lets you rapidly execute).  As a point of comparison, Marie described work done to analyze medical data for a large client where the model took 121 days to execute.  Here, the focus is on fast, responsive, real-time results.
  • It allows you to get insight from large amounts of data without just ending up with more noise.