Wading through the Big Data Hype
Is big data really the next big thing—or is it just a way for the big IT shops to scare their enterprise customers into buying more gear and services? While more than 85 percent of respondents to a recent IDG study agreed that big data offered business value, only 23 percent deemed their own projects successful. One begins to wonder if there is less than meets the eye.
Big data is not really new. Companies have been generating and mining vast amounts of information since the dawn of the computer age. While plenty of data warehouses still sit in traditional monolithic systems that require intense care and feeding, more and more data today is gleaned from multiple distributed sources, formal structured databases, unstructured data feeds, and semi-structured object stores.
What has changed radically is how and where data is stored, consumed, and used. The driving force behind much of this change is the consumerization of the Internet, which generates massive amounts of data (zettabytes at last count). The data that can be reasonably and easily accessed is primarily used for consumer behavior forecasting, which allows for faster and more accurate responses to business opportunities.
H&R Block used big data analytics to address a persistent problem: every tax question not answered immediately was a lost sale. Other companies, including Hertz, T-Mobile, and US Xpress, have taken advantage of the tools with good business results, which bodes well for the deployment of more well-executed big data analytic systems in the future.
Several problems need to be addressed before big data becomes just another standard IT tool. It has been estimated that more than 80 percent of the data is unstructured, and such data needs to be appropriately mapped before it can be used effectively. In addition, most applications and architectures, with a few well-known exceptions such as Hadoop, are not designed to capture, validate, or analyze all that diverse data.
Another issue is that despite the massive scale of the Internet, more than 96 percent of the interesting data is still locked inside corporate data vaults. IDC estimates that only 0.5 percent of all available data is being processed today. It rapidly becomes clear that just writing big checks to large IT vendors with expensive solutions to this systemic problem is not the correct answer. As with the complex multimillion-dollar ERP system implementations of twenty years ago, it takes time for best practices and supporting tools to shake out.
The trick is qualifying the validity of the data and building analytics applications that can capture the nuggets of actionable knowledge buried among all those Bieber and Lady Gaga tweets. Really, what buried marketing treasure is uncovered with the knowledge that Justin Bieber (#justinbieber) has 39 million Twitter followers? Quick, cash in on Bieber-branded anything before he becomes yet another footnote in pop music history.