About the author

Doug Cutting

Doug (@cutting) is the creator of numerous successful open source projects, including Lucene, Nutch and Hadoop. Doug joined Cloudera in 2009 from Yahoo!, where he was a key member of the team that built and deployed a production Hadoop storage and analysis cluster for mission-critical business analytics. Doug holds a Bachelor’s degree from Stanford University and sits on the Board (and is currently chairman) of the Apache Software Foundation.

Expert Opinion: Big Data is ‘big’ in more ways than one

14 Nov 2012

Over the last 25 years, there have only been a handful of truly transformative movements in IT - and Big Data is one of them, says Doug Cutting, chief architect at Hadoop specialist Cloudera.

Truly transformative movements in IT don’t come along very often. The bar is a high one, with past examples including the advent of e-business/e-commerce, the continually falling cost of storage and computer processing power, and, more recently, the “consumerisation” of IT, driven by mobile and cloud computing. Yes, Big Data is as big as these.

I don’t expect you to blindly accept that fact, however, so let’s take the necessary first step of defining what ‘Big Data’ is, and what it isn’t. A new report called ‘Demystifying Big Data: A Practical Guide to Transforming the Business of Government’, which was prepared by the TechAmerica Foundation for the US Government’s Federal Big Data Commission, is instructive. It uses this definition: “Big Data is a phenomenon defined by the rapid acceleration in the expanding volume of high velocity, complex, and diverse types of data.” (In some circles these qualities are otherwise known as ‘the Three V’s’, for volume, velocity, and variety.)

In regard to “expanding volume,” the report points out that in 2011, 1.8 zettabytes of data were created by businesses and governments globally, an amount that is predicted to double annually. That dwarfs the mere 5 exabytes that would contain every word ever spoken, by every human being, in history. Although some people may quibble with the precise figures, almost everyone would agree that we are producing data at an unprecedented, and accelerating, rate.
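To put those two figures side by side, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 zettabyte = 1,000 exabytes) and reads “double annually” as the yearly volume doubling each year from the 2011 figure; the function names are purely illustrative and the output is simple arithmetic on the quoted numbers, not a forecast.

```python
# Assumption: decimal (SI) units, so 1 zettabyte = 1,000 exabytes.
EXABYTES_PER_ZETTABYTE = 1_000

DATA_CREATED_2011_ZB = 1.8   # figure quoted from the report
ALL_WORDS_SPOKEN_EB = 5      # figure quoted in the paragraph above


def ratio_to_spoken_words(volume_zb: float) -> float:
    """How many times larger a volume (in ZB) is than the 5 EB figure."""
    return volume_zb * EXABYTES_PER_ZETTABYTE / ALL_WORDS_SPOKEN_EB


def projected_volume_zb(years_after_2011: int) -> float:
    """Yearly volume if the 2011 figure doubled every year (an assumption)."""
    return DATA_CREATED_2011_ZB * 2 ** years_after_2011


if __name__ == "__main__":
    print(f"2011 volume vs. all words spoken: "
          f"{ratio_to_spoken_words(DATA_CREATED_2011_ZB):.0f}x")
    for years in range(4):
        print(f"{2011 + years}: {projected_volume_zb(years):.1f} ZB")
```

On these assumptions, the 2011 figure alone is a few hundred times the size of everything ever spoken, and annual doubling compounds that gap quickly.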

The qualities of high velocity, complexity, and diversity are the key factors, however. It’s simply a fact today that, thanks to the Internet and mobile web, most of the data we produce is non-transactional - it doesn’t involve names, dates, amounts, and locations that can be neatly stored in relational columns and rows - and we produce it at alarming speed. The relational database management system, which eats transactional data for lunch and has been the foundation of IT for decades, is simply not designed to support these qualities in any scalable fashion. At a certain point, the relational model breaks down, limiting the amount of data that can be used.

So, if you ever need to explain Big Data during an elevator ride, you could say that it involves the process of getting value out of as much data as possible, regardless of its size, format or structure.

Big Data got its start in the consumer web, where the collection of huge data volumes created these challenges early on, but now it’s finding its way into other industries, too. For example, Chevron is using Big Data to identify more productive locations for deep-sea drilling, and a US company called Explorys is helping American patients receive better health care at a lower cost.

Big Data is also making a big impact in the public sector: the USA Search project is helping US government organisations better understand the information that citizens are looking for, and is connecting those citizens with appropriate government services.

Whatever the industry, these advances are made possible by a new ‘data science’ mindset: instead of applying highly sophisticated algorithms to limited data sets, as was common in the past, users are now applying simpler analytics to as much detailed data as they can get their hands on.

Based on the impact that Big Data is already making, one could argue that more organisations have embraced Big Data than cloud computing, which gets far more attention. Look for that trend to continue.
