Big Data World: Part 1. Definitions.

This post is the first in a series about Big Data. It is aimed at telling you how we at JetBrains see Big Data, and consequently, how we’re creating products for it.

Next parts:

Big Data World: Part 2. Roles.

The world of big data can seem mysterious, hidden behind a curtain of unfamiliar and strange terms. It’s time to clear up this mystery and define Big Data.

What is Big Data?

Like any term that has been overhyped at some point, “Big Data” has accumulated many different meanings. I will use the three definitions that I feel are most accurate:

Data that won’t fit the node’s memory

This depends on the hardware, so we can’t define a universal, fixed threshold for what constitutes “big data”. I remember my ancient Intel 80386: with its 16 MB of memory, anything larger than 8 MB counted as “big data”.

100 MB of data looks small now, but it was considered huge in the past and required sophisticated algorithms to process.

Today, Big Data is much bigger in absolute terms, but still requires sophisticated processing, distributed computing, and special storage formats.
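
To make “doesn’t fit in memory” concrete, here is a minimal single-node sketch, not a description of any JetBrains product: it sums a numeric column per key while reading a CSV source one row at a time, so only the running totals stay in memory. The column names `region` and `amount` and the sample data are hypothetical. The approach still assumes the set of distinct keys fits in memory; once even that breaks down, you need the distributed computing and storage formats mentioned above.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, TextIO


def stream_csv(source: TextIO) -> Iterable[dict]:
    # csv.DictReader reads lazily, one line at a time,
    # so the whole file never has to be loaded into memory at once.
    return csv.DictReader(source)


def sum_by_key(rows: Iterable[dict], key: str, value: str) -> Dict[str, float]:
    """Sum `value` per `key`, holding only one row plus the running totals in memory."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row[value])
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical sample data; in practice `source` could be open("events.csv", newline="").
    sample = io.StringIO("region,amount\neu,10\nus,5\neu,2.5\n")
    print(sum_by_key(stream_csv(sample), key="region", value="amount"))
    # {'eu': 12.5, 'us': 5.0}
```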

Data that scales on 3V

3V (pronounced “triple-V”) stands for Volume, Velocity, and Variety. Scaling on 3V means that you won’t have to re-architect your storage, jobs, and processes if the volume, velocity, or variety of your data grows by, say, ten times.

It’s hard to say what “ten times” means for variety, but data tends to change frequently and rapidly in both form and velocity.

As you might have guessed, this definition is primarily determined by software.

Enough data to make reliable business decisions

Let’s not forget why data, big or small, matters in the first place: to do business. With that in mind, it is useful to define “Big Data” in terms of business applications.

Successful businesses are almost always data-driven, and they usually focus on making the business reliable, predictable, and consistent. Doing these things well, however, requires far more data than merchants had in, say, the Middle Ages. The modern business model, which is user-centric and treats each customer individually, is not possible without large amounts of data.

For example, most big e-commerce companies collect huge clickstreams (streams of user-generated events), which marketing uses to predict which goods will be more popular than others.
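
As a rough illustration of what a clickstream looks like, the sketch below models each user event as a small record and counts views per product, a crude popularity signal that real forecasting would build on. The `ClickEvent` fields and event type names are hypothetical; production clickstreams carry many more fields and are processed at far larger scale.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ClickEvent:
    # Hypothetical event schema, for illustration only.
    user_id: str
    event_type: str   # e.g. "view", "add_to_cart", "purchase"
    product_id: str
    timestamp: float  # seconds since the Unix epoch


def top_products(events: Iterable[ClickEvent], event_type: str, n: int) -> List[Tuple[str, int]]:
    """Count events of one type per product and return the n most frequent products."""
    counts = Counter(e.product_id for e in events if e.event_type == event_type)
    return counts.most_common(n)


if __name__ == "__main__":
    events = [
        ClickEvent("u1", "view", "p1", 1.0),
        ClickEvent("u2", "view", "p2", 2.0),
        ClickEvent("u3", "view", "p1", 3.0),
        ClickEvent("u1", "purchase", "p1", 4.0),
    ]
    print(top_products(events, "view", n=2))  # [('p1', 2), ('p2', 1)]
```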

Customers

Now that we understand what “Big Data” is, let’s look at who consumes it.

There are three main categories of internal customers:

  1. Management
  2. Marketing
  3. Analysts

Management needs reports to understand what’s going on in the company, improve existing plans, and create new plans.

Product managers want to improve their products through experimentation and need data to analyze the results of experiments and propose new ideas.

Marketing needs data to analyze marketing metrics, such as COA (cost of acquisition), LTV (lifetime value), and so on. They also need data to build successful marketing campaigns.
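
To show the kind of arithmetic behind these metrics, here is a minimal sketch using one common simplification of each, not a definition from this post: COA as marketing spend divided by customers acquired, and LTV as monthly revenue per customer times gross margin, divided by the monthly churn rate (1 / churn approximates the expected lifetime in months). All function names and input numbers are hypothetical.

```python
def cost_of_acquisition(marketing_spend: float, new_customers: int) -> float:
    """Simplified COA: marketing spend divided by the number of customers it brought in."""
    if new_customers <= 0:
        raise ValueError("new_customers must be positive")
    return marketing_spend / new_customers


def lifetime_value(monthly_revenue_per_customer: float, gross_margin: float, monthly_churn: float) -> float:
    """Simplified LTV: monthly margin per customer times expected lifetime (1 / churn) in months."""
    if not 0 < monthly_churn <= 1:
        raise ValueError("monthly_churn must be in (0, 1]")
    return monthly_revenue_per_customer * gross_margin / monthly_churn


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    coa = cost_of_acquisition(marketing_spend=10_000, new_customers=200)
    ltv = lifetime_value(monthly_revenue_per_customer=20, gross_margin=0.75, monthly_churn=0.05)
    print(f"COA = {coa:.2f}, LTV = {ltv:.2f}, LTV/COA = {ltv / coa:.2f}")
    # COA = 50.00, LTV = 300.00, LTV/COA = 6.00
```

Comparing LTV with COA shows whether acquiring a customer pays off over that customer’s lifetime; computing either number reliably requires the large volumes of customer data discussed above.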

Conclusion

That’s how we see big data and the main consumers of the results of working with it.

Our main projects for big data are:

In the next post, I’ll describe the people who work with data and the qualifications they need.

If you would like to read more posts like this, please subscribe to our blog. Let us know what you think in the comments or on Twitter.