Big Data World, Part 1: Definitions

Pasha Finkelshteyn

This post is the first in a series about Big Data. In it, we’d like to tell you how we at JetBrains see Big Data, and consequently, how we’re creating products for it.

Table of contents:

What is Big Data?
Customers
Conclusion

The world of big data can seem mysterious, hidden behind a curtain of unknown and weird words. It’s time to clear up this mystery and define Big Data.

What is Big Data?

Like any term that has been overhyped at some point, “Big Data” has become overloaded with meanings. I will use the three definitions that I feel are most accurate:

Data that won’t fit the node’s memory

This depends on the hardware, so we can’t define a universal, static value for what constitutes “big data”. I remember my ancient Intel 80386 – its 16 MB of memory meant that anything more than 8 MB would be classed as “big data”.

100 MB of data looks small now, but it was considered huge in the past and required sophisticated algorithms to process.

Today, Big Data is much bigger in absolute terms, but still requires sophisticated processing, distributed computing, and special storage formats.
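To make this definition concrete, here is a minimal Python sketch. It assumes, following the 80386 example above, that only about half of a node's memory is available for data; the `usable_fraction` parameter and all function names are hypothetical and chosen only for illustration. It also shows the simplest way around the limit: processing a stream chunk by chunk instead of loading everything at once.

```python
from typing import Iterable, Iterator, List

MB = 1024 * 1024


def is_big_for_node(data_bytes: int, node_memory_bytes: int,
                    usable_fraction: float = 0.5) -> bool:
    """Return True if the data exceeds the share of memory assumed usable.

    usable_fraction is an assumption: the rest is left for the OS and
    the program itself.
    """
    return data_bytes > node_memory_bytes * usable_fraction


def chunked(values: Iterable[int], chunk_size: int) -> Iterator[List[int]]:
    """Yield fixed-size chunks so only one chunk is in memory at a time."""
    chunk: List[int] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # emit the last, possibly shorter, chunk
        yield chunk


def chunked_sum(values: Iterable[int], chunk_size: int = 1000) -> int:
    """Sum a stream chunk by chunk instead of materializing it all."""
    return sum(sum(chunk) for chunk in chunked(values, chunk_size))
```

With these assumptions, 9 MB of data counts as "big" on a 16 MB node, while the same 9 MB is trivially small on a modern laptop. Real Big Data tools apply the same chunking idea across many machines rather than within one process.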

Data that scales on 3V

3V (pronounced “triple-V”) stands for Volume, Velocity, and Variety. Scaling on 3V means that you won’t have to re-architect your storage, jobs, and processes if volume, velocity, or variety grows, say, ten times.

It’s hard to say what “ten times” means in terms of variety, but data tends to change frequently and rapidly in terms of form and velocity.

As you might have guessed, this definition is primarily determined by software.

Enough data to make reliable business decisions

Let’s not forget why data, big or small, matters in the first place – to do business. Taking this into consideration, defining “Big Data” in terms of business applications is useful.

Successful businesses are almost always data-driven, and usually focus on making the business reliable, predictable, and consistent. Doing these things well, however, requires more data than merchants had during, say, the Middle Ages. The modern business model, which is user-centric and treats each person differently, is not possible without large amounts of data.

For example, most big e-commerce companies collect huge clickstreams (streams of user-generated events) and base their marketing on them, predicting which goods will be more popular than others.
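As a rough illustration of what working with a clickstream looks like, the sketch below ranks products by the number of clicks they received. The event fields (`user_id`, `event`, `product_id`) are hypothetical, and counting clicks is only a stand-in for the much richer predictive models a real company would use.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_products(events: Iterable[Dict[str, str]],
                 n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most-clicked products as (product_id, clicks) pairs."""
    clicks = Counter(
        event["product_id"]
        for event in events
        if event["event"] == "click"  # ignore views, purchases, etc.
    )
    return clicks.most_common(n)


# A tiny, made-up clickstream; real ones contain millions of events.
sample_events = [
    {"user_id": "u1", "event": "click", "product_id": "mug"},
    {"user_id": "u2", "event": "click", "product_id": "mug"},
    {"user_id": "u2", "event": "view", "product_id": "lamp"},
    {"user_id": "u3", "event": "click", "product_id": "lamp"},
    {"user_id": "u1", "event": "click", "product_id": "mug"},
]
```

Because the function accepts any iterable, the events can be streamed from a file or a queue rather than held in memory all at once.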

Customers

Now that we understand what “Big Data” is, let’s try to understand who the consumers are.

There are three main categories of internal customers:

Management
Product managers
Marketing

Management needs reports to understand what’s going on in the company, improve existing plans, and create new plans.

Product managers want to improve their products through experimentation and need data to analyze the results of experiments and propose new ideas.

Marketing needs data to analyze marketing metrics, such as COA (cost of acquisition), LTV (lifetime value), and so on. They also need data to build successful marketing campaigns.
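The post does not define how these metrics are computed, so the following sketch uses common simplified formulas as an assumption: cost of acquisition as marketing spend divided by the number of new customers, and lifetime value as average order value times purchases per year times the expected number of years a customer stays. All names and numbers are illustrative.

```python
def cost_of_acquisition(marketing_spend: float, new_customers: int) -> float:
    """Simplified COA: spend divided by customers acquired with it."""
    if new_customers <= 0:
        raise ValueError("new_customers must be positive")
    return marketing_spend / new_customers


def lifetime_value(avg_order_value: float, purchases_per_year: float,
                   years_retained: float) -> float:
    """Simplified LTV: revenue expected from one customer over time."""
    return avg_order_value * purchases_per_year * years_retained


# Made-up figures: 1000 spent to win 50 customers who each place
# 4 orders of 30 per year for 3 years.
coa = cost_of_acquisition(1000.0, 50)   # 20.0 per customer
ltv = lifetime_value(30.0, 4, 3)        # 360.0 per customer
```

Under these assumptions, comparing `ltv` with `coa` gives marketing a quick signal of whether acquiring a customer pays off, which is exactly the kind of question that requires reliable data.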

Conclusion

This is how we understand what big data is and who consumes the results of working with big data.

At JetBrains, our main projects for big data include:

In the next post I’ll define what kinds of professionals work with data and what qualifications they need.

If you would like to read more posts like this, please do not forget to subscribe to our blog. Please let us know what you think in the comments or on Twitter.
