{"id":158178,"date":"2021-05-04T11:58:07","date_gmt":"2021-05-04T10:58:07","guid":{"rendered":"https:\/\/blog.jetbrains.com\/blog\/2021\/05\/04\/big-data-world-part-1-definitions\/"},"modified":"2023-02-20T16:57:49","modified_gmt":"2023-02-20T15:57:49","slug":"big-data-world-part-1-definitions","status":"publish","type":"blog","link":"https:\/\/blog.jetbrains.com\/ko\/blog\/2021\/05\/04\/big-data-world-part-1-definitions\/","title":{"rendered":"Big Data World, Part 1: Definitions"},"content":{"rendered":"\n<p>This post is the first in a series about Big Data. In it, we&#8217;d like to tell you how we at JetBrains see Big Data, and consequently, how we&#8217;re creating products for it.<\/p>\n\n\n\n<p>Next parts:<\/p>\n\n\n\n<ol><li>This article<\/li><li><a href=\"https:\/\/blog.jetbrains.com\/blog\/2021\/05\/13\/big-data-world-part-2-roles\/\" class=\"ek-link\">Big Data World, Part 2: Roles<\/a><\/li><li><a href=\"https:\/\/blog.jetbrains.com\/blog\/2021\/05\/20\/big-data-world-part-3-building-data-pipelines\/\" class=\"ek-link\">Big Data World, Part 3: Building Data Pipelines<\/a><\/li><li><a href=\"https:\/\/blog.jetbrains.com\/blog\/2021\/05\/27\/big-data-world-part-4-architecture\" class=\"ek-link\">Big Data World: Part 4. 
Architecture<\/a><\/li><li><a href=\"https:\/\/blog.jetbrains.com\/blog\/2021\/06\/03\/big-data-world-part-5-cap-theorem\/\" class=\"ek-link\">Big Data World, Part 5: CAP Theorem<\/a><\/li><\/ol>\n\n\n\n<p>Table of contents:<\/p>\n\n\n\n<ul class=\"is-style-default\"><li><a href=\"#what-is-big-data\" class=\"ek-link\">What is Big Data?<\/a><ul><li><a href=\"#wont-fit-memory\" class=\"ek-link\">Data that won\u2019t fit the node\u2019s memory<\/a><\/li><li><a href=\"#scales\" class=\"ek-link\">Data that scales on 3V<\/a><\/li><li><a href=\"#reliable-business-decisions\" class=\"ek-link\">Enough data to make reliable business decisions<\/a><\/li><\/ul><\/li><li><a href=\"#customers\" class=\"ek-link\">Customers<\/a><\/li><li><a href=\"#conclusion\" class=\"ek-link\">Conclusion<\/a><\/li><\/ul>\n\n\n\n<p>The world of big data can seem mysterious, hidden behind a curtain of unfamiliar and strange words. It\u2019s time to clear up this mystery and define Big Data.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"what-is-big-data\">What is Big Data?<\/h1>\n\n\n\n<p>Like any term that has been overhyped at some point, \u201cBig Data\u201d has become overloaded with meanings. I will use the three definitions that I feel are most accurate:<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"wont-fit-memory\">Data that won\u2019t fit the node\u2019s memory<\/h2>\n\n\n\n<p>This depends on the hardware in question, so we can\u2019t define a universal, static value for what constitutes \u201cbig data\u201d. 
I remember my ancient Intel 80386 \u2013 its 16 MB memory meant that anything more than 8 MB would be classed as \u201cbig data\u201d.<\/p>\n\n\n\n<p>100 MB of data looks small now, but it was considered huge in the past and required sophisticated algorithms to process.<\/p>\n\n\n\n<p>Today, Big Data is much bigger in absolute terms, but still requires sophisticated processing, distributed computing, and special storage formats.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"scales\">Data that scales on 3V<\/h2>\n\n\n\n<p>3V (pronounced \u201ctriple-V\u201d) stands for Volume, Velocity, and Variety. Scaling on 3V means that you won\u2019t have to re-architect your storage, jobs, and processes if volume, velocity, or variety grows, say, ten times.<\/p>\n\n\n\n<p>It\u2019s hard to say what \u201cten times\u201d means in terms of variety, but data tends to change frequently and rapidly in terms of form and velocity.<\/p>\n\n\n\n<p>As you might have guessed, this definition is primarily determined by software.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"reliable-business-decisions\">Enough data to make reliable business decisions<\/h2>\n\n\n\n<p>Let\u2019s not forget why data, big or small, matters in the first place \u2013 to do business. Taking this into consideration, defining \u201cBig Data\u201d in terms of business applications is useful.<\/p>\n\n\n\n<p>Successful businesses are almost always data-driven, and usually focus on making the business reliable, predictable, and consistent. Doing these things well, however, requires more data than merchants had during, say, the Middle Ages. 
The modern business model, which is user-centric and treats each person differently, is not possible without large amounts of data.<\/p>\n\n\n\n<p>For example, most big e-commerce companies collect huge clickstreams (streams of user-generated events) and base their marketing on them, predicting which goods will be more popular than others.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"customers\">Customers<\/h1>\n\n\n\n<p>Now that we understand what \u201cBig Data\u201d is, let\u2019s try to understand who its consumers are.<\/p>\n\n\n\n<p>There are three main categories of internal customers:<\/p>\n\n\n\n<ol><li>Management<\/li><li>Marketing<\/li><li>Analysts<\/li><\/ol>\n\n\n\n<p>Management needs reports to understand what\u2019s going on in the company, improve existing plans, and create new plans.<\/p>\n\n\n\n<p>Product managers want to improve their products through experimentation and need data to analyze the results of experiments and propose new ideas.<\/p>\n\n\n\n<p>Marketing needs data to analyze marketing metrics, such as COA (cost of acquisition), LTV (lifetime value), and so on. 
They also need data to build successful marketing campaigns.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"conclusion\">Conclusion<\/h1>\n\n\n\n<p>This is how we understand what big data is and who consumes the results of working with it.<\/p>\n\n\n\n<p>At JetBrains, our main projects for big data include:<\/p>\n\n\n\n<ul><li><a href=\"https:\/\/plugins.jetbrains.com\/plugin\/12494-big-data-tools\" target=\"_blank\" rel=\"noopener\">Big Data Tools for IntelliJ IDEA<\/a><\/li><li><a href=\"https:\/\/www.jetbrains.com\/dataspell\/\" target=\"_blank\" rel=\"noopener\">DataSpell<\/a><\/li><li><a href=\"https:\/\/www.jetbrains.com\/datagrip\/download\/#section=linux\" target=\"_blank\" rel=\"noopener\">DataGrip for Warehouses<\/a><\/li><li><a href=\"https:\/\/www.jetbrains.com\/datalore\/\" target=\"_blank\" rel=\"noopener\">Datalore<\/a><\/li><\/ul>\n\n\n\n<p>In the next post, I\u2019ll describe what kinds of professionals work with data and what qualifications they need.<\/p>\n\n\n\n<p>If you would like to read more posts like this, please do not forget to subscribe to our blog. 
Please let us know what you think in the comments or on <a href=\"https:\/\/twitter.com\/BigDataToolsJB\" target=\"_blank\" rel=\"noopener\">Twitter<\/a>.<\/p>\n","protected":false},"author":1234,"featured_media":140366,"comment_status":"closed","ping_status":"closed","template":"","categories":[6659,594],"tags":[588,6697],"cross-post-tag":[],"acf":[],"_links":{"self":[{"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/blog\/158178"}],"collection":[{"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/types\/blog"}],"author":[{"embeddable":true,"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/users\/1234"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/comments?post=158178"}],"version-history":[{"count":7,"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/blog\/158178\/revisions"}],"predecessor-version":[{"id":325773,"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/blog\/158178\/revisions\/325773"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/media\/140366"}],"wp:attachment":[{"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/media?parent=158178"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/categories?post=158178"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/tags?post=158178"},{"taxonomy":"cross-post-tag","embeddable":true,"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/cross-post-tag?post=158178"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}