Big Data Tools

A data engineering plugin

Roman Poborchiy, the Marketing Manager for the Machine Learning team, interviewed Pasha Finkelshteyn, a Big Data IDE Developer Advocate.

Everybody in IT works with data, including frontend and backend developers, analysts, QA engineers, product managers, and people in many other roles. The data used and the data processing methods vary with the role, but data itself is more often than not the key. — "It's a very special key, m…

Hello, fellow data engineers! It’s Pasha here, and today I'm going to introduce you to the new release of the Kotlin API for Apache Spark. It's been a long time since the last major release announcement, mainly because we wanted to avoid bothering you with minor improvements. But today's announcement i…

In the first part of this blog series, I described basic dbt® concepts such as installation, creating views, and describing models. I could have stopped there, but using only views to build the whole transformation layer in our database has some drawbacks. Sometimes we don't real…

For some time now, I’ve noticed that dbt® is gaining popularity. I’ve seen more questions and more success stories, so a couple of days ago I decided to try it out. But what exactly is dbt anyway? Here is the first phrase you can find in its documentation: “dbt (data build tool) enables anal…

The Kotlin API for Apache Spark is now widely available. This is the first stable release of the API that we consider to be feature-complete with respect to the user experience and compatibility with core Spark APIs. Let’s take a look at the new features this release bring…

This is the sixth installment of our ongoing series on Big Data, how we see it, and how we build products for it. In this episode, we’ll cover the PACELC theorem, an extension of the CAP theorem that describes the trade-offs distributed systems face even when no partition has occurred. Big Dat…

This is the fifth installment of our ongoing series on Big Data, how we see it, and how we build products for it. In this episode, we’ll cover the CAP theorem. What is it? Is it correct? And why do data engineers need it? Big Data World, Part 1: Definitions Big Data World, Part 2:…

This is the fourth part of our ongoing series on Big Data, how we see it, and how we build products for it. In this installment, we’ll cover the second responsibility of data engineers: architecture.

This is the third part of our ongoing series on Big Data, how we see it, and how we build products for it. In this installment, we’ll cover the first responsibility of the data engineer: building pipelines.

In this part, we’ll talk about the roles of people working with Big Data. All these roles are data-centric, but they’re very different. Let’s describe them in broad brushstrokes to better understand who the people we target are.

This post is the first in a series about Big Data. It aims to show you how we at JetBrains see Big Data and, consequently, how we're creating products for it. The world of Big Data can seem mysterious, hidden behind a curtain of unknown and weird words. It’s time to clear up this mystery and define Big Data.

Data Engineers Are Like Plumbers Who Install Pipes for Big Data

Why We Need Hive Metastore

Kotlin API for Apache Spark: Streaming, Jupyter, and More

dbt® deeper concepts: materialization

How I started out with dbt®

Kotlin API for Apache Spark 1.0 Released

Big Data World, Part 6: PACELC

Big Data World, Part 5: CAP Theorem

Big Data World, Part 4: Architecture

Big Data World, Part 3: Building Data Pipelines

Big Data World, Part 2: Roles

Big Data World, Part 1: Definitions