dbt® deeper concepts: materialization
In the first part of this blog series, I covered basic dbt® concepts such as installation, creating views, and describing models. I could have stopped there, but there are some drawbacks to building the whole transformation layer in our database out of views alone: sometimes we don't really need a view at all, and a view may run slowly even in databases oriented toward analytical workloads. I’ll start by giving an overview of ephemeral models. Ephemeral models In some cases, we don't really want a separate entity in the database for a dbt® model; rather, we want this model to be inlined in oth
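An ephemeral model is configured directly in the model's SQL file. A minimal sketch, with hypothetical model and column names:

```sql
-- models/staging/stg_orders.sql (hypothetical model name)
-- With materialized='ephemeral', dbt creates no table or view for this model;
-- its SQL is instead inlined as a CTE into every model that refs it.
{{ config(materialized='ephemeral') }}

select
    order_id,
    customer_id,
    amount
from {{ ref('raw_orders') }}
```

Any downstream model that calls `{{ ref('stg_orders') }}` gets this query spliced in as a common table expression at compile time.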
How I started out with dbt®
For some time now, I’ve noticed that dbt® has been gaining popularity. I’ve seen more questions and more success stories, so a couple of days ago I decided to try it out. But what exactly is dbt anyway? Here is the first phrase you can find in its documentation: “dbt (data build tool) enables analytics engineers to transform data in their warehouses by simply writing select statements. dbt handles turning these select statements into tables and views.” It sounds interesting, but maybe that’s not entirely clear. Here’s my interpretation: dbt is a half-declarative tool for describing transformati
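To illustrate the quoted definition: a dbt model is just a file containing a select statement, and dbt itself wraps it in the DDL needed to materialize it as a table or view. A minimal sketch, with hypothetical file and column names:

```sql
-- models/customer_totals.sql (hypothetical model)
-- dbt compiles this into CREATE VIEW/TABLE customer_totals AS <select>,
-- depending on the configured materialization.
select
    customer_id,
    sum(amount) as total_amount
from {{ ref('raw_orders') }}
group by customer_id
```

The `ref()` call is how dbt resolves the dependency on another model and orders the builds accordingly.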
Kotlin API for Apache Spark 1.0 Released
The Kotlin API for Apache Spark is now widely available. This is the first stable release of the API, which we consider feature-complete with respect to the user experience and compatibility with core Spark APIs. Get it on Maven Central. Let’s take a look at the new features this release brings to the API: typed select and sort, more column functions, more KeyValueGroupedDataset wrapper functions, support for Scala TupleN classes, support for date and time types, and support for maps encoded as tuples. Typed select and sort: the Scala API has a typed select method that returns Dataset
Big Data World, Part 6: PACELC
This is the sixth installment of our ongoing series on Big Data, how we see it, and how we build products for it. In this episode, we’ll cover the PACELC theorem. It is an extension of the CAP theorem that also describes the trade-offs distributed systems face even when no partition is happening. Big Data World, Part 1: Definitions; Big Data World, Part 2: Roles; Big Data World, Part 3: Building Data Pipelines; Big Data World, Part 4: Architecture; Big Data World, Part 5: CAP Theorem; this article. After reading Big Data World, Part 5: CAP Theorem, you might think that this theorem hardly helps in actual de
Big Data World, Part 5: CAP Theorem
This is the fifth installment of our ongoing series on Big Data, how we see it, and how we build products for it. In this episode, we’ll cover the CAP theorem. What is it? Is it correct? And why do data engineers need it? Big Data World, Part 1: Definitions; Big Data World, Part 2: Roles; Big Data World, Part 3: Building Data Pipelines; Big Data World, Part 4: Architecture; this article. Table of contents: distributed systems, consistency, availability, partition tolerance, trade-offs, and criticism. The life of a data engineer is basically built on working with distributed systems. Every syst
Big Data World, Part 4: Architecture
This is the fourth part of our ongoing series on Big Data, how we see it, and how we build products for it. In this installment, we’ll cover the second responsibility of data engineers: architecture.
Big Data World, Part 3: Building Data Pipelines
This is the third part of our ongoing series on Big Data, how we see it, and how we build products for it. In this installment, we’ll cover the first responsibility of the data engineer: building pipelines.
Big Data World, Part 2: Roles
In this part, we’ll talk about the roles of people working with Big Data. All these roles are data-centric, but they’re very different. Let’s describe them in broad brushstrokes to get a better picture of the people we’re targeting.
Big Data Tools Update Is Out: Experimental Python Support and Search Function in Zeppelin Notebooks
The Big Data Tools plugin for version 2021.1 of IntelliJ IDEA Ultimate, PyCharm Professional, and DataGrip has been released. You can install it from the JetBrains Marketplace or from inside your IDE. The plugin allows you to edit Zeppelin notebooks, upload files to cloud filesystems, and monitor Hadoop and Spark clusters. In this release, we've added experimental Python support and global search inside Zeppelin notebooks. We’ve also addressed a variety of bugs. Let's talk about the details. Experimental and preliminary Python support: Although PySpark in Zeppelin is getting a lot of
Big Data Tools Plugin for Apache Zeppelin
Zeppelin is a web-based notebook for data engineers that enables data-driven, interactive analytics with Spark, Scala, and more. The project recently reached version 0.9.0-preview2 and is being actively developed, but many things remain to be implemented. One of them is an API for getting comprehensive information about what's going on inside a notebook. There is already an API that fully covers high-level notebook management, but it doesn’t help if you want to do anything more complex. That was a real problem for Big Data Tools, a plugin for IntelliJ