Big Data Tools 2022.2 is here!
The highlights of this release include integration with Hive Metastore, the ability to monitor Flink jobs right inside your IDE, and SSO authentication on Amazon S3. The new version also brings many other noteworthy changes, which are covered below. Get the latest version by installing it in version 2022.2 of your IDE. Hive Metastore Integration: We’ve added the ability to create a Hive Metastore connection from the IDE and browse Hive catalogs, tables, and columns. Big Data Tools now also provides code completion for Spark SQL based on Hive Metastore data. Apa
Kotlin API for Apache Spark: Streaming, Jupyter, and More
Hello, fellow data engineers! It’s Pasha here, and today I'm going to introduce you to the new release of Kotlin API for Apache Spark. It's been a long time since the last major release announcement, mainly because we wanted to avoid bothering you with minor improvements. But today's announcement is huge! First, let me remind you what the Kotlin API for Apache Spark is and why it was created. Apache Spark is a framework for distributed computations. Data engineers typically use it for a range of tasks, such as ETL processes. It supports multiple languages straight out
Big Data Tools 2022.2 EAP: What’s New?
Big Data Tools 2022.2 EAP is now available. You can try the newly added features right away by installing the latest plugin version in the 2022.2 EAP of your IDE. Please note this is an Early Access Program build, meaning it’s not fully tested. Hive Metastore support: You can now create a Hive Metastore connection from the EMR cluster window and browse Hive catalogs, tables, and columns. Apache Flink Monitoring: You can now monitor Flink applications right in your IDE. Just like in the Flink Dashboard, you can launch and stop jobs, all without leaving your IDE. Th
Data Engineering Annotated Monthly – April 2022
Long time no see! Sorry about the silence, but luckily we’re back. Hi, I'm Pasha Finkelshteyn, and I’ll be your guide through this month’s news. I’ll offer my impressions of recent developments in the data engineering space and highlight new ideas from the wider community. If you think I missed something worthwhile, catch me on Twitter and suggest a topic, link, or anything else you want to see. And please feel free to subscribe to this newsletter to get it in your email inbox every month. News A lot of engineering is about learning new things and keeping a finger on the pulse of new
Data Engineering Annotated Monthly – January 2022
Due to the public holidays in Russia and my own vacation time, I didn’t get a chance to write an Annotated for December. Waiting a little longer might not be such a bad thing in this case, because now we have even more interesting releases to talk about! Hi, I'm Pasha Finkelshteyn, and I’ll be your guide through this month’s news. I’ll offer my impressions of recent developments in the data engineering sector and highlight new ideas from the wider community. If you think I missed something worthwhile, you can find me on Twitter and suggest a topic, link, or anything else you want to see. If yo
Kotlin API for Apache Spark 1.0 Released
The Kotlin API for Apache Spark is now widely available. This is the first stable release of the API that we consider to be feature-complete with respect to the user experience and compatibility with core Spark APIs. Get on Maven Central. Let’s take a look at the new features this release brings to the API. Contents: Typed select and sort; More column functions; More KeyValueGroupedDataset wrapper functions; Support for Scala TupleN classes; Support for date and time types; Support for maps encoded as tuples; Conclusion. Typed select and sort: The Scala API has a typed select method that returns Dataset
Big Data Tools Plugin for Apache Zeppelin
Zeppelin is a web-based notebook for data engineers that enables data-driven, interactive data analytics with Spark, Scala, and more. The project recently reached version 0.9.0-preview2 and is being actively developed, but there are still many things to be implemented. One such thing is an API for getting comprehensive information about what's going on inside the notebook. There is already an API that completely solves the problems of high-level notebook management, but it doesn’t help if you want to do anything more complex. That was a real problem for Big Data Tools, a plugin for IntelliJ