Big Data Tools Update Is Out: Experimental Python Support and Search Function in Zeppelin Notebooks
The Big Data Tools plugin for version 2021.1 of IntelliJ IDEA Ultimate, PyCharm Professional, and DataGrip has been released. You can install it from the JetBrains Marketplace or from inside your IDE. The plugin allows you to edit Zeppelin notebooks, upload files to cloud filesystems, and monitor Hadoop and Spark clusters.
In this release, we’ve added experimental Python support and global search inside Zeppelin notebooks. We’ve also addressed a variety of bugs. Let’s talk about the details.
Experimental and preliminary Python support
Although PySpark in Zeppelin is getting a lot of attention, the Zeppelin web interface only offers a very simple form of auto-completion that contains a scattered variety of variables and functions. Zeppelin, of course, never promised to offer this feature. But in the IDE we want to offer a bit more support.
Adding support for a whole new programming language can be a challenging task. Luckily, our IDEs already have great Python support, provided either in PyCharm or the Python plugin for IntelliJ IDEA. We needed to take this ready-made functionality and integrate it inside Zeppelin. This task is not entirely straightforward, however, as it requires navigating a tremendous number of nuances specific to Zeppelin. We have to decide how to gather all the dependencies, how to make sure the version of Python is correct, and how to handle other similar problems.
In EAP 12, we introduced full Python highlighting in our Zeppelin notebook editor that can also mark obvious syntax errors. You can jump to the definition of a variable or function if it is declared inside the notebook. You can do the usual refactorings like rename or change signature. Zeppelin-specific tables and charts work, too. After all, people often use Zeppelin just for the charts.
Of course, there are still many things to be done. For example, we would really like to have smarter code completion for Spark API functions and other external code. Our auto-completion is currently only good for the code inside the notebook. This is still a work in progress, but we have good implementation ideas that we will try in the next releases. At the moment we can offer a fully functional Python editor, but that’s all. Python support is pretty experimental, but a journey of a thousand miles begins with a single step. In any case, the features that are already available may be enough to make it possible to write code in your favorite IDE instead of switching to the Zeppelin web interface.
Mixing multiple languages
Sometimes, you want to mix Python and Scala code in the same notebook, and Big Data Tools makes this possible. For example, this can be useful for performance reasons in computational tasks.
Keep in mind that for full editor support of both languages you will need IntelliJ IDEA with the Scala and Python plugins installed. In PyCharm, Scala code will still be executed, but the editor will treat it as plain text.
Search in Zeppelin notebooks
It has always been possible to use search to determine which file on disk contains the text we need (for example, using the Find in Path feature, Ctrl+Shift+F). However, this standard search interface does not work with notebooks because they are not files!
In version 12, we’ve added a separate notebook search bar. Just open the Big Data Tools panel, select any of the Zeppelin connections, and click the button with the magnifying glass icon (or use the Ctrl+F keyboard shortcut). This opens a window called Find in Zeppelin Connections. Activating one of the search results will open that notebook and take you to the desired paragraph.
The same feature exists in Zeppelin, too. Under the hood, we use the standard Zeppelin HTTP API to implement this. Our search results should be identical to what you see in the Zeppelin web interface.
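As a rough illustration of what such a search call can look like outside the IDE, here is a minimal Python sketch against Zeppelin's notebook search endpoint (`/api/notebook/search?q=...`). It is not the plugin's implementation. The helper names, the base URL, and the assumption that the server allows anonymous access are all illustrative. The `<noteId>/paragraph/<paragraphId>` format of the `id` field follows the response shape described in the Zeppelin REST API documentation and may differ between Zeppelin versions.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url: str, query: str) -> str:
    """Return the Zeppelin REST URL for a full-text notebook search."""
    return f"{base_url.rstrip('/')}/api/notebook/search?{urlencode({'q': query})}"


def parse_search_results(payload: str) -> list[tuple[str, str, str]]:
    """Extract (note_id, paragraph_id, note_name) from a search response.

    Assumes each hit has an "id" of the form "<noteId>/paragraph/<paragraphId>".
    If a hit has no paragraph part, paragraph_id is an empty string.
    """
    hits = json.loads(payload).get("body", [])
    results = []
    for hit in hits:
        note_id, _, paragraph_id = hit["id"].partition("/paragraph/")
        results.append((note_id, paragraph_id, hit.get("name", "")))
    return results


def search_notebooks(base_url: str, query: str, timeout: float = 10.0):
    """Run the search against a live server (assumes no authentication)."""
    with urlopen(build_search_url(base_url, query), timeout=timeout) as response:
        return parse_search_results(response.read().decode("utf-8"))
```

With a Zeppelin server running locally, a call such as `search_notebooks("http://localhost:8080", "spark")` would return one tuple per matching paragraph. A server with authentication enabled would need a login step first, which this sketch leaves out.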
The function is small but extremely useful. How did we live before this?
Bug fixes and small improvements
The Big Data Tools plugin is actively being developed, and rapid growth inevitably comes with some problems. In this release, we have done a lot of work on remote storages and on rendering graphs and paragraphs, in addition to redesigning much of the UI (for example, SSH tunnels). We’ve also reworked some of the internals. For example, several projects now share a single connection to Zeppelin. And of course, we fixed a lot of bugs in unexpected places. Overall, the plugin should be much more enjoyable now.
If you are interested in an overview of the most significant improvements, you can find them in the What’s New section of the plugin page. If you’re looking for a specific issue, check out the full report on YouTrack.
Thank you for using our plugin! You can upgrade to the latest version either from the plugin page in your browser or from inside the IDE. You can also leave your feedback and suggestions on the plugin page. We always want to know what you think.
Documentation and Social Networks
And last but not least, in case you need help on how to use any feature of the plugin, make sure to check out the IntelliJ IDEA and PyCharm documentation. Still need help? Please don’t hesitate to leave us a message either here in the comments or on Twitter.
— The Big Data Tools plugin team