How We Did the New Jupyter Support: Interview with Anton Bragin
Data science and Jupyter Notebooks are obviously super-big in Python. PyCharm has had Jupyter support for several years, but we needed to do a re-think to better align our “IDE for Python Professionals” mission with Jupyter workflows.
PyCharm 2019.1 delivers that re-think with completely re-implemented Jupyter Notebook support. We’re proud of it and decided to interview Anton Bragin, our lead developer for Jupyter, to explain the why/what/how.
First, let’s get to know you. Tell us a little about yourself.
I joined the PyCharm team a year and a half ago. Before JetBrains I was involved in both science (mostly bioinformatics) and development (tools for process automation and data analysis).
Let’s jump right in. What’s the big change and why should people care?
Scientific notebooks contain code and the results of code execution. Sometimes they play well together – that’s why Jupyter is so popular – but sometimes the output gets in the way of seeing the code. So we decided to extract the code from the notebook and present it separately, keeping the code editor and the output preview in sync on screen so the code always lines up with its execution results.
The result is still a single notebook IPYNB file, which is continuously updated with changes from the source code editor and with the outputs of code execution, and which can be opened in the original Jupyter without any extra steps. There are no exports and no separate Python files to keep in sync with the original notebook.
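To make that concrete: an .ipynb file is just JSON, and each cell carries both its source text and its stored outputs, which is what lets a single file be updated from both the code editor and the execution results. Here’s a rough sketch of peeking inside one with nothing but the standard library (the file name is made up for the example):

```python
import json

# A notebook is plain JSON: a list of cells, each with its source text
# and (for code cells) any stored outputs. "analysis.ipynb" is hypothetical.
with open("analysis.ipynb") as f:
    notebook = json.load(f)

print("nbformat version:", notebook["nbformat"])
for cell in notebook["cells"]:
    print(cell["cell_type"])                      # "code", "markdown", or "raw"
    print("".join(cell["source"]))                # the cell's source text
    if cell["cell_type"] == "code":
        print(len(cell.get("outputs", [])), "stored output(s)")
```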
Extracting the code into a separate editor allowed us to bring most of PyCharm’s functionality into the notebook world, including code completion, inspections, type checking, and more. So now it’s much easier to make your code shine.
Why does polished code matter?
Beautiful code is one of the pieces that make data analysis reliable, maintainable, and reproducible. But the new Jupyter support in PyCharm is not just about code editing; it’s also about revisiting the shortcomings of the whole concept behind Jupyter and scientific notebooks.
Despite its tremendous popularity, Jupyter also draws criticism within the data science community because of its limitations, such as out-of-order cell execution and hidden notebook state, which make it hard to use in production-like environments (by production-like I mean any environment where results actually matter, in science as well as in technology, as opposed to ad-hoc exploration).
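To illustrate the hidden-state problem with a toy example (the values are made up, but the trap is the classic one): as soon as cells are edited or executed out of order, the code you can see stops matching what the kernel actually holds.

```python
# Cell 1 (run first):
threshold = 0.5

# Cell 2 (run second) – uses the value defined above:
passed = [x for x in [0.2, 0.6, 0.9] if x > threshold]   # -> [0.6, 0.9]

# Later, Cell 1 is edited to `threshold = 0.8` but not re-run.
# The visible code now disagrees with the hidden kernel state: `passed`
# was computed with the old 0.5, and re-running Cell 2 alone would
# silently pick up whatever `threshold` the kernel happens to hold.
```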
We believe that many of these limitations are not intrinsic to scientific notebooks and can be addressed by proper tooling – that’s the idea the new PyCharm Jupyter support is built around.
Take us back to the previous implementation. How was it done and what needed to change?
The original implementation tried to recreate the Jupyter layout in PyCharm. It’s pretty hard to bring rich web media output and complex code analysis tooling into a single set of desktop UI components. This caused some issues with performance and maintenance.
What’s different about the implementation of the new support?
Splitting the notebook into code and preview allows us to effectively reuse PyCharm and IntelliJ IDEA platform tooling for operating on code, and to do complicated things like highlighting and code insight for multiple languages – Python and Markdown – in a single file. We have further plans to support cell magic languages, to bring even more polyglot-style programming into the notebook user experience.
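For readers who haven’t met cell magics: in standard IPython/Jupyter, a cell that starts with a magic such as `%%bash` or `%%html` is handed off to a different interpreter entirely, so polyglot code insight would mean analyzing that cell body in its own language. A small notebook cell for illustration (the script itself is invented for the example):

```python
%%bash
# A standard IPython cell magic: everything below runs as a shell script,
# not Python, inside this single notebook cell.
echo "CSV files in the working directory:"
ls *.csv 2>/dev/null | wc -l
```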
In the preview part, we have a small single-page web application that renders notebook media outputs in a JavaFX WebView. This is technically easier than rendering every cell output separately and lets us share common resources, e.g. the Markdown and LaTeX rendering libraries.
How does the new Jupyter support better deliver “IDE for Python Professionals”?
We kept in mind that scientific notebooks are most widely adopted for exploratory work, results-sharing, and presentation. Before presenting the results, some code polishing is usually done.
We tried to improve the UX in three areas:
- For the exploratory part we provide a full-featured graphical debugger using the same UI as for Python files, with the ability to step into declarations – something not available in Jupyter Notebook or JupyterLab web applications.
- Another feature is the variable view, which reveals otherwise hidden kernel state – one of the Jupyter pain points. With the variable view, the user can see the variables and declarations defined in the running Python kernel, which is a key aid in exploratory work (there’s a small sketch of the manual alternative after this list).
- PyCharm’s powerful Code Insight machinery can be used at the code-polishing step. I won’t describe it in detail, since there are plenty of excellent tutorials and docs on how you can use it to improve your code.
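Without a variable view, the usual workaround is to interrogate the kernel by hand, for example with IPython’s `%whos` magic or by filtering the interactive namespace yourself – something like this sketch (the variable names are invented):

```python
# Variables defined earlier in the session (names are just examples):
raw = [0.2, 0.6, 0.9]
threshold = 0.8

# The manual way to see what the kernel currently holds:
%whos    # IPython magic: lists live variables with their types and a short summary

# Or inspect the interactive namespace directly, skipping IPython's own names:
mine = {name: type(value).__name__ for name, value in globals().items()
        if not name.startswith("_")
        and name not in ("In", "Out", "get_ipython", "exit", "quit")}
print(mine)   # e.g. {'raw': 'list', 'threshold': 'float'}
```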
For the final presentation of results, it’s possible to hide the source code and display the preview only. We’re also working on functionality for notebook sharing.
You and the team worked hard on this. What was this feature like to develop?
It was an interesting journey into the world of data analysis and scientific computation, and a deep dive into IDE platform internals. Even though we inherited almost all the key components from the IntelliJ IDEA platform, it was quite challenging to arrange them in a way that works for scientific notebooks.
Right from the beginning we also tried to hear from specialists in the field – both data science professionals and newcomers, from inside the company and outside. For example, we ran a number of structured usability test sessions that helped us find and fix some usability problems before the release.
Now that our Jupyter support is better tied to the IDE engine, what other work might we see this year?
There are still a lot of things we want to improve. First, we’re going to support working with remote Jupyter servers, including the debugger and the variable view. Next, we want to add some Code Insight features that are specific to Jupyter notebooks. Then we will focus on some Jupyter shortcomings that are difficult to fix in a web application but can be mitigated in an IDE. Stay tuned.