Debugger Interview with PyDev and PyCharm
PyCharm’s visual debugger is one of its most powerful and useful features. The debugger got a big speedup in the recent PyCharm release, and that speedup has an interesting backstory: JetBrains collaborated with PyDev, the popular Python plugin for Eclipse, and funded performance work on the debugger backend the two projects share.
To tell us more about the improvements, as well as the cross-project cooperation, we interviewed the principals: Fabio Zadrozny, creator of PyDev, and Dmitry Trofimov, Team Lead for PyCharm.
Let’s jump right into it. Tell us about the speedups in the latest pydevd release.
FZ: Performance has always been a major focus of the debugger. I think that’s actually a requirement for a pure-Python debugger.
To give an example here: Python debuggers work through the Python tracing facility (i.e., sys.settrace), handling the trace calls and deciding what to do at each one.
Usually a debugger would be called at each step to decide what to do, but pydevd is actually able to completely disable tracing for most contexts (any context that doesn’t have a breakpoint inside it should run untraced) and re-evaluate its assumptions when a breakpoint is added.
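As a minimal sketch of that mechanism (not pydevd’s actual code; the names tracer and demo are illustrative), the function below is installed with sys.settrace. Python calls it with a 'call' event for each new frame, and the local trace function it returns is then called for 'line' and 'return' events in that frame.

```python
import sys

events = []  # (event, function name, line number) tuples seen by the tracer

def tracer(frame, event, arg):
    """Trace function invoked by the interpreter for trace events."""
    events.append((event, frame.f_code.co_name, frame.f_lineno))
    # Returning a function keeps local tracing ('line', 'return', ...) on for this frame.
    return tracer

def demo(x):
    y = x + 1
    return y * 2

sys.settrace(tracer)
try:
    result = demo(3)
finally:
    sys.settrace(None)  # always uninstall the tracer
```

Every traced line costs a Python-level function call, which is why the overhead of a pure-Python debugger depends so heavily on how much it traces.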
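A rough sketch of the “run untraced unless there’s a breakpoint” idea, assuming breakpoints are kept in a hypothetical table keyed by function name (pydevd’s real bookkeeping is more involved): the global trace function is still consulted once per new frame, but it returns None for code with no breakpoint, so the interpreter never calls back into the debugger for the lines of that frame.

```python
import sys

breakpoints = {}    # hypothetical table: function name -> set of breakpoint line numbers
hits = []           # (function name, line) for every breakpoint reached
traced_calls = []   # names of frames that were given a local tracer

def global_trace(frame, event, arg):
    """Called on each new frame ('call' event); decides whether to trace it."""
    lines = breakpoints.get(frame.f_code.co_name)
    if not lines:
        return None  # no breakpoint in this code: the frame runs untraced
    traced_calls.append(frame.f_code.co_name)

    def local_trace(frame, event, arg):
        if event == "line" and frame.f_lineno in lines:
            # A real debugger would suspend the thread here.
            hits.append((frame.f_code.co_name, frame.f_lineno))
        return local_trace

    return local_trace

def helper(n):
    return n * 2

def target(n):
    total = helper(n)
    total += 1
    return total

bp_line = target.__code__.co_firstlineno + 2  # the `total += 1` line
breakpoints["target"] = {bp_line}
sys.settrace(global_trace)
try:
    total_result = target(5)
finally:
    sys.settrace(None)
```

Here helper runs without any line tracing. Adding a breakpoint while code is already running would additionally require installing tracers on frames that were skipped; that “re-evaluate its assumptions” step is left out of this sketch.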
Even with performance as a major focus all along, the latest release still delivered really nice speedups (the plain Python version got a 40% speed improvement overall, while the Cython version got a 140% increase).
I must say that at that point there wasn’t any low-hanging fruit left for speeding up the debugger, so the improvement actually came from many small optimizations, and Cython has shown that it can give a pretty nice boost given just a few hints.
DT: Debugger performance was one of the top-voted requests in the PyCharm tracker. The latest release addresses this by implementing some parts of the debugger in Cython, which leads to huge performance improvements on all types of projects.
Was the Cython decision an easy one?
FZ: Actually, yes, it was a pretty straightforward decision…
The main selling point is that the Cython version is very similar to the Python version, so the same codebase is used for both the Cython and the plain Python code: the Cython version is generated from the plain Python version by preprocessing it with a mechanism analogous to #IFDEF statements in C/C++.
Also, this means that with the same codebase it’s possible to support CPython (which can use the Cython speedups) while also supporting Jython, PyPy, IronPython, etc. I even saw someone post about the debugger being used in a JavaScript implementation of Python.
DT: The idea was to make the debugger faster by rewriting the bottlenecks in C, while keeping any compiled binaries optional so that the pure Python version would still work. It was also desirable to have as little code duplication as possible. Cython let us do all that perfectly, so it was a natural decision.
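To illustrate the #IFDEF-style approach, here is a toy preprocessor. It assumes hypothetical marker comments “# IFDEF CYTHON”, “# ELSE” and “# ENDIF”; pydevd’s actual build tooling and markers may differ. The Cython-only lines are stored commented out, so the file stays valid plain Python, and the generator uncomments them while dropping the plain-Python alternative.

```python
def generate_cython(source: str) -> str:
    """Produce Cython source from a plain-Python file with IFDEF-style markers."""
    out = []
    state = None  # None (outside a block), "cython" or "python"
    for line in source.splitlines():
        marker = line.strip()
        if marker == "# IFDEF CYTHON":
            state = "cython"
        elif marker == "# ELSE" and state == "cython":
            state = "python"
        elif marker == "# ENDIF" and state is not None:
            state = None
        elif state == "cython":
            # Uncomment the Cython-only line, keeping its indentation.
            body = line.lstrip()
            indent = line[: len(line) - len(body)]
            out.append(indent + (body[2:] if body.startswith("# ") else body))
        elif state == "python":
            pass  # plain-Python alternative: left out of the Cython output
        else:
            out.append(line)  # shared code is copied unchanged
    return "\n".join(out) + "\n"

# The same source runs as ordinary Python, where the cdef line is just a comment.
SOURCE = """\
# IFDEF CYTHON
# cdef class Frame:
# ELSE
class Frame:
# ENDIF
    def step(self):
        return 1
"""

cython_source = generate_cython(SOURCE)
```

Keeping both variants in one file means a fix to the shared method bodies lands in both builds at once, which is the code-duplication concern raised in the interview.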
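Keeping the compiled binaries optional usually comes down to an import fallback. A minimal sketch, where the extension module name `_debugger_speedups` and the function names are hypothetical stand-ins rather than pydevd’s real modules: on CPython the code tries the compiled extension, and on Jython, PyPy, IronPython, or a CPython without a built extension, it keeps the pure-Python implementation.

```python
import platform

def _trace_dispatch_py(event):
    """Pure-Python implementation; always available."""
    return "python:" + event

trace_dispatch = _trace_dispatch_py
USING_CYTHON = False

if platform.python_implementation() == "CPython":
    try:
        # Hypothetical extension compiled by Cython from the same source.
        from _debugger_speedups import trace_dispatch  # type: ignore
        USING_CYTHON = True
    except ImportError:
        pass  # extension not built for this interpreter: keep the pure-Python version
```

Callers only ever use trace_dispatch, so they don’t need to know which implementation was loaded.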
Let’s take a step back and discuss the 2014 decision to merge efforts. How did this conversation get started?
FZ: I was running a crowdfunding campaign for PyDev that had a profiler as one of its main points, which was something PyCharm wanted to add too. Although the initial proposal didn’t come through, we started talking about what we already had in common, which was the debugger backend, and how each version had different features at that point. I think PyCharm had just backported to its fork some of the changes I had made in the latest PyDev version at the time, and we agreed it would be really nice if we could actually work on the same codebase.
DT: We had used a fork of the PyDev debugger since the beginning of PyCharm, and occasionally I would check what was going on in the PyDev branch to backport features and fixes from there to PyCharm. Meanwhile, Fabio did the same, taking the latest fixes from the PyCharm branch. As time passed and the branches diverged, it was getting more and more difficult to compare them and backport fixes from one to the other.
After one of the tough merges, I thought maybe we’d better create a common project that would be used in both IDEs. So I decided to contact Fabio, and I was very happy when he supported the idea.
Did the merging/un-forking go as you planned, or were there technical or project challenges?
FZ: The merging did go as planned…
The main challenge was the different feature set each version had back then. For instance, PyDev had some improvements in dealing with exceptions, finding referrers, Stackless and debugger reload, whereas PyCharm had things such as multiprocessing, gevent and Django template debugging (and the final version had to support everything from both sides).
The major pain point in the whole merge was actually the gevent support, because the debugger really needs threads to work, and gevent has an option for monkey-patching the threading library, which made the debugger go haywire.
DT: The main challenge was to check the merged version for regressions against all the fixes made in the PyCharm fork of the debugger. We had a set of tests for the debugger, but the coverage, of course, wasn’t 100%. So we made a list of all the debugger issues fixed over the last 3 years (around 150 issues) and just tested them. That helped us ensure that the release wouldn’t have regressions.
Fabio, how did it go on your end, having JetBrains sponsor some of your work? Any pushback in your community?
FZ: I must say I didn’t really get any pushback from the community. I’ve always been pretty open-minded about the code in PyDev (which PyCharm used early on for its debugger), and I believe the choice of IDE is a really personal one. So I’m happy that the code I wrote can reach more people, even if not directly inside PyDev. Also, I think the community saw it as a nice thing, as the improvements in the debugger made both PyDev and PyCharm better IDEs.
The Python-oriented IDEs likely have some other areas where they face common needs. What do you think are some top issues for Python IDEs in 2016 and beyond?
FZ: I agree that there are many common needs among IDEs; they do have the same target after all, although with wildly different implementations ;)
Python code in particular is pretty hard to analyze in real time, even though it is simple and straightforward to read, and that’s something all "smart" Python IDEs have to deal with. So there’s a fine balance between performance and features there, and that’s probably always going to be a top issue in any Python IDE.
Unfortunately, this is probably also an area where collaboration is pretty difficult, as the type inference engine is the heart of a Python IDE (and it’s also what makes each one unique in a sense, as each implementation ends up favoring one side or the other).
DT: The dynamic nature of Python has always been the main challenge for IDEs trying to assist developers. A huge step forward came with the type hinting notation added in Python 3.5 (PEP 484) and the typeshed repository of type stubs, from which we will all benefit a lot. But this is still at an early stage, and we need to define and learn effective ways to adopt type hinting.
Python performance is also a challenge. In the Python world, when you care about performance, you switch from pure Python to libraries written in C, like NumPy, or you try PyPy. But in both cases, performance and memory profiling become hard or even impossible with the current standard tools and libraries. I think tool developers can collaborate there to provide better instruments for measuring and improving the performance of Python apps.
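For readers who haven’t used it yet, this is the PEP 484 annotation syntax that IDEs can read statically instead of guessing types; typing.get_type_hints resolves the same annotations at runtime. The function itself is only an illustrative example, and the local variable uses a PEP 484 type comment so the snippet stays valid on Python 3.5.

```python
from typing import Dict, List, Optional, get_type_hints

def word_counts(words: List[str], limit: Optional[int] = None) -> Dict[str, int]:
    """Count occurrences of each word, optionally looking only at the first `limit` words."""
    counts = {}  # type: Dict[str, int]
    for word in words[:limit]:
        counts[word] = counts.get(word, 0) + 1
    return counts

# Annotations are ordinary data at runtime, so tools can inspect them too.
hints = get_type_hints(word_counts)
```

With these hints, an IDE knows that each key of the returned dict is a str without having to infer it from the function body.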
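A small illustration of the visibility problem, using the standard library’s cProfile and pstats: time spent inside a C-implemented function shows up as a single opaque entry, with no breakdown of the work done inside it. Here the built-in sorted stands in for a C extension such as NumPy; the workload function is illustrative.

```python
import cProfile
import pstats
import random

def workload():
    data = [random.random() for _ in range(100_000)]
    return sorted(data)  # all the sorting work happens inside C code

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(5)

# Each key is (filename, line number, function name); C functions are
# reported with the placeholder filename "~" and line number 0.
c_entries = [key for key in stats.stats if key[0] == "~"]
```

The profiler can say how long sorted took in total, but nothing about where that time went, and memory allocated inside C code is equally invisible to Python-level tools.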
What’s in the future for pydevd, performance or otherwise?
FZ: I must say that performance-wise, I think it has reached a nice balance between ease of development and speed, so right now the plan is to not have any regressions ;)
Regarding new development, I don’t personally have any new features planned — the focus right now is on making it rock-solid!
DT: One of the additions to pydevd from the PyCharm side is the ability to capture the types of function arguments in the running program. PyCharm tries to use this information for code completion, but the feature is currently optional and off by default. With the new type hinting in Python 3.5, this idea gets a new spin: the types collected at runtime could be used to annotate functions or to verify existing annotations. We are currently experimenting only with types, but it could be taken further to analyze call hierarchies, etc.
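A minimal sketch of collecting argument types at runtime, using the 'call' events that sys.setprofile delivers for every Python function call. This is not pydevd’s implementation; the observed_types table and the suggest_annotations helper are hypothetical names for illustration. The collected types could then be turned into annotation suggestions or compared with a function’s existing annotations.

```python
import sys
from collections import defaultdict

# function name -> parameter name -> set of observed type names
observed_types = defaultdict(lambda: defaultdict(set))

def record_arg_types(frame, event, arg):
    if event != "call":
        return  # ignore returns and calls into C functions
    code = frame.f_code
    # Parameter names come first in co_varnames.
    params = code.co_varnames[: code.co_argcount + code.co_kwonlyargcount]
    for name in params:
        if name in frame.f_locals:
            observed_types[code.co_name][name].add(type(frame.f_locals[name]).__name__)

def scale(value, factor):
    return value * factor

sys.setprofile(record_arg_types)
try:
    scale(2, 3)
    scale(1.5, 2)
finally:
    sys.setprofile(None)

def suggest_annotations(func_name):
    """Map each parameter to a readable union of observed type names, e.g. {'value': 'float | int'}."""
    return {name: " | ".join(sorted(types)) for name, types in observed_types[func_name].items()}
```

A profile hook is used rather than a trace function because only call events are needed, so there is no per-line overhead.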