Inside the Debugger: Interview with Elizaveta Shashkova
PyCharm 2017.1 has several notable improvements, but there’s one that’s particularly fun to talk about: debugger speedups. PyCharm’s visual debugger regularly gets top billing as a feature our customer value the highest. Over the last year, the debugger saw a number of feature improvements and several very impressive speedups. In particular, for Python 3.6 projects, PyCharm can use a new Python API to close the gap with a non-debug run configuration.
If you’ve been to PyCon or EuroPython and come by our booth, chances are you’ve seen Elizaveta Shashkova talking about the debugger to a PyCharm user, or giving a conference talk. Let’s talk to Liza about her work on PyCharm, the debugger, and her upcoming talk at PyCon.
Can you share with us a bit of your background and what you do on the PyCharm development team?
I started my career at JetBrains as a Summer Intern two and half years ago – I implemented a debugger for Jinja2 templates and Dmitry Trofimov (the creator of the PyCharm’s debugger) was my mentor. After that, I joined the PyCharm Team as a Junior developer and implemented a Thread Concurrency Visualizer under the supervision of Andrey Vlasovskikh, and my graduation thesis was based on it.
At the moment I’m supporting the Debugger and the Console in PyCharm.
People really like PyCharm’s debugger. Can you describe how it works, behind the scenes?
The debugger consists of two main parts: the UI (written in Java and Kotlin) and the Python debugger (written in Python). The most interesting part is on the Python side – the pydevd module, which we share with PyDev (the Python plugin for Eclipse).
We don’t use the pdb standard debugger for Python, but we implement our own debugger. At first glance, it’s quite simple, because it’s based on the standard sys.settrace() function, and in fact it just handles events which the Python interpreter produces for every line in the running script.
Of course, there are a lot of interesting frameworks or Python modules, where debugging doesn’t work by default, that’s why we add special support inside the debugger: for PyQt threads, for interactive mode in pyplot, for creation new processes, for debugging in Docker and others.
Note: A year ago we did an interview with the creator of PyDev about the funded debugger speedups.
The Cython extensions gave a big speedup. Can you explain how it works and the performance benefit?
Yes, Cython speedups were implemented by Fabio Zadrozny and they gave a 40% speed improvement for the debugger. Fabio found the most significant debugger bottlenecks and optimized them. Some of them were rewritten in Cython and it gave even more – a 140% speed improvement. Fabio has done a really great job!
On to new stuff. The upcoming PyCharm, paired with Python 3.6, gives some more debugger speedups, right?
Yes, as I’ve already mentioned, for Python interpreters before version 3.6 we used to use the standard Python tracing function. But in Python 3.6 the new frame evaluation API was introduced and it gave us a great opportunity to avoid using tracing functions and instead implement a new mechanism for debugging.
And it gave us a really significant performance improvement, for example, in some cases the debugger has become 80 times faster than it used to be, and it has become at least 10 times faster in the worst case. In some partial cases, it has become almost as fast as running without debugging.
We have so many users’ reports about debugger’s slowness, and now I hope they will be happy to try the new fast version of the debugger. Unfortunately, it’s available for Python 3.6 only.
What changed in Python 3.6 to allow this?
The new frame evaluation API was introduced to CPython in PEP 523 and it allows to specify a per-interpreter function pointer to handle the evaluation of frames.
In other words, it means that we can get access to the code when entering a new frame, but before the execution of this frame has started. And this means that we can modify the frame’s code and introduce our breakpoints right into bytecode: execution of the frame hasn’t started yet so we won’t break anything.
When we used the tracing function, the idea was similar: when entering a new frame we checked if there are any breakpoints in the current frame and, if they exist, continue tracing for every line in the frame. And sometimes it led to the serious performance problems.
But in the new frame evaluation debugger, this problem was solved: we just introduce the breakpoint into the code and the other lines in the scope don’t matter. Instead of adding an additional call to the tracing function for every line in the frame, with Python 3.6 we add just one call to “breakpoint”, and that means that the script under debugging runs almost as fast as without the debugger.
Congratulations on your Python 3.6 Debugging talk being accepted for PyCon. Who will be interested in your talk?
This talk will be interesting for people who want to learn something new about the features of Python 3.6. Also, it will be useful for people who want to learn yet another reason to move to the Python 3.6: a fast debugger, which should appear in many Python IDEs.
Moreover, after the talk people will understand, how the PyCharm’s debugger works, and why such fast debugging wasn’t possible in the previous versions of Python.
This talk is for experienced Python developers, who aren’t afraid of calling Python’s C API functions and doing bytecode modifications. :)
What is the next big thing in debugging to work on in the next year?
We have a rather old and important problem in the debugger related to the evaluation and showing big objects in the debugger. This problem has existed from the beginning, but it has become really important during the last few years. I believe it gained visibility due to the increased number of scientists who use PyCharm and work with big data frames. At the moment we have some technical restrictions for implementing this feature, but we’re going to implement it in the near future.