Curiosity-Driven Researchers: Between Industry and Academia
Timofey Bryksin, Head of Research Lab in Machine Learning Methods in Software Engineering at JetBrains, answered questions about the work his team does, the problems they try to solve, and other topics of interest to his lab.
This is the first installment of a three-part interview. Part two is now available.
What tasks does your lab handle?
We strive to leverage data-driven approaches that benefit the software development community and JetBrains in particular. Our focus is broad, and we look for ideas to improve existing products and the ways teams work. When developing software, a lot of data is generated – some useful and some less so – and based on this data, we can uncover patterns in how people write code. These patterns help us understand how certain processes work and how best to use them for something good.
Can you provide any examples?
Last year, we became interested in understanding the psychological aspects of software development processes. As IDEs evolved, people rarely wondered why functionality was added in a particular way. As a result, over time IDEs became quite overloaded with various features. No one had ever conducted research on how convenient this was – people just got used to things being this way, even if it wasn’t ideal.
Does the environment have a significant effect on the code itself?
There’s a known effect where people stop making their usual mistakes when they switch from a black marker on a whiteboard to a white marker on a blackboard. We wanted to see how different environmental factors, particularly UI complexity, affect the way people write code.
We had two somewhat conflicting hypotheses. The first stated that the more complex the UI, the more stress a person experiences, because they need to focus and search for the right button. The other stated that the more complex the UI, the faster things would get done, because you would have more tools at your fingertips.
People with shorter attention spans might get distracted by the lack of elements. There would be fewer things on the screen grabbing their attention and keeping them “occupied”.
We looked into two different types of tasks: complex tasks that require a lot of thinking and routine tasks that can easily be performed without too much thought – but require a lot of concentration. We compared how people worked through these tasks in two different UIs: in a regular, feature-heavy interface, and then in Zen mode with no buttons or panels.
Is this measurable?
Yes, and from the data we gathered, we’ve seen that people tend to work faster in low-clutter UIs. This can be quite beneficial as it allows people to use existing tools more efficiently.
Does JetBrains have a team of psychologists?
Yes, we do have a team of psychologists who are involved in UI/UX and specific cognitive psychology research (these are subteams of a larger Market Research team). However, our approach is different. We rely more on quantitative research, where we give people a tool, see how they use it, and calculate correlations and statistical significance. The UI/UX and cognitive research teams tend to lean towards qualitative research and asking people what they like better (doing more interviews, diary and regular UX studies). One research method doesn’t cancel out the other, but together we can get some really interesting results. We’ve worked with them before, but now it seems like we’ll be collaborating much more.
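The quantitative side of such a study can be illustrated with very little machinery. The sketch below is purely hypothetical (the timing data and experimental setup are invented, not the actual study): it compares task-completion times in a full UI versus a minimal Zen-style UI using a permutation test, a dependency-free way to estimate statistical significance.

```python
import random
import statistics

def permutation_test(a, b, n_iter=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an approximate p-value: the fraction of random
    relabelings whose mean difference is at least as extreme
    as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        if abs(statistics.mean(left) - statistics.mean(right)) >= observed:
            count += 1
    return count / n_iter

# Hypothetical task-completion times in seconds (invented data).
full_ui = [310, 295, 340, 360, 305, 330, 350, 315]
zen_ui  = [280, 260, 300, 290, 275, 285, 295, 270]

p = permutation_test(full_ui, zen_ui)
print(f"mean full UI: {statistics.mean(full_ui):.1f}s")
print(f"mean Zen UI:  {statistics.mean(zen_ui):.1f}s")
print(f"p-value:      {p:.4f}")
```

A permutation test is a reasonable default here because it makes no normality assumptions about how completion times are distributed.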
Since the lab was established in 2016, how have your tasks changed?
We understood that the emerging application of machine learning to software engineering tasks was critically important for JetBrains. We don’t want to be stuck in the past, still making typewriters after the invention of the computer, so to speak. The idea remains the same: if (or should I say when) there comes a time when code can be written with the click of a button by neural networks rather than developers, a company that creates software development products will have the necessary knowledge and means to take advantage of this technological breakthrough.
But do you personally believe that software development will be so well automated that some developers will no longer be needed?
There’s a meme on the internet that goes like this: They invented the assembler? Soon, developers won’t be needed. They invented fourth-generation programming languages? Soon we won’t need developers. And now UML? You get the point. After all, what does a human do? Their main goal is not to press buttons, but to design the system correctly, consider all edge and corner cases, and think through all the scenarios of how the system will work and interact with the user. That is to say, even if you have a black box that you can just “toss” your thoughts into, and it spits out code – someone still needs to come up with the input idea.
Doesn’t this concept resemble that of UML?
Indeed! The initial proponents of the idea phrased it quite nicely: “We won’t be writing code – we’ll be designing it, and then implementing it.” The UML developers suggested that their visual models could do the first part. But the idea of “not writing code” was brought to people who have been writing code all their lives, who love their job, and want to keep doing it. Of course, they were not receptive to it.
Is there another way?
There’s a book called The Structure of Scientific Revolutions, which states that it is never the case that one thing ends and another begins – there is always some overlap. One wave rises, the other rolls back. Until the second one is fully formed, the first one is still relevant. If UML technology were taught in the first year at universities, and if people were taught to believe that you don’t need to write code because tools do everything for you, then, in five years, there would be a generation of developers who fully believed that “only Boomers write code”. The industry would have changed, too, simply because this generation would be so accustomed to the new approach. That’s why you can’t just tell people, “That’s it – you don’t have to write code anymore!” At the very least, there will still be developers who love writing code and who are great at it.
Is that the only problem?
A programmer’s role goes beyond just writing code, as coding alone is a relatively easy task. Their job is to ensure that the code works optimally, efficiently, and incorporates the entire project’s scope, which could contain millions of lines of code. The challenging part is keeping the context of the entire project in mind and understanding why many of the details were implemented in a particular way. Perhaps in the future, we will be able to provide a neural model with a hundred million lines of code and request that a small feature be added to it without disrupting anything. For now, though, even humans struggle to get it right.
What was the most interesting result of your lab?
One of the most exciting projects that we have worked on was with the Kotlin Compiler team. To test the compiler, they use code snippets written in Kotlin and check whether the compiler handles them correctly and efficiently. At some point, all the available and easily understood kinds of code had already been checked one way or another. So they asked us to find code that was written by real people, but in ways that people don’t usually write it.
They wanted to put this strange code into the compiler and see if it would break, and how correctly every compiler stage would work in extreme cases when the code was written in an inexplicable way. Despite the process feeling like a wild goose chase, we were indeed able to find such code fragments. We even shared some of those findings with the Kotlin Development team, showcasing the creative ways people can write in their language.
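There are many possible ways to hunt for code "written the way people don't usually write it". As a hypothetical illustration (not the lab's actual method), the sketch below scores snippets by how rare their tokens are across a corpus – a simple inverse-document-frequency outlier heuristic that surfaces the snippet using constructs nobody else uses.

```python
from collections import Counter
import math
import re

def tokenize(code):
    # Crude tokenizer: identifiers, numbers, and individual symbols.
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)

def rarity_scores(snippets):
    """Score each snippet by how rare its tokens are in the corpus.

    A snippet full of tokens that almost never appear elsewhere
    gets a high score, which we treat as 'unusually written' code.
    """
    docs = [set(tokenize(s)) for s in snippets]
    df = Counter(t for d in docs for t in d)  # document frequency
    n = len(snippets)
    scores = []
    for d in docs:
        idf = [math.log(n / df[t]) for t in d]
        scores.append(sum(idf) / max(len(idf), 1))
    return scores

corpus = [
    "fun add(a: Int, b: Int) = a + b",
    "fun mul(a: Int, b: Int) = a * b",
    "fun sub(a: Int, b: Int) = a - b",
    # A deliberately odd snippet with constructs the others never use.
    "val f = { x: Int -> x }.let { it }.invoke(0).inc().dec()",
]
scores = rarity_scores(corpus)
weirdest = corpus[scores.index(max(scores))]
print(weirdest)
```

A real pipeline would work on parsed syntax trees rather than raw tokens, but the intuition is the same: rank code by how far it sits from the corpus norm.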
On the contrary, have you found anything interesting in routine tasks?
One of our ongoing projects involves mining typical changes in code, which I find fascinating both from an academic and practical standpoint. We’ve been working with a number of Python projects, using smart graph mining techniques to identify common fixes that could be automated, saving developers the trouble of making these changes manually. We found quite a few interesting things. We discovered things related to migration between different library versions to improve code efficiency, migration between language versions, and stylistic edits. We even shared some of the fixes with their authors on GitHub, asking them directly if they would find it useful to have these changes automated in a tool. More than half of them expressed interest in the idea. We’ve already implemented some of the fixes, while others were project-specific.
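A heavily simplified version of this idea can be sketched without any graph machinery. The example below is an illustration, not the lab's actual pipeline: it abstracts identifiers and literals out of one-line diffs, then counts the edit patterns that recur, such as the classic `for k in d.keys():` to `for k in d:` cleanup.

```python
from collections import Counter
import re

# Keywords and common names we keep verbatim; "STR"/"NUM" are our own
# placeholders and must survive the identifier pass below.
KEYWORDS = {"for", "in", "if", "def", "return", "keys", "items",
            "range", "len", "print", "enumerate", "STR", "NUM"}

def abstract(line):
    """Replace string/number literals and identifiers with placeholders
    so that structurally identical edits group together."""
    line = re.sub(r'".*?"|\'.*?\'', "STR", line)
    line = re.sub(r"\b\d+\b", "NUM", line)
    return re.sub(r"\b[A-Za-z_]\w*\b",
                  lambda m: m.group() if m.group() in KEYWORDS else "ID",
                  line)

def frequent_edits(changed_line_pairs, min_count=2):
    """Count (before, after) patterns over one-line changes and keep
    the ones that recur at least min_count times."""
    patterns = Counter(
        (abstract(old), abstract(new)) for old, new in changed_line_pairs
    )
    return [(p, c) for p, c in patterns.most_common() if c >= min_count]

# Hypothetical one-line changes collected from commit diffs.
pairs = [
    ("for k in d.keys():", "for k in d:"),
    ("for key in cfg.keys():", "for key in cfg:"),
    ("for name in users.keys():", "for name in users:"),
    ("x = x + 1", "x += 1"),
]
for (old, new), count in frequent_edits(pairs):
    print(f"{count}x: {old!r} -> {new!r}")
```

Grouping by abstracted patterns is what lets rare one-off edits fall away while genuinely common fixes rise to the top, which is the property that made the mined changes candidates for automation.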
Why is this cool?
IntelliJ IDEA, our flagship product, has over 2,000 inspections. They analyze your code and tell you how to improve it. Each inspection was written by developers based on their own experience. We didn’t start from the top, but from the bottom by looking at changes that real people actually make in code. We were able to discover them only because people make them so often. I believe this kind of automation to be important as it allows tools to be developed not just by programming numerous algorithms (each of the 2,000 inspections was invented, written, and debugged by someone), but by relying on an array of open data.
Do all the ML-related tasks within the company come to you?
No, there are other teams at JetBrains that focus more on product development and implementing different features end to end. They conduct the research, implement the feature, roll it out to the product, monitor how it’s being adopted, conduct A/B tests, and so on. However, they are less likely to take on risky and unconventional projects. It’s our team that specializes in tackling obscure and unusual tasks. We are responsible for seeking out ideas that have the potential to be truly innovative and useful. While some teams do approach us with specific tasks, these projects are relatively rare.
Are there any “definitely yes” markers when selecting projects?
We try to keep a foot in both the applied and academic “camps”, so to speak, but there isn’t an explicit checklist for one or the other. Of course, we want to create useful things that we can use right away. On the other hand, we understand that you can’t make anything great that way because you’ll get bogged down in routine tasks.
When it comes to applied projects, the nature of the product and its audience can greatly influence our direction. Mature products with established audiences, like PyCharm or IntelliJ IDEA, may not have much room for radical innovation, while emerging projects often have no time for us.
Research projects, on the other hand, often rely on external collaborations and accumulated expertise. Their focus is often on producing papers, presenting at conferences, sharing this experience with others, finding like-minded collaborators, and so on. Such projects typically involve exploring new models or approaches, such as developing a new way to embed changes in code, which can then be implemented elsewhere.
What is the proportion of applied ideas to academic ones?
It’s hard to say for sure, given the size of our backlog, but I’d say we have about an even split between applied and academic ideas. Right now we’ve got about 50 potential projects, but not all of them are immediately relevant or have the right people available. It’s important to me that everyone on the team is working on something they’re passionate about, not just what I want them to work on. So, it’s my responsibility to find people who are passionate about the things that I want them to work on. There have been interviews, for instance, where the interviewee talked about working in a certain area that we hadn’t explored before. We were able to break into this area by letting the new person take the reins. We got something out of it, regardless of how deep our backlog was.
In what proportion do ideas reach the prototype stage, and how often does a prototype turn into a feature?
Most of our ideas make it to the prototype stage because a poorly functioning prototype is still a prototype. We’ve only had a few projects that didn’t go anywhere.
However, not all prototypes end up becoming features in our paid products. To get there, a tool either needs a “wow” effect or it needs to be extremely reliable. Why do people use GitHub Copilot? It might not produce relevant suggestions all the time, but when it does, people get very excited – that’s the “wow” effect.
The conversion rate from prototype to feature is much lower, around 10 percent. We consider it a success when we hand a tool over to a team, and then it lives its own life, just like any other feature, which may or may not make it into the product. It may not make it for various reasons. There was one instance where we did research, the feature was rolled out to users in our EAP program, but then we faced challenges with implementing it from a UI perspective. Until a solution is found, it will sit on the shelf.
How are you evaluated at your company?
A metrics-driven approach goes against the JetBrains Research culture. We have only a few KPIs, and certainly no company-wide standards. We do not have specific targets, such as a set number of features or papers that are planned for the year.
Do you think it’s the right approach not to set KPIs for teams like yours?
I do! A good example is DARPA, the organization that created the internet. Their approach was to gather smart people, not limit them in any way, and see what they would produce. That’s how the internet and many other interesting things were created. JetBrains works the same way. We hire passionate individuals and give them a lot of freedom. How this scales up to thousands of people is a different story, but in terms of R&D, it’s good practice.
What’s wrong with metrics?
With metrics, what you measure is what grows. If your goal is to publish ten papers, you will publish them. The question is, what happens if you don’t have enough results for ten high-quality papers? You may still publish ten, but some of them just won’t be as good. In our R&D team, we share a subjective understanding of the team’s usefulness, which is supported in various ways. By incrementally building our reputation, we have come to a point where we are given the opportunity to seek out what we believe is right on our own. At the same time, the company occasionally comes up with new areas that people might be interested in exploring, and we listen to them and collaborate with different teams.
What is your planning horizon? Do you know what you’re going to do this year?
This year, I do. One area I can talk about is working with graph neural networks and other practical alternatives to transformers, which I think are underestimated and could be used more productively in our tasks. As for next year, I have a rough idea, but it’s hard to plan too far ahead in our industry because everything changes so fast. For example, powerful language models are becoming widely available. Obviously, we have to respond to that. The main value comes not from the tool itself, but from its integration. Nobody said that IDEs are no longer needed just because debuggers exist, right? No, debuggers have been seamlessly integrated into the development process. Similarly, language models need to be integrated into the development process, and vendors must be able to tackle this complex task.
Did the emergence of large language models force you to adjust your course?
Not at all. Take code written by a person: can it be refactored automatically? This task is still relevant with the advent of language models.

We can give ChatGPT a piece of code and ask how it could be improved. Interestingly, ChatGPT does provide good recommendations. However, if you then ask it to apply those same recommendations as the next step, in most cases the resulting code won’t work. Still, this illustrates a more modern way to handle such tasks: we can now simply ask the language model, “Help me make this piece of code better.”
How is the lab structured in terms of personnel?
In our team, there is a very fine line between researchers and engineers. Researchers focus on collecting data, training models, and testing them; they read and write papers, think, and come up with new tools. Engineers, on the other hand, take the existing models and implement them: they build plugins for IDEs and other products, spend more time programming, and figure out how to integrate ideas into our tools.
How different are the requirements for them?
From engineers, we expect high-quality and predictable programming, as well as an understanding of algorithms and data structures. Their interviews are therefore similar to those conducted for other teams and projects.
There are far fewer researchers than engineers because of the requisite background and need for a good understanding of mathematics. Our ideal candidate for a researcher is someone who has worked in our area of code analysis, but unfortunately, these individuals are few and far between. We more commonly receive applications from people in related fields, such as natural language processing or computer vision.
Is it always scientists or do experts from business and industry also have a place on your team?
Practical experience in machine learning is not easily converted into research experience. We are in the middle ground between industry and academia. Those who have applied machine learning extensively in commercial activities, such as in banking, have a different experience and different goals. Their primary focus is to solve a specific problem and improve a particular metric, while our team needs to understand how a model works and adapt it to our specific tasks. Not everyone has this type of experience. We interview a lot of people, but we don’t hire many.
What is your conversion rate?
During our last hiring cycle, we got around 200 applications. We ended up interviewing 40 of those people and hired two. It’s a tough job that requires a certain set of skills, which not everyone has.
Are soft skills important?
It’s important to fit in with our informal and flat structure, which doesn’t work for everyone. We need responsible and disciplined individuals who don’t require micromanagement, and who proactively communicate with others in the team. This is very important. I believe that software engineering is all about people, not just computers. After all, code is written by people, for people. A team lead, for example, needs to be a psychologist and sociologist, in a sense, to be able to build a team and get that team to work as a unified whole. Commercial software development is all about bringing together experts from different areas and forming a team that is more functional together than each member individually. This requires a structured and understanding approach, and there is no one-size-fits-all formula for team organization or hiring decisions.
Aside from hard skills, what else do you pay attention to when hiring?
I see an interesting correlation: If a person has a PhD in any field, then we will most likely have something to talk about.
A PhD can say a lot about a person. They are likely to be independent, able to think deeply, and have written several papers on a single topic, even if it’s not directly related to our field. We’ve hired people from completely different areas, such as string theory or lasers. This gives us both diversity and valuable external ideas. This is important in R&D because some ideas may never occur to me or anybody else on the team simply because we are used to thinking in a certain way. Teaching experience is also a plus if they have the technical skills we’re looking for. And a PhD is not a strict requirement either, as there are exceptions in both directions. I’ve seen people who just ended up in a group with a great supervisor who was good at setting goals. But then again, if a person doesn’t have a PhD, it doesn’t mean that they can’t think independently and learn new things.
What kind of training is needed?
In R&D, you need to have a solid grasp of fundamentals, including statistics, probability theory, and linear algebra, which is the basis of modern neural models. The specifics of R&D in our field make the learning curve steeper. You need to understand how software development works. But everything is achievable. Interns come to us and stay, so I see the path to becoming an independent researcher quite clearly. It takes a long time, requires effort and discipline, but it is doable.
And finally, do you think curiosity can be trained?
I believe it’s innate. Either a person wants to understand how things work, or they don’t. I find it hard to imagine a researcher who is forced to be curious. While some people may be good at following directions and completing assigned tasks, it takes a special kind of curiosity to invent something new.