BeamSearch in code generation
In the previous article devoted to full-line code completion, we looked into the vocabulary that the neural net of our full line completion plugin uses for Python. However, just having 16384 tokens like self., or, s.append(, return value, and others described in the article is not enough to generate even a single line. We need a way to combine these tokens together to write chunks of code. In today’s article, we will discuss how the algorithm constructs longer phrases using the elements of the vocabulary. The first idea that deserves mentioning is autoregression.
Looking at Python through the eyes of a neural net
The JetBrains full line code completion plugin for Python is now available as a public beta. We would like to talk about some of the technologies and algorithms used to create the plugin and share statistics about Python programming that we’ve collected in the process. What is “full line code completion?” You are probably already familiar with code completion, the kind that suggests the next word the user is typing. If you are not, we have covered it in a series of articles (one, two, three, four). Full line code completion extends the service by suggesting larger fragments of code.
Code Completion, Episode 4: Model Training
The previous articles from the series covered the following topics: In the first episode, we discussed general code completion scenarios. The second episode was devoted to the difficulties of heuristics-based implementation and explaining the necessity of machine learning. In the third episode, we described the data we collect from IDEs to train the completion ranking algorithm. We would like to talk about the difficulties specific to our task and share the ways to overcome them that we found. Due to the data protection requirements described in the third article, everything we collect
Code Completion, Episode 3: Where Is the Dataset?
As we discovered in the second installment of this series, a modern code completion system needs machine learning to rank the suggestions most effectively. Machine learning has one thing in common with human learning: it requires data to extract knowledge. There are many aspects to that process. We use so-called supervised learning for code completion, and that involves feeding the algorithm a set of example problems for which the correct action is known. The algorithm then finds common patterns and dependencies and learns to take the proper actions even in situations it hasn’t already seen
Code Completion, Episode 2: Why Machine Learning?
“Why not order by what’s logical? I don’t understand.” (From a bug report.) In episode 1, we learned about the principal components of the code completion system and discussed its usage patterns and quality requirements. Today, let’s look into what reasons we have to employ machine learning aside from just following the hype. Replacing “code that works” with a machine-generated binary that uses extra memory, may slow performance, and is difficult to interpret is a tough decision. Previously, we used a set of heuristics. They worked well unless they conflicted with each
Code Completion, Episode 1: Scenarios and Requirements