The Inspection Connection – Issue #4, Duplication Detection
One of the core principles of computer programming is eliminating manual repetition. Not only is repeated code time-consuming to read and review, it is disproportionately complex to maintain and provides a ready source of bugs when one copy changes. Yet in spite of this, duplicating some code in a large project may be inevitable. IntelliJ IDEA offers a powerful inspection for detecting and deduplicating repetitive code. You can activate this inspection by navigating to the menu entry Analyze|Locate Duplicates, or by pressing Ctrl/⌘+Alt/⌥+Shift+I then typing, “Duplicated Code.”
As it turns out, “Don’t Repeat Yourself” has some notable exceptions, especially when writing unit tests, and occasionally in some rare cases when attempting to outsmart the compiler. IntelliJ IDEA gives you the opportunity to exclude test sources, since many prefer a little duplication in their tests to improve overall readability. Nevertheless, the choice is yours.
When analyzing a large set of files, you should carefully consider the scope of the inspection you are going to apply. There are a number of algorithms for detecting duplicate code – this one will scan for duplicates by comparing AST fragments. On an Intel i7-2640M with 8GB of RAM, scanning a codebase with 53,000 LOC will take about 3 minutes, or ~300 LOC/second. However your mileage may vary depending on the minimum duplicates length, line density and other factors. When searching for exact matches, the time complexity is effectively linear w.r.t. the number of nodes in your AST.
When searching for duplicate code, we may want to discard duplicates shorter than some minimum threshold or add a similarity metric in order to detect fuzzy-matches. IntelliJ IDEA lets us define these values, but keep in mind they do not correspond to characters or lines, but rather arbitrary weights. If “Visible from outside of the current scope only” is selected, only subexpressions which are valid outside of their parent construct are anonymized. If you would like to learn more how all of this works, head over to the source.
After scanning, the Duplicates Tool Window is a convenient area to browse through duplicate regions and view a side-by-side diff at each location. From here you can clearly see which sections are structurally similar and flag them for review or refactor them immediately. The highlighter will help you quickly identify which blocks are different.
When you have found a duplicate block of code, it can be easily simplified by pressing Ctrl/⌘+Alt/⌥+M to apply the Extract → Method refactoring. IntelliJ IDEA will transform all inbound variables inside the selected region with parameters and provide a helpful dialog to extract the method, which you can use to replace any other duplicates inside the analysis scope.
This inspection is also available in TeamCity, where you can assign a dedicated build runner to periodically scan your project for duplicate code. This may be an even better option if your organization needs regular assurance audits for clean-room engineering. Plagiarism detection is another important use case for the academic community, and if you are a student or teacher, IntelliJ IDEA is now completely free.
Subscribe to Blog updates
Thanks, we've got you!
The Best of IntelliJ IDEA’s HTTP Client
In today's fast-paced tech world, easy and efficient management of web services and APIs is crucial. That's where IntelliJ IDEA's HTTP Client steps in, turning complex HTTP requests into a walk in the park. The HTTP Client is a feature-rich, text-based tool in IntelliJ IDEA Ultimate. It seamlessl…
Profile on the Fly With the IntelliJ Profiler
At JetBrains, we believe that profiling should be simple and accessible to all developers, regardless of their level of expertise. That’s why the IntelliJ Profiler puts ease of use and seamless integration at the forefront of its mission. In previous releases, we introduced a range of features to s…
History of Gradle Delegation Process in IntelliJ IDEA
Nowadays, a build tool is an essential part of any project, and IntelliJ IDEA integrates with the majority of them. Among the most popular build tools is Gradle. We’d like to give a brief historical overview of its integration with IntelliJ IDEA and the delegation process implemented inside the I…
How to Work With Protobuf-Maven Projects in IntelliJ IDEA
Protobuf, short for Protocol Buffers, is a language-agnostic data serialization format developed by Google. It is designed to efficiently and reliably serialize structured data to communicate between different systems and programming languages. Protobuf offers a compact binary representation of data…