Duplicate Finder, Part of ReSharper Command Line Tools

Along with ReSharper 8 EAP earlier this year, we made ReSharper Command Line Tools available for you to download and try. We have already written about one of the tools included in this package, InspectCode, which analyzes your code outside of Visual Studio using hundreds of ReSharper code inspections. But the package also includes another tool, dupFinder, and we’ll take a closer look at it in this post.

As its name suggests, dupFinder finds duplicates in C# and Visual Basic .NET code. Being a JetBrains tool, dupFinder does it in a smart way. By default, it considers code fragments duplicates not only if they are identical, but also if they are structurally similar, even if they contain different variables, fields, methods, types, or literals. Of course, you can configure the allowed similarity level as well as the minimum relative size of duplicated fragments.
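
To make the idea of structural similarity more concrete, here is a toy sketch in Python. It is not dupFinder’s algorithm, which this post does not describe: it merely tokenizes a line of C# with a regular expression and replaces identifiers and literals with placeholders, so that fragments differing only in names or literal values compare equal. The names normalize, looks_duplicate, and discard_identifiers are hypothetical, and the single identifier flag lumps together what dupFinder splits into several discard-* options.

```python
import re

# Toy tokenizer for a tiny subset of C#: string literals, numbers,
# identifiers/keywords, and single punctuation characters.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|\d+(?:\.\d+)?|[A-Za-z_]\w*|\S')
KEYWORDS = {"if", "else", "return", "new", "var", "null", "true", "false"}


def normalize(fragment, discard_literals=False, discard_identifiers=False):
    """Map a fragment to tokens, hiding differences that should not block a match.

    With a discard_* flag left at False (the default, as for dupFinder's
    options), tokens of that kind are replaced by a placeholder, so fragments
    differing only in them still compare equal.
    """
    tokens = []
    for tok in TOKEN.findall(fragment):
        if tok.startswith('"') or tok[0].isdigit():
            tokens.append(tok if discard_literals else "<LIT>")
        elif tok[0].isalpha() or tok[0] == "_":
            keep = tok in KEYWORDS or discard_identifiers
            tokens.append(tok if keep else "<ID>")
        else:
            tokens.append(tok)
    return tokens


def looks_duplicate(a, b, **options):
    """Hypothetical helper: True if both fragments normalize to the same tokens."""
    return normalize(a, **options) == normalize(b, **options)


a = 'myStatusBar.SetText("Logging In...");'
b = 'myStatusBar.SetText("Not Logged In");'
print(looks_duplicate(a, b))                         # True
print(looks_duplicate(a, b, discard_literals=True))  # False
```

Real duplicate detection works on syntax trees and whole fragments rather than single lines; the sketch only shows why "different literals" need not mean "different code".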

Running Duplicate Analysis

dupFinder is not exactly a new kid on the block. For quite a while, JetBrains TeamCity has included it out of the box, and this is probably the easiest and most efficient way to make use of dupFinder. However, from now on you can also run it with your own CI, version control, quality control, or any other server. Here is how:

  1. Download and unzip ReSharper Command Line Tools
  2. Run the following command: dupFinder [OPTIONS] source

One way to define the target sources is to specify a solution file: dupFinder understands solution files of Visual Studio 2003, 2005, 2008, 2010, and 2012. Alternatively, you can provide a specific list of source files as a set of newline-delimited wildcards.

Configuring Options

Using optional parameters, you can configure how dupFinder analyzes your source code. Options that take a value are written as /option=value. To explore the full list of options, run dupFinder /help. Below are some of the options that you might be interested in:

  • /exclude allows excluding files from duplicate code search. The value is a set of newline-delimited wildcards (for example, **Generated*.cs). Note that the paths should be either absolute or relative to the working directory.
  • /exclude-by-comment and /exclude-code-regions allow excluding files by substrings of their opening comments, and code regions by substrings of their names. The value is a set of newline-delimited keywords (e.g. ‘generated code’ will exclude regions containing ‘Windows Form Designer generated code’).
  • /discard-fields, /discard-literals, /discard-local-vars, and /discard-types specify whether to filter out similar fragments as non-duplicates if they have different variables, fields, methods, types, or literals. The default value for all of them is ‘false’. To illustrate how this works, consider two otherwise identical code fragments: one contains myStatusBar.SetText("Logging In...");, the other contains myStatusBar.SetText("Not Logged In");. If ‘discard-literals’ is set to ‘false’, these fragments are considered duplicates.
  • /discard-cost allows setting a threshold for code complexity of duplicated fragments. The fragments with lower complexity are discarded as non-duplicates. The value for this option is provided in relative units.
    Using this option, you can filter out equal code fragments that present no semantic duplication. For example, tests often contain statements like Assert.AreEqual(gold, result);. If the ‘discard-cost’ value is less than 10, statements like that will appear as duplicates, which is obviously unhelpful. You’ll need to experiment with this value to find a balance between avoiding false positives and missing real duplicates; the right value will differ from one codebase to another.
  • /show-text: if this parameter is used, detected duplicate fragments will be embedded into the report.
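
As a sketch of how these options combine into one invocation, the Python helper below assembles an argument list. The /option=value syntax for value-taking options and the /output option come from the author’s replies in the comments; the helper name, its parameters, and the choice of ‘;’ to separate several exclude wildcards are assumptions.

```python
def build_dupfinder_args(source, exe="dupFinder.exe", output=None, exclude=(),
                         show_text=False, discard_literals=False,
                         discard_cost=None):
    """Assemble a dupFinder command line as a list suitable for subprocess.run.

    Value-taking options use the /option=value form. Boolean switches are
    passed bare, the way /show-text is written in the comments below;
    treating /discard-literals the same way is an assumption.
    """
    args = [exe]
    if output is not None:
        args.append(f"/output={output}")
    if exclude:
        # Assumption: several wildcards joined with ';' (the post describes
        # newline-delimited values; the author's later replies use semicolons).
        args.append("/exclude=" + ";".join(exclude))
    if show_text:
        args.append("/show-text")
    if discard_literals:
        args.append("/discard-literals")
    if discard_cost is not None:
        args.append(f"/discard-cost={discard_cost}")
    args.append(source)  # the solution file or source wildcard comes last
    return args


print(build_dupfinder_args("SolutionWithDuplicates.sln", output="dupReport.xml",
                           show_text=True, discard_cost=10))
```

With dupFinder on the PATH, the returned list can be passed to subprocess.run(args, check=True), which takes care of quoting arguments that contain spaces.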

Understanding Output

The resulting output is a single XML file that presents the following information:

  • The Statistics node is an overview of analyzed code, where CodeBaseCost is the relative size of target source code, TotalFragmentsCost is the relative size of the code for analysis after applying filters (‘discard-cost’, ‘discard-literals’, etc.), and TotalDuplicatesCost is the relative size of detected duplicates.
  • The Duplicates node contains Duplicate nodes, which in turn contain two or more Fragment elements.
  • Each Duplicate node has a Cost attribute: duplicates with greater cost are the most important ones as they potentially present greater problems.
  • Each Fragment element contains the file name and the location of the duplicated piece, given in two alternative ways: as a file offset range and as a line range. If the /show-text option was enabled for analysis, a Text node with the duplicated code is added to each fragment.
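
A short Python sketch of consuming this output: it reads the Statistics values and lists duplicates with the most expensive first, using the standard xml.etree.ElementTree module. The Statistics, Duplicates, Duplicate, and Fragment names and the Cost attribute are taken from the description above; the root element name, the FileName element, and the Start/End attributes on LineRange are assumptions about the exact schema, so check them against a real report first.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical sample. Statistics, the *Cost elements, Duplicates, Duplicate
# (with Cost) and Fragment come from the post; the root name, FileName, and
# LineRange Start/End attributes are assumptions.
SAMPLE = """<DuplicatesReport>
  <Statistics>
    <CodeBaseCost>1200</CodeBaseCost>
    <TotalFragmentsCost>900</TotalFragmentsCost>
    <TotalDuplicatesCost>150</TotalDuplicatesCost>
  </Statistics>
  <Duplicates>
    <Duplicate Cost="40">
      <Fragment><FileName>A.cs</FileName><LineRange Start="3" End="9"/></Fragment>
      <Fragment><FileName>B.cs</FileName><LineRange Start="12" End="18"/></Fragment>
    </Duplicate>
    <Duplicate Cost="110">
      <Fragment><FileName>C.cs</FileName><LineRange Start="1" End="30"/></Fragment>
      <Fragment><FileName>D.cs</FileName><LineRange Start="5" End="34"/></Fragment>
    </Duplicate>
  </Duplicates>
</DuplicatesReport>"""


@dataclass
class Fragment:
    file_name: str
    start_line: int
    end_line: int


def read_statistics(root):
    """Return the Statistics children as a {tag: value} dict."""
    return {child.tag: int(child.text) for child in root.find("Statistics")}


def read_duplicates(root):
    """Return (cost, [Fragment, ...]) pairs, most expensive first."""
    result = []
    for dup in root.iter("Duplicate"):
        fragments = []
        for frag in dup.findall("Fragment"):
            lines = frag.find("LineRange")
            fragments.append(Fragment(frag.findtext("FileName", ""),
                                      int(lines.get("Start")),
                                      int(lines.get("End"))))
        result.append((int(dup.get("Cost")), fragments))
    # Greater cost first: these potentially present the greater problems.
    result.sort(key=lambda item: item[0], reverse=True)
    return result


root = ET.fromstring(SAMPLE)
stats = read_statistics(root)
print(f"duplicated share: {stats['TotalDuplicatesCost'] / stats['CodeBaseCost']:.1%}")
for cost, frags in read_duplicates(root):
    print(cost, ", ".join(f"{f.file_name}:{f.start_line}-{f.end_line}" for f in frags))
```

For a real report, replace ET.fromstring(SAMPLE) with ET.parse("dupReport.xml").getroot().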

Practical Use

We are now ready to put dupFinder into practice. In the steps below, we’ll take a solution, e.g. SolutionWithDuplicates.sln, and see how to run duplicate analysis from an MSBuild target and produce a simple HTML report from the dupFinder output.

Step 1

First, we unzip ReSharper Command Line Tools somewhere, e.g. to C:\programs\CLT.

Step 2

Now let’s think ahead to processing the dupFinder output. If we leverage the /show-text option, we’ll be able to build an HTML report by applying an XSL transformation to the dupFinder XML output; something like this will do:

We put this XSL stylesheet with the rest of the tools into C:\programs\CLT.
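
The stylesheet itself is not reproduced here. If you would rather avoid XSLT, a comparable report can be produced with a few lines of Python; note that this swaps in a different technique (ElementTree plus html.escape) for the XSL transformation used in this walkthrough. It depends on the Text node that /show-text adds; the FileName element name and the function names are assumptions.

```python
import html
import xml.etree.ElementTree as ET


def render_report(root):
    """Build a minimal HTML page from a dupFinder report element.

    Relies on the Text node that /show-text adds to each Fragment;
    the FileName element name is an assumption.
    """
    parts = ["<html><body><h1>Duplicates</h1>"]
    for number, dup in enumerate(root.iter("Duplicate"), start=1):
        cost = html.escape(dup.get("Cost", "?"))
        parts.append(f"<h2>Duplicate {number} (cost {cost})</h2>")
        for frag in dup.findall("Fragment"):
            name = html.escape(frag.findtext("FileName", ""))
            code = html.escape(frag.findtext("Text", ""))  # empty without /show-text
            parts.append(f"<h3>{name}</h3><pre>{code}</pre>")
    parts.append("</body></html>")
    return "\n".join(parts)


def write_report(xml_path, html_path):
    """Read a dupFinder XML report and write the HTML page to html_path."""
    page = render_report(ET.parse(xml_path).getroot())
    with open(html_path, "w", encoding="utf-8") as out:
        out.write(page)
```

For example, write_report("dupReport.xml", "dupReport.html") would stand in for the XSL transformation step.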

Step 3

The easiest way to run duplicate analysis and the ensuing transformation is to specify a new MSBuild target. Starting from the solution directory, we go into one of its project subdirectories, open the project file (*.csproj) in a text editor, and add the following element to the root <Project> node:

In this build target, which executes after the project build is finished, we move the working directory one folder up, from the project directory to the solution directory, run dupFinder, and then apply an XSL transformation to the dupFinder output using our XSL stylesheet.
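
If you drive your builds from a script instead of MSBuild, the same post-build sequence can be sketched in Python. The working directory is the parent of the project directory, i.e. the solution directory, so relative paths resolve against it. C:\programs\CLT and dupReport.xml come from this walkthrough and /output and /show-text from the comments; the function names are hypothetical, and the XSL step is left out because the stylesheet is not reproduced here.

```python
import subprocess
from pathlib import PureWindowsPath


def plan_duplicate_analysis(project_file, solution_name,
                            tools_dir=r"C:\programs\CLT"):
    """Return (working_dir, argv) for running dupFinder after a project build.

    The working directory is one folder above the project directory, i.e.
    the solution directory.
    """
    project_dir = PureWindowsPath(project_file).parent
    solution_dir = project_dir.parent
    argv = [str(PureWindowsPath(tools_dir, "dupFinder.exe")),
            "/output=dupReport.xml", "/show-text", solution_name]
    return str(solution_dir), argv


def run_duplicate_analysis(project_file, solution_name):
    """Run dupFinder from the solution directory (Windows only)."""
    cwd, argv = plan_duplicate_analysis(project_file, solution_name)
    # check=True raises CalledProcessError if dupFinder exits with a non-zero code.
    subprocess.run(argv, cwd=cwd, check=True)
```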

Step 4

Finally, all we have to do is to build our solution. If everything goes right, we’ll get two new files in the solution directory: dupReport.xml and dupReport.html. If we open dupReport.html, we can look through the list of all detected duplicates right in the web browser:
HTML report

This simple example can be extended and customized in many ways, but we hope it shows you that there is nothing difficult in integrating ReSharper Command Line Tools into your workflow.


30 Responses to Duplicate Finder, Part of ReSharper Command Line Tools

  1. Armin says:

    September 4, 2013

    Code Cleanup/Reformat Code

    I generate .cs files using a WPF Application.
    These .cs files are checked into Microsoft Team Foundation Server source control.

    Would be nice to have a commandline tool to do a code formatting.

  2. Dmitry Matveev says:

    September 4, 2013

    Thanks for your feedback, Armin
    Yes, we’ve already thought about it, so something like that may appear in the next major release of ReSharper.

  3. Dave Solomon says:

    September 4, 2013

    I’m not having any luck getting the /exclude option to work. The command line that I’m running is “dupfinder /exclude **Test*.cs .sln”

    The output I get is

    “Duplicates Finder for .NET
    Running in 64-bit mode, .NET runtime 4.0.30319.18052 under Microsoft Windows NT
    6.1.7601 Service Pack 1
    dupFinder: Invalid option ‘exclude’. Error: Option should be boolean”

    I’ve also tried **/**.Test*.cs, ****.Test*.cs and a few other globs; none of them have worked.

    Sort of related to this, the command line help doesn’t go any further in specifying the delimiter. I’d expect commas or semicolons; the page here says it expects newline-delimited tokens, which only makes sense if /exclude is looking for a file of exclude patterns. If it is, then neither the command line help nor this page makes that clear.

  4. Dmitry Matveev says:

    September 5, 2013

    Hi Dave,
    Sorry if I didn’t make it clear, but all optional parameters that receive values have the following format: /option=value
    So your example should look like:
    dupfinder /exclude=**Test*.cs YourSolution.sln

  5. Andy says:

    September 5, 2013

    slightly improved dupfinder.xsl

  6. Andy says:

    September 5, 2013

    @Dave this has worked for me:

  7. PaulB says:

    September 6, 2013

    I’m having the same issue as Dave.

    The /exclude option just isn’t working for me at all.

    I tried every syntax combination, including the one specified by Dmitry, and including using an external file with the patterns one-per-line, and in every case, I get the EXACT same output. Which includes all the *Test.cs files. Nothing I do seems to filter them out.

    I’m thinking this functionality is just plain broken… or at least ONE of the many permutations I did, based on very vague and unclear documentation, would have worked.

  8. Dmitry Matveev says:

    September 6, 2013

    Hi Paul,
    I mentioned above that the exclude paths should be either absolute or relative to the working directory.
    The best way is to run dupfinder from the solution directory: open the command line in the solution directory and then run dupfinder using the full path to its executable file. E.g.:
    C:\programs\CLTools\dupfinder.exe /exclude=**Tests.cs YourSolution.sln

  9. PaulB says:

    September 6, 2013

    This is the command line I’ve been using:

    dupfinder.exe /show-text /exclude=**Test*.cs /output=D:FullPathToOutputFile.xml C:FullPathToMySolution.sln

    This is what I would really prefer… I want to run it on a bunch of solutions without having to cd to every single solution in between… that’s very awkward, especially given that in this situation the tool, the solution, and the output are all on different drives.

    Given that it works just fine like this *except* for the fact that the /exclude is ignored, can I put in a request to fix this (in my opinion) bug (or think of it as an enhancement if you must), so that I don’t have to run it from the solution directory to get it to work?

    I have verified that what you’ve suggested seems to work, but I cannot stress enough how awkward and unintuitive that is for me. I really want to just sit in one place and run it on all the solutions… that’s so much easier and more intuitive.

    It would also be nice if you could give the name of the XSL file on the command line, so the output could have that line generated right in it, so I wouldn’t have to open the file in Notepad++, paste in the xml-stylesheet line, save, and then go look at it in the browser. Again, just going for ease of use here… trying to automate things is difficult as it is right now.

  10. Dmitry Matveev says:

    September 9, 2013

    Thanks for the feedback, Paul!
    To make /exclude work in your example, use /exclude="C:**Test*.cs"

  11. rodmanwu says:

    September 13, 2013

    This tool is great, but can it support VS2013?

  12. Nuwan says:

    September 16, 2013

    Hi Dmitry

    I want to exclude multiple files. According to the documentation,
    /exclude allows excluding files from duplicate code search. The value is a set of newline-delimited wildcards (for example, **Generated*.cs).
    I was able to exclude files that end with the word “Map” with the following command
    I also need to specify another wildcard. I tried a few combinations without any luck.
    Any help will be appreciated.


  13. Dmitry Matveev says:

    September 18, 2013

    Yes, it supports VS2013 solution files.
    But note that dupFinder uses solution files for only one purpose, to get the list of included source files, so you can always specify just a list of source files as input.

  14. Nick Dunets says:

    September 19, 2013

    How can I make dupFinder display duplicates as warnings or errors in the VS build output? I’d like to make it more annoying for other developers.

  15. Grady Werner says:

    September 23, 2013

    If you want to use Twitter Bootstrap Tabs to easily compare fragments in your output file, use this for the XSL transform:

  16. Dmitry Matveev says:

    October 2, 2013

    Hi All, Sorry for being away for some time.
    @Nuwan, in your case, it looks like we have an issue with delimiters. You can watch the issue and comment on it.
    @Nick, I think this link will help you.
    @Grady, good idea, thanks.
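
One possible approach to Nick’s question, sketched in Python: post-process the XML report and print each fragment in MSBuild’s canonical file(line): warning CODE: text message format, which MSBuild can recognize as a warning when the script runs through an Exec task. The code DUP001 and the message wording are made up, and the FileName and LineRange names are assumptions about the report schema.

```python
import xml.etree.ElementTree as ET


def duplicate_warnings(root, code="DUP001"):
    """Yield one MSBuild-style 'file(line): warning CODE: text' line per fragment.

    DUP001 is a made-up code; FileName and LineRange/@Start are assumed names.
    """
    for dup in root.iter("Duplicate"):
        fragments = dup.findall("Fragment")
        for frag in fragments:
            start = frag.find("LineRange").get("Start")
            others = len(fragments) - 1
            yield (f"{frag.findtext('FileName')}({start}): warning {code}: "
                   f"fragment duplicated in {others} other place(s), cost {dup.get('Cost')}")


report = ET.fromstring(
    '<DuplicatesReport><Duplicates><Duplicate Cost="40">'
    '<Fragment><FileName>A.cs</FileName><LineRange Start="3" End="9"/></Fragment>'
    '<Fragment><FileName>B.cs</FileName><LineRange Start="12" End="18"/></Fragment>'
    '</Duplicate></Duplicates></DuplicatesReport>')
for line in duplicate_warnings(report):
    print(line)
```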

  17. Colin Bowern says:

    October 15, 2013

    Is there any support for importing the Inspections output files into TeamCity, similar to the duplicates finder? I keep the details of the build stages in psake instead of codifying every step in TeamCity itself. I see there is already one for the duplicates finder, “ResharperDupFinder”.

    • dmitry.matveev says:

      March 25, 2014

      Hi Colin,
      If you mean the InspectCode reports, then no, currently TeamCity does not support them. However, there are plans to implement that in the future.

  18. Alon Golub says:

    December 19, 2013

    Yep, I must be an idiot!

    I have tried numerous settings and am unable to “exclude” any files. I’ve spent over an hour trying different mutations and read numerous sites/articles, all with no luck..

    Any help would be appreciated 🙂
    My Solution file is here …
    “C:\1 Development\Dashboard System\Source Code\5 Solution Files\E-Tabs.Dashboard.System”
    The source code projects are located in numerous folders from this root …
    “C:\1 Development\Dashboard System\Source Code”

    I created a batch file (just want to use the command line) … to exclude ALL
    *generated.cs and *Designer.cs in any subfolder in any project.

    The batch file is in the solution folder and looks like this ..

    # Generate Duplicate Report
    "C:10 Dev ToolsReSharperCLTdupfinder.exe" /output="E-Tabs.Dashboards.DuplicateReport.xml" /show-text /exclude="WHAT GOES HERE" "E-Tabs.Dashboard.System.sln"
    # Transform File, don’t forget to change Encoding
    "C:10 Dev ToolsReSharperCLTmsxsl.exe" E-Tabs.Dashboards.DuplicateReport.xml "C:10 Dev ToolsReSharperCLTTransform.xslt" -o "E-Tabs.Dashboards.DuplicateReport.html"

    What should I use for my “exclude” property? I have tried at least 15 variations for exclude and am stuck here ;(

    Thanks in Advance for this! (Note, I would have not known about this tool, except I upgraded to ReSharper 8.1 today and it was mentioned in the blog post!)

  19. dmitry.matveev says:

    December 23, 2013

    Hi Alon,

    If you are running your script in the solution directory, then use
    /exclude="**generated.cs; **Designer.cs"
    otherwise use
    /exclude="C:**generated.cs; C:**Designer.cs;"
    C is the drive letter.
    Please answer whether it helps.
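
Following the replies above, dupFinder can be run without changing to the solution directory if every exclude wildcard is anchored to an absolute directory. A small Python sketch of that idea: it prefixes relative patterns with a base directory and joins them with semicolons, as in the examples above. The helper name absolute_excludes is hypothetical.

```python
from pathlib import PureWindowsPath


def absolute_excludes(patterns, base_dir):
    """Build an /exclude= argument with every wildcard made absolute.

    Patterns that are already absolute (drive plus root) are kept as-is;
    relative ones are prefixed with base_dir.
    """
    anchored = []
    for pattern in patterns:
        p = PureWindowsPath(pattern)
        anchored.append(str(p if p.is_absolute() else PureWindowsPath(base_dir, p)))
    return "/exclude=" + ";".join(anchored)


print(absolute_excludes([r"**\*generated.cs", r"**\*Designer.cs"], r"C:\Src"))
```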

  20. Peter Brightman says:

    July 8, 2014

    What’s the OffsetRange? Byte-offset or character-offset?

    • Dmitry Matveev says:

      April 6, 2016

      Hi Peter,
      It’s a character offset range.

  21. William Duffy says:

    July 15, 2014

    It seems that dupfinder chokes on the VS Setup project associated with my solution.
    The first line in the vdproj file is

    However, I assume there is a lot more it doesn’t like.
    I tried adding an exclude of **.vdproj, but this didn’t change anything.

    Here is the error text:
    Error 1598 The project file could not be loaded. Data at the root level is invalid. Line 1, position 1. C:\DEV\DashLite_Device_Application\DashLite Setup\DashLite Setup.vdproj at (1:1) C:\DEV\DashLite_Device_Application\Astro-Med.DashLite\EXEC
    Error 1599 The command “”C:\Program Files\JetBrains\CommandLine\dupfinder.exe” /exclude=”**generated.cs; **Designer.cs; **.vdproj” /output=”dupCommonReport.xml” /show-text “Astro-Med.sln”” exited with code -1. C:\DEV\DashLite_Device_Application\Astro-Med.DashLite\Astro-Med.DashLite.csproj 1439

  22. Jacques says:

    February 11, 2015

    Maybe a stupid question, but do I need the full version of RS installed for this to work? I downloaded the command line tools, and when I run them I just get a lot of red text (exceptions) from the utility.

  23. Patrice says:

    March 13, 2015

    I had the same problems; you need to unblock the zip file in its properties before extracting it.

  24. Paulo Pires says:

    May 14, 2015

    I’m trying to use dupfinder on a solution with VB.NET and C# projects, but the tool only reports duplication in C# files. How can I force it to run on VB code?

    Thanks in advance

  25. Tofi says:

    July 6, 2015

    I’m very new to this so the instructions are really unclear to me.
    My first issue: You lost me at step 3. How do I run dupFinder? Can it be done through an executing program? ‘specifying a new MSBuild target’ What does that mean? What options do I have to run it without dealing with solution directories? I am dealing with .cs files in a directory.

    The other thing is, how do I pass a list of source files to it? I want to compare a bunch of source files to a known source file and then include options to single out the most similar source file. The output would then be a diff of the known source file and the most similar file, and then I would transform it to html. This is all I’m trying to do. If dupFinder is not capable of finding the most similar source file, I would be content with a diff of all source files.

  26. Alexander says:

    September 26, 2016

    It didn’t work with \exclude but it worked with -exclude.

    Is there some general rule for that, which I’m missing?

    • Dmitry Matveev says:

      September 26, 2016

      Sorry, but this post is 3 years old and a bit outdated. The up-to-date documentation for dupFinder is here.

  27. Ciaran Gallagher says:

    August 28, 2018

    This is rad.
