dotPeek 1.2 Is Released

We have finally finished fine-tuning the new version of dotPeek, our free .NET decompiler and assembly browser. Please download dotPeek 1.2, which gains a new superpower and learns how to walk in symbol server shoes.


Highlights of this update include:

  • dotPeek can be used as a symbol server and provide the Visual Studio debugger with the information required to debug assembly code. Not only can dotPeek generate source files at the debugger's request, but you can also generate PDB files manually with dotPeek and watch the progress in a dedicated tool window.
  • Process Explorer window that provides you with the list of all currently running processes and lets you explore their modules and decompile those that are .NET assemblies.
  • Quick search and node filtering in Assembly Explorer, with lowerCamelHumps supported.

For the full list of fixes and enhancements addressed in version 1.2, take a look at the release notes.


Get dotPeek


Unusual Ways of Boosting Up App Performance. Boxing and Collections

This is the first post in the series. The other ones can be found here:

Many developers today are familiar with the performance profiling workflow: you run an application under the profiler, measure the execution times of methods, identify methods with high 'own time,' and work on optimizing them. This scenario, however, does not cover one important performance aspect: the time spread across the numerous garbage collections in your app. Of course you can evaluate the total time required for GC, but where does it come from, and how can you reduce it? 'Plain vanilla' performance profiling won't give you any clue about that.

Garbage collections always result from high memory traffic: the more memory is allocated, the more must be collected. As we all know, memory traffic optimization should be done with the help of a memory profiler. It allows you to determine how objects were allocated and collected and what methods stand behind these allocations. Looks simple in theory, right? In practice, however, many developers end up saying, "Okay, so some traffic in my app is generated by some system classes whose names I see for the first time in my life. I guess this could be because of some poor code design. What do I do now?"

This is what this post is about. Actually, this will be a series of posts where we share our experience of memory traffic profiling: what we consider 'poor code design,' how to find its traces in memory, and, of course, what we consider best practices.* Here's a simple example: if you see objects of a value type in the heap, boxing is surely to blame. Boxing always implies an additional memory allocation, so removing it is very likely to make your app better.

This first post in the series focuses on boxing: where to look, and how to act if this 'bad memory pattern' is detected.

*Best practices described in this series allowed us to increase the performance of certain algorithms in our .NET products by 20%-50%.

What Tools You Will Need

Before we go any further, let’s look at the tools we’ll need. The list of tools we use here at JetBrains is pretty short:

  • dotMemory memory profiler.
    The profiling algorithm is always the same regardless of the issue you’re trying to find:

    1. Start profiling your application with memory traffic collection enabled.

    2. Collect a memory snapshot after the method or functionality you’re interested in finishes working.

    3. Open the snapshot and select the Memory Traffic view.

  • ReSharper plugin called Heap Allocations Viewer. The plugin highlights all places in your code where memory is allocated. This is not a must, but it makes coding much more convenient and in some sense ‘forces’ you to avoid excessive allocations.

Boxing

Boxing is converting a value type to the object type. For example:

Boxing example
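Here is a minimal sketch of what such a conversion looks like in code:

    int i = 42;
    object o = i;      // boxing: the value is copied from the stack into a new object on the heap
    int j = (int)o;    // unboxing: the value is copied back from the heap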

Why is this a problem? Value types are stored on the stack, while reference types (object) are stored in the managed heap. Therefore, to assign an integer value to an object, the CLR has to take the value from the stack and copy it to the heap. Of course, this movement impacts app performance.

How to Find

With dotMemory, finding boxing is an elementary task:

  1. Open a memory snapshot and select the Memory Traffic view.
  2. Find objects of a value type. All these objects are the result of boxing.
  3. Identify methods that allocate these objects and generate a major portion of the traffic.

Boxing shown in dotMemory

The Heap Allocations Viewer plugin also highlights allocations made because of boxing.

Boxing shown by the HAV plug-in

The main concern here is that the plugin only shows you the fact of a boxing allocation. From the performance perspective, however, you're more interested in how frequently this boxing takes place. For example, if the code with a boxing allocation is called only once, optimizing it won't help much. Taking this into account, dotMemory is much more reliable for detecting whether boxing causes real problems.

How to Fix

First of all: before fixing the boxing issue, make sure it really is an issue, i.e. that it generates significant traffic. If it does, your task is clear-cut: rewrite your code to eliminate boxing. When you introduce some struct type, make sure the methods that work with this struct don't convert it to a reference type anywhere in the code. For example, one common mistake is passing variables of value types to methods that work with strings (e.g., String.Format):

Fixing boxing
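A sketch of the kind of code in question (the variable names are illustrative): the int is boxed implicitly because String.Format accepts object parameters.

    int id = 42;
    string message = string.Format("Id: {0}", id);   // 'id' is boxed into an object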

A simple fix is to call the ToString() method of the appropriate value type:

Fixing boxing 2
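The same sketch with the boxing removed:

    int id = 42;
    string message = string.Format("Id: {0}", id.ToString());   // a string is passed, no boxing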

Resizing Collections

Dynamically-sized collections such as Dictionary, List, HashSet, and StringBuilder have the following specifics: when the collection size exceeds the current bounds, .NET resizes the collection and copies the entire contents to a new location in memory. Obviously, if this happens frequently, your app's performance will suffer.
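For instance, a minimal sketch of how a List<int> grows as items are added:

    using System.Collections.Generic;

    var list = new List<int>();      // starts with a small internal array
    for (int i = 0; i < 100000; i++)
        list.Add(i);                 // whenever capacity is exceeded, a larger array is
                                     // allocated and the existing items are copied into it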

How to Find

The insides of dynamic collections can be seen in the managed heap as arrays of a value type (e.g. Int32 in the case of Dictionary) or of the String type (in the case of a List<string>). The best way to find resized collections is to use dotMemory. For example, to find whether Dictionary or HashSet objects in your app are resized too often:

  1. Open a memory snapshot on the Memory Traffic view.
  2. Find arrays of the System.Int32 type.
  3. Find the Dictionary<>.Resize and HashSet<>.SetCapacity methods and check the traffic they generate.

Finding resized Dictionary in dotMemory

The workflow for List collections is similar. The only difference is that you should check the System.String arrays and the List<>.SetCapacity method that creates them.

Finding resized List in dotMemory

In the case of StringBuilder, look for System.Char arrays created by the StringBuilder.ExpandByABlock method.

Finding resized StringBuilder in dotMemory

How to Fix

If the traffic caused by the 'resize' methods is significant, the only solution is to reduce the number of cases where a resize is needed. Try to predict the required size and initialize the collection with this size or larger.

Predicting collection size
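A sketch of what such pre-sizing might look like (the capacities are illustrative):

    using System.Collections.Generic;
    using System.Text;

    const int expectedCount = 10000;                        // estimated number of items
    var map = new Dictionary<int, string>(expectedCount);   // no Resize calls while filling
    var names = new List<string>(expectedCount);            // backing array allocated once
    var builder = new StringBuilder(4096);                  // no ExpandByABlock for short strings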

In addition, keep in mind that any allocation greater than or equal to 85,000 bytes goes to the Large Object Heap (LOH). Allocating memory in the LOH has some performance penalties: as the LOH is not compacted, some additional interaction between the CLR and the free list is required at allocation time. Nevertheless, in some cases allocating objects in the LOH makes sense, for example, for large collections that must live for the entire lifetime of an application (e.g. a cache).

Enumerating Collections

When working with dynamic collections, pay attention to the way you enumerate them. A typical major headache here is enumerating a collection using foreach while knowing only that it implements the IEnumerable interface. Consider the following example:

Enumerating collections example
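A minimal sketch of the scenario (the Foo method is the one discussed below; the rest of the code is illustrative):

    using System;
    using System.Collections.Generic;

    class EnumerationDemo
    {
        static void Foo(IEnumerable<string> items)
        {
            foreach (var item in items)      // GetEnumerator() returns IEnumerator<string>,
                Console.WriteLine(item);     // which boxes List<string>'s struct enumerator
        }

        static void Main()
        {
            var list = new List<string> { "a", "b", "c" };
            for (var i = 0; i < 1000; i++)
                Foo(list);                   // a new boxed enumerator on every call
        }
    }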

The list in the Foo method is cast to the IEnumerable interface, which implies further boxing of the enumerator.

How to Find

As with any other boxing, the described behavior can be easily seen in dotMemory.

  1. Open a memory snapshot and select the Memory Traffic view.
  2. Find the System.Collections.Generic.List+Enumerator value type and check generated traffic.
  3. Find methods that originate those objects.

Finding enumerators using dotMemory

As you can see, a new enumerator was created each time we called the Foo method.

The same behavior applies to arrays as well. The only difference is that you should check traffic for the SZArrayHelper+SZGenericArrayEnumerator<> class.

Finding array enumerators using dotMemory

The Heap Allocations Viewer plug-in will also warn you about hidden allocations:

HAV plug-in warning about enumerator allocation

How to Fix

Avoid casting a collection to an interface. In our example above, the best solution would be to create a Foo method overload that accepts the List<string> collection.

Fixing excessive enumerator allocations
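A sketch of such an overload, placed next to the IEnumerable<string> version from the earlier sketch:

    // In the same class as the IEnumerable<string> overload:
    static void Foo(List<string> items)
    {
        foreach (var item in items)      // uses the struct List<string>.Enumerator directly,
            Console.WriteLine(item);     // so no enumerator object is allocated on the heap
    }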

If we profile the code after the fix, we’ll see that the Foo method doesn’t create enumerators anymore.

Memory traffic after the fix

In the next installment of this series, we’re going to take a look at the best approaches for working with strings. Stay tuned!


Get dotMemory


ReSharper for C++ EAP Goes On

As you may have already heard, the Early Access Program for ReSharper with C++ support is in progress.

We’ve prepared a new build and wanted to share a quick update on what new features and options it brings to the table:

  • Better performance: indexes are now saved locally, meaning that subsequent launches are faster than the initial launch.
  • Improved code completion including smart completion.
  • More settings to customize code formatting style.
  • New control flow analyses that detect unreachable code, uninitialized local variables, assigned values that are never used, and redundant 'else' keywords.
  • New quick-fixes including Create from usage for global variables, class and enum members.

To learn more about which ReSharper C++ features are already implemented and how they can help you in your everyday work, check out this video by Dmitri Nesteruk:

Please note that there are still limitations in terms of supported project size (up to 40 MB), MS C++ extensions and MS preprocessor extensions. The ReSharper C++ EAP page contains a full list of known issues and unsupported items, which you're highly encouraged to examine before you decide to download and install an EAP build.


Webinar Recording: Merging Refactored Code – ReSharper Meets SemanticMerge

The recording of our June 17th webinar, Merging Refactored Code – ReSharper Meets SemanticMerge, is now available on JetBrains YouTube Channel.

In this webinar, Matt Ellis (JetBrains) hosts Pablo Santos (SemanticMerge), who runs through a number of refactoring examples, from the seemingly trivial (yet essential) to complex structure modification scenarios, and demonstrates how to refactor with ReSharper and later get the changes merged with SemanticMerge.

Pablo talks about the challenges of merging complex refactorings and demonstrates how SemanticMerge simplifies this by parsing the code into a syntax tree, reasoning about it as code rather than text (or text with heuristics), and merging accordingly.

The slides used in the webinar are available on Slideshare.

About the Presenter:

Pablo Santos is the founder of Codice Software, the company behind Plastic SCM and SemanticMerge. Codice started in 2005, and since then Pablo has played different roles ranging from core engineering to marketing, business development, advertising and sales operations. Nowadays he plays a dual role as lead of software engineering and product manager for both SemanticMerge and Plastic SCM.


Profiling ASP.NET vNext using dotMemory and dotTrace

With ASP.NET vNext (a working title), Microsoft is making some bold moves towards opening up development and deployment options for building applications. For starters, we no longer have to install the full .NET Framework on our servers: ASP.NET vNext lets us deploy our own version of .NET on an app-by-app basis. Next, the project model is being reduced to just a project.json file describing dependencies and some optional configuration, much like we write apps in NodeJS.

ASP.NET vNext is still an early preview, but it never hurts to explore and test-drive what's going to be the next generation of .NET. And when writing code for it, we may want to inspect the performance and memory usage of our applications to verify we don't have any issues there. What better tools to do that than dotMemory and dotTrace?

Setting the correct CLR type

Before we start profiling, it is important to know that profiling currently only works when the Desktop CLR is active. The new Core CLR does not (yet?) expose the same profiler hooks.

Checking if the correct CLR is active can be done from the command line using the kvm list command.

Output of the kvm list command

Make sure that the active CLR (marked with an *) shows the svr50 runtime. If it shows svrc50, we're on the Core CLR and have to switch to the Desktop CLR using the kvm set <pick the correct version from the above list> command.

Attaching to an already running ASP.NET vNext process

After starting dotMemory or dotTrace, hit Attach To Process on the toolbar. This will give us a list of all running processes on our local computer. Both profilers can also attach to processes running on remote machines. From this list, find the klr.exe process. When running multiple ASP.NET vNext applications on our machine, it will be a bit tricky to find the correct one.

Attach profiler

We can provide additional options and even make use of the profiler API in our ASP.NET vNext applications. Hitting Run will attach the profiler, which behaves identically to profiling "classic" .NET applications. For example, in dotMemory we can see the memory usage (unmanaged and managed heaps) in real time:

dotMemory memory usage of project K ASP.NET vNext

In dotTrace we can see the subsystems used by our code and drill down to the function or thread level.

Profiling ASP.NET vNext in dotTrace

Starting an ASP.NET vNext process with profiling enabled

When attaching to an already running process, some options are unavailable. For example, dotMemory will be unable to collect memory traffic for an already running application, and it will be impossible to pick a more detailed profiling mode for dotTrace, such as Tracing or Line-by-Line profiling. For these options to be available, we have to launch the process with profiling enabled from the start.

Running ASP.NET vNext applications is done using the k run or k web commands. These are aliases for the active runtime, which is located under %USERPROFILE%\.kre\packages\<selected runtime>\. From dotMemory and dotTrace, we can use the Profile | Standalone Application menu and enter the startup details for our ASP.NET vNext application.

Profiler configuration

Here’s what we have to specify:

  • Application: the ASP.NET vNext runtime to use. This will depend on which runtimes we have installed on our system but will look like %USERPROFILE%\.kre\packages\<selected runtime>\bin\K.cmd.
  • Arguments: the arguments we would typically pass to the k command. If we start our application with k web, then web would be the argument to specify here.
  • Working directory is the path where our application and its project.json file are located.

Using this approach, we can now set all additional profiler options. dotMemory allows collecting memory traffic, while dotTrace lets us perform Line-by-Line profiling and specify the measure used. Run will launch the application and let us capture a snapshot we can investigate later.

Give it a try! Even with a new CLR in town, we still have to care about the memory usage and performance of our applications. Here are some resources to get started with dotMemory and dotTrace:


Webinar Recording and Q&A: .NET Code Coverage for Continuous Integration using TeamCity and dotCover

The recording of our June 10th webinar with Maarten Balliauw, .NET Code Coverage for Continuous Integration using TeamCity and dotCover, is now available on the JetBrains YouTube channel, and the slides are on SlideShare.

In this webinar, we use dotCover to collect code coverage information while running tests in our CI process. We see how to configure code coverage and how to use the TeamCity Visual Studio plugin to download the coverage snapshot generated on the build server and inspect it using dotCover on a developer machine.

How much of our code is being covered by our unit tests? Are there areas we are not testing? By capturing code coverage data during a test run, we can analyze which areas of our applications are well-tested and which ones require additional tests to be written. And where better to capture code coverage information than on our build server? Find out in this webinar.

Below are select questions and answers from our webinar:

Q: Can we use MStest with TeamCity and also get code coverage with dotCover?

A: Yes, the test runner type and .NET coverage tool can be chosen on the TeamCity Build Configuration Settings tab. Please check the following settings:
MSTest and dotCover with TeamCity


Q: Is it possible to use dotCover on TeamCity with xUnit.net tests?

A: Yes, it is possible. If you are using xUnit and want to capture code coverage, you can run dotCover from the command line and import the data into TeamCity using a service message. The process is described in detail by Maarten during the webinar (starting from the 24th minute).

For more information on dotCover, please visit the official dotCover website. Stay tuned for further events by following dotCover on Twitter!

Maarten Balliauw is a Technical Evangelist at JetBrains. His interests are all web: ASP.NET MVC, PHP and Windows Azure. He's a Microsoft Most Valuable Professional (MVP) for Windows Azure and an ASPInsider. He has published many articles in both PHP and .NET literature, such as MSDN Magazine and PHP Architect. Maarten is a frequent speaker at various national and international events such as MIX (Las Vegas), TechDays, DPC and others.

dotPeek 1.2 EAP: Introducing Process Explorer

Have you ever wanted to dig deeper into a process running on your machine? We have. That’s the reason why the new dotPeek 1.2 EAP build introduces Process Explorer.

The Process Explorer window provides you with the list of all currently running processes and allows decompiling those that are .NET processes. Once you locate a process to decompile, you can add it to Assembly Explorer for further investigation by clicking the "+" button. From there, you can export the decompiled code to a Visual Studio project if necessary.

Process Explorer window in dotPeek 1.2

You can see native processes in this window as well, although you naturally shouldn't expect dotPeek to be able to decompile them. To display native processes, click Show Native Processes on the Process Explorer toolbar.

To display additional details about a process, for instance the .NET Framework version used or its MVID (which can be useful to double-check that you are going to debug the right application), press F4 to bring up a tool window displaying process properties.

dotPeek 1.2 Process Properties tool window

In case you've missed it, note that dotPeek 1.2 EAP can now work as a symbol server and supply the Visual Studio debugger with the information required to debug assembly code. Download dotPeek 1.2 EAP and give it a try!


Heap Allocations Viewer plugin

When we first launched ReSharper 8 back in July last year, there was a very nice little feature hidden in internal mode. It was being used by the dev team to help keep an eye on performance: a little checkbox that said "Show allocations (delegates, closures, hidden allocations, boxing)".

It added highlights to your code, as you typed, indicating where memory is being allocated. It's very eye-opening to see how frequently memory is allocated, often in places you don't expect. Take this call to a function with "params" arguments. Note the orange highlight, telling us that .NET will allocate an array for the parameters on each call:

Highlight hidden allocation of an array
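A sketch of the kind of code that triggers this highlight (the names are illustrative):

    using System;

    class ParamsDemo
    {
        static void Log(string format, params object[] args)
        {
            Console.WriteLine(format, args);
        }

        static void Main()
        {
            // Each call allocates a new object[] for the params arguments
            // (and boxes the int 42 when it is stored in that array).
            Log("{0}: {1}", "answer", 42);
        }
    }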

But alas, features hidden in our internal mode are not very discoverable. So the developer behind the feature, Alexander Shvedov, made a few changes, fixed a few issues and pulled it out into an extension, which is now available for download from the Extension Manager by searching for "heapview". A stable version 0.9.1 is available, but you can always live on the edge by installing a prerelease version.

Extension Manager showing heapview extension

WARNING: Before we go any further, it must be said: BEWARE OF PREMATURE OPTIMISATION.

This plugin will tell you when an allocation is made, but it does not tell you how much memory is being allocated, or how frequently. It does not tell you if you have memory problems with your application. It simply shows where allocations are going to occur. You should still use a memory profiler such as dotMemory to measure your application and decide if your memory usage is causing you issues.

But of course, the plugin is still very useful. It's always good to know what your code is doing, and having that visible can help you make more informed design decisions. It's also very helpful when you know you need to write code that should avoid excessive allocations. For example, the dev team uses this to keep an eye on memory allocations made during performance-critical code paths, such as as-you-type analysis. These code paths are called very frequently, every time you edit a document, and obviously need to be fast. Memory churn can have a huge impact on performance, especially if it causes garbage collection, so even a small amount of allocations in these circumstances can have a knock-on effect on performance.

And of course, it’s simply a very interesting view into what your code is doing!

So what does it highlight?

Obviously, it handles explicit allocations made with the new keyword:

Highlight explicit allocation

And it also supports the standard ReSharper functionality of changing the severity of the highlight (so you can disable it if you’re not interested in that particular highlight) and allows you to find all similar highlights in the project or solution.

Configure severity and find all in scope

Other heap allocations:

Example heap allocation highlights

The plugin will highlight allocations made due to boxing:

Highlight boxing allocations

Delegate creation, even listing local scope variables that are captured in closures:

Highlight delegate creation

It will even show you the points where the classes used to capture closure variables are created:

Highlight closure allocation
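A sketch of what such a highlight covers (the names are illustrative): a lambda that captures a local variable requires both a delegate instance and a compiler-generated closure class on the heap.

    using System;

    class ClosureDemo
    {
        static Func<int, bool> MakePredicate()
        {
            int threshold = 10;            // captured local variable
            return x => x > threshold;     // allocates a delegate plus a closure class
                                           // holding 'threshold'
        }
    }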

You can use this plugin in combination with a full profiler, such as dotMemory. Here’s an example where the plugin reports a possible enumerator allocation:

Possible enumerator allocation

When profiling this code using dotMemory and capturing memory allocations, we can see that the plugin is right: generic array enumerators are being allocated (of type SZArrayHelper+SZGenericArrayEnumerator<String>) and collected all the time, potentially triggering garbage collections! The backtrace also shows us that these allocations originate from Method1:

Allocations in dotMemory

Our code would be more performant and generate less memory traffic if we just enumerated the args array instead of casting it to an IEnumerable<string>.
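A sketch of the before and after (Method1 is the method from the screenshots; its body here is illustrative):

    using System;
    using System.Collections.Generic;

    class ArrayEnumerationDemo
    {
        // Before: the array is enumerated through IEnumerable<string>,
        // so each call allocates an SZGenericArrayEnumerator<string>.
        static void Method1(params string[] args)
        {
            IEnumerable<string> sequence = args;
            foreach (var arg in sequence)
                Console.WriteLine(arg);
        }

        // After: enumerating the array directly compiles to an index-based loop,
        // with no enumerator allocation at all.
        static void Method2(params string[] args)
        {
            foreach (var arg in args)
                Console.WriteLine(arg);
        }
    }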

As we can see, once you've found your allocation hotspots with the profiler, the plugin makes it obvious where in the method the memory is being allocated, whether explicitly or through hidden allocations such as boxing or capturing closure information.

And of course, it's open source. If you're interested in seeing how it's written, want to report issues, or perhaps even contribute, please check it out.


Live Webinar: Merging Refactored Code – ReSharper Meets SemanticMerge, June 17th

Join us on Tuesday, June 17th, 16:00 – 17:00 CEST (9:00 – 10:00 AM EDT) for a free webinar, Merging Refactored Code: ReSharper Meets SemanticMerge.

In this webinar, Matt Ellis (JetBrains) will host Pablo Santos (SemanticMerge), who will run through a number of refactoring examples, from the seemingly trivial (yet essential) to complex structure modification scenarios, and demonstrate how to perform the refactoring with ReSharper and later get it merged with SemanticMerge.

Pablo will talk about the challenges of merging complex refactorings and demonstrate how SemanticMerge simplifies this by parsing the code into a syntax tree, reasoning about it as code rather than text (or text with heuristics), and merging accordingly.

If you’ve ever wanted to know more about ReSharper’s many refactoring capabilities and new tools designed to make merging complex refactorings a breeze, don’t miss this webinar.

Space is limited, so please register now.

About the Presenter:

Pablo Santos is the founder of Codice Software, the company behind Plastic SCM and SemanticMerge. Codice started in 2005, and since then Pablo has played different roles ranging from core engineering to marketing, business development, advertising and sales operations. Nowadays he plays a dual role as lead of software engineering and product manager for both SemanticMerge and Plastic SCM.


Webinar Recording and Q&A: High-Performance Computing with C++

The recording of our May 29th webinar with Dmitri Nesteruk, High-Performance Computing with C++, is now available on JetBrains YouTube Channel.

Languages such as JavaScript may receive a lot of hype nowadays, but for high-performance, close-to-the-metal computing, C++ is still king. This webinar takes you on a tour of the HPC universe, with a focus on parallelism, be it instruction-level (SIMD), data-level, task-based (multithreading, OpenMP), or cluster-based (MPI).

We also discuss how specific hardware can significantly accelerate computation by looking at two such technologies: NVIDIA CUDA and Intel Xeon Phi. (Some scarier tech, such as FPGAs, is also mentioned.) The slides used in the webinar are available here.

We received plenty of questions during the webinar, and we'd like to use this opportunity to highlight some of them here, including those we didn't have a chance to answer during the webinar. Please find the questions below, with answers by the presenter, Dmitri Nesteruk.

Q: Why are you not a fan of OpenCL?
A: I generally think that OpenCL is a great idea. The ability to run the same code on x86, GPU, FPGA and Xeon Phi is fantastic. However, this hinges on a very important assumption, namely that it is reasonable to run the same kind of computation on all these devices.

In practice this isn't always the case. x86 and Xeon Phi are great for general-purpose code. GPGPU is restricted mainly to data-parallel numerics. FPGAs are an entirely different beast – they are not as good for numerics, but excellent for structured data with a high degree of intrinsic parallelism that arises from the architecture being used.

On the GPGPU side, where OpenCL is the principal way to program AMD graphics cards, I’d say NVIDIA won the war, at least for now. Whether or not they make better devices, their tools support and their marketing efforts have earned them the top spot. CUDA C is more concise than OpenCL, which is great, but of course you have to keep in mind that OpenCL tries to be compatible with all architectures, so you’ve got to expect it to be more verbose. However, programming OpenCL is like programming CUDA using the Driver API, and I’d much rather use the user-mode APIs of CUDA, together with the excellent libraries (including Thrust, where applicable).

Q: CUDA has to copy data to the device's RAM, right? At which magnitude of data do we gain something from CUDA?
A: Yes, and there are actually a whole five (5!) types of memory on CUDA devices that can be used. The time delay in sending data to/from the device does prevent us from using CUDA for real-time purposes, but if you're not trying to do that, then what you're going to be worrying about is saturating the device with enough data to make the computations worthwhile. This depends first of all on how well your problem parallelizes, but assuming it does, the question becomes how much time a single unit of work actually takes.

The logic here is simple: if a unit of work takes less than the time to send data to/from the device, consider keeping it on the CPU and vectorizing it if possible. If, however, you’ve got processes that take up more time, then it might make sense to do the calculations on the GPU. And keep in mind that the GPU is capable of supporting a form of task-based parallelism (streams), so if you’ve got distinct computation processes, you can try pipelining them onto the device.

The best way to tell if the GPU is the right solution or not is to write your algorithm and do a performance measurement.

Q: Can you elaborate on the protocol parsing?
A: Let's look at the problem from a higher level. Any specific single computation problem you may have is likely to be faster when done in hardware than in software. But in practice, you're unlikely to design an ASIC for each particular problem, because it's very expensive, and because changing an ASIC is impossible once it's designed and produced.

On the other hand, some problems do benefit from extra computation resources. For example, parsing data from a protocol such as FIX allows you to offload some of the work for, e.g., building order books from the CPU and also avoid paying the costs of data moving from your network card to RAM and back. These might seem like trivial costs, but given the kind of technology war that trading firms are engaged in, every microsecond helps!

It also so happens that FPGAs are excellent at parsing fixed data formats. This means that you can build a NIC that uses an FPGA to parse the data then and there, structure it, maybe even analyze it, and send it back upstream faster than an ordinary server could. Plenty of commercial solutions exist for this, but designing your own is also a lot of fun. FPGAs also offer a lot of additional benefits that I mentioned in the webinar, such as relative scalability (you can have 20 FPGAs on a single card if you want).

Q: Would you recommend using the Intel Xeon Phi for complex simulation (like car simulator)?

A: This depends on what you’re simulating and whether the problem is actually decomposable into independently executing units of work. If you’re after agent-based modeling, then with the Xeon Phi you’re in luck, because it supports just about every parallel paradigm under the sun. You can use MPI or Pthreads or something else, and have the parts of the system interact with one another via messages.

It really depends on the specifics of the problem you’re trying to solve.

Q: As an alternative to inline assembly and instrinsics, you can ensure vectorization and use of SIMD instructions by giving hints to the compiler through pragmas. Why not mention that?

A: Indeed! In fact, one of the pain points of SIMD is that you have to abandon the normal ways of writing code. So having the compiler vectorize it for you is great – it leaves you free to do other things. The same goes for OpenMP, which follows exactly the same idea: mark a loop as parallel and the compiler will do its best to parallelize it.

The concern I have is that these processes are always a black box. The compiler can easily understand a simple for loop with no side effects, but problems aren't always that simple, and when other variables and constructs get entangled, it is sometimes better to hand-craft critical paths rather than trust the compiler to do a good job of rewriting them.

Q: In the domain you currently work, which HW alternative do you recommend using: FPGAs, GPGPUs, or anything else?
A: It's 'horses for courses.' Quant finance falls into two categories: analysis, when you sit at your desk and try to calibrate a model, and execution, when your co-located servers trade on some market or other. The rules of the game are drastically different.

For analysis, anything goes, because you're not time-constrained by a fast-moving market. The more computational power you have, the faster you're going to get your ideas validated. Numeric simulations (e.g., Monte Carlo) fit nicely on GPUs and are especially great when you can allow yourself to drop to single precision. Xeon Phis are general-purpose workhorses: anything can run on them, but they are a relatively new technology that I'm still getting the grasp of.

When it comes to execution, it’s all about speed. An arbitrage opportunity comes in and disappears in the blink of an eye, and thus FPGA-based hardware can help you capture it before others react. Having great hardware is not enough if you’ve got a huge ping to the exchange, obviously. I haven’t tried GPUs and FPGAs on the execution side: I suspect they might be relevant, but it’s something I haven’t investigated yet.

Keep in mind that not all trading is high-frequency trading. Some firms run things on MATLAB which, while not being as fast as hand-tuned C++ with SIMD instructions, provides sufficient execution speed coupled with a vast array of built-in math libraries meaning you don’t have to roll your own. Other quant institutions, including some of the major ones, survive just fine running their trades off Excel spreadsheets, possibly the slowest computation mechanism.

Q: You are talking a lot about FPGAs. I don't quite understand what FPGAs have to do with C++ programming.

A: It’s fair to say that FPGAs are typically programmed using Hardware Description Languages (HDLs) such as VHDL and Verilog. However, an interesting trend is the support for OpenCL on FPGAs. Hopefully the relationship between OpenCL and C++ is self-evident. Plus, since we’re discussing HPC technologies, FPGAs definitely deserve a mention.

Q: Does ReSharper C++ support CUDA? / Will the IDE support Embedded Systems?
A: Support for CUDA as well as the Intel C++ compiler is at the top of my personal wish list, and I've been pestering the ReSharper developers with related issues. While I hesitate to make any promises, I don't imagine CUDA support is that difficult, considering there's only one language extension (the triple-angled chevrons) and the rest is fairly normal C++ that should be parseable straight away. I could be wrong, of course.

Regarding the C++ IDE and its support for embedded development: if you're after cross-compilation, it will work right now, though of course the IDE will not parse compiler output. As for supporting specific libraries, we'll consider those after the initial release, which, as you may have guessed, will target the general platform rather than any particular device.

The future direction of the C++ IDE will largely depend on user demand. At the moment, it’s difficult to predict where the product will head. So keep posting and voting for feature requests!

About the Presenter

Dmitri Nesteruk is a developer, speaker, podcaster and technical evangelist at JetBrains. His interests lie in software development and integration practices in the areas of computation, quantitative finance and algorithmic trading. He is an instructor of an entry-level course in quantitative finance. His technological interests include C#, F# and C++ programming as well as high-performance computing using technologies such as CUDA. He has been a C# MVP since 2009.
