
TeamCity Take on Build Pipelines

Although it is possible to set up build pipelines in TeamCity, you will not find the term “build pipeline” in the TeamCity web interface or even in our documentation. The reason is that we have our own term, “build chain”, which we feel describes TeamCity’s build-pipeline-related features better.

First of all, while the word “pipeline” does not impose sequential execution, it still implies it. In TeamCity, by contrast, a build chain is a DAG (directed acyclic graph), so a traditional build pipeline is just a special case of a TeamCity build chain. But there are other differences as well.
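
To make the DAG point concrete, here is a minimal Python sketch. It is not TeamCity code: the build names and the `execution_stages` helper are made up for illustration. It groups the builds of a chain into stages that could run in parallel, and shows that a linear pipeline is simply a chain in which every stage contains one build.

```python
from graphlib import TopologicalSorter

# Hypothetical chain: each key depends on the builds listed in its value.
# "package" is the top build; the two compile builds can run in parallel.
chain = {
    "package": {"compile_linux", "compile_windows"},
    "compile_linux": {"resolve_deps"},
    "compile_windows": {"resolve_deps"},
    "resolve_deps": set(),
}

def execution_stages(dependencies):
    """Group builds into stages; builds within one stage do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages

stages = execution_stages(chain)

# A traditional pipeline C -> B -> A is the special case with one build per stage.
linear = execution_stages({"A": {"B"}, "B": {"C"}, "C": set()})
```

Here `stages` has three entries, with both compile builds in the middle one, while `linear` degenerates into three single-build stages.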

Source code consistency

Source code consistency implies that all builds of a build pipeline will use the same source code revision. Traditionally, when build pipelines are described, not much attention is paid to this problem. After all, if someone needs consistency, it can be achieved by sharing data between builds using a file system or by tagging the source code in the repository and then fetching the same tag in all builds.

But in our experience, in the majority of cases users need consistency for all of the builds participating in the pipeline. For instance, if you build parts of your project on different machines in parallel and then combine them in the last build, you must be sure that all of the parts have used the same source code. In fact, any time you need to combine the results of several parallel builds, you need to be sure that all these builds used the same source code.

You can spend time achieving this consistency via build scripts, but since many developers face the same problem, this is clearly an area where CI tools should help. TeamCity provides source code consistency for all builds in the chain automatically. It does so even if the build chain uses different repositories, and even if the builds use repositories of different types – Git, Mercurial, SVN, Perforce, TFS.
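
The idea behind this consistency can be sketched in a few lines of Python. This is an illustration under assumptions, not TeamCity's implementation: the `Build` type, the `queue_chain` function, and the repository names are hypothetical. The point it shows is that revisions are resolved once for the whole chain, and every build receives the same pinned revisions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Build:
    name: str
    revisions: tuple  # sorted (repository, revision) pairs this build checks out

def queue_chain(build_names, roots_per_build, head_revisions):
    """Pin one revision per repository, then hand the same pin to every build."""
    used = {root for name in build_names for root in roots_per_build[name]}
    pinned = {root: head_revisions[root] for root in used}  # resolved exactly once
    return [
        Build(name, tuple(sorted((r, pinned[r]) for r in roots_per_build[name])))
        for name in build_names
    ]

# Hypothetical chain that mixes two repositories of different types.
roots_per_build = {
    "compile": ["core-git"],
    "docs": ["core-git", "docs-svn"],
    "package": ["core-git"],
}
head = {"core-git": "9f3c2ab", "docs-svn": "r1812"}
builds = queue_chain(["compile", "docs", "package"], roots_per_build, head)

head["core-git"] = "b71e004"  # a later commit does not affect the queued chain
core_revisions = {dict(b.revisions)["core-git"] for b in builds}
```

All three builds end up with the same `core-git` revision, even though the repository moved on after the chain was queued.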

Triggering

We also thought that the traditional approach to triggering, where the next build of the pipeline is triggered after the previous one finishes, is limiting, especially in terms of the optimizations that CI systems are capable of. If the whole build pipeline is known from the beginning, the CI system can reduce the total build time by optimizing it. But if the builds in the pipeline are triggered one by one, the pipeline will always take the same time to finish completely.

For instance, say we have a pipeline C -> B -> A, where C is the first build and A is the last:

cba_chain

If we trigger the whole pipeline at once, all builds (C, B and A) are added to the queue. If the system already has a finished build that used the same source code revision that build C is going to use, the queued build C can be substituted with that finished build, reducing the total build time of the pipeline. This is exactly what TeamCity does for all build chains while they sit in the queue.
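
A simplified Python sketch of this substitution follows. It is an assumption-laden model rather than TeamCity's actual algorithm: the `optimize_queue` function and the build identifiers are hypothetical, and a finished build is treated as reusable when its configuration and revision match, ignoring any other conditions a real server would check.

```python
def optimize_queue(queued, finished):
    """Split a chain queued as a whole into builds to run and builds to reuse.

    queued:   list of (configuration, revision) pairs, in execution order
    finished: dict mapping (configuration, revision) to the id of a finished build
    """
    to_run, reused = [], {}
    for configuration, revision in queued:
        key = (configuration, revision)
        if key in finished:
            reused[configuration] = finished[key]  # substitute the finished build
        else:
            to_run.append(configuration)
    return to_run, reused

# The chain C -> B -> A is queued at once on revision "rev42".
queued = [("C", "rev42"), ("B", "rev42"), ("A", "rev42")]
# Hypothetical history: C was already built on rev42 and on the older rev41.
finished = {("C", "rev42"): "C#17", ("C", "rev41"): "C#16"}

to_run, reused = optimize_queue(queued, finished)
```

With this history only B and A remain to be run, and the chain uses the finished build `C#17` in place of a new build of C.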

But if we trigger the builds one by one, i.e., C first, then B once C finishes, and so on, there are far fewer opportunities for a CI system to optimize anything. If something triggered C, the CI system must obey, because this is how the user configured it. Even if there are finished builds identical to C (run on the same source revision), at this point the CI system cannot avoid running C, because B, and eventually A, are only triggered when C finishes. If we decide not to run C, we will not run the whole pipeline, which is definitely not what the user expects.

This is why we always recommend avoiding one-by-one triggering. If the build chain is fully automated (which is what we all try to achieve these days), then ideally there should be only one trigger, at the very last build of the chain (in TeamCity terms, the top build; build A in the example above), which will trigger all the builds down the pipeline. Fortunately, TeamCity triggers are quite advanced: you can configure them to trigger the whole chain if a change is detected in any part of it. An important benefit is that the set of triggers you need to maintain can be drastically reduced.
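
The single-trigger idea can be modelled as follows. This Python sketch is only an illustration: `chain_roots`, `should_trigger`, and the repository names are hypothetical, and the real trigger is configured in TeamCity rather than written as code. It shows a trigger on the top build that fires whenever a change lands in any repository used anywhere in the chain.

```python
def chain_roots(top, dependencies, roots_per_build):
    """Repositories used by the top build or by anything it depends on."""
    roots, seen, stack = set(), set(), [top]
    while stack:
        build = stack.pop()
        if build in seen:
            continue
        seen.add(build)
        roots |= set(roots_per_build.get(build, ()))
        stack.extend(dependencies.get(build, ()))
    return roots

def should_trigger(top, dependencies, roots_per_build, changed_root):
    """One trigger on the top build reacts to changes in any part of the chain."""
    return changed_root in chain_roots(top, dependencies, roots_per_build)

# Chain C -> B -> A, where A is the top build and depends on B, which depends on C.
dependencies = {"A": ["B"], "B": ["C"]}
roots_per_build = {"A": ["deploy-scripts"], "B": ["app"], "C": ["lib"]}

fires_for_lib = should_trigger("A", dependencies, roots_per_build, "lib")
fires_for_other = should_trigger("A", dependencies, roots_per_build, "unrelated")
```

A change in `lib`, which only build C uses directly, is enough to queue the whole chain through the one trigger on A.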

According to the data collected from our own build server, due to this and other optimizations performed in the build queue, TeamCity greatly reduces the amount of work performed by agents daily:
queue_stats

Note that our server produces about 2500-3000 build hours per day, so if there were no optimizations like this, we’d have to add more agents, or our builds would be delayed significantly.

Data sharing

Another important aspect is how you pass data from build to build in a build chain. Obviously, you can use artifacts for this task: the previous build produces artifacts and the next one uses them as input. But note that in many cases you cannot rely on the next build being executed on the same machine as the previous one. Even if it is, a few other builds could run on this machine in between and remove the results that you wanted to share. Fortunately, publishing artifacts to TeamCity and then using artifact dependencies to retrieve them solves all these problems.

In addition, in TeamCity you can share parameters between builds. All builds in TeamCity publish their parameters upon finishing: system properties, environment variables, the parameters set in their build configuration at the moment the build started, the parameters produced by the build with the help of service messages, etc. All of them can be used in the subsequent builds of the build chain. For instance, this makes it possible to use the same build number for the whole build chain.

Find out more on sharing parameters.
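
As a rough model of parameter sharing, the Python sketch below resolves references to parameters published by a finished build. The `%dep.<build id>.<parameter name>%` reference form is modelled loosely on TeamCity's dependency parameters and should be checked against the documentation; the `resolve` function, the `Compile` build id, and the parameter values are hypothetical.

```python
import re

# Matches references such as %dep.Compile.build.number% (assumed reference form).
DEP_REF = re.compile(r"%dep\.([A-Za-z0-9_]+)\.([A-Za-z0-9_.]+)%")

def resolve(value, published):
    """Replace each dependency reference with the parameter the build published."""
    def lookup(match):
        build_id, name = match.group(1), match.group(2)
        return published[build_id][name]
    return DEP_REF.sub(lookup, value)

# Parameters published by a finished "Compile" build (made-up values).
published = {"Compile": {"build.number": "1.4.210", "env.TARGET": "linux-x64"}}

# A downstream build reuses the upstream build number and target.
package_number = resolve("%dep.Compile.build.number%", published)
archive_name = resolve("pkg-%dep.Compile.env.TARGET%.zip", published)
```

Reusing the upstream build number this way is what lets every build in the chain carry the same number.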

Build pipelines view

The traditional approach to build pipelines implies that there is a dashboard showing all pipelines and changes which triggered them. Given that some of our customers have build chains consisting of hundreds of builds (we’ve seen chains with up to 400 builds) and in large projects there can be hundreds or even thousands of source code changes per day, it is obvious that a simple dashboard will not work.

TeamCity has the Build Chains tab on the build configuration level displaying an aggregated view of all of the chains where builds of this build configuration participated. It can surely be used as a dashboard to some extent, but with large build chains it quickly becomes unusable.

build_chains

Fortunately, in TeamCity each build of the chain also shows the state of all of the builds it depends on. So by opening the build results of the last build in the chain (the top build), you can see the state of the whole chain: which builds are running, which builds failed, which artifacts were produced by each build, which tests failed for the whole chain, etc.

build_results_deps
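
Conceptually, this view is a walk over the dependencies of the top build. The Python sketch below is an illustration only: `chain_state`, `summarize`, and the statuses are hypothetical and not part of any TeamCity API.

```python
def chain_state(top, dependencies, status):
    """Collect the status of the top build and of everything it depends on."""
    seen, stack = {}, [top]
    while stack:
        build = stack.pop()
        if build in seen:
            continue
        seen[build] = status[build]
        stack.extend(dependencies.get(build, ()))
    return seen

def summarize(state):
    """List the failed and running builds of the chain by name."""
    return {
        "failed": sorted(b for b, s in state.items() if s == "failed"),
        "running": sorted(b for b, s in state.items() if s == "running"),
    }

# Hypothetical chain: A is the top build and depends on B and D; B depends on C.
dependencies = {"A": ["B", "D"], "B": ["C"]}
status = {"A": "running", "B": "failed", "C": "success", "D": "success"}

state = chain_state("A", dependencies, status)
summary = summarize(state)
```

Starting from A alone, the walk reaches all four builds and reports B as failed and A as still running.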

Summary

Hopefully, this article sheds some light on how build pipelines can be configured and used in TeamCity, and why they work this way. To sum up:

  • a build pipeline in TeamCity is called a build chain
  • source code synchronization in TeamCity comes for free: you don’t need to do anything to achieve it
  • the data from build to build can be passed with the help of artifacts or parameters
  • if possible, avoid one-by-one triggering: the bigger the parts you trigger, the better
  • monitor the build chain results using the results page of the last build in the chain or using the Build Chains tab
