C++ Annotated May 2022: C++23 News, Optimizing C++ Apps, Pointers and Memset Tricks, and Tooling News
We are back today with all of the May news for you in our latest monthly C++ Annotated digest!
- Language news
- LWG Update
- P1774R8 – “Portable assumptions”
- P1021R6 – “Filling holes in Class Template Argument Deduction”
- P2590R1 – “Explicit lifetime management”
- P2581R0 – “Specifying the interoperability of Binary Module Interface Files”
- P2593R0 – “Allowing static_assert(false)”
- P2587R0 – “to_string or not to string”
- P2429R0 – “Concepts Error Messages for Humans”
- Conferences
- Learning
- Tooling
- And finally, why are you a C/C++ developer?
Language news
LWG Update
There are a number of highly awaited library features that have been design-approved for C++23, but whose wording is not finalized yet. In decreasing order of priority, these are: std::generator (an essential library utility for using coroutines), std::flat_set and std::flat_map (cache-friendly associative containers), and std::mdspan (a multi-dimensional non-owning array). Currently, LWG is working at full speed on finalizing the wording of these and getting them into the C++23 draft, but time is running out. The next WG21 plenary, on July 25, 2022, is the last one where we can vote features into the C++23 draft. As a result, we might not get all of these in the next standard.
P1774R8 – “Portable assumptions”
A lot of work has gone into this paper since the last time we discussed it. The wording is now finalized. The paper is going up for vote at the next WG21 plenary and hopefully will be voted into the C++23 draft. One interesting question came up during wording review. We discovered that, because an assumed expression is odr-used, it can trigger template instantiations and lambda captures, which in turn can change the ABI. Consider:
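```cpp
constexpr auto f(int i) {
    // With [=], i is captured only if it is odr-used inside the lambda,
    // and the assumption is what odr-uses it here.
    return sizeof([=] { [[assume(i == 0)]]; });
}

struct X {
    char data[f(0)];
};
```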
On a platform where sizeof(int) == 4, sizeof(X) will be 4 with the assumption, but 1 if the assumption is removed. Now, this isn’t anything new: the assert macro can change the ABI, as can another standard attribute, [[no_unique_address]]. But the paper still went back to EWG in order to confirm that this is the intended design. EWG decided to leave this design as-is. If you write an assumption that triggers a lambda capture that would otherwise not be triggered, your code is really weird indeed, and you probably have bigger problems than the ABI change triggered by the assumption.
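For context on what “portable” means here: before a standard attribute, the same intent could only be spelled with vendor-specific builtins. The sketch below shows the kind of macro projects write today; the ASSUME macro and divide_by_positive are hypothetical names, and the exact optimizations each compiler performs from these hints are not guaranteed.

```cpp
// Hypothetical ASSUME macro: a sketch of the vendor-specific spellings
// that a standard [[assume(expr)]] attribute is meant to replace.
#if defined(__clang__)
#define ASSUME(expr) __builtin_assume(expr)
#elif defined(_MSC_VER)
#define ASSUME(expr) __assume(expr)
#elif defined(__GNUC__)
// GCC idiom: declare the "false" path impossible. Unlike Clang's
// __builtin_assume, this form may evaluate expr, so keep it side-effect free.
#define ASSUME(expr) do { if (!(expr)) __builtin_unreachable(); } while (0)
#else
#define ASSUME(expr) ((void)0)
#endif

// Passing d <= 0 would violate the assumption, which is undefined behavior.
int divide_by_positive(int x, int d) {
    ASSUME(d > 0);  // the optimizer may rely on d being positive from here on
    return x / d;
}
```

The differences between these spellings (whether the expression is evaluated, what happens to side effects) are exactly the kind of inconsistency a single standard attribute removes.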
P1021R6 – “Filling holes in Class Template Argument Deduction”
This paper originally proposed three extensions to Class Template Argument Deduction (CTAD): CTAD for alias templates, CTAD for aggregates, and CTAD from inherited constructors. The first two made it into C++20 (and have proven to be very useful), while the last one was postponed to C++23 because we did not get the wording done in time. We have finalized the wording now (see P2582R1 – “Wording for CTAD from inherited constructors”) and it is going up for vote at the next WG21 plenary.
P2590R1 – “Explicit lifetime management”
This is another paper picking up work that only partially made it into C++20. The original paper in this case is P0593R6 – “Implicit creation of low-level objects for low-level manipulations”. The core language part of the paper made it into C++20. Since then, it is no longer undefined behavior to write code like this:
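```cpp
#include <cstdlib>

struct X { int a, b; };

X* make_x() {
    // Since C++20, malloc implicitly creates the X object in this storage.
    X* p = (X*)std::malloc(sizeof(struct X));
    p->a = 1;
    p->b = 2;
    return p;
}
```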
Before C++20, this was undefined behavior, because the pointer to X was pointing to an object that was never created (i.e. its lifetime never started and the constructor for X was never called). As of P0593, which shipped with C++20, this is fine, because malloc is one of several special functions that can implicitly start the lifetime of X. However, what if the bytes representing X come not from a “blessed” function like malloc, but from the disk or the network? Any attempt to access those bytes as if they were an object of type X (whether through reinterpret_cast, std::launder, or anything else) still results in undefined behavior:
```cpp
void process(Stream* stream) {
    std::unique_ptr<char[]> buffer = stream->read();
    if (buffer[0] == FOO) {
        processFoo(reinterpret_cast<Foo*>(buffer.get()));  // UB
    } else {
        processBar(reinterpret_cast<Bar*>(buffer.get()));  // UB
    }
}
```
P2590 proposes a new facility, std::start_lifetime_as, to fix this case. As with many papers currently in flight, it is in principle on track for inclusion in C++23, but this is contingent on the wording being finished in time for the plenary.
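Until std::start_lifetime_as is available in your toolchain, a well-defined alternative for trivially copyable types is to copy the bytes into a genuine object with std::memcpy, paying for one copy. This is a different technique from the one the paper proposes; Foo and read_as are hypothetical names, and the buffer is assumed to contain at least sizeof(T) bytes that form a valid object representation of T.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical message type, assumed to be trivially copyable.
struct Foo {
    int id;
    float value;
};

// Copies raw bytes into a real T object. For trivially copyable types this
// is well-defined, unlike reinterpret_cast-ing the buffer to T*.
template <typename T>
T read_as(const char* bytes) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "read_as requires a trivially copyable type");
    T obj{};
    std::memcpy(&obj, bytes, sizeof(T));
    return obj;
}
```

The copy is what makes this safe, and it is also why a zero-copy facility like std::start_lifetime_as is attractive for large buffers.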
P2581R0 – “Specifying the interoperability of Binary Module Interface Files”
Continuing our theme of “Modules aren’t ready yet”, here’s another new paper from Daniel Ruoso, who has been driving the exploration of what can be done in the standard to enable a viable tooling ecosystem for C++ Modules. Binary Module Interface files (BMIs) are what we call the output of building a module that specifies its interface. You can think of a BMI as the “precompiled header” of the modules world and, indeed, some compilers do reuse their PCH format for this file. And that’s part of the problem! The BMI format is not specified by the standard – and that’s by design (although not universally thought to be a good decision). So the format of these files is left to the compilers and may vary, not just from compiler to compiler, but between compiler versions – and even between uses of the same compiler version with different flags! Setting aside whether that is a good idea, the problem this paper is trying to solve is: how do we know whether a given BMI file is compatible with the current compiler (with its current flags)? Leaving that to be implementation-defined, too, causes a chicken-and-egg problem. So it seems appropriate to specify a metadata format, which all compilers should follow, that encodes this information.
Whether a common metadata format for BMIs should exist at all is the subject of a different paper. This one is just proposing the specific metadata for interoperability purposes. It gets into the tricky details of what things (such as which flags) should play a part in the way BMI compatibility is encoded – as well as opening up the interesting possibility that a compiler could emit and/or consume BMIs in multiple formats – perhaps giving us a way to, later, specify an optional common BMI format after all!
But let’s not get ahead of ourselves. One step at a time. I’m now hoping that Modules will be a C++26 feature.
P2593R0 – “Allowing static_assert(false)”
This new paper proposes to fix an annoying issue in C++. Consider this fairly unremarkable code:
```cpp
template <typename T>
void do_something(T t) {
    if (is_widget(t)) {
        use_widget(t);
    } else if (is_gadget(t)) {
        use_gadget(t);
    } else {
        assert(false);
    }
}
```
This compiles and works as intended. Now, what if we wanted to make this dispatch happen at compile time with if constexpr? It is often a good idea to do the work at compile time instead of at runtime:
```cpp
template <typename T>
void do_something(T t) {
    if constexpr (is_widget<T>) {
        use_widget(t);
    } else if constexpr (is_gadget<T>) {
        use_gadget(t);
    } else {
        static_assert(false);  // rejected, even if this branch is never taken
    }
}
```
However, surprisingly enough, the code no longer compiles, since the static_assert(false) makes the program ill-formed, even if the if constexpr branch in question is known at compile time not to be taken, or even if the template in question is never instantiated. Even worse, according to the standard it makes the program ill-formed, no diagnostic required (in practice, all compilers error out). The solution proposed in this paper is basically to delay checking static_assert declarations until the template (or appropriate specialization or constexpr if substatement thereof) is actually instantiated. This looks like a sound solution that will improve the developer experience, so let’s hope the paper will eventually make it into the standard!
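Until then, the usual workaround is to make the asserted condition depend on the template parameter, so that it is only evaluated when the branch is actually instantiated. A sketch with hypothetical names (always_false_v, describe), using standard type traits in place of the is_widget/is_gadget placeholders above:

```cpp
#include <string>
#include <type_traits>

// Always false, but dependent on T, so the static_assert below is only
// checked when that branch is instantiated for a concrete type.
template <typename>
inline constexpr bool always_false_v = false;

template <typename T>
std::string describe(T) {
    if constexpr (std::is_integral_v<T>) {
        return "integral";
    } else if constexpr (std::is_floating_point_v<T>) {
        return "floating point";
    } else {
        static_assert(always_false_v<T>, "describe: unsupported type");
    }
}
```

Calling describe with, say, a std::string argument triggers the static_assert with a readable message, while integral and floating-point arguments compile normally. P2593 would let you write static_assert(false) directly instead of this helper.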
P2587R0 – “to_string or not to string”
We’ve had to_string since C++11. As a simple way of converting a numeric type to a string, it was “good enough” for many purposes – especially compared to the old idiom of streaming in and out of a stringstream!
However, because it is defined in terms of sprintf, it is inconsistent with the (still idiomatic) formatting in iostreams – and, by extension, with std::format, too. In particular, it is sensitive to the C locale settings rather than C++ locales. There are other issues, too, such as the limited precision of its floating-point output.
So this paper proposes redefining to_string in terms of std::format instead. That solves all those issues, although it is, technically, a breaking change! The author, Victor Zverovich (who brought us std::format in the first place), believes that any code it breaks was probably broken already – but this is a possible sticking point for an otherwise no-brainer proposal.
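To make the floating-point issue concrete: std::to_string(double) behaves like std::sprintf with "%f", so it always uses fixed notation with exactly six digits after the decimal point. The values below assume the default "C" locale. By contrast, std::format("{}", 1e-7), the basis the paper proposes, produces the shortest round-trip form, 1e-07.

```cpp
#include <string>

// std::to_string(double) formats as if by std::sprintf(buf, "%f", value):
// fixed notation, always six digits after the decimal point.
const std::string tiny  = std::to_string(1e-7);  // "0.000000": the value is lost
const std::string tenth = std::to_string(0.1);   // "0.100000": padded with zeros
const std::string big   = std::to_string(1e20);  // "100000000000000000000.000000"
```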
P2429R0 – “Concepts Error Messages for Humans”
Concepts had been eagerly awaited for at least 15 years before finally showing up in C++20! One reason why they were so highly anticipated was that we would get better compiler error messages and finally put an end to the situation where we scroll through several pages of a single error message only to find that it relates to a remote effect, rather than the cause itself!
Now, good use of concepts does help here. But, now that we’ve had some real-world experience with them, we can see there is more work to do on improving the state of the art in error messages. Rather than making any concrete language-level proposals, this paper surveys the current state and how it compares to some other modern languages that do a good job with error reporting. It then draws some conclusions about directions we can go in to improve our errors – mostly as quality-of-implementation issues.
Some examples include:
- Only reporting errors in the current project – stopping at the boundary to library code (or at least standard library code), so we know where our code is failing.
- Changing the (natural) language we use in order to make it more friendly and user-focused, and even translating some concept names into more natural language – or at least simplifying std lib names down to things we recognize. It even proposes referring out to external sources for more information or educational resources.
- Considering output formats that are more colorful or laid out in visually clearer ways, or even using structured text formats like JSON or SARIF (which is designed for structured errors).
Maybe we’ll never reach the level of Elm – a language you can practically learn through error messages and warnings alone. But, aspirationally, we can still look to Elm to see how much room for improvement remains.
C++Now 2022 trip report and early access to the videos
The C++Now community is pushing C++ to its limits. They discuss the current capabilities of C++ while also looking ahead at the possibilities of the C++ of tomorrow.
From May 1 to May 6, 2022, C++ experts from around the world gathered at the C++Now conference in Aspen, Colorado, USA. Timur Doumler joined the event this year as a speaker and attendee. His trip report is now published on our blog.
Dave Abrahams opened the conference with keynotes dedicated to generic programming and value semantics. He didn’t focus on C++, but rather on general ideas. Dave has been working on the Swift language for a long time, so he can analyze the way such key ideas affect language evolution.
If you are interested in the talk but missed the event, you’ll be glad to learn that JetBrains, as a C++Now 2022 video sponsor, is happy to offer you free early access to C++Now 2022 conference recordings. All of the videos will later be published on the conference YouTube channel, but you can start watching them right now, as we are rolling them out one by one.
Down with pointers
In a new blog post, Andreas Fertig raises the question of whether C++ developers still need pointers. He starts with the observation that the check for nullptr is quite often omitted, so the code silently assumes that the parameter points to a valid object.
References are a good alternative to pointers. And even when an API requires pointers, you can check for nullptr once, dereference, and then work with references from that point on. This also makes it easier to clean up the API later.
If the parameter is semantically optional, it’s better to use std::optional than a pointer. And for arrays of different sizes passed as parameters, C++20 brings std::span.
As the author mentions, these tricks make the API cleaner and more expressive than when only pointers are used.
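A minimal sketch of these suggestions, with hypothetical names (Config, greeting, greeting_or_default, legacy_greeting). std::span is left out so the sketch only needs C++17.

```cpp
#include <optional>
#include <string>

// Hypothetical type used for illustration.
struct Config {
    std::string name;
};

// A reference parameter states that a Config must always be provided:
// there is no null case for callers or the implementation to forget.
std::string greeting(const Config& cfg) {
    return "Hello, " + cfg.name;
}

// std::optional says "this argument may be absent" explicitly,
// instead of overloading the meaning of a null pointer.
std::string greeting_or_default(const std::optional<Config>& cfg) {
    return cfg ? greeting(*cfg) : std::string("Hello, stranger");
}

// Legacy pointer-based entry point: check for null once at the boundary,
// then continue with a reference from there on.
std::string legacy_greeting(const Config* cfg) {
    if (cfg == nullptr) {
        return "Hello, stranger";
    }
    return greeting(*cfg);
}
```

Note that passing a Config to greeting_or_default creates a temporary std::optional holding a copy; for large types, an overload pair (one with a reference, one without the argument) may be preferable.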
Thoughts about getters and setters
This article is about getters and setters and whether you need to write them all. For a structure with several fields, accessing the members directly can perform 30% faster than the version with getters. This is because of the copy made by each getter. Switching to returning by reference instead of by copy helps with performance but can lead to a dangling reference. To close that hole, you can write two getters, one for lvalues and one for rvalues, even though this makes the code a lot heavier and less readable.
The same applies to setters: you need to write two of them to achieve better performance.
The last part of the article discusses questions of immutability. The overall conclusion of the article is that you either need to write a lot of boilerplate code, or you just use the structure members directly, without any getters or setters.
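A sketch of the “two getters, two setters” pattern, using a hypothetical Person class (not the article’s code): ref-qualified overloads return a reference for lvalues and move the member out of rvalues, so a temporary never hands out a reference into itself.

```cpp
#include <string>
#include <utility>

// Hypothetical class illustrating ref-qualified getters and paired setters.
class Person {
public:
    // Called on lvalues: return a const reference, no copy is made.
    const std::string& name() const& { return name_; }

    // Called on rvalues (temporaries): move the member out, so the caller
    // never holds a reference into an object that is about to die.
    std::string name() && { return std::move(name_); }

    // Two setters: copy from lvalues, move from rvalues.
    void set_name(const std::string& name) { name_ = name; }
    void set_name(std::string&& name) { name_ = std::move(name); }

private:
    std::string name_;
};

// Returns a temporary Person, used to exercise the && getter.
Person make_person(std::string name) {
    Person p;
    p.set_name(std::move(name));
    return p;
}
```

A common alternative for the setter pair, not taken from the article, is a single set_name(std::string name) that moves from its by-value parameter, trading one extra move for half the boilerplate.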
Mysterious memset
The article takes a very simple code sample, which sets the first several leading characters of a string to zero, and looks at the optimizations done by the compiler. In the basic case with std::string, the compiler optimizes the code and only adds an extra jump for the case when the part of the string you need to update has zero length.
Meanwhile, changing std::string to std::u8string has a dramatic effect: the compiler now uses memset instead of a loop. A char* is allowed to alias an object of any type, including int, while a char8_t* is not. So with char8_t, the compiler can be sure that updating the string won’t affect the length passed as a pointer to an integer, and it can rely on the memset function.
This small example shows how char8_t, std::u8string, and std::u8string_view can be more efficient, as they are guaranteed not to alias objects of different types.
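The article’s exact code isn’t reproduced here; the following is a hypothetical reconstruction of that kind of loop (zero_prefix is an invented name). The second function shows a manual workaround that is not from the article: reading *len into a local once, so no store into the string can change the loop bound. Whether a given compiler then emits a memset call depends on its version and optimization level.

```cpp
#include <string>

// Zero the first *len characters. A store through a char lvalue may alias
// the int behind len, so the compiler must assume each write could change
// *len and reload it on every iteration.
void zero_prefix(std::string& s, const int* len) {
    for (int i = 0; i < *len; ++i) {
        s[i] = '\0';
    }
}

// Reading *len once into a local removes the aliasing question, so the
// optimizer is free to turn the loop into a memset even for plain char.
void zero_prefix_hoisted(std::string& s, const int* len) {
    const int n = *len;
    for (int i = 0; i < n; ++i) {
        s[i] = '\0';
    }
}
```

Switching the string type to std::u8string (C++20) achieves the same effect without the local, because a char8_t store cannot modify an int.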
Testing 3 approaches for optimizing the performance of C++ apps
We all know that C++ is great if you need to optimize app performance. But do we always use the full potential of the language? Let’s put aside optimizing the code and algorithms that we use, and talk about the compiler instead. Which compiler optimizations do you rely on? In this article, our colleagues from the CLion team at JetBrains share their experiment with three approaches: link-time optimization (LTO), profile-guided optimization (PGO), and Unity (Jumbo) builds.
The sample project is the Clang (LLVM) repository. The tests were done with MSVC, MinGW GCC, Clang-Cl, and Clang from MinGW.
The results are collected in a table for easy comparison. Here are the main findings:
- LTO provides a performance boost for all four compilers.
- PGO is tricky and sometimes may even decrease performance if you don’t have enough training data or if the data you do have isn’t very good. But if the data is carefully prepared, the boost can be as large as 40%!
- Unity builds change the translation unit (TU) size and may affect compiler optimizations. The team measured them separately and found that, in the end, they didn’t significantly affect runtime performance.
Recursive variants and boxes
In a new blog post, Jonathan Müller discusses approaches to constructing a recursive variant in C++. std::variant is a way to handle sum types in C++, but it does not handle recursion very well. The problem is that it needs to know the size of each alternative, which requires a complete type, so a forward declaration doesn’t work.
The solution is to use heap allocation and pointers. Jonathan presents a sample where objects are allocated on the heap and wrapped in std::unique_ptr. The std::variant then works correctly, as the needed size is calculated from the pointer size. Instead of providing storage for infinite nesting, only as much memory is allocated as actually needed for a particular expression.
However, the author shows that std::unique_ptr is fine as an implementation detail, but not as an interface: the approach lacks value semantics, and constness no longer propagates. To address this, he suggests a solution inspired by the Rust language. box<T> is a new type that stores a heap-allocated T but otherwise behaves like T. Technically, it’s a wrapper over std::unique_ptr with a few other functions implemented. Would you like to have a type like box<T> in the standard library?
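A simplified sketch of the idea, not the blog post’s actual implementation: box<T> owns a heap-allocated T, copies deeply, and propagates const through its accessors. Expr, Add, and evaluate are hypothetical names for a tiny expression tree. For simplicity, box declares no move operations, so it is never left empty (moves fall back to copies).

```cpp
#include <memory>
#include <utility>
#include <variant>

// Minimal box<T>: a heap-allocated T with value semantics.
template <typename T>
class box {
public:
    box(T value) : ptr_(std::make_unique<T>(std::move(value))) {}

    // Deep copy: copying a box copies the boxed value, like copying a T.
    box(const box& other) : box(*other.ptr_) {}
    box& operator=(const box& other) {
        *ptr_ = *other.ptr_;
        return *this;
    }

    // Const propagates: a const box only hands out const access.
    T& operator*() { return *ptr_; }
    const T& operator*() const { return *ptr_; }
    T* operator->() { return ptr_.get(); }
    const T* operator->() const { return ptr_.get(); }

private:
    std::unique_ptr<T> ptr_;  // never null in this sketch
};

struct Add;  // a forward declaration is enough for box<Add>

// Recursive expression: a literal or an addition of two expressions.
using Expr = std::variant<int, box<Add>>;

struct Add {
    Expr lhs;
    Expr rhs;
};

int evaluate(const Expr& e) {
    if (const int* literal = std::get_if<int>(&e)) {
        return *literal;
    }
    const Add& add = *std::get<box<Add>>(e);
    return evaluate(add.lhs) + evaluate(add.rhs);
}
```

Because the variant stores only a pointer-sized box, Expr has a fixed size, yet copying an Expr copies the whole tree, which is the value semantics std::unique_ptr alone does not provide.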
The Developer Ecosystem 2022 survey is running now!
Every year JetBrains polls thousands of developers from all around the globe and with various backgrounds, trying to capture a snapshot of the development ecosystem. We clean, anonymize, and process the data, and then present our findings to the public. We also share the raw survey data so that everyone can build their own slices on top.
Do you remember the C++ trends from 2021? 42% of C++ developers were using C++17, and many C++ developers were planning to migrate to C++17 and C++20. 30% did not rely on any code analysis tool. CMake was the dominant project model.
We are eager to learn how these trends are changing in 2022. That’s why we’ve launched the new installment of our survey! Like in previous years, we will publish the results and insights publicly. In addition, you can opt to get a personalized infographic to visually represent how you compare to other developers. And of course, we’ll raffle off some valuable prizes!
CLion 2022.2 EAP is ongoing
CLion has launched its Early Access Program for the new v2022.2. A few big enhancements are already available for you to try.
- We all agree that colorized compiler output helps us deal with compilation errors much faster. Color-coding quickly shows us issues in the compilation or important warnings. That’s why the CLion team has enhanced CMake by adding a few missing controls and then implementing a way to enable colorized output for the Ninja generator in CMake by default.
- A long-awaited CMake options editor (aka CMakeCache editor) was added to CLion. You can still open the CMakeCache.txt file in the editor or edit options via the command-line field in the CMake Profile. But now you can also review all the configured options in a table in the CMake Profile settings, search for options and values, and update values right in the table.
- The most interesting enhancement delivered in the 2022.2 EAP builds is Interval Analysis. The main idea is to calculate the upper and lower bounds of the possible values of every integral variable, and then use this information in the data flow analysis. This adds support for comparisons (<, >, <=, >=) of integral types, which helps detect unreachable code and constant conditions, and improves checks like Array index is out of bounds. The new analysis has been implemented in Clang as part of the bigger Data Flow Analysis in CLion and, based on measurements carried out by the team, it doesn’t regress overall analysis performance in any significant way.
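To illustrate the kind of reasoning interval analysis enables, here are two hypothetical functions written for this digest; whether CLion flags exactly these lines is an assumption. The comments show the value range a tool can infer on each path. The off-by-one in lookup is deliberate.

```cpp
// Hypothetical snippets, for illustration only.
int classify(int x) {
    if (x < 0) {
        return -1;
    }
    // Here the possible values of x form the interval [0, INT_MAX].
    if (x >= 0) {   // constant condition: always true on this path
        return 1;
    }
    return 0;       // unreachable
}

int lookup(int i) {
    const int table[3] = {10, 20, 30};
    if (i < 0 || i > 3) {   // deliberate off-by-one: should be i > 2
        return -1;
    }
    // Here i lies in [0, 3], so table[i] reads past the end when i == 3.
    return table[i];
}
```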
Webinar on remote C++ development with CLion
The CLion team is hosting a free webinar about remote development on Tuesday, July 26, 2022. This practice is gaining in popularity and becoming widely adopted among developers worldwide, aided in no small part by the pandemic.
In the webinar, JetBrains developer advocates Matt Ellis and Timur Doumler will demonstrate 5 different ways to perform remote development in C++ using CLion:
- The “thin client” approach using JetBrains Gateway.
- Remote development over SSH with local sources.
- CLion’s custom toolchains for developing on WSL and Docker containers.
- Remote debugging, and making an LED blink with Raspberry Pi.
- Collaborating remotely using Code With Me.
Registration is now open. Sign up to get all the links and reminders close to the webinar date.
Catch2 3.0 final release
While Google Test is still one of the most widely used testing frameworks in C++, Catch has been catching up pretty quickly! Part of its success comes from its modern C++ design and the header-only approach it took, which made it very easy for users to get started with unit testing in their projects. And even though Google Test is more feature-rich, the complexity of working with that framework is definitely a disadvantage.
The biggest change in Catch2 3.0 is that it now uses a statically compiled library as its distribution model. The release notes explain the reasoning: Catch2 is growing and adding more advanced features, which requires a different approach to distribution. For example, when Catch2 was distributed as a single header, adding a new Matcher caused overhead for everyone, even though it was useful only to a subset of users.
The authors provide migration docs for people moving from v2.x.x versions to the v3 releases. There is also a catch_all.hpp header, which includes all of Catch2 and offers a quick way to migrate. However, note that this file is very big, so it will significantly increase compile times.
For those who like simple header-only architecture and basic features, there’s doctest, which is a re-implementation of Catch2 (v2) but with improved performance and a couple of other convenience features.
Among other notable improvements, Catch2 now uses C++14 as the minimum supported language version.
Clang Power Tools update
Clang Power Tools is a free extension for Visual Studio that brings Clang tools (clang++, clang-tidy, and clang-format) to its users. The two big changes delivered in recent releases are:
- Migration to LLVM 14.0. This includes several fixes specific to Windows C++ development, like support for on-demand initialization of TLS variables, improved code generation for ARM, and several compatibility changes.
- Generating HTML, Markdown, and YAML documentation based on source code and comments.
And finally, why are you a C/C++ developer?
For our And finally section, we took this discussion from Reddit. Think about your answer before diving into the thread. What is there in C and/or C++ that seems important to you?
While the variety of answers in the thread is great, there are also quite a few things about it that aren’t so surprising. For instance, the thread is full of embedded and game developers. And a career (or simply an interest) in either of those areas is something that always pushes people toward C++.
The language-related reasons include, of course, the performance of C++ apps, powerful abstractions, standardization, the multi-paradigm nature of the language, and cross-platform portability. The fact that C++ is just one step above Assembly also attracts many developers to the language.
Some respondents said they found the C++ community to be mature, which I think is a really cool answer. Others pointed out that there is always a demand for C++ developers on the market.
I’ll wrap this up with my personal list of favorites from the Reddit thread:
- “I can only assume it’s penance for something terrible I did in a past life.”
- “I ❤️ segfault.”
- “I thought that I would be superior by using a language with pointers.”