Compile release binaries with -ftrivial-auto-var-init=zero

MarcoFalke commented at 6:55 pm on May 5, 2020: member

We make heavy use of sanitizers and memory checkers to catch memory related issues that are inherent to the C++ programming language as early as possible.

One example is uninitialized reads. They come in many forms, but one of them is

0int foo;
1if (!Parse("-1", foo)) return
2std::cout << foo << std::endl;  // Can be uninitialized read, depending on implementation of Parse

Sometimes programmers initialize foo to a constant like 0 or -1, but such practice is defeating the whole purpose of memory sanitizers. That is, for a memory sanitizer it is now impossible to detect an uninitialized read.

However, I suspect that no one is running with sanitizers enabled in production and it would be nice to not expose production systems to undefined behaviour. So I was wondering if anyone would object compiling and shipping the release binaries with -ftrivial-auto-var-init=zero?

MarcoFalke added the label Feature on May 5, 2020

MarcoFalke added the label Brainstorming on May 5, 2020

MarcoFalke added the label Build system on May 5, 2020

dongcarl commented at 7:04 pm on May 5, 2020: member

Relevant context from Microsoft using this in production: http://lists.llvm.org/pipermail/cfe-dev/2020-April/065244.html

sipa commented at 7:08 pm on May 5, 2020: member

Doing so would hurt the usefulness of running valgrind on release binaries, which afaik is the only way to test for these kinds of issues being present in unmodified executables that will actually run in production.

ryanofsky commented at 7:16 pm on May 5, 2020: member

Carl’s email link mentioned “pattern initialization” which seems like an appealing alternative here. Googling it turned up this description: https://reviews.llvm.org/D54604 “[Pattern initialization] is the recommended initialization approach. Pattern initialization’s goal is to initialize automatic variables with values which will likely transform logic bugs into crashes down the line, are easily recognizable in a crash dump, without being values which programmers can rely on for useful program semantics. At the same time, pattern initialization tries to generate code which will optimize well.”

practicalswift commented at 9:19 pm on May 5, 2020: contributor

I think this is a trade-off between “cure” and “side effect of cure”.

Assuming the presence of an uninitialized read in our code base: what is the worst-case scenario for our users of a.) using -ftrivial-auto-var-init=zero and b.) not using -ftrivial-auto-var-init=zero in production builds?

My take:

The disease we are fighting is uninitialized reads which may be exploited by an attacker to read memory content to bypass ASLR or leak secrets. (Potentially disastrous for our users.)

A cure (-ftrivial-auto-var-init=zero) has been presented.

Using it means that the disease is cured (no more reading of uninitialized memory to bypass ASLR or leak secrets), but it comes with a nasty side effect (valgrind can no longer be used to detect uninitialized reads in production binaries).

How do we handle this trade-off? Personally I share the view of the security engineers in the Chrome, Android, Windows, iOS and macOS teams who have all started using the equivalent of -ftrivial-auto-var-init=zero in some form for their production builds: the cure is worth using despite said unfortunate side-effect :)

Am I missing any negative side effects, or is the description of -ftrivial-auto-var-init=zero as a “cure” unfair in some way? WDYT? Discussion is healthy :)

MarcoFalke commented at 9:39 pm on May 5, 2020: member

I think both use cases are valid (1) wanting to run valgrind on release builds and (2) wanting to run the release build with auto-init. However they are mutually exclusive, as mentioned by me and also sipa.

A solution would be to offer both builds, but I’d say that puts too much burden on the user to pick the right build. Maybe the “valgrind release” should not be offered as prominently as the auto-init release?

sipa commented at 9:42 pm on May 5, 2020: member

@MarcoFalke That defeats the purpose - if users aren’t going to run the same release binaries as the optimally-valgrindable ones.

I’m not saying either is preferable - I see both sides too - but there is a tradeoff here.

MarcoFalke commented at 10:03 pm on May 5, 2020: member

Ok, in that case I can’t follow what the purpose is. I see those use cases:

A developer running any version of Bitcoin Core in valgrind: Motivation is to detect and fix bugs.
A user running a release of Bitcoin Core in valgrind: Motivation is to crash the node before any undefined behaviour could happen in production. Motivation could also be to test the code in valgrind and then run in production without valgrind.
A user running a release of Bitcoin Core with auto-init: Motivation is to avoid undefined behavior, but opt in to some unknown defined behaviour. This could be an option where the user does not have access to valgrind or similar.
A user running a release of Bitcoin Core as it is shipped today (without ever running it in valgrind): With the alternatives presented above, I don’t see why this would be preferable.

MarcoFalke commented at 10:10 pm on May 5, 2020: member

Is your point that some users running in valgrind are going to make it safer for users not running in valgrind, because they might report bugs they find? If yes, we should make sure that at least one person is actually doing that and actually reporting bugs they find.

sipa commented at 10:39 pm on May 5, 2020: member

My point is that discoverability of some bugs depends on the exact compilation. Changing the compilation flags changes the binary, which may make some bugs visible or invisible. This includes -ftrivial-auto-var-init=zero, which may result in bugs being present in the binary that aren’t present in a binary without (but won’t crash or trigger any other easily detectable sign). People running valgrind on a valgrind-optimized binary without zero initialization may not discover it because it’s not present in their binary, and people running valgrind on the zero-initialized version may not discover it because valgrind’s ability to detect issues is reduced in those. Just in general, the advantage of valgrind over other analysis tools is that has the ability to detect issues present in the exact binaries we release - introducing a separate valgrind-optimized version removes that advantage.

That said, again, I also see the advantage of reducing risk by initializing all variables even if it comes with a reduction in ability to detect issues.

sipa commented at 10:40 pm on May 5, 2020: member

It seems gcc doesn’t actually have an -ftrivial-auto-var-init option? So this discussion seems moot, at least for Linux/Windows binaries?

fanquake commented at 11:36 pm on May 5, 2020: member

It seems gcc doesn’t actually have an -ftrivial-auto-var-init option? So this discussion seems moot, at least for Linux/Windows binaries?

The version of Clang we use for macOS builds (6.0.1) wont support this either, as it wasn’t introduced into Clang until 8.0.0.

MarcoFalke commented at 11:32 am on May 6, 2020: member

people running valgrind on the zero-initialized version may not discover it because valgrind’s ability to detect issues is reduced in those

Hopefully you’d run the same set of tests that you’d run when you use valgrind. If neither configuration can detect the bug, then the set of tests is broken or incomplete. Assuming the set of tests is “complete”, but the bug is impossible to observe, then the bug does not exist. :thinking:

practicalswift commented at 3:43 pm on May 6, 2020: contributor

If yes, we should make sure that at least one person is actually doing that and actually reporting bugs they find.

Do we even ship the Valgrind suppressions file with our release binaries?

Valgrind is largely meaningless without project specific suppressions files. Since I added valgrind.supp to our repo back in 2017 the interest in keeping it up to date has been very close to zero:

https://github.com/bitcoin/bitcoin/commits/52ce396b2a47ed1c5d7d5764bc98a09451001749/contrib/valgrind.supp

It seems like I’m a relatively heavy valgrind user in this project and for me personally I run valgrind (and MSAN!) on binaries compiled from scratch and in such cases I would simply just not opt in to -ftrivial-auto-var-init=zero when doing so.

But enough of my needs as a developer: as an individual Bitcoin Core user with funds in my wallet I cannot see why I wouldn’t choose binaries with the added protection that -ftrivial-auto-var-init=zero brings if I were presented with the choice.

Is the idea that I should sacrifice my individual security for the “common good”? :)

MarcoFalke commented at 4:39 pm on May 6, 2020: member

Do we even ship the Valgrind suppressions file with our release binaries?

Bitcoin Core 0.20.0 should include them, yes. (Haven’t checked)

practicalswift commented at 4:44 am on May 14, 2020: contributor

It is always interesting to see how others are choosing to tackle the same type of problems we are facing:

Yesterday Joe Bialek from the Microsoft Security Response Center’s Vulnerability & Mitigations team posted this excellent article on what Microsoft is doing to eliminate uninitialized stack memory vulnerabilities from Windows using the equivalent of -ftrivial-auto-var-init=zero (InitAll).

The results are pretty remarkable: an entire security bug class which has historically accounted for roughly 5-10% of Microsoft’s CVE:s is effectively killed from a security perspective through this technique.

Some interesting quotes:

There are really two distinct classes of vulnerabilities here:

Uninitialized Memory Disclosure – Uninitialized memory is copied across a trust boundary and its contents are disclosed to a less privileged entity.

Uninitialized Memory Use – Uninitialized memory is directly used. For example, an uninitialized pointer is written through.

It’s also important to realize that uninitialized memory problems occur for both stack allocations and heap allocations. This blog is focusing on stack memory and a follow up blog will address heap memory.

In recent years, uninitialized memory use bugs have been trending up. This is likely partially attributed to more researchers being interested in this vulnerability class and writing great tools to help identify these issues.

Between 2017 and 2018, uninitialized memory vulnerabilities have accounted for roughly 5-10% of the CVE’s issued by Microsoft.

There is a near equal split between stack based vulnerabilities and heap/pool based vulnerabilities.

Uninitialized memory disclosures outweigh uninitialized memory use bugs.

A zero initialization results in:

NULL pointer, will throw an SEH exception if you dereference it on Windows (i.e., denial of service at worst instead of remote code execution) which will typically crash the program.

A zero size, or zero index, if that is what the variable is being used to track. This is expected to minimize the impact of an uninitialized size being passed to a function like memcpy that operates on a buffer based on the size passed.

A zero pointer, when tested in a NULL check, will take the path of “the pointer is NULL” and not attempt to use the pointer. This at least gives the program the chance of correctly handling the developer forgetting to initialize the pointer (since following a pattern-initialization pointer will always crash).

A zero Boolean is false which may indicate “failure”.

Our peers at Google have done some measurements and proven that on Clang, zero initialization is currently measurably better in code size and runtime performance than pattern initialization.

This exposes another benefit of zero initialization: More deterministic results. The initialization cost doesn’t depend on if a particular global variable is in the L1 cache, L2 cache, L3 cache, paged out, etc.

Windows 10 1903 was the first version of Windows to ship with InitAll enabled (shipped Spring 2019). We haven’t had any performance complaints arise related to InitAll since shipping it.

Since shipping InitAll to the world, we’ve had multiple vulnerability reports submitted to MSRC that did not reproduce on the latest versions of Windows due to InitAll. This effectively downgrades the vulnerabilities from “security bugs” to “code defect that currently has no negative impacts”. This means we no longer need to ship a security update for the in-market operating systems that have the mitigation installed, saving customers patching pain and Microsoft servicing pain.

We do still fix the issues in our active development branches so that the code is correct going forward and we fix in-market operating systems that don’t have InitAll enabled as they are still vulnerable. Over the long term, operating systems that don’t have InitAll enabled will fall out of support. Once this occurs, uninitialized memory bugs that are mitigated by InitAll will only need to be fixed in active development branches and this bug class will no longer need to be serviced to in-market operating systems.

Longer term we are exploring if it’s possible to eliminate these sorts of issues from the C and C++ language in a standardized way. Leaving variables uninitialized is not typically necessary for performance (especially when the compiler has good redundant store elimination). Instead of defaulting variables to uninitialized, it’d be nice to default to “variable must be provably initialized before use” and only allow this rule to be broken if a special uninitialized keyword was used. This could allow developers to keep performance while also saving themselves from unnecessary mistakes.

tl;dr in clickbait form: “Black hats hate him: the one weird trick that saved millions of users from losing heaps of money due to nasty memory vulnerabilities!”

practicalswift commented at 10:52 am on May 19, 2020: contributor

People interested in avoiding uninitialized reads are highly encouraged to review #18912 which re-enables the valgrind fuzzing job which was recently removed.

We are currently running without any uninit read checking (in the form of Valgrind or MSan) in Travis which feels a bit unsafe TBH :\

MarcoFalke closed this on Jan 17, 2022

DrahtBot locked this on Jan 17, 2023

Compile release binaries with -ftrivial-auto-var-init=zero #18892