fix an undefined behavior in uint::SetHex #14734

kazcw commented at 1:37 AM on November 16, 2018: contributor

Decrementing psz beyond the beginning of the string is UB, even though the out-of-bounds pointer is never dereferenced.

I don't think any clang sanitizer covers this, so I don't see any way a test could catch the original behavior.

fix an undefined behavior in uint::SetHex

Decrementing psz beyond the beginning of the string is UB, even though
the out-of-bounds pointer is never dereferenced.

0f459d868d

MarcoFalke added the label Refactoring on Nov 16, 2018

MarcoFalke added the label Utils/log/libs on Nov 16, 2018

fanquake requested review from sipa on Nov 16, 2018

promag commented at 9:41 AM on November 16, 2018: member

~~Both old and~~ new code LGTM.

practicalswift commented at 10:31 AM on November 16, 2018: contributor

Extremely nice find and another excellent contribution from you! Thanks! You're right that the sanitizers don't catch this class of UB. I think tis-interpreter is the only good tool catching this class of UB. May I ask how you found this issue?

From what I can see we unconditionally hit this UB as part of startup no matter what parameters we pass to bitcoind :-\

Why this is UB: Assuming char foo[N]; then calculating foo + n is UB if n is not in the range 0 <= n <= N. Calculating one past the end (n=N) is OK (but we may not dereference).

practicalswift commented at 10:36 AM on November 16, 2018: contributor

@promag What is "old code" referring to? Don't you agree that the existing code in master invokes UB? :-)

promag commented at 10:40 AM on November 16, 2018: member

@practicalswift TIL, it is UB unless one past the last element.

utACK 0f459d8.

promag commented at 11:09 AM on November 16, 2018: member

@kazcw how about this:

diff --git a/src/uint256.cpp b/src/uint256.cpp
index d9da66803..e5cadfae5 100644
--- a/src/uint256.cpp
+++ b/src/uint256.cpp
@@ -40,13 +40,12 @@ void base_blob<BITS>::SetHex(const char* psz)
     const char* pbegin = psz;
     while (::HexDigit(*psz) != -1)
         psz++;
-    psz--;
     unsigned char* p1 = (unsigned char*)data;
     unsigned char* pend = p1 + WIDTH;
-    while (psz >= pbegin && p1 < pend) {
-        *p1 = ::HexDigit(*psz--);
-        if (psz >= pbegin) {
-            *p1 |= ((unsigned char)::HexDigit(*psz--) << 4);
+    while (psz > pbegin && p1 < pend) {
+        *p1 = ::HexDigit(*--psz);
+        if (psz > pbegin) {
+            *p1 |= ((unsigned char)::HexDigit(*--psz) << 4);
             p1++;
         }
     }

kazcw commented at 3:41 PM on November 16, 2018: contributor

@practicalswift Just reading code! @promag That would be the minimal change to fix the problem, but Satoshi wrote C++ like an old-school C hacker :laughing:. IMO indexing is a little clearer in a codebase that these days uses more modern idioms.

practicalswift commented at 5:01 PM on November 16, 2018: contributor

@kazcw Nice! Then keep on reading code please! :-)

I've tried to launch a crusade against UB in Bitcoin Core. I even named the latest C-lightning release "The Consensus Loving Nasal Daemon" as a catchy slogan in this fight, so and I'm very happy to see your contributions! Keep 'em coming! :-)

Context:

practicalswift commented at 5:44 PM on November 16, 2018: contributor

@MarcoFalke @laanwj What about adding a new label – say "Undefined Behaviour" – for UB pull requests and issues?

FWIW I think the current use of the "Refactoring" label for UB PRs is problematic for a number of reasons:

UB PRs might be considered less important from a review attention perspective due to the refactoring label
The use of the refactoring label for UB might be interpreted as us downplaying the risks associated with UB
The "Refactoring" label is technically incorrect:
1. Code refactoring is defined as the process of restructuring existing computer code without changing its external behaviour.
2. Triggering undefined behaviour is defined as executing computer code whose behaviour is not prescribed by the language specification to which the code adheres, for the current state of the program.
3. Since undefined behaviour is a superset of external behaviour it follows that a PR fixing UB cannot technically be considered refactoring.

Some open UB PR:s that need the suggested "Undefined Behaviour" label:

#14479: serialization: Don't invoke undefined behaviour when doing type punning
#14242: Avoid triggering undefined behaviour (std::memset(nullptr, 0, 0)) if an invalid string is passed to DecodeSecret(...)
#14239: Avoid dividing by zero (undefined behaviour) in EstimateMedianVal (policy)/ConnectTip (validation)/CreateTransaction (wallet)

sipa commented at 6:05 PM on November 16, 2018: member

@practicalswift While I am in favor of (slowly) moving the code to strictly comply with the C++ standard, I also think we should also be aware that there are differences between:

Technically UB, but not for any real platform
Technically UB, but well defined for all systems we support
Would be UB, but not in any reachable codepath
Any of the above that follows from a misunderstanding by developers/reviewers, resulting in a fragile codebase.
An actual issue that can have impact on production systems.

I would argue that the first one or two are not a priority at all (still a good-to-have, but there are also code churn concerns), while the last few are increasingly more so. I don't think classifying them all under "UB" is a good idea - it would be overly dramatic for some things, and a wild understatement for others.

By this I don't want to discourage anyone from working towards improving the situation, but perhaps a "crusade" isn't the right approach.

MarcoFalke commented at 6:42 PM on November 16, 2018: member

Mostly agree with @sipa.

Also, there is a label "Bug", which (I guess) could be used when there is an observable issue with the code. I.e. actual malfunction and not just a code style issue or random compiler warning. Generally, if you are unable to write a functional or unit test case that will fail on current master, I'd say applying the "Refactoring" label is just fine.

We only support gcc and clang as compilers and in most cases of UB they just do the right thing without having to rewrite the code. And given that this piece of code is around for as long as the code base existed and no one ever run into issues, it seems unlikely that things will suddenly break.

kazcw commented at 10:43 PM on November 16, 2018: contributor

This is a complex but important topic. I think it would be worth working out a clearly-defined policy. I propose for comment an approach that admits mostly-objective triage into practical categories, based on what kinds of testing can demonstrate the issue. I have a starting point here (suggested changes / fundamental disagreements welcome): https://gist.github.com/kazcw/6849c43796d51d3e56ad12bf691ac2e6#file-reliability-vs-ub-md

If there's agreement on this approach in principle, I could PR it as section in docs/developer-notes.md, and then we could use the PR process to work out the finer details. @sipa, I think this is generally consistent with the approach you have outlined. To keep the UB discussion simple I have not addressed unreachable codepaths, which I think would fit the same needs-refactor category as other footgun-API issues. @MarcoFalke I have included a "Brittle" category in between "Refactoring" and "Broken", that would fit for example a function call that does not currently cause misbehavior, but could become broken if the compiler decided to inline it. If you agree that this kind of thing is more severe than might be expected for issues under the "Refactoring" label, we should try to come to a consensus on how to define it. @practicalswift I think a crusade on "potentially-dangerous UB" would be apt, as long as we are careful about how we define "potentially-dangerous". What do you think of the approach of leniency for a whitelist of classes of UB in existing code, with those classes selected based on clang's sanitizers options (and in this particular issue, lack thereof)? (And maybe valgrind?)

It looks like the current sanitizers used in CI test runs are "thread" and "integer,undefined". That is definitely insufficient for the definition of the "Brittle" classification: for example, those sanitizers don't include uninitialized reads. IMO a whitelist approach is appropriate: every sanitizer category should be considered to represent potential danger unless specifically decided otherwise (which we might just do lazily, as issues come up?). Ideally all sanitizers deemed important would be checked in CI test runs, but given the slowdown factor of some sanitizers, I don't know if that would be practical.

practicalswift commented at 5:03 PM on November 17, 2018: contributor

@MarcoFalke

We only support gcc and clang as compilers and in most cases of UB they just do the right thing without having to rewrite the code.

Are you willing to bet your money on that they'll "do the right thing" also in the future?

The C++ standard is a contract between the compiler writer and the programmer.

As programmers we can choose to break the contract and hope that the other party is okay with it or won't notice.

In this past this strategy seemed to work: compiler writers seldom enforced the contract we signed. They never took us to court.

However, in recent years compiler writers have started to aggressively take advantage of undefined behaviours to improve optimizations. They are taking us to court like never before:

https://kb.isc.org/docs/aa-01167 (2018: "Changes in GCC Code Optimization Can Cause a Crash in BIND")
https://lwn.net/Articles/511259/ (2012: "Signed overflow optimization hazards in the kernel")
https://www.kb.cert.org/vuls/id/162289/ (2008: "C compilers may silently discard some wraparound checks")

You've probably read them but these are great blog posts on the subject from the compiler writers' perspective:

Chris Lattner's "What Every C Programmer Should Know About Undefined Behavior":

John Regehr's "A Guide to Undefined Behavior in C and C++":

MarcoFalke commented at 5:07 PM on November 17, 2018: member

It looks like the current sanitizers used in CI test runs are "thread" and "integer,undefined"

Unfortunately we don't run the thread sanitizer (#14058), nor the memory sanitizer or valgrind (due to memory issues internal to bdb and qt, IIRC).

l2a5b1 commented at 10:25 AM on November 20, 2018: contributor

utACK 0f459d868d85053f1cc066ea9099793f88cbd655

This pull request eliminates pointer arithmetic [1] and addresses UB if psz points to an array [2].

[1] “Pointer arithmetic is fragile and easy to get wrong, the source of many, many bad bugs and security violations.“ ES.42: Keep use of pointers simple and straightforward

[2] Excerpt From: Bjarne Stroustrup. “The C++ Programming Language, 4/e“:

“The result of taking the address of the element before the initial element or beyond one-past-the-last element is undefined and should be avoided.”

”int v[] = { 1, 2, 3, 4 }; int* p1 = v; // pointer to initial element (implicit conversion) int* p4 = v–1; // before the beginning, undefined: don't do it”

gmaxwell commented at 8:49 PM on November 20, 2018: contributor

@practicalswift It sounds like you are only looking at things from one side. Consider a case where a behaviour is language spec UB but explicitly defined by our supported toolchain and ubiquitously used. In that case the existing behaviour is at least currently harmless. A fix risks introducing its own bugs (potentially not harmless) and also diverts review/testing attention away from other changes, potentially allowing bugs through or causing important improvements to be delayed. If the fix is done hastily (e.g. as part of a mass of changes in a 'crusade') even if it is not buggy itself could make the code more brittle or less clear and thus contribute to the introduction of bugs down the line. (You've seen me complain sometimes against fixing warnings by peppering in casts, and usually its because of brittleness/less clarity)

There are as many ways for code to be good as there are people and kinds of code, a decision to do something is never a one sided choice to make things better or not, it's always a prioritization of what kinds of better we're making things and what risks we're taking by making those changes.

practicalswift commented at 11:35 PM on November 20, 2018: contributor

@gmaxwell I agree with the point you and @sipa make: all changes – including UB fixes – should be critically evaluated from a risk-reward perspective. And PR:s deemed too risky should be closed off after review.

Perhaps what we don't agree on is how much work it would take to kill off the remaining instances of UB in our code base, and how complex the required changes would be. Could that be the case?

If we get specific: is there any particular type of UB you fear that we have in such quantity that it would be unrealistic to fix?

Also, are there any UB fixes that have been submitted so far that you don't think are worth fixing?

Personally I think we're better off having all UB cases identified and documented. That way we can reason about them like we do in this PR and take the appropriate action (if any). We agree on that, right? :-)

MarcoFalke commented at 12:10 AM on November 21, 2018: member

To the best of my knowledge the runtime observable UBs were already documented through an UBSAN suppressions file (https://github.com/bitcoin/bitcoin/blob/5c292dafcd54adfcd9f80c0e1fccb45c8683808f/contrib/sanitizers-ubsan.suppressions)? To fix all of them a series of separate pull requests would be required, each of them a massive review burden. It would probably take months if not years to get all through the review process. Especially unsigned-integer-overflow are tricky, as pointed out by @arvidn in #11535 (comment). Because the right fix is not always obviously clear and might even introduce more UB temporarily. I suggest to come up with a guideline on UB and maybe a separate guideline on unsigned-integer-overflow before proceeding with patches that might go in the wrong direction and give a false impression of having fixed something.

practicalswift commented at 7:07 AM on November 21, 2018: contributor

@MarcoFalke Some things that are important to note when talking about UB:

Wraparound behaviour using unsigned integers is well-defined, so what -fsanitize=integer calls unsigned-integer-overflow is not UB. (Unintended unsigned integer wraparound can be a source of bugs, but it is not UB.)
UBSan does not catch all UBs. Since UBSan performs dynamic analysis it is restricted to finding the subset of all UBs that are detectable at run-time. And in that subset of UBs it still doesn't catch all types of UB. The UB discussed in this PR is an example of that: it is UB, it is detectable at run-time but it is not catched by UBsan. Another example is uninitialized reads which are not detected by UBSan (ASan does).

If we get more specific:

Are there any specific UB fixes that have been submitted so far that you don't think are worth fixing?
Is there any particular type of UB you fear that we have in such quantity that it would be unrealistic to fix?
Is there any specific scenario in which we'd be better off by not having all UB cases clearly identified and documented so that we can reason about their impact and take the appropriate action (if any)?

sipa commented at 7:26 AM on November 21, 2018: member

@practicalswift I think the only disagreement is on how much of a priority this is.

practicalswift commented at 6:43 PM on November 27, 2018: contributor

@kazcw

It looks like the current sanitizers used in CI test runs are "thread" and "integer,undefined". That is definitely insufficient for the definition of the "Brittle" classification: for example, those sanitizers don't include uninitialized reads. IMO a whitelist approach is appropriate: every sanitizer category should be considered to represent potential danger unless specifically decided otherwise (which we might just do lazily, as issues come up?). Ideally all sanitizers deemed important would be checked in CI test runs, but given the slowdown factor of some sanitizers, I don't know if that would be practical.

PR #14794 adds AddressSanitizer (ASan) to Travis. Please review :-)

DrahtBot commented at 11:31 AM on February 4, 2019: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

laanwj commented at 1:30 PM on February 12, 2019: member

This is sure generating a lot of discussion for a few-line change. Is there disagreement on merging this specific change?

practicalswift commented at 10:21 PM on February 28, 2019: contributor

@laanwj There is no disagreement about the code change from what I can tell. I think we should merge this or alternatively document in assumptions.h that we assume this specific class of UB to be safe in practice.

gmaxwell commented at 2:33 AM on May 7, 2019: contributor

Please do not just accept any actually known true UB as just an "assumption". Assumptions should be for platform limitations (e.g. we require int be 32 bits) and implementation defined behaviour (including places where clang and GCC explicitly strengthen the language specification and turn something that the language leaves UB into implementation defined).

I think it would be really nice if every PR fixing an obvious bug didn't turn into some awful debate about undefined behaviour.

laanwj commented at 9:49 AM on May 7, 2019: member

Agreed: so let's move back the focus in this PR to reviewing the specific code change. Please move general UB discussion elsewhere:

This has currently 2 utACKs, no NACKs.

practicalswift commented at 8:34 AM on May 16, 2019: contributor

This might be of help for people reviewing this PR who want to understand the old behaviour:

Consider the case SetHex("1000000000000000000000000000000000000000000000000000000000000000"):

When the reverse processing reaches the leading 1 we have psz == pbegin and the following happens:

*p1 |= ((unsigned char)HexDigit(*psz--) << 4);

Please note that psz has now been decremented to pbegin - 1 which is outside of the object.

An example:

$ git diff
diff --git a/src/uint256.cpp b/src/uint256.cpp
index e3bc9712e..8cd1065fb 100644
--- a/src/uint256.cpp
+++ b/src/uint256.cpp
@@ -49,6 +49,7 @@ void base_blob<BITS>::SetHex(const char* psz)
             *p1 |= ((unsigned char)::HexDigit(*psz--) << 4);
             p1++;
         }
+        assert(psz >= pbegin);
     }
 }

$ make
$ src/test/test_bitcoin -t addrman_tests
test_bitcoin: uint256.cpp:52: void base_blob<256>::SetHex(const char *) [BITS = 256]: Assertion `psz >= pbegin' failed.
Aborted

laanwj merged this on Jul 3, 2019

laanwj closed this on Jul 3, 2019

laanwj referenced this in commit 085cac6b90 on Jul 3, 2019

sidhujag referenced this in commit 7110793ec4 on Jul 5, 2019

jasonbcox referenced this in commit 63a5cddd31 on Jul 8, 2020

PastaPastaPasta referenced this in commit 3daeb28c12 on Oct 22, 2021

DrahtBot locked this on Dec 16, 2021