serialization: prevent int overflow for big Coin::nHeight

pierreN commented at 1:51 am on March 26, 2020: contributor

This is an attempt to fix fuzzer issues 1,2,8 reported by practicalswift here : #18046

The fuzzer harness doesn’t prevent deserialization of unrealistic high values for Coin::nHeight. In the provided examples, we have :

blockundo_deserialize : the varint 0x8DD88DD700 is deserialized as 3944983552 in Coin::nHeight (TxInOutFormatter::Unser)
coins_deserialize : the varint 0x8DD5D5EC40 is deserialized as 3939874496 similarly
txundo_deserialize: the varint 0x8DCD828F01 is deserialized as 3921725441 in Coin::nHeight (Coin::Unserialize)

Since Coin::nHeight is 31 bit long, multiplying a large value by 2 triggers the fuzzer.

AFAIK those values are unrealistic (~70k years for the smallest..). I’ve looked a bit a reducing the range of values the fuzzer can deserialize, but this seems to be too much code change for not much.

Hence this PR chooses to static cast nHeight when re-serializing; it seems to be the less intrusive/safest way to prevent the fuzzer output.

Another more “upstream” approach would be to limit Coin::nHeight values to something more realistic, e.g. 0xFFFFFFF (~5k years) :

https://github.com/pierreN/bitcoin/blob/de3a30bab28e2db853a795017c5ec1704a1d0fee/src/undo.h#L39 and https://github.com/pierreN/bitcoin/blob/de3a30bab28e2db853a795017c5ec1704a1d0fee/src/coins.h#L71

Thanks !

NB: i was also not sure about the component/area to prefix the PR/commit with.. ?

DrahtBot added the label UTXO Db and Indexes on Mar 26, 2020

pierreN force-pushed on Mar 26, 2020

practicalswift commented at 4:06 pm on March 26, 2020: contributor

@pierreN

Thanks for tackling this.

Consider throwing a std::ios_base::failure instead of silently truncating in case of serialization/deserialization errors.

That is the pattern used generally in the serialization/deserialisation code.

practicalswift commented at 6:10 pm on March 26, 2020: contributor

@pierreN

FWIW, this is what I use in my local tree to avoid hitting the signed integer overflow when fuzzing:

 0diff --git a/src/coins.h b/src/coins.h
 1index e71c8a47b..2be607207 100644
 2--- a/src/coins.h
 3+++ b/src/coins.h
 4@@ -59,7 +59,7 @@ public:
 5     template<typename Stream>
 6     void Serialize(Stream &s) const {
 7         assert(!IsSpent());
 8-        uint32_t code = nHeight * 2 + fCoinBase;
 9+        const uint32_t code = nHeight * uint32_t{2} + fCoinBase;
10         ::Serialize(s, VARINT(code));
11         ::Serialize(s, Using<TxOutCompression>(out));
12     }
13@@ -68,7 +68,11 @@ public:
14     void Unserialize(Stream &s) {
15         uint32_t code = 0;
16         ::Unserialize(s, VARINT(code));
17-        nHeight = code >> 1;
18+        const uint32_t block_height = code >> 1;
19+        if (block_height > (std::numeric_limits<uint32_t>::max() - 1) / 2) {
20+            throw std::ios_base::failure("Invalid nHeight when deserializing Coin.");
21+        }
22+        nHeight = block_height;
23         fCoinBase = code & 1;
24         ::Unserialize(s, Using<TxOutCompression>(out));
25     }

Feel free to use it - I’d love to see it upstreamed. That would simplify my fuzzing workflow :)

sipa commented at 10:35 pm on March 26, 2020: member

Agree with throwing an error when the value is out of range.

pierreN force-pushed on Mar 27, 2020

pierreN commented at 2:33 am on March 27, 2020: contributor

OK, thanks.

Note that strictly speaking, as long as unsigned int and uint32_t are 32b long, in the code posted by @practicalswift only the uint32_t{2} are necessary for the fuzzer exemples (to the compiler this is equivalent to the static_cast in this PR which didn’t truncate them - sorry for being unclear).

I updated the branch to use the uint32_t{} and made explicit the unsigned int == uint32_t.

But AFAIK exceptions are already raised in serialize.h:ReadVarInt (here and here) : I’m not sure you can reach the new Invalid nHeight when deserializing Coin exception.

Examples :

code = VARINT(0x8e fe fe fe 7f) will evaluate block_height > (std::numeric_limits<uint32_t>::max() - 1) / 2) to false
code = VARINT(0x8e fe fe ff 00) will not fit an uint32_t hence raise this already existing exception

Another way would be to lower the threshold for raising the exception (e.g. /4) but since we already have the uint32_t{} this seems not needed.

Did I miss something by any chance ?

MarcoFalke commented at 6:07 pm on March 27, 2020: member

ACK 21379af7537c71181dab0e9de3af5459a0f4a00e 📀

Signature:

 0-----BEGIN PGP SIGNED MESSAGE-----
 1Hash: SHA512
 2
 3ACK 21379af7537c71181dab0e9de3af5459a0f4a00e 📀
 4-----BEGIN PGP SIGNATURE-----
 5
 6iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
 7pUj3qwv7BGTYcciiJxr/pawLmVnsWE3zYTUPkpxF6yHvtWRJUaXA7ZfPs9CHsHCI
 81bNzPhPSel9+IpoZ5xx71UN0yNUPBxw1ZQ4PhnEffRXsdhWEz3BjhORZ+IJ+hFVq
 9xx1LPZjRjd97WARFzOnsP98ayBFFxIh9bSlA0U8jKm8BAFnONrqpI1v5CMBxdnai
10OYHF5hQM1DKM4YaPT6ZOEwknV7WO8Nifu+kAysvdoI+M0E45uzf+mDlx2qyP1LYf
11b0ErFwTdPY7ldDLBBqOuiV/R21cZyhQBEbXUpTCwzxWoHcwEHoSLUITJ+YQ7LvTR
12ET5/d8kVsBm6o1z5mtnXrHPwXEvlKbHMFtPw6ZpCTTf5FLLZTVOC32/JVIKle7gj
13a6AKmoQDEz1+XyWCHqMCBad7TnCa78k7LrGL8NAV0RoezqazyncNrri/pG0OgdKW
14t49UxzN7Pewq6ApqsHyEAgkeC3DVzw5HsrdHakIYbZu096kMuEpHoZZfL0KFkl4/
15XBxEu4pW
16=DB20
17-----END PGP SIGNATURE-----

Timestamp of file with hash 0fdeb1acb1aa9d5b5bfc0fded9cbedaaabafa625d387583314adb45f1c22a61e -

in src/coins.h:62 in 21379af753 outdated

58@@ -59,7 +59,7 @@ class Coin
59     template<typename Stream>
60     void Serialize(Stream &s) const {
61         assert(!IsSpent());
62-        uint32_t code = nHeight * 2 + fCoinBase;
63+        uint32_t code = nHeight * uint32_t{2} + fCoinBase;

ryanofsky commented at 7:37 pm on March 27, 2020:

Can someone confirm whether I’m undertanding this correctly?

Previously nHeight operand was a 31 bit integer, fCoinBase operand was a 1-bit integer, and 2 operand was a 2+ bit integer with actual width being platform specific but at least 2.

Undefined behavior could occur on an imaginary platform where all the operations were done in 31 bits and overflowed, causing the high order bit of the uint32 result not to be set when supposed to be. If https://stackoverflow.com/questions/32529080/should-bit-fields-less-than-int-in-size-be-the-subject-of-integral-promotion is right, this can only happen on an imaginary platform with 31-bit ints.

Fix here for the undefined behavior widens 2 operand so expression is evaluated in 32+ bits on every platform and never overflows.

Am I understanding this right?

sipa commented at 9:11 pm on March 27, 2020:

I believe we’re in fact already assuming that short is 16 bits, and int is 32 bits (see src/compat/assumptions.h). Maybe should just make those assumptions more explicit (asserting that the range of int matches that of int32_t, etc., …).

MarcoFalke commented at 9:42 pm on March 27, 2020:

2 operand was a 2+ bit integer with actual width being platform specific but at least 2.

Not sure, but I think you got this wrong. 2 should be an int, which should be promoted to uint32_t, if I read this right. (“if the unsigned operand’s conversion rank is greater or equal to the conversion rank of the signed operand, the signed operand is converted to the unsigned operand’s type”) and this (“The rank of any unsigned type is equal to the rank of the corresponding signed type”)

So this change in code should not result in any change to the compiled code. I checked locally that clang++ produces the same bitcoind for me with O2.

As a side note, the boolean is converted to int in a first step. See here. Then promoted to unsigned for the same reason I mentioned above.

pierreN commented at 11:13 pm on March 27, 2020:

I believe we’re in fact already assuming that short is 16 bits, and int is 32 bits (see src/compat/assumptions.h).

Ow, thank you, I hadn’t see this file. Those are great assumptions.

As a side note, the boolean is converted to int in a first step. See here. Then promoted to unsigned for the same reason I mentioned above.

Yes, my basic understanding was that 2 was being understood as an int, forcing the nHeight to be implicitly casted as an int, forcing nHeight * 2 + fCoinBase to be implicitely casted as an int then implicitly casted to an uint32_t.

Hence forcing 2 (or nHeight) to be an uint32_t makes all the implicit casting disappears. If you really wanted to be explicit, you could do static_cast<uint32_t>{nHeight} * uint32_t{2} + static_cast<uint32_t>(fCoinBase) but this is un-necessarily verbose..

ryanofsky commented at 11:15 pm on March 27, 2020:

re: #18433 (review)

2 operand was a 2+ bit integer with actual width being platform specific but at least 2.

Not sure, but I think you got this wrong. 2 should be an int, which should be promoted to uint32_t

Oh you’re right it must be a 16+ bit signed int: https://en.cppreference.com/w/cpp/language/integer_literal#The_type_of_the_literal

I think it doesn’t change the conclusion that the only way this change could change behavior is on the imaginary platform with 31 bit ints

in src/undo.h:27 in 21379af753 outdated

23@@ -24,7 +24,7 @@ struct TxInUndoFormatter
24 {
25     template<typename Stream>
26     void Ser(Stream &s, const Coin& txout) {
27-        ::Serialize(s, VARINT(txout.nHeight * 2 + (txout.fCoinBase ? 1u : 0u)));
28+        ::Serialize(s, VARINT(txout.nHeight * uint32_t{2} + (txout.fCoinBase ? 1u : 0u)));

ryanofsky commented at 7:52 pm on March 27, 2020:

Fix here seems basically identical to the fix in Coin::Serialize and I guess works the same way.

Only difference is fCoinBase from the other expression is txout.fCoinBase ? 1u : 0u here. It’s a little more verbose but effectively the same thing since fCoinBase is 1-bit.

MarcoFalke commented at 9:44 pm on March 27, 2020:

0        ::Serialize(s, VARINT(txout.nHeight * uint32_t{2} + txout.fCoinBase));

It could make sense to change this for symmetry with the implementation in Coin::Serialize?

pierreN commented at 10:58 pm on March 27, 2020:

This looks like a great idea. I’m double checking it.

pierreN commented at 10:59 pm on March 27, 2020:

Yes, TBH I was unsure why the ternary operator was used.

in src/undo.h:37 in 21379af753 outdated

33@@ -34,7 +34,7 @@ struct TxInUndoFormatter
34 
35     template<typename Stream>
36     void Unser(Stream &s, Coin& txout) {
37-        unsigned int nCode = 0;
38+        uint32_t nCode = 0;

ryanofsky commented at 8:05 pm on March 27, 2020:

Again, not sure if I understand this correctly.

The undefined behavior happens because standard only says unsigned int nCode has to be 16+ bits.

If nCode is 32 bits everything works perfectly. Any serialized VarInt value that is 2**32 or more will trigger “ReadVarInt(): size too large” exception, and any smaller values will be broken up into height and fcoinbase correctly.

If nCode is less than 32 bits, ReadVarInt() exceptions will be triggered unnecessarily. If nCode is more than 32 bits, top bits that are set will be silently discarded without triggering the ReadVarInt() exceptions that should be triggered.

Chaning nCode to be 32 bits instead of having an implementation defined width gives the right behavior on all platforms.

Again, I’m wondering if this is right or something else is going on.

MarcoFalke commented at 9:46 pm on March 27, 2020:

The undefined behavior happens because standard only says unsigned int nCode has to be 16+ bits.

I don’t think any undefined behavior has been observed here. This has just been changed for clarity and should also not result in a change in the compiled code.

pierreN commented at 10:49 pm on March 27, 2020:

Yes this is mostly for the sake of clarity. I’m struggling to see bitcoin run on platforms where unsigned int is different than 32 bits but who knows.

If nCode is less than 32 bits, ReadVarInt() exceptions will be triggered unnecessarily.

If on an imaginary platform unsigned int is 16 bits, then yes nCode have to be less than 2**16/(6*24*365)<2 years so this code would already hit the exception in ReadVarInt.

If nCode is more than 32 bits, top bits that are set will be silently discarded without triggering the ReadVarInt() exceptions that should be triggered.

Yes. In that case adding the new exception would be necessary (limiting the threshold to uint32_t, not unsigned int).

ryanofsky commented at 11:15 pm on March 27, 2020:

re: #18433 (review)

I don’t think any undefined behavior has been observed here. This has just been changed for clarity and should also not result in a change in the compiled code.

Good to know. Any I don’t think it was right for me to call it “undefined behavior”, more like “platform specific behavior” only on a theoretical platform where an int is 31 bits with when height is somehow >= 2**30.

In the future, it would be good to do fuzzer fixes and code cleanups in separate commits. Or at least say what the different changes are intended to do in the commit message. The changes are enough work to understand without having to guess which are real and which are no-ops!

ryanofsky approved

ryanofsky commented at 8:09 pm on March 27, 2020: member

Code review ACK 21379af7537c71181dab0e9de3af5459a0f4a00e assuming I’m actually understanding this PR correctly. I left some questions in review comments.

practicalswift commented at 9:18 pm on March 27, 2020: contributor

ACK 21379af7537c71181dab0e9de3af5459a0f4a00e

Would be nice to have this merged to avoid having UBSan hitting the signed integer overflow over and over when fuzzing :)

ryanofsky approved

ryanofsky commented at 11:37 pm on March 27, 2020: member

Still looks good, thanks for explanations!

serialization: prevent int overflow for big Coin::nHeight e980214bc4

pierreN force-pushed on Mar 27, 2020

pierreN commented at 11:42 pm on March 27, 2020: contributor

Thanks for the reviews, I removed the ternary operator on fCoinBase as requested. In the same spirit, I also changed the nCode / 2 into nCode >> 1 to match undo.h and coins.h.

ryanofsky approved

ryanofsky commented at 11:42 pm on March 27, 2020: member

Code review ACK e980214bc4fd49530e8d50fe0a6657b8583bc6b5. Just removed ternary ? 1 : 0 and replaced / 2 with » 1 since last review

MarcoFalke commented at 0:49 am on March 28, 2020: member

re-ACK e980214bc4fd49530e8d50fe0a6657b8583bc6b5 🎑

Signature:

 0-----BEGIN PGP SIGNED MESSAGE-----
 1Hash: SHA512
 2
 3re-ACK e980214bc4fd49530e8d50fe0a6657b8583bc6b5 🎑
 4-----BEGIN PGP SIGNATURE-----
 5
 6iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
 7pUgKLgv+MDL+1GuBwP8B6du9vzoVuuWIcYe1V6tmN4/0EnM5llvYevJznwvpSuRM
 8LwfFEkQ8fgAqiJDwbzM50sjaaC6eYj36uQShVCp2DcuD1mFosnaGzX9u60xQPbpu
 9HY7WytxaHwRyN3XSrXHuEL2PuVR79wQgAnwqaBxZ25KY4bb3ab0jLVNJb1rfNLuX
10fVoBrXdLWH0vm8ompFuhGylWu5u4gVkIO+auemVAGECIMhu2238HkODZ9aE6rjWR
11FNKNiSTVeAsD9iaqZo6Tw0lyf1SLSJuuHdTtQ5vIG70r/dNPK8ba20oHC1Ss22RK
12Vsie4z2Zl2wCLbqIVKQJPIw0JEOkzUwr4BnnnbzizYkOAGShQq4iIVd3UA44H+Bp
13lo6cwp/nn9WgU+fdcF1zdWmArfRHhh3Ocv3ENqwfL/AuT7wGqDe0YTBb/ZWPq6Vm
14XFi2RK3qRrxyC4URXx13LGclrEGNVAHGnqoNlDF+Kz4jUsv6w81+gMYqlS8qCBvx
15ZlP9A9rc
16=/tH2
17-----END PGP SIGNATURE-----

Timestamp of file with hash 4660dfc715111cfef91d5f61080473e35fd321fe8ef6e8c0530e5789db28a610 -

sipa commented at 1:16 am on March 28, 2020: member

utACK e980214bc4fd49530e8d50fe0a6657b8583bc6b5

MarcoFalke added this to the milestone 0.20.0 on Mar 28, 2020

practicalswift commented at 12:22 pm on March 29, 2020: contributor

ACK e980214bc4fd49530e8d50fe0a6657b8583bc6b5 – patch looks correct

in src/undo.h:39 in e980214bc4

33@@ -34,9 +34,9 @@ struct TxInUndoFormatter
34 
35     template<typename Stream>
36     void Unser(Stream &s, Coin& txout) {
37-        unsigned int nCode = 0;
38+        uint32_t nCode = 0;
39         ::Unserialize(s, VARINT(nCode));
40-        txout.nHeight = nCode / 2;
41+        txout.nHeight = nCode >> 1;

promag commented at 0:39 am on March 30, 2020:

Why this change?

pierreN commented at 1:00 am on March 30, 2020:

In the same spirit, I also changed the nCode / 2 into nCode » 1 to match undo.h and coins.h.

To match the code in undo.h and coins.h (as it is done with fCoinBase). Can remove it if people prefer ?

promag commented at 1:03 am on March 30, 2020:

🤷‍♂ don’t make a case for me.

pierreN commented at 1:22 am on March 30, 2020:

Sorry, unsure how to read this. Waiting for a second person opinion to remove it I guess ?

practicalswift commented at 1:43 pm on March 30, 2020:

Agree with @promag.

MarcoFalke commented at 1:49 pm on March 30, 2020:

This pull request has 5 ACKs. Let’s please not invalidate all the review and fight about style changes at this point.

promag commented at 2:01 pm on March 30, 2020:

Yeah, I wasn’t trying to invalidate existing reviews, I also gave my ACK. I was trying to understand the reason behind this change. I think it’s good practice to not include unrelated changes.

MarcoFalke commented at 2:16 pm on March 30, 2020:

I think it is fine to do style-changes in code that reviewers have to look at anyway. It could have been better if it was in different commits or in the same commit, but with explanation in the commit message why each change is made. Let’s just keep it in mind for the future.

pierreN commented at 0:30 am on March 31, 2020:

My bad - I just though the change was related enough to be included here. I’ll do this in a separate commit in the future. Thanks.

promag commented at 0:39 am on March 30, 2020: member

ACK e980214bc4fd49530e8d50fe0a6657b8583bc6b5.

fanquake merged this on Mar 30, 2020

fanquake closed this on Mar 30, 2020

sidhujag referenced this in commit 1914b0350e on Mar 30, 2020

MarkLTZ referenced this in commit 440e4757a7 on Apr 19, 2020

PastaPastaPasta referenced this in commit a0deb971ad on Jun 27, 2021

PastaPastaPasta referenced this in commit 45311c83a8 on Jun 28, 2021

PastaPastaPasta referenced this in commit c7039c1b9d on Jun 29, 2021

PastaPastaPasta referenced this in commit ad450e856f on Jul 1, 2021

PastaPastaPasta referenced this in commit 8fa8d6fbaf on Jul 1, 2021

PastaPastaPasta referenced this in commit 4f632952bf on Jul 14, 2021

DrahtBot locked this on Feb 15, 2022

serialization: prevent int overflow for big Coin::nHeight #18433