We don't normally use ReadVarInt from untrusted inputs, but we might see this in the case of corruption.
This is exposed in test_bitcoin_fuzzy.
```diff
@@ -336,11 +336,18 @@ I ReadVarInt(Stream& is)
     I n = 0;
     while(true) {
         unsigned char chData = ser_readdata8(is);
+        if (n > (std::numeric_limits<I>::max() >> 7)) {
+            throw std::ios_base::failure("ReadVarInt(): size too large");
+        }
```
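For context, here is a minimal runnable sketch of the guarded loop above. `ByteStream` is a hypothetical stand-in for the real `Stream`/`ser_readdata8`, and `std::runtime_error` stands in for `std::ios_base::failure`; the encoding logic follows Bitcoin's VARINT format (7 payload bits per byte, high bit means "more bytes follow", with the `n++` carry between groups):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical minimal stream: reads bytes from a vector, throws when exhausted.
struct ByteStream {
    std::vector<unsigned char> data;
    std::size_t pos = 0;
    unsigned char read8() {
        if (pos >= data.size()) throw std::runtime_error("end of data");
        return data[pos++];
    }
};

// Sketch of ReadVarInt with the overflow guard from the diff above.
template <typename I>
I ReadVarInt(ByteStream& is) {
    I n = 0;
    while (true) {
        unsigned char chData = is.read8();
        // The new check: if shifting n left by 7 would overflow, bail out.
        if (n > (std::numeric_limits<I>::max() >> 7)) {
            throw std::runtime_error("ReadVarInt(): size too large");
        }
        n = (n << 7) | (chData & 0x7F);
        if (chData & 0x80) {
            // Guard the carry increment as well.
            if (n == std::numeric_limits<I>::max()) {
                throw std::runtime_error("ReadVarInt(): size too large");
            }
            n++;
        } else {
            return n;
        }
    }
}
```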
Could this not be done before chData is read?
As written, it reads as though the value of chData impacts this check in some way.
It won't hurt, as this branch will never be taken in a non-exceptional situation, and at best it may improve ILP.
I meant for readability
Alternative (as this is not used for data read from the network): restrict CVarInt to unsigned types only. (The only reason to suggest this is that I'm not sure whether this PR as is has a performance impact.)
Concept ACK
A micro benchmark of this change on my laptop did not show any significant change in runtime.
(This is a pretty bad benchmark though).
tACK 45f09618f22f0a59d872818f28fc2a938cc98311
I ran a benchmark: comparing the wall-clock time of the gettxoutsetinfo command's inner loop after commenting out the chainstate hash computation, on a node with 0 connections synced to block 250065. The chainstate was on a tmpfs. Done on i7-6820HQ CPU with frequency pegged to 2.60GHz, with 32 GiB of RAM, and dbcache=8000. Measurements using GetTimeMicros.
Master: 3747773 3747375 3734935 3743421 3723418 3734187 3717763 3732135 3693859 3729604 3731461 3752013 3714361 3703590 3748206 3743741 3733456 3743399 3743514 3750425 3730095 3719208 3735383 3714827 3728027 3712683 3718786
This PR: 3690632 3692104 3686250 3681913 3667378 3661354 3680667 3700359 3687146 3684476 3686028 3679919 3669181 3731960 3671747 3679751 3664658 3691547 3683182 3675638 3682229 3672767 3667536 3669105 3670341 3663954 3664253
If anything, it seems this patch makes UTXO deserialization (likely the most impactful usage of VARINT deserialization) slightly but measurably faster, by at least 1-2%.
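For anyone wanting to reproduce a measurement like the one above, here is a minimal timing harness in the same spirit, using `std::chrono::steady_clock` in place of Bitcoin's `GetTimeMicros`; `Workload()` is a hypothetical stand-in for the loop under test:

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>

// Hypothetical stand-in workload; replace with the deserialization loop
// actually being benchmarked.
static uint64_t Workload() {
    uint64_t acc = 0;
    for (uint64_t i = 0; i < 1000000; ++i) acc += i * i;
    return acc;
}

int run_bench() {
    using Clock = std::chrono::steady_clock;
    // Several repetitions, as in the measurements above, to expose variance.
    for (int run = 0; run < 5; ++run) {
        auto t0 = Clock::now();
        volatile uint64_t sink = Workload();  // volatile: keep the work alive
        auto t1 = Clock::now();
        (void)sink;
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
        std::printf("run %d: %lld us\n", run, static_cast<long long>(us));
    }
    return 0;
}
```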
utACK 45f09618f22f0a59d872818f28fc2a938cc98311
It doesn't seem like the varint code supports negative numbers. Should there be a `static_assert(std::is_unsigned<I>::value, "no support for signed varints");` somewhere in here?
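A sketch of that suggestion, assuming a wrapper template roughly along the lines of Bitcoin's `CVarInt` (the name `VarIntWrapper` here is illustrative, not the actual class):

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Illustrative wrapper: the static_assert rejects signed instantiations
// at compile time, so e.g. VarIntWrapper<int32_t> would fail to build.
template <typename I>
struct VarIntWrapper {
    static_assert(std::is_unsigned<I>::value, "no support for signed varints");
    I& value;
    explicit VarIntWrapper(I& v) : value(v) {}
};
```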
@ryanofsky That would have been good when it was written; unfortunately, it's wrong right now -- CCoins uses it on nVersion, which is signed, and the result is corrupted data (this is a known issue). Fortunately we don't use the nVersion for anything. (We should make sure that fact doesn't mean these exceptions can trigger.)
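A minimal re-implementation of the write side, only to illustrate the corruption described above (not Bitcoin Core's actual code): with a signed `I`, the `n <= 0x7F` comparison is true for every negative value, so e.g. `-1` serializes to the single byte `0x7F`, which round-trips as `127`:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustration-only version of WriteVarInt that returns the encoded bytes.
template <typename I>
std::vector<unsigned char> WriteVarInt(I n) {
    unsigned char tmp[(sizeof(n) * 8 + 6) / 7];
    int len = 0;
    while (true) {
        tmp[len] = (n & 0x7F) | (len ? 0x80 : 0x00);
        // For any negative signed n this is immediately true, so the loop
        // exits after emitting a single truncated byte.
        if (n <= 0x7F) break;
        n = (n >> 7) - 1;
        len++;
    }
    std::vector<unsigned char> out;
    do { out.push_back(tmp[len]); } while (len--);
    return out;
}
```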
@gmaxwell Nice find! Do you have a test_bitcoin_fuzzy test case to share that triggers this issue?
I'm doing some bitcoin fuzzing myself, and I'm curious roughly how many CPU months went into finding this issue, and more specifically how many total paths had been reached before finding it (the number printed under overall results/total paths, assuming afl-fuzz)?
From my experience the code paths that can be reached using test_bitcoin_fuzzy seem to be very very robust :-)
BTW, are there any known plans to submit a test_bitcoin_fuzzy based fuzzer to https://github.com/google/oss-fuzz?
I'm really interested in fuzzing bitcoind and I'd love to share corpora. I've done some fuzzing of the Apple Swift compiler using my own fuzzer - see results and @practicalswift for some context.
Concept ACK