Stricter & More Performant Invariant Checking in CheckBlock #14837

pull JeremyRubin wants to merge 1 commit into bitcoin:master from JeremyRubin:faster-dedup-working-rebase changing 5 files +241 −14
  1. JeremyRubin commented at 8:25 am on November 29, 2018: contributor

    This PR supersedes #14387 and is an alternative to #14397.

    Summary

    Previously, “no duplicate inputs” was checked on a per-transaction basis; with this PR, it is checked block-wide. This PR also checks that long chains of transactions appear in valid order, which was previously only checked implicitly. The resulting code is also more performant.

    Benefits

    1. Stricter Invariants. This PR checks stricter properties. Before this PR, a block might pass CheckBlock with duplicate inputs spent across transactions and invalid longchain order. Enforcing stricter invariants in CheckBlock helps us guard against potential errors in later sections of the code and opens up the door to new optimizations, parallelizations, or simplifications in the application of transactions in a block. For instance, we could create all new UTXOs in parallel and then spend all inputs in parallel.
    2. Better Performance. This checker is 6.5x faster on the duplicate-inputs benchmark and 1.1x faster on a more general test. Before this PR:
    # Benchmark, evals, iterations, total, min, max, median
    DuplicateInputs, 5, 10, 0.334544, 0.00665873, 0.00671868, 0.00669463
    DeserializeAndCheckBlockTest, 5, 160, 2.17535, 0.00267973, 0.00273858, 0.00272915
    

    After this PR:

    # Benchmark, evals, iterations, total, min, max, median
    DuplicateInputs, 5, 10, 0.0513754, 0.00100089, 0.00105766, 0.00102823
    DeserializeAndCheckBlockTest, 5, 160, 1.97574, 0.00245479, 0.00248704, 0.00246725
    
    3. Better Complexity. The expected time complexity of the algorithm is E[O(N)], compared to E[O(N log N)]. In theory it also uses much less memory in the worst case (a constant ~300 kB) than the O(N) worst case (i.e., about 1 MB) of the previous algorithm. Furthermore, the number of actual hash comparisons is substantially reduced, which contributes a large constant factor. The analysis, which justifies the constants selected, is given in more detail in the PR. A minimal sketch of the overall approach follows below.
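
    To make the rest of the discussion easier to follow, here is a minimal sketch (not the PR's code) of the general technique: input outpoints are probed into a fixed-size bitset so that duplicate inputs within a transaction, duplicate inputs across transactions, and spends of outputs created later in the block are all caught in a single pass. SimpleOutPoint, SimpleTx, and the hash mix below are simplified stand-ins for COutPoint, CTransaction, and the PR's salted PCG output permutation.

    #include <bitset>
    #include <cstddef>
    #include <cstdint>
    #include <memory>
    #include <vector>

    struct SimpleOutPoint { uint64_t txid; uint32_t n; };                                      // stand-in for COutPoint
    struct SimpleTx { uint64_t txid; std::vector<SimpleOutPoint> vin; uint32_t n_outputs; };    // stand-in for CTransaction

    // Returns true if the block definitely satisfies the input invariants;
    // false means "possible violation" and the caller must re-check exactly.
    bool CheckInputInvariantsSketch(const std::vector<SimpleTx>& block, uint64_t salt)
    {
        constexpr size_t BITS = 1 << 21;  // fixed-size table, independent of block size
        auto table = std::make_unique<std::bitset<BITS>>();

        auto probe = [salt](const SimpleOutPoint& o, uint64_t k) -> size_t {
            // Placeholder mix; the PR derives its indices from a salted PCG permutation.
            uint64_t x = (o.txid ^ salt ^ k) * 6364136223846793005ULL + o.n;
            return static_cast<size_t>(x >> 43) & (BITS - 1);
        };

        for (const auto& tx : block) {
            // Invariants 1+2: no input may repeat, within this tx or against any earlier input.
            for (const auto& in : tx.vin) {
                bool all_set = true;
                for (uint64_t k = 1; k <= 3; ++k) {
                    const size_t idx = probe(in, k);
                    all_set &= table->test(idx);
                    table->set(idx);
                }
                // Every probed bit was already set: a real duplicate or a rare false positive.
                if (all_set) return false;
            }
            // Invariant 3 ("longchain order"): none of this tx's own outputs may already appear
            // as a spent input, i.e. nothing earlier in the block spent an output created here.
            for (uint32_t n = 0; n < tx.n_outputs; ++n) {
                bool all_set = true;
                for (uint64_t k = 1; k <= 3; ++k) {
                    all_set &= table->test(probe({tx.txid, n}, k));
                }
                if (all_set) return false;
            }
        }
        return true;
    }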

    Potential Risks & Redresses

    In this section I’ll try to exhaustively go through the various risks of this PR.

    1. Probabilistic Algorithms are Confusing. Should the PCG-based hashing algorithm (or other probabilistic aspects) be broken in this code, an attacker could theoretically cause a block to take O(N^2) time to validate. One way to de-risk this issue would be to fall back, upon an erroneous collision, to checking against a std::set, which has O(N log N) worst-case behavior (the existing runtime); a sketch of such a fallback follows this list. In general, the theory behind PCGs makes it unlikely that an attacker could predict collisions across the network, because each PCG is salted individually (a random seed is xor’d in) and started with a different increment (the increment includes the output’s n, and the seed is the 64-bit hash value, which prevents an adversary from using the output’s n to generate collisions). Therefore, a spurious collision on one node should have an exceedingly small probability of also being a collision on another. Perhaps there are further ways to harden this hash scheme while maintaining performance. In any case, the block still validates, just more slowly.
    2. Code Complexity. Should the code simply be incorrect (e.g., not actually checking that duplicate inputs are absent), unintended inflation could occur. This is a risk with any change. I believe the stricter invariant checking justifies this change as reducing the likelihood of bugs long term, because it protects sensitive code in block application from encountering certain types of invalid blocks under sequential application of transactions.
    3. Block Size Dependency. Technically, this algorithm’s expected performance degrades as the number of transactions in a block grows, which may make it more difficult to increase block sizes. Fortunately, should block sizes ever be increased, the hash table size can be increased too (e.g., a 1 MB table would easily support 4 MB blocks).
    4. Loss of WTXID Caching of Duplicate Input Checks. It has been proposed that WTXIDs could cache the fact that a transaction has no duplicate inputs. I don’t see this as a particularly useful property to cache, because it needs to be checked at the block level across all transactions anyway. Further, the use of PCG for hashing makes the performance overhead negligible compared to a cache lookup.
    5. Testing Coverage Difficulty. Because the algorithm fails infrequently, it’s difficult to write a test covering the case where the algorithm sees a spurious collision and then continues checking. This could be fixed by templating the invariant-checking function to allow for a biased PCG implementation which only generates collisions.
    6. Changed Failure Messages. Because the algorithm checks more invariants at once and detects them in a different order than before, clients relying on an exact reject reason may be confused. Clients should already not rely on exact reject reasons, so I don’t believe this has any real ramifications.
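
    For concreteness, here is a minimal sketch (an illustration, not the PR's code) of the exact fallback mentioned in item 1: if the probabilistic pass reports a possible collision, the same block-wide invariants can be re-checked with a std::set, giving the O(N log N) worst case of the pre-existing approach. TxSketch and OutPointKey are the same kind of simplified stand-ins used in the earlier sketch, not Bitcoin Core types.

    #include <cstdint>
    #include <set>
    #include <utility>
    #include <vector>

    using OutPointKey = std::pair<uint64_t, uint32_t>;  // (txid word, output index)
    struct TxSketch { uint64_t txid; std::vector<OutPointKey> vin; uint32_t n_outputs; };

    // Exact, deterministic version of the same block-wide invariants.
    bool CheckInputInvariantsExact(const std::vector<TxSketch>& block)
    {
        std::set<OutPointKey> spent;
        for (const auto& tx : block) {
            for (const auto& in : tx.vin) {
                if (!spent.insert(in).second) return false;   // duplicate input (within or across txs)
            }
            for (uint32_t n = 0; n < tx.n_outputs; ++n) {
                if (spent.count({tx.txid, n})) return false;  // output spent before it was created
            }
        }
        return true;
    }

    With this fallback, a maliciously constructed block degrades to the pre-existing O(N log N) behavior rather than O(N^2).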
  2. Add Probabilistic Checker for Duplicate Inputs
    Fix hash computations to get unique bits
    
    Use less error-prone code for bitset
    
    revert tx_verify.* to master
    
    Add probabilistic duplicate detector to CheckBlock
    
    Clean up the implementation of fast deduplication checking
    
    Small clean up of comment & code layout for Duplicate Input checking
    
    Make duplicate checker also check for longchain validity
    
    Cleanup the DuplicateInputs interface to CheckInputInvariants and remove some redundant checks
    
    Change test reasons to match new behavior
    
    Switch to PCG generator
    0425c64090
  3. fanquake added the label Consensus on Nov 29, 2018
  4. gmaxwell commented at 8:42 am on November 29, 2018: contributor

    The argument that the CVE fix was a performance regression is based on a misunderstanding of the system’s current operation: block validation is only very rarely on the critical path for block propagation. This wasn’t the case when the skipping of the duplicate check was added, but it is the case now. I can’t imagine that the PR to skip the “redundant” duplicate check would have gone through if validation hadn’t been on the block propagation critical path then, so I can’t see a change that is 10x+ more complicated being adopted now that it’s off the critical path.

    This adds hundreds of lines of code and a homebrew cryptographic hash that AFAICT isn’t particularly cryptographic but, if broken, turns into a DoS attack (and no, XORing in a seed does not obviously produce pairwise independence, some approximation of which is required to achieve the claimed property that a bad input for one user would be okay for others), and it looks like it doesn’t actually result in an observable benefit except on microbenchmarks.

    NAK.

  5. JeremyRubin commented at 9:05 am on November 29, 2018: contributor

    The main benefit I’m emphasizing here is that it checks more strict properties.

    As noted, the stricter check need not introduce a ‘DoS attack’ – it can revert to the existing runtime easily. In any case, our goal isn’t really to validate a maliciously created block quickly; it is to validate an honestly created block as quickly as possible and a maliciously created block in tolerable time. I figured the O(N / log(N)) speedup from switching back to the set algorithm upon collision wasn’t worth the added complexity there, but it can certainly be done.

    The PCG I am using is not homebrew (except in implementation). If you prefer, we could add a dependency on the standard PCG library, which contains a similar function. Thus the unstudied portion is mostly limited to the inputs to the function. My understanding of PCG is that the xor’d seed should produce pairwise independence, although I grant you that it may be better to use two different seeds k1_1 and k1_2. Perhaps there are other efficiently computable PRFs with pairwise independence that would be suited for this purpose – I previously used SipHash for this.
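
    As an illustration of the two-seed variant mentioned above (an assumption about how it might look, not code from the PR): one key salts the state word and a second, independently drawn key salts the increment, so that neither salt can be recovered from the other. The output step is the same XSH-RR permutation quoted in the review excerpts below.

    #include <cstdint>

    // PCG XSH-RR output step, as quoted in the review excerpts below.
    static uint32_t pcg_output(uint64_t start, uint64_t inc)
    {
        uint64_t nxt = (inc | 1) + (start * 6364136223846793005ULL);  // LCG step; wraps intentionally
        uint32_t x = static_cast<uint32_t>((nxt ^ (nxt >> 18)) >> 27);
        uint32_t r = static_cast<uint32_t>(nxt >> 59);
        return (x >> r) | (x << (31 & (-r)));                         // rotate right by r
    }

    // Hypothetical two-seed salting: k1 salts the state (a txid word),
    // k2 salts the increment (derived from the output index n).
    static uint32_t salted_probe(uint64_t txid_word, uint32_t n, uint64_t k1, uint64_t k2)
    {
        return pcg_output(txid_word ^ k1, (static_cast<uint64_t>(n) << 1) ^ k2);
    }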

  6. DrahtBot commented at 9:31 am on November 29, 2018: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #15773 (test: Add BitcoinTestFramework::sync_* methods by MarcoFalke)
    • #15639 (bitcoin-wallet tool: Drop libbitcoin_server.a dependency by ryanofsky)
    • #15638 (Move-only: Pull wallet code out of libbitcoin_server by ryanofsky)
    • #15141 (Rewrite DoS interface between validation and net_processing by sdaftuar)
    • #14696 (qa: Add explicit references to related CVE’s in p2p_invalid_block test. by lucash-dev)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  7. ryanofsky commented at 3:44 pm on November 29, 2018: member

    Stricter Invariants. This PR checks stricter properties. Before this PR, a block might pass CheckBlock with duplicate inputs spent across transactions and invalid longchain order.

    The main benefit I’m emphasizing here is that it checks more strict properties.

    What’s the list of properties that this PR checks for? “Invalid longchain order” seems to mean that “outputs being created by this transaction have not been spent by an earlier transaction.” Are there other checks, too?

  8. ryanofsky commented at 4:27 pm on November 29, 2018: member

    What’s the list of properties that this PR checks for?

    Found the list here:

    https://github.com/JeremyRubin/bitcoin/blob/0425c6409000aeb3270ba8f9c30d2746c5c5b784/src/validation.cpp#L3078-L3086

    The actual implementation of this change is short, clean, and not hard to understand. This change doesn’t add “hundreds of lines of code”, though it does add a lot of comments and analysis.

    If it’s true that “Block validation is only very rarely on the critical path for block propagation”, then making this change by itself to try to help with performance and complexity is probably not worth the risks. But I am curious about:

    Enforcing stricter invariants in CheckBlock helps us guard against potential errors in later sections of the code and opens up the door to new optimizations, parallelizations, or simplifications in the application of transactions in a block. For instance, we could create all new UTXOs in parallel and then spend all inputs in parallel.

    This is an interesting idea, even though it seems like it would require a lock-free CCoinsViewCache to improve performance. It does seem conceptually like adding an “Invalid longchain order” invariant could make future optimizations possible, so maybe this is worth thinking about more.

  9. in src/validation.cpp:3225 in 0425c64090
    3220+            0,
    3221+        };
    3222+    };
    3223+
    3224+    std::unique_ptr<std::bitset<1 << 21>> pTable = MakeUnique<std::bitset<1 << 21>>();
    3225+    auto& table = *pTable.get();
    


    practicalswift commented at 9:05 am on November 30, 2018:
    This .get() is redundant, right?
  10. in src/validation.cpp:3181 in 0425c64090
    3176+     *     still perfectly acceptable.
    3177+     */
    3178+
    3179+
    3180+    const auto pcg = [](uint64_t start, uint64_t inc) {
    3181+        uint64_t nxt = (inc | 1) + (start * 6364136223846793005ULL);
    


    practicalswift commented at 9:07 am on November 30, 2018:
    Make it explicit that the unsigned integer wraparound that will take place here at run-time is intentional? Or alternatively rewrite so that no integer wraparound takes place at run-time? Verify with -fsanitize=integer.

    JeremyRubin commented at 10:41 pm on November 30, 2018:
    How do I make the wraparound explicit?


    MarcoFalke commented at 11:00 pm on December 4, 2018:
    Why would that be preferable to the sanitizer suppressions in ./test/?

    JeremyRubin commented at 4:29 am on December 6, 2018:
    Yeah I’m not sure about it. Unsigned integer overflow isn’t a bug, it’s a feature… I would also like a suppression that operates at the statement level (e.g., like Rust’s wrapping_mul) rather than marking the entire function with an attribute. What if there is an unsigned wraparound issue elsewhere in the code?

    practicalswift commented at 5:21 am on December 6, 2018:

    @MarcoFalke In contrast to the sanitizer suppressions, it documents whether the wraparound is intentional or not.

    What do you think about using, say, INTENTIONAL_WRAPAROUND for the intentional wraparounds and the file-based sanitizer suppressions to cover the unintentional wraparounds?


    practicalswift commented at 5:24 am on December 6, 2018:

    @JeremyRubin I assume you’re talking about intentional wraparounds as features? The problem isn’t intentional wraparounds – they are totally fine by definition. The problem is unintentional wraparounds.

    I share your wish for suppressions that operate at the statement level, but they don’t exist yet AFAIK :-)
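
    For illustration, a sketch of what such an annotation could look like (the name INTENTIONAL_WRAPAROUND_MUL and this exact shape are hypothetical, not an existing Bitcoin Core facility): a small helper that documents modular arithmetic at the call site, similar in spirit to Rust’s wrapping_mul. Note that documentation alone does not silence -fsanitize=integer; the helper itself would still need a no_sanitize attribute or a suppression entry.

    #include <cstdint>
    #include <type_traits>

    // Clang's attribute for exempting a function from the unsigned-integer-overflow check.
    #if defined(__clang__)
    #define WRAPAROUND_OK __attribute__((no_sanitize("unsigned-integer-overflow")))
    #else
    #define WRAPAROUND_OK
    #endif

    // Multiplication where modular wraparound is intended and documented at the call site.
    template <typename U>
    WRAPAROUND_OK inline U INTENTIONAL_WRAPAROUND_MUL(U a, U b)
    {
        static_assert(std::is_unsigned<U>::value, "wraparound is only well-defined for unsigned types");
        return a * b;
    }

    // Usage in the PCG step quoted above, with the intent now explicit
    // (the addition could be wrapped the same way):
    //     uint64_t nxt = (inc | 1) + INTENTIONAL_WRAPAROUND_MUL(start, 6364136223846793005ULL);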

  11. in src/validation.cpp:3184 in 0425c64090
    3179+
    3180+    const auto pcg = [](uint64_t start, uint64_t inc) {
    3181+        uint64_t nxt = (inc | 1) + (start * 6364136223846793005ULL);
    3182+        uint32_t x = (nxt ^ (nxt >> 18)) >> 27;
    3183+        uint32_t r = nxt >> 59;
    3184+        return (x >> r) | (x << (31 & (-r)));
    


    practicalswift commented at 9:09 am on November 30, 2018:
    Make it explicit that the unsigned integer wraparound that will take place here at run-time is intentional? Or alternatively rewrite so that no integer wraparound takes place at run-time? Verify with -fsanitize=integer.
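
    One way to satisfy the "rewrite so that no wraparound takes place" suggestion for this particular line (a sketch, not part of the PR): express the rotate without the unsigned negation, which gives the same result for every r in [0, 31].

    #include <cstdint>

    // Rotate right by r for r in [0, 31]; (32 - r) & 31 equals (-r) & 31
    // on that range but never relies on unsigned wraparound.
    static inline uint32_t rotr32(uint32_t x, uint32_t r)
    {
        return (x >> r) | (x << ((32 - r) & 31));
    }

    The LCG multiplication on the line above still wraps by design, so it would need the annotation or suppression discussed in the previous thread.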
  12. in src/validation.cpp:3078 in 0425c64090
    3074@@ -3075,6 +3075,232 @@ static bool CheckBlockHeader(const CBlockHeader& block, CValidationState& state,
    3075     return true;
    3076 }
    3077 
    3078+/* CheckInputInvariants checks for three criticial invariants for the inputs in a block:
    


    practicalswift commented at 9:10 am on November 30, 2018:
    Critical :-)
  13. in src/validation.cpp:3224 in 0425c64090
    3219+            pcg(out.hash.GetUint64(3) ^ k8, 1 | ((((uint64_t)out.n) << 1) ^ k8)),
    3220+            0,
    3221+        };
    3222+    };
    3223+
    3224+    std::unique_ptr<std::bitset<1 << 21>> pTable = MakeUnique<std::bitset<1 << 21>>();
    


    ken2812221 commented at 12:55 pm on November 30, 2018:
    #include <bitset> to make appveyor happy.
  14. sdaftuar commented at 3:05 pm on November 30, 2018: member

    I agree with @gmaxwell. Adding complexity to the consensus code (or, as I often argue, changing it at all) should be something we do for only very good reasons, both because of the high review burden consensus code changes incur on the project, and because of the maintenance burden and cognitive load we put on future developers as well.

    I really don’t think we should consider making consensus code changes that are designed to have no effect but prep the way for unspecified future code changes (which is why I often chime in with a Concept NACK on consensus refactoring PRs). These incur all the review and testing costs of a consensus change, but by design have no benefit to the network. This project is way too busy for this to be a good use of developer effort IMO.

    In my view, if we’re going to make changes to the consensus code for performance reasons, then (a) those performance numbers should be demonstrably and meaningfully better for the network, and (b) we should generally discuss all the designs that might achieve the same performance benefit, and have a very good reason for not choosing the simplest such design. In the case of this change, the performance benefits could likely be realized by far simpler changes, as has been pointed out in the review comments on other versions of this PR.

    I do think it would be a useful discussion to figure out exactly what the simplest code design would be for where the duplicate-inputs check should live – I’ve had some offline conversations, and in my view it’s not obvious whether it should naturally be considered a context-free check we perform on transactions, or whether the check should instead reside at the UTXO database layer. If we care to improve the underlying issue here, I think we would be best served by engaging in that design discussion to come up with the simplest way of reasoning about things in the future.

    Finally – I think we should remember that we’re not just writing code for ourselves, but for hopefully a much larger set of future engineers. If we are going to burden people with complicated reasoning about a consensus topic, it should be because it’s really important.

  15. gmaxwell commented at 6:50 pm on November 30, 2018: contributor

    If there were a PR with some massive validation speedup that needed the new behaviour this provides, then it might be worth considering on the basis of that improvement. But as conjectural preparatory work I just don’t see the gain.

    To the extent that the argument holds that belt-and-suspendering the consistency check would make the code safer – I’m not sure about that, but I think it’s at least plausible – that could be accomplished with a simpler and clearer check.

  16. JeremyRubin commented at 5:07 am on December 6, 2018: contributor

    @sdaftuar Thanks for the review and feedback.

    1. Understood. It’s often hard to tell the borderline between small steps that are easy to reason about and sweeping changes that touch lots of code. I tend to err on the side of small, easy-to-reason-about steps, because when changes are larger I perceive more resistance to review simply due to the large amount of change required, even if it is decomposed into small steps. I also tend to prefer small steps so as not to invest too much effort early on in an approach that will not be acceptable.
    2. I can clarify in a bit the types of changes I’m excited about that this opens the door to. I do have drafts of them, but I’m still thinking about how best to structure them so the project benefits maximally. I’m happy to give you a preview.
    3. The difference between this approach and a simpler version (I also have an implementation of that, which checks both duplicate inputs and longchain order) is somewhat substantial. But perhaps that difference would be less clear once the broader architecture changes are done – happy to explore this as well. Also, the simpler approaches mentioned in the other iteration do not have the same property this one does (longchain checking) and cannot easily be augmented to support it (it becomes a separate check).
    4. This sounds like a good conversation. I have some opinions on this, but am not strongly attached. I don’t have a horse in the race on whether they are contextual or non-contextual checks – either is somewhat fine to me – but I do feel strongly that duplicate-input checks are not a per-transaction check (they are block-wide). I also believe that, from a performance perspective, they are best performed in ConnectBlock after the first Control.add and before Control.wait. From a code-fragility perspective, I would prefer this check to happen before the UTXO database or caching hierarchy is written to at all. These two goals are slightly at odds, but there might be a way to make them work nicely together.
    5. I do believe this code is actually much simpler than the current state of affairs. Currently, where are duplicate inputs checked? They are checked in two places – per transaction in CheckBlock, and then across transactions (but not within transactions) in ConnectBlock. That’s what led to the CVE. Changing the code to perform the check explicitly in one place is dramatically simpler, and reasoning about it is simpler. Now, the hash-table implementation, I agree, is more complicated, which is why I wrote a detailed comment. If a bit of performance can be sacrificed here, since this is no longer in the hot path at all, I would prefer the slightly less performant version, as it requires less convincing for the reader. When working on the broader change that can follow this PR, I will base it on that simpler version.
    6. I’ve always heard the project was scratch-your-own-itch ;) @gmaxwell, I will let you know when I have something ready. I do think that belt-and-suspendering the check has merit; I will include the simplest version possible in that change, to the extent it isn’t worse than master.

    Do you have a falsifiable framework for testing this sort of change? I’d like to produce a testable change where I can demonstrate the improvement in an environment that you’re happy with. Would it be -reindex-chainstate with assumevalid set to one month prior? Is there something simpler that would work, be convincing, and permit more rapid iteration?

  17. gmaxwell commented at 2:55 pm on December 9, 2018: contributor
    A reindex (-chainstate is fine) would be the standard benchmark; you could set assumevalid to whatever you think will highlight the improvement the most – both extremes of the AV setting fairly characterize different but important aspects of sync performance (performance on older vs. more recent history). Similarly, default dbcache or maximized dbcache are both defensible benchmarking configurations (performance on standard vs. high-performance hosts).
  18. DrahtBot added the label Needs rebase on Apr 10, 2019
  19. DrahtBot commented at 2:21 pm on April 10, 2019: member
  20. MarcoFalke commented at 5:33 pm on April 19, 2019: member
    There hasn’t been much activity lately and the patch still needs rebase, so I am closing this for now. Please let me know when you want to continue working on this, so the pull request can be re-opened.
  21. MarcoFalke closed this on Apr 19, 2019

  22. laanwj removed the label Needs rebase on Oct 24, 2019
  23. DrahtBot locked this on Dec 16, 2021
