Improve parallel script validation error debug logging #31112

pull sipa wants to merge 5 commits into bitcoin:master from sipa:202410_parcheck_debug changing 16 files +134 −126
  1. sipa commented at 3:41 pm on October 18, 2024: member

    Builds on top of #31097 (now merged). Fixes #30960.

    So far, detailed information about script validation failures is only reported when running with -par=1, due to a lack of ability to transfer information from the script validation threads to the validation thread. Fix this by extending the CCheckQueue functionality to pass more results through than just success/failure, and use this to report the exact Script error, as well as the transaction input in which it occurred.

  2. DrahtBot commented at 3:41 pm on October 18, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31112.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK maflcko
    Concept ACK hebasto, darosior, instagibbs, furszy

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #31269 (validation: Remove RECENT_CONSENSUS_CHANGE validation result by TheCharlatan)
    • #29700 (kernel, refactor: return error status on all fatal errors by ryanofsky)
    • #29641 (scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] by maflcko)
    • #19461 (multiprocess: Add bitcoin-gui -ipcconnect option by ryanofsky)
    • #19460 (multiprocess: Add bitcoin-wallet -ipcconnect option by ryanofsky)
    • #10102 (Multiprocess bitcoin by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. sipa force-pushed on Oct 18, 2024
  4. DrahtBot added the label CI failed on Oct 18, 2024
  5. DrahtBot commented at 4:21 pm on October 18, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/31741253464

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  6. sipa force-pushed on Oct 18, 2024
  7. sipa force-pushed on Oct 18, 2024
  8. DrahtBot removed the label CI failed on Oct 18, 2024
  9. sipa force-pushed on Oct 19, 2024
  10. laanwj added the label Validation on Oct 20, 2024
  11. sipa force-pushed on Oct 21, 2024
  12. DrahtBot added the label Needs rebase on Oct 21, 2024
  13. sipa commented at 2:21 pm on October 21, 2024: member
    Rebased after the merge of #31097.
  14. DrahtBot commented at 4:06 pm on October 21, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/31832088128

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  15. DrahtBot added the label CI failed on Oct 21, 2024
  16. sipa force-pushed on Oct 21, 2024
  17. DrahtBot removed the label Needs rebase on Oct 21, 2024
  18. DrahtBot removed the label CI failed on Oct 21, 2024
  19. in src/validation.cpp:2182 in 5043cbd8e4 outdated
    2204-                // verification flag.
    2205-                //
    2206-                // Avoid reporting a mandatory script check failure with a non-mandatory error
    2207-                // string by reporting the error from the second check.
    2208-                error = check2.GetScriptError();
    2209+                if (auto result2 = check2(); !result2.has_value()) {
    


    Christewart commented at 3:11 pm on October 22, 2024:

    Is there perf reasons to put the auto result2 = check(); into the if expression? Seems more readable to do

    0auto result2 = check2(); 
    1if (!result2.has_value()) {
    2  ...
    3}
    

    IIUC I see why this is done on L2187 - to avoid doing the computation unnecessarily.


    sipa commented at 2:58 pm on November 11, 2024:
    Fixed.
  20. hebasto commented at 2:57 pm on October 29, 2024: member
    Concept ACK.
  21. darosior commented at 6:57 pm on November 4, 2024: member
    Concept ACK
  22. sipa force-pushed on Nov 6, 2024
  23. sipa commented at 6:19 pm on November 6, 2024: member
    I’ve taken this a little bit further, by making the debug logging uniform too.
  24. DrahtBot commented at 7:59 pm on November 6, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/32613781874

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  25. DrahtBot added the label CI failed on Nov 6, 2024
  26. sipa force-pushed on Nov 6, 2024
  27. DrahtBot removed the label CI failed on Nov 6, 2024
  28. in src/validation.cpp:2570 in f1fa6a5dda outdated
    2580@@ -2581,9 +2581,8 @@ bool Chainstate::ConnectBlock(const CBlock& block, BlockValidationState& state,
    2581         for (const auto& tx : block.vtx) {
    2582             for (size_t o = 0; o < tx->vout.size(); o++) {
    2583                 if (view.HaveCoin(COutPoint(tx->GetHash(), o))) {
    2584-                    LogPrintf("ERROR: ConnectBlock(): tried to overwrite transaction\n");
    2585-                    return state.Invalid(BlockValidationResult::BLOCK_CONSENSUS, "bad-txns-BIP30",
    2586-                                         "tried to overwrite transaction");
    2587+                    state.Invalid(BlockValidationResult::BLOCK_CONSENSUS, "bad-txns-BIP30",
    


    maflcko commented at 9:19 am on November 7, 2024:

    nit in f1fa6a5dda8f6b640b6377651e6d76df7ff77e14: Not sure about removing the early-return on invalid just to have it trickle through the rest of the code to a debug log statement. As seen in your last force-push, this makes the code more verbose, complex and brittle (easy to get wrong). (https://github.com/bitcoin/bitcoin/compare/28d67cd01c546fa9ce0d98be208ab6a19f1efbb0..f1fa6a5dda8f6b640b6377651e6d76df7ff77e14)

    What do you think about adding a small logging lambda function, with LogInfo("Block validation error: %s", state.ToString()); return false;. Then, the early return could be kept here (and everywhere else), returning the result of the lambda call.


    sipa commented at 2:45 pm on November 11, 2024:
    @maflcko I think there is a conceptual simplicity in having all failure paths just needing to set the state object correctly, and be logging done in a uniform way based on just that. While it’s true that there are a few control flow changes compared to the older code, those are really only needed to make the debug output consistent and testable in the presence of multiple consensus failures; the code would be correct without them.
  29. maflcko commented at 9:19 am on November 7, 2024: member
    Left a nit. Also, the tests seem to fail when running on each commit individually.
  30. in src/test/miner_tests.cpp:406 in c9c4e91d12 outdated
    403@@ -404,7 +404,7 @@ void MinerTestingSetup::TestBasicMining(const CScript& scriptPubKey, const std::
    404         hash = tx.GetHash();
    405         tx_mempool.addUnchecked(entry.Fee(LOWFEE).Time(Now<NodeSeconds>()).SpendsCoinbase(false).FromTx(tx));
    406         // Should throw block-validation-failed
    407-        BOOST_CHECK_EXCEPTION(AssemblerForTest(tx_mempool).CreateNewBlock(scriptPubKey), std::runtime_error, HasReason("block-validation-failed"));
    408+        BOOST_CHECK_EXCEPTION(AssemblerForTest(tx_mempool).CreateNewBlock(scriptPubKey), std::runtime_error, HasReason("mandatory-script-verify-flag-failed"));
    


    darosior commented at 7:11 pm on November 7, 2024:
    nit: above comment is now outdated (it wasn’t very useful in the first place).

    sipa commented at 2:47 pm on November 11, 2024:
    Fixed.
  31. sipa force-pushed on Nov 11, 2024
  32. sipa force-pushed on Nov 11, 2024
  33. sipa commented at 2:47 pm on November 11, 2024: member
    Fixed the test-each-commit failure and addressed comments.
  34. [checkqueue] support user-defined return type through std::optional
    The check type function now needs to return a std::optional<R> for some type R,
    and the check queue overall will return std::nullopt if all individual checks
    return that, or one of the non-nullopt values if there is at least one.
    
    For most tests, we use R=int, but for the actual validation code, we make it return
    the ScriptError.
    011d196c6f
  35. [validation] Make script error messages uniform for parallel/single validation
    This makes the debug output mostly the same for -par=1 and parallel validation runs. Of course,
    parallel validation is non-deterministic in what error it may encounter first if there are
    multiple issues. Also, the way certain script-related and non-script-related checks are
    performed differs between the two modes still, which may result in discrepancies.
    7341706686
  36. [validation] Add detailed txin/txout information for script error messages
    Don't just report which script error occurred, but which in which input of which transaction,
    and which UTXO was being spent.
    cca0ddbaf4
  37. [validation] include all logged information in BlockValidationState 8041101304
  38. [validation] merge all ConnectBlock debug logging code paths cc6d3ea623
  39. sipa force-pushed on Nov 11, 2024
  40. instagibbs commented at 2:20 pm on November 12, 2024: member
    concept ACK
  41. in src/validation.cpp:2181 in 011d196c6f outdated
    2186-                // verification flag.
    2187-                //
    2188-                // Avoid reporting a mandatory script check failure with a non-mandatory error
    2189-                // string by reporting the error from the second check.
    2190-                error = check2.GetScriptError();
    2191+                auto result2 = check2();
    


    instagibbs commented at 9:09 pm on November 12, 2024:
    nit suggestion s/results2/policy_only_result/ ?

    TheCharlatan commented at 9:22 pm on November 18, 2024:
    Isn’t this the consensus only result?

    instagibbs commented at 2:46 pm on November 19, 2024:
    right. So consensus_result :melting_face:
  42. in test/functional/p2p_segwit.py:1015 in 7341706686 outdated
    1011@@ -1016,7 +1012,7 @@ def test_extra_witness_data(self):
    1012         tx2.vout.append(CTxOut(tx.vout[0].nValue, CScript([OP_TRUE])))
    1013         tx2.wit.vtxinwit.extend([CTxInWitness(), CTxInWitness()])
    1014         tx2.wit.vtxinwit[0].scriptWitness.stack = [CScript([CScriptNum(1)]), CScript([CScriptNum(1)]), witness_script]
    1015-        tx2.wit.vtxinwit[1].scriptWitness.stack = [CScript([OP_TRUE])]
    1016+        tx2.wit.vtxinwit[1].scriptWitness.stack = []
    


    instagibbs commented at 9:10 pm on November 12, 2024:
    note: this is removed because it’s causing a 2nd error that may mask the first’s error being reported. Seems like a mistake in the original test code? Agree with its removal.
  43. in src/test/miner_tests.cpp:403 in 7341706686 outdated
    402@@ -403,8 +403,7 @@ void MinerTestingSetup::TestBasicMining(const CScript& scriptPubKey, const std::
    403         tx.vout[0].nValue -= LOWFEE;
    


    instagibbs commented at 9:11 pm on November 12, 2024:
    73417066867ea7920647054eb0752e0dd3100de7 seems to remove the last usages of -par=1. I think it behooves us to keep coverage somewhere since IIUC par=1 means different control flow.
  44. in src/validation.cpp:2079 in cca0ddbaf4 outdated
    2076     ScriptError error;
    2077     if (VerifyScript(scriptSig, m_tx_out.scriptPubKey, witness, nFlags, CachingTransactionSignatureChecker(ptxTo, nIn, m_tx_out.nValue, cacheStore, *m_signature_cache, *txdata), &error)) {
    2078         return std::nullopt;
    2079     } else {
    2080-        return error;
    2081+        auto debug_str = strprintf("input %i of %s, spending %s:%i", nIn, ptxTo->GetHash().ToString(), ptxTo->vin[nIn].prevout.hash.ToString(), ptxTo->vin[nIn].prevout.n);
    


    instagibbs commented at 9:20 pm on November 12, 2024:

    maybe unnecessary if the logging includes the actual block with witness commitment?

    0        const auto debug_str = strprintf("input %i of %s (wtxid %s), spending %s:%i", nIn, ptxTo->GetHash().ToString(), ptxTo->GetWitnessHash().ToString(), ptxTo->vin[nIn].prevout.hash.ToString(), ptxTo->vin[nIn].prevout.n);
    
  45. in test/functional/feature_cltv.py:177 in cc6d3ea623
    173@@ -174,7 +174,7 @@ def run_test(self):
    174             block.hashMerkleRoot = block.calc_merkle_root()
    175             block.solve()
    176 
    177-            with self.nodes[0].assert_debug_log(expected_msgs=[f'Script validation error in block: {expected_cltv_reject_reason}']):
    178+            with self.nodes[0].assert_debug_log(expected_msgs=[f'Block validation error: {expected_cltv_reject_reason}']):
    


    instagibbs commented at 3:30 pm on November 13, 2024:

    asserting on logging is subpar I know, but we could do tighter assertions now if we want to check we’re formatting properly:

     0diff --git a/test/functional/feature_cltv.py b/test/functional/feature_cltv.py
     1index 557fcc7cea..220a6f8b25 100755
     2--- a/test/functional/feature_cltv.py
     3+++ b/test/functional/feature_cltv.py
     4@@ -172,11 +172,13 @@ class BIP65Test(BitcoinTestFramework):
     5             # Now we verify that a block with this transaction is also invalid.
     6             block.vtx[1] = spendtx
     7             block.hashMerkleRoot = block.calc_merkle_root()
     8             block.solve()
     9 
    10-            with self.nodes[0].assert_debug_log(expected_msgs=[f'Block validation error: {expected_cltv_reject_reason}']):
    11+            input_idx = 0
    12+            expanded_expected_cltv_reject_reason = f"{expected_cltv_reject_reason}, input {input_idx} of {spendtx.hash}, spending {spendtx.vin[input_idx].prevout.hash:064x}:{spendtx.vin[input_idx].prevout.n}"
    13+            with self.nodes[0].assert_debug_log(expected_msgs=[f'Block validation error: {expanded_expected_cltv_reject_reason}']):
    14                 peer.send_and_ping(msg_block(block))
    15                 assert_equal(int(self.nodes[0].getbestblockhash(), 16), tip)
    16                 peer.sync_with_ping()
    17 
    18         self.log.info("Test that a version 4 block with a valid-according-to-CLTV transaction is accepted")
    
  46. instagibbs commented at 3:33 pm on November 13, 2024: member
    first pass, looks good but I want to spend more time making sure queue and control flow changes are correct
  47. DrahtBot added the label Needs rebase on Nov 14, 2024
  48. DrahtBot commented at 10:13 am on November 14, 2024: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

  49. in src/validation.cpp:2663 in 7341706686 outdated
    2659@@ -2660,8 +2660,7 @@ bool Chainstate::ConnectBlock(const CBlock& block, BlockValidationState& state,
    2660                 // Any transaction validation failure in ConnectBlock is a block consensus failure
    2661                 state.Invalid(BlockValidationResult::BLOCK_CONSENSUS,
    2662                               tx_state.GetRejectReason(), tx_state.GetDebugMessage());
    2663-                LogError("ConnectBlock(): CheckInputScripts on %s failed with %s\n",
    2664-                    tx.GetHash().ToString(), state.ToString());
    2665+                LogError("Script validation error in block: %s\n", tx_state.GetRejectReason());
    


    maflcko commented at 6:26 pm on November 15, 2024:

    nit in 73417066867ea7920647054eb0752e0dd3100de7: This is unrelated to making the error messages uniform, but I think it would be nice to use the same log level as well. Otherwise it seems a bit odd that -par=1 will print a different log level than -par=2. (Personally I think that LogInfo is probably enough)

    Edit: Doesn’t matter, as this is deleted anyway in a later commit.

  50. in src/validation.cpp:2664 in cca0ddbaf4 outdated
    2660@@ -2660,7 +2661,7 @@ bool Chainstate::ConnectBlock(const CBlock& block, BlockValidationState& state,
    2661                 // Any transaction validation failure in ConnectBlock is a block consensus failure
    2662                 state.Invalid(BlockValidationResult::BLOCK_CONSENSUS,
    2663                               tx_state.GetRejectReason(), tx_state.GetDebugMessage());
    2664-                LogError("Script validation error in block: %s\n", tx_state.GetRejectReason());
    2665+                LogError("Script validation error in block: %s in %s\n", tx_state.GetRejectReason(), tx_state.GetDebugMessage());
    


    maflcko commented at 6:59 pm on November 15, 2024:

    nit in cca0ddbaf4dde7c9b8b50f5f1fd29cefdcf301ea: Same here. I don’t see the reason to have a different log message on -par=1, compared to -par=2.

    Not that it matters, because this code is deleted anyway in a future commit. However, my recommendation would be to drop this 1-line hunk, so that no inconsistency is introduced and code churn in deleted lines is kept to a minimum.

  51. in src/validation.cpp:2697 in cc6d3ea623
    2698+    if (parallel_result.has_value() && state.IsValid()) {
    2699         state.Invalid(BlockValidationResult::BLOCK_CONSENSUS, strprintf("mandatory-script-verify-flag-failed (%s)", ScriptErrorString(parallel_result->first)), parallel_result->second);
    2700-        LogPrintf("Script validation error in block: %s", state.GetRejectReason());
    2701+    }
    2702+    if (!state.IsValid()) {
    2703+        LogPrintf("Block validation error: %s", state.ToString());
    


    maflcko commented at 9:02 am on November 16, 2024:

    style nit in the last commit: For new code my recommendation would be to use LogInfo, instead of the deprecated alias, which does exactly the same.

  52. in src/checkqueue.h:111 in 011d196c6f outdated
    110                     cond.wait(lock); // wait
    111                     nIdle--;
    112                 }
    113                 if (m_request_stop) {
    114-                    return false;
    115+                    return std::nullopt;
    


    maflcko commented at 9:24 am on November 16, 2024:
    nit in 011d196c6fff4bbcd2bd03998c64b704091d0aff: Reads a bit odd to invert the return value in this case, but I understand that keeping it return error isn’t trivially possible and doesn’t matter anyway. Maybe add a comment // return value does not matter, because m_request_stop is only set in the destructor ?
  53. maflcko commented at 11:16 am on November 16, 2024: member

    Just some non-blocking nits. Feel free to ignore.

    review ACK cc6d3ea623933cc7f0db0daaabcbdda86272c8f7 🐱

    Signature:

    0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
    1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
    2trusted comment: review ACK cc6d3ea623933cc7f0db0daaabcbdda86272c8f7 🐱
    3OLQ3xJDqAulXPft5P44mtxO3vy7nHJ5prxaV6bjs5vnH/H75OnFvbrFJHZ5X5IhUsDPjAi7f9Pi9LBqJNTn5Aw==
    
  54. DrahtBot requested review from instagibbs on Nov 16, 2024
  55. DrahtBot requested review from hebasto on Nov 16, 2024
  56. DrahtBot requested review from darosior on Nov 16, 2024
  57. in src/validation.cpp:2075 in 011d196c6f outdated
    2068@@ -2069,10 +2069,15 @@ void UpdateCoins(const CTransaction& tx, CCoinsViewCache& inputs, CTxUndo &txund
    2069     AddCoins(inputs, tx, nHeight);
    2070 }
    2071 
    2072-bool CScriptCheck::operator()() {
    2073+std::optional<ScriptError> CScriptCheck::operator()() {
    2074     const CScript &scriptSig = ptxTo->vin[nIn].scriptSig;
    2075     const CScriptWitness *witness = &ptxTo->vin[nIn].scriptWitness;
    2076-    return VerifyScript(scriptSig, m_tx_out.scriptPubKey, witness, nFlags, CachingTransactionSignatureChecker(ptxTo, nIn, m_tx_out.nValue, cacheStore, *m_signature_cache, *txdata), &error);
    2077+    ScriptError error;
    


    TheCharlatan commented at 5:10 pm on November 18, 2024:
    Why is this not initialized again with SCRIPT_ERR_UNKNOWN_ERROR? I guess it is redundant, because it is set in the verification functions, but might still be good to keep it as a guard?
  58. in src/test/checkqueue_tests.cpp:231 in 011d196c6f outdated
    231-            std::vector<FailingCheck> vChecks;
    232+            std::vector<FixedCheck> vChecks;
    233             vChecks.reserve(r);
    234             for (size_t k = 0; k < r && remaining; k++, remaining--)
    235-                vChecks.emplace_back(remaining == 1);
    236+                vChecks.emplace_back(remaining == 1 ? std::make_optional<int>(17 * i) : std::nullopt);
    


    furszy commented at 9:59 pm on November 18, 2024:
    nit: could also test that only the first found error is retrieved (by adding another task failure). Checking in this way that no further check()s are executed.
  59. furszy commented at 10:17 pm on November 18, 2024: member
    Concept ACK. Getting deeper.
  60. in src/test/checkqueue_tests.cpp:238 in 011d196c6f outdated
    241         if (i > 0) {
    242-            BOOST_REQUIRE(!success);
    243+            BOOST_REQUIRE(result.has_value() && *result == static_cast<int>(17 * i));
    244         } else if (i == 0) {
    245-            BOOST_REQUIRE(success);
    246+            BOOST_REQUIRE(!result.has_value());
    


    furszy commented at 0:10 am on November 19, 2024:
    nit: This isn’t from the PR but i is an unsigned integer, so this second i==0 check is redundant. Could just be else {

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me