net: Disallow invalid HeadersSyncState due to lagging clock #35351

pull hodlinator wants to merge 3 commits into bitcoin:master from hodlinator:pr/35208_alt changing 5 files +54 −14
  1. hodlinator commented at 7:56 PM on May 21, 2026: contributor

    Problem

    Headers presync computes m_max_commitments from the elapsed time since the chain-start MTP plus MAX_FUTURE_BLOCK_TIME. When the local system clock is more than MAX_FUTURE_BLOCK_TIME behind the chain-start MTP, that elapsed value is negative, but it is used in arithmetic assigned to the unsigned commitment cap. This can turn the intended zero bound into a large cap, letting low-work headers presync continue instead of aborting when a reasonable commitment cap would have been exceeded.

    Fix

    Instead of allowing an invalid HeadersSyncState object to be created, throw an exception from the constructor and shut down the node.

    Typically, the node will detect that the system clock is set too far in the past when comparing it to the chain tip during chain state loading and shut down before we start syncing headers. So in practice this is very unlikely to make a difference (might be possible if the system clock jumps backwards after we loaded the chain state).

    Commits

    • A regression test pinning the current behavior.
    • The fix, along with the corresponding test change.
    • The headers_sync_state fuzz target also now allows the same clock-skew case instead of always choosing a time at or after the chain-start MTP.

    Replaces #35208 which was clamping m_max_commitments to zero and then letting the HeadersSyncState consume headers until the block height either reached the the next commitment_period point and aborted, or reached the minimum work threshold and succeeded (possible when having been offline for >144 blocks).

  2. DrahtBot added the label P2P on May 21, 2026
  3. DrahtBot commented at 7:56 PM on May 21, 2026: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--006a51241073e994b41acfe9ec718e94-->

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/35351.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK w0xlt

    If your review is incorrectly listed, please copy-paste <code>&lt;!--meta-tag:bot-skip--&gt;</code> into the comment that the bot should ignore.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #35482 (fuzz: exercise the transaction-handling path in process_message(s) by HowHsu)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    <!--5faf32d7da4f0f540f40219e4f7537a3-->

    LLM Linter (✨ experimental)

    Possible places where comparison-specific test macros should replace generic comparisons:

    • [src/test/headers_sync_chainwork_tests.cpp] BOOST_CHECK_THROW(CreateState(), HeadersSyncState::SystemClockError); -> Consider BOOST_CHECK_EXCEPTION(..., HeadersSyncState::SystemClockError, HasReason("...")) so the test also verifies the error message, not just the exception type.

    <sup>2026-06-16 10:10:22</sup>

  4. DrahtBot added the label CI failed on May 21, 2026
  5. DrahtBot removed the label CI failed on May 22, 2026
  6. w0xlt commented at 2:01 PM on May 22, 2026: contributor

    Concept ACK

  7. hodlinator commented at 1:37 PM on May 27, 2026: contributor

    Been thinking more about this during the weekend.

    One could posit that Bitcoin nodes should be more shelf stable in a post-apocalyptic future where NTP servers are unreachable. Re-establishing a shared accurate definition of current UTC time just from observing the sun's position could be challenging.

    In that kind of scenario, preventing the node from continuing, as this PR does, could be problematic.

    So should we do like #35208 and just set m_max_commitments to 0 instead? It would still be pretty unforgiving if start-height is very close to a multiple of the commitment_period - maybe allowing at least 1 commitment could be argued? (commitment_period is currently 641, giving an average of 961 blocks before aborting if we start at a random height).

    Although, if clocks in the network disagree too much about UTC time, block propagation would also be suffering due to the rule to not accept blocks from too far into the future (MAX_FUTURE_BLOCK_TIME/2h). The current check failing would also mean that the we somehow ended up with a starting block read from disk that is dated more than 2h in the future, meaning our clock is not just out of sync with UTC but has also jumped backwards since we accepted the starting block. So I think the current approach in the PR is consistent with the rest of the node.

  8. in src/headerssync.cpp:45 in db9c704117 outdated
      42 | -                                       + MAX_FUTURE_BLOCK_TIME};
      43 | +    const NodeClock::time_point now{NodeClock::now()};
      44 | +    const int64_t max_seconds_since_start{Ticks<std::chrono::seconds>(now - NodeSeconds{std::chrono::seconds{chain_start.GetMedianTimePast()}})
      45 | +                                          + MAX_FUTURE_BLOCK_TIME};
      46 | +    if (max_seconds_since_start < 0) {
      47 | +        throw SystemClockError{strprintf(
    


    l0rinc commented at 12:31 PM on June 6, 2026:

    hodlinator commented at 8:57 AM on June 16, 2026:

    Yes, I reference the first one in the PR description:

    Typically, the node will detect that the system clock is set too far in the past when comparing it to the chain tip during chain state loading and shut down before we start syncing headers. So in practice this is very unlikely to make a difference (might be possible if the system clock jumps backwards after we loaded the chain state).

    So we shouldn't be the first ones to detect and fail during startup. But if we do detect it in HeadersSyncState(), we should not continue, and it doesn't make sense to keep running the node if our local tip and local clock are in disagreement.

  9. in src/test/fuzz/headerssync.cpp:1 in 2cb8aa05be outdated


    l0rinc commented at 12:46 PM on June 6, 2026:

    The PR description should not imply that the time has to be off compared to genesis, the actual condition is relative to the known block, so it can be a recent local tip/header, not genesis. Timezone being off likely does not change local UTC, but if the actual system clock jumps backward, the condition can happen if the headers connect to a known block whose MTP is more than 2 hours ahead of the wrong local time.


    hodlinator commented at 8:54 AM on June 16, 2026:

    The PR description should not imply that the time has to be off compared to genesis, the actual condition is relative to the known block, so it can be a recent local tip/header, not genesis.

    I don't follow. Do you take "before we start syncing headers" to imply syncing from genesis?

    Timezone being off likely does not change local UTC, but if the actual system clock jumps backward, the condition can happen if the headers connect to a known block whose MTP is more than 2 hours ahead of the wrong local time.

    PR description doesn't mention timezone? This PR is about the base of what headers connect to. It would be easier to understand if you suggested changes to specific sentences.


    l0rinc commented at 10:37 AM on June 16, 2026:

    There was a misunderstanding when I reported this about having to be off by years (i.e. pre-genesis time) for this to be dangerous - but it can probably suffice to be off by a few hours.


    hodlinator commented at 12:31 PM on June 16, 2026:

    (Realized I did mention the UTC timezone in #35351 (comment), but that's not the PR description).

    Let me know if there's a specific part of the PR description you think should be changed.

  10. in src/headerssync.h:106 in db9c704117 outdated
      98 | @@ -99,8 +99,13 @@ struct CompressedHeader {
      99 |   * sync (temporary, per-peer storage).
     100 |   */
     101 |  
     102 | -class HeadersSyncState {
     103 | +class HeadersSyncState
     104 | +{
     105 |  public:
     106 | +    struct SystemClockError : std::runtime_error {
     107 | +        using std::runtime_error::runtime_error;
    


    l0rinc commented at 12:52 PM on June 6, 2026:

    db9c704 net: Throw exception from HeadersSyncState when system clock is behind start block MTP:

    diff --git a/src/headerssync.h b/src/headerssync.h
    --- a/src/headerssync.h	(revision 39798b46d08ff642ed59b84a6eb2ee911c070c94)
    +++ b/src/headerssync.h	(revision 1675b824feaf899e16c66ccff74c8d65bc3bf182)
    @@ -15,6 +15,7 @@
     #include <util/hasher.h>
     
     #include <deque>
    +#include <stdexcept>
     #include <vector>
     
     // A compressed CBlockHeader, which leaves out the prevhash
    

    Not important if my other suggestions are taken.

  11. in src/test/fuzz/headerssync.cpp:73 in 2cb8aa05be
      69 | @@ -70,11 +70,18 @@ FUZZ_TARGET(headers_sync_state, .init = initialize_headers_sync_state_fuzz)
      70 |          .redownload_buffer_size = fuzzed_data_provider.ConsumeIntegralInRange<size_t>(0, Params().HeadersSync().redownload_buffer_size * 2),
      71 |      };
      72 |      arith_uint256 min_work{UintToArith256(ConsumeUInt256(fuzzed_data_provider))};
      73 | -    FuzzedHeadersSyncState headers_sync(
      74 | -        params,
      75 | -        /*commit_offset=*/fuzzed_data_provider.ConsumeIntegralInRange<size_t>(0, params.commitment_period - 1),
      76 | -        /*chain_start=*/start_index,
      77 | -        /*minimum_required_work=*/min_work);
      78 | +    std::unique_ptr<FuzzedHeadersSyncState> headers_sync;
    


    l0rinc commented at 12:59 PM on June 6, 2026:

    The target only needs to handle a constructor path that may throw (not sure why we'd want to do an extra heap allocation here). An std::optional seems like a better fit for signalling that:

    diff --git a/src/test/fuzz/headerssync.cpp b/src/test/fuzz/headerssync.cpp
    --- a/src/test/fuzz/headerssync.cpp	(revision 09a5ecc777aa6490e094a6283a4f8d3ae021989d)
    +++ b/src/test/fuzz/headerssync.cpp	(revision 1d58a9da6ac133f4f18fd8b11814d73bbb019215)
    @@ -72,9 +72,9 @@
             .redownload_buffer_size = fuzzed_data_provider.ConsumeIntegralInRange<size_t>(0, Params().HeadersSync().redownload_buffer_size * 2),
         };
         arith_uint256 min_work{UintToArith256(ConsumeUInt256(fuzzed_data_provider))};
    -    std::unique_ptr<FuzzedHeadersSyncState> headers_sync;
    +    std::optional<FuzzedHeadersSyncState> headers_sync;
         try {
    -        headers_sync = std::make_unique<FuzzedHeadersSyncState>(
    +        headers_sync.emplace(
                 params,
                 /*commit_offset=*/fuzzed_data_provider.ConsumeIntegralInRange<size_t>(0, params.commitment_period - 1),
                 /*chain_start=*/start_index,
    

    But we likely shouldn't need this new try/catch in the first place, we should rather do the validation outside and pass in a valid time instead.


    hodlinator commented at 9:23 AM on June 16, 2026:

    Taken, thanks!

  12. in src/test/headers_sync_chainwork_tests.cpp:260 in 2cb8aa05be outdated
     252 | @@ -252,4 +253,10 @@ BOOST_AUTO_TEST_CASE(too_little_work)
     253 |          /*exp_locator_hash=*/std::nullopt);
     254 |  }
     255 |  
     256 | +BOOST_AUTO_TEST_CASE(system_clock_lagging_behind_chain_start)
     257 | +{
     258 | +    const NodeClockContext clock_ctx{std::chrono::seconds{genesis.GetBlockTime() - MAX_FUTURE_BLOCK_TIME - 1}};
     259 | +    BOOST_CHECK_THROW(CreateState(), HeadersSyncState::SystemClockError);
     260 | +}
    


    l0rinc commented at 1:58 PM on June 6, 2026:

    Could the regression coverage pin the exact MAX_FUTURE_BLOCK_TIME boundary?

    BOOST_AUTO_TEST_CASE(system_clock_lagging_behind_chain_start)
    {
        const test_only_CheckFailuresAreExceptionsNotAborts mock_checks{};
        NodeClockContext clock_ctx{(genesis.GetBlockTime() - MAX_FUTURE_BLOCK_TIME) * 1s};
        BOOST_CHECK_NO_THROW(CreateState());
    
        clock_ctx -= 1s;
        BOOST_CHECK_EXCEPTION(CreateState(), NonFatalCheckError, HasReason{"Internal bug detected: max_seconds_since_start >= 0"});
    }
    

    hodlinator commented at 9:41 AM on June 16, 2026:

    Taken, thanks!

  13. in src/net_processing.cpp:2795 in 2cb8aa05be outdated
    2788 | @@ -2789,8 +2789,17 @@ bool PeerManagerImpl::TryLowWorkHeadersSync(Peer& peer, CNode& pfrom, const CBlo
    2789 |              // of headers is known, some header in this set must be new, so
    2790 |              // advancing to the first unknown header would be a small effect.
    2791 |              LOCK(peer.m_headers_sync_mutex);
    2792 | -            peer.m_headers_sync.reset(new HeadersSyncState(peer.m_id, m_chainparams.GetConsensus(),
    2793 | -                m_chainparams.HeadersSync(), chain_start_header, minimum_chain_work));
    2794 | +            try {
    2795 | +                peer.m_headers_sync.reset(new HeadersSyncState(peer.m_id, m_chainparams.GetConsensus(),
    2796 | +                    m_chainparams.HeadersSync(), chain_start_header, minimum_chain_work));
    2797 | +            } catch (const HeadersSyncState::SystemClockError& e) {
    


    l0rinc commented at 2:04 PM on June 6, 2026:

    We're using the exception here for control flow.

    Could the local clock value be captured and validated before HeadersSyncState so it no longer needs a recoverable exception type? We could also notify through fatalError() when the local clock is too far behind (passing the same now value into HeadersSyncState).

    This way we could do the validation before the call in PeerManagerImpl::TryLowWorkHeadersSync:

    const auto now{Now<NodeSeconds>()};
    if (now < NodeSeconds{(chain_start_header.GetMedianTimePast() - MAX_FUTURE_BLOCK_TIME) * 1s}) {
        m_chainman.GetNotifications().fatalError(Untranslated("System clock too far behind chain start MTP."));
        headers = {};
        return true;
    }
    LOCK(peer.m_headers_sync_mutex);
    peer.m_headers_sync.reset(new HeadersSyncState{peer.m_id, m_chainparams.GetConsensus(), m_chainparams.HeadersSync(), chain_start_header, minimum_chain_work, now});
    
    

    and HeadersSyncState::HeadersSyncState could remain just:

    const int64_t max_seconds_since_start{Ticks<std::chrono::seconds>(now - NodeSeconds{chain_start.GetMedianTimePast() * 1s}) + MAX_FUTURE_BLOCK_TIME};
    Assert(max_seconds_since_start >= 0);
    m_max_commitments = 6 * max_seconds_since_start / m_params.commitment_period;
    

    See https://github.com/l0rinc/bitcoin/pull/186/changes#diff-6875de769e90cec84d2e8a9c1b962cdbcda44d870d42e4215827e599e11e90e3R2792-R2797 for the whole implementation


    hodlinator commented at 9:20 AM on June 16, 2026:

    We're using the exception here for control flow.

    We're throwing an exception for this exceptional condition and catching it and aborting. This allows unit tests to work. You find it distasteful because the catch happens directly in the calling function? Relying on g_detail_test_only_CheckFailuresAreExceptionsNotAborts is kind of neat, didn't know of that. But would prefer something like TestOnlyAssertError being thrown instead of reusing NonFatalCheckError - is there some reason we don't want a separate error type?

    const auto now{Now<NodeSeconds>()}; if (now < NodeSeconds{(chain_start_header.GetMedianTimePast() - MAX_FUTURE_BLOCK_TIME) * 1s}) { m_chainman.GetNotifications().fatalError(Untranslated("System clock too far behind chain start MTP.")); headers = {}; return true; }

    This allows the process to continue, why would we want to do that once the local tip and clock are out of sync? Seems like we would want the node operator to intervene and fix the clock.

    Could the local clock value be captured and validated before HeadersSyncState

    That is an interesting question. How about instead of passing in now, we pass in uint64_t max_commitments to HeadersSyncState()? By only taking unsigned we push the caller to pass in a sane value (even though the could technically send in ~0ULL). It would also make the new unit test fairly redundant.


    l0rinc commented at 10:40 AM on June 16, 2026:

    You find it distasteful because the catch happens directly in the calling function?

    I dislike constructors throwing, especially when we have the option to prevalidate.

    is there some reason we don't want a separate error type

    I don't see why, I would prefer a simple solution that is easy to cherry-pick back to old branches.

    Seems like we would want the node operator to intervene and fix the clock

    isn't that what fatalError does?

    How about instead of passing in now, we pass in uint64_t max_commitments to HeadersSyncState()?

    Not sure, the side-effectful part is the time request not the commitment calculation. If we want to make the method pure/predictable we should eliminate the side effects, hence my suggestion.


    hodlinator commented at 12:40 PM on June 16, 2026:

    I dislike constructors throwing, especially when we have the option to prevalidate.

    Yeah, I wanted to make it testable and wasn't aware of g_detail_test_only_CheckFailuresAreExceptionsNotAborts. However, I prefer LogError(<user-friendly message>) + std::abort() over Assert(<condition>).

    I experimented with prevalidation now in https://github.com/bitcoin/bitcoin/compare/master...hodlinator:bitcoin:pr/35208_alt.extract_compute

    • Extracts the max commitments computation so we don't need to error inside the constructor.
    • Doesn't add the system_clock_lagging_behind_chain_start unit test nor increase the range of the start time for the fuzz test. It could probably be covered with a functional test that rewinds the clock to trigger the error in the context of net_processing, if you prefer this direction.

    Seems like we would want the node operator to intervene and fix the clock

    isn't that what fatalError does?

    Yikes, KernelNotifications::fatalError() triggers shutdown if m_shutdown_on_fatal_error is set. That's a bit more than I was expecting from a notification. :)

    How about instead of passing in now, we pass in uint64_t max_commitments to HeadersSyncState()?

    Not sure, the side-effectful part is the time request not the commitment calculation. If we want to make the method pure/predictable we should eliminate the side effects, hence my suggestion.

    Yes, pure functions are great but my primary intent would be to externalize the error handling.


    l0rinc commented at 8:20 AM on June 17, 2026:

    my primary intent would be to externalize the error handling

    Wouldn't my suggested time extraction and validation and fatalError do that? I won't block the max commitments computation extraction but seems like a workaround to me. I can be convinced otherwise.


    hodlinator commented at 9:16 AM on June 30, 2026:

    [Aa]ssert() should be used to validate logical pre-/post-conditions. The system clock disagreeing with the stored block tip timestamp is a borderline case, but I think it's fair to argue that it should not be treated with an assertion here, but rather as a marginally more legitimate edge-case to handle. Your change duplicates part of the calculation in net_processing.cpp which my externalization of the calculation in (pr/35208_alt.extract_compute) avoids, and you add an outer error handling case in net_processing which in practice makes us never reach the assertion you add.

    I think using std::abort() after having emitted the error is clean way to handle it without having to reset the headers-vector and consider other processing happening afterwards until the process exits. Is there any other work the node is doing in parallel with headers sync such as writing to disk in a way where it would be exceedingly dangerous to interrupt it and safer to use the fatal error notification flow?

    I won't block the max commitments computation extraction but seems like a workaround to me.

    Having HeadersSyncState() receive an unsigned max_commitments variable that is calculated externally helps ensure it's correct by construction, which I think is a good property.


    If you agree that https://github.com/bitcoin/bitcoin/compare/master...hodlinator:bitcoin:pr/35208_alt.extract_compute is an acceptable direction, I will explore creating a functional test to try to provoke the std::abort() edge-case.

  14. l0rinc changes_requested
  15. l0rinc commented at 6:31 PM on June 6, 2026: contributor

    I'm not sure this is going in the right direction, seems weird to discover time drift in the constructor and catch on use site - instead of sanitizing the value before call. Please see https://github.com/l0rinc/bitcoin/pull/186/commits for how I imagined simplifying this a lot more.

  16. DrahtBot added the label Needs rebase on Jun 9, 2026
  17. hodlinator force-pushed on Jun 16, 2026
  18. hodlinator commented at 9:48 AM on June 16, 2026: contributor

    Thanks for your review & suggestions @l0rinc! Pushed the straightforward ones for now.

    I'm not sure this is going in the right direction, seems weird to discover time drift in the constructor and catch on use site - instead of sanitizing the value before call.

    Curious what you think about sending unsigned max_commitments into HeadersSyncState() (see inline comment).

  19. test: Cover future chain-start MTP boundary in headers presync
    When the local clock sits more than `MAX_FUTURE_BLOCK_TIME` behind the chain-start MTP, the elapsed value used to derive `m_max_commitments` is negative.
    The later division by the unsigned commitment period converts that negative value to a large unsigned bound.
    Presync ends up with A LOT of commitment slots, whereas it should abort.
    
    Pin that pre-fix behavior in a regression test that the subsequent fix will alter.
    
    Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
    0da66bc49b
  20. net: Throw exception from HeadersSyncState when system clock is behind start block MTP
    We should not proceed syncing headers from peers when the local system clock is incorrectly set.
    
    A node with a system clock set too far back will typically fail early during startup when the chainstate detects the tip to be too far in the future. This means that in practice we don't expect the failure to ever happen in net_processing.cpp.
    One can provoke the std::abort() in net_processing.cpp to happen through adding a call to SetMockTime(1) right before creating the HeadersSyncState.
    
    Issue discovered by: Lőrinc <pap.lorinc@gmail.com>
    80ab5acb45
  21. fuzz: Make headerssync target also cover HeadersSyncState::SystemClockError
    Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
    1c3ad92d0b
  22. hodlinator force-pushed on Jun 16, 2026
  23. hodlinator commented at 10:13 AM on June 16, 2026: contributor

    (Rebased against master in latest push to resolve conflicts).

  24. DrahtBot removed the label Needs rebase on Jun 16, 2026
  25. l0rinc changes_requested

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-07-02 05:51 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me