andrewtoth commented at 6:21 pm on August 8, 2024: contributor

Since #28233, periodically writing the chainstate to disk every 24 hours does not clear the dbcache. Since #28280, periodically writing the chainstate to disk is proportional only to the amount of dirty entries in the cache. Due to these changes, it is no longer beneficial to only write the chainstate to disk every 24 hours. The periodic flush interval was necessary because every write of the chainstate would clear the dbcache. Now, we can get rid of the periodic flush interval and simply write the chainstate along with blocks and block index at least every hour.

Three benefits of doing this:

For IBD or reindex-chainstate with a combination of large dbcache setting, slow CPU, slow internet speed/unreliable peers, it could be up to 24 hours until the chainstate is persisted to disk. A power outage or crash could potentially lose up to 24 hours of progress. If there is a very large amount of dirty cache entries, writing to disk when a flush finally does occur will take a very long time. Crashing during this window of writing can cause #11600. By syncing every hour in unison with the block index we avoid this problem. Only a maximum of one hour of progress can be lost, and the window for crashing during writing is much smaller. For IBD with lower dbcache settings, faster CPU, or better internet speed/reliable peers, chainstate writes are already triggered more often than every hour so this change will have no effect on IBD.
Based on discussion in #28280, writing only once every 24 hours during long running operation of a node causes IO spikes. Writing smaller chainstate changes every hour like we do with blocks and block index will reduce IO spikes.
Faster shutdown speeds. All dirty chainstate entries must be persisted to disk on shutdown. If we have a lot of dirty entries, such as when close to 24 hours or if we sync with a large dbcache, it can take a long time to shutdown. By keeping the chainstate clean we avoid this problem.

Inspired by this comment.

Resolves https://github.com/bitcoin/bitcoin/issues/11600

DrahtBot commented at 6:21 pm on August 8, 2024: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30611.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	achow101, davidgumberg, l0rinc, sipa

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#31703 (Use number of dirty cache entries in flush warnings/logs by sipa)
#30610 (validation: do not wipe utxo cache for stats/scans/snapshots by sipa)
#29365 (Extend signetchallenge to set target block spacing by starius)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot added the label Validation on Aug 8, 2024

andrewtoth force-pushed on Aug 8, 2024

gmaxwell commented at 8:51 pm on August 8, 2024: contributor

Before someone suggests that it should do this every block, a reason to not do this too much is that delaying writes prevents many writes from ever happening at all because many outputs are quickly spent.

I don’t see a particular problem for this, there will be some write traffic increase but I don’t expect it will be enough to meaningfully reduce the life time of flash on cheap devices (and if that were a concern the default level of logging is probably more of an issue). It’s the sort of thing someone could test, however.

Someone might also want to test the latency impact (e.g. on block template creation) of the traversal.

Though long term it would be preferable to introduce a data structure to track pending writes and keep a constantly flushing going which always lags the tip by N blocks, both avoiding the latency spikes and preventing unnecessary writes of entries that are just going to be spent in the next couple blocks.

mzumsande commented at 3:49 pm on August 9, 2024: contributor

Did you check if this would slow down IBD noticeably when running with a large dbcache? If i understand it right, coins that are created and destroyed more than 1 hour of syncing apart would not have been written to disk before, but will be written and deleted after this PR - though presumably this is not a frequent event, so there wouldn’t be much of an impact?!

andrewtoth commented at 4:47 pm on August 9, 2024: contributor

Before someone suggests that it should do this every block, a reason to not do this too much is that delaying writes prevents many writes from ever happening at all because many outputs are quickly spent.

If i understand it right, coins that are created and destroyed more than 1 hour of syncing apart would not have been written to disk before, but will be written and deleted after this PR

Correct, there will be significantly less utxo cut through with this PR.

I don’t see a particular problem for this, there will be some write traffic increase but I don’t expect it will be enough to meaningfully reduce the life time of flash on cheap devices (and if that were a concern the default level of logging is probably more of an issue). It’s the sort of thing someone could test, however.

though presumably this is not a frequent event, so there wouldn’t be much of an impact?!

For normal operation, this would mean writing 10s of MB every hour, vs 100s of MB every 24 hours. I don’t think that we should be concerned about that increase in disk IO? Seems negligible to me.

Did you check if this would slow down IBD noticeably when running with a large dbcache?

Not yet. I will run a reindex-chainstate benchmark with max dbcache and see what the impact is.

sipa commented at 8:40 pm on August 9, 2024: member

It would be good to know what impact it has on IBD, as it’s necessarily going to make things slower (more syncs means more disk writes, and more memory/cpu to remember UTXOs which must be marked spent on disk), for the benefit of reducing losses in case of crashes. The question is how much time & I/O it adds.

Regarding doing it for every block (during steady state), the trade-off there is different but still exists. On the pro side there is a reduction of latency spikes instead of reducing crash losses (as these are inherently less relevant in steady state). On the con side there is an increase in frequency of writes (and the latency spikes they bring, albeit smaller ones) instead of a sync speed slowdown.

We don’t necessarily need to use the same sync frequency during IBD and during steady state, but the effects during steady state seem harder to measure, so it’s hard to say what is acceptable there.

One related idea: maybe we should randomize the sync event times a bit (say, uniformly random intervals between 50 and 70 minutes) to prevent a situation where the network over time settles into a few cohorts of synchronized syncers (think like communicating metronomes: they start randomly, but have synchronized events (blocks arriving) that slightly delay sync events if they were to coincide).

andrewtoth commented at 9:44 pm on August 9, 2024: contributor

It would be good to know what impact it has on IBD, as it’s necessarily going to make things slower

Note that it is only necessarily slower when running with very high dbcache settings. For default settings the cache fills up and is flushed far more often than every hour, so this will have no effect.

andrewtoth commented at 1:54 pm on August 10, 2024: contributor

This change didn’t appear to have any speed impact for a reindex-chainstate using max dbcache. Ran the following benchmark:

 0nohup hyperfine \
 1--export-markdown ~/bench.md \
 2--show-output \
 3--parameter-list commit 81508954f1f89f95b6fdca10c3c471cdb6566c80,27a770b34b8f1dbb84760f442edb3e23a0c2420b \
 4--prepare 'git checkout {commit} && \
 5make -j$(nproc) src/bitcoind; \
 6sync; sudo /sbin/sysctl vm.drop_caches=3;' \
 7-M 2 \
 8'echo '{commit}' && \
 9./src/bitcoind -datadir=/home/user/.bitcoin \
10-reindex-chainstate \
11-dbcache=16384 \
12-printtoconsole=0 \
13-connect=0 \
14-stopatheight=820000' &

Results are the same:

 0Benchmark 1: echo 81508954f1f89f95b6fdca10c3c471cdb6566c80 && \
 1./src/bitcoind -datadir=/home/user/.bitcoin \
 2-reindex-chainstate \
 3-dbcache=16384 \
 4-printtoconsole=0 \
 5-connect=0 \
 6-stopatheight=820000
 7
 881508954f1f89f95b6fdca10c3c471cdb6566c80
 9  Time (mean ± σ):     12239.620 s ±  3.156 s    [User: 12366.988 s, System: 562.121 s]
10  Range (min … max):   12237.388 s … 12241.851 s    2 runs
11 
12Benchmark 2: echo 27a770b34b8f1dbb84760f442edb3e23a0c2420b && \
13./src/bitcoind -datadir=/home/user/.bitcoin \
14-reindex-chainstate \
15-dbcache=16384 \
16-printtoconsole=0 \
17-connect=0 \
18-stopatheight=820000
19
2027a770b34b8f1dbb84760f442edb3e23a0c2420b
21  Time (mean ± σ):     12243.072 s ±  6.588 s    [User: 12382.386 s, System: 559.349 s]
22  Range (min … max):   12238.413 s … 12247.730 s    2 runs
23 
24Summary
25  'echo 81508954f1f89f95b6fdca10c3c471cdb6566c80 && \
26./src/bitcoind -datadir=/home/user/.bitcoin \
27-reindex-chainstate \
28-dbcache=16384 \
29-printtoconsole=0 \
30-connect=0 \
31-stopatheight=820000' ran
32    1.00 ± 0.00 times faster than 'echo 27a770b34b8f1dbb84760f442edb3e23a0c2420b && \
33./src/bitcoind -datadir=/home/user/.bitcoin \
34-reindex-chainstate \
35-dbcache=16384 \
36-printtoconsole=0 \
37-connect=0 \
38-stopatheight=820000'

So, I would say this PR is strictly beneficial for IBD/reindex-chainstate. It is the same speed, but protects against data loss.

gmaxwell commented at 6:32 pm on August 10, 2024: contributor

I didn’t comment on IBD because I figured a hit per hour isn’t that significant in the context of IBD and likely pays for itself in improving crash robustness. I’m a little surprised that the effect is quite so small, but it does probably improve performance too by making writes a little more concurrent.

Sipa: good point on the injection locking, I’ve previously wondered if the the 24 hour behavior could sync up.

andrewtoth commented at 2:14 am on August 15, 2024: contributor

I’m a little surprised that the effect is quite so small @gmaxwell I was as well, so I ran the same benchmark with -reindex as well and it gave me similar results:

Command	Mean [s]	Min [s]	Max [s]	Relative
`echo 81508954f1f89f95b6fdca10c3c471cdb6566c80 && ./src/bitcoind -datadir=/home/user/.bitcoin -reindex -dbcache=16384 -printtoconsole=0 -connect=0 -stopatheight=820000`	40765.075 ± 132.452	40671.417	40858.733	1.01 ± 0.00
`echo 27a770b34b8f1dbb84760f442edb3e23a0c2420b && ./src/bitcoind -datadir=/home/user/.bitcoin -reindex -dbcache=16384 -printtoconsole=0 -connect=0 -stopatheight=820000`	40536.484 ± 46.379	40503.689	40569.279	1.00

Will get some measurements for total chainstate bytes written and total time writing chainstate before and after this change during steady state. But, this will take a few weeks to get meaningful data. Even then there will be a lot of variance depending on how many utxos are being eagerly spent within 24 hours vs 1 hour at the time I am measuring.

Someone might also want to test the latency impact (e.g. on block template creation) of the traversal.

I’m not sure I understand what you are suggesting to measure here, or what you mean by traversal. We can measure the total time writing chainstate to disk with -debug=bench, which would be the latency impact if that’s what you mean.

Though long term it would be preferable to introduce a data structure to track pending writes and keep a constantly flushing going which always lags the tip by N blocks, both avoiding the latency spikes and preventing unnecessary writes of entries that are just going to be spent in the next couple blocks.

You mentioned a similar idea a while ago here. I agree with you there that it is a much bigger project. If this change is accepted and merged, would there really be a benefit anymore of adding something like that?

l0rinc commented at 1:26 pm on August 19, 2024: contributor

I’ll review this in more detail later, but I have a question first: Whenever a production change is made without corresponding test changes (yet the build passes), it suggests to me that there might be a gap in our coverage. Do you think it would make sense to add a test that reproduces some aspect of the old behavior before making the change—one that would need to be updated in the commit that introduces the change?

andrewtoth commented at 1:31 pm on August 19, 2024: contributor

it would make sense to add a test that reproduces some aspect of the old behavior before making the change—one that would need to be updated in the commit that introduces the change?

Good point, I can add a test that that expects flush every 24 hours, then reduce it to 1 hour after this change.

in src/validation.cpp:2849 in 81508954f1 outdated

2851         bool fPeriodicWrite = mode == FlushStateMode::PERIODIC && nNow > m_last_write + DATABASE_WRITE_INTERVAL;
2852-        // It's been very long since we flushed the cache. Do this infrequently, to optimize cache usage.
2853-        bool fPeriodicFlush = mode == FlushStateMode::PERIODIC && nNow > m_last_flush + DATABASE_FLUSH_INTERVAL;
2854         // Combine all conditions that result in a full cache flush.
2855-        fDoFullFlush = (mode == FlushStateMode::ALWAYS) || fCacheLarge || fCacheCritical || fPeriodicFlush || fFlushForPrune;
2856+        bool fDoFullFlush = (mode == FlushStateMode::ALWAYS) || fCacheLarge || fCacheCritical || fPeriodicWrite || fFlushForPrune;

l0rinc commented at 1:37 pm on August 19, 2024:

To simplify review, could you please separate the concerns - having low-risk changes (e.g. moves, renames, inlines, rewording) in separate commits from high risk ones?

andrewtoth commented at 5:44 pm on August 31, 2024:

There really isn’t much I am doing here that can be split up. I added a second commit for moving the declaration of fDoFullFlush here.

andrewtoth force-pushed on Aug 31, 2024

DrahtBot added the label CI failed on Aug 31, 2024

DrahtBot commented at 5:59 pm on August 31, 2024: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/29514781087

Make sure to run all tests locally, according to the documentation.

The failure may happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

DrahtBot removed the label CI failed on Aug 31, 2024

in src/validation.cpp:2843 in 1c1b63a041 outdated

2842         if (m_last_write == decltype(m_last_write){}) {
2843             m_last_write = nNow;
2844-        }
2845-        if (m_last_flush == decltype(m_last_flush){}) {
2846-            m_last_flush = nNow;
2847+            m_random_write_delay = std::chrono::minutes{(FastRandomContext().randrange(DATABASE_WRITE_INTERVAL_RANDOM_DELAY_RANGE))};

sipa commented at 5:40 pm on September 1, 2024:

It’d be a bit cleaner (and more granular) to have:

0static constexpr auto DATABASE_WRITE_INTERVAL_MIN{50min};
1static constexpr auto DATABASE_WRITE_INTERVAL_MAX{70min};
2...
3m_random_write_delay = DATABASE_WRITE_INTERVAL_MIN +
4    FastRandomContext().rand_uniform_duration<NodeClock>(DATABASE_WRITE_INTERVAL_MAX - DATABASE_WRITE_INTERVAL_MIN);`

l0rinc commented at 8:19 pm on September 1, 2024:

given that this delay is written multiple times, would that be more verbose? Alternatively maybe we could add an m_next_write next to m_last_write, like:

 0diff --git a/src/validation.cpp b/src/validation.cpp
 1index b54446be4e..e2c7272531 100644
 2--- a/src/validation.cpp
 3+++ b/src/validation.cpp
 4@@ -86,10 +86,9 @@ using node::CBlockIndexHeightOnlyComparator;
 5 using node::CBlockIndexWorkComparator;
 6 using node::SnapshotMetadata;
 7 
 8-/** Time to wait between writing blocks/block index and chainstate to disk. */
 9-static constexpr std::chrono::minutes DATABASE_WRITE_INTERVAL{50};
10-/** Range of random delay to add to DATABASE_WRITE_INTERVAL */
11-static constexpr uint32_t DATABASE_WRITE_INTERVAL_RANDOM_DELAY_RANGE{20};
12+/** Minimum and maximum time to wait between writing blocks/block index and chainstate to disk. */
13+static constexpr auto DATABASE_WRITE_INTERVAL_MIN{50min};
14+static constexpr auto DATABASE_WRITE_INTERVAL_MAX{70min};
15 /** Maximum age of our tip for us to be considered current for fee estimation */
16 static constexpr std::chrono::hours MAX_FEE_ESTIMATION_TIP_AGE{3};
17 const std::vector<std::string> CHECKLEVEL_DOC {
18@@ -2776,6 +2775,11 @@ CoinsCacheSizeState Chainstate::GetCoinsCacheSizeState(
19     return CoinsCacheSizeState::OK;
20 }
21 
22+NodeClock::time_point Chainstate::CalculateNextWrite(NodeClock::time_point after) const
23+{
24+    return FastRandomContext().rand_uniform_delay(after + DATABASE_WRITE_INTERVAL_MIN, DATABASE_WRITE_INTERVAL_MAX - DATABASE_WRITE_INTERVAL_MIN);
25+}
26+
27 bool Chainstate::FlushStateToDisk(
28     BlockValidationState &state,
29     FlushStateMode mode,
30@@ -2840,14 +2844,14 @@ bool Chainstate::FlushStateToDisk(
31         // Avoid writing/flushing immediately after startup.
32         if (m_last_write == decltype(m_last_write){}) {
33             m_last_write = nNow;
34-            m_random_write_delay = std::chrono::minutes{(FastRandomContext().randrange(DATABASE_WRITE_INTERVAL_RANDOM_DELAY_RANGE))};
35+            m_next_write = CalculateNextWrite(m_last_write);
36         }
37         // The cache is large and we're within 10% and 10 MiB of the limit, but we have time now (not in the middle of a block processing).
38         bool fCacheLarge = mode == FlushStateMode::PERIODIC && cache_state >= CoinsCacheSizeState::LARGE;
39         // The cache is over the limit, we have to write now.
40         bool fCacheCritical = mode == FlushStateMode::IF_NEEDED && cache_state >= CoinsCacheSizeState::CRITICAL;
41         // It's been a while since we wrote the block index to disk. Do this frequently, so we don't need to redownload after a crash.
42-        bool fPeriodicWrite = mode == FlushStateMode::PERIODIC && nNow > m_last_write + DATABASE_WRITE_INTERVAL + m_random_write_delay;
43+        bool fPeriodicWrite = mode == FlushStateMode::PERIODIC && nNow >= m_next_write;
44         // Combine all conditions that result in a full cache flush.
45         bool fDoFullFlush = (mode == FlushStateMode::ALWAYS) || fCacheLarge || fCacheCritical || fPeriodicWrite || fFlushForPrune;
46         // Write blocks, block index and best chain related state to disk.
47@@ -2882,7 +2886,7 @@ bool Chainstate::FlushStateToDisk(
48                 m_blockman.UnlinkPrunedFiles(setFilesToPrune);
49             }
50             m_last_write = nNow;
51-            m_random_write_delay = std::chrono::minutes{(FastRandomContext().randrange(DATABASE_WRITE_INTERVAL_RANDOM_DELAY_RANGE))};
52+            m_next_write = CalculateNextWrite(m_last_write);
53 
54             if (!CoinsTip().GetBestBlock().IsNull()) {
55                 LOG_TIME_MILLIS_WITH_CATEGORY(strprintf("write coins cache to disk (%d coins, %.2fkB)",
56diff --git a/src/validation.h b/src/validation.h
57index dfe33b8e29..953793bdf7 100644
58--- a/src/validation.h
59+++ b/src/validation.h
60@@ -810,7 +810,9 @@ private:
61         EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
62 
63     NodeClock::time_point m_last_write{};
64-    std::chrono::minutes m_random_write_delay{};
65+    NodeClock::time_point m_next_write{};
66+
67+    NodeClock::time_point CalculateNextWrite(NodeClock::time_point after) const;
68 
69     /**
70      * In case of an invalid snapshot, rename the coins leveldb directory so

note: this way it should probably be nNow >= m_next_write instead of nNow > m_next_write to be consistent with the naming.

andrewtoth commented at 2:25 am on September 2, 2024:

Thanks for the suggestions! I could not get @sipa’s to compile, so went with @l0rinc’s.

andrewtoth commented at 1:50 am on September 2, 2024: contributor

I got some stats for running in steady state for this branch and master. I ran 4 nodes for 11 days and 18 hours using identical aws t2-small instances backed by 20GB EBS volumes all connected to another node in the same VPC.

Branch	Total time writing chainstate	Total bytes written
master 1	26372ms	497 MiB
master 2	28188ms	497 MiB
branch 1	32496ms	760 MiB
branch 2	32566ms	760 MiB

By writing every hour, we write about 50% more data, but we are only spending 20% more time writing. On master we write about 42 MiB every 24 hours which locks the main thread for 2.3 seconds. On this branch we write about 2.7 MiB every hour which locks the main thread for 115 milliseconds. So it’s about 0.9 MiB more data written every hour, which I think is a negligible amount. Note that this extra data is ephemeral and will be erased as these extra utxos are spent. It will not cause nodes to have to store more data.

Here are the write operations charts for the time period. You can see the write operations are more evenly spaced out by writing every hour. Master 1 Master 2 Branch 2 Branch 1

andrewtoth force-pushed on Sep 2, 2024

andrewtoth commented at 2:26 am on September 2, 2024: contributor

@gmaxwell @l0rinc @mzumsande @sipa thank you all for your reviews! I’ve updated the PR to have a random window for periodic flushes of between 50 and 70 minutes. I’ve also added a unit test that checks for a flush in this window. That unit test can be cherry-picked to master and verified that it does not pass.

andrewtoth force-pushed on Sep 2, 2024

in src/validation.cpp:2786 in c529a595dc outdated

2774@@ -2774,6 +2775,11 @@ CoinsCacheSizeState Chainstate::GetCoinsCacheSizeState(
2775     return CoinsCacheSizeState::OK;
2776 }
2777 
2778+SteadyClock::time_point CalculateNextWrite(SteadyClock::time_point after)
2779+{
2780+    return FastRandomContext().rand_uniform_delay(after + DATABASE_WRITE_INTERVAL_MIN, DATABASE_WRITE_INTERVAL_MAX - DATABASE_WRITE_INTERVAL_MIN);
2781+}

l0rinc commented at 11:27 am on September 2, 2024:

I was a bit hasty yesterday - the MIN/MAX naming indicates that the interval is closed (i.e. can actually reach the values), so we should add 1 minute to the range:

0NodeClock::time_point CalculateNextWrite(const NodeClock::time_point after)
1{
2    const auto time{after + DATABASE_WRITE_INTERVAL_MIN};
3    constexpr auto range{DATABASE_WRITE_INTERVAL_MAX - DATABASE_WRITE_INTERVAL_MIN + 1min};
4    return FastRandomContext().rand_uniform_delay(time, range);
5}

See:

 0BOOST_AUTO_TEST_CASE(CalculateNextWrite_interval)
 1{
 2    NodeClock::time_point after = NodeClock::now();
 3    std::set<int64_t> results;
 4    for (int i = 0; i < 100; ++i) {
 5        auto next = CalculateNextWrite(after);
 6        auto diff = (next - after) / std::chrono::minutes(1);
 7        results.insert(diff);
 8    }
 9    BOOST_CHECK_EQUAL(results.size(), 21);
10}

andrewtoth commented at 2:27 pm on September 2, 2024:

Done.

in src/test/chainstate_write_tests.cpp:37 in 5cad9d16dd outdated

30+
31+    // Time window to periodically flush is between 50 and 70 minutes
32+    SetMockTime(GetTime<std::chrono::minutes>() + 49min);
33+    chainstate.FlushStateToDisk(state_dummy, FlushStateMode::PERIODIC);
34+    m_node.validation_signals->SyncWithValidationInterfaceQueue();
35+    BOOST_CHECK(!sub->m_did_flush);

l0rinc commented at 12:11 pm on September 2, 2024:

The initial run should never flush, right? SetMockTime(GetTime<std::chrono::minutes>() + 55min); passes also, so we should likely update the test . Actually, sometimes it passes, other times it fails - this is the reason for my NACK

 0% build/src/test/test_bitcoin --run_test=chainstate_write_tests/chainstate_write_interval        
 1Running 1 test case...
 2src/test/chainstate_write_tests.cpp:35: error: in "chainstate_write_tests/chainstate_write_interval": check !sub->m_did_flush has failed
 3
 4*** 1 failure is detected in the test module "Bitcoin Core Test Suite"
 5% build/src/test/test_bitcoin --run_test=chainstate_write_tests/chainstate_write_interval        
 6Running 1 test case...
 7
 8*** No errors detected
 9% build/src/test/test_bitcoin --run_test=chainstate_write_tests/chainstate_write_interval
10Running 1 test case...
11
12*** No errors detected
13% build/src/test/test_bitcoin --run_test=chainstate_write_tests/chainstate_write_interval
14Running 1 test case...
15
16*** No errors detected
17% build/src/test/test_bitcoin --run_test=chainstate_write_tests/chainstate_write_interval
18Running 1 test case...
19
20*** No errors detected
21% build/src/test/test_bitcoin --run_test=chainstate_write_tests/chainstate_write_interval
22Running 1 test case...
23src/test/chainstate_write_tests.cpp:35: error: in "chainstate_write_tests/chainstate_write_interval": check !sub->m_did_flush has failed
24
25*** 1 failure is detected in the test module "Bitcoin Core Test Suite"

andrewtoth commented at 1:55 pm on September 2, 2024:

Well, changing the test to periodically flush after 55 minutes makes it a different test. That would mean the first flush is in the window and could randomly trigger a flush or not. Was the failure after changing to 55min?

andrewtoth commented at 2:16 pm on September 2, 2024:

Ahh, I see what you mean. We need to trigger an initial flush to set m_next_write first before testing the periodic flush window. Fixed.

in src/test/chainstate_write_tests.cpp:13 in 5cad9d16dd outdated

19+    {
20+        m_did_flush = true;
21+    }
22+};
23+
24+BOOST_FIXTURE_TEST_CASE(chainstate_write_interval, TestingSetup)

l0rinc commented at 12:41 pm on September 2, 2024:

Besides the stability and the initial run, I think we should test the valid values as well inside the range, e.g.

 0BOOST_FIXTURE_TEST_CASE(chainstate_write_interval, TestingSetup)
 1{
 2    auto DATABASE_WRITE_INTERVAL_MIN{50min};
 3    auto DATABASE_WRITE_INTERVAL_MAX{70min};
 4    auto now = GetTime<std::chrono::minutes>();
 5    auto sub{std::make_shared<TestSubscriber>()};
 6    m_node.validation_signals->RegisterSharedValidationInterface(sub);
 7    auto test_flush = [&](std::chrono::minutes offset) {
 8        assert(!sub->m_did_flush);
 9        Chainstate& chainstate{Assert(m_node.chainman)->ActiveChainstate()};
10        BlockValidationState state_dummy{};
11
12        SetMockTime(now + offset);
13        chainstate.FlushStateToDisk(state_dummy, FlushStateMode::PERIODIC);
14        m_node.validation_signals->SyncWithValidationInterfaceQueue();
15        return sub->m_did_flush;
16    };
17
18    // Initial run shouldn't flush
19    BOOST_CHECK_EQUAL(test_flush(DATABASE_WRITE_INTERVAL_MIN + 5min), false);
20
21    // Test all values (+1) in the interval
22    BOOST_CHECK_EQUAL(test_flush(DATABASE_WRITE_INTERVAL_MIN - 1min), false);
23    for (auto m = DATABASE_WRITE_INTERVAL_MIN; m <= DATABASE_WRITE_INTERVAL_MAX + 1min; m += 1min) {
24        BOOST_CHECK_EQUAL(test_flush(m), true);
25    }
26}

andrewtoth commented at 1:59 pm on September 2, 2024:

Why would we need to test every value in the window? Testing the edges of the window accomplishes the same thing, no?

l0rinc commented at 4:14 pm on September 2, 2024:

~~Using SteadyClock in #30611 (review) seems to have stabilized the tests.~~

Edit: it seems the addition of the first periodic flush stabilized the test.

Approach ACK

in src/validation.cpp:89 in 5cad9d16dd outdated

85@@ -86,10 +86,9 @@ using node::CBlockIndexHeightOnlyComparator;
86 using node::CBlockIndexWorkComparator;
87 using node::SnapshotMetadata;
88 
89-/** Time to wait between writing blocks/block index to disk. */
90-static constexpr std::chrono::hours DATABASE_WRITE_INTERVAL{1};
91-/** Time to wait between flushing chainstate to disk. */
92-static constexpr std::chrono::hours DATABASE_FLUSH_INTERVAL{24};
93+/** Time window to wait between writing blocks/block index and chainstate to disk. */

l0rinc commented at 12:43 pm on September 2, 2024:

can we add @sipa’s reasoning here for why it’s not a fixed value?

andrewtoth commented at 2:27 pm on September 2, 2024:

Done.

in src/validation.cpp:2849 in 5cad9d16dd outdated

2839@@ -2837,26 +2840,21 @@ bool Chainstate::FlushStateToDisk(
2840                 }
2841             }
2842         }
2843-        const auto nNow{SteadyClock::now()};
2844+        const auto nNow{NodeClock::now()};
2845         // Avoid writing/flushing immediately after startup.

l0rinc commented at 12:51 pm on September 2, 2024:

If my understanding is correct:

at startup we never want to flush, but the initial value of m_next_write would trigger one, so we have to set the next write value in that case quickly to prevent one
when we do flush, we want to set the next flush’s time again

I think we can deduplicate setting this value twice (which also enables us to inline CalculateNextWrite):

setting m_next_write to max, which won’t trigger fDoFullFlush since it’s >> nNow
add another boolean where we check for initial run i.e. m_next_write == NodeClock::time_point::max() (or we can inline this, it’s only used once)
set the next flush time when fInitialRun || fDoFullFlush, independently from the if (fDoFullFlush) { block

 0diff --git a/src/validation.cpp b/src/validation.cpp
 1index 6b401799bb..33fcc30806 100644
 2--- a/src/validation.cpp
 3+++ b/src/validation.cpp
 4@@ -2775,11 +2775,6 @@ CoinsCacheSizeState Chainstate::GetCoinsCacheSizeState(
 5     return CoinsCacheSizeState::OK;
 6 }
 7 
 8-NodeClock::time_point CalculateNextWrite(NodeClock::time_point after)
 9-{
10-    return FastRandomContext().rand_uniform_delay(after + DATABASE_WRITE_INTERVAL_MIN, DATABASE_WRITE_INTERVAL_MAX - DATABASE_WRITE_INTERVAL_MIN);
11-}
12-
13 bool Chainstate::FlushStateToDisk(
14     BlockValidationState &state,
15     FlushStateMode mode,
16@@ -2841,10 +2836,6 @@ bool Chainstate::FlushStateToDisk(
17             }
18         }
19         const auto nNow{NodeClock::now()};
20-        // Avoid writing/flushing immediately after startup.
21-        if (m_next_write == decltype(m_next_write){}) {
22-            m_next_write = CalculateNextWrite(nNow);
23-        }
24         // The cache is large and we're within 10% and 10 MiB of the limit, but we have time now (not in the middle of a block processing).
25         bool fCacheLarge = mode == FlushStateMode::PERIODIC && cache_state >= CoinsCacheSizeState::LARGE;
26         // The cache is over the limit, we have to write now.
27@@ -2853,6 +2844,15 @@ bool Chainstate::FlushStateToDisk(
28         bool fPeriodicWrite = mode == FlushStateMode::PERIODIC && nNow >= m_next_write;
29         // Combine all conditions that result in a full cache flush.
30         bool fDoFullFlush = (mode == FlushStateMode::ALWAYS) || fCacheLarge || fCacheCritical || fPeriodicWrite || fFlushForPrune;
31+        // Avoid writing/flushing immediately after startup
32+        bool fInitialRun = m_next_write == NodeClock::time_point::max();
33+
34+        if (fInitialRun || fDoFullFlush) {
35+            const auto time{nNow + DATABASE_WRITE_INTERVAL_MIN};
36+            constexpr auto range{DATABASE_WRITE_INTERVAL_MAX - DATABASE_WRITE_INTERVAL_MIN + 1min};
37+            m_next_write = FastRandomContext().rand_uniform_delay(time, range);
38+        }
39+
40         // Write blocks, block index and best chain related state to disk.
41         if (fDoFullFlush) {
42             // Ensure we can write block index
43@@ -2884,7 +2884,6 @@ bool Chainstate::FlushStateToDisk(
44 
45                 m_blockman.UnlinkPrunedFiles(setFilesToPrune);
46             }
47-            m_next_write = CalculateNextWrite(nNow);
48 
49             if (!CoinsTip().GetBestBlock().IsNull()) {
50                 LOG_TIME_MILLIS_WITH_CATEGORY(strprintf("write coins cache to disk (%d coins, %.2fkB)",
51diff --git a/src/validation.h b/src/validation.h
52index 1c2fd0bd3c..ea737a9c4c 100644
53--- a/src/validation.h
54+++ b/src/validation.h
55@@ -808,7 +808,7 @@ private:
56     void UpdateTip(const CBlockIndex* pindexNew)
57         EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
58 
59-    NodeClock::time_point m_next_write{};
60+    NodeClock::time_point m_next_write{NodeClock::time_point::max()};
61 
62     /**
63      * In case of an invalid snapshot, rename the coins leveldb directory so

andrewtoth commented at 2:01 pm on September 2, 2024:

I’m not sure this is an improvement. We would be adding a new variable, instead of calling a function twice. Seems like a wash to me.

l0rinc commented at 3:53 pm on September 2, 2024:

we don’t have to add a variable, we can just inline it to:

0if (fDoFullFlush || m_next_write == NodeClock::time_point::max()) {

Which would simplify a few places (logically, making it obvious when we’re queuing the next run, and also codewise, reducing duplication), removing the helper method and explanatory comments and multiple assignments (i.e. business logic being spread around).

andrewtoth commented at 9:15 pm on September 2, 2024:

This seems like just replacing if (m_next_write == decltype(m_next_write){}) { with if (m_next_write == NodeClock::time_point::max()) {, and since it’s max we can just move the if block below fPeriodicWrite assignment. I don’t see how this is an improvement? Doesn’t having multiple checks for fDoFullFlush make it more complex?

l0rinc commented at 9:24 pm on September 2, 2024:

it’s not just replacing the if, please apply this patch on top to see what I mean:

 0diff --git a/src/validation.cpp b/src/validation.cpp
 1index 87a144cb99..6fdc070db3 100644
 2--- a/src/validation.cpp
 3+++ b/src/validation.cpp
 4@@ -2778,13 +2778,6 @@ CoinsCacheSizeState Chainstate::GetCoinsCacheSizeState(
 5     return CoinsCacheSizeState::OK;
 6 }
 7 
 8-NodeClock::time_point CalculateNextWrite(NodeClock::time_point after)
 9-{
10-    const auto time{after + DATABASE_WRITE_INTERVAL_MIN};
11-    constexpr auto range{DATABASE_WRITE_INTERVAL_MAX - DATABASE_WRITE_INTERVAL_MIN + 1min};
12-    return FastRandomContext().rand_uniform_delay(time, range);
13-}
14-
15 bool Chainstate::FlushStateToDisk(
16     BlockValidationState &state,
17     FlushStateMode mode,
18@@ -2846,10 +2839,6 @@ bool Chainstate::FlushStateToDisk(
19             }
20         }
21         const auto nNow{NodeClock::now()};
22-        // Avoid writing/flushing immediately after startup.
23-        if (m_next_write == decltype(m_next_write){}) {
24-            m_next_write = CalculateNextWrite(nNow);
25-        }
26         // The cache is large and we're within 10% and 10 MiB of the limit, but we have time now (not in the middle of a block processing).
27         bool fCacheLarge = mode == FlushStateMode::PERIODIC && cache_state >= CoinsCacheSizeState::LARGE;
28         // The cache is over the limit, we have to write now.
29@@ -2858,6 +2847,12 @@ bool Chainstate::FlushStateToDisk(
30         bool fPeriodicWrite = mode == FlushStateMode::PERIODIC && nNow >= m_next_write;
31         // Combine all conditions that result in a full cache flush.
32         bool fDoFullFlush = (mode == FlushStateMode::ALWAYS) || fCacheLarge || fCacheCritical || fPeriodicWrite || fFlushForPrune;
33+
34+        if (fDoFullFlush || m_next_write == NodeClock::time_point::max()) {
35+            constexpr auto range{DATABASE_WRITE_INTERVAL_MAX - DATABASE_WRITE_INTERVAL_MIN + 1min};
36+            m_next_write = FastRandomContext().rand_uniform_delay(nNow + DATABASE_WRITE_INTERVAL_MIN, range);
37+        }
38+
39         // Write blocks, block index and best chain related state to disk.
40         if (fDoFullFlush) {
41             // Ensure we can write block index
42@@ -2889,7 +2884,6 @@ bool Chainstate::FlushStateToDisk(
43 
44                 m_blockman.UnlinkPrunedFiles(setFilesToPrune);
45             }
46-            m_next_write = CalculateNextWrite(nNow);
47 
48             if (!CoinsTip().GetBestBlock().IsNull()) {
49                 LOG_TIME_MILLIS_WITH_CATEGORY(strprintf("write coins cache to disk (%d coins, %.2fkB)",
50diff --git a/src/validation.h b/src/validation.h
51index 1c2fd0bd3c..ea737a9c4c 100644
52--- a/src/validation.h
53+++ b/src/validation.h
54@@ -808,7 +808,7 @@ private:
55     void UpdateTip(const CBlockIndex* pindexNew)
56         EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
57 
58-    NodeClock::time_point m_next_write{};
59+    NodeClock::time_point m_next_write{NodeClock::time_point::max()};
60 
61     /**
62      * In case of an invalid snapshot, rename the coins leveldb directory so

andrewtoth commented at 2:33 am on September 3, 2024:

Ok, done!

l0rinc commented at 12:57 pm on September 2, 2024: contributor

The test runs unreliably for some reason, we need to fix that first.

NACK 5cad9d16dd07859a76c6637789695bf1b4f36e1c

On this branch […] main thread for 115 milliseconds.

This sounds awesome!

Good point, I can add a test that that expects flush every 24 hours, then reduce it to 1 hour after this change.

Do you think we could still do something like this, i.e. add a test as a very first commit, which reproduces the old behavior first?

l0rinc changes_requested

andrewtoth force-pushed on Sep 2, 2024

andrewtoth commented at 2:48 pm on September 2, 2024: contributor

add a test as a very first commit, which reproduces the old behavior first?

Yeah, I realize the test commit can’t be cherry-picked cleanly to master. Updated to have the test commit first, and update the test for each change.

andrewtoth force-pushed on Sep 2, 2024

in src/test/chainstate_write_tests.cpp:36 in 1d50f72d16 outdated

31+    // The first periodic flush sets m_last_write and does not flush
32+    chainstate.FlushStateToDisk(state_dummy, FlushStateMode::PERIODIC);
33+    m_node.validation_signals->SyncWithValidationInterfaceQueue();
34+    BOOST_CHECK(!sub->m_did_flush);
35+
36+    // Time to periodically flush is between 50 and 70 minutes

l0rinc commented at 4:17 pm on September 2, 2024:

0    // Time to periodically flush is between 50 and 70 minutes (inclusive)

andrewtoth commented at 2:32 am on September 3, 2024:

Done.

in src/test/chainstate_write_tests.cpp:37 in 1d50f72d16 outdated

35+
36+    // Time to periodically flush is between 50 and 70 minutes
37+    SetMockTime(GetTime<std::chrono::minutes>() + 49min);
38+    chainstate.FlushStateToDisk(state_dummy, FlushStateMode::PERIODIC);
39+    m_node.validation_signals->SyncWithValidationInterfaceQueue();
40+    BOOST_CHECK(!sub->m_did_flush);

l0rinc commented at 4:19 pm on September 2, 2024:

if you decide to keep this structure, please add boundary test for 50min turning m_did_flush to true. Would also cover + 71min for the same reason.

andrewtoth commented at 4:21 pm on September 2, 2024:

I don’t think it would be true, because it’s random.

l0rinc commented at 6:25 pm on September 2, 2024:

got it, you can resolve this comment

andrewtoth force-pushed on Sep 2, 2024

andrewtoth force-pushed on Sep 3, 2024

in src/test/chainstate_write_tests.cpp:31 in 1b448f5bca outdated

26+    const auto sub{std::make_shared<TestSubscriber>()};
27+    m_node.validation_signals->RegisterSharedValidationInterface(sub);
28+    Chainstate& chainstate{Assert(m_node.chainman)->ActiveChainstate()};
29+    BlockValidationState state_dummy{};
30+
31+    // The first periodic flush sets m_last_write and does not flush

l0rinc commented at 8:34 am on September 3, 2024:

we don’t have m_last_write anymore

andrewtoth commented at 3:58 pm on September 8, 2024:

Fixed.

in src/test/chainstate_write_tests.cpp:36 in 81e9d97b86 outdated

33     chainstate.FlushStateToDisk(state_dummy, FlushStateMode::PERIODIC);
34     m_node.validation_signals->SyncWithValidationInterfaceQueue();
35     BOOST_CHECK(!sub->m_did_flush);
36 
37-    SetMockTime(GetTime<std::chrono::minutes>() + 23h + 59min);
38+    // Time to periodically flush is 1 hour

l0rinc commented at 8:38 am on September 3, 2024:

nit: 1 hour is a duration or interval or period, but technically not a time:

0    // The periodic flush interval is 1 hour

andrewtoth commented at 3:58 pm on September 8, 2024:

Fixed.

in src/validation.cpp:96 in 81e9d97b86 outdated

85@@ -86,10 +86,8 @@ using node::CBlockIndexHeightOnlyComparator;
86 using node::CBlockIndexWorkComparator;
87 using node::SnapshotMetadata;
88 
89-/** Time to wait between writing blocks/block index to disk. */
90+/** Time to wait between writing blocks/block index and chainstate to disk. */
91 static constexpr std::chrono::hours DATABASE_WRITE_INTERVAL{1};
92-/** Time to wait between flushing chainstate to disk. */
93-static constexpr std::chrono::hours DATABASE_FLUSH_INTERVAL{24};

l0rinc commented at 8:41 am on September 3, 2024:

could you please explain the flush removal in the corresponding commit message as well?

andrewtoth commented at 3:58 pm on September 8, 2024:

Added a sentence in the commit message.

in src/validation.cpp:2850 in 81e9d97b86 outdated

2853-        bool fPeriodicFlush = mode == FlushStateMode::PERIODIC && nNow > m_last_flush + DATABASE_FLUSH_INTERVAL;
2854         // Combine all conditions that result in a full cache flush.
2855-        fDoFullFlush = (mode == FlushStateMode::ALWAYS) || fCacheLarge || fCacheCritical || fPeriodicFlush || fFlushForPrune;
2856-        // Write blocks and block index to disk.
2857-        if (fDoFullFlush || fPeriodicWrite) {
2858+        fDoFullFlush = (mode == FlushStateMode::ALWAYS) || fCacheLarge || fCacheCritical || fPeriodicWrite || fFlushForPrune;

l0rinc commented at 8:52 am on September 3, 2024:

is the name still accurate, now that a periodic write can also trigger it (i.e. not just flushes, like before)?

andrewtoth commented at 3:59 pm on September 8, 2024:

Renamed to should_write.

in src/validation.cpp:2923 in 81e9d97b86 outdated

2897-            // Flush the chainstate (which may refer to block index entries).
2898-            const auto empty_cache{(mode == FlushStateMode::ALWAYS) || fCacheLarge || fCacheCritical};
2899-            if (empty_cache ? !CoinsTip().Flush() : !CoinsTip().Sync()) {
2900-                return FatalError(m_chainman.GetNotifications(), state, _("Failed to write to coin database."));
2901+
2902+            if (!CoinsTip().GetBestBlock().IsNull()) {

l0rinc commented at 9:01 am on September 3, 2024:

Is it even possible for this to be false, given that we skip flush on first run?

nit: just an observation, the starting ! hides well here, if you think it read better, here’s an alternative (though I see that IsNull is used in a few other places):

0            if (CoinsTip().GetBestBlock() != uint256::ZERO) {

andrewtoth commented at 4:02 pm on September 3, 2024:

Is it even possible for this to be false, given that we skip flush on first run?

Can we logically determine it will never be false? If so, we can simplify this quite a bit.

l0rinc commented at 4:17 pm on September 3, 2024:

what would happen when we restart with a different data directory, it will have to refetch everything, right? Or if we had a reorg on the very first block maybe? Maybe @sipa can help us out here…

andrewtoth commented at 6:08 pm on September 6, 2024:

Hmm there are a lot of entry points to FlushStateToDisk, and many places where SetBestBlock is called. I think this PR should leave this as is to not complicate review.

I was thinking we could remove the full_flush_completed variable if we didn’t have this check.

l0rinc commented at 6:49 pm on September 8, 2024:

you can resolve this comment

in src/validation.cpp:2938 in 81e9d97b86 outdated

2911+                if (!CheckDiskSpace(m_chainman.m_options.datadir, 48 * 2 * 2 * CoinsTip().GetCacheSize())) {
2912+                    return FatalError(m_chainman.GetNotifications(), state, _("Disk space is too low!"));
2913+                }
2914+                // Flush the chainstate (which may refer to block index entries).
2915+                const auto empty_cache{(mode == FlushStateMode::ALWAYS) || fCacheLarge || fCacheCritical};
2916+                if (empty_cache ? !CoinsTip().Flush() : !CoinsTip().Sync()) {

l0rinc commented at 9:19 am on September 3, 2024:

I know this line didn’t change here, I’m looking for answers: I find it quite confusing that most of the side-effects of this method are actually done inside if conditions - I wanted to understand whether the name fDoFullFlush is accurate, but had a hard time understanding which methods are just checking state (e.g. CheckDiskSpace, I assume) and which ones are changing state (e.g. empty_cache ? !CoinsTip().Flush() : !CoinsTip().Sync()). So my question is whether it’s fair to call the condition fDoFullFlush, when sometimes it just Syncs (if I understand the algo correctly)?

andrewtoth commented at 4:04 pm on September 3, 2024:

Yeah we could rename fDoFullFlush. What about should_write?

andrewtoth commented at 3:59 pm on September 8, 2024:

Renamed.

in src/test/chainstate_write_tests.cpp:1 in 1b448f5bca outdated

0@@ -0,0 +1,48 @@
1+// Copyright (c) 2024- The Bitcoin Core developers

l0rinc commented at 9:27 am on September 3, 2024:

I think the convention is:

0// Copyright (c) 2024-present The Bitcoin Core developers

andrewtoth commented at 3:59 pm on September 8, 2024:

Removed the dates altogether.

in src/validation.cpp:2851 in 1b448f5bca outdated

2851-        bool fPeriodicWrite = mode == FlushStateMode::PERIODIC && nNow > m_last_write + DATABASE_WRITE_INTERVAL;
2852+        bool fPeriodicWrite = mode == FlushStateMode::PERIODIC && nNow >= m_next_write;
2853         // Combine all conditions that result in a full cache flush.
2854         bool fDoFullFlush = (mode == FlushStateMode::ALWAYS) || fCacheLarge || fCacheCritical || fPeriodicWrite || fFlushForPrune;
2855+
2856+        if (fDoFullFlush || m_next_write == NodeClock::time_point::max()) {

l0rinc commented at 9:30 am on September 3, 2024:

we do more than add randomness to periodic write interval in this commit now, could you please split it up a bit more?

andrewtoth commented at 3:59 pm on September 8, 2024:

Split into 2 commits.

l0rinc commented at 9:34 am on September 3, 2024: contributor

Thanks for quick responses, my main concern at this stage is that the commit messages don’t explain the changes in enough detail and some symbols might need a rename after the change

DrahtBot added the label CI failed on Sep 7, 2024

andrewtoth force-pushed on Sep 8, 2024

DrahtBot removed the label CI failed on Sep 8, 2024

l0rinc commented at 6:39 pm on September 8, 2024: contributor

Running

0hyperfine \
1--runs 5 \
2--export-json /mnt/ibd_full.json \
3--parameter-list COMMIT 8aabdfa6,1b448f5b \
4--prepare 'git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build && cmake --build build -j8 && rm -rf /mnt/BitcoinData/*' \
5'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/BitcoinData -stopatheight=820000 -dbcache=16384 -printtoconsole=0'

on a HDD SATA 2,0 TB Enterprise resulted in:

0 Benchmark 1: COMMIT=8aabdfa6 ./build/src/bitcoind -datadir=/mnt/BitcoinData -stopatheight=820000 -dbcache=16384 -printtoconsole=0
1  Time (mean ± σ):     33934.695 s ± 530.080 s    [User: 27857.218 s, System: 1873.581 s]
2  Range (min … max):   33208.510 s … 34511.668 s    5 runs

0Benchmark 2: COMMIT=1b448f5b ./build/src/bitcoind -datadir=/mnt/BitcoinData -stopatheight=820000 -dbcache=16384 -printtoconsole=0
1  Time (mean ± σ):     36605.266 s ± 584.663 s    [User: 29487.564 s, System: 2228.572 s]
2  Range (min … max):   35846.681 s … 37275.497 s    5 runs

0Summary
1  COMMIT=8aabdfa6 ./build/src/bitcoind -datadir=/mnt/BitcoinData -stopatheight=820000 -dbcache=16384 -printtoconsole=0 ran
2    1.08 ± 0.02 times faster than COMMIT=1b448f5b ./build/src/bitcoind -datadir=/mnt/BitcoinData -stopatheight=820000 -dbcache=16384 -printtoconsole=0

 0 "results": [
 1    {
 2      "command": "COMMIT=8aabdfa6 ./build/src/bitcoind -datadir=/mnt/BitcoinData -stopatheight=820000 -dbcache=16384 -printtoconsole=0",
 3      "mean": 33934.69470762795,
 4      "stddev": 530.0799467530879,
 5      "median": 34177.67060207214,
 6      "user": 27857.2184555,
 7      "system": 1873.58075582,
 8      "min": 33208.50992151314,
 9      "max": 34511.66842756214,
10      "times": [
11        33208.50992151314,
12        34511.66842756214,
13        33571.37685305914,
14        34177.67060207214,
15        34204.24773393314
16      ],
17      "exit_codes": [
18        0,
19        0,
20        0,
21        0,
22        0
23      ],
24      "parameters": {
25        "COMMIT": "8aabdfa6"
26      }
27    },
28    {
29      "command": "COMMIT=1b448f5b ./build/src/bitcoind -datadir=/mnt/BitcoinData -stopatheight=820000 -dbcache=16384 -printtoconsole=0",
30      "mean": 36605.26595186774,
31      "stddev": 584.6634786404139,
32      "median": 36732.99369363214,
33      "user": 29487.564016900003,
34      "system": 2228.57235822,
35      "min": 35846.68126299314,
36      "max": 37275.49678005814,
37      "times": [
38        36182.10911623014,
39        37275.49678005814,
40        36732.99369363214,
41        36989.04890642514,
42        35846.68126299314
43      ],
44      "exit_codes": [
45        0,
46        0,
47        0,
48        0,
49        0
50      ],
51      "parameters": {
52        "COMMIT": "1b448f5b"
53      }
54    }
55  ]
56}

So running the benchmark for almost 100 hours, having a standard deviation of ~9 minutes for both benchmarks (5 runs per commit) reveals that the full pre-assumevalid IBD consistently takes ~45 minutes (8%) longer after the change.

Note that this was also run with actual nodes (i.e. non-local copies) to be as realistic as possible and was using a HDD.

Edit: could we maybe test a 100-200 minute window instead of 60?

DrahtBot added the label CI failed on Sep 9, 2024

DrahtBot removed the label CI failed on Sep 13, 2024

andrewtoth force-pushed on Sep 22, 2024

andrewtoth commented at 1:23 pm on September 22, 2024: contributor

Rebased to benchmark reindex-chainstate with #28358 and a cache size holding the entire utxo set.

the full pre-assumevalid IBD consistently takes ~45 minutes (8%) longer after the change

I got a hold of an HDD, and surprisingly my benchmarks show the opposite. I kept the blocksdir on the SSD and only had the datadir on the HDD to isolate the effects of this change. When changing from writing at the end to writing every hour, writing every hour was ~25 minutes (6%) faster.

Command	Mean [s]	Min [s]	Max [s]	Relative
`echo 484aee0e16aff9c5fcfe66157c67bf29c66baa2a && ./build/src/bitcoind -datadir=/media/user/Barracuda/bitcoin -blocksdir=/home/user/.bitcoin -dbcache=27000 -reindex-chainstate -printtoconsole=0 -connect=0 -stopatheight=840000`	25447.840 ± 390.609	25171.638	25724.043	1.00
`echo 17cf3d64b8a56763b9aff8b8af6ac8bd2b100ff6 && ./build/src/bitcoind -datadir=/media/user/Barracuda/bitcoin -blocksdir=/home/user/.bitcoin -dbcache=27000 -reindex-chainstate -printtoconsole=0 -connect=0 -stopatheight=840000`	26932.607 ± 232.182	26768.430	27096.785	1.06 ± 0.02

Notably, before this change flushing the chainstate to disk at the end of the reindex took longer than the entire reindex into memory on an HDD (4 hours of flushing and ~7.5 hours total time). During that entire flush bitcoind will be unresponsive as the main lock is held. So, this change is also a UX improvement because the writes will be more spaced out and not block RPC responses.

l0rinc commented at 12:58 pm on September 25, 2024: contributor

I did a rebased -reindex-chainstate as well, this seems to be basically the same as before:

 0# git checkout 33adc7521cc8bb24b941d959022b084002ba7c60
 1HEAD is now at 33adc7521c Merge bitcoin/bitcoin#30765: refactor: Allow `CScript`'s `operator<<` to accept spans, not just vectors
 2
 3# git checkout 391c87640d78d9821b87e3f3de755af40a191d24
 4HEAD is now at 391c87640d validation: add randomness to periodic write interval
 5
 6hyperfine \
 7--runs 3 \
 8--export-json /mnt/ibd_full-30611.json \
 9--parameter-list COMMIT 33adc7521cc8bb24b941d959022b084002ba7c60,391c87640d78d9821b87e3f3de755af40a191d24 \
10--prepare 'git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build && cmake --build build -j8' \
11'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/BitcoinData -stopatheight=860000 -dbcache=30000 -reindex-chainstate -printtoconsole=0'

Benchmark 1: COMMIT=33adc7521cc8bb24b941d959022b084002ba7c60 ./build/src/bitcoind -datadir=/mnt/BitcoinData -stopatheight=860000 -dbcache=30000 -reindex-chainstate -printtoconsole=0

0  Time (mean ± σ):     22795.930 s ± 42.220 s    [User: 19545.844 s, System: 926.963 s]
1  Range (min … max):   22756.042 s … 22840.149 s    3 runs

Benchmark 2: COMMIT=391c87640d78d9821b87e3f3de755af40a191d24 ./build/src/bitcoind -datadir=/mnt/BitcoinData -stopatheight=860000 -dbcache=30000 -reindex-chainstate -printtoconsole=0

0  Time (mean ± σ):     22962.909 s ± 457.340 s    [User: 19563.315 s, System: 923.347 s]
1  Range (min … max):   22461.120 s … 23356.337 s    3 runs

0  COMMIT=33adc7521cc8bb24b941d959022b084002ba7c60 ./build/src/bitcoind -datadir=/mnt/BitcoinData -stopatheight=860000 -dbcache=30000 -reindex-chainstate -printtoconsole=0 ran
1    1.01 ± 0.02 times faster than COMMIT=391c87640d78d9821b87e3f3de755af40a191d24 ./build/src/bitcoind -datadir=/mnt/BitcoinData -stopatheight=860000 -dbcache=30000 -reindex-chainstate -printtoconsole=0

I’ll rerun on a full IBD for the same params:

0hyperfine \
1--runs 3 \
2--export-json /mnt/ibd-30611-full.json \
3--parameter-list COMMIT 33adc7521cc8bb24b941d959022b084002ba7c60,391c87640d78d9821b87e3f3de755af40a191d24 \
4--prepare 'git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build && cmake --build build -j8 && rm -rf /mnt/BitcoinData/*' \
5'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/BitcoinData -stopatheight=860000 -dbcache=30000 -printtoconsole=0'

andrewtoth commented at 1:20 pm on September 25, 2024: contributor

I did a rebased -reindex-chainstate as well, this seems to be basically the same as before:

Nice, so there’s no speed regression but crash resistance gained!

Also, I benchmarked only at 484aee0e16aff9c5fcfe66157c67bf29c66baa2a instead of 391c87640d78d9821b87e3f3de755af40a191d24 so that there’s no randomness in the benchmarks. It can give us a better picture of what this PR affects.

l0rinc commented at 1:24 pm on September 25, 2024: contributor

so that there’s no randomness in the benchmarks

I’ve rebased on master to have up-to-date data. What kind of randomness are you referring to?

there’s no speed regression

3 benchmarks, all showed different things. We have to find out what causes these differences, that’s why I want to try a complete IBD again (which seems like the only downside so far), since ultimately most people won’t have blocksdir on the SSD and datadir on the HDD doing a reindex.

andrewtoth commented at 1:29 pm on September 25, 2024: contributor

What kind of randomness are you referring to?

The random window for when this PR will flush introduced in the last commit. That is good for steady state operation to not have all nodes converge on the same flush time. But, for benchmarking a reindex or IBD it will just add randomness to the results.

3 benchmarks, all showed different things.

Well, they are all different benchmarks with different variables. Changing more variables by running different benchmarks won’t necessarily give us a clearer picture.

l0rinc commented at 7:36 pm on September 28, 2024: contributor

running different benchmarks won’t necessarily give us a clearer picture.

It could get confusing if they’re not pointing in the same direction (which seems to be the case here). But we need to investigate, we can’t have surprises, right?

I reran the previous full IBD on the previous setup with HDD, but this time until 860000 with 30gb memory.

0hyperfine --runs 3 \
1--export-json /mnt/ibd-30611-full.json \
2--parameter-list COMMIT 33adc7521cc8bb24b941d959022b084002ba7c60,391c87640d78d9821b87e3f3de755af40a191d24 \
3--prepare 'git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build && cmake --build build -j8 && rm -rf /mnt/BitcoinData/*' \
4'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/BitcoinData -stopatheight=860000 -dbcache=30000 -printtoconsole=0'

Benchmark 1: COMMIT=33adc7521cc8bb24b941d959022b084002ba7c60 ./build/src/bitcoind -datadir=/mnt/BitcoinData -stopatheight=860000 -dbcache=30000 -printtoconsole=0

0  Time (mean ± σ):     37410.149 s ± 264.840 s    [User: 32792.866 s, System: 1938.561 s]
1  Range (min … max):   37105.210 s … 37582.606 s    3 runs

I.e. the three independent measurements before the change were:

10.42 hours
10.43 hours
10.30 hours

37542.6300163161
37582.6056244741
37105.2102307041

Benchmark 2: COMMIT=391c87640d78d9821b87e3f3de755af40a191d24 ./build/src/bitcoind -datadir=/mnt/BitcoinData -stopatheight=860000 -dbcache=30000 -printtoconsole=0 Time (mean ± σ): 41970.584 s ± 995.527 s [User: 35046.785 s, System: 2172.531 s] Range (min … max): 41203.415 s … 43095.560 s 3 runs

Having the measurements:

11.44 hours
11.97 hours
11.55 hours

41203.4147152671
43095.5604828261
41612.7761091461

Summary COMMIT=33adc7521cc8bb24b941d959022b084002ba7c60 ./build/src/bitcoind -datadir=/mnt/BitcoinData -stopatheight=860000 -dbcache=30000 -printtoconsole=0 ran 1.12 ± 0.03 times faster than COMMIT=391c87640d78d9821b87e3f3de755af40a191d24 ./build/src/bitcoind -datadir=/mnt/BitcoinData -stopatheight=860000 -dbcache=30000 -printtoconsole=0

i.e. 10% slower, roughly 1 hour longer IBD on a HDD if we flush every hour.

As you can see the measurements are pretty consistently faster before and slower after. Repeated the same experiment twice, 5 times first time, 3 times now, it doesn’t seem like a fluke.

I understand that it’s inconvenient (as the change makes theoretical sense to me), but maybe we can find a different interval (e.g. 100 minutes maybe?) where the difference isn’t this big.

l0rinc commented at 11:44 am on September 30, 2024: contributor

@davidgumberg mentioned that it’s possible flushing the block index is causing the regression. This could also explain why there isn’t a regression when the blocksdir is on an SSD and the coinsdb is not.

l0rinc commented at 10:18 am on October 13, 2024: contributor

I redid the benchmarks on an SSD with 100-120 second range:

(before) 5e90501abe test: chainstate write test for periodic chainstate flush
(write every 50-70 minutes) ffc9df87a0 validation: add randomness to periodic write interval
(write every 100-120 minutes) f5e7b1508a Increase DATABASE_WRITE_INTERVAL MIN/MAX to 100/120

0hyperfine \
1--runs 1 \
2--export-json /mnt/my_storage/ibd_full-write-chainstate-every-hour.json \
3--parameter-list COMMIT 5e90501abe13731579c61b13092b31b9fc4f8c4d,ffc9df87a0da35b451720556bdb3f0f65f315c8f,f5e7b1508a06c0c7a9c06d1ac270840fd72e9e6f \
4--prepare 'rm -rf /mnt/my_storage/BitcoinData/* && git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake \
5--build build -j$(nproc)' \
6'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=2000 -printtoconsole=0'

 0Benchmark 1: COMMIT=5e90501abe13731579c61b13092b31b9fc4f8c4d ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=2000 -printtoconsole=0
 1  Time (abs ≡):        30685.829 s               [User: 29155.745 s, System: 4840.419 s]
 2
 3Benchmark 2: COMMIT=ffc9df87a0da35b451720556bdb3f0f65f315c8f ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=2000 -printtoconsole=0
 4  Time (abs ≡):        31140.007 s               [User: 29761.900 s, System: 4909.715 s]
 5
 6Benchmark 3: COMMIT=f5e7b1508a06c0c7a9c06d1ac270840fd72e9e6f ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=2000 -printtoconsole=0
 7  Time (abs ≡):        31935.055 s               [User: 29501.483 s, System: 4915.473 s]
 8
 9Summary
10  'COMMIT=5e90501abe13731579c61b13092b31b9fc4f8c4d ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=2000 -printtoconsole=0' ran
11    1.01 times faster than 'COMMIT=ffc9df87a0da35b451720556bdb3f0f65f315c8f ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=2000 -printtoconsole=0'
12    1.04 times faster than 'COMMIT=f5e7b1508a06c0c7a9c06d1ac270840fd72e9e6f ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=2000 -printtoconsole=0'

The measurements are pretty close this time, I’ll try the same on an HDD as well.

l0rinc commented at 11:26 pm on October 14, 2024: contributor

Finished the same benchmarks on a HDD with very similar results:

0# hyperfine \
1--runs 1 \
2--export-json /mnt/my_storage/ibd_full-write-chainstate-every-hour.json \
3--parameter-list COMMIT 5e90501abe13731579c61b13092b31b9fc4f8c4d,ffc9df87a0da35b451720556bdb3f0f65f315c8f,f5e7b1508a06c0c7a9c06d1ac270840fd72e9e6f \
4--prepare 'rm -rf /mnt/my_storage/BitcoinData/* && git checkout {COMMIT} && git clean -fxd && git reset \
5--hard && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake \
6--build build -j$(nproc)' \
7'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=2000 -printtoconsole=0'

 0Benchmark 1: COMMIT=5e90501abe13731579c61b13092b31b9fc4f8c4d ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=2000 -printtoconsole=0
 1  Time (abs ≡):        35216.011 s               [User: 28946.209 s, System: 4439.914 s]
 2
 3Benchmark 2: COMMIT=ffc9df87a0da35b451720556bdb3f0f65f315c8f ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=2000 -printtoconsole=0
 4  Time (abs ≡):        35007.008 s               [User: 28473.073 s, System: 4140.873 s]
 5
 6Benchmark 3: COMMIT=f5e7b1508a06c0c7a9c06d1ac270840fd72e9e6f ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=2000 -printtoconsole=0
 7  Time (abs ≡):        35625.822 s               [User: 28383.805 s, System: 4280.052 s]
 8
 9Summary
10  COMMIT=ffc9df87a0da35b451720556bdb3f0f65f315c8f ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=2000 -printtoconsole=0 ran
11    1.01 times faster than COMMIT=5e90501abe13731579c61b13092b31b9fc4f8c4d ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=2000 -printtoconsole=0
12    1.02 times faster than COMMIT=f5e7b1508a06c0c7a9c06d1ac270840fd72e9e6f ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=2000 -printtoconsole=0

git checkout 5e90501abe13731579c61b13092b31b9fc4f8c4d

HEAD is now at 5e90501abe test: chainstate write test for periodic chainstate flush

git checkout ffc9df87a0da35b451720556bdb3f0f65f315c8f

HEAD is now at ffc9df87a0 validation: add randomness to periodic write interval

git checkout f5e7b1508a06c0c7a9c06d1ac270840fd72e9e6f

HEAD is now at f5e7b1508a Increase DATABASE_WRITE_INTERVAL MIN/MAX to 100/120

Based on these I’m fine with 50/70 or 100/120 minute writes. Approach ACK - let’s discuss whether this is the final form.

DrahtBot added the label Needs rebase on Nov 11, 2024

andrewtoth force-pushed on Nov 11, 2024

andrewtoth commented at 3:26 pm on November 11, 2024: contributor

Rebased for renaming TRACE5 -> TRACEPOINT.

DrahtBot removed the label Needs rebase on Nov 11, 2024

andrewtoth force-pushed on Dec 4, 2024

andrewtoth commented at 1:49 am on December 4, 2024: contributor

Rebased to test performance with #30039.

l0rinc commented at 6:27 pm on December 7, 2024: contributor

Reran the benchmarks after the rebase (2 sets of reindex-chainstate of 800k blocks) - ~~and there doesn’t seem to be any speed regression whatsoever because of this change now.~~

0# hyperfine \
1--runs 2 \
2--export-json /mnt/my_storage/ibd_full-write-chainstate-every-hour-rebased.json \
3--parameter-list COMMIT 1a5d7e0ff5ff8fb73139dc2141dbe231b12f68fe,5ae300e55ca18b5398f1255d53dd1c50ba5cd6ee \
4--prepare 'git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake --build build -j$(nproc)' \
5'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=30000 -printtoconsole=0 -reindex-chainstate -connect=0'

 0Benchmark 1: COMMIT=1a5d7e0ff5ff8fb73139dc2141dbe231b12f68fe ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=30000 -printtoconsole=0 -reindex-chainstate -connect=0
 1  Time (mean ± σ):     13914.554 s ± 45.835 s    [User: 14777.415 s, System: 621.571 s]
 2  Range (min … max):   13882.143 s … 13946.964 s    2 runs
 3  
 4Benchmark 2: COMMIT=5ae300e55ca18b5398f1255d53dd1c50ba5cd6ee ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=30000 -printtoconsole=0 -reindex-chainstate -connect=0
 5  Time (mean ± σ):     13803.626 s ±  6.910 s    [User: 14715.296 s, System: 619.771 s]
 6  Range (min … max):   13798.740 s … 13808.512 s    2 runs
 7  
 8Summary
 9  COMMIT=b981f7e9c010ebd26e49a66a58f5a3e6e2efb3f5 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=30000 -printtoconsole=0 -reindex-chainstate -connect=0 ran
10    1.01 ± 0.00 times faster than COMMIT=1a5d7e0ff5ff8fb73139dc2141dbe231b12f68fe ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=30000 -printtoconsole=0 -reindex-chainstate -connect=0

~~i.e. basically the same speed as before, given enough memory (where this feature shines)~~

Edit: To make sure the reindex measurements don’t cheat (i.e. are representative of an actual IBD), I redid them until 870k blocks with full IBD:

0hyperfine \
1--runs 1 \
2--export-json /mnt/my_storage/ibd_full-write-chainstate-every-hour-rebased.json \
3--parameter-list COMMIT 1a5d7e0ff5ff8fb73139dc2141dbe231b12f68fe,5ae300e55ca18b5398f1255d53dd1c50ba5cd6ee,b981f7e9c010ebd26e49a66a58f5a3e6e2efb3f5 \
4--prepare 'rm -rf /mnt/my_storage/BitcoinData/* && git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake --build build -j$(nproc)' \
5'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -dbcache=30000 -printtoconsole=0'

Where the commits are:

0# git show --quiet 1a5d7e0ff5ff8fb73139dc2141dbe231b12f68fe                                                                    
1    test: chainstate write test for periodic chainstate flush                                                                                                                             
2# git show --quiet 5ae300e55ca18b5398f1255d53dd1c50ba5cd6ee                                                                     
3    validation: add randomness to periodic write interval                                                                                                                                 
4# git show --quiet b981f7e9c010ebd26e49a66a58f5a3e6e2efb3f5                                                                     
5    Increase DATABASE_WRITE_INTERVAL MIN/MAX to 100/120

measuring:

0Benchmark 1: COMMIT=1a5d7e0ff5ff8fb73139dc2141dbe231b12f68fe ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -dbcache=30000 -printtoconsole=0
1  Time (abs ≡):        37811.719 s               [User: 43400.650 s, System: 1956.739 s]
2  
3Benchmark 2: COMMIT=5ae300e55ca18b5398f1255d53dd1c50ba5cd6ee ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -dbcache=30000 -printtoconsole=0
4  Time (abs ≡):        39494.552 s               [User: 44407.945 s, System: 1865.964 s]
5  
6Benchmark 3: COMMIT=b981f7e9c010ebd26e49a66a58f5a3e6e2efb3f5 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -dbcache=30000 -printtoconsole=0
7  Time (abs ≡):        39363.239 s               [User: 44125.045 s, System: 1974.385 s]

which indicates a 4% slowdown (even with 100-120 minutes)

0Summary
1  COMMIT=1a5d7e0ff5ff8fb73139dc2141dbe231b12f68fe ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -dbcache=30000 -printtoconsole=0 ran
2    1.04 times faster than COMMIT=b981f7e9c010ebd26e49a66a58f5a3e6e2efb3f5 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -dbcache=30000 -printtoconsole=0
3    1.04 times faster than COMMIT=5ae300e55ca18b5398f1255d53dd1c50ba5cd6ee ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -dbcache=30000 -printtoconsole=0

So this change seems to slow down actual IBD by roughly 4% - we have to decide if that’s acceptable.

in src/test/chainstate_write_tests.cpp:28 in a030c105c9 outdated

23+
24+BOOST_FIXTURE_TEST_CASE(chainstate_write_interval, TestingSetup)
25+{
26+    const auto sub{std::make_shared<TestSubscriber>()};
27+    m_node.validation_signals->RegisterSharedValidationInterface(sub);
28+    Chainstate& chainstate{Assert(m_node.chainman)->ActiveChainstate()};

l0rinc commented at 10:52 am on December 9, 2024:

I think we can simplify the test structure here a bit:

struct in TestSubscriber already implies public inheritance
we can remove the explicitly defined default constructor
make the validator test local
remove unused param names, they’re just noise here
avoid repeated chainstate

0BOOST_FIXTURE_TEST_CASE(chainstate_write_interval, TestingSetup)
1{
2    struct TestSubscriber final : CValidationInterface {
3        bool m_did_flush{false};
4        void ChainStateFlushed(ChainstateRole, const CBlockLocator&) override { m_did_flush = true; }
5    };
6    const auto sub{std::make_shared<TestSubscriber>()};
7    m_node.validation_signals->RegisterSharedValidationInterface(sub);
8    auto& chainstate{Assert(m_node.chainman)->ActiveChainstate()};

andrewtoth commented at 7:56 pm on December 12, 2024:

Done.

l0rinc approved

l0rinc commented at 11:00 am on December 9, 2024: contributor

I’m mostly okay with this change, but we need to decide if the 4% IBD slowdown is worth the improved consistency (or if the performance hit can be mitigated somehow).

andrewtoth force-pushed on Dec 12, 2024

andrewtoth commented at 7:58 pm on December 12, 2024: contributor

we need to decide if the 4% IBD slowdown is worth the improved consistency (or if the performance hit can be mitigated somehow).

I think it’s definitely worth it. It’s a very small difference, and IBD performance is only impacted if running with a very high dbcache anyways. Presumably you are in a rush if you are doing that and you don’t want to have to redo everything if it crashes midway through.

The improvements to steady state IO spikiness and shutdown speed are well worth it IMO.

l0rinc commented at 8:24 pm on December 12, 2024: contributor

ACK 5326e68bd645ab9a1b0a6e9095a2695e4e1eeaf7

This will make IBD more consistent and only affect the speed when the cache size is already maxed out. In such cases, we aim to avoid the very long flushing process at the end. This change will even out the process, resulting in much more predictable behavior.

Edit: Since this PR is meant to alleviate hanging at the end for bigger cache sizes, it could make sense to add a warning for the final flush if the used dbcache is big (e.g. >1 GiB i.e. 2024-12-17T10:59:30Z UpdateTip: ... cache=5816.9MiB(42730941txo))

andrewtoth commented at 4:54 pm on December 17, 2024: contributor

Since this PR is meant to alleviate hanging at the end for bigger cache sizes, it could make sense to add a warning for the final flush if the used dbcache is big (e.g. >1 GiB i.e. 2024-12-17T10:59:30Z UpdateTip: … cache=5816.9MiB(42730941txo)) @l0rinc I think something like that could be made in a separate PR without this change. Just add a warning log for any large flush. It’s not really related to this change.

ryanofsky referenced this in commit 9355578a77 on Dec 27, 2024

DrahtBot added the label Needs rebase on Dec 27, 2024

test: chainstate write test for periodic chainstate flush 0ad7d7abdb

in src/validation.cpp:2886 in e31152233d outdated

2884-        }
2885         // The cache is large and we're within 10% and 10 MiB of the limit, but we have time now (not in the middle of a block processing).
2886         bool fCacheLarge = mode == FlushStateMode::PERIODIC && cache_state >= CoinsCacheSizeState::LARGE;
2887         // The cache is over the limit, we have to write now.
2888         bool fCacheCritical = mode == FlushStateMode::IF_NEEDED && cache_state >= CoinsCacheSizeState::CRITICAL;
2889         // It's been a while since we wrote the block index to disk. Do this frequently, so we don't need to redownload after a crash.

mzumsande commented at 7:48 pm on December 27, 2024:

This comment could be adjusted - fPeriodicWrite is no longer just about the block index.

andrewtoth commented at 3:32 pm on December 28, 2024:

Adjusted, thanks:

    // It's been a while since we wrote the block index and chain state to disk. Do this frequently, so we don't need to redownload or reindex after a crash.

andrewtoth force-pushed on Dec 28, 2024

andrewtoth commented at 3:33 pm on December 28, 2024: contributor

Rebased due to conflicts from #31534.

DrahtBot removed the label Needs rebase on Dec 28, 2024

l0rinc commented at 4:35 pm on December 28, 2024: contributor

ACK b37452ca437856ed5d8effce2cef737d9df479c1

The differences since last ACK:

the conflicting new logging - which had to be reindented;
chain state and reindex mentioned in the comment of fPeriodicWrite.

in src/validation.cpp:2894 in b37452ca43 outdated

2889@@ -2886,7 +2890,8 @@ bool Chainstate::FlushStateToDisk(
2890         bool should_write = (mode == FlushStateMode::ALWAYS) || fCacheLarge || fCacheCritical || fPeriodicWrite || fFlushForPrune;
2891 
2892         if (should_write || m_next_write == NodeClock::time_point::max()) {
2893-            m_next_write = nNow + DATABASE_WRITE_INTERVAL;
2894+            constexpr auto range{DATABASE_WRITE_INTERVAL_MAX - DATABASE_WRITE_INTERVAL_MIN + 1min};
2895+            m_next_write = FastRandomContext().rand_uniform_delay(nNow + DATABASE_WRITE_INTERVAL_MIN, range);

sipa commented at 1:24 pm on January 9, 2025:

This will result in a delay between 50 and 70 minutes, but it will always be a multiple of 1 minute, which seems like an unnecessary restriction.

Making the type of range std::chrono::milliseconds (and dropping the + 1min) should fix that, I think.

maflcko commented at 1:42 pm on January 9, 2025:

Are you sure? rand_uniform_delay should cast the range to the duration type of the timepoint, which is the system clocks type.

See also the unit test:

0src/test/random_tests.cpp:        BOOST_CHECK_EQUAL(26443, ctx.rand_uniform_delay(time_point, 9h).time_since_epoch().count());

l0rinc commented at 2:10 pm on January 9, 2025:

I’ve checked

 0diff --git a/src/validation.cpp b/src/validation.cpp
 1--- a/src/validation.cpp	(revision 1ac02221219a2ae6a8605dfddff4390a60712d56)
 2+++ b/src/validation.cpp	(date 1736431426006)
 3@@ -2869,6 +2869,10 @@
 4 
 5         if (should_write || m_next_write == NodeClock::time_point::max()) {
 6             constexpr auto range{DATABASE_WRITE_INTERVAL_MAX - DATABASE_WRITE_INTERVAL_MIN + 1min};
 7+            for (int i = 0; i < 10; ++i) {
 8+                auto random_delay = FastRandomContext().rand_uniform_delay(nNow + DATABASE_WRITE_INTERVAL_MIN, range);
 9+                std::cout << TicksSinceEpoch<std::chrono::seconds>(random_delay) << std::endl;
10+            }
11             m_next_write = FastRandomContext().rand_uniform_delay(nNow + DATABASE_WRITE_INTERVAL_MIN, range);
12         }

and running

0cmake -B build -DCMAKE_BUILD_TYPE=Debug && cmake --build build -j$(nproc) && build/src/test/test_bitcoin --run_test='chainstate_write_tests/chainstate_write_interval'

returns:

 01736435061
 11736434895
 21736434716
 31736434751
 41736435693
 51736435233
 61736435341
 71736435021
 81736434864
 91736435588
101736442098
111736442667
121736442545
131736441973
141736441728
151736441947
161736441794
171736442257
181736441865
191736441620

So I’m also not sure this is indeed rounded to minutes - unless I misunderstood something.

sipa commented at 2:39 pm on January 9, 2025:

@maflcko @l0rinc You’re both right. No problem in that case.

I think the + 1min still needs to go, though.

l0rinc commented at 2:57 pm on January 9, 2025:

Won’t that make it asymmetric to the hour?

andrewtoth commented at 3:08 pm on January 9, 2025:

Hmm yeah the + 1min is just noise. If we want to increase the range by 1 minute, we could just update DATABASE_WRITE_INTERVAL_MAX = 71min.

andrewtoth commented at 3:09 pm on January 9, 2025:

Removed the + 1min.

l0rinc commented at 3:38 pm on January 9, 2025:

You’re right, I suggested the + 1min back when I also thought that the range is in minutes only.

sipa commented at 1:24 pm on January 9, 2025: member

utACK b37452ca437856ed5d8effce2cef737d9df479c1

andrewtoth force-pushed on Jan 9, 2025

in src/validation.cpp:98 in b7f3b6fd96 outdated

 97+/** Time window to wait between writing blocks/block index and chainstate to disk.
 98+ *  Randomize writing time inside the window to prevent a situation where the
 99+ *  network over time settles into a few cohorts of synchronized writers.
100+*/
101+static constexpr auto DATABASE_WRITE_INTERVAL_MIN{50min};
102+static constexpr auto DATABASE_WRITE_INTERVAL_MAX{70min};

l0rinc commented at 3:40 pm on January 9, 2025:

It’s not a max now (i.e. not a closed range), but a threshold, but the difference may not be important anymore

l0rinc approved

l0rinc commented at 3:41 pm on January 9, 2025: contributor

ACK b7f3b6fd96ca2d5d1f9538affef682f778bd07e8

DrahtBot requested review from sipa on Jan 9, 2025

sipa commented at 6:26 pm on January 9, 2025: member

ACK b7f3b6fd96ca2d5d1f9538affef682f778bd07e8

In the log message [bench] FlushStateToDisk: write coins cache to disk (598202 coins, 86527KiB) started, the number of coins and the size is the total cache size now, including non-dirty entries which aren’t actually being written. I guess it should be changed to just account for the dirty ones, as well as in the “Flushing large UTXO set to disk” warning message. That can be done as a follow-up, though.

andrewtoth commented at 6:42 pm on January 9, 2025: contributor

it should be changed to just account for the dirty ones

That would be a follow-up from #28233.

sipa commented at 1:10 pm on January 21, 2025: member

@DrahtBot apparently wants this as a review?

ACK b7f3b6fd96ca2d5d1f9538affef682f778bd07e8

maflcko commented at 1:17 pm on January 21, 2025: member

@DrahtBot apparently wants this as a review?

Any comment should work, but sometimes GitHub is down and the bot is missing a notification. Also, it is not possible to unrequest a review without triggering other GitHub bugs, so it is currently disabled: https://github.com/maflcko/DrahtBot/blob/8bde72c4821013c4a91753805b7b1be4af00513e/webhook_features/src/features/summary_comment.rs#L357

sipa commented at 7:29 pm on January 21, 2025: member

it should be changed to just account for the dirty ones

That would be a follow-up from #28233.

See #31703.

in src/validation.cpp:2923 in 7998eda034 outdated

2920@@ -2928,34 +2921,33 @@ bool Chainstate::FlushStateToDisk(
2921                 m_blockman.UnlinkPrunedFiles(setFilesToPrune);
2922             }
2923             m_last_write = nNow;

achow101 commented at 6:56 pm on April 30, 2025:

In 7998eda03402a8dd8227a506c022f8bef81f38ef “validation: write chainstate to disk every hour”

I think it makes more sense to update the timer after the chainstate write, in the same place that m_last_flush was. Even after the change in semantics in the later commits.

andrewtoth commented at 10:15 pm on April 30, 2025:

Done - moved to after the write.

https://github.com/bitcoin/bitcoin/compare/b7f3b6fd96ca2d5d1f9538affef682f778bd07e8..2aadd877f0db599d67d7ca74b8173b0148c9e098.

l0rinc commented at 10:13 am on May 1, 2025:

The move doesn’t seem to affect the evaluation of the very first fPeriodicWrite call, since AcceptBlock (having FlushStateMode::NONE) is always called before ActivateBestChain (havingFlushStateMode::PERIODIC), so m_next_write will already be set (because of m_next_write == NodeClock::time_point::max()) by the time we get to evaluate whether the next period write is due (i.e NodeClock::now() >= m_next_write).

Note that before this code move, the writes happened “roughly an hour after start or the last write was scheduled”, but after this change “roughly an hour after start or the last write finished” - which isn’t negligible on most machines, so this introduces a slight wall-clock drift (i.e. first flush after ~60 minutes, second one at ~60 + ~10 + ~60, third after ~60 + ~10 + ~60 + ~10 + ~60, etc).

I don’t mind, I’m okay with both approaches, just wanted to note that this introduces an extra delay now, compared to what the PR was doing originally (i.e. “write … every hour”).

andrewtoth commented at 11:34 am on May 1, 2025:

before this code move, the writes happened “roughly an hour after start or the last write was scheduled”, but after this change “roughly an hour after start or the last write finished”

I believe that was the intention of this change (to me at least, not sure if @achow101 had something else in mind).

Very long flushes on slow disks could take over an hour (hopefully not anymore when this is merged), so that would schedule the next flush immediately. Lengthy flushes would schedule next flushes sooner than we’d like.

l0rinc commented at 11:44 am on May 1, 2025:

Yes, agree, just noted that this introduces a drift - which, as you also pointed out, makes sense.

andrewtoth force-pushed on Apr 30, 2025

validation: write chainstate to disk every hour

Remove the 24 hour periodic flush interval and
write the chainstate along with blocks and block
index every hour

d73bd9fbe4

refactor: rename fDoFullFlush to should_write b557fa7a17

refactor: replace m_last_write with m_next_write

Co-Authored-By: l0rinc <pap.lorinc@gmail.com>

2e2f410681

validation: add randomness to periodic write interval

Co-Authored-By: Pieter Wuille <pieter@wuille.net>
Co-Authored-By: l0rinc <pap.lorinc@gmail.com>

e976bd3045

in src/validation.cpp:2953 in c779743980 outdated

2976-                   (bool)fFlushForPrune);
2977+        }
2978+
2979+        if (should_write || m_next_write == NodeClock::time_point::max()) {
2980+            constexpr auto range{DATABASE_WRITE_INTERVAL_MAX - DATABASE_WRITE_INTERVAL_MIN};
2981+            m_next_write = FastRandomContext().rand_uniform_delay(nNow + DATABASE_WRITE_INTERVAL_MIN, range);

davidgumberg commented at 10:31 pm on April 30, 2025:

nNow is still getting set above before the write

andrewtoth commented at 10:35 pm on April 30, 2025:

Indeed - fixed!

andrewtoth force-pushed on Apr 30, 2025

achow101 commented at 10:44 pm on April 30, 2025: member

ACK e976bd3045010ee217aa0f2dca4c962aabb789d5

DrahtBot requested review from l0rinc on Apr 30, 2025

DrahtBot requested review from sipa on Apr 30, 2025

davidgumberg commented at 1:12 am on May 1, 2025: contributor

utACK https://github.com/bitcoin/bitcoin/commit/e976bd3045010ee217aa0f2dca4c962aabb789d5

This will likely slow down IBD by a small amount for those running large dbcache values close to or greater than the old maximum (~16GB) because we lose some write “cut-through”¹ but I seriously doubt there are that many people allocating >16GB to their DBCache, and I don’t think sparing those users ~10 minutes during IBD is worth having every node on the network have a medium sized stall to flush every 24 hours, and the risk of losing an IBD after an unclean shutdown.

I think it would be more productive and interesting to look for ways to make dbcache flushes faster, and/or non-blocking, rather than trying to optimize around the flushing behavior that exists today.

If we only flush once at the last possible moment, the disk representation of the chainstate (aka CCoinsViewDB or coinsdb) will be spared of ever having to hear about outputs that were created and spent below the chain tip. ↩︎

in src/validation.cpp:2954 in e976bd3045

2977+        }
2978+
2979+        if (should_write || m_next_write == NodeClock::time_point::max()) {
2980+            constexpr auto range{DATABASE_WRITE_INTERVAL_MAX - DATABASE_WRITE_INTERVAL_MIN};
2981+            m_next_write = FastRandomContext().rand_uniform_delay(NodeClock::now() + DATABASE_WRITE_INTERVAL_MIN, range);
2982         }

l0rinc commented at 9:40 am on May 1, 2025:

nit: to signal the real boundaries of the random range (i.e. current time + random wait in a strictly positive interval), we could extract the start time outside, e.g.:

0m_next_write = NodeClock::now() + FastRandomContext().rand_uniform_delay(DATABASE_WRITE_INTERVAL_MIN, range);

or if you’re feeling adventurous:

0m_next_write = NodeClock::now() +
1               DATABASE_WRITE_INTERVAL_MIN +
2               FastRandomContext().randrange<std::chrono::minutes>(DATABASE_WRITE_INTERVAL_MAX - DATABASE_WRITE_INTERVAL_MIN);

(in this case we could likely store DATABASE_WRITE_INTERVAL directly instead)

in src/validation.cpp:2951 in e976bd3045

2974-                   (uint64_t)coins_count,
2975-                   (uint64_t)coins_mem_usage,
2976-                   (bool)fFlushForPrune);
2977+        }
2978+
2979+        if (should_write || m_next_write == NodeClock::time_point::max()) {

l0rinc commented at 9:52 am on May 1, 2025:

nit: if you think the second part needs explanation (i.e. we’re scheduling the first periodic write regardless of whether that call wrote or not, but if we write after that, we’ll bump the next scheduled write to avoid flushing too often), we could add a comment or extract to a meaningful constant:

0        const bool next_write_set{m_next_write < NodeClock::time_point::max()};
1        if (!next_write_set || should_write) {

or

0        const bool first_flush{m_next_write == NodeClock::time_point::max()};
1        if (first_flush || should_write) {

in src/validation.cpp:2892 in e976bd3045

2903+        // It's been a while since we wrote the block index and chain state to disk. Do this frequently, so we don't need to redownload or reindex after a crash.
2904+        bool fPeriodicWrite = mode == FlushStateMode::PERIODIC && nNow >= m_next_write;
2905+        // Combine all conditions that result in a write to disk.
2906+        bool should_write = (mode == FlushStateMode::ALWAYS) || fCacheLarge || fCacheCritical || fPeriodicWrite || fFlushForPrune;
2907+        // Write blocks, block index and best chain related state to disk.
2908+        if (should_write) {

l0rinc commented at 11:14 am on May 1, 2025:

The scenarios have become quite complex, would be cool to have a debug log here for why the write was triggered in the first place.

l0rinc approved

l0rinc commented at 11:19 am on May 1, 2025: contributor

ACK e976bd3045010ee217aa0f2dca4c962aabb789d5

I’m glad to see this moving along, this will help other optimizations as well by eliminating the worst-case scenario of crashing at the end of IBD with an OOM.

I’m fine with merging as it is, I only have nits and minor suggestions - but will glady reack if they’re applied. I’ve retested it locally by running all tests and doing a few thousand blocks of IBD in debug mode (without any changes and with modified 1min..2min interval to trigger both normal and periodic writes), and am running a full reindex-chainstate with 45GiB memory to compare the performance characteristics (likely not a blocker).

sipa commented at 2:09 pm on May 1, 2025: member

utACK e976bd3045010ee217aa0f2dca4c962aabb789d5

achow101 merged this on May 1, 2025

achow101 closed this on May 1, 2025

andrewtoth deleted the branch on May 1, 2025

l0rinc commented at 3:50 pm on May 2, 2025: contributor

Post-merge good news - the periodic flushing doesn’t seem to introduce any slowdown for reindex-chainstate:

 0COMMITS="68ac9f116c0228a277f18f60ba2278b56356e6ac 58e7798f7e85ecda97e32eec29c190e5ac958dc0"; \
 1STOP_HEIGHT=888888; DBCACHE=45000; \
 2CC=gcc; CXX=g++; \
 3BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
 4(echo ""; for c in $COMMITS; do git fetch origin $c -q && git log -1 --pretty=format:'%h %s' $c || exit 1; done; echo "") && \
 5hyperfine \
 6  --sort 'command' \
 7  --runs 1 \
 8  --export-json "$BASE_DIR/rdx-${COMMITS// /-}-$STOP_HEIGHT-$DBCACHE-$CC.json" \
 9  --parameter-list COMMIT ${COMMITS// /,} \
10  --prepare "killall bitcoind; rm -f $DATA_DIR/debug.log; git checkout {COMMIT}; git clean -fxd; git reset --hard; \
11    cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF && cmake --build build -j$(nproc) --target bitcoind && \
12    ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP_HEIGHT -dbcache=5000 -printtoconsole=0; sleep 100" \
13  --cleanup "cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
14  "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -stopatheight=$STOP_HEIGHT -dbcache=$DBCACHE"

68ac9f116c Merge bitcoin/bitcoin#32383: util: Remove fsbridge::get_filesystem_error_message() 58e7798f7e validation: add randomness to periodic write interval

0Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -stopatheight=888888 -dbcache=45000 (COMMIT = 68ac9f116c0228a277f18f60ba2278b56356e6ac)
1  Time (abs ≡):        15689.654 s               [User: 19488.784 s, System: 640.007 s]
2
3Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -stopatheight=888888 -dbcache=45000 (COMMIT = 58e7798f7e85ecda97e32eec29c190e5ac958dc0)
4  Time (abs ≡):        15710.405 s               [User: 19523.635 s, System: 651.407 s]

0Relative speed comparison
1        1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -stopatheight=888888 -dbcache=45000 (COMMIT = 68ac9f116c0228a277f18f60ba2278b56356e6ac)
2        1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -stopatheight=888888 -dbcache=45000 (COMMIT = 58e7798f7e85ecda97e32eec29c190e5ac958dc0)

The two lines are in fact so parallel, that I was afraid we’re not actually doing anything: time_vs_height

The good news is that IBD was about 5% slower after this change - demonstrating that it’s actually working :D

 0COMMITS="fc6346dbc8dc3db40aad4079210332b5f8b332ed 5b8046a6e893b7fad5a93631e6d1e70db31878af"; \
 1STOP_HEIGHT=888888; DBCACHE=45000; \
 2CC=gcc; CXX=g++; \
 3BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
 4(echo ""; for c in $COMMITS; do git fetch origin $c -q && git log -1 --pretty=format:'%h %s' $c || exit 1; done; echo "") && \
 5hyperfine \
 6  --sort 'command' \
 7  --runs 1 \
 8  --export-json "$BASE_DIR/ibd-${COMMITS// /-}-$STOP_HEIGHT-$DBCACHE-$CC.json" \
 9  --parameter-list COMMIT ${COMMITS// /,} \
10  --prepare "killall bitcoind; rm -rf $DATA_DIR/*; git checkout {COMMIT}; git clean -fxd; git reset --hard; \
11    cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF && cmake --build build -j$(nproc) --target bitcoind && \
12    ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=1 -printtoconsole=0; sleep 100" \
13  --cleanup "cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
14  "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -blocksonly -printtoconsole=0 -stopatheight=$STOP_HEIGHT -dbcache=$DBCACHE"

fc6346dbc8 Merge bitcoin/bitcoin#32389: doc: Fix test_bitcoin path 5b8046a6e8 Merge bitcoin/bitcoin#30611: validation: write chainstate to disk every hour

0Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -blocksonly -printtoconsole=0 -stopatheight=888888 -dbcache=45000 (COMMIT = fc6346dbc8dc3db40aad4079210332b5f8b332ed)
1  Time (abs ≡):        26781.780 s               [User: 32015.650 s, System: 1516.174 s]
2 
3Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -blocksonly -printtoconsole=0 -stopatheight=888888 -dbcache=45000 (COMMIT = 5b8046a6e893b7fad5a93631e6d1e70db31878af)
4  Time (abs ≡):        28128.527 s               [User: 34259.094 s, System: 2326.679 s]

0Relative speed comparison
1        1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -blocksonly -printtoconsole=0 -stopatheight=888888 -dbcache=45000 (COMMIT = fc6346dbc8dc3db40aad4079210332b5f8b332ed)
2        1.05          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -blocksonly -printtoconsole=0 -stopatheight=888888 -dbcache=45000 (COMMIT = 5b8046a6e893b7fad5a93631e6d1e70db31878af)

You can see the horizontal smudges (and the added vertical lines) for when we’re flushing the dirty entries in the red line (missing from the reference blue line):

time_vs_height

It also demonstrates nicely the time difference of flush duration vs periodic flush interval, and the random nature of the +- 10 minutes.

We can also see the final flush is taking less time now that intermediary dirties are gone (~6 minutes):

02025-05-02T12:01:38Z UpdateTip: new best=00000000000000000001347938c263a968987bf444eb9596ab0597f721e4e9e8 height=888888 version=0x24bc4000 log2_work=95.509247 tx=1169009164 date='2025-03-22T09:15:51Z' progress=0.986153 cache=32366.8MiB(219037232txo)
1...
22025-05-02T12:01:38Z [warning] Flushing large (31 GiB) UTXO set to disk, it may take several minutes
32025-05-02T12:07:25Z Shutdown: done

vs master (13 minutes):

02025-05-02T03:58:55Z UpdateTip: new best=00000000000000000001347938c263a968987bf444eb9596ab0597f721e4e9e8 height=888888 version=0x24bc4000 log2_work=95.509247 tx=1169009164 date='2025-03-22T09:15:51Z' progress=0.986264 cache=27373.6MiB(175983713txo)
1...
22025-05-02T03:58:55Z [warning] Flushing large (26 GiB) UTXO set to disk, it may take several minutes
32025-05-02T04:11:52Z Shutdown: done

I think we still have to investigate:

Why is there no sign of periodic flushing during reindex?
Why did we get 2 flushing warnings during reindex after this change?
Why did the reported utxo size increase during IBD after the change?

Edit:

Looks like we may be double-counting something now: cache_vs_time

I’ve also opened a follow-up with a few remaining nits and some logging to be able to provide better insights: https://github.com/bitcoin/bitcoin/pull/32404

andrewtoth commented at 8:54 pm on May 2, 2025: contributor

Why is there no sign of periodic flushing during reindex?

Indeed, it seems we do not get to this line during reindex-chainstate. Will have to investigate. Periodically flushing during reindex-chainstate would be useful.

Why did the reported utxo size increase during IBD after the change?

There is less opportunity for utxo cut-through after this change. So, some utxos that would previously have been deleted are now kept as dirty entries in the cache until the next periodic flush.

mzumsande commented at 9:06 pm on May 2, 2025: contributor

Indeed, it seems we do not get to this line during reindex-chainstate.

It makes sense: reindex-chainstate is nothing more than one call to ActivateBestChain() (which will connect all connectable blocks). Since we only flush at the end of the function, we cannot flush more than once currently.

I think that it should probably be possible to add some kind of periodic flushing to the outer one of the two nested loops of ABC (after we release cs_main) to enable this.

Sjors commented at 11:11 am on May 5, 2025: member

To clarify, is this PR supposed to write to disk during IBD? My own (post merge, master @ 5b8046a6e893b7fad5a93631e6d1e70db31878af) testing with a very large -dbcache doesn’t show that, since the ~/.bitcoin/chainstate directory remains 16 kb for at least a few hours.

l0rinc commented at 11:50 am on May 5, 2025: contributor

Can you please show the exact args you were using? Are you using -addnode=local-network? Is it possible that behaves like a reindex-chainstate regarding periodic flushes (see #30611 (comment))?

This is currently only triggered with actual IBD with real peers.

Enabling regular flushes for reindex(-chainstate) (and maybe even for a fixed nodes, not sure) is done separately in this PR: #32414 Can you check if that fixes it for you?

Sjors commented at 1:21 pm on May 5, 2025: member

@l0rinc ah, I did have an addnode with a local peer in my config file for this otherwise fresh node. Let me try again without that. But more importantly I did something wrong with compilation too, so it’s possible I ran a completely different version. Trying again.

But even after several days (in standby, so IBD wasn’t finished), it didn’t flush, so that’s probably due to the local peer.

andrewtoth commented at 2:21 pm on May 5, 2025: contributor

@Sjors I just tested this by modifying the lines here to be 1min and 2min, deleted my blocks, chainstate, and indexes directories and ran ./build/bin/bitcoind -maxconnections=0 -addnode=<local_node> -debug=coindb. I can confirm it flushes between 1 and 2 minutes after startup and continuously after, due to the change in chainstate directory size and the coindb logs.

andrewtoth commented at 3:26 pm on May 5, 2025: contributor

Just tested with -dbcache=45000 -connect=<local_node> and after about an hour I have [warning] Flushing large (7 GiB) UTXO set to disk, it may take several minutes.

Sjors commented at 3:34 pm on May 5, 2025: member

Indeed I also saw that flush message after 1 hour, the chaintate dir grew, while cache=... in the logs didn’t drop. Will keep an eye on it.

validation: write chainstate to disk every hour #30611

Code Coverage & Benchmarks

Reviews

Conflicts

git checkout 5e90501abe13731579c61b13092b31b9fc4f8c4d

git checkout ffc9df87a0da35b451720556bdb3f0f65f315c8f

git checkout f5e7b1508a06c0c7a9c06d1ac270840fd72e9e6f