More recent checkpoints? #7591

issue achow101 opened this issue on February 24, 2016
  1. achow101 commented at 5:18 pm on February 24, 2016: member
    How come the latest checkpoint is only block 295000? Why aren’t more recent block checkpoints included?
  2. gmaxwell commented at 5:25 pm on February 24, 2016: contributor
    Because the functionality is largely deprecated; it causes a lot of confusion about the security model and most of what it usefully accomplished is now (or can be) accomplished in better ways.
  3. achow101 commented at 5:29 pm on February 24, 2016: member

    Oh, I thought that checkpoints made syncing faster.

    If they are deprecated, why do we still keep them?

  4. gmaxwell commented at 5:34 pm on February 24, 2016: contributor

    At least on reasonably fast hardware they don’t change it that much. I benchmarked with and without them prior to the 0.12 release and it was perhaps a 10% difference (e.g. 3.5 hrs vs 4). The fact that libsecp256k1 made signature verification roughly 7x faster made a big difference. They were initially added to the software as the result of another bug, not understood at the time, that harmed verification performance and has long since been fixed.

    Removing them entirely requires replacing the role they serve in preventing low difficulty header flooding attacks, which would need some design and development.
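
    The role described here can be sketched roughly as follows. This is a minimal illustration, not Bitcoin Core's actual code; the function and table names are invented, and the two sample entries reuse checkpoint hashes quoted elsewhere in this thread. A header chain that contradicts any pinned hash is rejected outright, so a peer cannot cheaply flood a node with long low-difficulty chains:

    ```cpp
    #include <cassert>
    #include <map>
    #include <string>

    // Hypothetical checkpoint table: block height -> the only acceptable
    // block hash at that height (entries taken from this thread).
    static const std::map<int, std::string> kCheckpoints = {
        {11111,  "0000000069e244f73d78e8fd29ba2fd2ed618bd6fa2ee92559f542fdb26e7c1d"},
        {295000, "00000000000000004d9b4ef50f0f9d686fd69db2e03af35a100370c64632a983"},
    };

    // Returns false if the header at `height` contradicts a checkpoint;
    // headers at non-checkpointed heights are unconstrained.
    bool CheckAgainstCheckpoints(int height, const std::string& hash)
    {
        auto it = kCheckpoints.find(height);
        if (it == kCheckpoints.end()) return true; // no checkpoint here
        return it->second == hash;                 // must match exactly
    }
    ```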

  5. laanwj added the label Validation on Feb 24, 2016
  6. mruddy commented at 9:51 pm on February 27, 2016: contributor

    I got interested in this particular performance question when I went to prune an x86_64 full node’s datadir that had a full txindex, so that I could put it on some of the lower-power ARM SBCs (single-board computers) that I have. Since pruning is not compatible with -txindex, that gets switched off, and then the whole chain has to be re-indexed before it can be pruned.

    When 0.12 came out a few days ago, I wondered if updating the checkpoints to 395,000 would make much of a difference. I ran tests with various system configurations. The two competing versions of the software I used were the official 0.12 release and master plus the patch to use block 395,000 as the latest checkpoint. The tests were simply to re-index the chain. I used an x86_64 quad core that runs at roughly 2.8 GHz when frequency scaling is in performance mode and a little less (it fluctuates during the test) in powersave. I tried an unencrypted 5400 rpm SATA drive and a Samsung 840 Pro SSD with LUKS full-disk encryption; 8 GB RAM with dbcache set to 6000. Between any two competing runs on the same system config, the checkpoint change did NOT make any real difference. And, like Greg said, across the tests, on the order of 4 hours ± half an hour is basically what I got consistently.

    I also started to run tests on some ARMv7 hard-float machines (ODROID XU4, C1+, and Raspberry Pi 2 – didn’t test a Jetson TK1 yet). But, man, those are so much slower (for a number of reasons, including storage speed, no assembly optimizations, etc.). It’s like 9-15 minutes on my x86_64 configs vs around 1.5-2.5 hours on the various SBC configs for the first 200,000 blocks or so. I didn’t even let those tests complete, since I just didn’t have the time.

    With the ARM machines in mind, I was thinking of putting a PR forward for block 395,000. If there’s any desire for that, I can do it. But I didn’t yet, because I figured that re-indexing on the ARM SBCs is just using the wrong tool for the job. If you can run an x86_64 full node, prune the datadir, and then copy it to the ARM machine, do that. It’s more a question of whether we should try to support the lower-power machines for people who don’t have the higher-power luxury.

  7. gmaxwell commented at 10:45 pm on February 27, 2016: contributor

    @mruddy Thanks for the confirmation.

    There is still a lot of speedup available for signature validation on ARM – we’re not using the assembly that wumpus wrote (mostly as a result of the development team not having enough expertise to review it, and because ARM is so slow that we gain testing confidence slowly). That alone is a 2x validation speed improvement.

    But as you say, validation performance on a low-end SBC is just not great. It can be papered over by bypassing it, but I think there are better ways to do that (e.g. don’t validate in IBD for blocks that are buried by a month of POW) which don’t keep sending researchers down blind alleys like static checkpoints do. (The biggest effect of checkpoints these days is their invocation in papers describing centralized consensus systems as justification for a (false) equivalence of their approach and Bitcoin.)
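
    The “buried by a month of POW” idea can be sketched roughly like this. This is a hedged illustration with invented names, using a timestamp comparison as a stand-in for burial depth; it is not what Bitcoin Core actually implements:

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Illustrative rule: during IBD, expensive script/signature checks
    // could be skipped for a block that sits more than about a month
    // behind the best known header, approximated here by timestamps.
    constexpr int64_t SECONDS_PER_MONTH = 30LL * 24 * 60 * 60;

    bool CanSkipScriptChecks(int64_t block_time, int64_t best_header_time)
    {
        // Deeply buried blocks are protected by the PoW piled on top of
        // them; recent blocks still get full validation.
        return best_header_time - block_time > SECONDS_PER_MONTH;
    }
    ```

    Unlike a static checkpoint, a rule like this follows the chain tip automatically and pins no particular block hash.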

  8. mruddy commented at 12:14 am on February 28, 2016: contributor
    @gmaxwell Regarding the ARM sig val speedup (https://github.com/bitcoin/secp256k1/pull/366 and https://github.com/bitcoin/secp256k1/pull/173), I don’t have the assembly experience to review it, but I did just get a Jetson TK1 SBC (http://elinux.org/Jetson_TK1) that should be pretty fast and able to use faster SSD storage (assuming that’s the case, I’d still need to try it). If I benchmarked 0.12 on that and then added the ARM assembly optimizations and ran the same re-indexing benchmark test, would that help you guys move forward with those changes? It would provide a comparison of the speedup, but probably more importantly, it would also show that the speedup does not introduce some chain splitting regression (at least as far as validating the ~400k blocks that have been created so far). I guess the question is, has the slowness of ARM SBC validation hindered your regression testing efforts in a way that my faster board could possibly help?
  9. mruddy commented at 12:25 pm on February 28, 2016: contributor

    FWIW, I set up the Jetson TK1 last night with a Samsung SSD 840 EVO 250GB, set the quad-core Cortex-A15 cluster to full speed at 2.32 GHz, compiled tag v0.12.0, and let it chug away re-indexing a datadir that I put on the SSD (with -dbcache=1024).

    At 07:19:31 into the test, while attempting to process block 00000000000000001562e39680215040651c7dfd111cfbb9a7d6550f336297cf height=336596, it errored with “LoadExternalBlockFile: Deserialize or I/O error - std::bad_alloc”. I guess my dbcache was too aggressive. The first 200,000 blocks took 00:45:49 to process.

    I figure since I got past block 295,000 I can still have some data to compare against the checkpoint at 395,000 patch, so I just patched that and started another test. We’ll see if it gets any farther and/or any faster. That’ll at least be something until I figure out what dbcache setting works better.

    Here’s a chart of the HH:MM:SS it took to process the previous 5000 blocks by block height:

    [chart image]

  10. rebroad commented at 8:59 pm on February 28, 2016: contributor
    @gmaxwell The month-deep-blocks patch sounds like an excellent idea indeed. I was about to run a full node on a Raspberry Pi and was thinking of patching it to allow me to set checkpoints from the command line / config file, using a block hash from another full node I run – or perhaps, extending this further, allowing it to be configured with a node it trusts and can use as a checkpoint source. I’m wondering why this hasn’t already been done by anyone else so far.
  11. mruddy commented at 10:03 pm on February 28, 2016: contributor

    I just got data from my second Jetson TK1 run. Same config as before, except that I patched in block 395,000 as the latest checkpoint. This time I got the LoadExternalBlockFile: Deserialize or I/O error - std::bad_alloc at 00000000000000000ee1ba35093a736f150a113d0231d50c29b37c3f05d1ae62 height=346233. So it got 9,637 blocks farther, and the run time was 19 seconds shorter. That points to the checkpoint update making some difference, but it’s not all that meaningful, since at this rate re-indexing on this class of machine still seems futile.

    @rebroad I have a Raspberry Pi 2, and that is a much less powerful computer than the Jetson TK1 ARM SBC config that I’m telling you all about. I expect you’d have the same kind of problems (only slower) if you tried to re-index the whole chain on the Raspberry Pi.

    BTW, this was the patch:

     diff --git a/src/chainparams.cpp b/src/chainparams.cpp
     index 9cf9949..63058ae 100644
     --- a/src/chainparams.cpp
     +++ b/src/chainparams.cpp
     @@ -135,11 +135,12 @@ public:
                 (225430, uint256S("0x00000000000001c108384350f74090433e7fcf79a606b8e797f065b130575932"))
                 (250000, uint256S("0x000000000000003887df1f29024b06fc2200b55f8af8f35453d7be294df2d214"))
                 (279000, uint256S("0x0000000000000001ae8c72a0b0c301f67e3afca10e819efa9041e458e9bd7e40"))
     -            (295000, uint256S("0x00000000000000004d9b4ef50f0f9d686fd69db2e03af35a100370c64632a983")),
     -            1397080064, // * UNIX timestamp of last checkpoint block
     -            36544669,   // * total number of transactions between genesis and last checkpoint
     +            (295000, uint256S("0x00000000000000004d9b4ef50f0f9d686fd69db2e03af35a100370c64632a983"))
     +            (395000, uint256S("0x0000000000000000014b0f613fa97d3f512666a8aa4fd06132ff4fd330a0a664")),
     +            1453737986, // * UNIX timestamp of last checkpoint block
     +            105734718,  // * total number of transactions between genesis and last checkpoint
                              //   (the tx=... number in the SetBestChain debug.log lines)
     -            60000.0     // * estimated number of transactions per day after checkpoint
     +            120000.0    // * estimated number of transactions per day after checkpoint
              };
          }
      };
    

    Graph from the second run:

    [chart image]

  12. mruddy commented at 1:22 am on October 17, 2016: contributor

    @gmaxwell I was studying the current checkpoint implementation in master today and was wondering if they are still validly serving their role with respect to what you mentioned about them “preventing low difficulty header flooding attacks”.

    With headers-first IBD, it seems like the checkpoints are avoidable by a dishonest sync-node peer. For example, a new node connects to a dishonest sync node that sends only headers it has computed for a chain that never includes the earliest mainnet checkpoint, block height=11,111 hash=0000000069e244f73d78e8fd29ba2fd2ed618bd6fa2ee92559f542fdb26e7c1d (and therefore none of the other configured checkpoints either). GetLastCheckpoint will never find a checkpoint in mapBlockIndex (because none ever get added – mapBlockIndex is attacker-controlled), and so CheckIndexAgainstCheckpoint never fails.

    Is this the kind of DoS that the checkpoints were meant to protect against?

    EDIT: Having typed this up a little while ago, it occurs to me that what you probably meant was general low-difficulty blocks/branches being flooded out to all nodes. The checkpoints do still protect against that. That’s different from just targeting new IBD nodes that don’t have a header chain yet; I was too focused on the one idea.
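
    The avoidance path described above can be sketched as follows. This is a simplified illustration with invented types, not Bitcoin Core’s real data structures: the lookup scans the node’s own block index for the highest checkpointed hash it has seen, and if a dishonest sync peer only ever supplies headers for a fabricated chain, no checkpointed hash ever enters the index, so there is nothing to enforce:

    ```cpp
    #include <cassert>
    #include <map>
    #include <string>

    // Stand-in for mapBlockIndex: hash -> height, populated only from
    // headers the peer chose to send (i.e. attacker-controlled).
    struct FakeBlockIndex {
        std::map<std::string, int> by_hash;
    };

    // Returns the height of the highest checkpoint present in the index,
    // or -1 when none of the checkpointed blocks has ever been seen.
    int GetLastCheckpoint(const FakeBlockIndex& index,
                          const std::map<int, std::string>& checkpoints)
    {
        for (auto it = checkpoints.rbegin(); it != checkpoints.rend(); ++it) {
            if (index.by_hash.count(it->second)) return it->first;
        }
        return -1; // nothing found: the checkpoint comparison never triggers
    }
    ```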

  13. laanwj commented at 9:02 am on October 18, 2016: member

    Because the functionality is largely deprecated; it causes a lot of confusion about the security model

    Yes, the checkpoints need to go. This is a source of enduring confused complaints as to the security model.

    May make sense to replace some of the functionality (from important to unimportant respectively):

    • Another way to avoid initial sync DoS by low-difficulty flooding
    • Skip signature checking for known chain
    • A block-to-progress map (for progress display in the UI)

    None of these requires checkpoints as such.
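
    The first replacement on the list can be sketched like this. This is a hedged illustration with invented names, not the mechanism Bitcoin Core ultimately adopted: instead of pinning specific block hashes, hard-code a single minimum cumulative-work threshold and refuse to commit resources to any header chain below it; low-difficulty flood chains never reach the threshold and are cheap to discard:

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Simplified header chain summary: total work is modeled as a plain
    // integer sum of per-block work (the real value is a 256-bit number).
    struct HeaderChain {
        uint64_t total_work;
    };

    // A chain is only worth downloading and validating once its headers
    // demonstrate at least the hard-coded minimum amount of work.
    bool IsChainWorthValidating(const HeaderChain& chain, uint64_t min_chain_work)
    {
        return chain.total_work >= min_chain_work;
    }
    ```

    Unlike a checkpoint, such a threshold constrains no particular block, so it cannot be cited as fixing the chain.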

  14. mruddy commented at 11:42 am on October 18, 2016: contributor

    What I was saying before was that the current implementation does not serve the most important of those functions – being a “way to avoid initial sync DoS by low-difficulty flooding”. They protect against another type of DoS flooding, but one that would be done against all nodes, not just initially syncing ones. This is what I was trying to clarify with Greg in my last comment, so let me know if you see it differently.

    For the second function, did you know that the current checkpoints implementation does not skip all the signature checking for pre-checkpoint blocks? During headers-first IBD, there is a race between how fast the header chain is built relative to the next available checkpoint and how fast the block responses come in from async block requests. Once a checkpointed header is synced, all blocks before it skip signature validation, but until that happens, sig checking is on.
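
    The race described above can be sketched as follows (a simplified illustration with invented names, not the exact Bitcoin Core logic): script checks are skipped for a block only if a checkpointed header at or above its height has already been synced, so early in headers-first IBD, before the header chain passes a checkpoint, those same blocks still get full signature validation:

    ```cpp
    #include <cassert>

    // last_synced_checkpoint_height is -1 until the header chain has
    // advanced past at least one checkpointed header; it then holds the
    // height of the highest checkpoint seen so far.
    bool ShouldCheckScripts(int block_height, int last_synced_checkpoint_height)
    {
        // Blocks at or below a synced checkpoint skip script checks;
        // everything else (including all blocks before any checkpointed
        // header arrives) is fully validated.
        return block_height > last_synced_checkpoint_height;
    }
    ```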

    As a way to judge progress, the current checkpoints are so out of date that I don’t think they serve the purpose well either.

    Checkpoints, as I assumed they worked before I looked at the actual implementation, could be used to remove consensus code that is only needed to validate blocks buried deep in the chain. #8391 is the reason I dug into how the checkpoints actually work. Implemented differently, they could thus be used as a way to clean up the codebase. PoW purists would probably not like that, but then they probably would not like #8391 either. What do you think?

  15. achow101 closed this on Oct 18, 2016

  16. gmaxwell commented at 6:10 am on March 10, 2017: contributor

    What I was saying before was that the current implementation does not serve the most important of those functions of being a “way to avoid initial sync DoS by low-difficulty flooding”.

    I am terribly sorry for not responding to you until now. I just saw your reply.

    They protect any node once it has the initial headers, which it gets very fast at start. So the exposure there is limited and install-time only. (E.g., if someone can jam up your setup of a new node, that is hardly a big deal compared to knocking out running nodes.)

    For the second function, did you know that the current checkpoints impl does not skip all the signature checking for pre-checkpoint blocks? For example, during headers-first IBD, it is a race between how fast the header chain is built relative to the next checkpoint available and how fast the block responses

    Yes, we knew… but all the headers were typically received before block download reaches block 50,000, and there are virtually no signatures before there. Checkpoints are no longer used for this, but the mechanism used now has the same behavior.

    could be used to remove consensus code that is only used to validate blocks buried deep in the chain.

    It would be fine to recast the rule as applying everywhere with exceptions or whatnot (which is what we’ve done for some things in the past). It is not acceptable to fix a particular chain, and a lot of work has gone into getting checkpoints out.

  17. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-04-02 00:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me