RFC: blocks: add `-reobfuscate-blocks` arg to xor existing blk/rev on startup

l0rinc commented at 11:51 pm on September 5, 2025: contributor

Draft for now - comments are welcome

Should we repurpose the existing -blocksxor arg instead?
Would multithreading help enough to justify the extra complexity, given the current IO-bound profile?
Should we split ObfuscateBlocks out of init? I have split it into many local lambdas, but we may want to find a better home for those methods…
Which additional scenarios should we cover in testing (resume mid-run, explicit 16-hex key path, large datadirs, pruned)?
Can we safely assume the extra memory and disk space is available?
Release notes, translations?
Given user concern about what they store on disk, should we consider including this in v30?

Context

Recent discussions highlighted that many nodes which synced before Bitcoin Core v28 have their block and undo files stored effectively in the clear (zero XOR key). This patch adds a simple, resumable maintenance tool to obfuscate previously raw block files, rotate an existing key to a fresh random one, or de-obfuscate (set key to zero) if consciously chosen, all without requiring resync. The operation can be cancelled and restarted safely.

Implementation

The new startup option -reobfuscate-blocks[=VALUE] accepts either 16 hex characters as an exact 8-byte XOR key (little-endian in-memory layout) or a boolean to generate a random 64-bit key. For example, -reobfuscate-blocks=0000000000000000 sets the key to zero, effectively removing obfuscation.
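As an illustration of the two accepted forms, here is a minimal Python sketch of how the option value could be mapped to a key. The helper name `parse_reobfuscate_arg` is hypothetical, the handling of `""`, `"1"` and `"0"` is an assumption about the boolean form, and the sketch assumes the 16 hex characters are taken as the key bytes in the order written; the PR's C++ code may differ on all of these.

```python
import os
from typing import Optional


def parse_reobfuscate_arg(value: str) -> Optional[bytes]:
    """Hypothetical helper: map the option value to an 8-byte XOR key.

    Returns None when the value is boolean false (nothing to do).
    """
    if len(value) == 16:
        # Explicit key: 16 hex characters, interpreted as 8 bytes in order.
        key = bytes.fromhex(value)  # raises ValueError on non-hex input
        if len(key) != 8:
            raise ValueError("expected exactly 8 key bytes")
        return key
    if value in ("", "1"):
        # Bare flag or "1" (assumed boolean true): fresh random 64-bit key.
        return os.urandom(8)
    if value == "0":
        # Assumed boolean false: no re-obfuscation requested.
        return None
    raise ValueError("expected a boolean or 16 hex characters")
```

With this sketch, `parse_reobfuscate_arg("0000000000000000")` yields eight zero bytes, which corresponds to the "remove obfuscation" case described above.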

If we detect unobfuscated blocks at start time we suggest this new option in a warning.

At startup, we iterate over all (blk|rev)*.dat files, read them with the old XOR key and write them back with the new key (<name>.reobfuscated). After successful write, we immediately delete the old file. Once all files are staged, we rename them back and atomically swap xor.dat.reobfuscated → xor.dat and continue operation.
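The staging flow can be sketched in Python as follows. This is only an illustration of the happy path under stated assumptions: `reobfuscate_dir` and `xor_bytes` are hypothetical names, `xor.dat` is assumed to hold the raw 8 key bytes, the new key is assumed to be staged before any block file is touched, and the fsync and crash handling a real implementation needs are omitted.

```python
import os
import re
from pathlib import Path

SUFFIX = ".reobfuscated"
BLOCKFILE_RE = re.compile(r"^(blk|rev)\d+\.dat$")


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key, starting at file offset 0."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def reobfuscate_dir(blocks_dir: Path, new_key: bytes) -> None:
    old_key = (blocks_dir / "xor.dat").read_bytes()
    # Assumption: stage the new key first so an interrupted run is detectable.
    (blocks_dir / ("xor.dat" + SUFFIX)).write_bytes(new_key)

    # Phase 1: read each file with the old key, write it with the new key
    # under a staged name, then delete the original right away.
    for path in sorted(blocks_dir.iterdir()):
        if not BLOCKFILE_RE.match(path.name):
            continue
        raw = xor_bytes(path.read_bytes(), old_key)
        path.with_name(path.name + SUFFIX).write_bytes(xor_bytes(raw, new_key))
        path.unlink()

    # Phase 2: rename the staged block files back to their original names.
    for staged in sorted(blocks_dir.iterdir()):
        original = staged.name[: -len(SUFFIX)]
        if staged.name.endswith(SUFFIX) and BLOCKFILE_RE.match(original):
            staged.rename(staged.with_name(original))

    # Phase 3: swap in the new key file with a single atomic rename.
    os.replace(blocks_dir / ("xor.dat" + SUFFIX), blocks_dir / "xor.dat")
```

Deleting each original as soon as its staged copy is written keeps the extra disk usage at roughly one block file at a time, which matches the constraint stated further down.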

We log the old and new keys and print progress roughly per-percent as files complete (i.e. max 100 progress logs).
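One way to get "roughly per-percent" logging is to emit a line only when the integer percentage advances. The generator below is a hypothetical Python sketch of that idea, not the PR's code.

```python
from typing import Iterator, Tuple


def progress_points(total_files: int) -> Iterator[Tuple[int, int]]:
    """Yield (files_done, percent) each time the whole-number percent advances."""
    last_percent = 0
    for done in range(1, total_files + 1):
        percent = done * 100 // total_files
        if percent > last_percent:
            last_percent = percent
            yield done, percent
```

For any number of files this yields at most 100 points, and with fewer than 100 files it yields one point per file.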

Constraints

Re-obfuscation resumes automatically (detected via xor.dat.reobfuscated) even without the flag. In the worst case, a crash should only force us to redo previous work.
We need to load the whole blockfile in memory (~160MB peak memory usage) and write an extra blockfile.
Single-threaded, processing one file at a time to keep code simple and avoid complexity of interleaving renames and key swaps across threads.
Fast in practice with sequential read/modify/write per blockfile - after recent obfuscation vectorization, this path is very quick.
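The resume rule in the first constraint reduces to a small check at startup. The Python sketch below uses a hypothetical function name and only covers the detection step; how partially staged files are handled after a crash is not shown here.

```python
from pathlib import Path


def should_reobfuscate(blocks_dir: Path, flag_given: bool) -> bool:
    """Run when the flag is set, or when a staged key shows an unfinished run."""
    return flag_given or (blocks_dir / "xor.dat.reobfuscated").exists()
```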

Performance

cpu	hdd/ssd	block count	size	files	time (min)	blocks/min
Apple M4 Max laptop	SSD	~909k	~707 GB	9,982	6.2	146,612
Intel Core i9	SSD	~909k	~725 GB	10,238	23.1	39,351
Intel Core i7	HDD	~909k	~720 GB	10,156	208.7	4,356
Raspberry Pi 4B	HDD	~738k	~601 GB	8,511	1237	596
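For the IO-bound question, it can help to turn the table into a data rate. The Python sketch below derives an approximate throughput from the size and time columns, assuming decimal units (1 GB = 1000 MB); the function name is hypothetical.

```python
def throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Approximate data rate for one run, assuming 1 GB = 1000 MB."""
    return size_gb * 1000 / (minutes * 60)


# (size in GB, time in minutes) taken from the table above.
runs = {
    "Apple M4 Max, SSD": (707, 6.2),
    "Intel Core i9, SSD": (725, 23.1),
    "Intel Core i7, HDD": (720, 208.7),
    "Raspberry Pi 4B, HDD": (601, 1237),
}

for name, (size_gb, minutes) in runs.items():
    print(f"{name}: {throughput_mb_per_s(size_gb, minutes):.1f} MB/s")
```

Under that assumption the runs work out to roughly 1.9 GB/s, 0.52 GB/s, 57 MB/s and 8 MB/s respectively, counting each byte once even though it is both read and written.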

Similar work:

DrahtBot commented at 11:51 pm on September 5, 2025: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33324.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept ACK	TheCharlatan, stickies-v

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#33231 (net: Prevent node from binding to the same CService by w0xlt)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

LLM Linter (✨ experimental)

Possible typos and grammar issues:

blocksdir -> blocks directory [more natural and clear phrasing; “blocksdir” is nonstandard and may confuse readers]


refactor: inline constant `f_obfuscate = false` parameter 4e3d61b84c

l0rinc force-pushed on Sep 6, 2025

refactor: add path + string and file removal helpers baad60d064

l0rinc force-pushed on Sep 7, 2025

init: add -`reobfuscate-blocks` argument 09a083ca9e

l0rinc force-pushed on Sep 8, 2025

blocks: add `-reobfuscate-blocks` to xor existing blk/rev on startup

### Context

Recent discussions highlighted that many nodes which synced before Bitcoin Core v28 have their block and undo files stored effectively in the clear (zero XOR key). This patch adds a simple, resumable maintenance tool to obfuscate previously raw block files, rotate an existing key to a fresh random one, or deobfuscate (set key to zero) if consciously chosen, all without requiring resync. The operation can be cancelled and restarted safely.

### Implementation

The new startup option `-reobfuscate-blocks[=VALUE]` accepts either 16 hex characters as an exact 8-byte XOR key (little-endian in-memory layout) or a boolean to generate a random 64-bit key. For example, `-reobfuscate-blocks=0000000000000000` sets the key to zero, effectively removing obfuscation.

If we detect unobfuscated blocks at start time we suggest this new option in a warning.

At startup, we iterate over all `(blk|rev)*.dat` files, read them with the old XOR key and write them back with the new key (`<name>.reobfuscated`).
The implementation actually combines the two keys and reads directly into the new obfuscated version, so it only iterates over the data once. This works whether the original files are unobfuscated (zero key), the new files are unobfuscated, or both are obfuscated.
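The single-pass trick relies on XOR being associative: `(raw ^ old) ^ (old ^ new) == raw ^ new`. A minimal Python sketch, with hypothetical helper names and assuming both keys are aligned to the same file offsets:

```python
def xor_with_key(data: bytes, key: bytes, offset: int = 0) -> bytes:
    """XOR data with a repeating key, aligned to the byte position in the file."""
    n = len(key)
    return bytes(b ^ key[(offset + i) % n] for i, b in enumerate(data))


def combine_keys(old_key: bytes, new_key: bytes) -> bytes:
    """Key that converts old-key data straight into new-key data in one pass."""
    return bytes(a ^ b for a, b in zip(old_key, new_key))
```

If either key is all zeros, the combined key equals the other key, which covers the "originally raw" and "deobfuscate" cases without special handling.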
After successful write, we immediately delete the old file. Once all files are staged, we rename them back and atomically swap `xor.dat.reobfuscated` → `xor.dat` and continue operation.

We log the old and new keys and print progress roughly per-percent as files complete (i.e. max 100 progress logs).

### Constraints

* Re-obfuscation resumes automatically (detected via `xor.dat.reobfuscated`) even without the flag. In the worst case, a crash should only force us to redo previous work.
* We need to load the whole blockfile in memory (<130MB) and write an extra blockfile.
* Single-threaded, processing one file at a time to keep code simple and avoid complexity of interleaving renames and key swaps across threads.
* Fast in practice with sequential read/modify/write per blockfile - after recent obfuscation vectorization, this path is very quick.

### Performance

> M4 Max laptop with SSD

* ~900k blocks (~707GB) - `[obfuscate] finished migrating 9982 file(s) in 372s`

> i9 with SSD

* ~900k blocks (~725GB) - `[obfuscate] finished migrating 10238 file(s) in 1386s`

-----

Similar attempts: #32451 and andrewtoth/blocks-xor

Co-authored-by: Andrew Toth <andrewstoth@gmail.com>
Co-authored-by: Murch <murch@murch.one>

d3962f65fa

l0rinc force-pushed on Sep 8, 2025

TheCharlatan commented at 8:21 am on September 8, 2025: contributor

Concept ACK

stickies-v commented at 10:06 pm on September 8, 2025: contributor

Concept ACK, this seems like useful functionality to expose.

> Should we split ObfuscateBlocks out of init? I have split it into many local lambdas, but we may want to find better home for those methods…

I don’t like using startup options for one-time operations (I feel the same about e.g. -reindex). Without having thought it through too much yet, maybe we can bundle this e.g. as part of bitcoin-util or a separate bitcoin-xor-blocks utility?

> Should we repurpose the existing -blocksxor arg instead?

With this PR, IIUC we’d have -blocksxor, -reobfuscate-blocks, and the existence of the xor.dat file that all have some redundancy and thus potential for conflict (e.g. blocksxor=0, reobfuscate-blocks=1, and a non-zero xor.dat file). Reducing that complexity seems like it would be useful.

DrahtBot added the label Needs rebase on Sep 9, 2025

DrahtBot commented at 11:39 pm on September 9, 2025: contributor

🐙 This pull request conflicts with the target branch and needs rebase.

RFC: blocks: add -reobfuscate-blocks arg to xor existing blk/rev on startup #33324