blocks: add -reobfuscate-blocks argument to enable (de)obfuscating existing blocks #33324

pull l0rinc wants to merge 5 commits into bitcoin:master from l0rinc:l0rinc/reobfuscate-blocks changing 10 files +258 −12
  1. l0rinc commented at 11:51 pm on September 5, 2025: contributor

    Context

    Recent discussions highlighted that many nodes which synced before Bitcoin Core v28 have their block and undo files stored effectively in the clear (zero XOR key). This patch adds a simple, resumable maintenance tool to obfuscate previously raw block files, rotate an existing key to a fresh random one, or de-obfuscate (set key to zero) if consciously chosen, all without requiring resync. The operation can be cancelled and restarted safely.

    Implementation

    The new startup option -reobfuscate-blocks[=VALUE] accepts either 16 hex characters as an exact 8-byte XOR key (little-endian in-memory layout) or a boolean to generate a random 64-bit key. e.g. -reobfuscate-blocks=0000000000000000 sets the key to zero, effectively removing obfuscation.
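A minimal sketch of how a 16-character hex value could be turned into an 8-byte key. `ParseXorKey`, `HexDigit` and `IsZeroKey` are hypothetical names for illustration, not functions from this PR, and the byte order (bytes taken in the order they appear in the string) is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

using XorKey = std::array<std::uint8_t, 8>;

// Hypothetical helper: value of one hex digit, or -1 if `c` is not hex.
inline int HexDigit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: decode exactly 16 hex characters into 8 key bytes,
// in the order they appear in the string (assumed layout).
inline std::optional<XorKey> ParseXorKey(std::string_view hex)
{
    if (hex.size() != 16) return std::nullopt; // must describe exactly 8 bytes
    XorKey key{};
    for (std::size_t i = 0; i < key.size(); ++i) {
        const int hi{HexDigit(hex[2 * i])};
        const int lo{HexDigit(hex[2 * i + 1])};
        if (hi < 0 || lo < 0) return std::nullopt; // reject non-hex input
        key[i] = static_cast<std::uint8_t>((hi << 4) | lo);
    }
    return key;
}

// An all-zero key leaves the data unchanged, i.e. no obfuscation.
inline bool IsZeroKey(const XorKey& key)
{
    for (const auto b : key) {
        if (b != 0) return false;
    }
    return true;
}
```

With this sketch, `ParseXorKey("0000000000000000")` yields a key for which `IsZeroKey` is true, matching the de-obfuscation case described above.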

    If we detect unobfuscated blocks at start time we suggest this new option in a warning.

    At startup, we iterate over all undo and block files (grouping the block and undo files for more uniform iteration), read them with the old XOR key and write them back with the new key (<name>.reobfuscated). The implementation actually combines the two keys and reads directly into the new obfuscated version to only do a single iteration over the data. This works if the original blocks aren’t obfuscated or if the new blocks aren’t or if both are. After successful write, we immediately delete the old file. Once all files are staged, we rename them back and atomically swap xor.dat.reobfuscated → xor.dat and continue operation.
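The single-pass trick relies on XOR being its own inverse: removing the old key and applying the new one is the same as applying `old ^ new` once. A small sketch under that assumption follows; `XorInPlace` and `CombineKeys` are illustrative names, not the PR's `Obfuscation` API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using XorKey = std::array<std::uint8_t, 8>;

// XOR `data` in place with a repeating 8-byte key. `offset` is the position
// of data[0] within the file, so the key stays aligned across partial reads.
inline void XorInPlace(std::vector<std::uint8_t>& data, const XorKey& key, std::size_t offset = 0)
{
    for (std::size_t i = 0; i < data.size(); ++i) {
        data[i] ^= key[(offset + i) % key.size()];
    }
}

// Delta key: applying it once equals removing `old_key` and then applying `new_key`.
inline XorKey CombineKeys(const XorKey& old_key, const XorKey& new_key)
{
    XorKey delta{};
    for (std::size_t i = 0; i < delta.size(); ++i) {
        delta[i] = old_key[i] ^ new_key[i];
    }
    return delta;
}
```

If either key is all zeros, the delta is simply the other key, which is why the same code path covers obfuscating raw files, rotating a key, and de-obfuscating.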

    We log the old and new keys and print progress roughly per-percent as files complete (i.e. max 100 progress logs).
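The per-percent throttling can be sketched as follows. This is an illustrative model only: `ProgressLogPoints` is a hypothetical function that returns the percentages that would be logged, rather than the PR's actual logging code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the whole percentages that would be logged while `total` files
// complete one by one. Each percentage is emitted at most once, so the
// result never holds more than 100 entries.
inline std::vector<int> ProgressLogPoints(std::size_t total)
{
    std::vector<int> logged;
    int last{0}; // 0% is the starting state and is not logged again
    for (std::size_t done = 1; done <= total; ++done) {
        const int percent{static_cast<int>(done * 100 / total)};
        if (percent != last) {
            logged.push_back(percent);
            last = percent;
        }
    }
    return logged;
}
```

With roughly 10,000 files this produces one entry per percent, while a directory with only four files produces four entries (25, 50, 75, 100).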

    Constraints

    • Re-obfuscation resumes automatically (detected via xor.dat.reobfuscated) even without the flag. In worst-case a crash should only force us to redo previous work.
    • Single-threaded, processing one file at a time to keep code simple and avoid complexity of interleaving renames and key swaps across threads.
    • Fast in practice with sequential read/modify/write per blockfile - after recent obfuscation vectorization, this path is very quick.

    Reproducer

    cmake -B build -DENABLE_IPC=ON -DBUILD_GUI=ON
    cmake --build build -j$(nproc)
    # command line
    ./build/bin/bitcoind -reobfuscate-blocks -stopatheight=1
    # same with GUI
    ./build/bin/bitcoin-qt -reobfuscate-blocks -stopatheight=1
    

    Single-threaded Performance

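     cpu                 | hdd/ssd | block count | size    | files  | time (min) | blocks/min
    ---------------------|---------|-------------|---------|--------|------------|------------
     Apple M4 Max laptop | SSD     | ~909k       | ~707 GB | 9,982  | 8.4        | 146,613
     Intel Core i9       | SSD     | ~909k       | ~725 GB | 10,238 | 23.1       | 39,351
     Raspberry Pi 5      | SSD     | ~914k       | ~728 GB | 10,276 | 72.78      | 12,558
     Intel Core i7       | HDD     | ~909k       | ~720 GB | 10,156 | 208.7      | 4,356
     Raspberry Pi 4B     | HDD     | ~915k       | ~730 GB | 10,304 | 1467       | 624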

    Similar work: #32451 and andrewtoth/blocks-xor

  2. DrahtBot commented at 11:51 pm on September 5, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33324.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK sedited, stickies-v

    If your review is incorrectly listed, please copy-paste <!--meta-tag:bot-skip--> into the comment that the bot should ignore.

    Conflicts

    No conflicts as of last run.

    LLM Linter (✨ experimental)

    Possible places where named args for integral literals may be used (e.g. func(x, /*named_arg=*/0) in C++, and func(x, named_arg=0) in Python):

    • notifications.progress(_("Reobfuscating blocks…"), 0, false) in src/init.cpp
    • notifications.progress(_("Reobfuscating blocks…"), percentage, false) in src/init.cpp
    • notifications.progress(_("Reobfuscating blocks…"), 100, false) in src/init.cpp

    2025-12-10

  3. l0rinc force-pushed on Sep 6, 2025
  4. l0rinc force-pushed on Sep 7, 2025
  5. l0rinc force-pushed on Sep 8, 2025
  6. l0rinc force-pushed on Sep 8, 2025
  7. sedited commented at 8:21 am on September 8, 2025: contributor
    Concept ACK
  8. stickies-v commented at 10:06 pm on September 8, 2025: contributor

    Concept ACK, this seems like useful functionality to expose.

    Should we split ObfuscateBlocks out of init? I have split it into many local lambdas, but we may want to find better home for those methods…

    I don’t like using startup options for one-time operations (I feel the same about e.g. -reindex). Without having thought it through too much yet, maybe we can bundle this e.g. as part of bitcoin-util or a separate bitcoin-xor-blocks utility?

    Should we repurpose the existing -blocksxor arg instead?

    With this PR, IIUC we’d have -blocksxor, -reobfuscate-blocks, and the existence of the xor.dat file that all have some redundancy and thus potential for conflict (e.g. blocksxor=0, reobfuscate-blocks=1, and a non-zero xor.dat file). Reducing that complexity seems like it would be useful.

  9. DrahtBot added the label Needs rebase on Sep 9, 2025
  10. l0rinc force-pushed on Sep 10, 2025
  11. DrahtBot removed the label Needs rebase on Sep 10, 2025
  12. in src/init.cpp:1366 in aa587f3740 outdated
    1361+    std::vector<std::byte> buf;
    1362+    buf.resize(node::MAX_BLOCKFILE_SIZE);
    1363+
    1364+    // Migrate undo and block files atomically
    1365+    for (const auto& [name, files] : {std::make_pair("undo", collect_undo_files()),
    1366+                                      std::make_pair("block", collect_block_files())}) {
    


    ajtowns commented at 5:10 am on September 11, 2025:
    I was surprised when it hit 100% of the undo files then restarted on the block files and was much slower – might be better to do the slow files first, or ideally to do block and undo files intermixed so you just have a single 0% to 100% run.

    l0rinc commented at 3:56 am on September 12, 2025:
    I had that version before, but didn’t like that the small and big files made the percentages look unevenly spaced. But I have reverted that version and shuffled the files, this should make the progress feel more uniform - thank you for the observation!
  13. in src/init.cpp:1345 in aa587f3740 outdated
    1346+        old_blocks.read(buf);
    1347+
    1348+        AutoFile new_blocks{fsbridge::fopen(file + suffix, "wb")};
    1349+        new_blocks.write_buffer(buf);
    1350+
    1351+        if (old_blocks.fclose() || !new_blocks.Commit() || new_blocks.fclose()) return false;
    


    ajtowns commented at 5:22 am on September 11, 2025:

    Would be good to try to reset the timestamp of new_blocks to match that of old_blocks here.

            // attempt to preserve timestamp
            fs::last_write_time(file + suffix, fs::last_write_time(file));
    

    l0rinc commented at 3:56 am on September 12, 2025:
    Done, thanks!
  14. in src/init.cpp:1335 in aa587f3740 outdated
    1341+    }};
    1342+
    1343+    auto migrate_single_blockfile{[&](const fs::path& file, const Obfuscation& delta_obfuscation, std::vector<std::byte>& buf) -> bool {
    1344+        AutoFile old_blocks{fsbridge::fopen(file, "rb"), delta_obfuscation}; // deobfuscate & reobfuscate with a single combined key
    1345+        buf.resize(fs::file_size(file)); // reuse buffer
    1346+        old_blocks.read(buf);
    


    ajtowns commented at 5:23 am on September 11, 2025:

    Rather than reading the entire blockfile into memory at once, consider chunking it:

            size_t left = fs::file_size(file);
            while (left > 0) {
                size_t chunk = std::min<size_t>(left, 2 * MAX_BLOCK_SERIALIZED_SIZE);
                buf.resize(chunk);
                old_blocks.read(buf);
                new_blocks.write_buffer(buf);
                left -= chunk;
            }
    

    l0rinc commented at 5:49 am on September 11, 2025:
    We could do that with the recently introduced buffered readers - but that’s considerably slower. Is it a problem to read all of it in memory when we don’t even have dbcache yet? The total memory usage is just 160 MB during migration, we should be fine until 1 GB at least, right?

    ajtowns commented at 6:20 am on September 11, 2025:

    I don’t think buffered readers is the right thing (that’s for when you want to process small amounts of data while still reading it from the file in large chunks), and trying the above didn’t seem particularly slow to me.

    I guess it could be simplified a bit to:

    buf.resize(2 * MAX_BLOCK_SERIALIZED_SIZE);
    while (true) {
        size_t size = old_blocks.detail_fread(buf);
        if (size == 0) break;
        // write only the bytes that were actually read
        new_blocks.write_buffer(std::span{buf}.first(size));
    }
    

    l0rinc commented at 3:58 am on September 12, 2025:
    What problem would chunking solve in your opinion? I don’t mind doing it, but the current version is slightly simpler and slightly faster, so I need at least some justification for giving up both :)

    ajtowns commented at 12:36 pm on September 12, 2025:

    Loading a large file entirely into memory when it’s not necessary is just bad practice. What if we changed to .blk files of 1GB each? What if we’re running on a node that’s memory constrained and configured dbcache down to 4MB?

    If we’re worried about speed, then doing it in parallel helps on my system since obfuscation ends up CPU bound when single-threaded – with the current code, it takes 238s (4min); with 8 threads it’s 65s; with 16 threads it’s 47s. Using 16MB chunks (BLOCKFILE_CHUNK_SIZE), 8 threads is also ~128MB of memory, but a user on a severely memory constrained system could reduce the thread count if they wanted. Here’s roughly what I’m thinking: https://github.com/ajtowns/bitcoin/commits/202509-reobfus/
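To illustrate the memory argument, here is a self-contained sketch of a chunked XOR copy using standard streams instead of `AutoFile`. `XorCopyChunked` is a hypothetical name and the chunk size is a free parameter; peak memory is bounded by the chunk size rather than by the file size.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <vector>

namespace fs = std::filesystem;
using XorKey = std::array<std::uint8_t, 8>;

// Copy `from` to `to`, XOR-ing every byte with the repeating `key`, while
// holding at most `chunk_size` bytes in memory at any time.
inline bool XorCopyChunked(const fs::path& from, const fs::path& to, const XorKey& key, std::size_t chunk_size)
{
    assert(chunk_size > 0);
    std::ifstream in{from, std::ios::binary};
    std::ofstream out{to, std::ios::binary | std::ios::trunc};
    if (!in || !out) return false;

    std::vector<char> buf(chunk_size);
    std::size_t offset{0}; // absolute file position keeps the key aligned across chunks
    while (true) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const auto got{static_cast<std::size_t>(in.gcount())};
        if (got == 0) break; // nothing left to read
        for (std::size_t i = 0; i < got; ++i) {
            buf[i] = static_cast<char>(static_cast<std::uint8_t>(buf[i]) ^ key[(offset + i) % key.size()]);
        }
        out.write(buf.data(), static_cast<std::streamsize>(got));
        offset += got;
    }
    out.flush();
    return in.eof() && out.good();
}
```

Tracking the absolute offset matters when the chunk size is not a multiple of the key length; otherwise the key would restart at the beginning of every chunk and corrupt the output.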


    l0rinc commented at 1:48 pm on September 12, 2025:

    Loading a large file

    This is run on request, before anything else loads, it’s not that large, only 160 MB memory is needed.

    For reference, applying the mentioned dbcache=4 (which isn’t used here yet) still makes the node use > 1 GB memory:

    Edit: doing an actual massif memory measurement with dbcache=4 and -blocksonly reveals that the actual memory usage is lower than that (but still higher than the 160MB needed for a single blockfile):

    Command:            ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=200000 -dbcache=4 -blocksonly -printtoconsole=1
    Massif arguments:   --time-unit=ms --massif-out-file=/mnt/my_storage/logs/massif-e66f04d0131b8c2db13ddd649e9eb20910eb6d1d-200000-4.out
    ms_print arguments: /mnt/my_storage/logs/massif-e66f04d0131b8c2db13ddd649e9eb20910eb6d1d-200000-4.out
    --------------------------------------------------------------------------------


        MB
    383.1^#
         |#
         |#
         |#
         |#
         |#
         |#                                   :  : :::      :  :    ::::  @
         |#   :@: :::@::::::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::
         |#::::@::: :@::: ::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::
         |#:: :@::: :@::: ::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::
         |#:: :@::: :@::: ::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::
         |#:: :@::: :@::: ::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::
         |#:: :@::: :@::: ::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::
         |#:: :@::: :@::: ::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::
         |#:: :@::: :@::: ::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::
         |#:: :@::: :@::: ::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::
         |#:: :@::: :@::: ::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::
         |#:: :@::: :@::: ::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::
         |#:: :@::: :@::: ::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::
         |#:: :@::: :@::: ::::::::::@:::@::::::::::::::::::@:::::::@::::::@:::::::
       0 +----------------------------------------------------------------------->h
         0                                                                   2.506
    

    Here’s roughly what I’m thinking: ajtowns/bitcoin@202509-reobfus (commits)

    Multithreading is indeed a very good argument for chunking, thanks a lot for the patch, I’ll apply it soon and add you as coauthor!


    l0rinc commented at 7:02 am on September 20, 2025:
    Thanks again for the review, I have pushed a change to fix the CI and took a few suggestions from your branch (chunking, code simplifications), but kept the original file iteration with progress indicator for now. The parallelization complicates the situation considerably, I will see if I can find a simpler way or if single-threaded execution is also acceptable. Edit: grouped the block and undo files for more uniform iteration instead of shuffling

    l0rinc commented at 3:33 am on September 26, 2025:

    I have pushed a new version (rebased, extended test), let me know what you think.

    I have implemented a very simple multithreaded version but I couldn’t convince it to achieve any speedup whatsoever - I guess xor operations are a lot cheaper than disk reads/writes. The total CPU usage was at 20% even with 50 threads.

    I have pushed my threaded solution to https://github.com/l0rinc/bitcoin/pull/40/files#diff-b1e19192258d83199d8adaa5ac31f067af98f63554bfdd679bd8e8073815e69dR1361-R1379, but I kept the single-threaded version here.

  15. in src/init.cpp:1295 in aa587f3740 outdated
    1290+        return files;
    1291+    }};
    1292+    auto collect_block_files{[&]() -> std::set<fs::path> {
    1293+        std::set<fs::path> files;
    1294+        while (true) {
    1295+            if (auto f{m_blockman.GetBlockPosFilename(FlatFilePos(files.size(), 0))}; fs::exists(f)) {
    


    ajtowns commented at 5:43 am on September 11, 2025:
    This doesn’t seem like it would work correctly with pruning, when blk00000.dat has been deleted?

    l0rinc commented at 3:58 am on September 12, 2025:
    Good call, changed it back to regex matching
  16. ajtowns commented at 5:46 am on September 11, 2025: contributor
    Having it be a startup option like -reindex seems fine to me.
  17. l0rinc force-pushed on Sep 12, 2025
  18. ajtowns commented at 7:32 am on September 15, 2025: contributor
    tidy wants emplace_back over push_back
  19. luke-jr commented at 1:29 am on September 18, 2025: member

    I agree a separate utility for this seems better - this requires very little of the existing codebase, in theory.

    Also suggest making the files with the same names, but in a new directory, and then atomically rename the directory when complete, rather than every single file.

  20. l0rinc commented at 1:49 am on September 18, 2025: contributor

    I agree a separate utility for this seems better

    Can you quote what you’re agreeing with specifically, not sure who suggested that. Besides, @andrewtoth already has a tool for that, it was mentioned in the PR description.

    but in a new directory, and then atomically rename

    I will think about it, could make sense, but in that case unrelated files should also be copied over (maybe duplicated to be safe) - and listing the directory content wouldn’t make the progress obvious. What’s wrong with the current approach?

  21. l0rinc force-pushed on Sep 20, 2025
  22. l0rinc force-pushed on Sep 20, 2025
  23. l0rinc force-pushed on Sep 22, 2025
  24. l0rinc force-pushed on Sep 26, 2025
  25. l0rinc force-pushed on Oct 3, 2025
  26. l0rinc commented at 1:01 am on October 3, 2025: contributor
    Added kernel notifications (thanks @ryanofsky) and improved crash resistance at the very last step (final rename back to old names) - try it out with ./build/bin/bitcoin-qt -reobfuscate-blocks -stopatheight=1.
  27. l0rinc marked this as ready for review on Oct 3, 2025
  28. l0rinc renamed this:
    RFC: blocks: add `-reobfuscate-blocks` arg to xor existing blk/rev on startup
    blocks: add `-reobfuscate-blocks` arg to xor existing blk/rev on startup
    on Oct 3, 2025
  29. refactor: inline constant `f_obfuscate = false` parameter 1670fb14d0
  30. refactor: add path + string and file removal helpers f055e4585a
  31. init: add `-reobfuscate-blocks` argument 7b7df5ef26
  32. blocks: add `-reobfuscate-blocks` to xor existing blk/rev on startup
    ### Context
    
    Recent discussions highlighted that many nodes which synced before Bitcoin Core v28 have their block and undo files stored effectively in the clear (zero XOR key). This patch adds a simple, resumable maintenance tool to obfuscate previously raw block files, rotate an existing key to a fresh random one, or deobfuscate (set key to zero) if consciously chosen, all without requiring resync. The operation can be cancelled and restarted safely.
    
    ### Implementation
    
    The new startup option `-reobfuscate-blocks[=VALUE]` accepts either 16 hex characters as an exact 8-byte XOR key (little-endian in-memory layout) or a boolean to generate a random 64-bit key. e.g. `-reobfuscate-blocks=0000000000000000` sets the key to zero, effectively removing obfuscation.
    
    If we detect unobfuscated blocks at start time, we suggest this new option in a warning.
    
    At startup, we iterate over all undo and block files (grouping the block and undo files for more uniform iteration), read them with the old XOR key and write them back with the new key (`<name>.reobfuscated`). The implementation actually combines the two keys and reads directly into the new obfuscated version to only do a single iteration over the data. This works if the original blocks aren't obfuscated or if the new blocks aren't or if both are.
    After successful write, we immediately delete the old file. Once all files are staged, we rename them back and atomically swap `xor.dat.reobfuscated` → `xor.dat` and continue operation.
    
    We log the old and new keys and print progress roughly per-percent as undo and block files complete (i.e. max 2 * 100 progress logs).
    
    ### Constraints
    
    * Reobfuscation resumes automatically (detected via `xor.dat.reobfuscated`) even without the flag. In worst-case a crash should only force us to redo previous work.
    * Single-threaded, processing one file at a time to keep code simple and avoid complexity of interleaving renames and key swaps across threads.
    * Fast in practice with sequential read/modify/write per blockfile - after recent obfuscation vectorization, this path is very quick.
    
    ### Reproducer
    
    ```bash
    cmake -B build -DENABLE_IPC=ON -DBUILD_GUI=ON
    cmake --build build -j$(nproc)
    ./build/bin/bitcoind -reobfuscate-blocks -stopatheight=1
    ```
    
    ### Single-threaded Performance
    
     cpu                 | hdd/ssd | block count | size    | files  | time (min) | blocks/min
    ---------------------|---------|-------------|---------|--------|------------|------------
     Apple M4 Max laptop | SSD     | ~909k       | ~707 GB | 9,982  | 8.4        | 146,613
     Intel Core i9       | SSD     | ~909k       | ~725 GB | 10,238 | 23.1       | 39,351
     Raspberry Pi 5      | SSD     | ~914k       | ~728 GB | 10,276 | 72.78      | 12,558
     Intel Core i7       | HDD     | ~909k       | ~720 GB | 10,156 | 208.7      | 4,356
     Raspberry Pi 4B     | HDD     | ~915k       | ~730 GB | 10,304 | 1467       | 624
    
    -----
    
    Similar work: #32451 and andrewtoth/blocks-xor
    
    Co-authored-by: Andrew Toth <andrewstoth@gmail.com>
    Co-authored-by: Murch <murch@murch.one>
    Co-authored-by: Anthony Towns <aj@erisian.com.au>
    79a19aeaaf
  33. gui: add kernel notifications for reobfuscation progress
    ### Reproducer
    
    ```bash
    cmake -B build -DENABLE_IPC=ON -DBUILD_GUI=ON
    cmake --build build -j$(nproc)
    ./build/bin/bitcoin-qt -reobfuscate-blocks -stopatheight=1
    ```
    
    Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
    d1f2cfc817
  34. l0rinc force-pushed on Dec 10, 2025
  35. l0rinc renamed this:
    blocks: add `-reobfuscate-blocks` arg to xor existing blk/rev on startup
    blocks: add `-reobfuscate-blocks` argument to enable (de)obfuscating existing blocks
    on Dec 11, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-12-13 21:13 UTC
