contrib: add xor-blocks tool to obfuscate blocks directory #32451

pull andrewtoth wants to merge 1 commits into bitcoin:master from andrewtoth:xor-blocks changing 4 files +225 −0
  1. andrewtoth commented at 6:02 pm on May 8, 2025: contributor

    I wrote a tool in Rust to xor the blocks directory with a random key. It was pointed out to me that there already exists some Rust code in contrib, so this might be a welcome addition to the toolkit here.

    This lets you obfuscate the blocks blk.dat and rev.dat files if you synced with a version prior to v28.

    It checks if a xor.dat file exists, and if it is zero it overwrites it with a non-zero random key. It then goes through each *.dat file in the blocks directory, checking if the first 4 bytes are the magic bytes. If so it reads the whole file into memory, xors all bytes with the key, then writes to a temporary file. It then renames the temporary file to the dat file it xor’d. This lets users safely run this on any blocks directory, as long as they let it completely finish once before starting bitcoind.
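
The core transformation described above can be sketched as follows. This is a minimal illustration, not the code in the PR: the helper names `xor_in_place` and `starts_with_magic` are hypothetical, and the magic value is assumed to be the mainnet network magic.

```rust
/// Mainnet network magic that begins an un-obfuscated block file
/// (assumption: the sketch only considers mainnet).
pub const MAGIC: [u8; 4] = [0xf9, 0xbe, 0xb4, 0xd9];

/// XOR `data` in place with a repeating 8-byte key (hypothetical helper).
pub fn xor_in_place(data: &mut [u8], key: &[u8; 8]) {
    for (i, byte) in data.iter_mut().enumerate() {
        *byte ^= key[i % key.len()];
    }
}

/// A file still needs obfuscating if it begins with the plain magic bytes.
pub fn starts_with_magic(data: &[u8]) -> bool {
    data.starts_with(&MAGIC)
}
```

Because XOR is its own inverse, applying `xor_in_place` twice with the same key restores the original bytes, which is why the magic check is needed to avoid processing a file a second time.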

  2. DrahtBot commented at 6:03 pm on May 8, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32451.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK 1440000bytes

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  3. DrahtBot added the label Scripts and tools on May 8, 2025
  4. bitcoin deleted a comment on May 8, 2025
  5. in contrib/xor-blocks/src/main.rs:122 in dc41366454 outdated
    117+
    118+    let mut block = fs::read(path)?;
    119+    block
    120+        .iter_mut()
    121+        .enumerate()
    122+        .for_each(|(i, b)| *b ^= key[i % key.len()]);
    


    l0rinc commented at 8:12 pm on May 8, 2025:
    To speed this up significantly, we could probably process the data in 8 byte chunks, cast it to 64 bits, xor it with a 64 bit key and do the last bytes (if any) byte-by-byte against the current vector key - similarly to https://github.com/bitcoin/bitcoin/pull/31144

    Kixunil commented at 10:56 pm on May 12, 2025:
    Yeah, Rust even has chunks_exact method for this (and the rest can be xored after the loop). It should also auto-vectorize.

    andrewtoth commented at 11:06 pm on May 17, 2025:
    Done. I went one further and use Rust’s u128, so we double the key and xor 16 bytes at a time.
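
The chunked approach suggested in this thread could look roughly like the sketch below. It uses 64-bit words and `chunks_exact_mut`, as proposed by the reviewers; the updated PR is said to use `u128` instead, and the function name `xor_chunked` is hypothetical.

```rust
/// XOR `data` in place, eight bytes per step, with a repeating 8-byte key.
pub fn xor_chunked(data: &mut [u8], key: [u8; 8]) {
    let key_word = u64::from_ne_bytes(key);
    let mut chunks = data.chunks_exact_mut(8);
    for chunk in &mut chunks {
        // Copy the chunk into a fixed-size array so it can be read as a u64.
        let mut word = [0u8; 8];
        word.copy_from_slice(chunk);
        let xored = u64::from_ne_bytes(word) ^ key_word;
        chunk.copy_from_slice(&xored.to_ne_bytes());
    }
    // Fewer than 8 bytes remain. They start at a multiple of 8,
    // so the key index restarts at 0 for the tail.
    for (byte, k) in chunks.into_remainder().iter_mut().zip(key.iter()) {
        *byte ^= *k;
    }
}
```

Native-endian conversion is used in both directions, so the result is the same byte-wise XOR regardless of the platform's endianness.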
  6. Sjors commented at 10:14 am on May 9, 2025: member

    as long as they let it completely finish once before starting bitcoind

    That seems likely to cause issues. It seems better if we track the height at which xor starts. A script could then work backwards and lower the height. And in that case it might as well be an RPC command.

  7. laanwj commented at 10:40 am on May 9, 2025: member

    It seems better if we track the height at which xor starts

    i really like this idea. “Start XORing from now on” as a default would be enough to prevent new problems, and potentially there could be a background process that updates historical blocks (which would be way less performance critical as no one is waiting for it).

    That said it’s more complicated and more error-prone, i understand why the decision was made to do all or nothing.

  8. in contrib/xor-blocks/src/main.rs:14 in dc41366454 outdated
     9+
    10+fn main() -> Result<(), io::Error> {
    11+    let start = Instant::now();
    12+
    13+    let blocks_path = if env::args().len() > 1 {
    14+        env::args().nth(1).expect("the arg to exist").into()
    


    LarryRuane commented at 5:14 pm on May 10, 2025:
    Consider accepting this argument in the same way it’s specified with bitcoind and bitcoin-cli, that is, -datadir=<dir>. Besides being consistent with these existing executables, it would also be cleaner if more arguments were added in the future (because the data directory wouldn’t be specified in a different way from other arguments).

    Kixunil commented at 10:39 pm on May 12, 2025:

    Sadly, that’s not really that easy because of damn Windows non-compliant UTF-16. I wrote a crate for it called parse_arg but it’s soft-deprecated there. Feel free to copy-paste or just depend on it. Maybe I’ll end up putting it into another dedicated crate.

    But what I’d like to see here is having proper usage page if no arguments are provided. clap or configure_me can do that but for a simple tool a bit of manual code is likely fine.


    andrewtoth commented at 11:05 pm on May 17, 2025:

    I added checking for -datadir= and --datadir=, but not datadir= or ---datadir=. Just like bitcoind :).

    I’d like to see here is having proper usage page if no arguments are provided

    If no args are provided, it just picks up the default datadir. IMO that’s the best behavior. If you give more than 1 arg it prints out a usage instruction as an error. Is that what you had in mind?
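
The prefix handling described above (accept `-datadir=` and `--datadir=`, reject `datadir=` and `---datadir=`) can be sketched with `str::strip_prefix`. The function name `parse_datadir` is hypothetical, and the sketch assumes the argument is already valid UTF-8, which sidesteps the Windows encoding concern raised earlier in this thread.

```rust
/// Return the directory from a `-datadir=<dir>` or `--datadir=<dir>` argument,
/// or `None` if the argument has any other shape.
pub fn parse_datadir(arg: &str) -> Option<&str> {
    arg.strip_prefix("--datadir=")
        .or_else(|| arg.strip_prefix("-datadir="))
}
```

A caller might apply this to `std::env::args().nth(1)` and fall back to the default datadir when no argument is given.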

  9. LarryRuane commented at 5:24 pm on May 10, 2025: contributor

    This is a great tool, thanks for implementing it!

    Consider obtaining the .lock file in the data directory in the same way bitcoind does. This will prevent bitcoind from starting if this tool is running, and vice versa.

    Just to illustrate, if I try to start a second bitcoind, I get this error, so maybe do the same thing here.

    $ bitcoind
    Error: Cannot obtain a lock on directory /home/larry/.bitcoin. Bitcoin Core is probably already running.
    
  10. LarryRuane commented at 5:35 pm on May 10, 2025: contributor

    as long as they let it completely finish once before starting bitcoind

    That seems likely to cause issues. It seems better if we track the height at which xor starts. A script could then work backwards and lower the height. And in that case it might as well be an RPC command.

    I have an idea for a follow-up PR (haven’t started implementing yet) which would cause bitcoind to, upon opening a block file, look at the first 4 bytes, and if it’s MAGIC, then don’t xor anything while reading or writing to that file. But if xor-ing with xor.dat yields MAGIC, then do the xor when reading or writing the file. That is, make the xor per-file. Since this PR makes each file update atomic (by writing to a temp file then doing a rename, rather than rewriting the file in-place), each file will either be fully xored or not. So I think that would work.

    That way, one could start this xor-blocks tool, interrupt it at any time (for example, suppose you unexpectedly need to use the node immediately), and safely start bitcoind. Later, you can shut down bitcoind and restart xor-blocks and let it finish. It wouldn’t even have to depend on converting the blocks files in order, nor keep track of that separate state (height or block file number), which seems more robust.
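
The per-file detection proposed here might be sketched as below. This only illustrates the idea; it is not code from bitcoind or from this PR. `FileState` and `classify` are hypothetical names, and the mainnet magic is assumed.

```rust
/// Mainnet network magic (assumption: mainnet only).
pub const MAGIC: [u8; 4] = [0xf9, 0xbe, 0xb4, 0xd9];

#[derive(Debug, PartialEq, Eq)]
pub enum FileState {
    /// File starts with the plain magic: read and write without XOR.
    Plain,
    /// File starts with magic XOR key: apply the key on every read and write.
    Obfuscated,
    /// Neither pattern matches.
    Unknown,
}

/// Classify a block file from its first four bytes and the xor.dat key.
pub fn classify(first4: [u8; 4], key: &[u8; 8]) -> FileState {
    if first4 == MAGIC {
        return FileState::Plain;
    }
    let mut unmasked = first4;
    for (byte, k) in unmasked.iter_mut().zip(key.iter()) {
        *byte ^= *k;
    }
    if unmasked == MAGIC {
        FileState::Obfuscated
    } else {
        FileState::Unknown
    }
}
```

This scheme only works because each file is replaced atomically, so a file is never left half-converted.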

  11. 1440000bytes commented at 10:03 pm on May 11, 2025: none

    Concept ACK

    Could be a useful tool for paranoid users who can run this tool.

    Is it possible to add this feature in bitcoin core itself?

  12. in contrib/xor-blocks/src/main.rs:60 in dc41366454 outdated
    11+    let start = Instant::now();
    12+
    13+    let blocks_path = if env::args().len() > 1 {
    14+        env::args().nth(1).expect("the arg to exist").into()
    15+    } else {
    16+        #[allow(deprecated)]
    


    Kixunil commented at 10:41 pm on May 12, 2025:
    Have you tested that this picks the right one on Windows? That may be a good reason for the attribute.

    andrewtoth commented at 11:03 pm on May 17, 2025:

    I have unfortunately not tested on Windows. If anyone could test I would be grateful. If this is the wrong home dir though then a user can just specify it manually using the -datadir= option.

    It seems this function will be undeprecated in a future release https://github.com/rust-lang/rust/issues/132650.


    1440000bytes commented at 6:39 pm on May 18, 2025:

    If this is the wrong home dir though then a user can just specify it manually using the -datadir= option.

    I tested on Windows and it works with -datadir=

    1. Installed bitcoin core v27.0 to sync a few signet blocks
    2. Installed bitcoin core v29.0 to sync more blocks and create xor.dat file
    3. Tried this tool which works only with -datadir= set
     > cargo run --release
         Finished `release` profile [optimized] target(s) in 0.11s
          Running `target\release\blocks-xor.exe`
     Error: Error { msg: "The system cannot find the path specified. (os error 3)" }
     error: process didn't exit successfully: `target\release\blocks-xor.exe` (exit code: 1)

     > cargo run --release -- -datadir="C:\Users\test\AppData\Roaming\Bitcoin\signet"
         Finished `release` profile [optimized] target(s) in 0.02s
          Running `target\release\blocks-xor.exe -datadir=C:\Users\test\AppData\Roaming\Bitcoin\signet`
     Obfuscating blocks dir. Do not start bitcoind until finished!
     Done in 0 seconds! Blocksdir is now obfuscated.
    

    andrewtoth commented at 6:47 pm on May 18, 2025:
    @1440000bytes thanks for testing! Unfortunately this tool will only work for mainnet. It looks like I also have to rename Local to Roaming in the Windows default datadir.

    1440000bytes commented at 7:00 pm on May 18, 2025:

    I think new bitcoin core versions use local for data directory. So, you will have to support both local and roaming in this tool.

    Yes, it works for mainnet. I re-tested.


    andrewtoth commented at 7:02 pm on May 18, 2025:
    @1440000bytes do you know which version switched to Local? If it was 28 or 29 then we only have to support Roaming.

    1440000bytes commented at 7:13 pm on May 18, 2025:
  13. in contrib/xor-blocks/src/main.rs:25 in dc41366454 outdated
    20+                "macos" => "Library/Application Support/Bitcoin",
    21+                "windows" => "AppData\\Local\\Bitcoin",
    22+                "linux" => ".bitcoin",
    23+                _ => {
    24+                    println!("Unknown OS");
    25+                    return Ok(());
    


    Kixunil commented at 10:41 pm on May 12, 2025:
    This is an error and should return non-zero exit code.

    andrewtoth commented at 11:01 pm on May 17, 2025:
    Done.
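
One way to make the unknown-OS case an error is to return `Err` from a helper and propagate it out of `main`, which makes the process exit with a non-zero status. The sketch below is an assumption about how that could look, not the PR's code. The function name is hypothetical, and the Windows value is copied from the snippet under review (other comments in this thread discuss whether it should be `Roaming`).

```rust
use std::io;

/// Map an OS name (as in `std::env::consts::OS`) to the default datadir,
/// relative to the user's home directory.
pub fn default_datadir_suffix(os: &str) -> io::Result<&'static str> {
    match os {
        "macos" => Ok("Library/Application Support/Bitcoin"),
        "windows" => Ok("AppData\\Local\\Bitcoin"),
        "linux" => Ok(".bitcoin"),
        // Returning Err lets `main` fail with a non-zero exit code.
        other => Err(io::Error::new(
            io::ErrorKind::Other,
            format!("unknown OS: {}", other),
        )),
    }
}
```

In a `fn main() -> Result<(), io::Error>`, calling `default_datadir_suffix(std::env::consts::OS)?` would propagate the error to the caller's shell.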
  14. in contrib/xor-blocks/src/main.rs:36 in dc41366454 outdated
    31+    let paths = fs::read_dir(&blocks_path)?
    32+        .map(|res| res.map(|e| e.path()))
    33+        .collect::<Result<Vec<_>, io::Error>>()?;
    34+
    35+    let xor_path: std::path::PathBuf = blocks_path.join("xor.dat");
    36+    if !fs::exists(&xor_path)? {
    


    Kixunil commented at 10:43 pm on May 12, 2025:

    This function is really bad and should’ve been deprecated, you should use try_exists instead.

    But also, why do this extra check? Just read already does the same thing with less code and fewer problems.


    andrewtoth commented at 11:01 pm on May 17, 2025:
    Done. We just use read now.
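
Reading the key directly, without a separate existence check, might look like the sketch below. The name `read_key` and the length check are assumptions; the 8-byte key size follows the `[u8; 8]` arrays in the snippets under review.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Read the 8-byte key from xor.dat. A missing file surfaces as the
/// `NotFound` error from `fs::read`, so no prior `exists` call is needed.
pub fn read_key(xor_path: &Path) -> io::Result<[u8; 8]> {
    let bytes = fs::read(xor_path)?;
    if bytes.len() != 8 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("expected 8 key bytes, found {}", bytes.len()),
        ));
    }
    let mut key = [0u8; 8];
    key.copy_from_slice(&bytes);
    Ok(key)
}
```

The caller can match on `io::ErrorKind::NotFound` to print a hint about running Bitcoin Core v28 or higher.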
  15. in contrib/xor-blocks/src/main.rs:98 in dc41366454 outdated
    49+    }
    50+    let key = if key == [0u8; 8] {
    51+        loop {
    52+            let key: [u8; 8] = rand::random();
    53+            // Don't use keys with 4 bytes of leading zeros
    54+            // They won't let us detect the first 4 bytes of magic in the files
    


    Kixunil commented at 10:46 pm on May 12, 2025:
    Why though? After “unxoring” the magic bytes just will be the same. Anyway, might still be a good idea to mask it better.

    l0rinc commented at 11:50 am on May 13, 2025:

    we would reapply the xor on restart in that case, since we wouldn’t detect that we’ve processed it already.

    Alternatively we could keep a counter of processed blocks - modify the xor.dat to make sure we can’t start bitcoind (or @LarryRuane’s lock would probably also do the same) and add a height of how many blocks we’ve processed so far (though that would make parallelization more difficult). When we’re done we’d restore the xor.dat file (which we have to modify anyway if we’re xor-ing)


    Kixunil commented at 12:04 pm on May 13, 2025:
    Oh, I see, that’s clever! But then yes, this code tries to be atomic but it isn’t because it doesn’t call sync_data.
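
The reasoning about leading zero bytes can be made concrete with a small sketch. The names are hypothetical and the mainnet magic is assumed. A key whose first four bytes are all zero leaves the magic unchanged, so an already-processed file would be indistinguishable from an unprocessed one.

```rust
/// Mainnet network magic (assumption: mainnet only).
pub const MAGIC: [u8; 4] = [0xf9, 0xbe, 0xb4, 0xd9];

/// The first four bytes of a block file after XOR-ing with `key`.
pub fn masked_magic(key: &[u8; 8]) -> [u8; 4] {
    let mut out = MAGIC;
    for (byte, k) in out.iter_mut().zip(key.iter()) {
        *byte ^= *k;
    }
    out
}

/// A key allows progress detection only if it changes the magic bytes.
pub fn key_is_detectable(key: &[u8; 8]) -> bool {
    masked_magic(key) != MAGIC
}
```

Any key with at least one non-zero byte among the first four passes this check.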
  16. in contrib/xor-blocks/src/main.rs:64 in dc41366454 outdated
    59+        }
    60+    } else {
    61+        key
    62+    };
    63+
    64+    fs::write(xor_path, key)?;
    


    Kixunil commented at 10:48 pm on May 12, 2025:
    BTW, this is not atomic. Does it matter? I don’t really know. Ideally the tool could continue if it fails but that’s not easy. Still, at least making sure this one doesn’t get corrupted may be important?

    andrewtoth commented at 11:00 pm on May 17, 2025:
    Fixed, we don’t overwrite the key now if we didn’t create a new one. We also call sync_data after writing the new random one.
  17. in contrib/xor-blocks/src/main.rs:77 in dc41366454 outdated
    72+        if let Err(e) = xor_block(&path, key) {
    73+            println!(
    74+                "Error xor-ing file {:?}: {e:?}",
    75+                path.iter()
    76+                    .next_back()
    77+                    .expect("path to have a last component")
    


    Kixunil commented at 10:49 pm on May 12, 2025:
    Why not path.display()?

    andrewtoth commented at 10:59 pm on May 17, 2025:
    Done.
  18. in contrib/xor-blocks/src/main.rs:112 in dc41366454 outdated
    107+        return Ok(());
    108+    }
    109+
    110+    let mut file = fs::File::open(path)?;
    111+    let mut buf = [0u8; 4];
    112+    file.read_exact(&mut buf)?;
    


    Kixunil commented at 10:51 pm on May 12, 2025:
    Why not reuse the file handle later and perhaps take advantage of BufReader?

    andrewtoth commented at 10:59 pm on May 17, 2025:
    Done.
  19. in contrib/xor-blocks/src/main.rs:118 in dc41366454 outdated
    113+
    114+    if buf != MAGIC {
    115+        return Ok(());
    116+    }
    117+
    118+    let mut block = fs::read(path)?;
    


    Kixunil commented at 10:55 pm on May 12, 2025:
    This loads the entire block file into memory. I’m not sure if such memory usage is desirable.

    l0rinc commented at 11:46 am on May 13, 2025:
    We’re regularly loading blocks into memory - if we want any performance out of this, we definitely want that in my experience

    Kixunil commented at 12:06 pm on May 13, 2025:
    Could also do it with like 4096B blocks at a time. I don’t see how it would be slower since that’s what read has to do anyway. (Though perhaps using BufReader would be better.)

    andrewtoth commented at 10:59 pm on May 17, 2025:
    Done. I use BufReader and read 16 bytes at a time. Memory usage is much lower. Speed doesn’t seem to be impacted.
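
A streaming variant that never holds the whole file in memory could be sketched as below. It uses the 4096-byte buffer mentioned in this thread rather than the 16-byte reads of the updated PR, and `xor_stream` is a hypothetical name. The running offset keeps the key aligned even when a read returns fewer bytes than requested.

```rust
use std::io::{self, Read, Write};

/// Copy `reader` to `writer`, XOR-ing every byte with a repeating 8-byte key.
/// Returns the number of bytes processed.
pub fn xor_stream<R: Read, W: Write>(
    mut reader: R,
    mut writer: W,
    key: [u8; 8],
) -> io::Result<u64> {
    let mut buf = [0u8; 4096];
    let mut offset: u64 = 0;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => return Ok(offset), // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        for (i, byte) in buf[..n].iter_mut().enumerate() {
            // Index the key by absolute position in the stream.
            *byte ^= key[((offset + i as u64) % 8) as usize];
        }
        writer.write_all(&buf[..n])?;
        offset += n as u64;
    }
}
```

Passing a `BufReader<File>` and a `BufWriter<File>` would apply this to block files on disk.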
  20. in contrib/xor-blocks/src/main.rs:191 in dc41366454 outdated
    122+        .for_each(|(i, b)| *b ^= key[i % key.len()]);
    123+
    124+    let mut tmp_path = path.as_os_str().to_owned();
    125+    tmp_path.push(".tmp");
    126+    fs::write(&tmp_path, block)?;
    127+    fs::rename(&tmp_path, path)?;
    


    Kixunil commented at 10:57 pm on May 12, 2025:
    Cool, this is almost atomic. I say almost because you’re not calling sync_data (because you’re not using file handle - I think you should).

    andrewtoth commented at 10:58 pm on May 17, 2025:
    I use BufWriter now and call flush. That should call sync_data under the hood I presume?

    andrewtoth commented at 2:54 pm on May 18, 2025:
    Unfortunately it does not. I fixed with writer.into_inner()?.sync_data()?;. The into_inner will flush the writer before returning.
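
The write-sync-rename sequence discussed here can be sketched as follows. `write_atomically` is a hypothetical helper, not the PR's function. It relies on `BufWriter::into_inner` flushing the buffer and returning the `File`, on which `sync_data` is then called before the rename.

```rust
use std::fs::{self, File};
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Write `data` to `<path>.tmp`, sync it to disk, then rename it over `path`.
pub fn write_atomically(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut tmp_path = path.as_os_str().to_owned();
    tmp_path.push(".tmp");

    let mut writer = BufWriter::new(File::create(&tmp_path)?);
    writer.write_all(data)?;
    // `flush` alone only empties the userspace buffer. `into_inner` flushes
    // and hands back the `File`, and `sync_data` asks the OS to persist it.
    writer.into_inner()?.sync_data()?;

    fs::rename(&tmp_path, path)
}
```

If the process is interrupted before the rename, the original file is untouched and only a stale `.tmp` file is left behind.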
  21. in contrib/xor-blocks/src/main.rs:52 in dc41366454 outdated
    47+        println!("This script doesn't work with a non-zero key with 4 bytes of leading zeros");
    48+        return Ok(());
    49+    }
    50+    let key = if key == [0u8; 8] {
    51+        loop {
    52+            let key: [u8; 8] = rand::random();
    


    maflcko commented at 8:21 am on May 13, 2025:
    too bad simply getting a few bytes of os rand is still experimental and only in rust nightly. Maybe using the getrandom crate directly can cut down the dependencies?

    Kixunil commented at 12:00 pm on May 13, 2025:
    Oh, it’s actually very easy to get 8B of non-secure entropy and this is sufficient for masking: std::hash::DefaultHasher::new().finish().to_ne_bytes() So it can cut the dependency entirely.

    maflcko commented at 2:21 pm on May 13, 2025:

    std::hash::DefaultHasher::new().finish()

    Pretty sure this returns a constant. Maybe you wanted to say std::hash::RandomState::new().build_hasher().finish()?


    Kixunil commented at 8:33 pm on May 13, 2025:
    As I understand it it returns different values across restarts (which suffices here) but I may be wrong. The doc is confusing. Anyway, your solution doesn’t hurt.

    maflcko commented at 9:27 pm on May 13, 2025:

    The source says that this is a constant: https://doc.rust-lang.org/src/std/hash/random.rs.html#109-111

    pub fn new() -> DefaultHasher {
        DefaultHasher(SipHasher13::new_with_keys(0, 0))
    }
    

    whereas the one using RandomState uses hashmap_random_keys


    Kixunil commented at 9:31 pm on May 13, 2025:
    OK then, RandomState is definitely needed.

    andrewtoth commented at 10:57 pm on May 17, 2025:
    Done.
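
A dependency-free key generator along the lines agreed above might look like this sketch. `random_key` is a hypothetical name. The import uses the `std::collections::hash_map::RandomState` path, which names the same type as `std::hash::RandomState` and is available on older toolchains. The entropy is not cryptographically secure, which this thread considers sufficient for masking.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Produce an 8-byte obfuscation key from the standard library's
/// randomly seeded hasher state.
pub fn random_key() -> [u8; 8] {
    loop {
        let key = RandomState::new().build_hasher().finish().to_ne_bytes();
        // Retry if the first four bytes are all zero, since such a key
        // would leave the magic bytes unchanged.
        if key[..4] != [0u8; 4] {
            return key;
        }
    }
}
```

`DefaultHasher::new()` would not work here because, as quoted above, it is constructed with fixed keys.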
  22. maflcko commented at 8:25 am on May 13, 2025: member
    I wonder if -reindex could be taught to do this instead. This may be simpler for users to set. Also, it will be atomic, safe and interruptible. However, it will eat more CPU and thus take a longer time. I’d say this is probably fine, because the only users will be one-time users that have an old leftover datadir. The benefit on top would be that anyone doing a reindex for other reasons, likely will get an obfuscation “for free”.
  23. in contrib/xor-blocks/src/main.rs:41 in dc41366454 outdated
    36+    if !fs::exists(&xor_path)? {
    37+        println!("No xor.dat file. Make sure you are running Bitcoin Core v28 or higher.");
    38+        return Ok(());
    39+    }
    40+
    41+    println!("Xor'ing blocks dir. Do not start bitcoind until finished!");
    


    l0rinc commented at 11:37 am on May 13, 2025:
    xor is an implementation detail - we could make it slightly more user friendly and call it obfuscation instead

    andrewtoth commented at 10:56 pm on May 17, 2025:
    Replaced all logs of xor with obfuscation.
  24. in contrib/xor-blocks/src/main.rs:120 in dc41366454 outdated
    115+        return Ok(());
    116+    }
    117+
    118+    let mut block = fs::read(path)?;
    119+    block
    120+        .iter_mut()
    


    l0rinc commented at 11:45 am on May 13, 2025:
    would it help if we used .par_iter() here instead (not sure if that would require adding a new lib or not, but seems like a naturally parallelizable task)?

    Kixunil commented at 12:01 pm on May 13, 2025:
    Requires the rayon dependency. But I strongly suspect the entire thing is bottlenecked on IO.
  25. in contrib/xor-blocks/src/main.rs:38 in dc41366454 outdated
    33+        .collect::<Result<Vec<_>, io::Error>>()?;
    34+
    35+    let xor_path: std::path::PathBuf = blocks_path.join("xor.dat");
    36+    if !fs::exists(&xor_path)? {
    37+        println!("No xor.dat file. Make sure you are running Bitcoin Core v28 or higher.");
    38+        return Ok(());
    


    yancyribbens commented at 5:04 pm on May 13, 2025:
    This should also return a non-zero exit code and error.

    andrewtoth commented at 10:56 pm on May 17, 2025:
    Done.
  26. andrewtoth force-pushed on May 17, 2025
  27. andrewtoth commented at 10:55 pm on May 17, 2025: contributor

    Thank you @l0rinc @Sjors @laanwj @LarryRuane @1440000bytes @Kixunil @maflcko @yancyribbens for your reviews and suggestions!

    The latest version changes the following:

    • data is XOR’d 16 bytes at a time using the doubled key as a u128. This is done using pointer casting and unsafe dereferencing, so we don’t have to copy the data back and forth when converting [u8; 16] to u128.
    • A custom datadir option is now passed as -datadir=, similar to bitcoind.
    • Files are read 16 bytes at a time and the bytes are written to a temporary file immediately after XORing. This substantially reduces memory usage.
    • File::sync_data() is called before renaming to make the writing atomic.
    • The xor.dat key file is only written when overwriting a zero key, not every time the program is run. File::sync_data() is also called after writing.
    • The rand crate has been removed in favor of std::hash::RandomState. This removes all dependencies from Cargo.toml.
    • Errors are returned in appropriate places where success was returned previously.
    • Various other improvements to syntax.
  28. andrewtoth commented at 11:16 pm on May 17, 2025: contributor

    Consider obtaining the .lock file in the data directory in the same way bitcoind does. This will prevent bitcoind from starting if this tool is running, and vice versa.

    @LarryRuane great suggestion. Unfortunately file locking is still experimental in Rust (?), so users would have to run nightly to be able to run the program. I think we should wait until it’s stabilized instead of pulling in third party deps to do the locking.

    It seems better if we track the height at which xor starts

    i really like this idea. “Start XORing from now on” as a default would be enough to prevent new problems, and potentially there could be a background process that updates historical blocks (which would be way less performance critical as no one is waiting for it).

    That said it’s more complicated and more error-prone, i understand why the decision was made to do all or nothing.

    @Sjors @laanwj There are many edge cases to track here. If you had to service an RPC or p2p request for a block from a pre-xor’d file, you’d have to switch on the fly to xoring vs not. Similar if you got reorged from a block in an xor’d file to not. If you requested a pruned block from a peer and now store it in a newly xor’d file.

    I think all or nothing is a more sane approach for something that should only be a temporary solution (ideally everyone resyncs at some point).

    Is it possible to add this feature in bitcoin core itself?

    @1440000bytes Yes, I suppose it’s possible. It would be similar to how the UTXO db was migrated from per-tx values to per-output. It was a 10 minute or so migration when the UTXO set was much smaller. This tool unfortunately will take a while longer (20 minutes or so on my NVME, many hours for spinning disk drives).

    I wonder if -reindex could be taught to do this instead.

    @maflcko That would be convenient, but a reindex is a much longer task than this tool. So I don’t think “instead”, but rather “also”.

  29. andrewtoth force-pushed on May 17, 2025
  30. andrewtoth force-pushed on May 18, 2025
  31. andrewtoth force-pushed on May 18, 2025
  32. in contrib/xor-blocks/src/main.rs:173 in 15aeb390b8 outdated
    168+        .create(true)
    169+        .truncate(true)
    170+        .open(&tmp_path)?;
    171+    let mut writer = BufWriter::new(file);
    172+
    173+    let key_u128 = unsafe { *(&key as *const _ as *const u128) };
    


    maflcko commented at 3:04 pm on May 18, 2025:
    are you sure this is always safe from an alignment perspective? (IIUC bytes don’t have to be aligned, but u128 may have to). So it seems safer to make the storage u128 and then cast the other way (to u8)?

    andrewtoth commented at 4:19 pm on May 18, 2025:
    Right, a byte array is alignment of 1, u128 is 16. I removed the unsafe cast for the key, but we still use unsafe for casting the buffer. It seems to always be at an address aligned to 16, but not sure how we can guarantee that.

    andrewtoth commented at 4:55 pm on May 18, 2025:
    Ah I figured it out. Declare a u128 first then point to it as the buffer.
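
For comparison, the alignment concern can also be avoided without any `unsafe` by converting through byte arrays. This is an alternative sketch, not the approach the PR settled on (which declares a `u128` and uses it as the buffer). `double_key` and `xor_block16` are hypothetical names.

```rust
/// Repeat the 8-byte key twice to form a 16-byte word.
pub fn double_key(key: [u8; 8]) -> u128 {
    let mut wide = [0u8; 16];
    wide[..8].copy_from_slice(&key);
    wide[8..].copy_from_slice(&key);
    u128::from_ne_bytes(wide)
}

/// XOR one 16-byte block with the doubled key. The conversions copy by
/// value, so no pointer cast and no alignment requirement is involved.
pub fn xor_block16(block: &mut [u8; 16], key: u128) {
    let xored = u128::from_ne_bytes(*block) ^ key;
    *block = xored.to_ne_bytes();
}
```

Whether the by-value copies cost anything measurable compared with the pointer-based version would need benchmarking.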
  33. andrewtoth force-pushed on May 18, 2025
  34. contrib: add xor-blocks tool to obfuscate blocks directory
    Co-Authored-By: Larry Ruane <larryruane@gmail.com>
    0c23055d77
  35. andrewtoth force-pushed on May 18, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-05-24 21:12 UTC
