blockstorage: XOR blocksdir *.dat files #28052

pull maflcko wants to merge 2 commits into bitcoin:master from maflcko:2306-fs_stuff- changing 9 files +89 −12
  1. maflcko commented at 8:07 am on July 8, 2023: member

    Currently the *.dat files in the blocksdir store the data received from remote peers as-is. This may be problematic when a program other than Bitcoin Core tries to interpret them by accident. For example, an anti-virus program or other program may scan them and move them into quarantine, or delete them, or corrupt them. This may cause Bitcoin Core to fail a reorg, or fail to reply to block requests (via P2P, RPC, REST, …).

    Fix this, similar to #6650, by rolling a random XOR pattern over the dat files when writing or reading them.

    Obviously this can only protect against programs that accidentally and unintentionally are trying to mess with the dat files. Any program that intentionally wants to mess with the dat files can still trivially do so.

    The XOR pattern is only applied when the blocksdir is freshly created, and there is an option to disable it (on creation), so that people can disable it, if needed.

  2. DrahtBot commented at 8:07 am on July 8, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK willcl-ark, TheCharlatan
    Concept ACK jamesob, sipa, dergoegge, darosior, kristapsk
    Stale ACK pablomartin4btc

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #29817 (kernel: De-globalize fReindex by TheCharlatan)
    • #28687 (C++20 std::views::reverse by stickies-v)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. maflcko renamed this:
    XOR blocksdir *.dat files
    blockstorage: XOR blocksdir *.dat files
    on Jul 8, 2023
  4. DrahtBot added the label Block storage on Jul 8, 2023
  5. DrahtBot added the label CI failed on Jul 8, 2023
  6. maflcko force-pushed on Jul 8, 2023
  7. jamesob commented at 5:19 pm on July 8, 2023: member
    Concept ACK. Certainly worth doing.
  8. sipa marked this as ready for review on Jul 8, 2023
  9. sipa commented at 5:28 pm on July 8, 2023: member

    Oops, I clicked wrong.

    I just meant to Concept ACK.

  10. maflcko marked this as a draft on Jul 9, 2023
  11. luke-jr commented at 6:02 pm on July 9, 2023: member
    AFAIK there is software which reads these files intentionally. IMO they can be expected to adapt, but having the option to disable it seems like a good idea in case they aren’t ready right away.
  12. TheCharlatan commented at 8:30 am on July 10, 2023: contributor
    Strong Concept ACK.
  13. maflcko force-pushed on Jul 10, 2023
  14. maflcko commented at 10:21 am on July 10, 2023: member

    Thanks for the feedback so far. I’ve kept the option to disable, but made it a simple bool instead of allowing the user to pick the XOR-key.

    If someone is interested, a new commit can be added to upgrade the linearize scripts to read the XOR-key.

  15. DrahtBot added the label Needs rebase on Jul 11, 2023
  16. maflcko force-pushed on Jul 12, 2023
  17. DrahtBot removed the label Needs rebase on Jul 12, 2023
  18. dergoegge commented at 12:19 pm on July 13, 2023: member
    Concept ACK
  19. DrahtBot added the label Needs rebase on Jul 31, 2023
  20. maflcko force-pushed on Aug 1, 2023
  21. maflcko commented at 2:06 pm on August 1, 2023: member
    Rebased on the latest #https://github.com/bitcoin/bitcoin/pull/28060
  22. DrahtBot removed the label Needs rebase on Aug 1, 2023
  23. fanquake referenced this in commit ceda819886 on Aug 1, 2023
  24. DrahtBot removed the label CI failed on Aug 1, 2023
  25. in test/functional/feature_reindex.py:44 in f0d2c3f73e outdated
    39@@ -39,9 +40,18 @@ def out_of_order(self):
    40         # we're generating them rather than getting them from peers), so to
    41         # test out-of-order handling, swap blocks 1 and 2 on disk.
    42         blk0 = self.nodes[0].blocks_path / "blk00000.dat"
    43+        with open(self.nodes[0].blocks_path / "xor.key", "rb") as xor_f:
    44+            xor_dat = deser_string(xor_f)
    


    maflcko commented at 4:39 pm on August 1, 2023:

    I guess it may be simpler to serialize and deserialize the xor key as a raw byte span (without the size indicator). This should make it even easier to read in external scripts, for example in vanilla python.

    Edit: And readers can assume it to be constant size. For example:

    0NUM_XOR_BYTES{8}
    

    maflcko commented at 8:06 am on August 2, 2023:
    Done
  26. maflcko force-pushed on Aug 1, 2023
  27. darosior commented at 7:23 am on August 2, 2023: member
    Concept ACK.
  28. maflcko marked this as ready for review on Aug 2, 2023
  29. maflcko force-pushed on Aug 2, 2023
  30. maflcko force-pushed on Aug 2, 2023
  31. maflcko commented at 8:25 am on August 2, 2023: member
    Rebased, and made every commit a refactor, except for the one that has release notes and test changes :)
  32. willcl-ark commented at 1:53 pm on August 2, 2023: member

    Concept ACK.

    I’ve not reviewed the code in detail yet, but this is a nice followup to #28060.

    It seems to work well so far. I wrote a quick-n-dirty python script to manually xor my data dir using the same 8 byte key from this pull, and Bitcoin Core at commit fa3c1d230a didn’t have any problems reading the xored blocks or key afterwards.

  33. sidhujag referenced this in commit 3a522e929e on Aug 9, 2023
  34. DrahtBot added the label CI failed on Sep 10, 2023
  35. maflcko force-pushed on Sep 11, 2023
  36. kristapsk commented at 11:29 am on September 11, 2023: contributor
    Concept ACK
  37. DrahtBot removed the label CI failed on Sep 11, 2023
  38. fanquake referenced this in commit 8209e86eeb on Sep 14, 2023
  39. DrahtBot added the label Needs rebase on Sep 14, 2023
  40. maflcko force-pushed on Sep 14, 2023
  41. DrahtBot removed the label Needs rebase on Sep 14, 2023
  42. DrahtBot added the label CI failed on Sep 14, 2023
  43. maflcko force-pushed on Sep 14, 2023
  44. maflcko commented at 3:14 pm on September 14, 2023: member
    Given that most of this is refactoring, which makes sense on its own, I’ve split that up into #28483 and turned this pull into a draft for now.
  45. maflcko marked this as a draft on Sep 14, 2023
  46. DrahtBot removed the label CI failed on Sep 14, 2023
  47. fanquake referenced this in commit c9f288244b on Sep 26, 2023
  48. DrahtBot added the label Needs rebase on Sep 26, 2023
  49. maflcko marked this as ready for review on Sep 28, 2023
  50. maflcko force-pushed on Sep 28, 2023
  51. DrahtBot added the label CI failed on Sep 28, 2023
  52. DrahtBot removed the label Needs rebase on Sep 28, 2023
  53. DrahtBot removed the label CI failed on Sep 28, 2023
  54. maflcko force-pushed on Oct 4, 2023
  55. pablomartin4btc commented at 7:55 pm on October 26, 2023: member

    tACK fa9f3f70011842e2c4358d15ab4639734d976f5c

     02023-10-26T18:07:06Z init message: Verifying blocks…
     12023-10-26T18:07:06Z Verifying last 6 blocks at level 3
     22023-10-26T18:07:06Z Verification progress: 0%
     32023-10-26T18:07:06Z ERROR: ReadBlockFromDisk: Deserialize or I/O error - ReadCompactSize(): size too large: iostream error at FlatFilePos(nFile=5, nPos=62189381)
     42023-10-26T18:07:06Z Verification error: ReadBlockFromDisk failed at 160000, hash=0000003ca3c99aff040f2563c2ad8f8ec88bd0fd6b8f0895cfaf1ef90353a62c
     52023-10-26T18:07:06Z : Corrupted block database detected.
     6Please restart with -reindex or -reindex-chainstate to recover.
     7: Corrupted block database detected.
     8Please restart with -reindex or -reindex-chainstate to recover.
     92023-10-26T18:07:06Z Aborted block database rebuild. Exiting.
    102023-10-26T18:07:06Z Shutdown: In progress...
    112023-10-26T18:07:06Z scheduler thread exit
    122023-10-26T18:07:06Z Flushed fee estimates to fee_estimates.dat.
    132023-10-26T18:07:06Z Shutdown: done
    

    2023-10-26T18:12:03Z Using xor key for *.dat files: '0000000000000000'

    Now, considering the possibility of the XOR key file getting corrupted (for example, accidentally opening it with an app that offers to ‘fix’ a corruption) or being deleted, especially on a fresh run when a random XOR key is generated (which I’ve also tested), wouldn’t it be better to save the key in the .conf or settings?

    Before having that thought, I was considering proposing to give the option to the user to specify the filename, but that would add complexity, and I’m not sure if it’s worth it.

  56. DrahtBot requested review from sipa on Oct 26, 2023
  57. DrahtBot requested review from jamesob on Oct 26, 2023
  58. DrahtBot requested review from willcl-ark on Oct 26, 2023
  59. DrahtBot requested review from dergoegge on Oct 26, 2023
  60. DrahtBot requested review from darosior on Oct 26, 2023
  61. DrahtBot requested review from TheCharlatan on Oct 26, 2023
  62. maflcko commented at 8:23 am on October 27, 2023: member

    wouldn’t it be better to save the key in the .conf or settings?

    The key is not a configuration or setting that is supposed to be modified by a user or that can be changed at all. Once it is persisted to storage, it is set in stone for all (future) dat files.

  63. DrahtBot added the label Needs rebase on Oct 30, 2023
  64. maflcko force-pushed on Oct 30, 2023
  65. DrahtBot removed the label Needs rebase on Oct 30, 2023
  66. pablomartin4btc commented at 4:48 pm on October 30, 2023: member
    re ACK fa6b180beb20d64962c2b5440eeacb31ed889a81
  67. maflcko force-pushed on Nov 22, 2023
  68. lrs955 commented at 12:31 pm on November 23, 2023: none

    why not save the xor key inside the leveldb block database?

    wouldn’t it be better to save the key in the .conf or settings?

    The key is not a configuration or setting that is supposed to be modified by a user or that can be changed at all. Once it is persisted to storage, it is set in stone for all (future) dat files.

  69. maflcko commented at 12:34 pm on November 23, 2023: member

    why not save the xor key inside the leveldb block database?

    Just seems more complicated with no benefit?

  70. lrs955 commented at 6:17 pm on November 23, 2023: none

    why not save the xor key inside the leveldb block database?

    Just seems more complicated with no benefit?

    I understand. Maybe im wrong but isn’t also the xor pattern of the chainstate saved in the database itself? Different leveldb database but still in a key a database.

  71. DrahtBot added the label CI failed on Dec 13, 2023
  72. DrahtBot removed the label CI failed on Dec 14, 2023
  73. DrahtBot added the label Needs rebase on Dec 14, 2023
  74. Return XOR CAutoFile from BlockManager::Open*File()
    This is a refactor, because the XOR key is empty.
    fa333f733d
  75. maflcko force-pushed on Mar 22, 2024
  76. DrahtBot removed the label Needs rebase on Mar 22, 2024
  77. willcl-ark commented at 2:15 pm on March 22, 2024: member
    While you’re re-touching, I think it could be worth adding an entry to doc/files.md too, especially as losing/not backing-up this file results in requiring a full reindex?
  78. maflcko commented at 3:03 pm on March 22, 2024: member

    especially as losing/not backing-up this file results in requiring a full reindex?

    I don’t think this is true. The xor key can trivially be restored by xor-ing the first 8 bytes of the first block file with the first 8 bytes of the first block file of a datadir created with the xor disabled. This works, because the first block written should be the genesis block, always.

    In python:

    0>>> bytes(a ^ b for a,b in zip(open('blk00000.dat', 'rb').read()[:8], open('clean-blk00000.dat', 'rb').read()[:8])).hex()
    1'228b70bf9c1abb3f'
    2>>> open('xor.key', 'rb').read().hex()  # double-check
    3'228b70bf9c1abb3f'
    
  79. Add -blocksxor boolean option faaf3cf535
  80. maflcko force-pushed on Mar 22, 2024
  81. maflcko commented at 4:12 pm on March 22, 2024: member

    doc/files.md

    Done

  82. in src/node/blockstorage.cpp:1174 in faaf3cf535
    1169+    if (!fs::is_empty(opts.blocks_dir) || opts.xor_key == false) std::fill(xor_key.begin(), xor_key.end(), std::byte{});
    1170+    const fs::path xor_key_path{opts.blocks_dir / "xor.dat"};
    1171+    if (fs::exists(xor_key_path)) {
    1172+        // A pre-existing xor key file has priority.
    1173+        AutoFile xor_key_file{fsbridge::fopen(xor_key_path, "rb")};
    1174+        xor_key_file >> xor_key;
    


    willcl-ark commented at 9:25 am on May 1, 2024:

    Jjust noting here that if the user specifies -blocksxor=01234567 and a file-based key already exists in the datadir, the key from the file is silently used.

    This is the correct behaviour IMO, but wonder if we should log that the user-set option has been overridden?

    As @maflcko pointed out to me previously, even if a user mistakenly thought they were using a custom xor key set as a CL option and removed the key file, they could still restore it. So I don’t see any dangers with this current behaviour.


    maflcko commented at 9:52 am on May 1, 2024:

    -blocksxor=01234567

    The option is boolean, so I guess it could throw here. In any case, the setting won’t have an effect on an existing blocksdir, even on -reindex. Maybe the manpage can be adjusted to clarify this?

  83. willcl-ark approved
  84. willcl-ark commented at 9:38 am on May 1, 2024: member

    ACK faaf3cf535999c2c9a84337414c8b35435384862

    Nice change with the same rationale as the previous mempool XOR changes (preventing anti-virus falsely flagging/modifying block files).

    One non-blocking comment left as a result of manual testing.

  85. DrahtBot requested review from pablomartin4btc on May 1, 2024
  86. in test/functional/feature_reindex.py:42 in faaf3cf535
    38@@ -39,9 +39,19 @@ def out_of_order(self):
    39         # we're generating them rather than getting them from peers), so to
    40         # test out-of-order handling, swap blocks 1 and 2 on disk.
    41         blk0 = self.nodes[0].blocks_path / "blk00000.dat"
    42+        with open(self.nodes[0].blocks_path / "xor.dat", "rb") as xor_f:
    


    TheCharlatan commented at 3:05 pm on May 11, 2024:
    Just a comment: This test implicitly checks if the blockfiles are XORed. Would a simple additional check on the undo data make sense?
  87. in src/node/blockstorage.cpp:1188 in faaf3cf535
    1183+            strprintf("The blocksdir XOR-key can not be disabled when a random key was already stored! "
    1184+                      "Stored key: '%s', stored path: '%s'.",
    1185+                      HexStr(xor_key), fs::PathToString(xor_key_path)),
    1186+        };
    1187+    }
    1188+    LogInfo("Using xor key for *.dat files: '%s'\n", HexStr(xor_key));
    


    TheCharlatan commented at 7:24 pm on May 11, 2024:

    Nit: This prints as:

    0Using xor key for *.dat files: '0000000000000000'`
    

    But the log lines informing the user on data obfuscation in the other DBs use:

    0Using obfuscation key for /home/drgrid/.bitcoin/blocks/index: 0000000000000000
    1Using obfuscation key for /home/drgrid/.bitcoin/chainstate: a3b154cf1d9714d7
    

    I think it would be nice to be consistent here, both by listing the full directory and using the same text.

  88. in src/node/blockstorage.cpp:1169 in faaf3cf535
    1164+    std::array<std::byte, 8> xor_key{};
    1165+
    1166+    // Initialize random fresh key, ...
    1167+    FastRandomContext{}.fillrand(xor_key);
    1168+    // ... but allow legacy or user-disabled key.
    1169+    if (!fs::is_empty(opts.blocks_dir) || opts.xor_key == false) std::fill(xor_key.begin(), xor_key.end(), std::byte{});
    


    TheCharlatan commented at 7:48 pm on May 11, 2024:

    Nit: I found this a bit hard to follow and a bit unfortunate that a new random key would be generated every time whether it were used or not. How about:

     0@@ -1166,4 +1165,0 @@ static auto InitBlocksdirXorKey(const BlockManager::Options& opts)
     1-    // Initialize random fresh key, ...
     2-    FastRandomContext{}.fillrand(xor_key);
     3-    // ... but allow legacy or user-disabled key.
     4-    if (!fs::is_empty(opts.blocks_dir) || opts.xor_key == false) std::fill(xor_key.begin(), xor_key.end(), std::byte{});
     5@@ -1171 +1167,8 @@ static auto InitBlocksdirXorKey(const BlockManager::Options& opts)
     6-    if (fs::exists(xor_key_path)) {
     7+    bool key_exists = fs::exists(xor_key_path);
     8+
     9+    if (!key_exists && opts.xor_key && fs::is_empty(opts.blocks_dir)) {
    10+        // Initialize random fresh key, ...
    11+        FastRandomContext{}.fillrand(xor_key);
    12+    }
    13+
    14+    if (key_exists) {
    
  89. TheCharlatan approved
  90. TheCharlatan commented at 7:50 pm on May 11, 2024: contributor
    ACK faaf3cf535999c2c9a84337414c8b35435384862
  91. DrahtBot commented at 11:56 am on May 20, 2024: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

  92. DrahtBot added the label Needs rebase on May 20, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-06-29 07:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me