addrman: Avoid crash on corrupt data, Force Check after deserialize #22734

pull MarcoFalke wants to merge 4 commits into bitcoin:master from MarcoFalke:2108-addrmanNoCrash changing 6 files +46 −49
  1. MarcoFalke commented at 7:16 am on August 18, 2021: member

    Assert should only be used for program internal logic errors, not to sanitize external user input.

    The assert was introduced via the debug-only runtime option -checkaddrman in commit 803ef70fd9f65ef800567ff9456fac525bc3e3c2, thus won’t need a backport.

    Also, it doesn’t really make sense to continue when the deserialized addrman doesn’t pass the sanity check.

    For example, if nLastSuccess is negative, it would later result in integer overflows. Thus, this patch fixes #22931.

    Also, Fixes #22503 Fixes #22504 Fixes #22519

    Closes #22498

    Steps to test:

    0mkdir -p /tmp/test_235/regtest/
    1echo 'H4sIAAAAAAAAA/u1f+stZmUGYgELgwPRakfBKBgFo2AUjIJRMApGwSgYBaNgFIyCUTBswdyGpFnLjUKjP9e0bvjYusl6b+L2e7Vs2dd6N//Pua0/xQUALJAn93IQAAA=' | base64 --decode | zcat > /tmp/test_235/regtest/peers.dat
    2./src/qt/bitcoin-qt -regtest -datadir=/tmp/test_235/ -checkaddrman=1 -printtoconsole | grep -A2 'Loading P2P addresses'
    

    Output before:

    02021-09-10T11:28:37Z init message: Loading P2P addresses
    12021-09-10T11:28:37Z ADDRMAN CONSISTENCY CHECK FAILED!!! err=-16
    2bitcoin-qt: addrman.cpp:765: void CAddrMan::Check() const: Assertion `false' failed.
    3
    4(program crashes)
    

    Output after:

    02021-09-10T11:26:00Z init message: Loading P2P addresses…
    12021-09-10T11:26:00Z Error: Invalid or corrupt peers.dat (Corrupt data. Consistency check failed with code -16: iostream error). If you believe this is a bug, please report it to https://github.com/bitcoin/bitcoin/issues. As a workaround, you can move the file ("/tmp/test_235/regtest/peers.dat") out of the way (rename, move, or delete) to have a new one created on the next start.
    2
    3(program exits)
    
  2. fanquake added the label P2P on Aug 18, 2021
  3. mzumsande commented at 8:27 am on August 18, 2021: member

    I agree that corrupt data shouldn’t assert.

    Check() used to be restricted to tests / use of the checkaddrman option, it was never run per default. This PR makes it run always for Unserialize(), so it introduces stricter rules for the level of inconsistencies we tolerate from peers.dat. I think this change of default behavior should be mentioned in the OP and maybe somewhere in the code, because it also introduces new possibilities of p2p addrman poisoning to make us drop our peers.dat - so we should be confident all checks in Check() are correct before enabling this.

  4. MarcoFalke renamed this:
    addrman: Avoid crash on corrupt data
    addrman: Avoid crash on corrupt data, Run Check after deserialize
    on Aug 18, 2021
  5. MarcoFalke renamed this:
    addrman: Avoid crash on corrupt data, Run Check after deserialize
    addrman: Avoid crash on corrupt data, Force Check after deserialize
    on Aug 18, 2021
  6. MarcoFalke commented at 8:30 am on August 18, 2021: member
    Thanks, renamed PR title
  7. mzumsande commented at 8:41 am on August 18, 2021: member
    Thanks! I think there is a bug in the addman_serdeser fuzz test (see CI fail) that would need fixing, see https://github.com/mzumsande/bitcoin/commit/f4840614c0b945460a833ba8e9d0da2df3ce80c3
  8. jnewbery commented at 9:22 am on August 18, 2021: member

    With this change, if we have a bug that causes us to write a peers.dat file that’s corrupt in any way, then it would mostly be undetected - the addrman would be cleared and startup would continue as normal. As @mzumsande points out, if it’s possible to trigger such a bug over p2p, then an adversary could cause victims to lose their entire addrman on restart.

    It might be safer to do the ForceCheck, but then catch the exception, rename peers.dat to peers.bak, and terminate with some error message like “Corrupt peers.dat file found. addrman database has been wiped and peers.dat has been renamed to peers.bak. Please report this to the Bitcoin Core developers.” That way, we’d get bug reports if there was a bug in our serialization code, and there may be a way to salvage the peers.bak file if needed.

  9. in src/addrman.h:475 in fab42d7dab outdated
    467@@ -468,7 +468,12 @@ class CAddrMan
    468             LogPrint(BCLog::ADDRMAN, "addrman lost %i new and %i tried addresses due to collisions or invalid addresses\n", nLostUnk, nLost);
    469         }
    470 
    471-        Check();
    472+        const int check_code{ForcedCheck()};
    473+        if (check_code != 0) {
    474+            throw std::ios_base::failure(strprintf(
    475+                "Corrupt data. Consistency check failed with code %s.",
    476+                check_code));
    


    vasild commented at 9:27 am on August 18, 2021:

    nit: maybe mention “addrman” somewhere in this message. The following looks too generic:

    02021-08-18T07:12:57Z ERROR: DeserializeDB: Deserialize or I/O error - Corrupt data. Consistency check failed with code -16.: iostream error
    

    nit: remove the dot after %s: with code -16.: iostream error


    MarcoFalke commented at 11:21 am on September 10, 2021:

    The new failure mentions the file name peers.dat. Is this enough?

    0Error: Invalid or corrupt peers.dat (Corrupt data. Consistency check failed with code -16.: iostream error). If you believe this is a bug, please report it to https://github.com/bitcoin/bitcoin/issues. As a workaround, you can move the file ("/tmp/bitcoin_func_test_ybw3djdd/node0/regtest/peers.dat") out of the way (rename, move, or delete) to have a new one created on the next start.
    

    MarcoFalke commented at 11:25 am on September 10, 2021:
    Thanks, removed dot.

    vasild commented at 11:37 am on September 10, 2021:
    peers.dat looks even better than addrman.
  10. vasild commented at 9:37 am on August 18, 2021: member

    Concept ACK fab42d7dab

    I think that the check should also be run (unconditionally) before serialize and if a problem is detected then don’t write the corrupted stuff on disk.

    Keeping the corrupted peers.dat on unserialize seems like a reasonable thing to do, but right now there are no tools to salvage it, so it would be useless. Maybe just stop the startup and ask the user to remove (or rename) the damaged peers.dat?

    Notice - if it is possible to remotely corrupt peers.dat, then currently master is even worse than this PR - reading a corrupted peers.dat and keep operating with a “corrupted” addrman could lead to undefined behavior.

  11. naumenkogs commented at 6:22 pm on August 20, 2021: member

    With this change, if we have a bug that causes us to write a peers.dat file that’s corrupt in any way, then it would mostly be undetected - the addrman would be cleared and startup would continue as normal. As @mzumsande points out, if it’s possible to trigger such a bug over p2p, then an adversary could cause victims to lose their entire addrman on restart.

    It might be safer to do the ForceCheck, but then catch the exception, rename peers.dat to peers.bak, and terminate with some error message like “Corrupt peers.dat file found. addrman database has been wiped and peers.dat has been renamed to peers.bak. Please report this to the Bitcoin Core developers.” That way, we’d get bug reports if there was a bug in our serialization code, and there may be a way to salvage the peers.bak file if needed.

    I agree with John.

  12. DrahtBot commented at 2:22 am on August 21, 2021: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #22974 (addrman: Improve performance of Good by mzumsande)
    • #22950 ([p2p] Pimpl AddrMan to abstract implementation details by amitiuttarwar)
    • #22910 ([RFC] Encapsulate asmap in NetGroupManager by jnewbery)
    • #22872 (log: improve checkaddrman logging with duration in milliseconds by jonatack)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  13. MarcoFalke commented at 12:45 pm on August 21, 2021: member
    I agree with @vasild that this is an unrelated bug that already exists. I’ve created a separate pull to fix that: #22762
  14. MarcoFalke added the label Waiting for author on Aug 21, 2021
  15. DrahtBot added the label Needs rebase on Sep 1, 2021
  16. MarcoFalke removed the label Waiting for author on Sep 1, 2021
  17. MarcoFalke added the label Waiting for author on Sep 10, 2021
  18. MarcoFalke force-pushed on Sep 10, 2021
  19. MarcoFalke force-pushed on Sep 10, 2021
  20. DrahtBot removed the label Needs rebase on Sep 10, 2021
  21. in src/addrman.cpp:393 in 4fd5305667 outdated
    384@@ -385,7 +385,12 @@ void CAddrMan::Unserialize(Stream& s_)
    385         LogPrint(BCLog::ADDRMAN, "addrman lost %i new and %i tried addresses due to collisions or invalid addresses\n", nLostUnk, nLost);
    386     }
    387 
    388-    Check();
    389+    const int check_code{ForceCheckAddrman()};
    390+    if (check_code != 0) {
    391+        throw std::ios_base::failure(strprintf(
    392+            "Corrupt data. Consistency check failed with code %s",
    393+            check_code));
    


    vasild commented at 8:01 am on September 15, 2021:

    It would be nice if this mentions “addrman” or “peers.dat”. Without it the error message looks too generic, it is unclear where the problem is and the user may freak out that the “corrupt data” applies to the wallet:

    terminate called after throwing an instance of ‘std::ios_base::failure[abi:cxx11]’ what(): Corrupt data. Consistency check failed with code -1: iostream error


    MarcoFalke commented at 8:30 am on September 15, 2021:

    Can you explain what code path didn’t catch the exception? Raw exceptions shouldn’t be displayed to the user at all. See also #22734#issue-714811417 on steps to test this pull request.

    (Or see the attached test to reproduce)


    vasild commented at 9:01 am on September 15, 2021:
    Uff, sorry, I remember this was discussed before. I was misled somehow by the fuzzer output (quoted above).

    MarcoFalke commented at 9:02 am on September 15, 2021:
    No worries. Resolving for now.
  22. vasild approved
  23. vasild commented at 8:05 am on September 15, 2021: member

    ACK 4fd53056672c227d37bccff0d0d260febf590f8f

    CAddrMan::Unserialize() already, before this PR, could throw if it encounters a corrupted peers.dat. I think it is logical to do the same if the consistency checks fail after the unserialize process has finished “successfully”. Also, it is important enough and not too expensive to do the consistency checks always, unconditionally, regardless of -checkaddrman at the end of unserialize.

    (the fuzzer is upset, probably needs some petting)

  24. MarcoFalke removed the label Waiting for author on Sep 15, 2021
  25. MarcoFalke force-pushed on Sep 15, 2021
  26. MarcoFalke commented at 9:00 am on September 15, 2021: member

    (the fuzzer is upset, probably needs some petting)

    Thanks. Force pushed to pet the fuzzer and fixup the docs. (no other code changes)

  27. vasild approved
  28. vasild commented at 10:00 am on September 15, 2021: member
    ACK fa6f5c9ea4dad2e15065001fef9ae6a9325c632b
  29. MarcoFalke commented at 10:11 am on September 15, 2021: member
    @mzumsande @naumenkogs @jnewbery Now that #22762 is merged, I rebased this pull. This should address your concerns.
  30. naumenkogs commented at 12:06 pm on September 15, 2021: member
    ACK fa6f5c9ea4dad2e15065001fef9ae6a9325c632b
  31. DrahtBot added the label Needs rebase on Sep 20, 2021
  32. move-only: Move CAddrMan::Check to cpp file
    This speeds up compilation of the whole program because the included
    header file is smaller.
    
    Can be reviewed with --color-moved=dimmed-zebra --color-moved-ws=ignore-all-space
    fae5c633dc
  33. Refactor: Turn the internal addrman check helper into a forced check fa298971e6
  34. addrman: Replace assert with throw on corrupt data
    Assert should only be used for program internal logic errors, not to
    sanitize external user input.
    fa7a883f5a
  35. fuzz: Move all addrman fuzz targets to one file
    Can be reviewed with --color-moved=dimmed-zebra
    fa3669f72f
  36. MarcoFalke force-pushed on Sep 21, 2021
  37. DrahtBot removed the label Needs rebase on Sep 21, 2021
  38. MarcoFalke commented at 10:45 am on September 21, 2021: member
    Rebased; Only trivial conflicts in the tests.
  39. naumenkogs commented at 10:59 am on September 21, 2021: member
    ACK fa3669f72f69662049b55ad1a482b4a0f9f7ae40
  40. vasild approved
  41. vasild commented at 12:03 pm on September 21, 2021: member
    ACK fa3669f72f69662049b55ad1a482b4a0f9f7ae40
  42. jnewbery commented at 3:48 pm on September 21, 2021: member
    Code review ACK fa3669f72f69662049b55ad1a482b4a0f9f7ae40
  43. MarcoFalke merged this on Sep 21, 2021
  44. MarcoFalke closed this on Sep 21, 2021

  45. MarcoFalke deleted the branch on Sep 21, 2021
  46. sidhujag referenced this in commit 43bc87779b on Sep 21, 2021
  47. fanquake referenced this in commit b8cca9cac7 on Sep 23, 2021
  48. sidhujag referenced this in commit efc24d069b on Sep 24, 2021
  49. Fabcien referenced this in commit e6ed010a5e on Oct 19, 2022
  50. DrahtBot locked this on Oct 30, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-19 03:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me