Fix de-serialization bug where AddrMan is left corrupted #7696

pull EthanHeilman wants to merge 1 commits into bitcoin:master from EthanHeilman:bug changing 4 files +146 −0
  1. EthanHeilman commented at 2:40 am on March 16, 2016: contributor

    CAddrDB::Read is used to manage the loading of AddrMan from peers.dat. As shown in the code below, when CAddrDB::Read catches an exception from the de-serialization code it returns addrman “as-is”, despite the fact that it failed to load correctly.

     0    try {
     1        // de-serialize file header (network specific magic number) and ..
     2        ssPeers >> FLATDATA(pchMsgTmp);
     3
     4        // ... verify the network matches ours
     5        if (memcmp(pchMsgTmp, Params().MessageStart(), sizeof(pchMsgTmp)))
     6            return error("%s: Invalid network magic number", __func__);
     7
     8        // de-serialize address data into one CAddrMan object
     9        ssPeers >> addr;
    10    }
    11    catch (const std::exception& e) {
    12        return error("%s: Deserialize or I/O error - %s", __func__, e.what());
    13    }
    

    https://github.com/bitcoin/bitcoin/blob/master/src/net.cpp#L2330

    This use of a corrupted addrman can cause a bitcoind client to get in a state such that when bitcoind starts it will not run correctly. Once bitcoind gets into such a state the only fix is for the user to manually delete the offending peers.dat file.

    This pull request fixes this behavior so that an exception during the de-serialization process will leave addrman in a clean state. Unittests verify this new behavior.

  2. in src/test/net_tests.cpp: in cb1b0556be outdated
    61+    return CDataStream(vchData, SER_DISK, CLIENT_VERSION);
    62+}
    63+
    64+BOOST_FIXTURE_TEST_SUITE(addrman_tests, BasicTestingSetup)
    65+
    66+BOOST_AUTO_TEST_CASE(caddrdb_deserialiation)
    


    paveljanik commented at 7:43 am on March 16, 2016:
    strange test name ;-) z missing

    EthanHeilman commented at 12:36 pm on March 16, 2016:
    Thanks. I changed it to caddrdb_read, since that is a better descriptor of what I am testing.
  3. in src/Makefile.test.include: in cb1b0556be outdated
    59@@ -60,6 +60,7 @@ BITCOIN_TESTS =\
    60   test/merkle_tests.cpp \
    61   test/miner_tests.cpp \
    62   test/multisig_tests.cpp \
    63+  test/net_tests.cpp \
    


    paveljanik commented at 7:47 am on March 16, 2016:
    You test addrman, not net. Please name the file addrman_tests.cpp.

    EthanHeilman commented at 12:23 pm on March 16, 2016:
    I am testing CAddrDB, which is called from and lives in net.h and net.cpp not CAddrMan.

    jonasschnelli commented at 12:32 pm on March 16, 2016:
    No strong opinion: IMO we don’t need to use an identical filename for what is tested in the test. I think addrman_tests.cpp would be more clear.

    EthanHeilman commented at 12:49 pm on March 16, 2016:

    If someone feels strongly I’m happy to move it to addrman_tests.cpp. I wrote all the tests in addrman_tests and choose to put this test in net_tests since it feels more common to net than addrman.

    Ideally CAddrDB, CBanDB and other files which manage interaction with the data directory should be refactored into a common file/class but that is much larger task (for instance CBanDB is very tightly coupled to net.cpp).

  4. paveljanik commented at 7:49 am on March 16, 2016: contributor

    Travis fails with:

    0test/net_tests.cpp(93): error: in "addrman_tests/caddrdb_deserialiation": check addrman1.size() == 3 has failed
    1test/net_tests.cpp(105): error: in "addrman_tests/caddrdb_deserialiation": check addrman2.size() == 3 has failed
    

    But only in one configuration.

  5. jonasschnelli added the label P2P on Mar 16, 2016
  6. jonasschnelli commented at 12:33 pm on March 16, 2016: contributor
    Concept ACK, more tests are better. Haven’t verified the tests though.
  7. EthanHeilman commented at 3:10 pm on March 16, 2016: contributor
    @paveljanik The non-determinism of the test passing is worrying especially because the test that was failing was not testing new behavior. I haven’t been able to replicate this failure locally, any idea what is happening with it?
  8. EthanHeilman commented at 3:12 pm on March 16, 2016: contributor
    Can someone add the labels bug and data corruption.
  9. laanwj added the label Bug on Mar 16, 2016
  10. laanwj added the label Data corruption on Mar 16, 2016
  11. laanwj commented at 4:20 pm on March 24, 2016: member

    Concept ACK.

    Instead of relying on the output of CAddrDB::Read to be “clean” in case of an error I’d prefer a different approach, though: ignore and reset its output when the function fails. In my experience this is a more robust approach - it’s easy to forget resetting it in some code path. In a lot of APIs the output of failing functions is thus undefined.

  12. EthanHeilman commented at 9:48 pm on March 24, 2016: contributor
    @laanwj I’m not sure I follow, do you want the “clearing” to happen closer to the source of the error? In the serialization code?
  13. laanwj commented at 9:51 am on March 25, 2016: member

    @EthanHeilman At the call site:

    0        if (adb.Read(addrman))
    1            LogPrintf("Loaded %i addresses from peers.dat  %dms\n", addrman.size(), GetTimeMillis() - nStart);
    2        else {
    3            addrman.clear(); // Addrman can be in an inconsistent state after failure, reset it
    4            LogPrintf("Invalid or missing peers.dat; recreating\n");
    5            DumpAddresses();
    6        }
    

    It’s hard to guarantee that adb.Read won’t leave the passed addrman in an inconsistent state after failure in all possible internal code paths. This would at least catch all possible cases.

    The cleanest way if we had a clean slate, I think, would be to have Read return an AddrMan* which is null if an error happened, then instantiate a new one on faiilure. But as we have that global object that’s too impactful.

  14. EthanHeilman commented at 4:27 pm on March 25, 2016: contributor
    @laanwj Good idea, I amended the commit to include addrman.Clear in StartNode’s Read() error condition.
  15. in src/net.cpp: in a7c4eb7ef7 outdated
    2336+}
    2337+
    2338+bool CAddrDB::Read(CAddrMan& addr, CDataStream& ssPeers, uint256& hashIn)
    2339+{
    2340     // verify stored checksum matches input data
    2341     uint256 hashTmp = Hash(ssPeers.begin(), ssPeers.end());
    


    laanwj commented at 4:31 pm on March 25, 2016:
    Could make sense to move the hash check to the outer read function - after all, it’s not part of the inner stream but of the wrapping file. Would simplify testing as there’d be no need to pass hashIn here.

    EthanHeilman commented at 5:07 pm on March 25, 2016:

    My thinking with passing hashIn as a parameter was that it would allow me to write a test on the “inner read” to verify an error is thrown if hashIn is not the same as the hash of ssPeers. Something like:

     0BOOST_AUTO_TEST_CASE(caddrdb_hash)
     1{
     2    CAddrManUncorrupted addrmanUncorrupted;
     3    CService addr1 = CService("250.7.1.1", 8333);
     4    addrmanUncorrupted.Add(CAddress(addr1), CService("252.5.1.1", 8333));
     5
     6    // Test that the de-serialization does not throw an exception.
     7    CDataStream ssPeers1 = AddrmanToStream(addrmanUncorrupted);
     8
     9    addrmanUncorrupted.Add(CAddress(addr3), CService("253.3.3.33", 8333));
    10    CDataStream ssPeersDiff = AddrmanToStream(addrmanUncorrupted);
    11    uint256 badHashIn = Hash(ssPeersDiff.begin(), ssPeersDiff.end());
    12
    13    CAddrDB adb;
    14    CAddrMan addrman;
    15    adb.Read(addrman, ssPeers2, badHashIn); // Assert this throws an exception since the hashes don't match
    16}
    

    but I decided to wait until I had more time to work on it.

    How interested are you in a system for mocking out filesystem objects rather than having to do everything at the stream level?


    laanwj commented at 5:36 pm on March 25, 2016:

    Ok, yes, that makes sense.

    How interested are you in a system for mocking out filesystem objects rather than having to do everything at the stream level?

    Depends on the (non-test) code impact. I’m not entirely sure that is necessary - e.g. you could create a temporary file and make it read that.


    EthanHeilman commented at 6:51 pm on March 25, 2016:

    Depends on the (non-test) code impact.

    It is on my list of things to investigate, but I was considering creating a class which wraps File/fopen and can be mocked out in unitests similar to RandomInt in addrman.h.

    https://github.com/bitcoin/bitcoin/blob/master/src/addrman.h#L239

    I share your concern as this would require a big diff when all the instances of File are replaced with mockable File. I assume this is a problem others have solved so I’m looking into alternates.

    I’m not entirely sure that is necessary - e.g. you could create a temporary file and make it read that.

    YMMV, but in my experience tmp files in unittests cause hard to debug test failures due to filesystem weirdness and added state (file deletes sometimes fail).


    laanwj commented at 7:22 am on March 26, 2016:

    YMMV, but in my experience tmp files in unittests cause hard to debug test failures due to filesystem weirdness and added state (file deletes sometimes fail).

    I agree. It’s somewhat preferable to not clutter files around the place. On the other hand it actually tests the filesystem interaction then :)

    I share your concern as this would require a big diff when all the instances of File are replaced with mockable File.

    C++ has some support for this by replacing inputs/outputs streams by string streams, unfortunately - or not, I’m not entirely up to date of the pros and cons of it - we don’t really use C++’s file API.

    We mainly use fopen and FILE*. Note that on some platforms (including Linux and MacOSX IIRC) for FILE* there is: fmemopen, open_memstream. May be interesting.

    Then one could change the APIs to take a FILE* instead of filename. On platforms without the function it could fall back to a temporary file, on platforms with the function it can use a memory stream.

  16. EthanHeilman commented at 7:20 pm on April 18, 2016: contributor
    Does this pull-request need any additional changes?
  17. laanwj commented at 1:33 pm on April 19, 2016: member
    I still don’t really like passing the hash into bool CAddrDB::Read, I think it makes for an awful API :)
  18. Fix de-serialization bug where AddrMan is corrupted after exception
    * CAddrDB modified so that when de-serialization code throws an exception Addrman is reset to a clean state
    * CAddrDB modified to make unit tests possible
    * Regression test created to ensure bug is fixed
    * StartNode modifed to clear adrman if CAddrDB::Read returns an error code.
    1475ecf611
  19. EthanHeilman commented at 11:26 pm on May 4, 2016: contributor
    @laanwj I removed hash from CAddrDB::Read
  20. sipa commented at 4:33 pm on May 5, 2016: member
    utACK 1475ecf61141e03f63a79d59831c411e0e8a5c0a
  21. laanwj commented at 5:32 pm on May 5, 2016: member
    @EthanHeilman Looks good to me now, thanks. utACK
  22. in src/test/net_tests.cpp: in 1475ecf611
    117+        ssPeers1 >> FLATDATA(pchMsgTmp);
    118+        ssPeers1 >> addrman1;
    119+    } catch (const std::exception& e) {
    120+        exceptionThrown = true;
    121+    }
    122+    // Even through de-serialization failed adddrman is not left in a clean state.
    


    sipa commented at 8:35 am on May 17, 2016:
    adddrman
  23. sipa merged this on May 17, 2016
  24. sipa closed this on May 17, 2016

  25. sipa referenced this in commit 5c3f8ddcaa on May 17, 2016
  26. sickpig referenced this in commit 6024389810 on Apr 13, 2017
  27. sickpig referenced this in commit c9c43d17b2 on Apr 14, 2017
  28. sickpig referenced this in commit ac53931cd0 on Apr 14, 2017
  29. zkbot referenced this in commit f40121446d on Nov 12, 2020
  30. zkbot referenced this in commit 049951dc45 on Feb 11, 2021
  31. zkbot referenced this in commit b3a6729944 on Feb 16, 2021
  32. zkbot referenced this in commit e85265fbd5 on Feb 17, 2021
  33. zkbot referenced this in commit b4b07a1bbd on Feb 17, 2021
  34. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-30 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me