uint256::GetCheapHash bigendian compatibility #7078

pull arowser wants to merge 1 commits into bitcoin:master from arowser:ppc changing 1 files +2 −4
  1. arowser commented at 2:10 am on November 23, 2015: contributor

    Tested on PPC/MIPS : Before change:

    0test_bitcoin --run_test=addrman_tests
    1Running 5 test cases...
    2test/addrman_tests.cpp(141): error in "addrman_new_collisions": check addrman.size() == 3 failed
    3test/addrman_tests.cpp(145): error in "addrman_new_collisions": check addrman.size() == 4 failed
    4test/addrman_tests.cpp(166): error in "addrman_tried_collisions": check addrman.size() == i failed
    5test/addrman_tests.cpp(166): error in "addrman_tried_collisions": check addrman.size() == i failed
    6test/addrman_tests.cpp(166): error in "addrman_tried_collisions": check addrman.size() == i failed
    7
    8*** 5 failures detected in test suite "Bitcoin Test Suite"
    

    After:

    0test_bitcoin --run_test=addrman_tests
    1Running 5 test cases...
    2
    3*** No errors detected
    
  2. jgarzik commented at 2:54 am on November 23, 2015: contributor
    eh, use WriteLE64()?
  3. arowser force-pushed on Nov 23, 2015
  4. dcousens commented at 0:52 am on November 24, 2015: contributor

    LGTM, utACK

    edit: See @laanwj’s comment

    edit2: Conflicted, utACK on the code

  5. laanwj commented at 7:58 am on November 24, 2015: member

    This is on purpose! The “hash” is meant to be as fast as possible by using endian-dependent operations. E.g. within one program run it is stable and that is all that matters - it is not supposed to be used with anything that leaves the program. This is mentioned, literally, in the comment: https://github.com/bitcoin/bitcoin/blob/master/src/uint256.h#L118

    Edit: as for the test failure, have you looked into why this causes a test failure? In principle a different hash function should not break the tests (or at least the correctness of the code), so this error may lie deeper.

  6. laanwj added the label Tests on Nov 24, 2015
  7. arowser commented at 7:32 am on November 25, 2015: contributor

    The failed test case is addrman_new_collisions and addrman_tried_collisions: https://github.com/bitcoin/bitcoin/blob/master/src/test/addrman_tests.cpp#L141

    The testcase make a hash collision, The value of “hash1 % ADDRMAN_BUCKET_SIZE” https://github.com/bitcoin/bitcoin/blob/master/src/addrman.cpp#L29 is always 28 when addr1 is 250.1.1.1 or addr1 is 250.1.1.4 when little endian, but its different in big endian, its the reason that test failed.

     0addrman.Add(CAddress(addr1), source)
     1    {
     2    ...
     3    int nUBucketPos = infoOld.GetBucketPosition(nKey, true, nUBucket);
     4        --> {
     5            uint64_t hash1 = (CHashWriter(SER_GETHASH, 0) << nKey << (fNew ? 'N' : 'K') << nBucket << GetKey()).GetHash().GetCheapHash();
     6            return hash1 % ADDRMAN_BUCKET_SIZE; //this is same when little endian when addr1 is 250.1.1.1 or 250.1.1.4
     7            }
     8        ...
     9    ClearNew(int nUBucket, int nUBucketPos)
    10         -->{
    11            if (vvNew[nUBucket][nUBucketPos] != -1)
    12            ...
    13            vvNew[nUBucket][nUBucketPos] = -1;
    14            }
    15    }
    

    The test result in big endian should be consistent with the little endian.

  8. laanwj commented at 7:38 am on November 25, 2015: member
    Bleh, so the test relies very strongly on the exact outcome of the hash function (as the algorithm is “randomized” by that), even though both results are correct inthemselves?
  9. luke-jr commented at 7:41 am on November 25, 2015: member
    These addrman “tests” look pretty crappy in general… lots of checking internal implementation details rather than expected external behaviour. :/
  10. arowser commented at 8:02 am on November 25, 2015: contributor
    The addrman_new_collisions first call addrman.MakeDeterministic(), it set addrman.nKey is NULL, then the addrman addr placement in new table will be deterministic, it seems try to simulate hash collisions when addrman.add() .
  11. laanwj commented at 8:05 am on November 25, 2015: member

    The primary reason I’m protesting here is because GetCheapHash is used in CSignatureCacheHasher, which is a performance critical path, and also the only use beside addrman.

    It is possible that adding a (conditional) byte swap has hardly impact on performance, but that would have to be benchmarked.

  12. uint256::GetCheapHash bigendian compatibility c434940e83
  13. in src/uint256.h: in dc4815d79a outdated
    124      */
    125     uint64_t GetCheapHash() const
    126     {
    127         uint64_t result;
    128-        memcpy((void*)&result, (void*)data, 8);
    129+        WriteLE64((unsigned char*)&result, *(uint64_t *)data);
    


    laanwj commented at 8:06 am on November 25, 2015:

    Nit: what you want here is ReadLE64, not WriteLE64:

    0return ReadLE64(data);
    
  14. arowser force-pushed on Nov 25, 2015
  15. sipa commented at 8:53 pm on November 26, 2015: member

    I doubt the performance penalty of ReadLE64 is going to matter really on any system.

    I liked the idea of having an unportable fast function, though, but being able to rely on behaviour does make testing easier too.

    Unsure.

  16. laanwj commented at 9:48 am on November 27, 2015: member

    I liked the idea of having an unportable fast function, though, but being able to rely on behaviour does make testing easier too.

    Right - in the case of addrman, the overhead is nothing compared to the double-SHA256, in all its uses:

    0src/addrman.cpp:    uint64_t hash1 = (CHashWriter(SER_GETHASH, 0) << nKey << GetKey()).GetHash().GetCheapHash();
    1src/addrman.cpp:    uint64_t hash2 = (CHashWriter(SER_GETHASH, 0) << nKey << GetGroup() << (hash1 % ADDRMAN_TRIED_BUCKETS_PER_GROUP)).GetHash().GetCheapHash();
    2src/addrman.cpp:    uint64_t hash1 = (CHashWriter(SER_GETHASH, 0) << nKey << GetGroup() << vchSourceGroupKey).GetHash().GetCheapHash();
    3src/addrman.cpp:    uint64_t hash2 = (CHashWriter(SER_GETHASH, 0) << nKey << vchSourceGroupKey << (hash1 % ADDRMAN_NEW_BUCKETS_PER_SOURCE_GROUP)).GetHash().GetCheapHash();
    4src/addrman.cpp:    uint64_t hash1 = (CHashWriter(SER_GETHASH, 0) << nKey << (fNew ? 'N' : 'K') << nBucket << GetKey()).GetHash().GetCheapHash();
    

    So the two usecases of GetCheapHash are wildly different. In the case of sigcache every cycle counts, in the case of addrman we may just as well be using something else.

    E.g. could also remove the function from uint256 class and have tailored implementations at the call sites

  17. sipa commented at 2:17 pm on November 27, 2015: member
    @laanwj Even the signature cache now uses a SHA256 (though that should at some point be replaced with a fast non-cryptographic but highly collision-resistant hash function). A single byteswap (and only on BE platforms…) is not going to be measurable.
  18. laanwj commented at 2:24 pm on November 27, 2015: member
    OK, just going ahead and merging this then
  19. laanwj merged this on Nov 27, 2015
  20. laanwj closed this on Nov 27, 2015

  21. laanwj referenced this in commit 93e0514fd0 on Nov 27, 2015
  22. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-05 01:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me