Use SipHash for node eviction #8086
pull sipa wants to merge 4 commits into bitcoin:master from sipa:moresiphash changing 8 files +104 −62-
sipa commented at 9:14 am on May 22, 2016: member
-
gmaxwell commented at 10:04 am on May 22, 2016: contributorutACK
-
in src/hash.h: in 2e099337ef outdated
181 int count; 182 183 public: 184 CSipHasher(uint64_t k0, uint64_t k1); 185 CSipHasher& Write(uint64_t data); 186+ CSipHasher& Write(const unsigned char* data, size_t size);
TheBlueMatt commented at 0:25 am on May 23, 2016:Should probably note somewhere that the two Write methods are partially-mutually-exclusive per-object.TheBlueMatt commented at 2:37 am on May 23, 2016: memberThis is one of those things where having a cryptographic hash function probably isnt /critical/, but is really preferable. Is the speed of SHA256 really slow enough to matter here afer you accepted a new TCP connection (and all associated OS overhead of doing so)?pstratem commented at 4:11 am on May 23, 2016: contributornACK
This really does need to be a cryptographic hash function.
The performance overhead of opening the connection is almost certainly many many times the cost of this comparison.
gmaxwell commented at 5:19 am on May 23, 2016: contributorSiphash is a cryptographic function with all the properties we would desire here: https://eprint.iacr.org/2014/722
It is generally suitable any place a cryptographic hash would be used where the small output size wouldn’t be an issue. This is a fine example of such a location, similar to hash tables where the table’s smallness makes a larger hash output irrelevant, with only a few hundred peers a 64 bit hash will reliably distinguish them.
TheBlueMatt commented at 6:13 am on May 23, 2016: memberIn principal, yes, SipHash was designed to be effectively a cryptographic hash with small output size, but I don’t want to fall into the trap of calling something a cryptographic hash when it has only had one or two cryptanalyses published. If we care about the performance difference here, I’d say it’s fine, but I’m not sure that we do?
On May 22, 2016 10:19:47 PM PDT, Gregory Maxwell notifications@github.com wrote:
Siphash is a cryptographic function with all the properties we would desire here: https://eprint.iacr.org/2014/722
It is generally suitable any place a cryptographic hash would be used where the small output size wouldn’t be an issue. This is a fine example of such a location, similar to hash tables where the table’s smallness makes a larger hash output irrelevant, with only a few hundred peers a 64 bit hash will reliably distinguish them.
You are receiving this because you commented. Reply to this email directly or view it on GitHub: #8086 (comment)
gmaxwell commented at 6:45 am on May 23, 2016: contributorKeep in mind that the input is also very small, and the keyspace is very large. It’s likely that this function is nearly a keyed permutation for inputs this small.
As far as performance, with 125 peers and sha256 here it will take about 2,632,000 cycles to run the hashes here. That is about 2ms cpu per inbound connection. It’s not negligible.
(Though sipa should perhaps change it to run the hashing in the pre-processing step instead of in the comparison for a 2*log2(n) speedup– or better, store it in cnode and compute it on connect to make it O(1) in the number of operations per disconnect.)
gmaxwell commented at 9:18 am on May 23, 2016: contributorPlease walk me through the attack you’re imagining here. There are 16 bits of network groups for IPv4, 32 bits for IPv6. In both cases many of the possible values aren’t routable on the internet.. Each node has it’s own 128-bit secret random seed. The node prefers to keep four hosts based on how their netgroup is ordered by a salted hash. The attacker does X and achieves Y. Help me out with the blanks here. :)jonasschnelli added the label P2P on May 23, 2016gmaxwell commented at 6:18 pm on May 24, 2016: contributorCome one guys, you can’t throw up a bunch of stop signs for security then fail to actually walk me through your security analysis. We want security but not security cargo cult.
Beyond performance there is a consistency point here, we should use the same cryptographic hash-function in all the cases where we need some non-normative small hash, unless there is a good reason not to. If there is a reason it’s not suitable here, then its likely not suitable in other cases either.
pstratem commented at 6:56 pm on May 24, 2016: contributor@gmaxwell If this fails it potentially harms the partition resistance of the network.
If the other uses of SipHash fail they at worst are a denial of service attack (which existed before SipHash).
Given the extremely small performance difference compared to the enormous cost of accepting a new connection I just don’t see the (admittedly small) risk being justified.
TheBlueMatt commented at 7:11 pm on May 24, 2016: memberIf there is no performance difference, why change it? Might as well use a stronger hash anywhere we can if its free.gmaxwell commented at 7:40 pm on May 24, 2016: contributorAgain, please actually specify the attack. I gave you a template, fill it out.
If the other uses of SipHash fail they at worst are a denial of service attack
The usage in addrman is functionally identical to this.
if there is no performance difference
in this PR it’s a pretty substantial performance difference (it’s about 2ms per connection), though performance can be achieved in other ways that I pointed out.
Might as well use a stronger hash
On what basis do you believe that another construction would be stronger?
As an aside– right now the behavior of this is kind of busted. It orders peers by their netgroup and protects four, but it makes no effort to make the four be from different netgroups, which means there is an obvious attack strategy of running four hosts per netgroup in a lots of different netgroups. To avoid that, it should (e.g.) sort by netgroup hash, lastblocktime>0, connection uptime and skip over protecting peers that have the same netgroup as peers that were already protected.
It’s more than a little disappointing to see furious hand-waving about the hash function when the basic functionality isn’t getting anywhere near that amount of attention.
sipa force-pushed on May 25, 2016sipa commented at 1:46 pm on May 25, 2016: memberI think you’re all exaggerating:
- @pstratem @TheBlueMatt SipHash is more than sufficient in this case (hell, multiplying the vchGroup (interpreted as a number) by a random odd 64-bit integer likely already results in a sufficiently unpredictable permutation).
- @gmaxwell After caching at the CNode level, performance of the hash function is irrelevant. I think SipHash is more appropriate, but SHA256 is certainly not inappropriate.
So let’s just combine the approaches.
I’ve included #8088 in this PR, and made a few more changes (= make the whole eviction logic work using nKeyedNetGroup, rather than only in the comparison). I’ve also made CNode::addr const (to make sure the precomputed keyed netgroup doesn’t get invalidated) and moved the precomputation to the .cpp file. @theuni For combining with #8085, the CNode::CalculateKeyedNetGroup()’s k0 and k1 belong inside ConnMan, and perhaps the CNode::nKeyedNetGroup values do too (to prevent storing ConnMan-dependent information inside CNode). @TheBlueMatt I’ve expanded the comments in CSipHasher.
sipa force-pushed on May 26, 2016sipa force-pushed on May 26, 2016in src/net.h: in 280e979f8c outdated
8@@ -9,6 +9,8 @@ 9 #include "amount.h" 10 #include "bloom.h" 11 #include "compat.h" 12+#include "crypto/common.h"
theuni commented at 3:18 pm on May 27, 2016:includes still needed?
sipa commented at 11:34 pm on May 28, 2016:Fixed.theuni commented at 4:16 pm on May 27, 2016: memberutACK (excluding the siphash impl itself, which i’m not qualified to review) either way. I don’t see the harm in siphash (see previous disclaimer, though), since the input is at most 32bits anyway. But I agree with @sipa that the hash type shouldn’t make much difference anyway once cached.sipa force-pushed on May 28, 2016Avoid recalculating vchKeyedNetGroup in eviction logic.
Lazy calculate vchKeyedNetGroup in CNode::GetKeyedNetGroup.
Support SipHash with arbitrary byte writes 9bf156bb9eUse 64-bit SipHash of netgroups in eviction c31b24f745Use C++11 thread-safe static initializers 888483098ein src/net.cpp: in 9d1da31c1f outdated
2611@@ -2638,3 +2612,17 @@ bool CBanDB::Read(banmap_t& banSet) 2612 int64_t PoissonNextSend(int64_t nNow, int average_interval_seconds) { 2613 return nNow + (int64_t)(log1p(GetRand(1ULL << 48) * -0.0000000000000035527136788 /* -1/2^48 */) * average_interval_seconds * -1000000.0 + 0.5); 2614 } 2615+ 2616+/* static */ uint64_t CNode::CalculateKeyedNetGroup(const CAddress& ad) 2617+{ 2618+ static uint64_t k0 = 0, k1 = 0;
laanwj commented at 1:16 pm on June 7, 2016:This code is not thread-safe - does it matter?
Otherwise maybe make this a static instance of a structure with the initialization code in the constructor, and C++11 semantics will make sure it will only get initialized once in a thread-safe way.
sipa commented at 2:34 pm on June 7, 2016:It doesn’t matter (the old code wasn’t thread safe either), but better use good practices, and using static initializers is trivial. Fixed.sipa force-pushed on Jun 7, 2016laanwj commented at 1:09 pm on June 8, 2016: memberI’ve ported the SIPhash test to python using https://github.com/majek/pysiphash , source is here: https://gist.github.com/laanwj/b292fedecf6029fc5307968b965e3366
However I get mismatching results:
0OK b'' 1OK b'00' 2Mismatch for b'01020304050607': 15dd418547d24915 versus 93f5f5799a932462 3Mismatch for b'08090a0b0c0d0e0f': 242272d800a348b4 versus 3f2acc7f57c29bdb 4Mismatch for b'1011': 6307967a77964b0c versus 4bc1b3f0968dd39c 5Mismatch for b'12131415161718191a': c5b1fd856729544f versus 2f2e6163076bcfad 6Mismatch for b'1b1c1d1e1f': 7a789bd84a5c633b versus 7127512f72f27cce 7Mismatch for b'2021222324252627': ad2b02a542ea7faa versus 0e3ea96b5304a7d0 8Mismatch for b'28292a2b2c2d2e2f': 3c24a11813c26e21 versus e612a3cb9ecba951 9OK b'000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f'
As some of the results do match, I’m not sure whether I made a mistake in my conversion, or whether there’s a bug in either bitcoin core’s or that python implementation.
laanwj commented at 1:12 pm on June 8, 2016: memberOh, found the issue: I assumed .Finalize would finalize and reset the hasher. It just returns the current hash value, allowing further calls to continue. When I pass through the previous value, it works fine. Updated https://gist.github.com/laanwj/b292fedecf6029fc5307968b965e3366
utACK 8884830
laanwj referenced this in commit eebc232187 on Jun 8, 2016sipa closed this on Jun 8, 2016
rebroad referenced this in commit 202b149109 on Dec 7, 2016lateminer referenced this in commit 0c6f5bd006 on Oct 18, 2018MarcoFalke locked this on Sep 8, 2021
github-metadata-mirror
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-19 03:12 UTC
This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me