Use SipHash for node eviction

sipa commented at 9:14 am on May 22, 2016: member

gmaxwell commented at 10:04 am on May 22, 2016: contributor

utACK

in src/hash.h: in 2e099337ef outdated

181     int count;
182 
183 public:
184     CSipHasher(uint64_t k0, uint64_t k1);
185     CSipHasher& Write(uint64_t data);
186+    CSipHasher& Write(const unsigned char* data, size_t size);

TheBlueMatt commented at 0:25 am on May 23, 2016:

Should probably note somewhere that the two Write methods are partially-mutually-exclusive per-object.

TheBlueMatt commented at 2:37 am on May 23, 2016: member

This is one of those things where having a cryptographic hash function probably isnt /critical/, but is really preferable. Is the speed of SHA256 really slow enough to matter here afer you accepted a new TCP connection (and all associated OS overhead of doing so)?

pstratem commented at 4:11 am on May 23, 2016: contributor

nACK

This really does need to be a cryptographic hash function.

The performance overhead of opening the connection is almost certainly many many times the cost of this comparison.

gmaxwell commented at 5:19 am on May 23, 2016: contributor

Siphash is a cryptographic function with all the properties we would desire here: https://eprint.iacr.org/2014/722

It is generally suitable any place a cryptographic hash would be used where the small output size wouldn’t be an issue. This is a fine example of such a location, similar to hash tables where the table’s smallness makes a larger hash output irrelevant, with only a few hundred peers a 64 bit hash will reliably distinguish them.

TheBlueMatt commented at 6:13 am on May 23, 2016: member

In principal, yes, SipHash was designed to be effectively a cryptographic hash with small output size, but I don’t want to fall into the trap of calling something a cryptographic hash when it has only had one or two cryptanalyses published. If we care about the performance difference here, I’d say it’s fine, but I’m not sure that we do?

On May 22, 2016 10:19:47 PM PDT, Gregory Maxwell notifications@github.com wrote:

Siphash is a cryptographic function with all the properties we would desire here: https://eprint.iacr.org/2014/722

It is generally suitable any place a cryptographic hash would be used where the small output size wouldn’t be an issue. This is a fine example of such a location, similar to hash tables where the table’s smallness makes a larger hash output irrelevant, with only a few hundred peers a 64 bit hash will reliably distinguish them.

You are receiving this because you commented. Reply to this email directly or view it on GitHub: #8086 (comment)

gmaxwell commented at 6:45 am on May 23, 2016: contributor

Keep in mind that the input is also very small, and the keyspace is very large. It’s likely that this function is nearly a keyed permutation for inputs this small.

As far as performance, with 125 peers and sha256 here it will take about 2,632,000 cycles to run the hashes here. That is about 2ms cpu per inbound connection. It’s not negligible.

(Though sipa should perhaps change it to run the hashing in the pre-processing step instead of in the comparison for a 2*log2(n) speedup– or better, store it in cnode and compute it on connect to make it O(1) in the number of operations per disconnect.)

pstratem commented at 7:20 am on May 23, 2016: contributor

@gmaxwell I’ll do a fix that still uses SHA256

gmaxwell commented at 9:18 am on May 23, 2016: contributor

Please walk me through the attack you’re imagining here. There are 16 bits of network groups for IPv4, 32 bits for IPv6. In both cases many of the possible values aren’t routable on the internet.. Each node has it’s own 128-bit secret random seed. The node prefers to keep four hosts based on how their netgroup is ordered by a salted hash. The attacker does X and achieves Y. Help me out with the blanks here. :)

pstratem commented at 10:05 am on May 23, 2016: contributor

@gmaxwell I don’t know but with #8088 the performance difference is going to be basically not measurable.

It’s now reduced to the difference between a single SHA256 vs SipHash and comparison between 256 bits vs 64 bits.

jonasschnelli added the label P2P on May 23, 2016

gmaxwell commented at 6:18 pm on May 24, 2016: contributor

Come one guys, you can’t throw up a bunch of stop signs for security then fail to actually walk me through your security analysis. We want security but not security cargo cult.

Beyond performance there is a consistency point here, we should use the same cryptographic hash-function in all the cases where we need some non-normative small hash, unless there is a good reason not to. If there is a reason it’s not suitable here, then its likely not suitable in other cases either.

pstratem commented at 6:56 pm on May 24, 2016: contributor

@gmaxwell If this fails it potentially harms the partition resistance of the network.

If the other uses of SipHash fail they at worst are a denial of service attack (which existed before SipHash).

Given the extremely small performance difference compared to the enormous cost of accepting a new connection I just don’t see the (admittedly small) risk being justified.

TheBlueMatt commented at 7:11 pm on May 24, 2016: member

If there is no performance difference, why change it? Might as well use a stronger hash anywhere we can if its free.

gmaxwell commented at 7:40 pm on May 24, 2016: contributor

Again, please actually specify the attack. I gave you a template, fill it out.

If the other uses of SipHash fail they at worst are a denial of service attack

The usage in addrman is functionally identical to this.

if there is no performance difference

in this PR it’s a pretty substantial performance difference (it’s about 2ms per connection), though performance can be achieved in other ways that I pointed out.

Might as well use a stronger hash

On what basis do you believe that another construction would be stronger?

As an aside– right now the behavior of this is kind of busted. It orders peers by their netgroup and protects four, but it makes no effort to make the four be from different netgroups, which means there is an obvious attack strategy of running four hosts per netgroup in a lots of different netgroups. To avoid that, it should (e.g.) sort by netgroup hash, lastblocktime>0, connection uptime and skip over protecting peers that have the same netgroup as peers that were already protected.

It’s more than a little disappointing to see furious hand-waving about the hash function when the basic functionality isn’t getting anywhere near that amount of attention.

sipa force-pushed on May 25, 2016

sipa commented at 1:46 pm on May 25, 2016: member

I think you’re all exaggerating:

@pstratem @TheBlueMatt SipHash is more than sufficient in this case (hell, multiplying the vchGroup (interpreted as a number) by a random odd 64-bit integer likely already results in a sufficiently unpredictable permutation).
@gmaxwell After caching at the CNode level, performance of the hash function is irrelevant. I think SipHash is more appropriate, but SHA256 is certainly not inappropriate.

So let’s just combine the approaches.

I’ve included #8088 in this PR, and made a few more changes (= make the whole eviction logic work using nKeyedNetGroup, rather than only in the comparison). I’ve also made CNode::addr const (to make sure the precomputed keyed netgroup doesn’t get invalidated) and moved the precomputation to the .cpp file. @theuni For combining with #8085, the CNode::CalculateKeyedNetGroup()’s k0 and k1 belong inside ConnMan, and perhaps the CNode::nKeyedNetGroup values do too (to prevent storing ConnMan-dependent information inside CNode). @TheBlueMatt I’ve expanded the comments in CSipHasher.

theuni commented at 4:20 pm on May 26, 2016: member

@sipa Roger. Seems CalculateKeyedNetGroup isn’t being called, though. I assume it’s meant to be called from CNode’s ctor?

sipa force-pushed on May 26, 2016

sipa commented at 4:57 pm on May 26, 2016: member

@theuni Nice catch, it got lost in code movement. I’ve turned nKeyedNetGroup into a const as well, initialized in the ctor.

sipa force-pushed on May 26, 2016

in src/net.h: in 280e979f8c outdated

 8@@ -9,6 +9,8 @@
 9 #include "amount.h"
10 #include "bloom.h"
11 #include "compat.h"
12+#include "crypto/common.h"

theuni commented at 3:18 pm on May 27, 2016:

includes still needed?

sipa commented at 11:34 pm on May 28, 2016:

Fixed.

theuni commented at 4:16 pm on May 27, 2016: member

utACK (excluding the siphash impl itself, which i’m not qualified to review) either way. I don’t see the harm in siphash (see previous disclaimer, though), since the input is at most 32bits anyway. But I agree with @sipa that the hash type shouldn’t make much difference anyway once cached.

sipa force-pushed on May 28, 2016

sipa commented at 11:35 pm on May 28, 2016: member

@theuni Even if you don’t think you’re the right person to review the SipHash code, you’re certainly able to review its tests (the values in the unit test come from another implementation).

Avoid recalculating vchKeyedNetGroup in eviction logic.

Lazy calculate vchKeyedNetGroup in CNode::GetKeyedNetGroup.

053930ffc4

Support SipHash with arbitrary byte writes 9bf156bb9e

Use 64-bit SipHash of netgroups in eviction c31b24f745

Use C++11 thread-safe static initializers 888483098e

in src/net.cpp: in 9d1da31c1f outdated

2611@@ -2638,3 +2612,17 @@ bool CBanDB::Read(banmap_t& banSet)
2612 int64_t PoissonNextSend(int64_t nNow, int average_interval_seconds) {
2613     return nNow + (int64_t)(log1p(GetRand(1ULL << 48) * -0.0000000000000035527136788 /* -1/2^48 */) * average_interval_seconds * -1000000.0 + 0.5);
2614 }
2615+
2616+/* static */ uint64_t CNode::CalculateKeyedNetGroup(const CAddress& ad)
2617+{
2618+    static uint64_t k0 = 0, k1 = 0;

laanwj commented at 1:16 pm on June 7, 2016:

This code is not thread-safe - does it matter?

Otherwise maybe make this a static instance of a structure with the initialization code in the constructor, and C++11 semantics will make sure it will only get initialized once in a thread-safe way.

sipa commented at 2:34 pm on June 7, 2016:

It doesn’t matter (the old code wasn’t thread safe either), but better use good practices, and using static initializers is trivial. Fixed.

sipa force-pushed on Jun 7, 2016

laanwj commented at 1:09 pm on June 8, 2016: member

I’ve ported the SIPhash test to python using https://github.com/majek/pysiphash , source is here: https://gist.github.com/laanwj/b292fedecf6029fc5307968b965e3366

However I get mismatching results:

0OK b''
1OK b'00'
2Mismatch for b'01020304050607': 15dd418547d24915 versus 93f5f5799a932462
3Mismatch for b'08090a0b0c0d0e0f': 242272d800a348b4 versus 3f2acc7f57c29bdb
4Mismatch for b'1011': 6307967a77964b0c versus 4bc1b3f0968dd39c
5Mismatch for b'12131415161718191a': c5b1fd856729544f versus 2f2e6163076bcfad
6Mismatch for b'1b1c1d1e1f': 7a789bd84a5c633b versus 7127512f72f27cce
7Mismatch for b'2021222324252627': ad2b02a542ea7faa versus 0e3ea96b5304a7d0
8Mismatch for b'28292a2b2c2d2e2f': 3c24a11813c26e21 versus e612a3cb9ecba951
9OK b'000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f'

As some of the results do match, I’m not sure whether I made a mistake in my conversion, or whether there’s a bug in either bitcoin core’s or that python implementation.

laanwj commented at 1:12 pm on June 8, 2016: member

Oh, found the issue: I assumed .Finalize would finalize and reset the hasher. It just returns the current hash value, allowing further calls to continue. When I pass through the previous value, it works fine. Updated https://gist.github.com/laanwj/b292fedecf6029fc5307968b965e3366

utACK 8884830

laanwj referenced this in commit eebc232187 on Jun 8, 2016

sipa commented at 1:55 pm on June 8, 2016: member

Superseded by #8173.

sipa closed this on Jun 8, 2016

rebroad referenced this in commit 202b149109 on Dec 7, 2016

lateminer referenced this in commit 0c6f5bd006 on Oct 18, 2018

MarcoFalke locked this on Sep 8, 2021

Use SipHash for node eviction #8086