Show nodeid instead of addresses (for anonymity) unless otherwise requested #3764

rebroad commented at 9:58 PM on February 27, 2014: contributor

This attempts to fix the problem with revealing too much information about IP addresses, but still provides a way to track the progress of different nodes (via the NodeId). Enabling IP addresses is now optional via the -logips command line option.

rebroad cross-referenced this on Feb 27, 2014 from issue Don't log IP addresses for transactions. by rebroad

laanwj commented at 8:48 AM on February 28, 2014: member

I like the idea of logging node IDs instead of logging IP addresses.

In principle, all -logips then has to do is log a node ID to IP:port mapping once on connection and it's possible to figure out the IPs if necessary (Though it may be more useful in practice to actually log IPs in every message anyway).

sipa commented at 1:04 PM on February 28, 2014: member

Concept ACK; I haven't reviewed the code

rebroad commented at 6:29 PM on February 28, 2014: contributor

@laanwj yes, could make -logips display IPs instead of NodeIDs, if this is what most people want. I went with just correlating them once (upon first connection). so that it keeps the debug.log more readable and smaller in size, but the info is still there if it needs to be referenced.

luke-jr commented at 6:30 PM on February 28, 2014: member

node=(IP-or-id) seems ideal to me

rebroad commented at 6:32 PM on February 28, 2014: contributor

@luke-jr = instead of : ?

luke-jr commented at 6:35 PM on February 28, 2014: member

@rebroad Whatever matches the existing style. The points I was making were 1) use the same key whether showing ID or IP; 2) show IP in every place ID would normally be

rebroad commented at 7:13 PM on February 28, 2014: contributor

@luke-jr Now using node=

rebroad commented at 11:10 PM on February 28, 2014: contributor

@luke-jr has suggested that the IP address could be added to each and every line of debug.log instead of the node number, whereas I prefer that debug.log correlates the node number with the IP address only once, so reduce debug.log output and to make it more readable.

Can I seek consensus on this please? I'm happy to adapt this pull to the majority preference.

(Luke's suggestion might possibly involve creating a new variable within the CNode struct to contain the string to display so that there didn't need to be two LogPrintfs for every time one or the other is displayed).

laanwj commented at 7:36 AM on March 1, 2014: member

I like the approach of logging the IP:port info for a node only once, on connection. After that, log only the Node ID for messages concerning that node.

This minimizes the code difference between the -logips and -nologips case (ie: one message) and avoids ugly if() statements duplicating the debug message apart from the IP information...

in src/net.cpp:None in 25917368e7 outdated

 965 |                  pnode->AddRef();
 966 |                  {
 967 |                      LOCK(cs_vNodes);
 968 |                      vNodes.push_back(pnode);
 969 |                  }
 970 | +                if (fLogIPs)

laanwj commented at 7:38 AM on March 1, 2014:

If you want -logips to affect every one of these messages I'd suggest another implementation:

Define a function IdenfityNode or such that returns a string with either the node ID or IP, based on fLogIPs
Use this function in all the concerned debug messages

This avoids these kinds of duplicated messages everywhere.

rebroad commented at 8:15 PM on March 1, 2014:

@laanwj Could you point me towards a function that already does this please?

laanwj commented at 9:57 AM on March 2, 2014:

You will have to implement it yourself. It doesn't exist yet.

sipa commented at 5:26 PM on May 18, 2014:

  std::string IdentityNode(const CNode *pnode) {
    if (fLogIPs)
      return pnode->addr.ToString();
    else
      return strprintf("%d", pnode->GetId());
  }

Use a function like that, please?

laanwj commented at 1:55 PM on June 27, 2014:

@sipa agreed

gmaxwell commented at 8:40 PM on March 1, 2014: contributor

I like the general ID of always logging a node ID with every message we emit which can be resolved against getpeerinfo— enabling after the fact diagnosis of a source while the client is still up. It needs to be a nonce though, not a sequential ID or we can correlate them based on connect time. The the logips logging can just make the connect message emit the nodeids.

There is some privacy leak, that you can see that the same node triggered two messages, but I don't think that one is concerning enough to overcome the loss of useful debugging messages.

rebroad commented at 12:30 AM on March 2, 2014: contributor

@gmaxwell I'm not sure I understand you. How would a nonce be more useful than the node id? How/when/where would you want getpeerinfo to be used?

gmaxwell commented at 1:05 AM on March 2, 2014: contributor

I'm not sure what you don't understand. Right now the connect messages show IPs and so you get some good idea of what order nodes are, so the IDs would be linking. Thats bad. Additionally, esp if we later add crypto, I worry about metadata only attackers who can observe what order people connected in. The linkage between IDs and IPs should not be externally observable.

rebroad commented at 7:31 AM on March 2, 2014: contributor

@gmaxwell the IPs as they are currently shown will no longer be shown by default once this patch is applied, only when the -logips option is given for the very reasons you mention.

gmaxwell commented at 10:17 AM on March 2, 2014: contributor

Can you please reread my answer, including the second half of it?

ghost commented at 6:47 PM on March 2, 2014: none

Wouldn't these changes like this make the current ongoing mtgox investigations harder why is this even being prioritized. On Mar 2, 2014 1:44 PM, "R E Broadley" notifications@github.com wrote:

@gmaxwell https://github.com/gmaxwell I already read your answer 3 times, and on the 4th reading my interpretation of it is no different.

Reply to this email directly or view it on GitHubhttps://github.com/bitcoin/bitcoin/pull/3764#issuecomment-36462239 .

laanwj commented at 6:20 AM on March 3, 2014: member

No priority IMO, this is just @rebroad's pet peeve.

gmaxwell commented at 6:25 AM on March 3, 2014: contributor

@rebroad "Additionally, esp if we later add crypto, I worry about metadata only attackers who can observe what order people connected in. The linkage between IDs and IPs should not be externally observable." someone watching who connects in what order would know the IDs even if they didn't know the content of the messages. getting the logs would tell them the content.

luke-jr commented at 6:37 AM on March 3, 2014: member

Logging IPs is a security regression IMO.

laanwj commented at 6:43 AM on March 3, 2014: member

Thinking about it, we already concluded in #3741:

The point was to not log IPs (esp. those associated with transactions) by default
We are logging almost no IPs by default already: only on connect and some kinds of misbehavior:
- Nearly all of the messages affected in this pull are only logged when specific debug classes (mempool, net) are active

So I'm not entirely sure what is the point. As for the code this introduces a lot of clutter (duplicated debug messages) making it harder to maintain. It also adds yet another option to the long list.

rebroad commented at 6:25 PM on March 3, 2014: contributor

@gmaxwell I'm not understanding what you're not understanding. There would be no linkage as you mention unless this optional command line argument is given. Logging which node is doing what is a useful debugging too, especially when working on code such as parallel block download or anything which needs to correlate what was sent to nodes and what was received from nodes. It needs to be a debug option therefore. The alternative I can suggest is that instead of the "logips" option as I've done here, I change it to "lognodeids" option, and we leave the display of IP addresses in as it currently is, but make logging node ids dependant on this command line option given. Either way, the "problem" as you put it would still exist, which is the correlation between node ids and IP addresses, but since these options would only be enabled by developers, and are necessary for certain development work, I don't really see there being an issue. @laanwj Hopefully my explanation above also answers your query.

sipa commented at 7:08 PM on March 3, 2014: member

@rebroad: @gmaxwell is saying that if someone is able to monitor your network activity (just metadata, nor the actual traffic) + has access to these logs (even without -logips) this would allow reconstructing which nodes were talking to eachother.

I personally think this is a very limited problem, and the solution suggested (using random instead of sequentially assigned identifiers) doesn't help much against it (-logtimestamps is default now, so you would still be able to correlate with the log).

I think this can even improve usefulness of logs (both with and without -logips, though I'd prefer to only log IPs even with -logips at initial connection), as sequentially assigned ids are actually more readable than IPs when looking through logs to diagnose sync issues.

rebroad commented at 11:47 PM on March 3, 2014: contributor

@sipa thanks. So, from the sounds of what you're saying, it doesn't sound like I can make any further improvements to this pull request?

in src/main.cpp:None in 268625ae21 outdated

2384 | @@ -2385,6 +2385,7 @@ void PushGetBlocks(CNode* pnode, CBlockIndex* pindexBegin, uint256 hashEnd)
2385 |      pnode->pindexLastGetBlocksBegin = pindexBegin;
2386 |      pnode->hashLastGetBlocksEnd = hashEnd;
2387 |  
2388 | +    LogPrint("net", "sending getblocks. node=%d\n", pnode->id);

sipa commented at 12:14 AM on March 4, 2014:

Nit: I'd call it "peer" rather than "node". I know the source code is already confused in that regard, but it's not really an identifier of a particular node we're showing - just which # peer it is for us.

in src/net.cpp:None in 268625ae21 outdated

 549 | @@ -550,7 +550,10 @@ void CNode::PushVersion()
 550 |      CAddress addrYou = (addr.IsRoutable() && !IsProxy(addr) ? addr : CAddress(CService("0.0.0.0",0)));
 551 |      CAddress addrMe = GetLocalAddress(&addr);
 552 |      RAND_bytes((unsigned char*)&nLocalHostNonce, sizeof(nLocalHostNonce));
 553 | -    LogPrint("net", "send version message: version %d, blocks=%d, us=%s, them=%s, peer=%s\n", PROTOCOL_VERSION, nBestHeight, addrMe.ToString(), addrYou.ToString(), addr.ToString());
 554 | +    if (fLogIPs)

sipa commented at 12:21 AM on March 4, 2014:

Alternative: only have one place where the actual IP is logged.

My suggestion: in net.h, at the end of the CNode::CNode() constructor:

if (hSocket != INVALID_SOCKET) LogPrint("net", "Added connection to %s peer=%d", addrNameIn, id);

rebroad commented at 12:44 AM on April 2, 2014: contributor

@laanjw - this is primarily motivated by helping to debug block sync. Would it be better if it wasn't displayed in the commandline help, and perhaps only displayed node id for blocks and not for txs?

in src/net.cpp:None in af1f442793 outdated

 551 | @@ -552,7 +552,10 @@ void CNode::PushVersion()
 552 |      CAddress addrYou = (addr.IsRoutable() && !IsProxy(addr) ? addr : CAddress(CService("0.0.0.0",0)));
 553 |      CAddress addrMe = GetLocalAddress(&addr);
 554 |      RAND_bytes((unsigned char*)&nLocalHostNonce, sizeof(nLocalHostNonce));
 555 | -    LogPrint("net", "send version message: version %d, blocks=%d, us=%s, them=%s, peer=%s\n", PROTOCOL_VERSION, nBestHeight, addrMe.ToString(), addrYou.ToString(), addr.ToString());
 556 | +    if (fLogIPs)
 557 | +        LogPrint("net", "send version message: version %d, blocks=%d, us=%s, them=%s, peer=%s\n", PROTOCOL_VERSION, nBestHeight, addrMe.ToString(), addrYou.ToString(), addr.ToString());
 558 | +    else
 559 | +        LogPrint("net", "send version message: version %d, blocks=%d, us=%s\n", PROTOCOL_VERSION, nBestHeight, addrMe.ToString());

sipa commented at 2:19 PM on April 6, 2014:

Can you print the peer id instead of the ip here, in this case?

sipa commented at 2:19 PM on April 6, 2014: member

ACK, apart from minor nit above.

rebroad commented at 6:14 AM on April 30, 2014: contributor

@sipa Hopefully I've addressed your minor nit.

rebroad commented at 1:52 AM on May 9, 2014: contributor

What is holding this up please?

rebroad commented at 2:37 AM on May 16, 2014: contributor

@sipa @laanwj Ok, I've just rebased this again. Is there anything still holding this up?

rebroad commented at 2:39 AM on May 16, 2014: contributor

@luke-jr This pull request makes the logging of IPs optional instead of done by default as it currently is - so hopefully addresses your security concerns.

luke-jr commented at 2:43 AM on May 16, 2014: member

I know.

in src/net.cpp:None in fd546b01fe outdated

 482 | @@ -483,7 +483,7 @@ CNode* ConnectNode(CAddress addrConnect, const char *pszDest)
 483 |      {
 484 |          addrman.Attempt(addrConnect);
 485 |  
 486 | -        LogPrint("net", "connected %s\n", pszDest ? pszDest : addrConnect.ToString());
 487 | +        if (fLogIPs) LogPrint("net", "connected %s\n", pszDest ? pszDest : addrConnect.ToString());

sipa commented at 5:22 PM on May 18, 2014:

List the nodeid here as well? So you can correlate a messages back to what IP sent them if necessary (only with -logips) anyway.

rebroad commented at 11:34 AM on May 21, 2014:

done

in src/net.h:None in fd546b01fe outdated

 314 | @@ -315,7 +315,7 @@ class CNode
 315 |          }
 316 |  
 317 |          // Be shy and don't send version until we hear
 318 | -        if (hSocket != INVALID_SOCKET && !fInbound)
 319 | +        if ((hSocket != INVALID_SOCKET) && (!fInbound))

sipa commented at 5:28 PM on May 18, 2014:

Why?

rebroad commented at 11:34 AM on May 21, 2014:

mistake. undone.

in src/net.h:None in fd546b01fe outdated

 429 | @@ -430,7 +430,7 @@ class CNode
 430 |              nRequestTime = it->second;
 431 |          else
 432 |              nRequestTime = 0;
 433 | -        LogPrint("net", "askfor %s   %d (%s)\n", inv.ToString().c_str(), nRequestTime, DateTimeStrFormat("%H:%M:%S", nRequestTime/1000000).c_str());
 434 | +        LogPrint("net", "askfor %s  %d (%s) peer=%d\n", inv.ToString().c_str(), nRequestTime, DateTimeStrFormat("%H:%M:%S", nRequestTime/1000000).c_str(), id);

sipa commented at 5:28 PM on May 18, 2014:

Optional: the .c_str() could be dropped here.

rebroad commented at 11:38 AM on May 21, 2014:

done

sipa commented at 5:34 PM on May 18, 2014: member

I don't really care about adding an optional and by-default-off method for listing IPs. For debugging it is useful, and nobody will turn it on for any reason but that.

The arguments against even providing the ability to link log messages from the same node are interesting. I'm not sure it's such a big deal. If it is, maybe we can list the peer=id message piece only when -logips is provided as well (and thus even stricten the default behaviour, which currently does list some IPs already)?

rebroad commented at 11:40 AM on May 21, 2014: contributor

@sipa point addressed all except the making the peer logged only if a command line option is specified. I can do this if it's preferred. What is the consensus on this? @gmaxwell @laanwj @luke-jr @gavinandresen @jgarzik ?

rebroad commented at 2:49 AM on June 28, 2014: contributor

rebaed, and nits addressed.

laanwj commented at 8:10 AM on July 1, 2014: member

@rebroad you still need to define the IdentityNode function as suggested by me/@sipa to avoid if (fLogIPs) all over the place.

in src/init.cpp:None in c033cd7386 outdated

 287 | @@ -288,6 +288,7 @@ std::string HelpMessage(HelpMessageMode mode)
 288 |      strUsage += "  -genproclimit=<n>      " + _("Set the processor limit for when generation is on (-1 = unlimited, default: -1)") + "\n";
 289 |      strUsage += "  -help-debug            " + _("Show all debugging options (usage: --help -help-debug)") + "\n";
 290 |      strUsage += "  -logtimestamps         " + _("Prepend debug output with timestamp (default: 1)") + "\n";
 291 | +    strUsage += "  -logips                " + _("Include IP addresses in debug output (default: 0)") + "\n";

Diapolo commented at 11:34 AM on July 1, 2014:

Nit²: This needs to be placed above -logtimestamps.

rebroad commented at 3:40 AM on July 2, 2014: contributor

@laanwj There is now only one place where there is an if (fLogIPs), so the function is not needed since this would add more code than it removes.

Show nodeid instead of addresses (for anonymity) unless otherwise requested. 2e36866fec

BitcoinPullTester commented at 4:47 AM on July 4, 2014: none

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/p3764_2e36866fecb7420cd73047a7aa762a6e5e225695/ for binaries and test log. This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/ Contact BlueMatt on freenode if something looks broken.

laanwj merged this on Jul 4, 2014

laanwj closed this on Jul 4, 2014

laanwj referenced this in commit 00fa78c35b on Jul 4, 2014

rebroad deleted the branch on Jul 5, 2014

gmaxwell commented at 1:54 PM on July 15, 2014: contributor

Was was this merged without adding the peer index to getpeerinfo? The peer numbers given in the logs are not useful.

gmaxwell cross-referenced this on Jul 15, 2014 from issue Add peerid to getpeerinfo to allow correlation with the logs. by gmaxwell

dagurval cross-referenced this on Apr 25, 2017 from issue Remove option -logips. Re-add logging of IPs. by dagurval

Sjors cross-referenced this on Oct 25, 2019 from issue net: SocketHandler logs peer id for close and disconnect by Sjors

bitcoin locked this on Sep 8, 2021