Remove bitnodes.io from dnsseeds. #5545

gmaxwell wants to merge 1 commit into bitcoin:master from gmaxwell:dnsseedpruning, changing 1 file (+0 −1)
  1. gmaxwell commented at 9:40 AM on December 27, 2014: contributor

    I'm not comfortable with retaining this entry.

  2. Remove bitnodes.io from dnsseeds.
    I'm not comfortable with retaining this entry.
    a094b3d973
  3. gmaxwell added the label P2P on Dec 27, 2014
  4. gmaxwell added this to the milestone 0.10.0 on Dec 27, 2014
  5. ayeowch commented at 9:46 AM on December 27, 2014: contributor

    Hi Greg, care to elaborate? I notice there has been a surge of a few getaddr.bitnodes.io crawlers from AWS, but they are not mine: 54.194.231.211 54.183.64.54 54.174.10.182 54.66.214.167 54.94.200.247 54.67.33.14 54.77.251.214 54.66.220.137 54.173.72.127 54.94.195.96

    The crawler for the Bitnodes project should only come from 148.251.238.178.

  6. gmaxwell commented at 9:52 AM on December 27, 2014: contributor

    Please explain to me why 148.251.238.178 still stays pinned up to every node in the network. It was my understanding that after the last round of concerns you said you would discontinue that behaviour.

    Are you saying you have no other hosts connecting outbound at large scale to the network than 148.251.238.178?

  7. ayeowch commented at 10:00 AM on December 27, 2014: contributor

    I only operate Bitnodes crawler from 148.251.238.178. I am not sure about the AWS ones; anyone could run the crawler using the same user agent: https://github.com/ayeowch/bitnodes/blob/master/protocol.py#L103

    At the moment, my crawler does keep at most 1 active connection with each reachable node to aggregate the metrics for https://getaddr.bitnodes.io/dashboard/. It keeps the bandwidth usage to a minimum by sending no further data after the handshake except for periodical ping/pong messages. You will see an inbound connection from 148.251.238.178:

        {
          "addr" : "148.251.238.178:27426",
          "addrlocal" : "<REMOVED>",
          "services" : "00000001",
          "lastsend" : 1419674287,
          "lastrecv" : 1419674209,
          "bytessent" : 69195478,
          "bytesrecv" : 155751,
          "conntime" : 1418186362,
          "pingtime" : 0.00000000,
          "version" : 70001,
          "subver" : "/getaddr.bitnodes.io:0.1/",
          "inbound" : true,
          "startingheight" : 333662,
          "banscore" : 0,
          "syncnode" : true
        }

  8. luke-jr commented at 10:05 AM on December 27, 2014: member

    Um, if bitnodes is maintaining a connection to every peer they can, then ACK ACK this PR...

  9. ayeowch commented at 10:10 AM on December 27, 2014: contributor

    @luke-jr Most seeders, e.g. /bitcoinseeder:0.01/, enumerate all reachable nodes periodically anyway to get a snapshot of the network. Bitnodes simply uses this data to come up with network metrics that let us know the state/size of the network at any one time. I will ACK this PR anyway since it has come up again and again. It's quite silly.

  10. luke-jr commented at 10:12 AM on December 27, 2014: member

    There's a big difference between crawling the network periodically and maintaining a constant connection. You'll note my DNS seed provides the same network metrics (for longer than you have, actually) without keeping any persistent connections...

  11. ayeowch commented at 10:16 AM on December 27, 2014: contributor

    @luke-jr My background is in networking, so I may be looking at this from a different perspective: it seems inefficient to disconnect from nodes you are already connected to if you are going to connect to them again after hearing about them in an addr response from a peer in the next crawl.

  12. laanwj commented at 2:56 PM on December 27, 2014: member

    @ayeowch The problem of staying connected is that you're holding up a connection slot permanently, that otherwise could be used by a node or SPV wallet. It is better-behaved to poll once in a reasonable while, request e.g. the version and immediately disconnect.
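    The pattern laanwj describes, connect, request the peer's version, and disconnect immediately, can be sketched as a minimal poller. This is an illustrative sketch, not project code: the constants follow the Bitcoin mainnet p2p message framing (4-byte magic, 12-byte command, payload length, double-SHA256 checksum), and names such as `poll_once` and the `/example-poller:0.1/` user agent are invented for the example.

```python
import hashlib
import socket
import struct
import time

MAGIC = 0xD9B4BEF9  # mainnet magic; packs to f9 be b4 d9 on the wire


def checksum(payload: bytes) -> bytes:
    # First four bytes of double-SHA256, per the p2p message framing.
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


def message(command: bytes, payload: bytes) -> bytes:
    # 24-byte header: magic + zero-padded 12-byte command + length + checksum.
    return struct.pack('<I12sI4s', MAGIC, command,
                       len(payload), checksum(payload)) + payload


def version_payload(user_agent: bytes = b'/example-poller:0.1/') -> bytes:
    # Minimal "version" payload (protocol 70001). The network addresses are
    # all zeros since we only want the remote version before disconnecting.
    addr = struct.pack('<Q', 0) + b'\x00' * 16 + struct.pack('>H', 0)
    ua = bytes([len(user_agent)]) + user_agent  # var_str; one length byte here
    return (struct.pack('<iQq', 70001, 0, int(time.time()))
            + addr + addr
            + struct.pack('<Q', 0)   # nonce
            + ua
            + struct.pack('<i', 0))  # start_height


def poll_once(host: str, port: int = 8333, timeout: float = 10.0) -> bytes:
    """Connect, send our version, read the peer's reply, disconnect."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(message(b'version', version_payload()))
        header = s.recv(24)  # peer's first reply header (should be "version")
        return header[4:16].rstrip(b'\x00')  # command name of the reply
    # The socket is closed here, so no connection slot is held afterwards.
```

    A crawler built this way occupies a peer's connection slot only for the duration of one handshake per snapshot, rather than permanently.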

  13. sipa commented at 3:42 PM on December 27, 2014: member

    @ayeowch You're right that there is a certain overhead involved in connecting to and disconnecting from a peer, but that's almost nothing compared to the resources you're requiring from them by staying connected (a few MB of RAM per connection, plus all the CPU involved in sending and tracking advertised blocks and transactions). In addition, as @laanwj already mentioned: connection slots are a limited resource that is valuable to the network. Please don't waste it.

  14. jgarzik commented at 7:55 PM on December 27, 2014: contributor

    ACK, based on criteria provided.

  15. ayeowch commented at 9:25 PM on December 27, 2014: contributor

    @laanwj @sipa Thanks for the valuable input. I had in the past configured the crawler to connect and then disconnect after a full handshake. This meant the crawler had to reconnect after each full snapshot of the network was taken, which made it appear abusive in nature to some users. @sipa I have monitored the bytes transferred between the crawler and each node closely, but didn't realize the undesirable RAM/CPU usage each node incurs just to maintain the connection.

    I am going to revisit my approach in the next couple of days and, at the very least, revert the crawler to its previous behaviour, i.e. connect to and disconnect from each node to take a snapshot of the network.

  16. petertodd commented at 10:09 AM on December 28, 2014: contributor

    ACK for now, but NACK if problems are fixed in the next few days.

  17. sipa commented at 10:16 AM on December 28, 2014: member

    @ayeowch Thanks for considering changing your approach.

    Let's wait a few days before merging.

  18. gmaxwell commented at 10:36 AM on December 28, 2014: contributor

    I will continue to be uncomfortable even if the behaviour is changed.

    I really don't want to get into a debate here, but we've had several instances where the behaviour of the bitnodes crawlers was bad in a way that was very obvious to all (or nearly all) of the regular contributors to the project. In each case there was some discussion, and at least I left with the impression that the behaviour would be fixed. Then later we found it doing something substantially similar, unchanged, or even worse.

    Clearly there are some consistent problems with communication and/or understanding.

    In the case of the dnsseed itself, much of its behaviour is even more difficult to monitor or audit than the crawler's. Plus, we have no great and urgent need for more dnsseeds; in fact, our reliance on them is lower than it was in the past. With that in mind, I'd rather not waste my cycles trying to figure out ways to determine whether this seed is behaving correctly, and I suspect that ayeowch would probably rather not spend time defending a seed service which is nothing but costs to him, especially with us being demanding and seemingly unappreciative.

    I'm currently working on tools that will help protect users and the network from abusive or intrusive behaviour like we've seen from this crawler software, regardless of how well things are or aren't communicated or what people's intentions are. So I do not consider the behaviour of those particular hosts gating here (though it should also be improved), but rather just a sign that there is some kind of persistent, serious misunderstanding and miscommunication, and I'd rather not have a DNS seed operating under that.

    Cheers.

  19. ayeowch commented at 10:58 AM on December 28, 2014: contributor

    @gmaxwell Well said Greg :) I am glad that you already have work underway to better establish what is considered abusive/non-abusive towards the network.

    As much as I want to defend myself, and certainly want to help the network by offering the DNS seeder, I agree much of the problem with the crawler was due to miscommunication. English is not my first language, so that could be another reason. I'm still working on the crawler right now to avoid keeping connections open. The only immediate gating issue for me is the node alert feature on the Bitnodes site. The feature currently sends an alert to a subscribed node owner as soon as their node is disconnected from the crawler, so I will have to think of another way to do this. If you have thoughts about this or any issues regarding the crawler, please feel free to let me know (ayeowch@gmail.com).

    At any rate, I am now less than comfortable having my DNS seeder in the reference implementation so let's proceed to ACK the removal for now.

  20. luke-jr commented at 6:57 PM on December 28, 2014: member

    I think it's probably fair to take up a connection slot on a node that someone has manually requested monitoring of.

  21. zander commented at 9:26 AM on December 29, 2014: none

    Using the new peers view I see 10 incoming connections with the user agent bitnodes.io; that app needs fixing!

    Next to that, it makes no sense to have a crawler in the seed. They should find nodes, not the other way around.

    ACK.

  22. ayeowch commented at 9:32 AM on December 29, 2014: contributor

    @zander I noticed the 10 crawlers coming from 54.x.x.x too; they are not mine and do not contribute to stats for the project website at getaddr.bitnodes.io. I mentioned this in my second comment in this issue. I reverted my crawler earlier today to maintain only 1 instance, which performs a full handshake with each reachable node and disconnects immediately. It will, however, reconnect after a full snapshot of the network is taken, i.e. approx. 5 to 8 minutes later. You should be able to see it appear in your getpeerinfo output as an inbound connection from 148.251.238.178.

  23. laanwj merged this on Dec 31, 2014
  24. laanwj closed this on Dec 31, 2014

  25. laanwj referenced this in commit 639c79a41f on Dec 31, 2014
  26. gmaxwell referenced this in commit 18021d08f7 on Dec 31, 2014
  27. mikehearn commented at 4:19 PM on December 31, 2014: contributor

    I think I will be keeping Addy's seed in both bitcoinj and Bitcoin XT for a couple of reasons:

    1. I don't see how removing a seed helps Bitcoin users. Ignoring for a moment that several popular services hold open lots of connections to the p2p network, the fix for this is discussion, dialog and new features, not reducing startup redundancy in Bitcoin Core.

    2. There's a written DNS seed policy, which currently says nothing about the behaviour of the associated crawlers. Rather than just targeting Addy's service specifically, if this is really the highest priority it'd be better to come up with a specific policy and provide notice to seed operators, time for transition, etc.

    3. Even if we accept the rationale that removing seeds as 'punishment' is a good way to effect change, Addy has now modified his crawler but the commit was merged anyway. Why?

    Addy, for what it's worth your English is excellent. I don't think the issue here is language related.

  28. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-18 21:15 UTC
