Server can't always recover from a change in external IP address #10262

unsystemizer opened this issue on April 23, 2017
  1. unsystemizer commented at 9:20 AM on April 23, 2017: contributor

    Describe the issue

    If External IP changes, server sometimes loses incoming connections until daemon is restarted.

    Can you reliably reproduce the issue?

    Yes, although it can take several attempts.

    If so, please list the steps to reproduce below:

    1. On the router configure port forwarding to 8333 (TCP & UDP)
    2. On the server, run bitcoind -listen=1 -server=1 -upnp=1 -debug=net -onlynet=ipv4 -blocksonly=1 and let it establish incoming connections.
    3. Then disable PPPoE, wait 10 mins, enable it again (to obtain a new IP address) and wait for it to again receive incoming connections.

    Sometimes I need to try 2-3 times, waiting 10-15 minutes before re-enabling PPPoE. I'm not sure whether it's simply a matter of trying several times or whether the disconnection has to last longer than some threshold. In my case I always get a new IP address each time, so I can't tell whether the change of address is what triggers it or simply the fact that the node went offline for a while.
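    Whether inbound connections have recovered can be read off `bitcoin-cli getpeerinfo`, whose output includes an `inbound` field per peer. A minimal sketch, with inlined sample data standing in for the real RPC response (the addresses are placeholders):

```python
import json

def count_inbound(peerinfo_json):
    """Split peers from `bitcoin-cli getpeerinfo` output into
    (inbound, outbound) counts using the `inbound` field."""
    peers = json.loads(peerinfo_json)
    inbound = sum(1 for p in peers if p.get("inbound"))
    return inbound, len(peers) - inbound

# Inlined sample standing in for `bitcoin-cli getpeerinfo` output.
sample = json.dumps([
    {"addr": "203.0.113.5:51234", "inbound": True},
    {"addr": "198.51.100.7:8333", "inbound": False},
    {"addr": "192.0.2.9:8333",    "inbound": False},
])
print(count_inbound(sample))  # (1, 2): one inbound, two outbound
```

    Running this in a loop after the PPPoE restart would show exactly when (or whether) the inbound count recovers.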

    Expected vs. Actual behaviour

    Expected: router continues forwarding incoming 8333 to $lan-ip:8333 and bitcoind server continues accepting incoming connections.

    Actual: AdvertiseLocal changes to the new IP and ProcessMessages also shows the new external IP address, but incoming connections to bitcoind never recover. Only outgoing connections to the network recover.

    What version of bitcoin-core are you using?

    • "/Satoshi:0.14.1/UASF-Segwit:0.3(BIP148)/" (v0.14.1.0-g1809845)

    Any extra information that might be useful in the debugging process.

    With port forwarding set up, UPnP shouldn't be required, and I first tried port forwarding alone with no difference. I built once with the "default" instructions (from build-unix.md) and once more with libupnp explicitly enabled at configure time and in bitcoin.conf, and was able to reproduce this independently of UPnP. An external port check against $router-ip:8333 confirms the port is open after the router (PPPoE) restart.

    Errors I get after external connections start failing:

    2017-04-23 08:29:21 sending headers (82 bytes) peer=22
    2017-04-23 08:29:21 received: inv (217 bytes) peer=23
    2017-04-23 08:29:21 got inv: tx 8c7eb2218f2440aa729c43ab7ec01a0b4a21967b5cdf6c024989fb2cb73128f3  new peer=23
    2017-04-23 08:29:21 transaction (8c7eb2218f2440aa729c43ab7ec01a0b4a21967b5cdf6c024989fb2cb73128f3) inv sent in violation of protocol peer=23
    2017-04-23 08:29:21 got inv: tx c45cd995af1eb3f3e93827d50ce20b497f097ceb11c17786cd279012ada6099b  new peer=23
    2017-04-23 08:29:21 transaction (c45cd995af1eb3f3e93827d50ce20b497f097ceb11c17786cd279012ada6099b) inv sent in violation of protocol peer=23
    2017-04-23 08:29:21 got inv: tx 55d0122ba1e9ff679ea0faccb03dee635bab4b82f0de10dc642f23ab42b31757  new peer=23
    2017-04-23 08:29:21 transaction (55d0122ba1e9ff679ea0faccb03dee635bab4b82f0de10dc642f23ab42b31757) inv sent in violation of protocol peer=23
    2017-04-23 08:29:21 got inv: tx 5cc62dd856948bbb208fbf621d2a40b0c7cb290a47589947d7f1a378adac08f3  new peer=23
    2017-04-23 08:29:21 transaction (5cc62dd856948bbb208fbf621d2a40b0c7cb290a47589947d7f1a378adac08f3) inv sent in violation of protocol peer=23
    2017-04-23 08:29:21 got inv: tx a0b8c68dfdc0f46be908da01a130c6b7c5fc558eaabf8345c395091e1954ed0c  new peer=23
    2017-04-23 08:29:21 transaction (a0b8c68dfdc0f46be908da01a130c6b7c5fc558eaabf8345c395091e1954ed0c) inv sent in violation of protocol peer=23
    2017-04-23 08:29:21 got inv: tx f350743826f242945bd471f584238d95b9b702bb5c2d51e3a152b5e64da92144  new peer=23
    2017-04-23 08:29:21 transaction (f350743826f242945bd471f584238d95b9b702bb5c2d51e3a152b5e64da92144) inv sent in violation of protocol peer=23
    

    A more detailed log which shows two pppoe restarts (the second attempt was successful) can be found here (free pastebin, so it's going to remain for 30 days): https://pastebin.com/TiYpEnJe

  2. jonasschnelli added the label P2P on Apr 23, 2017
  3. Real-Duke commented at 7:19 AM on April 25, 2017: none

    Same behavior here with builds after 0.12.1. Have a look at #9056 - it was closed because the node had 8 correct outgoing connections, but the real problem was the recovery of the dropped incoming connections after the external IP change.

  4. unsystemizer commented at 10:58 AM on April 25, 2017: contributor

    Glad to hear this was observed before. For the record, I left my node struggling to establish inbound connections on 8333 for 6 more hours, but it never recovered (it kept logging inv sent in violation of protocol the whole time), so I stopped it.

  5. TheBlueMatt commented at 12:25 PM on April 25, 2017: member

    Wait, now I'm confused - incoming or outgoing connections? If your IP changes, it is expected that you may not have any incoming connections for a while - no one knows where to connect while your new IP propagates around the network. Outgoing is a different story, however.

  6. Real-Duke commented at 7:38 PM on April 25, 2017: none

    Incoming connections are dropped and won't be recovered. Outgoing is fine because you keep the 8 outgoing connections after an IP change. But normally within minutes you would also have ~6 to 9 incoming connections again, and the longer the node runs, the more it gets. I have seen this on my side with all builds after 0.12.1. Maybe that's the reason there are still so many 0.12.1 nodes listed on Bitnodes? https://bitnodes.21.co/nodes/

  7. unsystemizer commented at 4:03 AM on April 26, 2017: contributor

    @TheBlueMatt - "Not have any incoming connections for a while": I left my node online for 7-8 hours after the IP change and it still didn't recover. That seems like "a while", but if it isn't considered long enough, then the recovery time that is considered reasonable should probably be documented. Although with 0.14.1 (the one I have now), I don't think they ever recover.

    I just looked again:

    a) cat debug.log | grep AdvertiseLocal - last external IP change (after I submitted this bug report) was almost 3 days ago.

    $ cat debug.log | grep AdvertiseLocal
    2017-04-23 18:06:58 AdvertiseLocal: advertising address A.B.C.D:8333
    $ date
    Wed Apr 26 03:46:52 UCT 2017
    

    b) Check incoming connections to port 8333 - there are none. Outgoing: 8, all OK.

    c) Use bitnodes.21.co to check my node's status using the current External IP - status is green.

    From this it would appear that even 2 1/2 days after the change, the bitcoind server doesn't recover.

    If there's a better way to debug this, please let me know.
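    The age of the last advertised address in (a) can be computed mechanically from the debug.log. A minimal sketch, using the log line and `date` output shown above (the helper name is made up):

```python
from datetime import datetime

def last_advertise_age(debug_log, now):
    """Age of the most recent AdvertiseLocal line in a debug.log excerpt."""
    stamps = [line[:19] for line in debug_log.splitlines()
              if "AdvertiseLocal" in line]
    last = datetime.strptime(stamps[-1], "%Y-%m-%d %H:%M:%S")
    return now - last

log = "2017-04-23 18:06:58 AdvertiseLocal: advertising address A.B.C.D:8333"
delta = last_advertise_age(log, datetime(2017, 4, 26, 3, 46, 52))
print(delta)  # 2 days, 9:39:54 - and still no inbound connections
```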

  8. Real-Duke commented at 8:38 AM on April 26, 2017: none

    "c) Use bitnodes.21.co to check my node's status using the current External IP - status is green."

    100% the same here! But if I click through to the "Activate Node" page, I see it with status "Pending" and 5 minutes later as "Down" (the check on the external IP is still green!). Now it gets curious: ~15 minutes after activating the node on Bitnodes, my node starts to recover its connections - but nothing happens until I activate it on that page! You can see it on my stats page: -edit: deleted-. I hit the activate button at 7:20 CEST this morning; the new Internet connection was established at 5:00 CEST.

  9. laanwj commented at 8:45 AM on April 26, 2017: member
    $ cat debug.log | grep AdvertiseLocal
    2017-04-23 18:06:58 AdvertiseLocal: advertising address A.B.C.D:8333
    

    Is this A.B.C.D here your old or new address?

  10. unsystemizer commented at 11:05 AM on April 26, 2017: contributor

    It's the new external address. "Discovery" of new external IP has worked well.

    By the way, to @Real-Duke's point - I just looked and noticed the same (edit/clarification: that running a check on Bitnodes helps, not that I have to activate anything). As you can see in my comment (https://github.com/bitcoin/bitcoin/issues/10262#issuecomment-297231844), I used Bitnodes to check my status after 03:46:52 UCT 2017. Now I see that only minutes later there were connections to my server - as if Bitnodes propagated my server to the network within 10 minutes, while my own server couldn't do that on its own for close to 3 days:

    • 03:47:58 - when I used bitnodes.21.co to check my server status
    • 03:57:28 - first client connected to the new external IP (very bottom of the log file below)
    2017-04-26 03:29:51' progress=0.999999 cache=13.1MiB(13381tx) warning='1 of last 100 blocks have unexpected version'
    2017-04-26 03:47:58 receive version message: /bitnodes.21.co:0.1/: version 70015, blocks=0, us=A.B.C.D:8333, peer=182
    2017-04-26 03:51:44 receive version message: /bitnodes.21.co:0.1/: version 70015, blocks=463535, us=A.B.C.D:8333, peer=183
    2017-04-26 03:51:51 receive version message: /bitnodes.21.co:0.1/: version 70015, blocks=463535, us=A.B.C.D:8333, peer=184
    2017-04-26 03:52:02 connect() to 104.224.12.82:8333 failed after select(): Connection refused (111)
    2017-04-26 03:52:28 connect() to 87.184.248.72:8333 failed after select(): No route to host (113)
    2017-04-26 03:53:27 receive version message: /Satoshi:0.14.0/: version 70015, blocks=463535, us=A.B.C.D:41354, peer=185
    2017-04-26 03:57:02 connect() to 87.118.115.176:8333 failed after select(): Connection refused (111)
    2017-04-26 03:57:03 UpdateTip: new best=0000000000000000016ef2c56927088da2795d14d3568a21ce782b847a04ab2e height=463536 version=0x20000000 log2_work=86.337308 tx=216395175 date='2017-04-26 03:56:08' progress=0.999999 cache=15.8MiB(19204tx) warning='1 of last 100 blocks have unexpected version'
    2017-04-26 03:57:28 receive version message: /Satoshi:0.14.0/: version 70015, blocks=463535, us=A.B.C.D:8333, peer=186
    
  11. Real-Duke commented at 7:01 AM on April 27, 2017: none

    IP change at 5:00 CEST, and I hadn't checked on Bitnodes until now. The node has been sitting with 9(?) connections for 4 hours.

    Edit 14:00 CEST: I decided to check with Bitnodes 9 hours later. Tadaa - within 10 minutes my connections started growing. Hope this helps the devs a little.

  12. unsystemizer commented at 10:33 AM on April 27, 2017: contributor

    I had another change as well and incoming connections haven't recovered. I haven't done any service status checking or registration with Bitnodes since then. I decided to run a check again:

    • [1] 10:13:59 - first Bitnodes "check client" (0 blocks) connects, followed by more "full node" (?) clients
    • [2] 10:24:01 - 3rd party clients begin to connect to my server's port 8333

    Log (debug logging disabled this time):

    [1] 2017-04-27 10:13:59 receive version message: /bitnodes.21.co:0.1/: version 70015, blocks=0, us=E.F.G.H:8333, peer=460
    2017-04-27 10:17:20 UpdateTip: new best=000000000000000000770c94f22235c42fff041e7e60b67295ea32ae836cdf99 height=463723 version=0x20000000 log2_work=86.343472 tx=216786295 date='2017-04-27 10:16:12' progress=0.999999 cache=10.6MiB(5609tx)
    2017-04-27 10:17:20 receive version message: /Satoshi:0.14.0/: version 70015, blocks=463723, us=E.F.G.H:37009, peer=461
    2017-04-27 10:18:49 connect() to 178.239.50.27:8333 failed after select(): Connection refused (111)
    2017-04-27 10:20:03 receive version message: /bitnodes.21.co:0.1/: version 70015, blocks=463722, us=E.F.G.H:8333, peer=462
    2017-04-27 10:21:05 UpdateTip: new best=0000000000000000015e22a0988fcc1416debc85901e6f22efd9adfe6c15795d height=463724 version=0x20000002 log2_work=86.343505 tx=216787937 date='2017-04-27 10:19:36' progress=0.999999 cache=21.8MiB(10559tx)
    2017-04-27 10:22:07 receive version message: /bitnodes.21.co:0.1/: version 70015, blocks=463722, us=E.F.G.H:8333, peer=463
    [2] 2017-04-27 10:24:01 receive version message: /Satoshi:0.14.0/: version 70015, blocks=463724, us=E.F.G.H:8333, peer=464
    2017-04-27 10:24:11 receive version message: /bitnodes.21.co:0.1/: version 70015, blocks=463724, us=E.F.G.H:8333, peer=465
    2017-04-27 10:24:42 receive version message: /TestClient.0.0.1/: version 70002, blocks=435862, us=E.F.G.H:8333, peer=466
    

    This is consistent with behavior observed yesterday. At this point I think this is sufficient info to reproduce this problem so I'll stop posting new logs unless asked.

  13. TheBlueMatt commented at 1:47 AM on April 28, 2017: member

    "Decided to check with Bitnodes 9 hours later. Tadaa within 10 minutes my connections start growing."

    That sounds like a smoking gun to me. Generally it is not expected that you will receive inbound connections very fast, because address rumoring is deliberately slow through the network. Your comment seems to indicate that Bitnodes is aggressively announcing your address for you in its network crawling, which results in more nodes knowing about your new address and, probabilistically, more nodes connecting to you (especially spy nodes and random nodes set to use Bitnodes as a seed address database). It may be that address rumoring could be made a bit faster, but nodes that just came online aren't so useful anyway, as they aren't as likely to remain online as others.
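    The "deliberately slow rumoring" point can be illustrated with a toy gossip simulation. This is NOT Bitcoin Core's actual addr-relay logic - the fanout, delay, and network size are made-up parameters - but it shows how per-hop forwarding delays compound before a meaningful share of the network hears a new address:

```python
import heapq
import random

def gossip_half_time(n_nodes=500, fanout=2, mean_delay=600.0, seed=42):
    """Toy model: each node, on first hearing a new address, forwards it
    to `fanout` random peers after an exponential delay (mean seconds).
    Returns the time until half the network has heard the address."""
    rng = random.Random(seed)
    heard = {0: 0.0}               # node id -> time it first heard
    queue = [(0.0, 0)]             # (time, node) forwarding events
    while queue:
        t, _node = heapq.heappop(queue)
        for peer in rng.sample(range(n_nodes), fanout):
            if peer not in heard:
                t_peer = t + rng.expovariate(1.0 / mean_delay)
                heard[peer] = t_peer
                heapq.heappush(queue, (t_peer, peer))
    times = sorted(heard.values())
    half = n_nodes // 2
    return times[half - 1] if len(times) >= half else float("inf")

# Seconds until half the toy network knows the new address.
print(gossip_half_time())
```

    Even with these generous toy numbers, reach grows hop by hop rather than instantly - consistent with a crawler like Bitnodes short-circuiting the process by announcing the address network-wide at once.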

  14. unsystemizer commented at 3:28 AM on April 28, 2017: contributor

    I agree that they aren't as likely to remain online, but it may be a good idea to have a formula that doesn't penalize up-to-date nodes. In general I see why address rumoring should be non-instant, but my node is up-to-date block height-wise so the network would benefit more if it was used earlier rather than later compared to nodes that are 50,000 blocks behind.

    I'm not sure how long it takes to recover on average (as I've never observed it for more than 2 1/2 days). If we have 500 server nodes that change IP on a daily basis and take 2 days to recover, about 1,000 servers could be constantly MIA. My server sometimes gets a new IP every day, so without Bitnodes I could be running it 24x7x365 and never receive any incoming connections. That doesn't seem right. A wait of 1-2 hours without incoming connections would seem reasonable to me, but 12+ hours or even several days means that any server that gets a new IP daily is mostly unused.
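    The back-of-envelope above is just Little's law: the average number of nodes stuck "in recovery" equals the rate of IP changes times the average recovery time (the function name is made up for illustration):

```python
def steady_state_unreachable(ip_changes_per_day, recovery_days):
    """Little's law: average number of nodes 'in recovery' at any moment
    equals the rate entering that state times the average time spent in it."""
    return ip_changes_per_day * recovery_days

print(steady_state_unreachable(500, 2))  # 1000 nodes effectively MIA
```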

    Should I test how long it takes to recover without the help of Bitnodes? Or close this issue?

  15. Real-Duke commented at 7:59 AM on April 28, 2017: none

    I can't understand these considerations, because my node is up 24/7 and 100% synced. With the release of v0.13.0, something in the peer handling was changed which caused this problem. While running my node with the (for me) latest "stable" 0.12.1, I never observed anything like this after my daily IP change, even after months of continuous uptime.

  16. unsystemizer commented at 11:55 AM on April 28, 2017: contributor

    It doesn't appear the protocol differentiates between "synced up" and "syncing" nodes, and uptime isn't (and can't be) considered either - it seems to be simply about addresses and connections. One consideration is that someone could bring hundreds or thousands of fresh nodes up and down to cause malicious churn on the network. But by over-guarding against that we also sideline "good" nodes that behave well but change their public IP. Because Bitnodes propagates new-node info anyway, I wonder whether the preventive measures built into the Core client serve a good purpose.

  17. TheBlueMatt commented at 1:13 PM on April 28, 2017: member

    I agree that they aren't as likely to remain online, but it may be a good idea to have a formula that doesn't penalize up-to-date nodes

    It's not about nodes being deliberately penalized; it's simply a question of how long it takes for other nodes on the network to hear about your new address. Sadly, there are other considerations beyond "how quick can we make rumoring". In the last few releases some bugs were fixed which had made address relay gameable (i.e. they allowed nodes to be more aggressive in broadcasting their own address to get other nodes to connect to them more), which is likely a big part of what you're seeing here.

    When you upgraded to 0.13.1/0.14.X, the preferential peering kicked in and more of your peers are other 0.13.1/0.14 nodes, which may rumor a bit slower. It's possible there is a bug here and someone should go do another more careful reading of addrman, but it's also pretty explainable just from reasonable code changes.

  18. Real-Duke commented at 12:30 PM on May 1, 2017: none

    Even if you restart the node, it doesn't get incoming connections if you have forgotten to delete the huge file peers.dat (it grows to ~3 MB in 24 h). My workaround for now: a cron job at 5 am (after my IP change) shuts down Core, deletes peers.dat, and restarts Core. Works for now, but it's no solution.

  19. TheBlueMatt commented at 4:16 PM on May 1, 2017: member

    @Real-Duke I believe you're treating a small symptom, not the actual issue. By restarting your bitcoind, you now advertise your new address to your new peers; deleting peers.dat shouldn't have any significant effect. Alternatively, you could go through and disconnect all of your existing peers without restarting (or just wait a bit longer for it to re-advertise its new IP). Ultimately, you're always caught in a race to have your new IP become known to legitimate peers around the network before your IP changes again (I used to host shit on a residential DTAG connection, I know the pain...), and I actually believe you likely don't win that race even by restarting - many of the "fast to connect" peers are likely spy nodes that aren't useful anyway.

  20. Real-Duke commented at 5:31 PM on May 1, 2017: none

    Sorry, but if I follow your thoughts, it might be best to shut down my node and use the hardware and bandwidth for something else. A full node without incoming connections is useful to nobody and not worth keeping on.

  21. sipa commented at 5:33 PM on May 1, 2017: member

    @Real-Duke A full node that is not reachable by others is very valuable, if you use it to process transactions.

  22. Real-Duke commented at 5:39 PM on May 1, 2017: none

    Hi sipa, I could easily solve my problem by downgrading to 0.12.1 again, but I won't do so. My full node runs on its own on separate hardware, solely to contribute to the network. For my own transactions I use the SPV wallet Electrum.

  23. sipa commented at 5:42 PM on May 1, 2017: member

    @Real-Duke There is likely not much your hardware can provide that the network needs. If it did, you'd have connections. Maybe there will be a shortage of connectable full nodes in the network again at some point, but there hasn't been one in years. You should run a full node because it offers you value... for example, by running your own Electrum server?

  24. MarcoFalke commented at 7:55 PM on May 8, 2020: member

    Is this still an issue with a recent version of Bitcoin Core? If yes, what are the steps to reproduce?

  25. MarcoFalke closed this on May 8, 2020

  26. MarcoFalke locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-15 15:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me