Re-issue getblocks to the next suitable peer when the previously selected one disappears #1271

rebroad commented at 2:55 PM on May 12, 2012: contributor

Fixes Issue #1234 - re-issues getblocks to the next suitable peer when the previously selected one disappears.

edit@laanwj: clarified title, was "issue1234"

in .gitignore:None in 8e2904dd40 outdated

  19 | @@ -20,3 +20,5 @@ qrc_*.cpp
  20 |  *.pro.user
  21 |  #mac specific
  22 |  .DS_Store
  23 | +/build/
  24 | +/.git.freshlycloned/

sipa commented at 3:00 PM on May 12, 2012:

That doesn't seem very necessary.

rebroad commented at 3:02 PM on May 12, 2012:

it's not.. I've removed this and made a new fixup commit now. Just need to squash(?) the commits together now.. @sipa, can you remind me the git command please?

(the /build/ line is needed though, right? without it, git status reports on the contents of the build directory...)

sipa commented at 3:06 PM on May 12, 2012: member

git fetch upstream; git rebase -i upstream/master; (in the opened editor, move the line of the fixup commit up to below the gitognore commit, and change the 'pick' to 'fixup', then save, and force-push the resulting branch)

sipa commented at 3:22 PM on May 12, 2012: member

with force-push i mean: git push -f origin <branchname>

rebroad commented at 3:29 PM on May 12, 2012: contributor

@sipa thankyou.. now 1 commit.

jgarzik commented at 4:01 AM on May 13, 2012: contributor

Thread safety of 'nAskedForBlocks' ? Accessed in both ProcessMessage() and CNode::CloseSocketDisconnect()
fAskedForBlocks should be set to false, if found to be true in CNode::CloseSocketDisconnect()

rebroad commented at 9:16 PM on May 13, 2012: contributor

@jgarzik 1) accessed in both, yes. thread safety - not needed from what I can tell, but please feel free to explain why you think it is. 2) No. Once asked for blocks, it's true. It never becomes false, since the past cannot be changed. Any new CNode, it's set to false on creation.

sipa commented at 9:27 PM on May 13, 2012: member

you modify the value from two threads (message handler thread and socket handler thread)
Changing it to false when decrementing nAskedForBlocks would make it obvious that it cannot be decremented twice (even though that probably is already the case)

gmaxwell commented at 1:49 AM on May 14, 2012: contributor

Of course, this only helps if the peer is actually disconnected. If he just becomes slow/unresponsive, or he simply doesn't have the chain past a certain point ... we'll still be waiting. It's a small enough change that I guess it makes sense for now, though at some point I think we need to move to something like this: https://en.bitcoin.it/wiki/User:Gmaxwell/Reverse_header-fetching_sync

rebroad commented at 5:59 PM on May 14, 2012: contributor

@gmaxwell there's certainly room for improvement, but this is a small (intentionally, to increase the change of it bring pulled) step towards making it get the blocks more quickly. This particular change has been tested in my fork for over a month, but I've also got other code that checks for stuck downloads (which then timeout's the askfor and asks elsewhere). Currently it's not ideal in that it often causes the same blocks to be downloaded from several nodes (as sometimes they do wake up again), so isn't as bandwidth efficient as I'd like it to be, and it also has various timeouts hardcoded, which is based on my internet connection. My eventual plan is to a patch that will determine the speed of the network and peers over time, and factor that knowledge into the block download process. @sipa, I see what you're saying. it does get modified from those two places. It seems to work though. Are you saying it could end up being two different values within the two different threads? I've never noticed this happen during over a month of testing so far. I could move the nAskedForBlocks++ code from ProcessMessages() into the socket handler thread. It probably belongs there more than ProcessMessages() now anyway, since it's not part of the strCommend == "version" code any longer. I was going to do this as a later pull, since as it is it works, and is less of a change to keep it in ProcessMessages(), and move it elsewhere later.

in .gitignore:None in db4eb7fe5b outdated

  19 | @@ -20,3 +20,4 @@ qrc_*.cpp
  20 |  *.pro.user
  21 |  #mac specific
  22 |  .DS_Store
  23 | +/build/

rebroad commented at 6:17 PM on May 14, 2012:

out of interest, how come I needed to add this line, but no one else seems to need it?

luke-jr commented at 6:28 PM on May 18, 2012:

My directory is so cluttered that I use "git status -uno" ;)

in src/main.cpp:None in c36298da9c outdated

2403 | +        printf("initial getblocks to %s\n", pfrom->addr.ToString().c_str());
2404 | +        pfrom->PushGetBlocks(pindexBest, uint256(0));
2405 | +    }
2406 | +
2407 | +
2408 | +    if (strCommand == "version") ;

TheBlueMatt commented at 2:13 PM on May 20, 2012:

Why was this moved to outside of all the if statements, and why the addition of the if statement here? Rebase error?

rebroad commented at 10:22 AM on May 21, 2012:

Not a rebase error. It's a way for the if statement's elses so continue working. i.e. without the re-if, the unknown command code would become true. Strictly speaking this code, now that it's no longer strictly part of ProcessMessages() could be moved now to just after the line that calls that. I guess I wanted to keep the change minimal, by keeping the lines as close to their original location as possible.

TheBlueMatt commented at 1:40 PM on May 21, 2012:

It would be simpler to move this block above the if "version" block, that way you wouldnt have to add another if statement. Also, correct me if I'm wrong, but the way its written here, instead of waiting for a new block to come in after being disconnected from the first node, you are now just waiting for any message. Though tx inv messages come in quite often, it may make more sense to move this block to the disconnect handler, where it would be called much quicker after a node disconnects (though I havent looked at locking implications of that). Wherever it ends up going, you need to make sure you have already exchanged version messages with the target node.

rebroad commented at 2:22 PM on May 21, 2012:

@TheBlueMatt it was originally above the if version block, but then it would miss an opportunity to fire off right after that version message, which would be an ideal time. It can't go into the disconnect handler, as then for which node would it be sending the getblocks?

TheBlueMatt commented at 2:27 PM on May 21, 2012:

Firing off right after the version message is ideal when you want to fire off right after a new connection, but if you just dropped a connection and just want to fire off on any other random node right away, it shouldnt matter (because its very likely that you already have one or more connections that have already exchanged version messages). If you move it to the disconnect handler, you can just chose a random node from vNodes (given that the selected node has already passed the version message exchange stuff).

rebroad commented at 5:03 PM on May 21, 2012:

@TheBlueMatt In my experience, a significant number of the connected nodes can be stale, and non-responsive, therefore it makes far more sense to fire off the getblocks to a node that's still communicating. Picking a random node wouldn't achieve this, but picking a node that's sending messages would.

TheBlueMatt commented at 5:16 PM on May 21, 2012:

Fair enough, though that probably also means you need to decrease your timeouts (also, maybe we should put those pong messages to good use...). In any case, I would say throw that block at the end of ProcessMessage so that a. it cleans up the diff (no new if statements, and the if/elses become one block again and b. as you point out, let the if "version" block complete before running this block. Finally, make sure you have exchanged version messages with the target node before asking for blocks, and then Id say this looks good to me.

TheBlueMatt cross-referenced this on May 20, 2012 from issue Add -proxytoo option, which allows proxy use non-exclusively, unlike the -proxy option. by rebroad

in src/net.cpp:None in f71afd2002 outdated

 525 | @@ -526,7 +526,14 @@ void CNode::CloseSocketDisconnect()
 526 |      fDisconnect = true;
 527 |      if (hSocket != INVALID_SOCKET)
 528 |      {
 529 | -        printf("disconnecting node %s\n", addrName.c_str());
 530 | +        if (fDebug)
 531 | +            printf("%s ", DateTimeStrFormat("%x %H:%M:%S", GetTime()).c_str());

sipa commented at 7:45 PM on June 12, 2012:

why print the time?

rebroad commented at 11:10 PM on June 14, 2012:

I have no idea at all! I don't know how this got in here! I'll remove it.

TheBlueMatt commented at 3:08 PM on June 13, 2012: contributor

Note that though I think this pull is good and should be added to specifically fix #1234, it appears that the motivation for this patch is to fix an issue where some ISPs (specifically @rebroad's) are closing connections without RSTing them after a certain amount of time and I would kinda like to see a specific fix for that to fix the underlying issue here instead of just working around it.

TheBlueMatt commented at 1:53 PM on August 13, 2012: contributor

For some reason really old pulls don't show up in github's API (the total pull count returned seems to be limited, but its not mentioned in the API Docs...), so pull tester won't test this or any old pulls (maybe it will if the total pull count decreases?). If anyone needs this, or any older pull explicitly tested, I can run it manually.

BitcoinPullTester commented at 8:05 AM on August 14, 2012: none

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/9543ca35cab6dfc53de84cb59dc4aedcc9253c09 for binaries and test log.

BitcoinPullTester commented at 9:02 PM on September 7, 2012: none

Automatic sanity-testing: FAILED BUILD/TEST, see http://jenkins.bluematt.me/pull-tester/8ce572c8ad0960ff08a577f1ed2bf49ed0108ab0 for binaries and test log.

This could happen for one of several reasons:

It chanages paths in makefile.linux-mingw or otherwise changes build scripts in a way that made them incompatible with the automated testing scripts
It does not build on either Linux i386 or Win32 (via MinGW cross compile)
The test suite fails on either Linux i386 or Win32
The block test-cases failed (lookup the first bNN identifier which failed in https://github.com/TheBlueMatt/test-scripts/blob/master/FullBlockTestGenerator.java)

luke-jr commented at 5:56 AM on November 13, 2012: member

This needs to be rebased.

luke-jr commented at 11:54 AM on November 25, 2012: member

Did you keep the branch name the same and push it to github? Looks like maybe you pushed it as "master" instead of "issue1234"

rebroad commented at 11:58 AM on November 25, 2012: contributor

I did:

git fetch upstream master git checkout issue1234 git rebase -i upstream/master git push --force origin issue 1234

The rebase didn't require any manual intervention, which I was surprised by, so I'm wondering if I did something wrong before the push...

luke-jr commented at 1:21 PM on November 25, 2012: member

Any errors?

rebroad commented at 1:29 PM on November 25, 2012: contributor

none

luke-jr commented at 1:36 PM on November 25, 2012: member

Is your remote actually named "upstream"? The default is "origin".

rebroad commented at 1:54 PM on November 25, 2012: contributor

$ git remote -v origin http://rebroad@github.com/rebroad/bitcoin.git (fetch) origin http://rebroad@github.com/rebroad/bitcoin.git (push) upstream http://github.com/bitcoin/bitcoin.git (fetch) upstream http://github.com/bitcoin/bitcoin.git (push)

sipa cross-referenced this on Nov 25, 2012 from issue Block download management by sipa

rebroad commented at 5:25 PM on December 20, 2012: contributor

rebased as requested. tested too... this patch is still helping with stale connections. reducing the time block download is delayed by 7 minutes on average (based on a wait of 10 minutes for the next block, compared with a wait of 3 minutes for a stale node to timeout).

rebroad commented at 5:27 PM on December 20, 2012: contributor

@TheBlueMatt just to clarify. This patch has nothing to do with ISPs that RST connections. It's needed for all ISPs for where any connection goes stale and eventually (after about 3 minutes in my last tests) times out.

rebroad cross-referenced this on Dec 20, 2012 from issue Parallel block download by rebroad

Fixes Issue #1234 - re-issues getblocks when current node providing them disappears.

Conflicts:

	src/net.cpp

Conflicts:

	.gitignore

9fc4c03556

gavinandresen commented at 6:39 PM on April 9, 2013: contributor

Closing this, mostly replaced by sipa's "make sure you always have a peer to sync to" patch.

gavinandresen closed this on Apr 9, 2013

rebroad deleted the branch on Sep 17, 2014

suprnurd referenced this in commit 2a43d23f95 on Dec 5, 2017

bitcoin locked this on Sep 8, 2021