Fixes Issue #1234 - re-issues getblocks to the next suitable peer when the previously selected one disappears.
edit@laanwj: clarified title, was "issue1234"
19 | @@ -20,3 +20,5 @@ qrc_*.cpp 20 | *.pro.user 21 | #mac specific 22 | .DS_Store 23 | +/build/ 24 | +/.git.freshlycloned/
That doesn't seem very necessary.
it's not.. I've removed this and made a new fixup commit now. Just need to squash(?) the commits together now.. @sipa, can you remind me the git command please?
(the /build/ line is needed though, right? without it, git status reports on the contents of the build directory...)
git fetch upstream; git rebase -i upstream/master; (in the opened editor, move the line of the fixup commit up to below the gitognore commit, and change the 'pick' to 'fixup', then save, and force-push the resulting branch)
with force-push i mean: git push -f origin <branchname>
Thread safety of 'nAskedForBlocks' ? Accessed in both ProcessMessage() and CNode::CloseSocketDisconnect()
fAskedForBlocks should be set to false, if found to be true in CNode::CloseSocketDisconnect()
@jgarzik 1) accessed in both, yes. thread safety - not needed from what I can tell, but please feel free to explain why you think it is. 2) No. Once asked for blocks, it's true. It never becomes false, since the past cannot be changed. Any new CNode, it's set to false on creation.
Of course, this only helps if the peer is actually disconnected. If he just becomes slow/unresponsive, or he simply doesn't have the chain past a certain point ... we'll still be waiting. It's a small enough change that I guess it makes sense for now, though at some point I think we need to move to something like this: https://en.bitcoin.it/wiki/User:Gmaxwell/Reverse_header-fetching_sync
@gmaxwell there's certainly room for improvement, but this is a small (intentionally, to increase the change of it bring pulled) step towards making it get the blocks more quickly. This particular change has been tested in my fork for over a month, but I've also got other code that checks for stuck downloads (which then timeout's the askfor and asks elsewhere). Currently it's not ideal in that it often causes the same blocks to be downloaded from several nodes (as sometimes they do wake up again), so isn't as bandwidth efficient as I'd like it to be, and it also has various timeouts hardcoded, which is based on my internet connection. My eventual plan is to a patch that will determine the speed of the network and peers over time, and factor that knowledge into the block download process. @sipa, I see what you're saying. it does get modified from those two places. It seems to work though. Are you saying it could end up being two different values within the two different threads? I've never noticed this happen during over a month of testing so far. I could move the nAskedForBlocks++ code from ProcessMessages() into the socket handler thread. It probably belongs there more than ProcessMessages() now anyway, since it's not part of the strCommend == "version" code any longer. I was going to do this as a later pull, since as it is it works, and is less of a change to keep it in ProcessMessages(), and move it elsewhere later.
19 | @@ -20,3 +20,4 @@ qrc_*.cpp 20 | *.pro.user 21 | #mac specific 22 | .DS_Store 23 | +/build/
out of interest, how come I needed to add this line, but no one else seems to need it?
My directory is so cluttered that I use "git status -uno" ;)
2403 | + printf("initial getblocks to %s\n", pfrom->addr.ToString().c_str()); 2404 | + pfrom->PushGetBlocks(pindexBest, uint256(0)); 2405 | + } 2406 | + 2407 | + 2408 | + if (strCommand == "version") ;
Why was this moved to outside of all the if statements, and why the addition of the if statement here? Rebase error?
Not a rebase error. It's a way for the if statement's elses so continue working. i.e. without the re-if, the unknown command code would become true. Strictly speaking this code, now that it's no longer strictly part of ProcessMessages() could be moved now to just after the line that calls that. I guess I wanted to keep the change minimal, by keeping the lines as close to their original location as possible.
It would be simpler to move this block above the if "version" block, that way you wouldnt have to add another if statement. Also, correct me if I'm wrong, but the way its written here, instead of waiting for a new block to come in after being disconnected from the first node, you are now just waiting for any message. Though tx inv messages come in quite often, it may make more sense to move this block to the disconnect handler, where it would be called much quicker after a node disconnects (though I havent looked at locking implications of that). Wherever it ends up going, you need to make sure you have already exchanged version messages with the target node.
@TheBlueMatt it was originally above the if version block, but then it would miss an opportunity to fire off right after that version message, which would be an ideal time. It can't go into the disconnect handler, as then for which node would it be sending the getblocks?
Firing off right after the version message is ideal when you want to fire off right after a new connection, but if you just dropped a connection and just want to fire off on any other random node right away, it shouldnt matter (because its very likely that you already have one or more connections that have already exchanged version messages). If you move it to the disconnect handler, you can just chose a random node from vNodes (given that the selected node has already passed the version message exchange stuff).
@TheBlueMatt In my experience, a significant number of the connected nodes can be stale, and non-responsive, therefore it makes far more sense to fire off the getblocks to a node that's still communicating. Picking a random node wouldn't achieve this, but picking a node that's sending messages would.
Fair enough, though that probably also means you need to decrease your timeouts (also, maybe we should put those pong messages to good use...). In any case, I would say throw that block at the end of ProcessMessage so that a. it cleans up the diff (no new if statements, and the if/elses become one block again and b. as you point out, let the if "version" block complete before running this block. Finally, make sure you have exchanged version messages with the target node before asking for blocks, and then Id say this looks good to me.
525 | @@ -526,7 +526,14 @@ void CNode::CloseSocketDisconnect() 526 | fDisconnect = true; 527 | if (hSocket != INVALID_SOCKET) 528 | { 529 | - printf("disconnecting node %s\n", addrName.c_str()); 530 | + if (fDebug) 531 | + printf("%s ", DateTimeStrFormat("%x %H:%M:%S", GetTime()).c_str());
why print the time?
I have no idea at all! I don't know how this got in here! I'll remove it.
Note that though I think this pull is good and should be added to specifically fix #1234, it appears that the motivation for this patch is to fix an issue where some ISPs (specifically @rebroad's) are closing connections without RSTing them after a certain amount of time and I would kinda like to see a specific fix for that to fix the underlying issue here instead of just working around it.
For some reason really old pulls don't show up in github's API (the total pull count returned seems to be limited, but its not mentioned in the API Docs...), so pull tester won't test this or any old pulls (maybe it will if the total pull count decreases?). If anyone needs this, or any older pull explicitly tested, I can run it manually.
Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/9543ca35cab6dfc53de84cb59dc4aedcc9253c09 for binaries and test log.
Automatic sanity-testing: FAILED BUILD/TEST, see http://jenkins.bluematt.me/pull-tester/8ce572c8ad0960ff08a577f1ed2bf49ed0108ab0 for binaries and test log.
This could happen for one of several reasons:
This needs to be rebased.
Did you keep the branch name the same and push it to github? Looks like maybe you pushed it as "master" instead of "issue1234"
I did:
git fetch upstream master git checkout issue1234 git rebase -i upstream/master git push --force origin issue 1234
The rebase didn't require any manual intervention, which I was surprised by, so I'm wondering if I did something wrong before the push...
Any errors?
none
Is your remote actually named "upstream"? The default is "origin".
$ git remote -v origin http://rebroad@github.com/rebroad/bitcoin.git (fetch) origin http://rebroad@github.com/rebroad/bitcoin.git (push) upstream http://github.com/bitcoin/bitcoin.git (fetch) upstream http://github.com/bitcoin/bitcoin.git (push)
rebased as requested. tested too... this patch is still helping with stale connections. reducing the time block download is delayed by 7 minutes on average (based on a wait of 10 minutes for the next block, compared with a wait of 3 minutes for a stale node to timeout).
@TheBlueMatt just to clarify. This patch has nothing to do with ISPs that RST connections. It's needed for all ISPs for where any connection goes stale and eventually (after about 3 minutes in my last tests) times out.
Conflicts:
src/net.cpp
Conflicts:
.gitignore
Closing this, mostly replaced by sipa's "make sure you always have a peer to sync to" patch.