Optimize CScript.FindAndDelete #7884

pull gavinandresen wants to merge 2 commits into bitcoin:master from gavinandresen:optimize_FindAndDelete changing 2 files +133 −1

gavinandresen commented at 3:50 PM on April 15, 2016: contributor

The FindAndDelete function could move the same bytes multiple times if the signature being deleted appeared in the script multiple times.

These two commits add a unit test for FindAndDelete and optimize it so that bytes are moved at most once.
laanwj added the label Validation on Apr 15, 2016
laanwj added the label Resource usage on Apr 15, 2016

in src/script/script.h:None in ad1fa51e8d outdated

 578 |          {
 579 |              while (end() - pc >= (long)b.size() && memcmp(&pc[0], &b[0], b.size()) == 0)
 580 |              {
 581 | -                pc = erase(pc, pc + b.size());
 582 | +                // Lazy copy-in-place if there is a match:
 583 | +                while (copied != pc-nShift) {

kazcw commented at 10:42 PM on April 15, 2016:

A test for !nShift here would prevent copying everything upto the first match to itself

in src/script/script.h:None in ad1fa51e8d outdated
```
 591 |          }
 592 |          while (GetOp(pc, opcode));
 593 | +
 594 | +        if (nShift > 0)
 595 | +        {
 596 | +            while (copied != pc-nShift) {
```
kazcw commented at 10:44 PM on April 15, 2016:

I think this changes the result of FindAndDelete for invalid CScripts that end in a pushdata without enough data. Replacing "pc" with "end()" would keep the original behavior in this case.
sipa commented at 9:30 AM on April 16, 2016: member
Concept ACK on the improvement, but there are maybe a few more edge cases to test:
- Removing a byte sequence that spans multiple opcodes
- Trying to remove a sequence that only occurs in the middle of opcodes (which doesn't work)
- Trying to remove "OP_0 OP_1" from "OP_0 OP_0 OP_1 OP_1" leaves "OP_0 OP_1" (because we don't go backward after a replacement)
gavinandresen force-pushed on Apr 18, 2016
gavinandresen force-pushed on Apr 18, 2016
Unit test for CScript::FindAndDelete 9d213d2b39

Optimize CScript::FindAndDelete

CScript::FindAndDelete, called by the CHECKSIG Script opcodes, could
end up moving the bytes of the Script multiple times if there were
multiple matches for the Find.

This commit tweaks the algorithm so it never moves bytes in the Script
more than once per call to FindAndDelete.

aae53f20dc

gavinandresen force-pushed on Apr 18, 2016
gavinandresen commented at 7:42 PM on April 18, 2016: contributor

Added unit tests as suggested by @sipa. Fixed to be compatible with invalid CScripts that end with partial PUSHDATA's (and added unit tests) as suggested by @kazcw

And saved a couple of lines of code (and probably made faster, but copying is never done in normal operation with reasonable transactions) by switching to using std::copy instead of an explicit loop.

Thanks for the code reviews!
in src/test/script_tests.cpp:None in aae53f20dc
```
1109 | +    BOOST_CHECK(s == expect);
1110 | +
1111 | +    // This is an odd edge case: strip of the push-three-bytes
1112 | +    // prefix, leaving 02ff03 which is push-two-bytes:
1113 | +    s = ScriptFromHex("0302ff030302ff03");
1114 | +    d = ScriptFromHex("03");
```
TheBlueMatt commented at 12:58 AM on April 19, 2016:

I'd prefer to not have tests that check that FindAndDelete do exactly what we currently do when called against invalid scripts when we dont require that in any callers.

gavinandresen commented at 3:03 PM on April 25, 2016:

RE: behavior with invalid Scripts: I don't care one way or the other; seems safer to behave exactly the same, since this is consensus-critical code.
pstratem cross-referenced this on Apr 19, 2016 from issue Optimize and Cleanup CScript::FindAndDelete by pstratem
in src/script/script.h:None in aae53f20dc
```
 579 |          {
 580 |              while (end() - pc >= (long)b.size() && memcmp(&pc[0], &b[0], b.size()) == 0)
 581 |              {
 582 | -                pc = erase(pc, pc + b.size());
 583 | +                // Lazy copy-in-place if there is a match:
 584 | +                if (nShift == 0) copied = pc;
```
dcousens commented at 2:27 AM on April 21, 2016:

Isn't nShift always == 0 at the beginning? Why bother initializing copied above?

gavinandresen commented at 11:14 AM on April 21, 2016:

Why initialize copied instead of just eliminating the if/else and always copying? Because std::copy behavior is undefined if the first and last arguments point to the same element.

kazcw commented at 4:03 PM on April 21, 2016:

Why would that be? The Standard's definition of std::copy doesn't seem to say anything that would prohibit an empty range.

gavinandresen commented at 4:30 PM on April 21, 2016:

The standard says:

"result shall not be in the range [first,last)."

kazcw commented at 5:21 PM on April 21, 2016:

last is not in the range [first,last)

gavinandresen commented at 3:14 PM on April 25, 2016:

@kazcw : The first time a match is found, copied will be begin(). nShift will be zero, and pc will be pointing into the array somewhere.

If we removed copied = pc and just did: copied = std::copy(copied+nShift, pc, copied);

... then first is copied, last is pc, and result is copied. copied IS in the range [copied, pc)
sipa commented at 8:01 AM on May 17, 2016: member

Superseded by #7907.
sipa closed this on May 17, 2016
bitcoin locked this on Sep 8, 2021

Contributors

Labels

Validation Resource usage

Linked (view graph)

#7907 Optimize and Cleanup CScript::FindAndDelete