Make util phexdigit array reusable #3076

lano1106 commented at 4:45 PM on October 10, 2013: contributor

class template base_uint had its own private lookup table. This is saving 256 bytes per instantiation.

The result is not spectacular as bitcoin-qt has only shrinked of about 1Kb but it is still valid improvement.

Also, I have replaced a for loop with a memset() call.

A unit_test has been added to check the change.

Signed-off-by: Olivier Langlois olivier@olivierlanglois.net

in src/uint256.h:None in 919f1aa042 outdated

 334 | -            *p1 = phexdigit[(unsigned char)*psz--];
 335 | +            *p1 = ::HexDigit((unsigned char)*psz--);
 336 |              if (psz >= pbegin)
 337 |              {
 338 | -                *p1 |= (phexdigit[(unsigned char)*psz--] << 4);
 339 | +                *p1 |= (::HexDigit((unsigned char)*psz--) << 4);

lano1106 commented at 7:22 PM on October 10, 2013:

I have a small concern about the return type (signed char) that is right shifted by 4 bits in relation to the sign bit. Unit test show that it is ok but I am not sure that this is fully portable.

Unless language does an implicit integer promotion, maybe we could help by changing return type to int16 or just int.

laanwj commented at 8:53 PM on October 10, 2013:

Yes, he is returning a signed char from HexDigit, whereas unsigned chars are used in the rest of the logic. I think it's a typo?

lano1106 commented at 4:50 AM on October 11, 2013:

No this is intentional. I have kept the original type from the util.cpp array type. It is signed and -1 as a special value to indicate not an hex char.

Let me rephrase the concern. As soon as the sign bit of an integer is on after doing bit manips, you have a negative number that will turn on more bits to 1 if you store the value into a bigger int type. think about -1.

Can't give a specific example of problem from that but I have the vague impression of having been bitten by that one in the past.

By having a signed char array, this let you use -1 as a reserved value (0 is valid) and minimize its size. By having the return type bigger (int16 or int) this protect you against having surprises from the sign bit when manipulating the result.

right now it is all theorical as the code is tested and works as expected. just highlighting the risk and propose a simple way to mitigate it.

laanwj commented at 7:03 AM on October 11, 2013:

No, this is far from theoretical. Any invalid hex character will return -1 and thus cause things to go haywire. You need to either check for -1 here or make sure that HexDigit returns 0 in case of an invalid hexchar. See my comment below.

laanwj commented at 5:00 PM on October 11, 2013:

Agreed. So any parse errors will have been caught on first pass. I hadn't noticed that.

In that case adding an explicit cast of the result to (unsigned char) should be enough to make sure the shift is correct.

laanwj commented at 8:56 PM on October 10, 2013: member

phexdigit is used in bignum.h too, this can share the same definition now

in src/uint256.h:None in 919f1aa042 outdated

  13 | @@ -14,7 +14,12 @@
  14 |  
  15 |  typedef long long  int64;
  16 |  typedef unsigned long long  uint64;
  17 | +extern const signed char p_util_hexdigit[256]; // defined in util.cpp
  18 |  
  19 | +inline signed char HexDigit(unsigned char c)
  20 | +{
  21 | +    return p_util_hexdigit[c];

laanwj commented at 8:58 PM on October 10, 2013:

This must return 0 when p_util_hexdigit[c] < 0, otherwise the functionality is changed. (the hexdigit in util.cpp has -1 for non-hex characters whereas hexdigit in uint256.h and bignum.h has zeros)

lano1106 commented at 4:55 AM on October 11, 2013:

I prefer how it was done in util.cpp as it allows you to discriminate between the char '0' and an invalid hex char. Note that I have modified the uint256.h code in consequence when it search for the end of the number.

laanwj commented at 7:01 AM on October 11, 2013:

Sure, I agree that is generally more useful, but if you intend on keeping it this way, you need to check for the -1 value at the caller site .

If you return a special error token can't just bitshift and or it into the target value without checking. If you return 0 you can because it's a no-op/ignore. And as SetHex has no way to return parsing errors the only way is to ignore parsing errors.

lano1106 commented at 3:08 PM on October 11, 2013:

laanwj,

I absolutle agree with you and it is done exactly as you say it should.

in SetHex():

while loop to find the end of the hex str by checking for first char that returns -1.
Then when the string is processed backward to convert BE -> LE, there is no need to check again for -1 as it is impossible that a char that previously did not return -1 does otherwise in subsequent calls.

Agree?

lano1106 commented at 4:56 AM on October 11, 2013: contributor

if you like the idea of this pull request, I'll create a v2 that makes bignum use the same array.

in src/uint256.h:None in 919f1aa042 outdated

 309 | @@ -305,8 +310,7 @@ class base_uint
 310 |  
 311 |      void SetHex(const char* psz)
 312 |      {
 313 | -        for (int i = 0; i < WIDTH; i++)
 314 | -            pn[i] = 0;
 315 | +        memset(pn,0,sizeof(pn));

Diapolo commented at 10:24 AM on October 11, 2013:

An optimizing compiler could just put this away, right? But I guess the same is true for the loop...

laanwj commented at 11:03 AM on October 11, 2013:

An optimizing compiler could remove the zeroing but only if it can prove that the rest of the function still does the same without it. It's not an issue here.

lano1106 commented at 3:20 PM on October 11, 2013:

Diapolo, you are right. with -O2 the for loop or the memset generates identical code.

It is then only a matter of taste. The memset has the merit of being more compact.

Tell me which way you want the code to be in order to accept the request. I'll do the change

laanwj commented at 12:31 PM on October 20, 2013: member

Some further comments then IMO this can be merged

uint256.h:

inline signed char HexDigit(unsigned char c)
extern const signed char p_util_hexdigit[256];

this function and data structure declaration should be moved to util.h, because util.c is where the implementation is. Any reason you did this differently?
the argument can be a normal char instead of signed char, so the cast is done inside the function instead of on all call sites. Return type should remain signed char as you said.

lano1106 commented at 3:34 PM on October 23, 2013: contributor

Wladimir,

I agree with changing input param type.

However, the function cannot be moved in util.h Why? Because this would create a circular dependency.

uint256.h would need to include util.h and util.h already include uint256.h. This is arguable that in that case, the array definition could then be moved into a new cpp file uint256.cpp which do not exists right now. I have decided to not go there but feel free to ask if you feel that this would be important.

Another option is to create 2 new files just for that:

hexdigit.h/cpp

laanwj commented at 7:32 AM on October 25, 2013: member

I understand... bitcoin has always been rife with circular dependencies and I wouldn't want to reintroduce one. Creating two new files is overkill, most of the core devs don't like to have many small header files (though it would be more modular, it would also be harder to remember where things are, and increase the file-to-file switching needed during development).

ACK after squashing into one commit

Make util phexdigit array reusable

class template base_uint had its own private lookup table.
This is saving 256 bytes per instantiation.

The result is not spectacular as bitcoin-qt has only shrinked of
about 1Kb but it is still valid improvement.

Also, I have replaced a for loop with a memset() call.

Made CBigNum::SetHex() use the new HexDigit() function.

Signed-off-by: Olivier Langlois <olivier@olivierlanglois.net>

f171ec0c7d

lano1106 commented at 3:12 AM on October 28, 2013: contributor

Wladimir,

thanks for contributing to my git education. It has been my first commit squashing ever :-)

BitcoinPullTester commented at 3:35 AM on October 28, 2013: none

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/f171ec0c7d084b6bb163d1466edd814cf4dcbc93 for binaries and test log. This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/ Contact BlueMatt on freenode if something looks broken.

laanwj referenced this in commit bac72640ee on Nov 4, 2013

laanwj merged this on Nov 4, 2013

laanwj closed this on Nov 4, 2013

Bushstar referenced this in commit c22169d579 on Apr 8, 2020

bitcoin locked this on Sep 8, 2021