This PR is a follow-up to #30436.
Only changes test-code and modifies/adds comments.
Byte order of hex string representation was wrongfully documented as little-endian, but are in fact closer to “big-endian” (endianness is a memory-order concept rather than a numeric concept). [arith_]uint256
both store their data in arrays with little-endian byte order (arith_uint256
has host byte order within each uint32_t
element).
uint256_tests.cpp - Avoid using variable from the left side of the condition in the right side. Credits to @maflcko: #30436 (review)
setup_common.cpp - Skip needless ArithToUint256-conversion. Credits to @stickies-v: #30436 (review)
Logical reasoning for endianness
- Comparing an
arith_uint256
(base_uint<256>
) to auint64_t
compares the beginning of the array, and verifies the remaining elements are zero.
0template <unsigned int BITS>
1bool base_uint<BITS>::EqualTo(uint64_t b) const
2{
3 for (int i = WIDTH - 1; i >= 2; i--) {
4 if (pn[i])
5 return false;
6 }
7 if (pn[1] != (b >> 32))
8 return false;
9 if (pn[0] != (b & 0xfffffffful))
10 return false;
11 return true;
12}
…that is consistent with little endian ordering of the array.
- They have the same endianness (but
arith_*
has host-ordering of eachuint32_t
element):
0arith_uint256 UintToArith256(const uint256 &a)
1{
2 arith_uint256 b;
3 for(int x=0; x<b.WIDTH; ++x)
4 b.pn[x] = ReadLE32(a.begin() + x*4);
5 return b;
6}
String conversions
The reversal of order which happens when converting hex-strings <=> uint256 means strings are actually closer to big-endian, see the end of base_blob<BITS>::SetHexDeprecated
:
0 unsigned char* p1 = m_data.data();
1 unsigned char* pend = p1 + WIDTH;
2 while (digits > 0 && p1 < pend) {
3 *p1 = ::HexDigit(trimmed[--digits]);
4 if (digits > 0) {
5 *p1 |= ((unsigned char)::HexDigit(trimmed[--digits]) << 4);
6 p1++;
7 }
8 }
Same reversal here:
0template <unsigned int BITS>
1std::string base_blob<BITS>::GetHex() const
2{
3 uint8_t m_data_rev[WIDTH];
4 for (int i = 0; i < WIDTH; ++i) {
5 m_data_rev[i] = m_data[WIDTH - 1 - i];
6 }
7 return HexStr(m_data_rev);
8}
It now makes sense to me that SetHexDeprecated
, upon receiving a shorter hex string that requires zero-padding, would pad as if the missing hex chars where towards the end of the little-endian byte array, as they are the most significant bytes. “Big-endian” string representation is also consistent with the case where SetHexDeprecated
receives too many hex digits and discards the leftmost ones, as a form of integer narrowing takes place.
How I got it wrong in #30436
Previously I used the less than (<
) comparison to prove endianness, but for uint256
it uses memcmp
and thereby gives priority to the lower bytes at the beginning of the array.
0 constexpr int Compare(const base_blob& other) const { return std::memcmp(m_data.data(), other.m_data.data(), WIDTH); }
arith_uint256
is different in that it begins by comparing the bytes from the end, as it is using little endian representation, where the bytes toward the end are more significant.
0template <unsigned int BITS>
1int base_uint<BITS>::CompareTo(const base_uint<BITS>& b) const
2{
3 for (int i = WIDTH - 1; i >= 0; i--) {
4 if (pn[i] < b.pn[i])
5 return -1;
6 if (pn[i] > b.pn[i])
7 return 1;
8 }
9 return 0;
10}
(The commit documents that base_blob::Compare()
is doing lexicographic ordering unlike the arith_*
-variant which is doing numeric ordering).