This PR is a follow-up to #30436.
Only changes test-code and modifies/adds comments.
Byte order of hex string representation was wrongfully documented as little-endian, but are in fact closer to “big-endian” (endianness is a memory-order concept rather than a numeric concept). [arith_]uint256 both store their data in arrays with little-endian byte order (arith_uint256 has host byte order within each uint32_t element).
uint256_tests.cpp - Avoid using variable from the left side of the condition in the right side. Credits to @maflcko: #30436 (review)
setup_common.cpp - Skip needless ArithToUint256-conversion. Credits to @stickies-v: #30436 (review)
Logical reasoning for endianness
- Comparing an
arith_uint256(base_uint<256>) to auint64_tcompares the beginning of the array, and verifies the remaining elements are zero.
0template <unsigned int BITS>
1bool base_uint<BITS>::EqualTo(uint64_t b) const
2{
3 for (int i = WIDTH - 1; i >= 2; i--) {
4 if (pn[i])
5 return false;
6 }
7 if (pn[1] != (b >> 32))
8 return false;
9 if (pn[0] != (b & 0xfffffffful))
10 return false;
11 return true;
12}
…that is consistent with little endian ordering of the array.
- They have the same endianness (but
arith_*has host-ordering of eachuint32_telement):
0arith_uint256 UintToArith256(const uint256 &a)
1{
2 arith_uint256 b;
3 for(int x=0; x<b.WIDTH; ++x)
4 b.pn[x] = ReadLE32(a.begin() + x*4);
5 return b;
6}
String conversions
The reversal of order which happens when converting hex-strings <=> uint256 means strings are actually closer to big-endian, see the end of base_blob<BITS>::SetHexDeprecated:
0 unsigned char* p1 = m_data.data();
1 unsigned char* pend = p1 + WIDTH;
2 while (digits > 0 && p1 < pend) {
3 *p1 = ::HexDigit(trimmed[--digits]);
4 if (digits > 0) {
5 *p1 |= ((unsigned char)::HexDigit(trimmed[--digits]) << 4);
6 p1++;
7 }
8 }
Same reversal here:
0template <unsigned int BITS>
1std::string base_blob<BITS>::GetHex() const
2{
3 uint8_t m_data_rev[WIDTH];
4 for (int i = 0; i < WIDTH; ++i) {
5 m_data_rev[i] = m_data[WIDTH - 1 - i];
6 }
7 return HexStr(m_data_rev);
8}
It now makes sense to me that SetHexDeprecated, upon receiving a shorter hex string that requires zero-padding, would pad as if the missing hex chars where towards the end of the little-endian byte array, as they are the most significant bytes. “Big-endian” string representation is also consistent with the case where SetHexDeprecated receives too many hex digits and discards the leftmost ones, as a form of integer narrowing takes place.
How I got it wrong in #30436
Previously I used the less than (<) comparison to prove endianness, but for uint256 it uses memcmp and thereby gives priority to the lower bytes at the beginning of the array.
0 constexpr int Compare(const base_blob& other) const { return std::memcmp(m_data.data(), other.m_data.data(), WIDTH); }
arith_uint256 is different in that it begins by comparing the bytes from the end, as it is using little endian representation, where the bytes toward the end are more significant.
0template <unsigned int BITS>
1int base_uint<BITS>::CompareTo(const base_uint<BITS>& b) const
2{
3 for (int i = WIDTH - 1; i >= 0; i--) {
4 if (pn[i] < b.pn[i])
5 return -1;
6 if (pn[i] > b.pn[i])
7 return 1;
8 }
9 return 0;
10}
(The commit documents that base_blob::Compare() is doing lexicographic ordering unlike the arith_*-variant which is doing numeric ordering).