changed field types in some structures to equivalent unambiguous types #4180

kdomanski commented at 10:11 PM on May 12, 2014: contributor

int -> int32_t unsigned int -> uint32_t unsigned chat -> uint8_t

Thus the implementation becomes not only unambiguous, but also more compliant with the protocol specification.

kdomanski commented at 10:47 PM on May 12, 2014: contributor

There's actually a significantly greater number of such ambiguities in classes within main.h which are not explicitly described by protocol specs. Apart from the ambiguity they often cause unnecessary casts. I'd like to convert them en masse into (u)int*_t and exterminate the casts but I need an opinion whether or not it's worth doing. I remember some scepticism about this from #bitcoin-dev.

brandondahler commented at 4:14 AM on May 30, 2014: contributor

I would be weary of just find-replacing all int's to int32_t's, etc.

In my opinion they should only be converted to using {,u}int{8,16,32,64}_t if the data structure is rigidly defined to be that size. Otherwise int should be considered the default type for numerical values and long used if it would make sense and be convenient to use a 64-bit value if available.

in src/main.h:None in 9d958f3518 outdated

 612 | @@ -613,11 +613,11 @@ bool AcceptBlockHeader(CBlockHeader& block, CValidationState& state, CBlockIndex
 613 |  class CBlockFileInfo
 614 |  {
 615 |  public:
 616 | -    unsigned int nBlocks;      // number of blocks stored in file
 617 | -    unsigned int nSize;        // number of used bytes of block file
 618 | -    unsigned int nUndoSize;    // number of used bytes in the undo file
 619 | -    unsigned int nHeightFirst; // lowest height of block in file
 620 | -    unsigned int nHeightLast;  // highest height of block in file
 621 | +    uint32_t nBlocks;      // number of blocks stored in file

sipa commented at 9:52 AM on May 31, 2014:

Can you fix the indendation of the comments?

in src/core.h:None in 9d958f3518 outdated

 288 | @@ -289,11 +289,11 @@ class CTxInUndo
 289 |  public:
 290 |      CTxOut txout;         // the txout data before being spent
 291 |      bool fCoinBase;       // if the outpoint was the last unspent: whether it belonged to a coinbase
 292 | -    unsigned int nHeight; // if the outpoint was the last unspent: its height
 293 | -    int nVersion;         // if the outpoint was the last unspent: its version
 294 | +    uint32_t nHeight; // if the outpoint was the last unspent: its height

sipa commented at 9:53 AM on May 31, 2014:

Fix comment indentation.

in src/main.h:None in 9d958f3518 outdated

 971 | @@ -972,9 +972,9 @@ class CValidationState {
 972 |          MODE_INVALID, // network rule violation (DoS value may be set)
 973 |          MODE_ERROR,   // run-time error
 974 |      } mode;
 975 | -    int nDoS;
 976 | +    int32_t nDoS;

sipa commented at 9:56 AM on May 31, 2014:

No particular reason why nDoS needs to be exactly 32 bits.

laanwj added the label Improvement on Jul 31, 2014

laanwj commented at 7:40 AM on July 31, 2014: member

Rebase needed + verification needs to be done here that the resulting code is exactly the same.

kdomanski commented at 4:48 PM on July 31, 2014: contributor

I will proceed with this within a few days.

petertodd commented at 9:52 PM on July 31, 2014: contributor

Thus the implementation becomes not only unambiguous, but also more compliant with the protocol specification.

The protocol specification is not the wiki, it's the sourcecode. If the wiki doesn't match that code, it should be changed - not the other way around.

Anyway, the changes look ok, but you definitely need to check case-by-case that they really cause no changes. I'd suggest writing up a report with a line or two describing each individual change. A lot of work I know, but that's just the price of working on Bitcoin Core.

kdomanski commented at 11:51 PM on July 31, 2014: contributor

The protocol specification is not the wiki, it's the sourcecode.

I'll quote you every time someone on #bitcoin-dev says there is no One True Implementation.

sipa commented at 11:57 PM on July 31, 2014: member

I disagree when it's just about the p2p protocol, as the language that two nodes speak to each other does not impact the network. When it's about the consensus rules, it's more complicated.

Anyway: I think in many places this is the correct thing to do. As is, the serialization framework really only works because of very strict assumptions on data type sizes. I think it's perfectly acceptable to turn every data type that is ever serialized (appears inside a READDATA(x)) without varint or compact size into a fixed-size type.

petertodd commented at 12:03 AM on August 1, 2014: contributor

I'll quote you every time someone on #bitcoin-dev says there is no One True Implementation.

As you should. Those people are to a first approximation wrong. @sipa With serialization at least a change in semantics will usually fail hard and immediately - hashes won't match. What I'd worry about more is subtle changes in semantics, e.g. overflows.

sipa commented at 2:11 AM on August 1, 2014: member

@petertodd And you should worry about that. But replacing types by typedef'ed identical types shouldn't cause problems.

laanwj commented at 7:35 AM on August 1, 2014: member

It's easy enough to verify that the binary stays the same. I'm volunteering to check (for x86_32 and x86_64) once this is rebased.

petertodd commented at 1:04 PM on August 1, 2014: contributor

@laanwj Doh! Yeah, that's the right way to do it; satisfies my concerns for sure.

kdomanski commented at 7:02 PM on August 1, 2014: contributor

Huh, changing even one "unsigned int" to uint64_t (I'm on x86_64) gives me a different binary hash.

sipa commented at 7:12 PM on August 1, 2014: member

Unsigned int is 32 bit.

Also, make sure to strip the binary before comparison.

brandondahler commented at 10:49 PM on August 1, 2014: contributor

@kdomanski you are thinking about long and unsigned long -- they are either 32-bit or 64-bit depending on the architecture being used. It wont be possible to change variables that are longs without changing the binary.

sipa commented at 10:53 PM on August 1, 2014: member

Nor is there any need. If there are types being serialized right now whose size depends on the architecture, Bitcoin just wouldn't work already. The purpose of this is to change the types that have an implicitly assumed size to types with that size fixed. It shouldn't change anything for any architecture, but may allow us to port to new architectures in the future for which these assumptions don't hold.

laanwj commented at 3:47 PM on August 4, 2014: member

You have to take a few precautions to make sure that executables remain the same:

Strip the executable (as said)
Provide -frandom-seed=bitcoin, to make sure gcc uses a deterministic random seed
Possibly define NDEBUG (assertion errors contain line numbers; not that these are expected to change in this case) - yes, bitcoin makes this difficult
Empty out CLIENT_BUILD / CLIENT_DATE so that no differences are introduced based on the git hash, the build date or '-dirty'

I may have forgotten a few. Anyhow, as said I'm willing to do the checks if you rebase this.

kdomanski commented at 1:45 PM on August 7, 2014: contributor

@laanwj Done. If something doesn't work out, please post assembly diff.

laanwj commented at 6:30 AM on August 14, 2014: member

ACK.

Preparation

Create the following patch stripbuildinfo.patch:

diff --git a/src/main.cpp b/src/main.cpp
index ba521b6..8cc6fb2 100644
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -27,9 +27,9 @@
 using namespace std;
 using namespace boost;

-#if defined(NDEBUG)
-# error "Bitcoin cannot be compiled without assertions."
-#endif
+
+
+

 //
 // Global state
diff --git a/src/version.cpp b/src/version.cpp
index d86caa3..32dd4e9 100644
--- a/src/version.cpp
+++ b/src/version.cpp
@@ -67,5 +67,5 @@ const std::string CLIENT_NAME("Satoshi");
 #    endif
 #endif

-const std::string CLIENT_BUILD(BUILD_DESC CLIENT_VERSION_SUFFIX);
-const std::string CLIENT_DATE(BUILD_DATE);
+const std::string CLIENT_BUILD("");
+const std::string CLIENT_DATE("");

Create a script do_build.sh and chmod +x it:

#!/bin/bash
git reset --hard && \
git clean -f -x -d && \
git checkout $1 && \
git apply ../stripbuildinfo.patch && \
./autogen.sh && \
./configure --disable-wallet --without-gui --without-cli --disable-tests CPPFLAGS="-DNDEBUG -O2 -frandom-seed=bitcoin" && \
make -j3 src/bitcoind && cp src/bitcoind ../bitcoind.$1 && \
strip -o ../bitcoind.$1.stripped -R.note.gnu.build-id ../bitcoind.$1

Followed steps for each platform

### clone the git tree
# do this in a separate tree as the script will wipe everything inside it for sake of hygiene
git clone https://github.com/kdomanski/bitcoin.git bitcoin-compare
cd bitcoin-compare

### build both versions
../do_build.sh a43fde7
../do_build.sh 003bbd5

### comparison
sha256sum ../*.stripped

x86-64

(On a 64-bit Ubuntu 14.04 system) Output:

cd7a6023a386a1abf6dc67596c58f69c7ff7549e6d6c6a31cd7097e59a4bc929  ../bitcoind.003bbd5.stripped
cd7a6023a386a1abf6dc67596c58f69c7ff7549e6d6c6a31cd7097e59a4bc929  ../bitcoind.a43fde7.stripped

Result: OK

x86-32

(On a 32-bit Ubuntu 12.04 VM) Output:

c12d214aa09b8ea1d1ca4fd5600621fb79875650620eab1ca94f71f91068056d  ../bitcoind.003bbd5.stripped
c12d214aa09b8ea1d1ca4fd5600621fb79875650620eab1ca94f71f91068056d  ../bitcoind.a43fde7.stripped

Result: OK

ARM32

(Cubox-i Debian Jessie/Sid)

Changed build.sh to use one thread (-j1) and added --with-boost-libdir=/usr/lib/arm-linux-gnueabihf to build flags Output:

947ceb723a99c766e99ec4a9f85769c63bb74300866f67168ccae6cb3dfeabd1  ../bitcoind.003bbd5.stripped
947ceb723a99c766e99ec4a9f85769c63bb74300866f67168ccae6cb3dfeabd1  ../bitcoind.a43fde7.stripped

Result: OK

petertodd commented at 6:34 AM on August 14, 2014: contributor

@laanwj Nice work!

jgarzik commented at 12:16 PM on August 14, 2014: contributor

ACK

changed field types in some structures to equivalent unambiguous types 40f841d3e3

in src/core.h:None in a43fde7e3f outdated

 494 | @@ -495,8 +495,8 @@ class CBlock : public CBlockHeader
 495 |  
 496 |      uint256 BuildMerkleTree() const;
 497 |  
 498 | -    std::vector<uint256> GetMerkleBranch(int nIndex) const;
 499 | -    static uint256 CheckMerkleBranch(uint256 hash, const std::vector<uint256>& vMerkleBranch, int nIndex);
 500 | +    std::vector<uint256> GetMerkleBranch(int32_t nIndex) const;

laanwj commented at 2:35 PM on August 14, 2014:

Just looked over this for one final time and noticed that here you change the prototype but not the function definition itself (in core.cpp)! I'd suggest to change these back to int, as the nIndex parameters for *MerkleBranchare not directly associated with fields on the data structure that are serialized.

kdomanski commented at 4:12 PM on August 14, 2014: contributor

@laanwj I restored the prototypes.

sipa commented at 6:07 PM on August 14, 2014: member

Untested ACK

BitcoinPullTester commented at 7:28 PM on August 18, 2014: none

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/p4180_40f841d3e330b83d252e4cc2874256a4cb4c6641/ for binaries and test log. This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/ Contact BlueMatt on freenode if something looks broken.

laanwj referenced this in commit 9f3d476779 on Aug 30, 2014

laanwj commented at 4:28 AM on August 30, 2014: member

Merged via 9f3d476 (fixed trivial conflict: removal of print functions)

laanwj closed this on Aug 30, 2014

jtimon cross-referenced this on Aug 30, 2014 from issue Create Proof(OfWork) class by jtimon

kdomanski deleted the branch on Sep 1, 2014

laanwj cross-referenced this on Nov 12, 2014 from issue Split operators in EvalScript's switch by jtimon

jtimon cross-referenced this on Jul 24, 2015 from issue Move message processing to new 'procmsg' module. by jgarzik

reddink referenced this in commit 6dfb64135b on May 27, 2020

bitcoin locked this on Sep 8, 2021