Remove double serialization; use software encoder for fee estimation

sipa commented at 1:38 am on May 17, 2021: member

Based on #21981.

This adds a software-based platform-independent float/double encoder/decoder (platform independent in the sense that it only uses arithmetic and library calls, but never inspects the binary representation). This should strengthen our guarantee that encoded float/double values are portable across platforms. It then removes the functionality to serialize doubles from serialize.h, and replaces its only (non-test) use for fee estimation data serialization with the software encoder.

At least on x86/ARM, the only difference should be how certain NaN values are encoded/decoded (but not whether they are NaN or not).

It comes with tests that verify on is_iec559 platforms (which are the only ones we support, at least for now) that the serialized bytes exactly match the binary representation of floats in memory (for non-NaN).

sipa force-pushed on May 17, 2021

DrahtBot added the label Build system on May 17, 2021

DrahtBot added the label Utils/log/libs on May 17, 2021

sipa removed the label Build system on May 17, 2021

sipa force-pushed on May 17, 2021

DrahtBot commented at 3:43 am on May 17, 2021: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#21981 (Remove unused float serialization by MarcoFalke)
#21969 (refactor: Switch serialize to uint8_t (Bundle 1/2) by MarcoFalke)
#10443 (Add fee_est tool for debugging fee estimation code by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

practicalswift commented at 9:40 am on May 17, 2021: contributor

Strong Concept ACK

laanwj commented at 10:23 am on May 17, 2021: member

While I think this is intriguing technically, I’m ~0 on this conceptually

I think using floating point in places where 100% precision or portability across platforms is important is mistaken in the first place. This gives the wrong impression.

It raises deeper questions for me: do we really need floating point in Bitcoin at all? Narrower: do we really need to serialize/deserialize it? (clearly it’s already not used for anything consensus critical)

MarcoFalke commented at 10:28 am on May 17, 2021: member

Haven’t checked, but I presume it is only used for fees.dat?

laanwj commented at 10:42 am on May 17, 2021: member

If so, I’d prefer to find an alternative way of serializing those values specifically (as fixed-point or numerator / denominator pairs of integers) and dropping the general float/double (de)serialization.

practicalswift commented at 2:08 pm on May 17, 2021: contributor

Narrower: do we really need to serialize/deserialize it?

As MarcoFalke discovered ser_float_to_uint32 and ser_uint32_to_float are currently unused (#21981), and AFAICT TxConfirmStats::Write and TxConfirmStats::Read are the only remaining users of ser_double_to_uint64 and ser_uint64_to_double.

Looks promising! :)

sipa commented at 6:10 pm on May 17, 2021: member

@laanwj That"s fair. I agree that avoiding serialization of floating point values directly is much more desirable.

My thinking here is that it effectively gives us a well-specified serialization with testable properties without needing to break compatibility with existing files.

How about rebasing this PR on top of #21981, and making it remove serialization support for double too in serialization.h, and instead make the feedata writing/reading code invoke EncodeDouble/DecodeDouble directly?

laanwj commented at 7:39 am on May 18, 2021: member

How about rebasing this PR on top of #21981, and making it remove serialization support for double too in serialization.h, and instead make the feedata writing/reading code invoke EncodeDouble/DecodeDouble directly?

I was about to comment this! Let’s make this code private to the single case where it is used? It keeps the current format but also prevents future serialization of these types :smile:

sipa force-pushed on May 18, 2021

sipa renamed this:
~~Software float encoding~~
Remove double serialization; use software encoder for fee estimation
on May 18, 2021

sipa force-pushed on May 18, 2021

sipa commented at 7:55 pm on May 18, 2021: member

@laanwj Done.

practicalswift commented at 10:06 pm on May 18, 2021: contributor

cr ACK 892522deba52f35fe5b08e2d7aebeb80600f9cdc: patch looks correct :)

Very happy to see ser_double_to_uint64/ser_uint64_to_double go and src/compat/assumptions.h shrink :)

laanwj commented at 8:23 am on May 19, 2021: member

Thanks! Happy to see this now.

Code review ACK 892522deba52f35fe5b08e2d7aebeb80600f9cdc Code review re-ACK 66545da2008cd9e806e41b74522ded259cd64f86

in src/test/serfloat_tests.cpp:31 in ae87a8aa4f outdated

23+    } else {
24+        // Everything else is.
25+        BOOST_CHECK(!std::isnan(f2));
26+        uint64_t i2 = EncodeDouble(f2);
27+        BOOST_CHECK_EQUAL(f, f2);
28+        BOOST_CHECK_EQUAL(i, i2);

MarcoFalke commented at 6:15 am on May 20, 2021:

would it make sense to also assert equality with the “legacy” serialization helpers in both this unit test and the fuzz test?

Otherwise we might run into a corrupted fees.dat after an upgrade. A problem worse than the problem this pull is trying to solve (making fees.dat architecture independent).

MarcoFalke commented at 6:20 am on May 20, 2021:

Oh sorry, I see that is what the “should produce exactly the same as the in-memory representation for non-NaN” test is doing.

MarcoFalke commented at 6:23 am on May 20, 2021:

Though, it looks like the fuzz test isn’t fuzzing the in-memory representation, only the round-trip?

sipa commented at 11:15 pm on May 24, 2021:

Added back.

MarcoFalke commented at 10:22 am on June 7, 2021:

Ah thanks, though the fuzzed_data_provider.ConsumeFloatingPoint only consumes “nice” floats, not odd ones. So the fuzzer is only testing the happy path.

MarcoFalke commented at 11:56 am on June 7, 2021:

Fixed in #22180

in src/test/serfloat_tests.cpp:41 in ae87a8aa4f outdated

36+    BOOST_CHECK_EQUAL(TestDouble(0.0), 0);
37+    BOOST_CHECK_EQUAL(TestDouble(-0.0), 0x8000000000000000);
38+    BOOST_CHECK_EQUAL(TestDouble(std::numeric_limits<double>::infinity()), 0x7ff0000000000000);
39+    BOOST_CHECK_EQUAL(TestDouble(-std::numeric_limits<double>::infinity()), 0xfff0000000000000);
40+
41+    if (std::numeric_limits<float>::is_iec559) {

MarcoFalke commented at 6:24 am on May 20, 2021:

Any reason to check for float, but then test double?

sipa commented at 11:15 pm on May 24, 2021:

The technical term for this would be an “oops”. Fixed.

Remove unused float serialization e40224d0c7

Add platform-independent float encoder/decoder 2be4cd94f4

Add unit tests for serfloat module bda33f98e2

Convert existing float encoding tests afd964d70b

Convert uses of double-serialization to {En,De}codeDouble fff1cae43a

Remove support for double serialization 66545da200

sipa force-pushed on May 24, 2021

in src/test/fuzz/float.cpp:24 in 66545da200

37-        stream << f;
38-        float f_deserialized;
39-        stream >> f_deserialized;
40-        assert(f == f_deserialized);
41+        uint64_t encoded = EncodeDouble(d);
42+        if constexpr (std::numeric_limits<double>::is_iec559) {

practicalswift commented at 7:08 pm on May 25, 2021:

Note to other reviewers: We assume this to hold true in assumptions.h:

0static_assert(std::numeric_limits<double>::is_iec559, "IEEE 754 double assumed");

in src/test/serfloat_tests.cpp:50 in 66545da200

45+    BOOST_CHECK_EQUAL(TestDouble(2.0), 0x4000000000000000ULL);
46+    BOOST_CHECK_EQUAL(TestDouble(4.0), 0x4010000000000000ULL);
47+    BOOST_CHECK_EQUAL(TestDouble(785.066650390625), 0x4088888880000000ULL);
48+
49+    // Roundtrip test on IEC559-compatible systems
50+    if (std::numeric_limits<double>::is_iec559) {

practicalswift commented at 7:08 pm on May 25, 2021:

Note to other reviewers: We assume this to hold true in assumptions.h:

0static_assert(std::numeric_limits<double>::is_iec559, "IEEE 754 double assumed");

sipa commented at 7:11 pm on May 25, 2021:

I should remove that assumption now, it’s no longer needed.

practicalswift commented at 7:30 pm on May 25, 2021:

I think we need that assumption as long as we’re doing floating-point division by zero? I still think we do that in ConnectBlock, CreateTransaction and EstimateMedianVal :)

sipa commented at 7:35 pm on May 25, 2021:

Ah, good point.

laanwj commented at 7:52 am on May 26, 2021:

Maybe a weaker assumption could do there. In any case, it’s out of scope for this PR. Happy to leave it as it is for the foreseeable future unless anyone has good reason to work on porting bitcoind to non-IEC559 platforms.

(in which case I heartily suggest: please get rid of all floating point code. it shouldn’t be necessary in financial-ish code)

practicalswift commented at 7:09 pm on May 25, 2021: contributor

cr re-ACK 66545da2008cd9e806e41b74522ded259cd64f86

laanwj merged this on May 26, 2021

laanwj closed this on May 26, 2021

sidhujag referenced this in commit efa10235fa on May 27, 2021

MarcoFalke commented at 11:36 am on June 7, 2021: member

review ACK

kittywhiskers referenced this in commit 15c67f4fef on Jun 16, 2021

kittywhiskers referenced this in commit c8cd8bfcaa on Jun 16, 2021

kittywhiskers referenced this in commit cf8d5a958b on Jun 16, 2021

kittywhiskers referenced this in commit 43f1909376 on Jun 16, 2021

kittywhiskers referenced this in commit f42dcbbfb1 on Jun 24, 2021

kittywhiskers referenced this in commit d129516f22 on Jun 25, 2021

kittywhiskers referenced this in commit e8d09709c9 on Jun 25, 2021

kittywhiskers referenced this in commit eecf91cd44 on Jun 26, 2021

kittywhiskers referenced this in commit 441a37f26b on Jun 26, 2021

kittywhiskers referenced this in commit 7d319145ab on Jun 27, 2021

kittywhiskers referenced this in commit f946c68f83 on Jun 27, 2021

UdjinM6 referenced this in commit 7d664c7c53 on Jun 27, 2021

gades referenced this in commit 4915046193 on May 1, 2022

gwillen referenced this in commit 95ce7de90f on Jun 1, 2022

DrahtBot locked this on Aug 18, 2022

Remove double serialization; use software encoder for fee estimation #21966

Conflicts