Make scalar/field choice depend on C-detected _

sipa commented at 6:00 pm on August 9, 2020: contributor

This PR does two things:

It removes the ability to select the 5x52 field with a 8x32 scalar, or the 10x26 field with a 4x64 scalar. It’s both 128-bit wide versions, or neither.
The choice is made automatically by the C code, unless overridden by a USE_FORCE_WIDEMUL_INT{64,128} define (which is available through configure with a hidden option –with-test-override-wide-multiplication={auto,int64,int128}).

This reduces the reliance on autoconf for this performance-critical configuration option, and also reduces the number of different combinations to test.

This removes one theoretically useful combination: if you had x86_64 asm but no __int128 support in your compiler, it was possible to use the 64-bit field before but the 32-bit scalar. I think this doesn’t matter as all compilers/systems that support (our) x86_64 asm also support __int128. Furthermore, #767 will break this.

As an unexpected side effect, this also means the gen_context static precomputation tool will now use __int128 based implementations when available (which required an addition to the 5x52 field; see first commit).

sipa force-pushed on Aug 9, 2020

real-or-random commented at 10:13 pm on August 9, 2020: contributor

Concept ACK

I like the naming here which demonstrates that this is only about multiplications. That seems useful as a first step towards the idea that has been mentioned in #711 (comment) and in the following comment.

sipa commented at 10:18 pm on August 9, 2020: contributor

Indeed, this could be extended with a –with-wide-multiplication=mul128 for MSVC for example (well, MSVC platforms likely wouldn’t use the configure script at all, but the equivalent macro config).

real-or-random commented at 10:52 pm on August 9, 2020: contributor

Maybe we could get rid of autoconf here too by checking __SIZEOF_INT128__: https://stackoverflow.com/a/54815033

sipa commented at 0:10 am on August 10, 2020: contributor

@real-or-random That would certainly simplify things further.

One reason to not do this may be to keep the ability to explicitly select uint64_t based multiplication in a test.

For example, I think we expect that all x86_64 Linux/gcc configurations would end up using the __int128 based multiplication, and without configurability, every such instance on Travis would do so. However, perhaps there are reasonable x86_64 Linux/gcc (or very similar) configurations out there that we don’t know about, or can’t test for, without __int128. If so, it may be better to keep it configurable, and have at least one explicit test with uint64_t based multiply.

real-or-random commented at 0:39 am on August 10, 2020: contributor

We could still disable in the tests using macros, e.g., by simply adding -U__SIZEOF_INT128__ to the CPPFLAGS. I guess part of the question is whether we want to expose an easy interface for configuring to the user, because ./configure shines here. But I don’t think we want that in that case. Forcing a narrower type is not very useful for users.

gmaxwell commented at 2:47 am on August 10, 2020: contributor

I don’t think it should be exposed to the user. It should be possible to override w/ ifdefs for testing/benchmarking by developers (see for example the safegcd PR where testing 32-bit on x86_64 is both common and super useful), or for building for weirdo platforms where the type exists but doesn’t work. This kind of thing can be documented in a development focused readme.

Exposing it to joe-user just begs him to set it in a dumb way and hurt his performance. :)

Another option if there is some reservation around define macro sniffing is to have it in autotools but hide the setting or explicitly note that it’s for testing. Also ./configure setting can trigger overrides of macro-sniffed stuff, so the question of detection via autotools vs macros is largely orthogonal.

I think that in general if there is a simple, portable, and reliable macro test it should be preferred– autotools checking is slow, more likely to run into host vs target issues, and depends on using autotools, which not all targets will do. Especially in this project because the whole thing is one compilation unit and there are very few things that need to be detected (endianness, type sizes, clz instructions, not much else…) , so you don’t end up with the issue of a huge detection header that is redundantly run for every object.

sipa commented at 7:14 am on August 10, 2020: contributor

I think I prefer a hidden configure flag, say “–with-test-override-widemul=X” or so. It feels a bit ugly and strangely roundabout to go undefine system defines to guide the autodetection you wrote yourself into the right thing for testing.

I’d imagine it works by having USE_FORCE_WIDEMUL_X macros that are usually all undefined, but can be set by the configure script. There is no autodetection or verification in the script; it’s just passed through.

gmaxwell commented at 7:50 am on August 10, 2020: contributor

Right, I wasn’t thinking undefining system macros would be good. -DUSE_FORCE_FOO makes perfect sense. (and could applied to inject testing CLZs and stuff like that if you later feel it would be useful to do so)

elichai commented at 8:23 am on August 10, 2020: contributor

Concept ACK.

FWIW there are quite a lot of users who just use 32bit because they don’t use autotools and for some reason decided it was easier to just always use 32bit code, which is a shame. and this could help them (if they’ll update their libsecp…)

examples: https://github.com/mimblewimble/rust-secp256k1-zkp/pull/71 https://github.com/ethereum/go-ethereum/pull/21203 (was fixed just 2 months ago)

real-or-random commented at 9:09 am on August 10, 2020: contributor

Right, I wasn’t thinking undefining system macros would be good. -DUSE_FORCE_FOO makes perfect sense. (and could applied to inject testing CLZs and stuff like that if you later feel it would be useful to do so)

The undef thing was just a lazy example. CPPFLAGS=-DUSE_FORCE_FOO is essentially a hidden flag that can be passed to autoconf on the command line, so I don’t think there is a need to keep any autoconf code to pass the flag through (but I’m not opposed to that either).

in src/basic-config.h:24 in 89acf9e3ca outdated

23-#undef USE_SCALAR_4X64
24-#undef USE_SCALAR_8X32
25 #undef USE_SCALAR_INV_BUILTIN
26 #undef USE_SCALAR_INV_NUM
27+#undef USE_WIDEMUL_INT64
28+#under USE_WIDEMUL_INT128

real-or-random commented at 9:29 am on August 10, 2020:

s/under/undef

sipa commented at 9:03 pm on August 11, 2020:

Fixed.

gmaxwell commented at 12:21 pm on August 10, 2020: contributor

It would be really good if configure still showed the configuration, and maybe there was some string that the test/bench utilities could print— so otherwise some weird detection failure doesn’t result in a user silently getting half the speed with no evidence of the cause. I’ve had cases before where I was accidentally linking the wrong library and was confused by the performance.

e.g.

“libsecp256k1-deadbeef gcc 10.2.1 le x86_64 s=64asm f=64asm clz ctz nobignum endo …”

Add SECP256K1_FE_STORAGE_CONST_GET to 5x52 field

So far this has not been needed, as it's only used by the static precomputation
which always builds with 32-bit fields.

This prepares for the ability to have __int128 detected on the C side, breaking
that restriction.

0d7727f95e

sipa force-pushed on Aug 10, 2020

sipa commented at 9:39 pm on August 10, 2020: contributor

Changed to make the C side detect the availability of __int128, with the ability to override through a define (and a hidden configure option).

An unexpected side effect here is that gen_context will now start using the __int128 based implementations as well, if they are available. This means I had to add SECP256K1_FE_STORAGE_CONST_GET for the 5x52 field, which didn’t exist as it was never needed!

sipa renamed this:
~~Merge scalar/field config into widemul setting~~
Make scalar/field choice depend on C-detected __int128 availability
on Aug 10, 2020

sipa force-pushed on Aug 10, 2020

Autodetect __int128 availability on the C side

Instead of supporting configuration of the field and scalar size independently,
both are now controlled by the availability of a 64x64->128 bit multiplication
(currently only through __int128). This is autodetected from the C code through
__SIZEOF_INT128__, but can be overridden using configure's
--with-test-override-wide-multiply, or by defining
USE_FORCE_WIDEMUL_{INT64,INT128} manually.

79f1f7a4f1

sipa force-pushed on Aug 10, 2020

sipa cross-referenced this on Aug 10, 2020 from issue Report build configuration in test/bench output by sipa

real-or-random commented at 9:04 am on August 11, 2020: contributor

ACK 79f1f7a4f123765cf07be92ae894d882c5845191 diff looks good and tests pass

real-or-random approved

elichai commented at 11:09 pm on August 11, 2020: contributor

tACK 79f1f7a4f123765cf07be92ae894d882c5845191 I tested all the compilers available at godbolt(https://godbolt.org/z/8cqP1q) and:

A. they all managed to parse it(some very old ones required replacing uint64_t with unsigned long long) except cc64 and ppci 0.5.5. B. None returned 0. C. All the architectures I recognized returned the value I had expected(128 for 64bit and 64 for 32bit). This also includes a few MSVC compilers there.

Interesting finding, clang added support(x86-64) for 128bit on 3.3. and gcc on 4.6.4 (from the versions available on godbolt, I did not check outside).

gmaxwell commented at 11:27 pm on August 11, 2020: contributor

clang 3.3. was in 2013 while gcc 4.6 was in 2011.

real-or-random merged this on Aug 12, 2020

real-or-random closed this on Aug 12, 2020

jasonbcox referenced this in commit b0c5519ad6 on Sep 29, 2020

deadalnix referenced this in commit 736660ebe9 on Sep 30, 2020

elichai cross-referenced this on Mar 9, 2021 from issue Define field/scalar impl in secp256k1-sys/build.rs by fanatid

Make scalar/field choice depend on C-detected __int128 availability #793