Valgrind errors with struct assignment on ARM and PPC64LE #776

issue gmaxwell openend this issue on July 28, 2020
  1. gmaxwell commented at 4:50 am on July 28, 2020: contributor

    For some time there has been a clang issue where clang emits a memcpy of a struct to itself and valgrind flags it as undefined behaviour. Clang developers gave the response that when the compiler does it, nothing is undefined behaviour. This is response is mostly true, but memcpy is a libc defined function and can change its behaviour out from under the compiler (and has in the past, it used to work okay with overlapping memory) – in fact, memcpy can be replaced by dynamic linker at runtime (at the direction of the caller).

    In any case wisdom of clang’s behaviour aside, valgrind can’t tell if it was the code or the compiler (which is also part of why valgrind is useful for finding compiler induced non-constant time behaviour).

    For this library it is a practical problem both because valgrind errors are rightfully frightening to developers who would call the library but also because some people want to make running valgrind part of “make check” which would potentially elevate these useless errors to build failures (and teach users to ignore these sorts of reports).

    It’s also the case that where these issues are triggered is unstable and highly environment specific.

    E.g. PPC64LE Fedora 32 Clang 10.0.0-2 -Os fails but other optimization levels do not.

    Checkout this matrix of configurations on ARM8:

     0CC=gcc CFLAGS='-Os -g' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --with-scalar=64bit --with-field=64bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
     1ERROR SUMMARY: 0 errors from 0 contexts 
     2CC=gcc CFLAGS='-Os -g' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --with-scalar=64bit --with-field=64bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
     3ERROR SUMMARY: 0 errors from 0 contexts 
     4CC=gcc CFLAGS='-O3 -g' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --with-scalar=64bit --with-field=64bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
     5ERROR SUMMARY: 0 errors from 0 contexts 
     6CC=clang CFLAGS='-O2 -g' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --with-scalar=64bit --with-field=64bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
     7ERROR SUMMARY: 0 errors from 0 contexts 
     8CC=clang CFLAGS='-Os -g' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --with-scalar=64bit --with-field=64bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
     9ERROR SUMMARY: 1 errors from 1 contexts 
    10CC=clang CFLAGS='-O3 -g' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --with-scalar=64bit --with-field=64bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
    11ERROR SUMMARY: 0 errors from 0 contexts 
    12CC=gcc CFLAGS='-O2 -g' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --with-scalar=32bit --with-field=32bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
    13ERROR SUMMARY: 0 errors from 0 contexts 
    14CC=gcc CFLAGS='-Os -g' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --with-scalar=32bit --with-field=32bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
    15ERROR SUMMARY: 0 errors from 0 contexts 
    16CC=gcc CFLAGS='-O3 -g' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --with-scalar=32bit --with-field=32bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
    17ERROR SUMMARY: 0 errors from 0 contexts 
    18CC=clang CFLAGS='-O2 -g' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --with-scalar=32bit --with-field=32bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
    19ERROR SUMMARY: 0 errors from 0 contexts 
    20CC=clang CFLAGS='-Os -g' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --with-scalar=32bit --with-field=32bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
    21ERROR SUMMARY: 0 errors from 0 contexts 
    22CC=clang CFLAGS='-O3 -g' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --with-scalar=32bit --with-field=32bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
    23ERROR SUMMARY: 0 errors from 0 contexts 
    24CC=gcc CFLAGS='-O2 -g' ./configure --with-bignum=no --enable-experimental --enable-endomorphism --enable-module-ecdh --enable-module-recovery --with-scalar=64bit --with-field=64bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
    25ERROR SUMMARY: 0 errors from 0 contexts 
    26CC=gcc CFLAGS='-Os -g' ./configure --with-bignum=no --enable-experimental --enable-endomorphism --enable-module-ecdh --enable-module-recovery --with-scalar=64bit --with-field=64bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
    27ERROR SUMMARY: 2 errors from 2 contexts 
    28CC=gcc CFLAGS='-O3 -g' ./configure --with-bignum=no --enable-experimental --enable-endomorphism --enable-module-ecdh --enable-module-recovery --with-scalar=64bit --with-field=64bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
    29ERROR SUMMARY: 0 errors from 0 contexts 
    30CC=clang CFLAGS='-O2 -g' ./configure --with-bignum=no --enable-experimental --enable-endomorphism --enable-module-ecdh --enable-module-recovery --with-scalar=64bit --with-field=64bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
    31ERROR SUMMARY: 0 errors from 0 contexts 
    32CC=clang CFLAGS='-Os -g' ./configure --with-bignum=no --enable-experimental --enable-endomorphism --enable-module-ecdh --enable-module-recovery --with-scalar=64bit --with-field=64bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
    33ERROR SUMMARY: 2 errors from 2 contexts 
    34CC=clang CFLAGS='-O3 -g' ./configure --with-bignum=no --enable-experimental --enable-endomorphism --enable-module-ecdh --enable-module-recovery --with-scalar=64bit --with-field=64bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
    35ERROR SUMMARY: 0 errors from 0 contexts 
    36CC=gcc CFLAGS='-O2 -g' ./configure --with-bignum=no --enable-experimental --enable-endomorphism --enable-module-ecdh --enable-module-recovery --with-scalar=32bit --with-field=32bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
    37ERROR SUMMARY: 0 errors from 0 contexts 
    38CC=gcc CFLAGS='-Os -g' ./configure --with-bignum=no --enable-experimental --enable-endomorphism --enable-module-ecdh --enable-module-recovery --with-scalar=32bit --with-field=32bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
    39ERROR SUMMARY: 2 errors from 2 contexts 
    40CC=gcc CFLAGS='-O3 -g' ./configure --with-bignum=no --enable-experimental --enable-endomorphism --enable-module-ecdh --enable-module-recovery --with-scalar=32bit --with-field=32bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
    41ERROR SUMMARY: 0 errors from 0 contexts 
    42CC=clang CFLAGS='-O2 -g' ./configure --with-bignum=no --enable-experimental --enable-endomorphism --enable-module-ecdh --enable-module-recovery --with-scalar=32bit --with-field=32bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
    43ERROR SUMMARY: 0 errors from 0 contexts 
    44CC=clang CFLAGS='-Os -g' ./configure --with-bignum=no --enable-experimental --enable-endomorphism --enable-module-ecdh --enable-module-recovery --with-scalar=32bit --with-field=32bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
    45ERROR SUMMARY: 0 errors from 0 contexts 
    46CC=clang CFLAGS='-O3 -g' ./configure --with-bignum=no --enable-experimental --enable-endomorphism --enable-module-ecdh --enable-module-recovery --with-scalar=32bit --with-field=32bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
    47ERROR SUMMARY: 0 errors from 0 contexts 
    

    So out of 24 configurations, 4 fail.

    Each of the failing cases look like:

     0==45040== Source and destination overlap in memcpy(0x1ffeffefe0, 0x1ffeffefe0, 84)
     1==45040==    at 0x484F1D4: __GI_memcpy (in /usr/lib/aarch64-linux-gnu/valgrind/vgpreload_memcheck-arm64-linux.so)
     2==45040==    by 0x486B5B3: secp256k1_ge_neg (group_impl.h:95)
     3==45040==    by 0x4870737: secp256k1_ecmult_const.constprop.0 (ecmult_const_impl.h:260)
     4==45040==    by 0x4872A6B: secp256k1_ecdh (main_impl.h:53)
     5==45040==    by 0x109137: main (valgrind_ctime_test.c:73)
     6==45040== 
     7==45040== Source and destination overlap in memcpy(0x1ffeffefe0, 0x1ffeffefe0, 84)
     8==45040==    at 0x484F1D4: __GI_memcpy (in /usr/lib/aarch64-linux-gnu/valgrind/vgpreload_memcheck-arm64-linux.so)
     9==45040==    by 0x486B5B3: secp256k1_ge_neg (group_impl.h:95)
    10==45040==    by 0x487075F: secp256k1_ecmult_const.constprop.0 (ecmult_const_impl.h:266)
    11==45040==    by 0x4872A6B: secp256k1_ecdh (main_impl.h:53)
    12==45040==    by 0x109137: main (valgrind_ctime_test.c:73)
    

    (The clang build with only one error gets only the ecmult_const_impl.h:260 one, because :266 is endo only)

    So there are failures with both GCC and Clang Os only. GCC does it for 32 and 64 only with endo. Clang does it only with 64 bit but doesn’t care about endo.

    GCC previously had a bug a long time ago on this which was closed as resolved: https://gcc.gnu.org/bugzilla//show_bug.cgi?id=19410

    Bug in LLVM tracker: https://bugs.llvm.org/show_bug.cgi?id=11763

    I think this is a serious impedement to using valgrind except as a development/review tool, because the location of these issues is unpredictable, depends on random build flags, depends on the compiler version, and could come and go due to unrelated changes to the codebase. There are a large number of struct assignments in the code too, so a large number of places where one could crop up.

  2. real-or-random commented at 8:51 am on July 28, 2020: contributor
  3. gmaxwell commented at 8:58 am on July 28, 2020: contributor

    I think that won’t work well: they can show up in 100 different places, they’ll move around, etc. Obviously we’re not going to be maintaining a separate file with every struct assignment in the codebase. :P

    I think suppressions might be workable for our CI purposes where we’re targeting a small set of compilers and if we move something we can up just fix it on the spot. We also should be concerned about callers: I don’t want to know how many security critical programs went years without being tested under valgrind just because they linked openssl and it returned ub tainted randomness.

    For something like the ct timer tester I think we could make it just totally suppress the memcpy arguments check– that’s not what we’re looking for with that test.

  4. real-or-random commented at 9:40 am on July 28, 2020: contributor

    I think that won’t work well: they can show up in 100 different places, they’ll move around, etc. Obviously we’re not going to be maintaining a separate file with every struct assignment in the codebase. :P

    The syntax is pretty clever, we could for example suppress it in secp256k1_ge_neg.

    I think suppressions might be workable for our CI purposes where we’re targeting a small set of compilers and if we move something we can up just fix it on the spot. We also should be concerned about callers: I don’t want to know how many security critical programs went years without being tested under valgrind just because they linked openssl and it returned ub tainted randomness.

    For something like the ct timer tester I think we could make it just totally suppress the memcpy arguments check– that’s not what we’re looking for with that test.

    I was thinking more into these directions. That the user can’t use valgrind reliably is sad but not our fault and don’t see what we could do about it except document it. And maybe report a valgrind bug. The only related bug I could find is https://bugs.kde.org/show_bug.cgi?id=148543.

  5. gmaxwell commented at 9:49 am on July 28, 2020: contributor

    Yeah, we should talk to the valgrind authors. This isn’t technically a valgrind bug and GCC has previously fixed the behaviour. But LLVM/Clang indicate that they will not change the behaviour and instead require that they be used with a libc that can self memcpy safely, and this isn’t a crazy position.

    The memcpy-to-self seems like something that is pretty unlikely to be an actual bug, so perhaps at a minimum valgrind could be convinced to classify that as a different kind of error from general overlap.

  6. gmaxwell commented at 10:10 am on July 28, 2020: contributor
    Do we know anyone that is involved in the C standard? … perhaps they could be convinced to at least elevate self-memcpy to implementation defined. :) I normally don’t consider the testing of something even started until a compiler bug or two is reported… getting the language changed would be a new bar to meet in the future. :)
  7. elichai commented at 10:24 am on July 28, 2020: contributor

    Do we know anyone that is involved in the C standard? … perhaps they could be convinced to at least elevate self-memcpy to implementation defined. :) I normally don’t consider the testing of something even started until a compiler bug or two is reported… getting the language changed would be a new bar to meet in the future. :)

    I think I might know someone on twitter, I’ll post a tweet and start tagging :)

  8. real-or-random commented at 10:33 am on July 28, 2020: contributor

    Some more references: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

    https://web.archive.org/web/20180926060810/http://valgrind.10908.n7.nabble.com/memcpy-x-x-n-is-not-OK-clang-llvm-vs-memcheck-td44514.html

    I’m not very motivated to talk to Valgrind after this thread but yeah, it’s old, hm.

    Do we know anyone that is involved in the C standard? … perhaps they could be convinced to at least elevate self-memcpy to implementation defined. :)

    I don’t believe that this is a language issue. I tend to agree with the clang devs here.

  9. gmaxwell commented at 11:01 am on July 28, 2020: contributor

    Clang devs don’t have any control over libc. But if they think must-support-memcpy-self is a reasonable requirement for memcpy, then isn’t that a case that it’s a reasonable implementation defined behaviour for memcpy? ::shrugs:: That is in fact defacto what they’re relying on now– that they’re only used with a memcpy that has an implementation defined behaviour for self copies.

    You shouldn’t let that thread throw you on valgrind– the person with the argument for not changing valgrind doesn’t appear to listed as a valgrind author, could just be an internet hothead. I’m not sure what change I’d ask from valgrind other than making it so you can separately suppress memcpy-self from overlaps (if that isn’t already the case). I wouldn’t want it to stop returning errors for actual bugs in source code.

    [Especially now that I see that thread pointed out a plausible memcpy construction that would break self assignment.]

  10. real-or-random commented at 11:12 am on July 28, 2020: contributor

    Clang devs don’t have any control over libc. But if they think must-support-memcpy-self is a reasonable requirement for memcpy, then isn’t that a case that it’s a reasonable implementation defined behaviour for memcpy? ::shrugs::

    I think this would miss the point. Clang and libc together form a C implementation, and things like “implementation-defined” apply to C programs calling memcpy and not to internal calls between one part of a C implementation and another part, these are literally implementation details. You’re right of course, it’s somewhat crazy for clang to rely on it if you control only half of the C implementation, and rely on undocumented behavior provided by the other part. (I don’t know enough to judge about the potential real issues on PowerPC.)

    You shouldn’t let that thread throw you on valgrind– the person with the argument for not changing valgrind doesn’t appear to listed as a valgrind author, could just be an internet hothead. I’m not sure what change I’d ask from valgrind other than making it so you can separately suppress memcpy-self from overlaps (if that isn’t already the case). I wouldn’t want it to stop returning errors for actual bugs in source code.

    Fair. Do you want to give a try? Otherwise I can do it next week or so, I guess I don’t have the time right now.

  11. gmaxwell commented at 11:37 am on July 28, 2020: contributor

    it’s somewhat crazy for clang to rely on it if you control only half of the C implementation,

    Yeah. That’s also my point. They’re perfectly right that the “implementation” (complier + libc) can do whatever it wants, I agree. But relying on undocumented libc behaviour is not wise, esp because glibc has changed before – memcpy on overlapping regions used to work.

    Fair. Do you want to give a try? Otherwise I can do it next week or so, I guess I don’t have the time right now.

    I’m going to seek some friendly input for a few days first, but sure.

  12. elichai commented at 2:52 pm on October 19, 2020: contributor

    I can no longer reproduce this, not with clang or with gcc, I’m not sure if it is because of a code change or new compilers or valgrind changed

     0$ clang --version
     1clang version 11.0.0 (https://github.com/llvm/llvm-project.git 176249bd6732a8044d457092ed932768724a6f06)
     2Target: x86_64-unknown-linux-gnu
     3
     4$ gcc --version
     5gcc (GCC) 10.2.0
     6
     7$ CC=clang CFLAGS='-Os' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --enable-module-schnorrsig --enable-module-extrakeys --disable-openssl-tests && make clean && make valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
     8==3791018== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
     9
    10$ CC=gcc CFLAGS='-Os' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --enable-module-schnorrsig --enable-module-extrakeys --disable-openssl-tests && make clean &&make valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
    11==3793460== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
    
  13. real-or-random commented at 3:39 pm on October 19, 2020: contributor
    @elichai I’m not sure if we ever had this issue on x86_64. The matrix in the initial comment is on ARM8.
  14. elichai commented at 3:42 pm on October 19, 2020: contributor

    @elichai I’m not sure if we ever had this issue on x86_64. The matrix in the initial comment is on ARM8.

    Oops I forgot that part. you’re right.

  15. real-or-random commented at 2:27 pm on October 30, 2020: contributor
    This seems to happen with clang 7 also on -O2 builds: https://app.shippable.com/github/real-or-random/secp256k1/runs/9/29/console (I was playing around with Shippable as a CI) Not sure what the crucial difference here is to Travis, which also runs clang 7. Maybe it’s that Shippable uses an older valgrind version,
  16. real-or-random commented at 2:30 pm on October 30, 2020: contributor

    Fair. Do you want to give a try? Otherwise I can do it next week or so, I guess I don’t have the time right now.

    I’m going to seek some friendly input for a few days first, but sure. @gmaxwell Did you reach out to them?

  17. real-or-random cross-referenced this on Nov 17, 2020 from issue 32bit armv7l valgrind error by rustyrussell
  18. real-or-random commented at 10:26 am on November 18, 2020: contributor

    So we see this now with GCC too: https://github.com/ElementsProject/lightning/issues/4203

    There’s an age-old open GCC bug: https://gcc.gnu.org/bugzilla//show_bug.cgi?id=32667 The proposed solution in clang is simply to document the issue: https://reviews.llvm.org/D86993

    So far we’ve seen this only in secp256k1_ge_neg in this line: https://github.com/bitcoin-core/secp256k1/blob/c6b6b8f1bb044d7d1aa065ebb674adde98a36a8e/src/group_impl.h#L84 As long as this the case, I believe we should just mitigate this in our code. Here are a few things we could do:

    The first four look all reasonable to me and I’d probably ACK any of those. If you ask me, we should just do the first. In the interest of avoiding bikeshedding, I would open the PR but it’ll be easier if someone with an ARM system can do it because I can’t test (without compiling for my phone, which is a PITA).

  19. gmaxwell commented at 4:52 pm on November 18, 2020: contributor
    or split into inplace and neg, and neg branches then calls inplace. Most usage would just use inplace directly.
  20. elichai commented at 10:26 am on November 19, 2020: contributor
    • Prepend if (r != a).

    I’d really like to avoid pointer comparison, AFAIU comparing 2 pointers that point to different objects(ala not in the same array) is UB in C.

  21. real-or-random commented at 10:53 am on November 19, 2020: contributor
    • Prepend if (r != a).

    I’d really like to avoid pointer comparison, AFAIU comparing 2 pointers that point to different objects(ala not in the same array) is UB in C.

    No, it’s defined. The ==/!= operators never yield UB. See http://port70.net/~nsz/c/c99/n1256.html#6.5.9p6

  22. jonasnick commented at 9:24 pm on November 20, 2020: contributor

    So far we’ve seen this only in secp256k1_ge_neg in this line:

    With default configuration flags, ARMv8 and GCC 10.2.0, I get the Source and destination overlap in memcpy valgrind error in secp256k1_gej_add_var.

  23. real-or-random commented at 8:55 am on November 23, 2020: contributor

    @jonasnick Ok, damn. Is this the only instance?

    I’m not sure how to move forward then. I’m tempted to say “if it’s only in two places, then let’s fix it” but I worry that we’ll end up replacing every struct assignment in the code base.

    At least we’ve seen this only with ge and gej. I hope this is because they’re the largest structs we have in the code and their size is maybe not nicely aligned to register sizes due to the infinity int. (Then it makes more sense to call memcpy for the compiler instead of optimize using other instructions, I’ve seen this when looking into zeroing secrets.) And we only have a handful of struct assignments in group_impl.h, so I think we should still try to eliminate them.

    We should still benchmark if it makes a difference. Or directly split the function into two variants. If the compiler inserts memcpy in ARM with -O2, this may be a performance issue even now (secp256k1_gej_add_var is used in a loop in ecmults…)

  24. real-or-random cross-referenced this on Jun 7, 2021 from issue Valgrind error in `rustsecp256k1_v0_4_0_ecdh`/`rustsecp256k1_v0_4_0_ge_neg` on ppcel64 by TheBlueMatt
  25. real-or-random renamed this:
    Valgrind errors with struct assignment in Os builds
    Valgrind errors with struct assignment on ARM
    on Jun 7, 2021
  26. TheBlueMatt commented at 7:31 pm on June 7, 2021: contributor
    This also occurs with on ppcel64.
  27. real-or-random renamed this:
    Valgrind errors with struct assignment on ARM
    Valgrind errors with struct assignment on ARM and PPC64LE
    on Jun 7, 2021
  28. real-or-random commented at 7:33 pm on June 7, 2021: contributor

    This also occurs with on ppcel64.

    Right! This was in fact noted in the opening comment here but I mis-renamed the issue…

  29. real-or-random commented at 5:31 pm on June 8, 2021: contributor

    I’ve looked at the output of powerpc64le-linux-gnu-objdump -drwCSl .libs/libsecp256k1.so | grep -B10 memcpy@@ (for -O2) and it tells me that most calls to libc’s memcpy correspond to actual memcpy calls in our code. But around 10 calls are struct assignments and a handful of them may be problematic because they may be a self-copy. All of those cases are ge or gej structs. (I haven’t looked at ARM again but I believe it’s similar there…)

    I think the scope of the issue is still small but this is more than the two code locations we’ve seen before. :/

    I suggest to add _ge_clone and _gej_clone methods, use them in the corresponding places, and then we can see how to implement them such that self-memcpys won’t show up, e.g., using an if or memmove. This enables us to benchmark the code and see whether a particular implementation affects performance. (edit: And if we don’t want to make changes due to performance, this makes it at least easier to write a Valgrind suppression rule or wrap the changes with if(RUNNING_ON_VALGRIND)).

    All of this is acceptable, I think. But it’s an ugly hack.

  30. jonasnick commented at 1:58 pm on June 9, 2021: contributor
    Fwiw, if we do this hack we should also test in CI that we actually use these methods (preferably by running the tests in valgrind on the affected architectures).
  31. real-or-random commented at 3:53 pm on June 9, 2021: contributor

    Fwiw, if we do this hack we should also test in CI that we actually use these methods (preferably by running the tests in valgrind on the affected architectures).

    I agree though it’s hard to run Valgrind on foreign architectures. Valgrind and qemu-user don’t go together… We could pay for some EC2 ARM instances and hand them to Cirrus CI (that is supported). But even this won’t cover PowerPC. No matter what, this will be a project on its own.

    For now, I’d prefer some clever grep on the assembly.

  32. real-or-random commented at 1:11 pm on June 10, 2021: contributor

    For now, I’d prefer some clever grep on the assembly.

    Well ok, the issue with this is that this needs to be something which is statically checkable. This leaves us with only two options:

    • memmove
    • assigning all struct members individually

    In both cases, we’d hope the compiler won’t turn it into a memcpy.

    And in both cases, it’s probably better to ensure that all of the field members are initialized to avoid UB (or any discussion on whether this is UB or not, which may be worse than the actual UB :P).

    So we think we should first merge #791 .

  33. real-or-random commented at 4:28 pm on October 26, 2023: contributor

    I’ve run into this again in #1433, see https://cirrus-ci.com/build/5276023932059648.

    0==3653== Source and destination overlap in memcpy(0xfed0e388, 0xfed0e388, 148)
    1==3653==    at 0x486DA10: memcpy (vg_replace_strmem.c:1120)
    2==3653==    by 0x1689FF: secp256k1_ecmult_pippenger_wnaf (ecmult_impl.h:567)
    3==3653==    by 0x1689FF: secp256k1_ecmult_pippenger_batch (ecmult_impl.h:710)
    4==3653==    by 0x16BF5F: secp256k1_ecmult_pippenger_batch_single (ecmult_impl.h:729)
    5==3653==    by 0x169ABF: test_ecmult_multi (tests.c:4583)
    6==3653==    by 0x12BDA7: run_ecmult_multi_tests (tests.c:5137)
    7==3653==    by 0x12BDA7: main (tests.c:7588)
    

    (Note:

    0==3653== Source and destination overlap in memcpy(0xfed13708, 0xfed13708, 148)
    1==3653==    at 0x486DA10: memcpy (vg_replace_strmem.c:1120)
    2==3653==    by 0x124E5F: test_ge (tests.c:3813)
    3==3653==    by 0x124E5F: run_ge (tests.c:3984)
    4==3653==    by 0x124E5F: main (tests.c:7575)
    

    No useful debug info here, but I tracked this down to the known *r = *a in secp256k1_gej_add_var. )

  34. real-or-random cross-referenced this on Oct 26, 2023 from issue ci/cirrus: Add ARM32 valgrind tasks by real-or-random
  35. real-or-random added the label assurance on Oct 26, 2023
  36. real-or-random commented at 4:47 pm on October 27, 2023: contributor

    Do we know anyone that is involved in the C standard? … perhaps they could be convinced to at least elevate self-memcpy to implementation defined. :)

    LLVM people are working on a draft DR: https://reviews.llvm.org/D86993#4525896


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-03 17:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me