Enable binary releases to work on older Linux distros without a static link #4042

pull theuni wants to merge 4 commits into bitcoin:master from theuni:libc-compat changing 5 files +134 −14
  1. theuni commented at 3:53 am on April 10, 2014: member

    This reverts #3914 in favor of a more subtle approach, giving us the benefits of shared base libs and the (relative) portability of full-static.

    glibc/libstd++ stubs have been added so that older distros don’t see undefined symbols. They are bundled together into an –enable-glibc-back-compat configure switch, as I’m not sure there’s any real-world usefulness in separating them.

  2. gmaxwell commented at 4:48 am on April 10, 2014: contributor
    I like this approach much better. Fully statically linking glibc is usually a bad idea, I’ve spent many an hour tracking down mysterious name resolution failures from static libc.
  3. wtogami commented at 5:30 am on April 10, 2014: contributor
    This should also benefit from ASLR, which was not possible with static linked glibc.
  4. in src/glibc_compat.cpp: in f17e2c7eb0 outdated
    0@@ -0,0 +1,19 @@
    1+#include "bitcoin-config.h"
    


    laanwj commented at 6:00 am on April 10, 2014:
    I’d prefer to move these source files to src/compat. Let’s keep them separate from the main bitcoin source.

    theuni commented at 6:06 am on April 10, 2014:
    Sure, makes sense.
  5. laanwj commented at 6:08 am on April 10, 2014: member

    Static may be a bad idea, but it is the only way to be consistently compatible with a lot of old releases without extra maintenance overhead.

    I feel scared importing parts of the C and C++ libraries into bitcoin. Any change in the code may make it necessary to extend glibcxx_compat.cpp / src/glibc_compat.cpp with ‘one more function’. If there are implementation problems in those files it will be hard to debug.

    Then again, it may be lesser of evils.

    Let’s at least get this tested on all the platforms people were concerned about in the other topic:

    • Debian 7.2 32-bit and 64 bit
    • Ubuntu 10.04.4 LTS 32-bit and 64 bit
    • CentOS 6.5 32bit and 64 bit
  6. theuni commented at 6:14 am on April 10, 2014: member

    Agreed on the overhead, but I do think that it’s the best of all evils. We’d basically just have to keep an eye on it, and update as necessary. As for debugging though, as @gmaxwell pointed out, this makes it substantially easier, not harder.

    Anyway, release decisions should be locked in far earlier than they have been in the past. With a sane procedure, we’d know how we stand on final back-compat when shipping the first alpha.

  7. laanwj commented at 6:24 am on April 10, 2014: member

    Half related, but more urgent if the gitian-shipped and self-built executables further diverge in implementation: we should make gitian build external debug information files (not to be shipped, just for our own use). This makes it possible to debug crashes in the gitian builds as well as to resolve crash dumps that people send.

    Working on test builds…

  8. theuni commented at 6:29 am on April 10, 2014: member

    That’s one of the benefits of being deterministic :)

    ‘objcopy –only-keep-debug’ on the pre-stripped binaries should be enough to track things down in the future.

  9. laanwj commented at 8:48 am on April 10, 2014: member

    Test builds available here: https://download.visucore.com/bitcoin/bitcoin-f17e2c7-linux.tar.gz https://download.visucore.com/bitcoin/bitcoin-f17e2c7-linux.tar.gz.sig

    If you help testing, please let us know what you tested, and on what OS version and architecture.

  10. build: add glibc/libstdc++ back-compat stubs
    glibc/libstdc++ have added new symbols in later releases. When running a new
    binary against an older glibc, the run-time linker is unable to resolve the
    new symbols and the binary refuses to run.
    
    This can be fixed by adding our own versions of those functions, so that the
    build-time linker does not emit undefined symbols for them.
    
    This enables our binary releases to work on older Linux distros, while not
    incurring the downsides of a fully static binary.
    ffc6b678b9
  11. theuni commented at 2:27 am on April 11, 2014: member
    moved to compat/ and force-pushed, since nothing else changed.
  12. build: add an option for enabling glibc back-compat
    Using "./configure --enable-glibc-back-compat" will attempt to be
    compatible with a target running glibc abi 2.9 and libstdc++ abi 3.4.
    d5aab70490
  13. gitian-linux: --enable-glibc-back-compat 49a3352c1c
  14. theuni commented at 2:52 am on April 11, 2014: member
    pinging @anton000, @jcea, and @norbertVC since they spoke up in the other PR. Would be great if you could test the binaries from @laanwj.
  15. anton000 commented at 10:18 am on April 11, 2014: none
    @laanwj bins run fine on fully updated CentOS 6.5 64-bit. Currently offshore so cant test 32bit on my box as of yet.
  16. sipa commented at 10:52 am on April 11, 2014: member

    Do any of the functions reimplemented by this patch actually get called from within Bitcoin? In particular, from within consensus-critical code?

    Is there any risk of this code not being exactly compatible with the libc/libstdc++ version the rest of the code is linked with? What if some of the implementation assumptions in std::list change?

  17. laanwj commented at 11:07 am on April 11, 2014: member

    Do any of the functions reimplemented by this patch actually get called from within Bitcoin? In particular, from within consensus-critical code?

    That would be useful to investigate.

    Is there any risk of this code not being exactly compatible with the libc/libstdc++ version the rest of the code is linked with? What if some of the implementation assumptions in std::list change?

    A significant change in assumptions would break the ABI. After all, the data structures in subsequent versions need to be binary-compatible: a large part of the C++ library consists of templates and headers which get included into the code.

  18. sipa commented at 11:42 am on April 11, 2014: member
    To be clear: I like this solution a lot, but I want to understand the risks.
  19. in src/compat/glibcxx_compat.cpp: in 49a3352c1c outdated
     9+
    10+namespace std {
    11+
    12+const char* bad_exception::what() const throw()
    13+{
    14+  return "std::bad_exception";
    


    Diapolo commented at 12:46 pm on April 11, 2014:
    That file and the above one don’t seem to use our 4 spaces indentation rule.
  20. theuni commented at 5:19 pm on April 11, 2014: member

    @sipa obviously memcpy does. But I think that one is pretty self-explanatory.

    As for the c++ ones, they have a specific purpose, just that the definitions were moved from the header to the binary implementation. I think it’s safe to say that the purpose of those functions won’t change. The only danger I can imagine is that they could (for example) change the side-effects of an internal function that we call.

    For example, imagine that we need to implement list::insert. We call some of their internal methods like operator[] and iterator++, then we change some internal variables. It’s possible that in the future, they could change those variables in the internal functions, so that we’ve actually changed them twice.

    But I don’t think that’s a very reasonable concern. If you look at what’s actually implemented, you’ll see that it’s so low-level that there’s really only one way to handle the function.

    Anyway, let’s enumerate here so we can narrow down the concern:

    _fdelt_warn: Checks for overflows in FD functions. This can be hit (read: this forces a crash) on:

    0fd_set rfds;
    1FD_ZERO(&rfds);
    2FD_SET(FD_SETSIZE+1, &rfds); //BOOM!
    

    I initially had a unit-test for this, but the problem is that we actually want it to crash, and boost tests don’t play nice with the idea of a program abort being a passing test. Plus, it spews a really nasty backtrace.

    bad_alloc/bad_cast/bad_exception: These are cosmetic, used for debugging or logging. If they end up displaying new messages in the future, that’s no problem. Anything comparing these strings would be horribly broken already.

    _List_node_base: These are related to list->insert or so. They were broken out of the headers and moved into the binary implementation, I believe as part of the c++11 split. These were taken as obvious from upstream, and it’s hard to imagine that their meanings could change.

    ostream/istream templates: These just force on new templates that are declared ’extern template’ in the headers. Seems strange that those worked without c++11 anway. gnu extension, i guess?

    out_of_range: This one I’ll admit I don’t know much about. I found that another project received permission to copy glibc’s implementation verbatim, so I did the same.

    If there’s any interest in seeing what pulls in a symbol, you can trace it at link-time using -Wl,–trace-symbol.

    Also, as to the concern about accidentally introducing new symbols, it would be very easy to teach pull-tester to check for that. Though, obviously, it’d need to be using the same libc/libstdc++ as the release compiler.

  21. theuni commented at 5:24 pm on April 11, 2014: member
    Also worth noting that we may be able to drop some of these (the what()s look like good candidates) with smarter linking. Namely -ffunctions-sections + -Wl,–gc-sections, or by enabling lto. Unfortunately, these would need to be used for all dependencies as well since we link statically, so they wouldn’t be trivial to pull off.
  22. theuni commented at 5:25 pm on April 11, 2014: member
    @Diapolo old habit, will fix.
  23. theuni commented at 5:39 pm on April 11, 2014: member

    One last point. The libstdc++ stubs shouldn’t be called in older distros anyway. The fact that those symbols aren’t present means that their libs don’t expect those functions to exist.. they’re routed elsewhere. They just need to be present to satisfy the runtime linker.

    Point being that the testing of these needs to happen on newer distros, rather than old. For old distros, pass/fail should pretty much cover it.

  24. theuni commented at 6:30 pm on April 11, 2014: member
    Err.. above, out-of-range is simple enough, I meant that I hadn’t looked deeply into ctype::_M_widen_init.
  25. theuni commented at 6:48 pm on April 11, 2014: member

    (last spam here, I’m just trying to hit all of this at once)

    See here for the _M_widen_init move: http://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=2dbdf201fad2ea4c13a4826e515dc76cb66f84dd

    libstdc++ even implements it several times (eww). So the abi is pretty much locked in.

  26. build: add symbol for upcoming gcc 4.9's libstdc++ 05c20a553a
  27. BitcoinPullTester commented at 11:33 pm on April 11, 2014: none
    Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/05c20a553a12d03b1512a75973674c6d25534259 for binaries and test log. This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/ Contact BlueMatt on freenode if something looks broken.
  28. laanwj commented at 9:11 am on April 12, 2014: member

    Compiling everything with -lto would be interesting. As we compile everything including the dependencies from source I expect that there would be quite some savings. On the other hand, the compile time and memory usage juggling all those symbols may make this infeasible. But it’s good to keep in mind.

    One last point. The libstdc++ stubs shouldn’t be called in older distros anyway. The fact that those symbols aren’t present means that their libs don’t expect those functions to exis

    I don’t understand this. If our executable uses libstdc++ functions that are not provided by the older libstdc++’s, I would suppose it will still use them on older distros. ‘Their’ libs don’t expect the symbols to exist, but ‘our’ libs do. Or am I overlooking something?

  29. theuni commented at 3:19 pm on April 12, 2014: member

    @laanwj I had a long explanation written up before I realized that you were right and I was wrong :)

    You’re overlooking how the function is called, but in these cases, it doesn’t matter. For example, _M_unhook() is called by _M_erase(). If _M_erase were present in the libstdc++ shared lib, it would call whichever unhook implementation was hard-coded into it. But, as far as I can see, all of our replaced functions are used by inlined callers, so the effect is that it always will use our stubs.

    You do seem to have a misconception about how our code uses these stubs, though. Keep in mind that we don’t include definitions for these functions. “Our” code never sees them or has any idea that they exist. We just ends up getting routed to them from inside of libstdc++ at runtime.

  30. kristovatlas commented at 0:06 am on April 14, 2014: none

    @laanwj can this also be tested for debian 6 squeeze? this is what Tails is based on, which is the basis of my interest in this issue, since I want people to easily use Bitcoin Core with Tor.

    Edit: Here’s the results of running the 32-bit binaries in the latest version of tails (0.23): … amnesia@amnesia:~/bitcoin-f17e2c7-linux/bin/32$ ./bitcoin-qt ./bitcoin-qt: symbol lookup error: ./bitcoin-qt: undefined symbol: _ZN19QAbstractProxyModel11setItemDataERK11QModelIndexRK4QMapIi8QVariantE … amnesia@amnesia:~/bitcoin-f17e2c7-linux/bin/32$ ./test_bitcoin-qt ./test_bitcoin-qt: symbol lookup error: ./test_bitcoin-qt: undefined symbol: _ZN9QLineEdit18setPlaceholderTextERK7QString

    Full output here: http://paste.ubuntu.com/7247453/

  31. theuni commented at 1:06 am on April 14, 2014: member

    @kristovatlas Thanks for testing. Looks like qt drags in some other symbols. I’ll take a look.

    And yes, Debian stable is my primary motivation.

    Edit: Mm, I misread you. Squeeze is a target too, though.

  32. theuni commented at 1:23 am on April 14, 2014: member
    Ok, those are coming from Qt itself. @laanwj Why are we using shared qt?
  33. laanwj commented at 5:09 am on April 14, 2014: member

    @theuni Statically linking Qt on Linux sucks. If you don’t use the system’s Qt, you’ll lose system-specific theming and plugins (ubuntu appmenu integration, etc). Also, statically linking all the dependencies of Qt (such as X11 libs, possibly OpenGL) is a horrible mess in general.

    How can this be a problem here? The system’s Qt should not use any symbols that are not in the system’s glibc.

  34. theuni commented at 5:19 am on April 14, 2014: member

    @laanwj I’ll reserve judgement on the Qt static link since I haven’t looked into it before for Linux. I’ll take your word for it, but I’m curious so I’ll research a bit. Qt (qt5 at least) dyloads GL and friends via plugins, specifically to avoid compile-time dependencies. It may be worth looking into using qt5 for that reason.

    Those aren’t missing glibc symbols, they’re missing qt symbols. The build-side qt is newer than Squeeze’s, so it’s missing some new functionality. To be more specific, Precise links against qt 4.8, Squeeze uses 4.6.

  35. laanwj commented at 5:22 am on April 14, 2014: member
    Last time I tried it was a mess, but you’re welcome to try.
  36. wtogami commented at 9:35 am on April 14, 2014: contributor

    To be more specific, Precise links against qt 4.8, Squeeze uses 4.6.

    If this is the case, then this would not be a regression from the lucid-based gitian builds of Bitcoin 0.8.x. We should not attempt to support things that old.

    Isn’t Debian stable now 7 or Wheezy?

  37. nightlybitcoin referenced this in commit 25601444b9 on Apr 14, 2014
  38. nightlybitcoin referenced this in commit 7c26d92d49 on Apr 14, 2014
  39. nightlybitcoin referenced this in commit 3748c5ce41 on Apr 14, 2014
  40. quel commented at 2:09 am on April 15, 2014: none
    @wtogami Debian stable is currently 7/Wheezy and qt is 4.8. The strangest build requirement for bitcoin on Debian 7 is that we have to forward port/pull db4.8 from Debian 6/Squeeze/oldstable as db5.1 is the default on 7.
  41. wtogami commented at 2:12 am on April 15, 2014: contributor
    @quel This statement of fact has nothing to do with the gitian binary.
  42. quel commented at 2:19 am on April 15, 2014: none
    @wtogami I will test the gitian binary on Debian 6 and 7 if I can ever pull it. I’m averaging 15KB/s.
  43. quel commented at 3:02 am on April 15, 2014: none
    @wtogami https://download.visucore.com/bitcoin/bitcoin-f17e2c7-linux.tar.gz with gpg signature verified on Deiban 7.4, x86_64, running bin/64 for bitcoind, bitcoin-cli, and bitcoin-qt run for me without errors. That tar.gz appears to be dated 2014-04-10 08:22:55. Additionally, on a 64-bit platform running the 32-bit binaries works without issue as well. Besides click testing and basic cli testing, it is in no way exhaustive. I am happy to attach ldds, logs, or test additional builds.
  44. wtogami commented at 9:15 pm on April 17, 2014: contributor
    @theuni Anything else in particular do you think needs testing before merge?
  45. theuni commented at 0:00 am on April 18, 2014: member

    @wtogami For the most part, I think these are pretty much all or nothing. If bitcoind starts up successfully and runs for a few seconds, I don’t believe any problems would be likely to surface afterward.

    If anyone is uncomfortable with that response, I can look into adding some tests, but they’re low level enough that the tests would be something like:

    0std::list<int> foo, bar;
    1foo.push_back(1); foo.push_back(2); foo.pop_back();
    2bar.push_back(1); bar.push_back(2); bar.pop_back();
    3assert(foo == bar);
    

    Point being that if that failed, it’d be pretty obvious even without the test.

    Essentially, we just need to decide if this is the favored approach, and if so, set some sort of policy to deal with changes in the future.

  46. laanwj commented at 7:23 am on April 18, 2014: member

    @theuni It would be nice to have at least tests that (indirectly) make use of the more complex functions in the compat/ files, like those in list_node and ctype::_M_widen_init(). I don’t care so much about all the errors, exceptions variants.

    As for policy, we want to stay compatible with:

    • GLIBC_2.13+
    • GLIBCXX_3.4+

    Right? Let’s write these things down and put them in some document under doc/…

  47. theuni commented at 8:22 am on April 18, 2014: member

    @laanwj See my comment above about the low-level-ness of these functions. The example above wasn’t abstract, that would be the actual test-case for list_node.

    As for _M_widen_init(), this is all it takes is this to bring it in:

    0#include <iostream>
    1int main()
    2{
    3  for(int i=0; i != 2; i++)
    4    std::cout << std::endl;
    5}
    

    You can verify that by doing a quick compile and grep:

    0cory@cory-i7:~/dev/bitcoin/src(libc-compat)$ g++ test.cpp -O1 -o test && objdump -TC test | grep _M_widen_init
    10000000000000000      DF *UND*  0000000000000000  GLIBCXX_3.4.11 std::ctype<char>::_M_widen_init() const
    

    So cout (among many other things) would be broken with a borked version of that function. I’m not sure there’s really anything to test for that wouldn’t be 100% obvious in the resulting binary.

    I’m not trying to be difficult on this, I hope it doesn’t seem that way. But since we’ve lifted the definitions of these from upstream, it really does seem (to me, anyway) to be all or nothing.

    Any suggestions for alternate tests, or alternative ways of testing?

  48. laanwj commented at 9:56 am on April 18, 2014: member
    Low level is fine. If no difficult contortions are needed to bring the symbols in and test them, then that’s all the better. Let’s add those two examples that you give as sanity tests (maybe use std::iostream instead of std::cout so that no standard output is generated).
  49. wtogami commented at 9:11 pm on April 18, 2014: contributor
    Lucid is glibc-2.11 RHEL6 is glibc-2.12
  50. sipa commented at 9:46 pm on April 18, 2014: member

    I wonder whether we want something like a start-up test at runtime, rather than unit tests (which wouldn’t catch being loaded with an incompatible dynamic library).

    It could also do something like a basic (and very short) memory test, or computing some EC identity to check verification correctness. On the other hand, systems that are sufficiently broken to be detected using a subsecond test likely wouldn’t even boot… @theuni How much work/code do you think supporting 2.11 or 2.12 would be extra? Seems it would be very valuable if this can be extended, but of course… extra work and extra code to maintained.

  51. theuni commented at 9:52 pm on April 18, 2014: member

    @sipa: This is compatible back to glibc 2.9 and libstdc++ 3.4. Qt is a different story though, I’m afraid, see the discussion above.

    I agree that a startup check would be better. I still can’t really wrap my head around what a unit-test would actually be able to verify.

    Basically every operation at startup would depend on these in some way, so I agree that it likely wouldn’t get as far as running anyway. But it couldn’t hurt to do a few quick A==A checks explicitly before anything else, rather than possibly racing with sensitive code.

  52. laanwj commented at 6:57 am on April 19, 2014: member

    @sipa Good idea on a start-up sanity check. @theuni Ok let’s make the policy then:

    • GLIBC_2.11+
    • GLIBCXX_3.4+

    We could even add a script that automatically checks the post-gitian executables if ‘forbidden’ symbols have been used, hence the compat/ wrappers needs updating.

    I don’t care about Qt there but only bitcoind/bitcoin-cli. The older stable distributions are almost exclusively used on headless servers.

  53. gmaxwell commented at 7:43 am on April 22, 2014: contributor
    @sipa ACK on the startup built in self-tests, the fact that e.g. on current fedora the fact that secp256k1 isn’t supported in openssl is only ‘detected’ by a failed assertion deep inside key.cpp when other things run is somewhat concerning to me… but doesn’t need to be part of this pull.
  54. wtogami commented at 2:22 pm on April 22, 2014: contributor

    +1 to doing sanity checks as the next PR. This current PR is ready for merge.

    ACK

  55. laanwj merged this on Apr 22, 2014
  56. laanwj closed this on Apr 22, 2014

  57. laanwj referenced this in commit bbe53f61db on Apr 22, 2014
  58. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-06 16:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me