Avoid use of “socket” syscall when formatting IP addresses in CNetAddr::ToString #21466

issue practicalswift openend this issue on March 18, 2021
  1. practicalswift commented at 11:00 am on March 18, 2021: contributor

    When fuzzing our codebase in a restricted syscall setup I noticed that calling CNetAddr::ToString triggers the use of the socket syscall. The syscall is made indirectly via getnameinfo.

    AFAICT getnameinfo is used in our code only to format IPv6 addresses in their “shortened” form. I think our formatting only use case should be possible to fully cover without involving the kernel via socket :)

    Not a big deal of course, but it would be nice if kept the direct and indirect use of networking related syscalls such as socket to functions where they cannot be avoided. That would make reasoning about syscall restrictions and/or application-vs-kernel boundaries somewhat easier.

    The call chain is: CNetAddr::ToString > CNetAddr::ToStringIP > getnameinfo > gni_host > gni_host_inet > gni_host_inet_numeric > if_indextoname > __opensock > socket.

    Live demo:

     0$ FUZZ=netaddress src/test/fuzz/fuzz
     1
     2ERROR: The syscall "socket" (syscall number 41) is not allowed by the syscall sandbox in thread "test". Please report. Exiting.
     3terminate called without an active exception
     4==13417== ERROR: libFuzzer: deadly signal
     5    [#0](/bitcoin-bitcoin/0/) 0x5581be296819 in __sanitizer_print_stack_trace compiler-rt/lib/ubsan/ubsan_diag_standalone.cpp:33
     6    [#1](/bitcoin-bitcoin/1/) 0x5581be22e9f8 in fuzzer::PrintStackTrace() compiler-rt/lib/fuzzer/FuzzerUtil.cpp:210                                                                                                                                                                                        [#2](/bitcoin-bitcoin/2/) 0x5581be15a14a in fuzzer::Fuzzer::CrashCallback() (.part.48) compiler-rt/lib/fuzzer/FuzzerLoop.cpp:233
     7    [#3](/bitcoin-bitcoin/3/) 0x5581be20ab47 in fuzzer::Fuzzer::CrashCallback() compiler-rt/lib/fuzzer/FuzzerLoop.cpp:205
     8    [#4](/bitcoin-bitcoin/4/) 0x5581be20ab47 in fuzzer::Fuzzer::StaticCrashSignalCallback() compiler-rt/lib/fuzzer/FuzzerLoop.cpp:204
     9    [#5](/bitcoin-bitcoin/5/) 0x7f3139b2089f  (/lib/x86_64-linux-gnu/libpthread.so.0+0x1289f)
    10    [#6](/bitcoin-bitcoin/6/) 0x7f3138526f46 in __libc_signal_restore_set /build/glibc-2ORdQG/glibc-2.27/signal/../sysdeps/unix/sysv/linux/nptl-signals.h:80
    11    [#7](/bitcoin-bitcoin/7/) 0x7f3138526f46 in raise /build/glibc-2ORdQG/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:48
    12    [#8](/bitcoin-bitcoin/8/) 0x7f31385288b0 in abort /build/glibc-2ORdQG/glibc-2.27/stdlib/abort.c:79
    13    [#9](/bitcoin-bitcoin/9/) 0x7f3139db9956  (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x8c956)
    14    [#10](/bitcoin-bitcoin/10/) 0x7f3139dbfae5  (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x92ae5)
    15    [#11](/bitcoin-bitcoin/11/) 0x7f3139dbfb20 in std::terminate() (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x92b20)
    16    [#12](/bitcoin-bitcoin/12/) 0x5581bec4e6f7 in (anonymous namespace)::SyscallSandboxDebugSignalHandler(int, siginfo_t*, void*) src/util/syscall_sandbox.cpp:71:5
    17    [#13](/bitcoin-bitcoin/13/) 0x7f3139b2089f  (/lib/x86_64-linux-gnu/libpthread.so.0+0x1289f)
    18    [#14](/bitcoin-bitcoin/14/) 0x7f313860b076 in socket /build/glibc-2ORdQG/glibc-2.27/socket/../sysdeps/unix/syscall-template.S:78
    19    [#15](/bitcoin-bitcoin/15/) 0x7f313860b2b0 in __opensock /build/glibc-2ORdQG/glibc-2.27/socket/../sysdeps/unix/sysv/linux/opensock.c:100
    20    [#16](/bitcoin-bitcoin/16/) 0x7f3138626fd3 in if_indextoname /build/glibc-2ORdQG/glibc-2.27/inet/../sysdeps/unix/sysv/linux/if_index.c:226
    21    [#17](/bitcoin-bitcoin/17/) 0x7f3138626abd in gni_host_inet_numeric /build/glibc-2ORdQG/glibc-2.27/inet/getnameinfo.c:354
    22    [#18](/bitcoin-bitcoin/18/) 0x7f3138626abd in gni_host_inet /build/glibc-2ORdQG/glibc-2.27/inet/getnameinfo.c:389
    23    [#19](/bitcoin-bitcoin/19/) 0x7f3138626abd in gni_host /build/glibc-2ORdQG/glibc-2.27/inet/getnameinfo.c:422
    24    [#20](/bitcoin-bitcoin/20/) 0x7f3138626abd in getnameinfo /build/glibc-2ORdQG/glibc-2.27/inet/getnameinfo.c:539
    25    [#21](/bitcoin-bitcoin/21/) 0x5581beab72e3 in CNetAddr::ToStringIP[abi:cxx11]() const src/netaddress.cpp:580:18
    26
    
  2. practicalswift renamed this:
    Avoid use of "socket" syscall when formatting IP-addresses in CNetAddr::ToStringIP
    Avoid use of "socket" syscall when formatting IP addresses in CNetAddr::ToStringIP
    on Mar 18, 2021
  3. practicalswift renamed this:
    Avoid use of "socket" syscall when formatting IP addresses in CNetAddr::ToStringIP
    Avoid use of "socket" syscall when formatting IP addresses in CNetAddr::ToString
    on Mar 18, 2021
  4. vasild commented at 3:01 pm on March 19, 2021: member

    Concept ACK (to format the IPv6 addresses without calling (indirectly) socket()).

    Would inet_ntop() also call socket()?

  5. practicalswift commented at 3:55 pm on March 19, 2021: contributor

    @vasild

    I think we should make our function IPv6ToString do proper zero compression (should be doable in say ~25 LOC) and then use that from CNetAddr::ToStringIP.

    This is the patch I’m thinking at the call site:

     0 std::string CNetAddr::ToStringIP() const
     1 {
     2     switch (m_net) {
     3     case NET_IPV4:
     4+        return IPv4ToString(m_addr);
     5     case NET_IPV6: {
     6-        CService serv(*this, 0);
     7-        struct sockaddr_storage sockaddr;
     8-        socklen_t socklen = sizeof(sockaddr);
     9-        if (serv.GetSockAddr((struct sockaddr*)&sockaddr, &socklen)) {
    10-            char name[1025] = "";
    11-            if (!getnameinfo((const struct sockaddr*)&sockaddr, socklen, name,
    12-                             sizeof(name), nullptr, 0, NI_NUMERICHOST))
    13-                return std::string(name);
    14-        }
    15-        if (m_net == NET_IPV4) {
    16-            return strprintf("%u.%u.%u.%u", m_addr[0], m_addr[1], m_addr[2], m_addr[3]);
    17-        }
    18         return IPv6ToString(m_addr);
    19     }
    

    I find that much easier to reason about :)

  6. vasild commented at 2:03 pm on March 21, 2021: member

    I am trying to see if inet_ntop() would also call socket(), however I can’t get even getnameinfo() to call socket() (with Debian GLIBC 2.24-11+deb9u4).

     0#include <iostream>
     1#include <netdb.h>
     2#include <netinet/in.h>
     3#include <stdio.h>
     4#include <sys/socket.h>
     5#include <sys/types.h>
     6
     7int
     8main(int argc, char** argv)
     9{
    10  sockaddr_in6 a;
    11  a.sin6_scope_id = 0;
    12  a.sin6_addr = in6_addr({ { { 5,6,7,88,123,0,0,1,0,9,0,0,20,0,0,1 } } });
    13  a.sin6_family = AF_INET6;
    14  a.sin6_port = htons(8333);
    15  char name[NI_MAXHOST];
    16  int ret = getnameinfo((const struct sockaddr*)&a, sizeof(a), name, sizeof(name), nullptr, 0,
    17      NI_NUMERICHOST);
    18  if (ret == 0) { 
    19    std::cout << name << std::endl;
    20  } else { 
    21    std::cout << gai_strerror(ret) << std::endl;
    22  } 
    23  return 0;
    24}
    
  7. MarcoFalke referenced this in commit 97e6a7f98c on Mar 29, 2021
  8. practicalswift commented at 10:53 am on March 29, 2021: contributor

    The IPv6 zero compression tests in #21477 have now been merged.

    With these test cases to test against it should now be fairly trivial to a.) make IPv6ToString do proper zero compression when building the std::string representation of the IPv6 address, and b.) make use of IPv6ToString from CNetAddr::ToStringIP.

    Zero compression might sound fancy but it is really simple:

    • Find longest sequence of consecutive all-zero fields. Use first zero sequence if two or more zero sequences of equal length are found.
    • Replace the longest sequence of consecutive all-zero fields with two colons (::).

    In other words 2001:db8:0:0:1:0:0:1 uncompressed becomes 2001:db8::1:0:0:1 compressed. More test cases can be found in #21477.

    The implementation of zero compression in Rust’s std::net::Ipv6Addr might serve as an inspiration:

    https://github.com/rust-lang/rust/blob/cc4103089f40a163f6d143f06359cba7043da29b/library/std/src/net/ip.rs#L1635-L1683

    Anyone who wants to try to take on the task of making IPv6ToString do proper IPv6 zero compression? :)

    This is our current implementation which doesn’t do zero compression:

    https://github.com/bitcoin/bitcoin/blob/b9f41df1ead4b6a83a51fc41966b111c8459c313/src/netaddress.cpp#L554-L568

  9. laanwj commented at 11:10 am on March 29, 2021: member

    Good catch, and I agree, conceptually something cosmetic like formatting a IPv6 address should not need any kind of interaction with DNS services, not even with the operating system.

    Zero compression might sound fancy but it is really simple:

    “Good first issue” material maybe?

  10. laanwj added the label P2P on Mar 29, 2021
  11. sidhujag referenced this in commit 561ede4d32 on Mar 29, 2021
  12. vasild commented at 3:59 pm on March 29, 2021: member

    If inet_ntop() does the formatting without calling socket(), then there would not be need to re-invent the wheel?

    I did some testing in which I did not see either one of inet_ntop() or getnameinfo() calling socket().

  13. practicalswift commented at 7:20 pm on March 30, 2021: contributor

    @vasild

    The goal here is to avoid unexpected networking syscalls to allow for syscall sandboxing: see sandboxing PRs #20487 and #21538 for context.

    More specifically we want to be able to reason about syscall usage to be able to say things like: “this non-networking thread is not expected to make use of any networking related syscalls”.

    One advantage of doing the formatting in pure C++ is that such a solution would be absolutely trivial to review from a syscall usage perspective. In other words: as reviewer I would be able to reason with certainty about syscall usage simply by reading the patch.

    One disadvantage of relying on say inet_ntop is that such a solution would not be as trivial to review from a syscall usage perspective. Note that the set of syscalls used by inet_ntop may vary between implementations, between configurations/systems and also between versions of the same implementation. In other words: as reviewer I would not be able to reason with certainty about syscall usage simply by reading the patch.

    And as laanwj notes: “[…] conceptually something cosmetic like formatting a IPv6 address should not need any kind of interaction with DNS services, not even with the operating system.”

    That’s why I personally prefer a pure C++ solution in this case: the benefit of being able to reason with certainty about syscall usage is worth the cost (an extra ~25 LOC).

  14. practicalswift commented at 7:34 pm on April 29, 2021: contributor
    Suggested fix in #21756 :)
  15. laanwj closed this on May 17, 2021

  16. PastaPastaPasta referenced this in commit 4a11855818 on Jun 27, 2021
  17. PastaPastaPasta referenced this in commit 7d36737848 on Jun 28, 2021
  18. PastaPastaPasta referenced this in commit 7a8e2977da on Jun 29, 2021
  19. DrahtBot locked this on Aug 18, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-04-03 21:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me