Better error and log messages #218
pull ryanofsky wants to merge 2 commits into bitcoin-core:master from ryanofsky:pr/errs changing 5 files +83 −4-
ryanofsky commented at 9:45 pm on September 24, 2025: collaboratorThis PR contains some logging and error message improvements I made locally in recent months that seem useful to upstream. Details are in commit messages
-
DrahtBot commented at 9:45 pm on September 24, 2025: none
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Reviews
See the guideline for information on the review process.
Type Reviewers Concept ACK ismaelsadeeq Stale ACK theuni If your review is incorrectly listed, please copy-paste <!–meta-tag:bot-skip–> into the comment that the bot should ignore.
Conflicts
Reviewers, this pull request conflicts with the following ones:
- #240 (Avoid errors from asynchronous (non-c++) clients by ryanofsky)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
-
ryanofsky force-pushed on Sep 25, 2025
-
ryanofsky commented at 1:50 pm on September 25, 2025: collaboratorUpdated 395e3266b5e29a3241e8ea6c5e4024971c21dd39 -> a5b8a2a3293a9650feab1101afa35a1d47981049 (
pr/errs.1->pr/errs.2, compare) to fix iwyu errors (https://github.com/bitcoin-core/libmultiprocess/actions/runs/17990500563) -
ismaelsadeeq commented at 10:26 am on September 26, 2025: memberConcept ACK
-
ryanofsky force-pushed on Sep 26, 2025
-
in include/mp/util.h:261 in 0ce75e3162 outdated
256+inline std::string _demangle(const char* m) { return m; } 257+#endif 258+ 259+template<class T> 260+std::string TypeName(const T& /*unused*/) 261+{
theuni commented at 6:59 pm on October 1, 2025:Since this is only done in one place now, maybe wrap this in (c++20 standard)#ifdef __cpp_rtti? I checked quickly, and gcc and clang treat that define as-expected with or with or without-fno-rtti.
ryanofsky commented at 7:55 pm on October 2, 2025:re: #218 (review)
Sure, added
#ifdef __cpp_rtti, though have not tested without rtti myself.in include/mp/util.h:247 in 0ce75e3162
243@@ -238,6 +244,24 @@ inline char* CharCast(unsigned char* c) { return (char*)c; } 244 inline const char* CharCast(const char* c) { return c; } 245 inline const char* CharCast(const unsigned char* c) { return (const char*)c; } 246 247+#if defined(__GNUG__) // GCC & Clang ─ use <cxxabi.h> to demangle
theuni commented at 7:16 pm on October 1, 2025:I don’t love this without an escape-hatch for Frankenstein compilers like
clang-cl. Something like#if defined(__GNUG__) && !defined(MP_NO_DEMANGLE)would make it more bulletproof.That can always be added if it ever becomes a real problem though.
ryanofsky commented at 7:53 pm on October 2, 2025:re: #218 (review)
Thanks, replaced this with
__has_include(<cxxabi.h>)which I think should be more on-target than__GNUG__. I’d hope for something like this, the compiler could provide enough information to detect the feature without needing to involve cmake, but cmake is available if it turns out to be necessary.theuni commented at 7:22 pm on October 1, 2025: contributorcode review ACK 0ce75e3162bcc605757cd36cb3b90541d7a13a88.
The logging additions look helpful, though they’ll need to be updated to use the new macros from #220.
I’d rather hold off on that second commit until after v30 has settled though, since it introduces some platform-specific functionality that has the (admittedly small) potential for causing build problems.
21b2f653f4logging: Add better logging on IPC server-side failures
Motivation: when trying to add a new unit test ran into a confusing error "(remote):0: failed: remote exception: Called null capability" caused by forgetting to make a FooInterface.initThreadMap call during initialization and these logging prints would have made it easier to see where the error was coming from and debug.
debug: Add TypeName() function and log statements for Proxy objects being created and destroyed 2af6a9bc30DrahtBot added the label Needs rebase on Oct 2, 2025ryanofsky force-pushed on Oct 2, 2025ryanofsky commented at 7:58 pm on October 2, 2025: collaboratorRebased 0ce75e3162bcc605757cd36cb3b90541d7a13a88 -> 7283a2b3d3b9acbc1b3939ef054f1d1537cdfebd (
pr/errs.3->pr/errs.4, compare) to fix conflict with #220, and also applied review suggestions, and renamedTypeNametoCxxTypenamesince there is actually already anotherTypeNamefunction for cap’nproto typesUpdated 7283a2b3d3b9acbc1b3939ef054f1d1537cdfebd -> 2af6a9bc3036895ddddbd80d1f7789bec92a6516 (
pr/errs.4->pr/errs.5, compare) fixing typo found by drahtbotryanofsky force-pushed on Oct 2, 2025DrahtBot removed the label Needs rebase on Oct 2, 2025ryanofsky commented at 9:12 am on October 3, 2025: collaboratorgnu32 CI job is failing but I think it is not directly related to changes in this PR. The new thread busy test introduced in #214 is segfaulting here but it has a known race condition fixed in #222 . The segfault is consistent with the race and might just be happening now because new log prints in this PR change the timing of the test, and fact that bugs tend to trigger more segfaults in the gnu32 job because of the smaller memory space.
Will cherry-pick the race fix from #222 as a workaround to see if it fixes the segfault. Assuming this works, if reviewing this PR, would be nice to just ACK first two commits (2af6a9bc3036895ddddbd80d1f7789bec92a6516) and ignore the workaround commit so it can be dropped later without needing to rebase or re-review.
Added 1 commits 2af6a9bc3036895ddddbd80d1f7789bec92a6516 -> 0fc31ec036091fcc5e62372fade78faa80e83d0f (
pr/errs.5->pr/errs.6, compare) adding attempted workaround for CI failureryanofsky commented at 4:06 pm on October 3, 2025: collaboratorReally struggling to debug the gnu32 segfault in the “thread busy” test. Failed to get a stack trace from core dump in CI, but I was able to make failure happen locally compiling with gnu32 and asan, though failure seems to be just a normal segfault and there aren’t asan errors. GDB shows exception happening in:
serverInvokeproxy-types.h:734kj::str<char const (&) [16], kj::Exception&>kj/string.h:627kj::toCharSequence<kj::Exception&>kj/string.h:610kj::operator*kj/exception.c++:779
In serverInvoke, the
kj::Exception&object is valid and holds the “thread busy” error, but after the argument is forwarded tokj::operator*, thekj::Exception&argument somehow points to a different address with garbage data and the result is a segfault. I have no idea how this is happening because the two functions in between are just forwarding the reference argument, and they are hard to inspect with GDB because they are varargs functions.The failure seems very strange, but the segfault is happening in the line of code in
serverInvokethat this PR is changing, so I don’t think it is spurious. Will have to continue debugging. I hope it is not just something stupid in the gnu32 CI job setup.in include/mp/proxy-types.h:729 in 21b2f653f4 outdated
725@@ -726,6 +726,13 @@ kj::Promise<void> serverInvoke(Server& server, CallContext& call_context, Fn fn) 726 MP_LOG(*server.m_context.loop, Log::Debug) << "IPC server send response #" << req << " " << TypeName<Results>(); 727 MP_LOG(*server.m_context.loop, Log::Trace) << "response data: " 728 << LogEscape(call_context.getResults().toString(), server.m_context.loop->m_log_opts.max_chars); 729+ }, [&server, req](::kj::Exception&& e) {
theuni commented at 10:15 pm on October 3, 2025:isn’t the local
server_capturebeing captured by reference problematic ifinvokeis being called async? Possible that’s smashing the stack here?(Maybe I’m way off, these tangled callbacks a breaking my brain).
But I think I’d expect this to be
return ReplaceVoid([server_context = std::move(server_context)]()...
ryanofsky commented at 12:26 pm on October 4, 2025:I should double-check all the lifetimes here, but I think the problem in the gnu32 job is weirder than a lifetime issue, because in gdb I can inspect the
kj::Exception&& evariable here and it is a valid object containing the “thread busy” error and full context information. But then two calls deeper in the call stack, the stringify function which is supposed to print the exception, somehow receives a garbage reference to akj::Exceptionwith a completely different address. So it seems like there is an ABI issue or compiler bug exposed by this change, but I am not sure.On
servervsserver_context, theserverobject is actually the one with the longer lifetime. When you declare aninterface MyInterface { myMethod [@0](/bitcoin-core-multiprocess/contributor/0/) () -> (); }capnproto generates C++ class like:0class MyInterface::Server { 1 virtual Promise<void> myMethod(MyMethodContext) = 0; 2}which you inherit from to implement all the methods. The
servervariable is a reference to the object implementing these methods, and there is also aMyInterface::Clientclass generated to call the methods. The lifetime of the server object is completely controlled by Cap’n Proto. It will keep the server around until nothing is referencing it, so it will be alive as long as any client on a live connection exists, and as long as any IPC call is in progress, even if the client is released or disconnected.By contrast
server_contextis just a local variable in theserverInvokefunction, so you would need to copy or move it like you suggested to use it in a promise.then() handler but it is fine to use by reference inReplaceVoidlambdas because of the wayReplaceVoidis defined, where it just calls them right away before returning. I try to use convention of using[&]and implicit captures for any lambda that will be called right away before the statement returns, and using explicit captures for lambdas that are delayed callbacks.
maflcko commented at 12:12 pm on October 6, 2025:So it seems like there is an ABI issue or compiler bug exposed by this change, but I am not sure.
I haven’t looked at anything here, but I presume capnp is compiled with clang in the CI and libmultiprocess with GCC, so it just reminds me of https://github.com/bitcoin/bitcoin/issues/31772. But I haven’t checked any of this.
DrahtBot added the label Needs rebase on Oct 7, 2025ryanofsky force-pushed on Jan 9, 2026DrahtBot removed the label Needs rebase on Jan 9, 2026ryanofsky referenced this in commit ad8e5f93b0 on Feb 12, 2026Sjors referenced this in commit 3aa343cf37 on Feb 12, 2026Sjors commented at 5:45 pm on February 12, 2026: memberSince you mentioned this in https://github.com/bitcoin/bitcoin/pull/34568, I should probably review this. Can you give it a rebase?ryanofsky commented at 6:07 pm on February 12, 2026: collaboratorSince you mentioned this in bitcoin/bitcoin#34568, I should probably review this. Can you give it a rebase?
I think this PR effectively depends on #240, because there is a problem with the “thread busy” exception test in gnu32 CI job that I can’t figure out or reliably reproduce, and #240 rewrites thread busy behavior to avoid the exceptions that seem to trigger the problem.
I could rebase this on #240, but if you have time to review #240 it would be very helpful and I am happy to explain anything there.
ryanofsky referenced this in commit c2dc14ab53 on Feb 12, 2026ryanofsky referenced this in commit f7ca042b6a on Feb 12, 2026
github-metadata-mirror
This is a metadata mirror of the GitHub repository bitcoin-core/libmultiprocess. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-02-17 00:30 UTC
This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me