Structured logging output #35369

ajtowns commented at 3:57 AM on May 25, 2026: contributor

At the recent coredev it was suggested that it would be helpful to have logging output be more structured. This would go further than #34374 in that not only the internal metadata (time, category, level, source location, thread name) would be provided in a structured way, but so would per-message data, so that eg:

2026-05-24T01:56:29.650279Z [net] got inv: wtx 0629cda06a87a145a229417ba49b564779c7ddafd3cd4c1b388e99d0a059b845 have peer=58556

might instead be reported (perhaps via a -logjson=1 option) as

{"t": 1779587789650279, "c": "net", "l": "debug", "m": "got inv",
 "p": {
    "wtx": "0629cda06a87a145a229417ba49b564779c7ddafd3cd4c1b388e99d0a059b845",
    "have": 1,
    "peer": 58556
  }
}

Internally, that might be represented by adding std::vector<KV> kvs; to util::log::Entry, instead of putting the data in the Entry::message field, eg.

Streaming structured log messages like these over zmq, or storing them directly to a database might also be plausible and enable interesting features.

On the consumer side, I think this is relatively straightforward: include a vector<KV> in util::log::Entry where KV has key/value strings. The only complication is if you'd like to be able to distinguish values that are meant to be numbers, and could be represented in json as "foo": 1.23, in which case adding an "intended type" enum could be worthwhile.

On the producer side, I think there are a few options for how we could make this work. For example:

// current code for comparison:
LogDebug(BCLog::NET, "got inv: %s %s peer=%d", inv.ToString(), fAlreadyHave ? "have" : "new", pfrom.GetId());

// parse printf-style format strings ourselves:
LogDebug(BCLog::NET, "got inv wtx=%s have=%d peer=%d", inv.ToString(), fAlreadyHave, pfrom.GetId());

// switch to std::format syntax but parse ourselves:
LogDebug(BCLog::NET, "got inv wtx={} have={:d} peer={:d}", inv.ToString(), fAlreadyHave, pfrom.GetId());

// custom syntax for providing parameters based on user-defined literals:
LogDebug(BCLog::NET, "got inv", "wtx"_LP(inv.ToString()), "have"_LP(fAlreadyHave), "peer"_LP(pfrom.GetId()));

// custom syntax for providing parameters based on macros:
LogDebug(BCLog::NET, "got inv", LP("wtx", inv.ToString()), LP("have", fAlreadyHave), LP("peer", pfrom.GetId()));

ajtowns added the label Feature on May 25, 2026

ajtowns added the label Brainstorming on May 25, 2026

ajtowns commented at 3:59 AM on May 25, 2026: contributor

My opinions, fwiw:

I think the std::format approach would probably be a pain, since it would suggest that wtx={1} have={0} should allow switching parameters around which would probably be annoying to support, and it would also require changing every format string, for not terribly much benefit as far as I can see.

The two LP approaches are nice in that they put the key and value together making it harder to create mismatches, but are also require big changes to every log call, and seem a bit ugly to me, particularly when adding formatting to them. So I lean towards the "parse printf-style format strings ourselves" option.

0xB10C commented at 10:41 AM on May 25, 2026: contributor

Thanks for opening this for discussion! I think this is an interesting area to explore as it's a human-friendly (some humans can read JSON) machine-to-machine interface for internal node events.

Linking to a previous discussion and experimentation about parsing logs with e.g. a machine learning algorithm called Drain OR Python scripts to extract the possible log lines with their arguments from source code with e.g. clang.cindex: https://github.com/peer-observer/peer-observer/issues/336. The obvious alternative to manual parsing of log lines would be to have structured logging.

In https://github.com/peer-observer/peer-observer/issues/336#issuecomment-3745638203, I've been using the parameter names as argument names. It would require to do stringification, which I don't know if it's support everywhere. For your got inv it would be:

{"t": 1779587789650279, "c": "net", "l": "debug", "m": "got inv",
 "p": {
    "inv.ToString()": "0629cda06a87a145a229417ba49b564779c7ddafd3cd4c1b388e99d0a059b845",
    "fAlreadyHave?\"have\":\"new\"": 1,
    "pfrom.GetId()": 58556
  }
}

These keys are obviously a bit weird and might change over time as variables and functions are renamed. One solution could be to assign them to more readable names and then pass these. That's more boilerplate code. Could look something like this.

- LogDebug(BCLog::NET, "got inv: %s %s peer=%d", inv.ToString(), fAlreadyHave ? "have" : "new", pfrom.GetId()); 
+ const std::string wtx{inv.ToString()};
+ const bool have{fAlreadyHave};
+ const uint64_t peer{pfrom.GetId()};
+ LogDebug(BCLog::NET, "got inv: %s %s peer=%d", wtx, have, peer);

resulting in

{"t": 1779587789650279, "c": "net", "l": "debug", "m": "got inv",
 "p": {
    "wtx": "0629cda06a87a145a229417ba49b564779c7ddafd3cd4c1b388e99d0a059b845",
    "have": 1,
    "peer": 58556
  }
}

ajtowns commented at 6:36 AM on May 26, 2026: contributor

So I lean towards the "parse printf-style format strings ourselves" option.

Had a go at making this work, and it looks pretty plausible to me: https://github.com/ajtowns/bitcoin/commits/202605-log-structured/

$ build/bin/bitcoind -debug=net -logjson
...
{"t":1779777196071951,"c":"net","l":"debug","m":"received: pong (8 bytes)","k":{"peer":3}}
{"t":1779777196072119,"c":"net","l":"debug","m":"received: headers (82 bytes)","k":{"peer":3}}
{"t":1779777196072227,"c":"net","l":"debug","m":"sending sendheaders (0 bytes)","k":{"peer":3}}
{"t":1779777196072471,"c":"net","l":"debug","m":"received: sendheaders (0 bytes)","k":{"peer":3}}
...

EDIT: or, after updating the code:

-    LogDebug(BCLog::NET, "received: %s (%u bytes) peer=%d\n", SanitizeString(msg_type), vRecv.size(), pfrom.GetId());
+    LogDebug(BCLog::NET, "received msg=%s bytes=%u peer=%d\n", SanitizeString(msg_type), vRecv.size(), pfrom.GetId());```

{"t":1779782499995176,"c":"net","l":"debug","m":"received","k":{"msg":"pong","bytes":8,"peer":3}}
{"t":1779782499995338,"c":"net","l":"debug","m":"received","k":{"msg":"headers","bytes":82,"peer":3}}
{"t":1779782499995406,"c":"net","l":"debug","m":"sending sendheaders (0 bytes)","k":{"peer":3}}
{"t":1779782499995557,"c":"net","l":"debug","m":"received","k":{"msg":"sendheaders","bytes":0,"peer":3}}

m4ycon commented at 7:56 PM on May 28, 2026: none

Hello, if I may contribute to this discussion, the first time that I saw this "structured logging" concept it was in this post, where it discuss this in a web server context. One idea from it that I find interesting, is the use of a context per request that accumulates information (that usually would be on separate logs), so then it logs only one thing at the end of the request with all the accumulated information. So when someone is debugging it, they still have all the information to "redo" the code path but with a single, compact log -- you can also see that as an event.

I don't know how well this idea fits into core code, I suppose p2p messages could be related to that, as could rpc requests and similar ones. Probably this could change many things related to logs that seem to be outside the scope of this issue. But anyway, it would be cool to see something similar here.

Also, if there's anything that I can help with this issue, I'd love to -- maybe not necessarily with a PR/commit since you guys have a greater understanding of cpp and more specifically about core (and it seems aj already have some implementation of it). But anything else you guys have in mind I could investigate it further (~~and maybe save you some time idk~~).

As I'm working with log-extractor from peer-observer, this might affect/help us in some way.

hulxv commented at 7:59 PM on May 29, 2026: none

Thanks for opening this for discussion! I think this is an interesting area to explore as it's a human-friendly (some humans can read JSON) machine-to-machine interface for internal node events.

Linking to a previous discussion and experimentation about parsing logs with e.g. a machine learning algorithm called Drain OR Python scripts to extract the possible log lines with their arguments from source code with e.g. clang.cindex: peer-observer/peer-observer#336. The obvious alternative to manual parsing of log lines would be to have structured logging.

In peer-observer/peer-observer#336 (comment), I've been using the parameter names as argument names. It would require to do stringification, which I don't know if it's support everywhere. For your got inv it would be:

{"t": 1779587789650279, "c": "net", "l": "debug", "m": "got inv", "p": { "inv.ToString()": "0629cda06a87a145a229417ba49b564779c7ddafd3cd4c1b388e99d0a059b845", "fAlreadyHave?"have":"new"": 1, "pfrom.GetId()": 58556 } }

These keys are obviously a bit weird and might change over time as variables and functions are renamed. One solution could be to assign them to more readable names and then pass these. That's more boilerplate code. Could look something like this.

LogDebug(BCLog::NET, "got inv: %s %s peer=%d", inv.ToString(), fAlreadyHave ? "have" : "new", pfrom.GetId());

const std::string wtx{inv.ToString()};

const bool have{fAlreadyHave};

const uint64_t peer{pfrom.GetId()};

LogDebug(BCLog::NET, "got inv: %s %s peer=%d", wtx, have, peer);

resulting in

{"t": 1779587789650279, "c": "net", "l": "debug", "m": "got inv", "p": { "wtx": "0629cda06a87a145a229417ba49b564779c7ddafd3cd4c1b388e99d0a059b845", "have": 1, "peer": 58556 } }

we can make this approach simpler and provide a macro for naming the parameters instead of using temp variables for them. also instead of this meaningless m, we can embed the full message after formatting

something like that:

LogDebug(BCLog::NET, "got inv: %s %s peer=%d", 
         ITEM("inv", inv.ToString()), 
         ITEM("have", fAlreadyHave ? "have" : "new"), 
         pfrom.GetId());

output:

{
  "t": 1779587789650279,
  "c": "net",
  "m": "got inv: 0629cda06a87a145a229417ba49b564779c7ddafd3cd4c1b388e99d0a059b845 have peer=58556",
  "p": {
    "inv": "0629cda06a87a145a229417ba49b564779c7ddafd3cd4c1b388e99d0a059b845",
    "have": "have",
    "pfrom.GetId()": 58556
  }
}

This approach is insipred by tracing-rs crate in Rust:

debug!(
    target: "net",
    inv = %inv.to_string(),
    have = if f_already_have { "have" } else { "new" },
    peer = pfrom.get_id(),
    "got inv: {} {} peer={}", 
    %inv.to_string(), 
    if f_already_have { "have" } else { "new" }, 
    pfrom.get_id()
);