RFC: ArgsManager type and range checking

ryanofsky commented at 1:42 am on September 15, 2021: contributor

This issue exists to explore ways of improving the ArgsManager interface to be more convenient and add compile-time type safety.

#16545 adds runtime type checking for ArgsManager options, but it isn’t changing ArgsManager AddArg / GetArg interface, only adding new flags and behaviors to the existing interface, so it is not taking advantage of C++’s type safety to enforce at compile time that options are always retrieved with the same types they were registered with. A branch that implements this feature is being developed at:

https://github.com/ryanofsky/bitcoin/commits/pr/argtype

And is described in (comment) and (comment).

Planned additional features are:

Dropping dependency on #16545 and making new interface backwards compatible with existing arguments.
Writing a scripted diff to port all current ArgsManager AddArg and GetArg calls to use the new interface and be backwards compatible, without changing any behavior
Adding helpers to read arguments directly into Options structs like BlockManager::Options MemPoolOptions without needing to list types or struct members more than once.
Adding support for custom types (enums, structs, and classes) so Options structs are not limited to reading bool, int, and string options.
Adding documentation about the implementation and how to use the interface.

This issue description has been edited since the issue was opened. The original description was:

Originally posted by @ryanofsky in #22766 (review)

The way I would like range checking to work in the future would be to rely more on C++ types and std::numeric_limits. The idea is arguments would be registered using explicit C++ types:

0const Setting<int> SETTING_myint("-myint", "description");
1const Setting<std::string> SETTING_mystring("-mystring", "description");
2const Setting<std::vector<std::string>> SETTING_mylist("-mylist", "description");
3const Setting<std::optional<std::uint16_t>> SETTING_myopt("-myopt", "description");
4const Setting<SettingsValue> SETTING_mylegacy("-mylegacy", "description");
5
6void RegisterArgs(ArgsManager& args)
7{
8    args.Register({SETTING_myint, SETTING_mystring, SETTING_mylist, SETTING_myopt, SETTING_mylegacy});
9}

and then they could be retrieved in a type safe way:

0args.Get(SETTING_myint);    // returns int
1args.Get(SETTING_mystring); // returns std::string
2args.Get(SETTING_mylist);   // returns std::vector<std::string>
3args.Get(SETTING_myopt);    // returns std::optional<uint16_t>
4args.GetArg/GetArgs/GetIntArg/GetBoolArg(SETTING_mylegacy); // returns requested type

To get to this point, this PR cleans up existing misused flags and misnamed functions. PR #16545 adds type validation and runtime semantics without changing the ArgsManager API, and a followup PR can improve the API and update call sites without changing the semantics. (There is a direct correspondence between the ALLOW_ flags from #16545 and the useful C++ settings types bool/int/std::string/std::optional/std::variant/std::vector)

ajtowns #22766 (review)

I think I finally figured out how you can go a little bit further than the above, ending up with something like:

 0struct NetSettings
 1{
 2    int64_t blockreconstructionextratxn;
 3    int64_t maxorphantx;
 4    bool capturemessages;
 5
 6    template<typename C, typename... Args>
 7    static inline void F(Args&... args) {
 8        return C::Do(args...,
 9            C::Defn( &NetSettings::blockreconstructionextratxn, "-blockreconstructionextratxn=<n>", strprintf("Extra transactions to keep in memory for compact block reconstructions (default: %u)", DEFAULT_BLOCK_RECONSTRUCTION_EXTRA_TXN), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS, DEFAULT_BLOCK_RECONSTRUCTION_EXTRA_TXN ),
10            C::Defn( &NetSettings::maxorphantx, "-maxorphantx=<n>", strprintf("Keep at most <n> unconnectable transactions in memory (default: %u)", DEFAULT_MAX_ORPHAN_TRANSACTIONS), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS, DEFAULT_MAX_ORPHAN_TRANSACTIONS ),
11            C::Defn( &NetSettings::capturemessages, "-capturemessages", "Capture all P2P messages to disk", ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST, false )
12        );
13    }
14};
15
16void RegisterNetProcessingArgs(ArgsManager& args)
17{
18    SettingsRegister<NetSettings>::Register(args);
19}
20
21NetSettings GetNetSettings(const ArgsManager& args)
22{
23    return SettingsRegister<NetSettings>::Get(args);
24}
25
26class PeerManagerImpl
27{
28private:
29    const NetSettings m_settings;
30    PeerManagerImpl(..., const ArgsManager& args) : m_settings{GetNetSettings(args)), ... { ... }
31    ...
32};

The idea being that this way:

it can infer the argument type directly from the type of the struct member so that you can’t accidentally specify different types between args.AddArg<int> and Get<bool>
that the settings are const at runtime so can be accessed without any additional locks
you only have to access the ArgsManager (and do string parsing) at setup time
you don’t have to make up lots of new names for everything or add too much boilerplate

Branch built on top of this PR that has the above working at https://github.com/ajtowns/bitcoin/tree/202109-settings_struct

hebasto commented at 12:32 pm on September 25, 2021: member

The way I would like range checking to work in the future would be to rely more on C++ types and std::numeric_limits.

Concept ACK.

maflcko referenced this in commit 825f4a64e6 on Sep 27, 2021

jamesob commented at 2:38 pm on January 3, 2022: member

Possibly related: https://github.com/bitcoin/bitcoin/issues/23843

ajtowns commented at 2:33 am on September 28, 2022: contributor

Had a go at revisiting this with #26177 in mind. Here’s a branch, https://github.com/ajtowns/bitcoin/commits/202209-chainstate-argsman-types and here’s an example of what it might look like:

 0class SigNetOptionsRegistration
 1{
 2public:
 3    using T = CChainParams::SigNetOptions;
 4
 5    static inline void GetChallenge(std::vector<uint8_t>& challenge, const std::string& hex)
 6    {
 7        challenge = ParseHex(hex);
 8    }
 9
10    template<typename C, typename Op>
11    static inline void Register(Op& op)
12    {
13        return C::Do(op,
14            C::Defn(&T::challenge, GetChallenge, "-signetchallenge", "", "Blocks must satisfy the given script to be considered valid (only for signet networks; defaults to the global default signet test network challenge)", ArgsManager::ALLOW_ANY | ArgsManager::DISALLOW_NEGATION, OptionsCategory::CHAINPARAMS),
15            C::Defn(&T::seeds, "-signetseednode", "", "Specify a seed node for the signet network, in the hostname[:port] format, e.g. sig.net:1234 (may be used multiple times to specify multiple seed nodes; defaults to the global default signet test network seed node(s))", ArgsManager::ALLOW_ANY | ArgsManager::DISALLOW_NEGATION, OptionsCategory::CHAINPARAMS)
16        );
17    }
18};
19...
20void RegisterChainParamsOptions(ArgsManager& args)
21{
22    SettingsRegister<SigNetOptionsRegistration>::Register(args);
23    SettingsRegister<RegTestOptionsRegistration>::Register(args);
24}
25
26CChainParams::SigNetOptions GetSigNetOptions(const ArgsManager& args)
27{
28    return SettingsRegister<SigNetOptionsRegistration>::Get(args);
29}

Idea is you make a container for the options (T = SigNetOptions), provide a Register function that associates the options with the field in that structure (and a conversion function like GetChallenge if needed), then use SettingsRegister<>::Register(args) to register the args, then parse them, then get the results with SettingsRegister<>::Get(args), and you’re done.

ajtowns commented at 9:27 am on August 23, 2023: contributor

Had a go at revisiting this with #26177 in mind. Here’s a branch, https://github.com/ajtowns/bitcoin/commits/202209-chainstate-argsman-types and here’s an example of what it might look like: @ryanofsky I’ve updated this (and switched it to the UpdateFooOpts(args, options) style instead of options = GetFooOpts(args)) in https://github.com/ajtowns/bitcoin/pull/8 if you (or anyone else) are interested in bikeshedding it towards a concept/approach ack?

ryanofsky commented at 4:45 pm on August 24, 2023: contributor

@ryanofsky I’ve updated this (and switched it to the UpdateFooOpts(args, options) style instead of options = GetFooOpts(args)) in ajtowns#8 if you (or anyone else) are interested in bikeshedding it towards a concept/approach ack?

Looks nice. It seems similar to your NetSettings example above, #22978#issue-996573480, but much better now because it now works on plain options structs like DBOptions, CoinsViewOptions, wallet::DatabaseOptions, and no longer requires the structs to contain argument parsing code (if I understand the commented out example from 7cb67a694e82c52b71ec86d23cb574465f9aad22 correctly).

I’m still not sure I like the idea of encouraging all options to be stored in separate structs. I think that idea makes most sense for options that are used once at initialization time, since they can be parsed into the ArgsManager::m_settings map, then copied into the struct, and we never have to care about what happens after that. But if we added an RPC or GUI interface for changing settings at runtime and validating them and applying them and storing them in settings.json then it would be probably be simpler without the structs, because ArgsManager::m_settings would be the canonical place where settings are stored, and there wouldn’t be external structs that need to be updated.

In any case, I think your idea makes an improvement over the status quo by adding compile time checks. And it could probably be generalized latter to support an API more similar to the description #22978#issue-996573480 where settings can be retrieved in a type-safe way agnostic to how they are stored.

ajtowns commented at 5:00 am on August 25, 2023: contributor

(if I understand the commented out example from https://github.com/bitcoin/bitcoin/commit/7cb67a694e82c52b71ec86d23cb574465f9aad22 correctly).

I think you do; the -fastprune option in https://github.com/ajtowns/bitcoin/pull/8/commits/dddfbead90fc7b121b3a6870d546e9881eb5278a is a non-commented-out example where parsing isn’t needed, if that helps?

I’m still not sure I like the idea of encouraging all options to be stored in separate structs.

I’m thinking of that more as “we want our code to be modular, and to avoid globals, so we should have our options in structs (to avoid globals), and there should be separate ones per module (to be, well, modular)”. ie, I’m not really thinking about it as “where they’re stored” but more as syntax to keep things tidier?

But if we added an RPC or GUI interface for changing settings at runtime and validating them and applying them and storing them in settings.json then it would be probably be simpler without the structs, because ArgsManager::m_settings would be the canonical place where settings are stored, and there wouldn’t be external structs that need to be updated.

I think external structs with “native” types are desirable where we either care about performance (no need to lock argsmanager and do string parsing all the time) or want to have the code be independent of our argument parsing (ie libkernel, but maybe also unit/fuzz tests?). For cases like that, even if we want to be able to update the options at runtime, just calling ReadFooArgs(argsman, fooman.options) when argsman has been updated seems feasible?

For cases where you can talk to argsman directly, then I guess the idea would be to replace bool b = argsman.GetBool("-disablewallet") with something like bool b = WalletSettings(argsman).disablewallet.Get(), where WalletSettings is some magic to let us define args per-module, which gives something that has members named after the option names, that then have typesafe getter/setters, with the setter updating settings.json…

Maybe something like:

 0class Magic { public: ArgsManager& argsman; ... };
 1
 2template<typename T>
 3class MagicOption {
 4private:
 5    ArgsManager& argsman;
 6    const std::string name;
 7public:
 8    MagicOption(ArgsManager& argsman, std::string name, std::string desc) : argsman{argsman}, name{name} {
 9        argsman.AddTypedArg<T>(name, desc);
10    }
11    T Get() { return argsman.Get<T>(name); }
12    void Set(T value) { argsman.Set<T>(name, value); }
13};
14
15class WalletArgs : public Magic
16{
17    explicit WalletArgsMan(ArgsManager& argsman) : Magic{argsman} { }
18    MagicOption<bool> disablewallet{argsman, "-disablewallet", "Do not load the wal..."}; // maybe add the settings.json key
19};
20
21WalletArgs walletargs{argsman}; // calls `AddArg<bool>` via `MagicOption` initialisation
22
23argsman.ParseParameters(...); // read the config file so that accessing the options is meaningful
24
25if (walletargs.disablewallet.Get()) { ... }
26walletargs.disablewallet.Set(false);

could be made to work, eg?

(In this scenario maybe you also want to be able to register callbacks so that Set(...) triggered by RPC causes UI elements to be updated?)

So to me that looks like a mostly independent problem that we could/should address independently or at least afterwards?

Though, that said, having two different, complicated, ways of doing essentially the same things does annoy me…

Another approach to think about could be to go all-in on storing the settings natively, ie have just one place to store the settings data during runtime, but change that to being bool WalletOptions::disablewallet; not argsman.GetBool("-disablewallet"). In that case we’d have functions like UpdateWalletOptions(walletopts, argsman) to read args, but could also have SaveWalletOptions(walletopts, argsman) to update settings.json. Then perhaps you’d just register SaveWalletOptions as a callback, and call those callbacks as part of WriteSettingsFile() ? Once you have callbacks, perhaps you could also have a UpdateRuntimeWalletOptions(walletopts, argsman) callback that allows you to reload settings.json at runtime.

(The settings.json api confuses me – we read/write it as a blob in ArgsManager, but the actual logic to get at values is in node/interfaces.cpp, and afaics there’s no registration/documentation of what things can go in settings.json, and everything’s done via UniValue so you have to manually type check when you access anyway)

In any case, I think your idea makes an improvement over the status quo by adding compile time checks.

Cool. I’m going to add some bikesheddy thoughts to the PR in a moment, if you have any to add feel free. I’ll keep waiting on a ‘concept ack’ or similar there before opening a PR here though.

hodlinator commented at 10:45 pm on July 30, 2024: contributor

FWIW: @ryanofsky’s version is what comes naturally to me, with some small adjustments to default to file-local static storage unless accessed externally, and added default value.

0static const Setting<int> SETTING_myint("-myint", DEFAULT_FOO_VALUE, "description");

That said, @ajtowns struct-approach has some very strong points, avoiding globals and performing grouping and exposure of existing options-structs in a “language-native” way. Can see myself preferring it if the API ends up clean enough for 95% of use-cases.

ryanofsky commented at 4:32 pm on August 2, 2024: contributor

Posting to stop thinking about this for now, but I started working on a change that lets you declare type-safe settings without global constants, in a way that should be compatible with options structs and standalone setting retrieval.

This idea is to declare settings types like:

0using UpnpSetting = Setting<
1    "-upnp", "Use UPnP to map the listening port (default: %u)",
2    bool, OptionsCategory::CONNECTION>::Default<DEFAULT_UPNP>;

Which you can register with:

0ArgsManager args;
1UpnpSetting::Register(args);

And retrieve with:

0bool upnp_enabled{UpnpSetting::Get(args)};

This example just declares a simple bool setting with a constant default value, but the implementation understands bool, integer, and string types and std::optional, std::vector, and std::variant compositions of these types.

So for example, instead of bool in the example above, you could use:

uint16_t for a range-checked integer setting
std::string_view or std::string for a string setting
std::vector<std::string_view>> and std::vector<std::int> for string and integer settings that accept multiple values
std::optional<std::string_view>, std::optional<int64_t> and std::optional<bool> for settings that don’t have simple default values, and you want to distinguish unset values from ones that were explicitly specified
std::optional<std::vector<std::string_view> for list settings like -listen where you want to distinguish an unset setting (represented as nullopt) from a negated -nolisten setting (represented as an empty list)
std::variant<Unset, T> for an equivalent to std::optional<T>.
std::variant<Unset, Disabled, Enabled> for an equivalent to std::optional<bool>.
std::variant<bool, int> for a setting like -rescan that performs some action and accepts an optional number value. (-norescan sets false, -rescan sets true, and -rescan=<height> sets the height.)
std::variant<bool, std::string_view> for a setting like -ipcbind that performs some action and accepts an optional string value (-noipcbind sets false, -ipcbind sets true, -ipcbind=<address> sets the address.)
std::variant<Disabled, T> might be useful for special cases or backwards compatibility to detect negated values and treat them differently from 0 and “”. For example if you had a -bwlimit setting and wanted to treat -nobwlimit as “use unlimited bandwidth” and -bwlimit=0 as “use no bandwidth” using a variant with a Disabled member allows that.

A version of this is implemented in 378ba9814aea2389e253c84adaafd66472c36cb7, but it is lacking tests and documentation. The implementation doesn’t change ArgsManager code, and just wraps it, but it could be used to simplify ArgManager later if code is ported to use the interface.

EDIT: An updated version of this can be found in the https://github.com/ryanofsky/bitcoin/commits/pr/argtype branch now linked to in the issue description.

ajtowns commented at 9:48 am on August 7, 2024: contributor

Posting to stop thinking about this for now, but I started working on a change that lets you declare type-safe settings without global constants, in a way that should be compatible with options structs and standalone setting retrieval.

Seems okay, though having a global type name that contains data (UpnpSetting::Register and UpnpSetting::Update have summary.value and help.value) doesn’t seem meaningfully different to a global constant to me.

This idea is to declare settings types like:

0using UpnpSetting = Setting<
1    "-upnp", "Use UPnP to map the listening port (default: %u)",
2    bool, OptionsCategory::CONNECTION>::Default<DEFAULT_UPNP>;

Some of our help uses strprintf’s with more arguments than just the setting’s default (eg -loglevel wants LogCategoriesString, -assumevalid and others have different defaults per-chain, -dbcache lists its min/max values as well as a default)). I don’t think it’s trivial to make tinyformat work at compile-time?

Specifying an option as std::optional<int16_t> (etc) seems massively superior to ALLOW_INT | ALLOW_BOOL to me.

Having distinct type names for each option seems redundant to me in general; all we really want names for are RegisterConnectionArgs(argsman) (ie a function to register all the args for a module) and upnp_enabled (ie somewhere to access each config option once it’s been loaded).

The per-option types could be useful for rw options, in order to write:

0    if (value != getOption(upnp).toBool()) {
1        UpnpSetting::SaveRWSetting(value);
2        setOption(upnp, value);
3    }

and get compile-type type checking of value, though I’m not sure if it would make sense to expose those types through the node/interface stuff, and if we weren’t doing that it doesn’t seem like there’d be much point changing from the generic updateRwSetting we already have.

ryanofsky commented at 2:11 pm on August 7, 2024: contributor

Thanks for taking a look!

I don’t think it’s trivial to make tinyformat work at compile-time?

Right, Default<DEFAULT_UPNP> just provides a default help string format argument. If you want to provide other format arguments you can register with:

0UpnpSetting::Register(args, HelpArgs(help1, help2, DEFAULT_UPNP));

instead of:

0UpnpSetting::Register(args);

but only a few settings should need this for more complicated formatting.

Specifying an option as std::optional<int16_t> (etc) seems massively superior to ALLOW_INT | ALLOW_BOOL to me.

Those are not equivalent. (For background you may want to check the new documentation in #16545 which explains how the flags work.) Basically:

int16_t or std::optional<int16_t> or std::variant<Unset, int16_t> are equivalent to ALLOW_INT.
std::variant<bool, int16_t> or std::variant<Unset, bool, int16_t> are equivalent to ALLOW_INT | ALLOW_BOOL

Combining with bool is useful for an imperative setting that causes a new action to be performed, and can take an optional integer or string value. The main thing | ALLOW_BOOL does is just allow specifying a -foo command line argument with no value.

Having distinct type names for each option seems redundant to me in general; all we really want names for are RegisterConnectionArgs(argsman) (ie a function to register all the args for a module) and upnp_enabled (ie somewhere to access each config option once it’s been loaded).

The advantage of giving each type a name isn’t for registration alone, but for registration + retrieval. The names are checked by the compiler when you retrieve settings, you can’t typo "-blocksor" when you register and "-blocksxor" when you retrieve. The names also carry type information, so you don’t have to remember to call GetInt for int settings, GetBool for bool settings. You can just call Get and Update and the right types will be used by the compiler.

However, type names should not be necessary if we have self-registering and retrieving Options structs like I think we will want many places. I plan to work on supporting Options structs by building on top of this API. (The type names are not part of this API, but are just one way to use it.) I also want to extend the API to support custom types like enums better.

The per-option types could be useful for rw options, in order to write:

0    if (value != getOption(upnp).toBool()) {
1        UpnpSetting::SaveRWSetting(value);
2        setOption(upnp, value);
3    }

Right, a major motivation for porting the settings to univalue and adding type flags was to be able to add an RPC method that allows examining and modifying settings. The idea was that all settings should be read-write, not just some settings. Even if many settings can not currently be applied live at runtime, they can be validated and saved in settings.json and used after the next restart.

The ability to introspect settings is worth bringing up because it can affect what representations are used for settings. For example if we decided we did not want to allow arbitrary settings to be stored in settings.json, we might decide to store int16_t settings as int16_t values instead of UniValues.

RFC: ArgsManager type and range checking #22978