Add `not_null<T>` from the Guidelines Support Library

dongcarl commented at 9:50 pm on February 22, 2022: member

There are many places in the codebase where we use a pointer type (*, shared_ptr, unique_ptr, etc.) because of historical reasons (CBlockIndex*) or for polymorphism. In many such cases, we may want to avoid needing to handle the pointer being null, or make it clear in the function contract that we require a non-null pointer.

not_null<T> is the solution proposed by the C++ Core Guidelines (relevant section), and restricts pointers and smart pointers to hold non-null values.

Since I’ve learned about this, I’ve repeatedly ran into scenarios where I wish I could use it so as to make call semantics clearer and avoid misuse / unnecessary null handling.

There is a header-only implementation of not_null<T> here, however, there are two issues we need to solve:

The header-only implementation linked above requires the Expects and Ensures macros defined here, do we also import these or can we tweak our existing assertion macros to work?
There is also a strict_not_null<T>, which requires explicit construction. Do we need that or just not_null<T>?

fanquake added the label Brainstorming on Feb 23, 2022

fanquake commented at 11:51 am on February 23, 2022: member

@theuni @ryanofsky @ajtowns you might have opinions / thoughts?

ryanofsky commented at 1:43 pm on February 23, 2022: member

I guess I’d be curious to see real examples of where it is useful. It seems to me like most of the places where you could use not_null<T*> you could use T& instead and references seem nicer to me aesthetically, and I like how they limit what you can do with the variable (just access the data not do mess with pointer or ownership). The only cases where not_null seems really useful is when when you need the pointer variable to be able change which object it is pointing to. This could be better in some data structures than raw pointers or reference_wrappers (https://stackoverflow.com/questions/33306553/gslnot-nullt-vs-stdreference-wrappert-vs-t), but I don’t think these cases are very common in our codebase.

I think in general replacing T& with not_null<T*> would be bad, while using not_null<T*> in certain data structures could be good. A more neutral case could be replacing things like:

0const CChain& active = Assert(m_node.chainman)->ActiveChain();

with

0const CChain& active = m_node.chainman->ActiveChain();

You get same runtime assert in either case here, but in first case author is clearly stating intention for chainman to be not null here at this line of the code, while in second case with not_null author is just saying the pointer shouldn’t be null any place it is dereferenced, which isn’t saying much. I could see people preferring the second version though.

MarcoFalke commented at 1:49 pm on February 23, 2022: member

I guess I’d be curious to see real examples of where it is useful

To turn runtime errors into compile errors #19273 ?

ryanofsky commented at 2:04 pm on February 23, 2022: member

To turn runtime errors into compile errors #19273 ?

Is #19273 the right issue? I think point of not_null has more to do with safety around accessing data, not allocating objects.

To be clear why I was asking for examples, I don’t have any problem not_null, but I do think references are better than not_nulls in most cases and am responding to “In many such cases, we may want to avoid needing to handle the pointer being null, or make it clear in the function contract that we require a non-null pointer” in the issue description.

MarcoFalke commented at 2:28 pm on February 23, 2022: member

So you are saying there should be a UniqueReference (similar to std::unique_ptr when it comes to memory management and similar to std::reference_wrapper when it comes to accessing data)?

ryanofsky commented at 2:45 pm on February 23, 2022: member

So you are saying there should be a UniqueReference (similar to std::unique_ptr when it comes to memory management and similar to std::reference_wrapper when it comes to accessing data)?

I think the point of references is that they are transparent aliases to data somewhere that you should just access and not try to do memory management with. So if you are writing a function that just needs to access data f(Data& data) is better than f(not_null<Data*> data) or f(not_null<unique_ptr<Data>> data). I could imagine not_null<unique_ptr> and not_null<shared_ptr> stuff being useful in some places, just not a lot of places, so I’m asking what the motivating examples are to see if they are really calling for not_null.

dongcarl commented at 7:29 pm on February 23, 2022: member

I can see how references and reference_wrapper (which I didn’t know was zero-cost until today) can solve most of the cases. I do think that not_null<unique_ptr> and not_null<shared_ptr> are useful when we’re passing around (shared) ownership and perhaps we can focus on those use-cases (e.g. RegisterSharedValidationInterface)

theuni commented at 11:46 pm on February 23, 2022: member

This seems pretty niche, but I like the idea of being extra expressive when possible. I agree with @ryanofsky that references should be generally preferred though.

I think @dongcarl pointed to a concrete example of where passing a reference or std::reference_wrapper won’t work as an access model: when inheritance is needed. So, I can imagine not_null<BaseClass*> potentially being a useful pattern there.

I’m curious though, is this better than using the nonnull attribute when needed? In terms of expressiveness, it seems to me that it might be more useful to peg the property to a specific parameter and locale than a storage duration. It wouldn’t be correct-by-construction that way, but we could still get warnings/errors.

So I guess my question is: is it more useful to enforce as part of the function signature, or storage duration?

ryanofsky commented at 10:54 am on February 24, 2022: member

I think @dongcarl pointed to a concrete example of where passing a reference or std::reference_wrapper won’t work as an access model: when inheritance is needed. So, I can imagine not_null<BaseClass*> potentially being a useful pattern there.

I’m curious about this, but I don’t see what it is referring to. It seems like inheritance should be orthogonal to nullability and which pointer or reference mechanism is used.

I’m curious though, is this better than using the nonnull attribute when needed? In terms of expressiveness, it seems to me that it might be more useful to peg the property to a specific parameter and locale than a storage duration. It wouldn’t be correct-by-construction that way, but we could still get warnings/errors.

So I guess my question is: is it more useful to enforce as part of the function signature, or storage duration?

It seems like ideal thing would be for the not_null type to automatically apply the nonnull compiler hint. Question of which is “more useful” is probably hard to answer without knowing specifics of a situation, and what level of analysis and optimization compiler will do, but I think it should be safe to say both are useful, and have some overlapping benefits, and some non-overlapping benefits.

On larger issue I definitely think not_null is useful in some cases. Just not very many cases I’m aware of. And in most cases where it is possible to use references, I think it is better to use references.

dongcarl commented at 5:57 pm on March 2, 2022: member

In any case, I think having not_null for not_null<unqiue_ptr<T>> is quite useful in itself as we incrementally move towards having clearer ownership semantics.

Add not_null<T> from the Guidelines Support Library #24423

Add `not_null<T>` from the Guidelines Support Library #24423