Investigate potential lifetime issues in cases where we are returning “const std::string&” #17198

issue practicalswift openend this issue on October 19, 2019
  1. practicalswift commented at 6:36 pm on October 19, 2019: contributor

    Suggestion for a “good first issue”:

    Investigate potential lifetime issues in cases where we are returning const std::string&.

    Candidates:

    0$ git grep -E '^(const|static| )*std::string *&+[^=)]+\(' \
    1    -- "*.cpp" "*.h" ":(exclude)src/univalue/"
    

    Questions:

    • Which of these have lifetime issues and need to be fixed?
    • In the cases which do not have lifetime issues: is returning by const std::string& justified?
  2. practicalswift commented at 5:24 pm on October 20, 2019: contributor

    Also worth checking:

    0$ git grep -E '^(const|static| )*[A-Za-z:]+ *&+[^=)/<>.]+\(' -- "*.cpp" "*.h" \
    1      ":(exclude)src/bench/" ":(exclude)src/leveldb/" ":(exclude)src/qt/" \
    2      ":(exclude)src/test/" ":(exclude)src/univalue/" ":(exclude)src/wallet/" | \
    3      grep -v ' operator'
    
  3. practicalswift commented at 8:49 pm on October 20, 2019: contributor
    Somewhat related: Curiously Recurring C++ Bugs at Facebook (CppCon 2017)
  4. fanquake referenced this in commit 0d7e8d66c4 on Oct 20, 2019
  5. laanwj added the label good first issue on Oct 22, 2019
  6. brakmic commented at 3:03 pm on November 29, 2019: contributor

    Hi,

    I created a list from the output of @practicalswift ’s command and am now going it through one by one. It’s also a good way to learn Bitcoin’s APIs. ;)

    For example:

    0src/util/threadnames.cpp:const std::string& util::ThreadGetInternalName() { return g_thread_name; }
    

    I’ve checked its references throughout the code and it seems that this string reference never does anything with the data it owns. So, why not converting it into a string_view, that is: boost::string_view, because currently no C++17 support in Bitcoin, if I am not mistaken.

    However, there’s the Logger class that deals with strings.

    And this is where I had to stop, because I didn’t want to change Logger itself. All other parts of the code where ThreadGetInternalName() was used could be changed to string_view without any interference. So, maybe I should have tried to change Logger, because it would make sense for a Logger to never own any data. Would it make sense? I don’t know, only speculating.

    I am not saying, that this should be done (the string_view, I mean), or that it even makes sense, but in general, when something doesn’t do anything with data it points to, then it should not own it. Also, string_views can be passed by value without any performance hit, while references always have a redirection in them.

    Let’s see what other functions have to offer. :)

    Regards,

  7. MarcoFalke commented at 7:28 pm on December 3, 2019: member
    I’d doubt this has a performance impact. Might as well return a copy of the string?
  8. brakmic commented at 7:35 pm on December 3, 2019: contributor

    Performance is not the problem here, but the many owners, that is string-references. A string_view never owns a string, it only looks at it. Also, when an existing reference is not used, that is there is no work on data it owns, why have it in the first place? Why not keeping stuff read-only that could work in read-only-mode?

    This one string from ThreadGetInternalName might be unimportant, sure, but in general, why sending around string-refs when a much more safe string_view could be sent by value? Also, by explicitly declaring a string_view one also makes a statement that this object will never be able to change the data it looks at, because it doesn’t own it.

    That was my first question when I looked at Logger’s API. Does it need to own all those strings it’s logging?

  9. laanwj commented at 1:46 pm on December 4, 2019: member

    I’d doubt this has a performance impact. Might as well return a copy of the string?

    Yes, I’ve always been confused by that one, the thread name is a short string, it’s only queried up to once per log message, returning a reference seems a very premature optimization. (though I don’t think util::ThreadGetInternalName has any lifetime issues? it’s either a global or a thread-local string,and in the latter case it’s only used in the thread it’s associated with)

  10. practicalswift commented at 2:11 pm on December 4, 2019: contributor

    I’d doubt this has a performance impact. Might as well return a copy of the string?

    Yes, I’ve always been confused by that one, the thread name is a short string, it’s only queried up to once per log message, returning a reference seems a very premature optimization.

    Agreed. A lot of these cases listed in the OP feel like classic examples of premature optimization :)

    Unfortunately it is not only a premature optimization but it also unnecessarily introduce sharp edges that might hit us in the form of life-time issues in the future. Issues that would be much much more costly than an extra copy.

    Recommending this talk (again): https://www.youtube.com/watch?v=lkgszkPnV8g&t=876 :)

  11. elichai commented at 3:19 pm on March 1, 2020: contributor

    I’d doubt this has a performance impact. Might as well return a copy of the string?

    Yes, I’ve always been confused by that one, the thread name is a short string, it’s only queried up to once per log message, returning a reference seems a very premature optimization. (though I don’t think util::ThreadGetInternalName has any lifetime issues? it’s either a global or a thread-local string,and in the latter case it’s only used in the thread it’s associated with)

    The problem usually isn’t about big copy, but about the call to new ie heap allocation. Not that I’m saying it’s not premature optimization, but we do spend a significant amount of time on the new operator in benchmarks. avoiding heap allocation when possible is IMHO a good idea, but obviously not in the cost of safety In thread names we might be hitting small string optimization anyways (I think clang’s impl is 10 chars in 32bit and 22 chars in 64bit or something like that)

  12. practicalswift closed this on Jul 7, 2021

  13. DrahtBot locked this on Aug 18, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-19 00:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me