This picks up parts of #25908.
The use of adjusted time is removed from validation code while the warning to users if their clock is out of sync with the rest of the network remains.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
For detailed information about the code coverage, see the test coverage report.
See the guideline for information on the review process.
| Type | Reviewers |
|---|---|
| ACK | naumenkogs, maflcko, stickies-v, achow101 |
| Concept ACK | ajtowns, RandyMcMillan, fanquake, Pttn, ariard, TheCharlatan, sr-gi |
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
```diff
@@ -84,6 +88,9 @@ class PeerManager : public CValidationInterface, public NetEventsInterface
     /** Get statistics from node state */
     virtual bool GetNodeStateStats(NodeId nodeid, CNodeStateStats& stats) const = 0;

+    /** Get stats from peer manager. */
+    virtual PeerManagerStats GetStats() const = 0;
```
If you're building a function that collects this info, maybe call it GetInfo() and combine it with IgnoresIncomingTxs() ?
Good idea, done!
```diff
@@ -674,7 +673,7 @@ static RPCHelpMan getnetworkinfo()
     if (node.peerman) {
         obj.pushKV("localrelay", !node.peerman->IgnoresIncomingTxs());
     }
-    obj.pushKV("timeoffset", GetTimeOffset());
+    obj.pushKV("timeoffset", node.peerman->GetStats().median_outbound_time_offset);
```
Should be conditional on node.peerman being valid, see the previous line.
```diff
@@ -3567,6 +3601,8 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
     // Don't use timedata samples from inbound peers to make it
     // harder for others to tamper with our adjusted time.
     AddTimeData(pfrom.addr, peer->m_time_offset);
+
+    m_outbound_time_offsets.Add(peer->m_time_offset);
```
Having a rolling set of 10 time offsets from every outbound connection we make (full outbounds, block-relay-only, extra block-relay-only, feelers?) seems somewhat risky: at present, you only get odd behaviour if a high proportion of the first 100 outbound peers you connect to have a different time; with this change, you'll get a warning whenever the last 10 peers you randomly connect to have a dodgy view of the time.
Maybe consider making this conditional on IsManualOrFullOutboundConn() instead; that way it only updates rarely for things we're really trying to connect to, and isn't mostly made up of random peers we're briefly sampling?
Maybe consider making this conditional on IsManualOrFullOutboundConn()
Done
```diff
@@ -3611,6 +3613,12 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
     AddTimeData(pfrom.addr, peer->m_time_offset);

     m_outbound_time_offsets.Add(peer->m_time_offset);
+
+    auto median = m_outbound_time_offsets.Median();
+    if (median > TIME_OFFSET_WARN_THRESHOLD.count() ||
+        median < -TIME_OFFSET_WARN_THRESHOLD.count()) {
+        SetMiscWarning(Untranslated("Please check that your computer's date and time are correct"));
```
SetMiscWarning seems a poor interface; if there's two warnings, the latest one overrides the first, and you can't remove warnings in the event that they resolve themselves. Might be better to do something like SetfLargeWorkInvalidChainFound(bool) ?
```diff
@@ -175,6 +176,7 @@ static constexpr double MAX_ADDR_RATE_PER_SECOND{0.1};
 static constexpr size_t MAX_ADDR_PROCESSING_TOKEN_BUCKET{MAX_ADDR_TO_SEND};
 /** The compactblocks version we support. See BIP 152. */
 static constexpr uint64_t CMPCTBLOCKS_VERSION{2};
+static constexpr std::chrono::seconds TIME_OFFSET_WARN_THRESHOLD{30min};
```
This is reduced from 70min to 30min, compared to DEFAULT_MAX_TIME_ADJUSTMENT. 70min avoids warning if you're buggy by an hour due to (you or your peers) getting daylight savings time wrong, which seems like a vaguely plausible thing to continue doing?
```diff
@@ -497,7 +496,6 @@ void SetupServerArgs(ArgsManager& argsman)
     argsman.AddArg("-maxconnections=<n>", strprintf("Maintain at most <n> automatic connections to peers (default: %u). This limit does not apply to connections manually added via -addnode or the addnode RPC, which have a separate limit of %u.", DEFAULT_MAX_PEER_CONNECTIONS, MAX_ADDNODE_CONNECTIONS), ArgsManager::ALLOW_ANY, OptionsCategory::CONNECTION);
     argsman.AddArg("-maxreceivebuffer=<n>", strprintf("Maximum per-connection receive buffer, <n>*1000 bytes (default: %u)", DEFAULT_MAXRECEIVEBUFFER), ArgsManager::ALLOW_ANY, OptionsCategory::CONNECTION);
     argsman.AddArg("-maxsendbuffer=<n>", strprintf("Maximum per-connection memory usage for the send buffer, <n>*1000 bytes (default: %u)", DEFAULT_MAXSENDBUFFER), ArgsManager::ALLOW_ANY, OptionsCategory::CONNECTION);
-    argsman.AddArg("-maxtimeadjustment", strprintf("Maximum allowed median peer time offset adjustment. Local perspective of time may be influenced by outbound peers forward or backward by this amount (default: %u seconds).", DEFAULT_MAX_TIME_ADJUSTMENT), ArgsManager::ALLOW_ANY, OptionsCategory::CONNECTION);
```
Perhaps swap this option for one that controls how much of a time difference results in a warning? Not sure.
No strong opinion on this, but it doesn't seem very useful to me to be able to set that threshold. Happy to add this if someone can name a good reason.
Kept the setting in the new approach
Concept ACK
Concept ACK
Concept ACK
Concept ACK
```diff
+    AssertLockNotHeld(m_mutex);
+    std::array<int64_t, N> sorted_copy = WITH_LOCK(m_mutex, return m_offsets);
+    std::sort(sorted_copy.begin(), sorted_copy.end());
+
+    if constexpr (N % 2 == 0) {
+        return (sorted_copy[N / 2 - 1] + sorted_copy[N / 2]) / 2;
```
Shouldn't this use min(N, m_index) instead of N?
122658f95b36b7f940351bd34a89e13831931f18
should we make this addition overflow safe?
The idea is that m_index is always set to the next index in m_offsets that gets written to (so a value in [0, N)) and we just assume that the initial N samples were 0, which is why the median is immediately computed over all N.
I could change it to always increase m_index instead and add new offsets by doing m_offsets[m_index % N] = new_offset, in which case your suggestion would allow us to compute the median more "accurately" while we haven't seen N samples yet. I'm not sure though if that is an improvement, e.g. do we want to issue a warning if only the first outbound connection has an offset larger than our threshold?
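The monotonically-increasing-index variant described above could look like this minimal sketch (`RollingOffsets` is a hypothetical illustration, not code from the PR; the mutex and lock annotations of the real class are omitted):

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: m_index grows without bound, writes wrap with
// modulo, and the median is taken over only the samples actually seen.
template <size_t N>
class RollingOffsets
{
    std::array<int64_t, N> m_offsets{};
    size_t m_index{0}; // total samples added so far (not wrapped)

public:
    void Add(int64_t offset) { m_offsets[m_index++ % N] = offset; }

    int64_t Median() const
    {
        const size_t count{std::min(N, m_index)};
        if (count == 0) return 0;
        std::vector<int64_t> sorted(m_offsets.begin(), m_offsets.begin() + count);
        std::sort(sorted.begin(), sorted.end());
        return sorted[count / 2]; // upper median for even counts
    }
};
```

With this shape a warning based on only one or two dodgy early peers becomes possible, which is exactly the trade-off questioned above.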
Hm yea I guess... I could also just remove support for even Ns from this class.
do we want to issue a warning if only the first outbound connection has a offset larger than our threshold?
This is the main question I guess. So, with the current code, having 4 peers would never trigger it, right? Because median will always be 0.
I won't mind that either, but ideally with an in-function comment.
So, with the current code, having 4 peers would never trigger it, right? Because median will always be 0.
Yes, and this sort of (by accident) mimics how/when we trigger the warning on master, at 5 or more samples: https://github.com/bitcoin/bitcoin/blob/dde7ac5c704688c8a9af29bd07e5ae8114824ce7/src/timedata.cpp#L77
Done
I won't mind, but this should be commented somewhere above, perhaps where the returned 0 is handled.
Concept ACK
```diff
+
+    if (pfrom.IsManualOrFullOutboundConn()) {
+        // Only sample from outbound connections we are really trying to connect to.
+        m_outbound_time_offsets.Add(peer->m_time_offset);
+
+        auto median = m_outbound_time_offsets.Median();
```
I think one advance in terms of robustness of this local time monitoring mechanism could be to add feelers' time offsets (ConnectionType::FEELER) to reserved slots in the index (e.g. +3 on top of the 10 full-outbound connections). Those feeler connections are already triggered at regular intervals, which is not necessarily the case for the outbound peer topology if the peers are stable and performant.
I guess a NTP-attacker could silently time shift the local clock of a victim, without this warning logic ever triggering if no new outbound or manual connections are ever opened during the exploitation.
I see your point but I think I would like to keep discussion about improving the warning condition to a future PR. The current adjusted time mechanism has the same flaw and is even worse as it stops sampling entirely after 200 samples.
Maybe we can collect some stats on how often outbound connections rotate and how many samples from which connections are actually useful to inform such a change.
I will add the release note.
stops sampling entirely after 200 samples
perhaps record this bug in the commit message, might be useful in the future?
This is documented in the current code. https://github.com/bitcoin/bitcoin/blob/40bc501bf462e1b38679f728336f18f08ee251ca/src/timedata.cpp#L60-L76
Concept ACK
I would recommend adding an explicit release note, urging node operators to monitor this new warning, especially if they're running time-sensitive apps. Exploiting NTP to target bitcoin has been suggested both in NTP security literature and the time-dilation paper (I've never seen or heard of any in the wild, contrary to some BGP attacks in other cryptos, though I believe this is fairly practical). E.g.:
New local clock monitoring
==========================
The notion of adjusted time, relying on the median of other network nodes' clocks, is removed.
Instead, a rolling sample of the time offset from the local clock of each of the last 10 outbound
peers is maintained and the median computed. If the resulting median is more than 70 minutes
in either direction, a node warning is issued.
This warning can be consumed by a call to `getblockchaininfo`. If the node is backing an
application or protocol with stringent time-sensitive requirements and high funds at stake
(e.g. a lightning or time-enabled wallet), it is recommended to periodically run a monitoring
job and, in the presence of the warning, automatically notify the operator.
```diff
+    void Add(int64_t offset) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
+    {
+        AssertLockNotHeld(m_mutex);
+        LOCK(m_mutex);
+        m_offsets[m_index] = offset;
+        m_index = (1 + m_index) % N;
```
6c65f61b625212e128e5db52cbd80a3b57b1450b
nit: your code assumes that N is less than the capacity of size_t, otherwise this addition can overflow. You might consider adding a static assert for that too, just in case someone in the future decides to use TimeOffsets<bazillion>
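A sketch of what the suggested compile-time guard might look like (`TimeOffsetsSketch` is a hypothetical stand-in; the mutex and lock annotations of the real class are omitted):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical sketch of the suggested guard: reject template sizes that
// would let the index arithmetic overflow size_t.
template <size_t N>
class TimeOffsetsSketch
{
    static_assert(N > 0, "need at least one slot");
    static_assert(N < std::numeric_limits<size_t>::max(),
                  "N must leave headroom so 1 + m_index cannot overflow");

    std::array<int64_t, N> m_offsets{};
    size_t m_index{0};

public:
    void Add(int64_t offset)
    {
        m_offsets[m_index] = offset;
        m_index = (1 + m_index) % N; // safe: m_index < N < SIZE_MAX
    }
    size_t Index() const { return m_index; }
};
```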
```diff
@@ -3604,11 +3603,6 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
     remoteAddr, (mapped_as ? strprintf(", mapped_as=%d", mapped_as) : ""));

     peer->m_time_offset = nTime - GetTime();
-    if (!pfrom.IsInboundConn()) {
```
After your changes, we won't consider BLOCK_RELAY, FEELER, ADDR_FETCH anymore, right?
With the new approach this is kept as is on master
```diff
@@ -28,6 +29,12 @@ void SetfLargeWorkInvalidChainFound(bool flag)
     fLargeWorkInvalidChainFound = flag;
 }

+void SetMedianTimeOffsetWarning()
+{
+    LOCK(g_warnings_mutex);
+    g_timeoffset_warning = true;
```
Why don't you implement unsetting this warning once the condition resolves?
This wasn't implemented before either. This can be improved in a follow up.
Changed the approach.
This now still deletes all the adjusted time code and removes it from consensus code but keeps the warning condition for a large median time offset as it is on master. Improvements to the warning can be done in a follow up.
This should now be a very straight forward review, since we don't need to discuss a new warning mechanism.
Looking at the diff between the two versions, I see these changes:
1. … (FEELER, ADDRFETCH, BLOCKRELAY are also considered)
2. … (int64_max/2; but I don't care that much)

Not sure I see how this keeps the warning condition for a large median time offset "as it is on master".
Generally, while I think it's fine to leave (1) for the latter PR, re-introducing the bug (2) is somewhat unfortunate:
(3) should be kept for either version of course.
Removing adjusted time from consensus is a big win. Having time samples received from peers be involved in consensus checks is not great and not effective against NTP based attacks (https://github.com/bitcoin/bitcoin/pull/25908#issuecomment-1264632475, #25908 (comment)). The warning itself is not as interesting to me but most of the review on both #25908 and this PR is about the warning and its threshold.
I don't want changing the warning (small win) to distract from removing adjusted time from consensus (big win).
an alternative would divide each element by 2 and then add without overflow risk
Done.
Removing adjusted time from consensus is a big win. Having time samples received from peers be involved in consensus checks is not great and not effective against NTP based attacks (#25908 (comment), #25908 (comment)). The warning itself is not as interesting to me but most of the review on both #25908 and this PR is about the warning and its threshold.
Currently, we're adjusting our local clock from the median of the first 199 outbound peers we're connecting to (only the first 199 due to the oddness bug in AddTimeData()). In the occurrence of an NTP-based attack (e.g. a malicious network delay on traffic between the local system NTP client and its timeservers), this median-adjusted time mechanism can generate a salutary warning towards the node operator (the "Please check that your computer's date and time are correct!"). However, this mechanism comes as a security benefit only if the attack is active when you start your local bitcoind and before the first 199 outbound peer connections have been sampled (which might last for hours or days, given our only 10 outbound connection slots at the moment). So there is a robustness benefit brought by the current adjusted time mechanism, and this has been one of my main concerns raised in #25908.
On the other hand, this mechanism allows your outbound peer topology to correct your local time (as used for some consensus rules validation in ContextualCheckBlockHeader()) within the boundaries of -/+ DEFAULT_MAX_TIME_ADJUSTMENT. This exploit assumes you're able to corrupt or compromise 100 outbound peers (as it's a median), which has to be done in consideration of all our hardenings in AddrMan, plus some low-hashrate capabilities. Those two elements, in my appreciation, present a very high bar (at least compared to state-of-the-art practical NTP-based attacks).
As such, I reasonably think maintaining the warning mechanism is valuable, at the very least to maintain the current robustness of our consensus validation code in the face of NTP attacks. It always gives node operators a chance of noticing time anomalies from their logs, which is welcome robustness if you're operating time-sensitive bitcoin infrastructure imo. I think it would be better to keep a large buffer of sampled outbound peers (e.g. in the order of 100 outbound peers, as right now) for this current change itself, and fix the oddness bug.
For sure, we can add additional robustness mechanisms in follow-up PRs, e.g. "random drop" of sampled peers' time offsets to add uncertainty in the eyes of adversaries aiming to neutralize the warning mechanism, or including feeler connections in the sampled offsets.
Another source of concern: adjusted time is implemented by other full nodes carrying at least some real-world economic traffic. If we're modifying our consensus definition of time and they stay on the old one, this could be a source of consensus discrepancies in some edge scenarios. True, this could equally be a concern for non-upgraded Bitcoin Core nodes.
From a deployment viewpoint, my recommendation is to notify other bitcoin full-node implementations of this change on the mailing list or bitcoin forums, along with some "rough" deployment timeline relative to Bitcoin Core's own 27.0 or 28.0 release schedules. I think there is a lesson to be learned from the old Bitcoin 0.8 LevelDB upgrade.
See the btcd code: https://github.com/btcsuite/btcd/blob/master/blockchain/mediantime.go#L74

```go
// medianTime provides an implementation of the MedianTimeSource interface.
// It is limited to maxMedianTimeEntries includes the same buggy behavior as
// the time offset mechanism in Bitcoin Core. This is necessary because it is
// used in the consensus code.
type medianTime struct {
	mtx                sync.Mutex
	knownIDs           map[string]struct{}
	offsets            []int64
	offsetSecs         int64
	invalidTimeChecked bool
}
```
```diff
+        return sorted_copy[m_index / 2];
+    } else {
+        return (sorted_copy[m_index / 2] / 2) +
+               (sorted_copy[m_index / 2 - 1] / 2);
+    }
+}
```
sorted_copy being an array of size N, I think we'd be sorting uninitialized values if m_index < N?

Potential fix addressing all 3:
<details> <summary>git diff on 8b1c133921</summary>
diff --git a/src/net_processing.cpp b/src/net_processing.cpp
index 664992dcab..9ade704fe9 100644
--- a/src/net_processing.cpp
+++ b/src/net_processing.cpp
@@ -215,20 +215,24 @@ public:
/** Compute and return the median of the collected time offset samples. */
int64_t Median() const EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
{
- AssertLockNotHeld(m_mutex);
- LOCK(m_mutex);
+ std::vector<int64_t> sorted_copy;
+ {
+ AssertLockNotHeld(m_mutex);
+ LOCK(m_mutex);
- // Only compute the median from 5 or more samples.
- if (m_index < 4) return 0;
+ // Only compute the median from 5 or more samples.
+ if (m_index < 4) return 0;
+
+ sorted_copy.assign(m_offsets.begin(), m_offsets.begin() + m_index);
+ }
- std::array<int64_t, N> sorted_copy = m_offsets;
std::sort(sorted_copy.begin(), sorted_copy.end());
+ auto mid{sorted_copy.size() / 2};
- if (m_index % 2 == 0) {
- return sorted_copy[m_index / 2];
+ if (sorted_copy.size() % 2 == 0) {
+ return (sorted_copy[mid] / 2) + (sorted_copy[mid - 1] / 2);
} else {
- return (sorted_copy[m_index / 2] / 2) +
- (sorted_copy[m_index / 2 - 1] / 2);
+ return sorted_copy[mid];
}
}
};
</details>
I think the calculation is wrong: the average of the 2 middle elements should be taken in case of an even number of elements.
yes🤦
UB: sorted_copy being an array of size N, I think we'd be sorting uninitialized values if m_index < N?
No, m_offsets is not uninitialized
No,
m_offsetsis not uninitialized
The array is initialized, ~but the items are not~, and afaik int64_t doesn't have a default constructor?
Ah cool, didn't know, thanks. Still, having default (0-)values in sorted_copy seems like it would lead to an incorrect median calculation?
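The skew being described can be sketched with a hypothetical 5-slot buffer in which only some slots hold real samples (illustrative helper, not PR code):

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Illustrative 5-slot buffer: slots default to 0, so if only two real
// samples (say, +3600s each) have been written, sorting the whole array
// still yields a median of 0 and no warning would fire.
int64_t full_array_median(std::array<int64_t, 5> offsets)
{
    std::sort(offsets.begin(), offsets.end());
    return offsets[offsets.size() / 2];
}
```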
```diff
+    }
+
+    /** Compute and return the median of the collected time offset samples. */
+    int64_t Median() const EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
+    {
+        AssertLockNotHeld(m_mutex);
```
```diff
@@ -672,9 +671,10 @@ static RPCHelpMan getnetworkinfo()
         obj.pushKV("localservicesnames", GetServicesNames(services));
     }
     if (node.peerman) {
-        obj.pushKV("localrelay", !node.peerman->IgnoresIncomingTxs());
+        auto peerman_info{node.peerman->GetInfo()};
```
const nit
```cpp
const auto peerman_info{node.peerman->GetInfo()};
```
```diff
+private:
+    // Historic maximum number of time offsets we include in our median.
+    static constexpr size_t N{199};
+
+    mutable Mutex m_mutex;
+    std::array<int64_t, N> m_offsets GUARDED_BY(m_mutex){};
```
Wdyt about making the offsets typesafe with std::chrono::seconds?
<details> <summary>git diff on 8b1c133921</summary>
diff --git a/src/net_processing.cpp b/src/net_processing.cpp
index 664992dcab..4a7243b82f 100644
--- a/src/net_processing.cpp
+++ b/src/net_processing.cpp
@@ -193,12 +193,12 @@ private:
static constexpr size_t N{199};
mutable Mutex m_mutex;
- std::array<int64_t, N> m_offsets GUARDED_BY(m_mutex){};
+ std::array<std::chrono::seconds, N> m_offsets GUARDED_BY(m_mutex){};
size_t m_index GUARDED_BY(m_mutex){0};
public:
/** Add a new time offset sample. */
- void Add(int64_t offset) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
+ void Add(std::chrono::seconds offset) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
{
AssertLockNotHeld(m_mutex);
LOCK(m_mutex);
@@ -213,15 +213,15 @@ public:
}
/** Compute and return the median of the collected time offset samples. */
- int64_t Median() const EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
+ std::chrono::seconds Median() const EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
{
AssertLockNotHeld(m_mutex);
LOCK(m_mutex);
// Only compute the median from 5 or more samples.
- if (m_index < 4) return 0;
+ if (m_index < 4) return std::chrono::seconds{0};
- std::array<int64_t, N> sorted_copy = m_offsets;
+ std::array<std::chrono::seconds, N> sorted_copy = m_offsets;
std::sort(sorted_copy.begin(), sorted_copy.end());
if (m_index % 2 == 0) {
@@ -3621,11 +3621,11 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
peer->m_time_offset = nTime - GetTime();
if (!pfrom.IsInboundConn()) {
- m_outbound_time_offsets.Add(peer->m_time_offset);
+ m_outbound_time_offsets.Add(std::chrono::seconds{peer->m_time_offset});
auto median = m_outbound_time_offsets.Median();
- if (median > m_opts.time_offset_warn_threshold.count() ||
- median < -m_opts.time_offset_warn_threshold.count()) {
+ if (median > m_opts.time_offset_warn_threshold ||
+ median < -m_opts.time_offset_warn_threshold) {
SetMedianTimeOffsetWarning();
}
}
diff --git a/src/net_processing.h b/src/net_processing.h
index 0fec728ebb..4fb280ed16 100644
--- a/src/net_processing.h
+++ b/src/net_processing.h
@@ -46,7 +46,7 @@ struct CNodeStateStats {
};
struct PeerManagerInfo {
- int64_t median_outbound_time_offset{0};
+ std::chrono::seconds median_outbound_time_offset{0};
bool ignores_incoming_txs{false};
};
diff --git a/src/rpc/net.cpp b/src/rpc/net.cpp
index ea53a974f1..780f1383ed 100644
--- a/src/rpc/net.cpp
+++ b/src/rpc/net.cpp
@@ -673,7 +673,7 @@ static RPCHelpMan getnetworkinfo()
if (node.peerman) {
auto peerman_info{node.peerman->GetInfo()};
obj.pushKV("localrelay", !peerman_info.ignores_incoming_txs);
- obj.pushKV("timeoffset", peerman_info.median_outbound_time_offset);
+ obj.pushKV("timeoffset", peerman_info.median_outbound_time_offset.count());
}
if (node.connman) {
obj.pushKV("networkactive", node.connman->GetNetworkActive());
</details>
Good idea, but I'd prefer to leave that for a follow up
Care to elaborate why? Since this is newly introduced code, seems sensible to do it right away?
Looks like a nice improvement, but still improving my understanding so comments are limited to code review now, and not the concept. It would be helpful if at least some of the commits had a bit more rationale in the description.
```diff
+
+    auto median = m_outbound_time_offsets.Median();
+    if (median > m_opts.time_offset_warn_threshold.count() ||
+        median < -m_opts.time_offset_warn_threshold.count()) {
+        SetMedianTimeOffsetWarning();
+    }
```
This does seem to change warning behaviour, in that:
Is that intentional?
```diff
+    int64_t Median() const EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
+    {
+        LOCK(m_mutex);
+
+        // Only compute the median from 5 or more samples.
+        if (m_index < 4) return 0;
```
dde4e1c6b0cbb49b84e75b9d0d1a92161ba5a499
if m_index is 4, there are 4 samples but we will still compute the median, so the comment is inaccurate.
In order to be consistent with how the code used to be what needs updating is the code and not the comment, doesn't it? Otherwise, we will be computing the median as low as with 4 elements when we used to need 5 before.
I don't know what the motivation to require 5 data points previously was (why not 4 or 3?), but my impression from reading the comments in this PR is that we are trying not to change anything with respect to how the median is computed, and leave that for a follow-up.
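To make the off-by-one concrete, this tiny sketch (hypothetical helper, not PR code) mirrors the `if (m_index < 4) return 0;` guard: the median is already computed at exactly 4 samples, despite the "5 or more" comment:

```cpp
#include <cstddef>

// Illustrative predicate mirroring the guard in the quoted Median():
// returns whether a median would actually be computed for this count.
bool computes_median(size_t sample_count)
{
    return sample_count >= 4;
}
```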
```diff
@@ -84,6 +88,9 @@ class PeerManager : public CValidationInterface, public NetEventsInterface
     /** Get statistics from node state */
     virtual bool GetNodeStateStats(NodeId nodeid, CNodeStateStats& stats) const = 0;

+    /** Get peer manager info. */
```
83c790e3abd7d2434894fb964c21796f1d024e9b
the comment could be more useful, rephrasing the meaning of this info.
I agree with @ariard above that we don't want a chain split after relaxing the consensus rule; perhaps this is worth a discussion on the ML or an irc meeting. I think it's safe due to the buggy protection and whatnot, but I'd bring this up for others.
Otherwise the code looks good.
```diff
+    {
+        LOCK(m_mutex);
+
+        // Stop sampling after N samples.
+        // TODO This maintains a historic bug of adjusted time and could be
+        // changed in the future.
```
nit: Especially since there's no git log for this new code, this comment is pretty difficult to understand for future readers. Would suggest adding a link to e.g. https://github.com/bitcoin/bitcoin/commit/93659379bdedc29a87fe351ba2950a2c976aebd9 for context
```diff
-    AddTimeData(pfrom.addr, nTimeOffset);
+    m_outbound_time_offsets.Add(peer->m_time_offset);
+
+    auto median = m_outbound_time_offsets.Median();
+    if (median > m_opts.time_offset_warn_threshold.count() ||
+        median < -m_opts.time_offset_warn_threshold.count()) {
```
```cpp
if (std::abs(median) > m_opts.time_offset_warn_threshold.count()) {
```
This wouldn't trigger the warning if the median is equal to std::numeric_limits<int64_t>::min()
Came for this. 😕 indeed
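The edge case can be sketched as follows (`beyond_threshold` is an illustrative name; the point is that the two-sided comparison handles every value, while `std::abs(std::numeric_limits<int64_t>::min())` is undefined behaviour):

```cpp
#include <cstdint>
#include <limits>

// Illustrative helper: the two-sided form used in the PR, which also
// classifies INT64_MIN as beyond the threshold without invoking the UB
// that std::abs() would hit on that value.
bool beyond_threshold(int64_t median, int64_t threshold)
{
    return median > threshold || median < -threshold;
}
```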
Concept ACK, and code LGTM 87c577e9742d7154826c755a7fe320f34fd54c81 (but needs rebase).
It is unfortunate that this is not a strict security improvement (by increasing the dependency on the local system clock, which can also be tampered with), but not having consensus rely on a majority of arbitrary peers (phrased very well here) is a worthwhile improvement and in my view probably safer. A wrong system clock should be pretty visible (and cause other issues) anyway, and hardening the system (clock) is not really a task for Core.
That said, I think having a good warning system is helpful, if indeed manipulating a user's system clock is feasible (which I'm not an expert on), but I'm happy to be convinced otherwise on this.
perhaps this is worth a discussion on the ML or irc meeting.
I agree.
My understanding of what nuking network-adjusted time affects:
- `PeerManagerImpl::CanDirectFetch()`
- `HeadersSyncState::m_max_commitments`, which represents the maximum number of headers we accept from a peer

Only rebased for now
```diff
@@ -3548,12 +3608,15 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
     peer->m_starting_height, addrMe.ToStringAddrPort(), fRelay, pfrom.GetId(),
     remoteAddr, (mapped_as ? strprintf(", mapped_as=%d", mapped_as) : ""));

-    int64_t nTimeOffset = nTime - GetTime();
-    pfrom.nTimeOffset = nTimeOffset;
+    peer->m_time_offset = nTime - GetTime();
     if (!pfrom.IsInboundConn()) {
-        // Don't use timedata samples from inbound peers to make it
-        // harder for others to tamper with our adjusted time.
```
I think a version of this comment may still be relevant to motivate why we only sample outbound nodes
```diff
+
+    std::array<int64_t, N> sorted_copy = m_offsets;
+    std::sort(sorted_copy.begin(), sorted_copy.begin() + m_index);
+
+    if (m_index % 2 == 0) {
+        return (sorted_copy[m_index / 2] / 2) +
```
I think the addition should be done before the division to handle rounding effects better, like it's done in timedata. If the two relevant offsets are, for example, 3 and 5, the median with this code would be 3.
The current approach addresses overflow concerns as per #28956 (review)
ok, I see. Still, having a function that returns 3 as the median of {3,5} and 5 as the median of {4,6} seems inconsistent, couldn't we explicitly check and handle overflows instead?
I was going to suggest doing (sorted_copy[m_index / 2] + sorted_copy[m_index / 2 - 1]) / 2 before seeing this comment, since the current approach feels a bit strange. If we keep this I'd say at least add a comment pointing out this is to prevent overflow.
We already assume the median can be rounded down, so I don't think extra rounding changes any assumptions, and therefore I wouldn't overcomplicate things, it's just not worth it imo. I'd be okay with casting to a double first, but it's not strictly better as we're trading off precision at the higher ranges (which I also don't think is important at all).
```cpp
if (m_index % 2 == 0) {
    return (0.5 * sorted_copy[m_index / 2]) +
           (0.5 * sorted_copy[m_index / 2 - 1]);
}
```
Just a code comment would be fine too. I agree that the rounding doesn't matter at all in this context (and it'd also be fine not to calculate the mean in the even case at all), but code tends to get copied and reused.
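A quick check of the rounding difference under discussion (illustrative helpers, not PR code):

```cpp
#include <cstdint>

// Dividing each middle element by 2 before adding truncates both halves,
// so {3, 5} yields 3 rather than the exact midpoint 4, while {4, 6}
// happens to still yield the exact midpoint 5.
int64_t mean_of_halves(int64_t a, int64_t b) { return a / 2 + b / 2; }

// Exact mean for comparison; can overflow for large same-sign inputs.
int64_t mean_exact(int64_t a, int64_t b) { return (a + b) / 2; }
```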
```diff
@@ -41,6 +42,12 @@ struct CNodeStateStats {
     bool m_addr_relay_enabled{false};
     ServiceFlags their_services;
     int64_t presync_height{-1};
+    int64_t time_offset{0};
+};
+
+struct PeerManagerInfo {
+    int64_t median_outbound_time_offset{0};
+    bool ignores_incoming_txs{false};
```
Out of curiosity: are you grouping these two so you can query them together, given you'll need both of them in getnetworkinfo if peerman is defined?
Concept ACK
Concept ACK
```diff
@@ -2185,10 +2185,6 @@ bool Chainstate::ConnectBlock(const CBlock& block, BlockValidationState& state,
     // upgrade from one software version to the next after a consensus rule
     // change is potentially tricky and issue-specific (see NeedsRedownload()
     // for one approach that was used for BIP 141 deployment).
-    // Also, currently the rule against blocks more than 2 hours in the future
```
I think this comment should stay, because NodeClock isn't guaranteed to be monotonic?
This PR now only removes adjusted time from validation code while keeping all the code for the warning in place as is. A follow-up can change the warning (as it has many problems) but this PR will focus on the removal of adjusted time from consensus.
Re-implementing the same warning seemed unnecessary (even if it was less code) and prone to bugs.
```diff
@@ -3805,7 +3805,7 @@ static bool ContextualCheckBlockHeader(const CBlockHeader& block, BlockValidatio
         return state.Invalid(BlockValidationResult::BLOCK_INVALID_HEADER, "time-too-old", "block's timestamp is too early");

     // Check timestamp
-    if (block.Time() > now + std::chrono::seconds{MAX_FUTURE_BLOCK_TIME}) {
+    if (block.Time() > NodeClock::now() + std::chrono::seconds{MAX_FUTURE_BLOCK_TIME}) {
```
This latest force-push seems to have accidentally reverted the necessary doc change in chain.h:
<details> <summary>git diff on ff9039f6ea</summary>
diff --git a/src/chain.h b/src/chain.h
index f9121fb861..ab58fad5b9 100644
--- a/src/chain.h
+++ b/src/chain.h
@@ -19,7 +19,7 @@
/**
* Maximum amount of time that a block timestamp is allowed to exceed the
- * current network-adjusted time before the block will be accepted.
+ * current time before the block will be accepted.
*/
static constexpr int64_t MAX_FUTURE_BLOCK_TIME = 2 * 60 * 60;
</details>
Approach ACK
I think keeping this diff small and focusing on just the (non-warning) functional behaviour change is the best approach to reduce friction and get more high-quality feedback on whether this is indeed a safety improvement (I lean towards yes). Code LGTM ff9039f6ea876bab2c40a06a93e0dd087f445fa2, but I think this warrants a release note since we are changing some security assumptions; making users aware of that seems like the way to go imo.
nit: latest force push regressed a couple of includes:
<details> <summary>git diff on ff9039f6ea</summary>
diff --git a/src/headerssync.cpp b/src/headerssync.cpp
index e1dfe10483..e14de004f5 100644
--- a/src/headerssync.cpp
+++ b/src/headerssync.cpp
@@ -5,8 +5,8 @@
#include <headerssync.h>
#include <logging.h>
#include <pow.h>
-#include <timedata.h>
#include <util/check.h>
+#include <util/time.h>
#include <util/vector.h>
// The two constants below are computed using the simulation script in
diff --git a/src/net_processing.cpp b/src/net_processing.cpp
index b426b889cf..8c43377875 100644
--- a/src/net_processing.cpp
+++ b/src/net_processing.cpp
@@ -41,6 +41,7 @@
#include <txrequest.h>
#include <util/check.h>
#include <util/strencodings.h>
+#include <util/time.h>
#include <util/trace.h>
#include <validation.h>
diff --git a/src/node/miner.cpp b/src/node/miner.cpp
index 89498fb976..87f40e993f 100644
--- a/src/node/miner.cpp
+++ b/src/node/miner.cpp
@@ -20,8 +20,8 @@
#include <policy/policy.h>
#include <pow.h>
#include <primitives/transaction.h>
-#include <timedata.h>
#include <util/moneystr.h>
+#include <util/time.h>
#include <validation.h>
#include <algorithm>
diff --git a/src/rpc/mining.cpp b/src/rpc/mining.cpp
index 069a451089..929247d32b 100644
--- a/src/rpc/mining.cpp
+++ b/src/rpc/mining.cpp
@@ -27,11 +27,11 @@
#include <script/descriptor.h>
#include <script/script.h>
#include <script/signingprovider.h>
-#include <timedata.h>
#include <txmempool.h>
#include <univalue.h>
#include <util/strencodings.h>
#include <util/string.h>
+#include <util/time.h>
#include <util/translation.h>
#include <validation.h>
#include <validationinterface.h>
diff --git a/src/timedata.h b/src/timedata.h
index 90428d071c..17e7539700 100644
--- a/src/timedata.h
+++ b/src/timedata.h
@@ -5,8 +5,6 @@
#ifndef BITCOIN_TIMEDATA_H
#define BITCOIN_TIMEDATA_H
-#include <util/time.h>
-
#include <algorithm>
#include <cassert>
#include <chrono>
</details>
> This PR now only removes adjusted time from validation code while keeping all the code for the warning in place as is. A follow-up can change the warning (as it has many problems) but this PR will focus on the removal of adjusted time from consensus.
> Re-implementing the same warning seemed unnecessary (even if it was less code) and prone to bugs.
Does this mean that the adjusted time is now being collected and used exclusively to raise the warning?
> Does this mean that the adjusted time is now being collected and used exclusively to raise the warning?
This was the case before the force-push, too. In this current approach, the diff is much smaller so review can focus more on whether we're okay removing network-adjusted time, instead of on the adjusted time calculation refactoring.
ACK ff9039f6ea876bab2c40a06a93e0dd087f445fa2
After a few attempts, I haven't found a way to trigger a consensus fork through adjusted time. It's not like this could "infect" half of the network (legacy) through block propagation or something. Because of that, if there was/is a legacy way to exploit it (say an overflow in nTimeOffset calculation), this new code won't make it worse.
lgtm ACK ff9039f6ea876bab2c40a06a93e0dd087f445fa2 🤽
<details><summary>Show signature</summary>
Signature:
untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
trusted comment: lgtm ACK ff9039f6ea876bab2c40a06a93e0dd087f445fa2 🤽
hjCKZbc2j48mFkBbdMp6XYwvC+JpHi9gtElrJyy4RJVJjbMvsAntWQAHbs4leVk6OCxxLaX5X9PRIWR3wRQ/DA==
</details>
ACK ff9039f6ea876bab2c40a06a93e0dd087f445fa2
I would prefer the commit message to contain a rationale for the change, though. It's currently empty.
I now believe that the network-adjusted time implementation in master does not offer any security guarantees against an attacker trying to get a node out of consensus. The existing NTP vulnerabilities (which seem to be very real, even if currently apparently not exploited too often) can just as well be used by an attacker when network-adjusted time is used, given that we limit the network adjustment to 70 minutes. As such, this PR does indeed seem like a strong improvement to me.
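For context on the 70-minute limit mentioned here, below is a simplified sketch of the legacy timedata behaviour being removed: the node collected time-offset samples from outbound peers, took the median, and clamped the resulting adjustment to ±70 minutes. Names and structure are illustrative, not the exact implementation in timedata.cpp.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Legacy cap on how far peer samples could shift the node's notion of time
// (DEFAULT_MAX_TIME_ADJUSTMENT was 70 * 60 seconds).
static constexpr int64_t MAX_TIME_ADJUSTMENT = 70 * 60;

// Compute the median of the peer-reported offsets and clamp it. An attacker
// feeding extreme offsets can shift the node by at most 70 minutes either
// way, which is why this mechanism adds little protection on top of NTP.
int64_t ClampedMedianOffset(std::vector<int64_t> samples)
{
    if (samples.empty()) return 0;
    std::sort(samples.begin(), samples.end());
    const int64_t median = samples[samples.size() / 2];
    if (std::llabs(median) > MAX_TIME_ADJUSTMENT) {
        return median > 0 ? MAX_TIME_ADJUSTMENT : -MAX_TIME_ADJUSTMENT;
    }
    return median;
}
```

The clamp cuts both ways: it bounds what hostile peers can do, but it equally bounds how much honest peers can correct a clock that NTP has already skewed, which is the core of the "no security guarantee" argument.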
Ensuring an accurate clock is a system responsibility, and should not be a Bitcoin Core concern. Removing this logic simplifies the code, and removes the assumption that we need a majority of honest outbound nodes. Moreover, it seems trivial to implement an external daemon that monitors Bitcoin network time and e.g. adjusts the local clock or shuts down Bitcoin Core when a large offset is detected, for users that wish to harden their system this way.
ACK ff9039f6ea876bab2c40a06a93e0dd087f445fa2
ACK ff9039f6ea876bab2c40a06a93e0dd087f445fa2
> After a few attempts, I haven't found a way to trigger a consensus fork through adjusted time.
Unclear what has been tested, actually (e.g. setting the sampled times of peers in the past and playing with block timestamps). That said, I think it's a very edge-case scenario, and most likely to affect non-upgraded things like btcd.
> I now believe that the network-adjusted time implementation in master does not offer any security guarantees against an attacker trying to get a node out of consensus. The existing NTP vulnerabilities (which seem to be very real, even if currently apparently not exploited too often) can just as well be used by an attacker when network-adjusted time is used, given that we limit the network adjustment to 70 minutes.
NTP has its own data / control plane, while our peer connections go over TCP. You might compromise the NTP stack but not TCP, and vice versa.
> Ensuring an accurate clock is a system responsibility, and should not be a Bitcoin Core concern
With arguments of that kind we could also remove assumevalid / assumeutxo and say that performance issues on a poor host are not a Core concern.
The more we are able to compensate for weaknesses at the network infrastructure level, the stronger the Core node ecosystem becomes.
> Moreover, it seems trivial to implement an external daemon that monitors Bitcoin network time and e.g. adjusts the local clock or shuts down Bitcoin Core when a large offset is detected, for users that wish to harden their system this way.
I'll leave it as an exercise to the reader to define what should be the authoritative source of time for this external monitoring daemon :) My belief is that we should conserve a warning system based on other peers' nVersion announcements. @dergoegge I think it would be good if you notified btcd and other validation software carrying a minimum of economic traffic about this change, at least to avoid future issues like the BIP68 consensus discrepancy you found recently.
> My belief is that we should conserve a warning system based on other peers' nVersion announcements.
Yes, this is exactly what this pull request did. See also the description, which said: "while the warning to users if their clock is out of sync with the rest of the network remains."
See also: https://bitcoinops.org/en/newsletters/2024/02/07/#bitcoin-core-28956
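To illustrate what "the warning remains" means in practice, here is a hypothetical sketch (function name, sample-count minimum, and threshold are all illustrative, not the actual Bitcoin Core code) of an out-of-sync check in this spirit: sample time offsets from peers' version messages and warn the user when the median suggests the local clock is off, without ever feeding that median back into validation.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Return true if the median of peer-reported time offsets (in seconds)
// exceeds the given threshold, suggesting the local clock is out of sync.
// Only a warning is raised; the offset is never applied to consensus code.
bool ClockAppearsOutOfSync(std::vector<int64_t> peer_offsets, int64_t max_offset_seconds)
{
    // A handful of samples gives a meaningless median; the legacy code
    // likewise waited for a minimum number of outbound samples.
    if (peer_offsets.size() < 5) return false;
    std::sort(peer_offsets.begin(), peer_offsets.end());
    const int64_t median = peer_offsets[peer_offsets.size() / 2];
    return std::llabs(median) > max_offset_seconds;
}
```

Restricting the samples to outbound peers (as the diff in the header does with `m_outbound_time_offsets`) makes it harder for an attacker to trigger spurious warnings via cheap inbound connections.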
> Yes, this is exactly what this pull request did. See also the description, which said: "while the warning to users if their clock is out of sync with the rest of the network remains."
That I know. It was more related to using an external daemon that naively uses the same NTP source as your Bitcoin Core instance. There is no such thing as Bitcoin network time, just a sampling of peers' nVersion times. Yes, it works if this external daemon consumes Core warning messages and takes corrective actions on them. And even then, the correction frequency has to be weighted carefully (do you scale the correction frequency on the internal clock itself, or on received peer messages?)
FYI, some NTP attacks were actually targeting NTP pool monitoring infrastructure; see this paper.
For future reference, related: https://bitcoincore.org/en/2024/07/03/disclose-timestamp-overflow.