This picks up parts of #25908.
The use of adjusted time is removed from validation code while the warning to users if their clock is out of sync with the rest of the network remains.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
For detailed information about the code coverage, see the test coverage report.
See the guideline for information on the review process.
Type | Reviewers |
---|---|
ACK | naumenkogs, maflcko, stickies-v, achow101 |
Concept ACK | ajtowns, RandyMcMillan, fanquake, Pttn, ariard, TheCharlatan, sr-gi |
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
87@@ -84,6 +88,9 @@ class PeerManager : public CValidationInterface, public NetEventsInterface
88 /** Get statistics from node state */
89 virtual bool GetNodeStateStats(NodeId nodeid, CNodeStateStats& stats) const = 0;
90
91+ /** Get stats from peer manager. */
92+ virtual PeerManagerStats GetStats() const = 0;
GetInfo()
and combine it with IgnoresIncomingTxs()
?
672@@ -674,7 +673,7 @@ static RPCHelpMan getnetworkinfo()
673 if (node.peerman) {
674 obj.pushKV("localrelay", !node.peerman->IgnoresIncomingTxs());
675 }
676- obj.pushKV("timeoffset", GetTimeOffset());
677+ obj.pushKV("timeoffset", node.peerman->GetStats().median_outbound_time_offset);
node.peerman
being valid, see the previous line.
3600@@ -3567,6 +3601,8 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
3601 // Don't use timedata samples from inbound peers to make it
3602 // harder for others to tamper with our adjusted time.
3603 AddTimeData(pfrom.addr, peer->m_time_offset);
3604+
3605+ m_outbound_time_offsets.Add(peer->m_time_offset);
Having a rolling set of 10 time offsets from every outbound connection we make (full outbounds, block-relay-only, extra block-relay-only, feelers?) seems somewhat risky: at present, you’ll get odd behaviour if a high-proportion of the first 100 outbound peers you connect to have a different time; with this change, you’ll get a warning if the last 10 peers you randomly connect to have a dodgy view of the time?
Maybe consider making this conditional on IsManualOrFullOutboundConn()
instead; that way it only updates rarely for things we’re really trying to connect to, and isn’t mostly made up of random peers we’re briefly sampling?
Maybe consider making this conditional on IsManualOrFullOutboundConn()
Done
3612@@ -3611,6 +3613,12 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
3613 AddTimeData(pfrom.addr, peer->m_time_offset);
3614
3615 m_outbound_time_offsets.Add(peer->m_time_offset);
3616+
3617+ auto median = m_outbound_time_offsets.Median();
3618+ if (median > TIME_OFFSET_WARN_THRESHOLD.count() ||
3619+ median < -TIME_OFFSET_WARN_THRESHOLD.count()) {
3620+ SetMiscWarning(Untranslated("Please check that your computer's date and time are correct"));
SetMiscWarning
seems a poor interface; if there’s two warnings, the latest one overrides the first, and you can’t remove warnings in the event that they resolve themselves. Might be better to do something like SetfLargeWorkInvalidChainFound(bool)
?
175@@ -175,6 +176,7 @@ static constexpr double MAX_ADDR_RATE_PER_SECOND{0.1};
176 static constexpr size_t MAX_ADDR_PROCESSING_TOKEN_BUCKET{MAX_ADDR_TO_SEND};
177 /** The compactblocks version we support. See BIP 152. */
178 static constexpr uint64_t CMPCTBLOCKS_VERSION{2};
179+static constexpr std::chrono::seconds TIME_OFFSET_WARN_THRESHOLD{30min};
DEFAULT_MAX_TIME_ADJUSTMENT
. 70min avoids warning if you’re buggy by an hour due to (you or your peers) getting daylight savings time wrong, which seems like a vaguely plausible thing to continue doing?
495@@ -497,7 +496,6 @@ void SetupServerArgs(ArgsManager& argsman)
496 argsman.AddArg("-maxconnections=<n>", strprintf("Maintain at most <n> automatic connections to peers (default: %u). This limit does not apply to connections manually added via -addnode or the addnode RPC, which have a separate limit of %u.", DEFAULT_MAX_PEER_CONNECTIONS, MAX_ADDNODE_CONNECTIONS), ArgsManager::ALLOW_ANY, OptionsCategory::CONNECTION);
497 argsman.AddArg("-maxreceivebuffer=<n>", strprintf("Maximum per-connection receive buffer, <n>*1000 bytes (default: %u)", DEFAULT_MAXRECEIVEBUFFER), ArgsManager::ALLOW_ANY, OptionsCategory::CONNECTION);
498 argsman.AddArg("-maxsendbuffer=<n>", strprintf("Maximum per-connection memory usage for the send buffer, <n>*1000 bytes (default: %u)", DEFAULT_MAXSENDBUFFER), ArgsManager::ALLOW_ANY, OptionsCategory::CONNECTION);
499- argsman.AddArg("-maxtimeadjustment", strprintf("Maximum allowed median peer time offset adjustment. Local perspective of time may be influenced by outbound peers forward or backward by this amount (default: %u seconds).", DEFAULT_MAX_TIME_ADJUSTMENT), ArgsManager::ALLOW_ANY, OptionsCategory::CONNECTION);
209+ AssertLockNotHeld(m_mutex);
210+ std::array<int64_t, N> sorted_copy = WITH_LOCK(m_mutex, return m_offsets);
211+ std::sort(sorted_copy.begin(), sorted_copy.end());
212+
213+ if constexpr (N % 2 == 0) {
214+ return (sorted_copy[N / 2 - 1] + sorted_copy[N / 2]) / 2;
min(N, m_index)
instead of N
?
122658f95b36b7f940351bd34a89e13831931f18
should we make this addition overflow safe?
The idea is that m_index
is always set to the next index in m_offsets
that gets written to (so a value out of [0, N)
) and we just assume that the initial N samples were 0, which is why the median is immediately computed out of N.
I could change it to always increase m_index
instead and add new offsets by doing m_offsets[m_index % N] = new_offset
, in which case your suggestion would allow us to compute the median more “accurately” while we haven’t seen N samples yet. I’m not sure though if that is an improvement,. e.g. do we want to issue a warning if only the first outbound connection has a offset larger than our threshold?
N
s from this class.
do we want to issue a warning if only the first outbound connection has a offset larger than our threshold?
This is the main question I guess. So, with the current code, having 4 peers would never trigger it, right? Because median will always be 0.
So, with the current code, having 4 peers would never trigger it, right? Because median will always be 0.
Yes and this sort of (on accident) mimics how/when we trigger the warning on master at 5 or more samples: https://github.com/bitcoin/bitcoin/blob/dde7ac5c704688c8a9af29bd07e5ae8114824ce7/src/timedata.cpp#L77
3615+
3616+ if (pfrom.IsManualOrFullOutboundConn()) {
3617+ // Only sample from outbound connections we are really trying to connect to.
3618+ m_outbound_time_offsets.Add(peer->m_time_offset);
3619+
3620+ auto median = m_outbound_time_offsets.Median();
I think one advance in term of robustness of this local time monitoring mechanism could be to add feeler ’s time offset (ConnectionType::FEELER
) to reserved slots in the index (e.g +3 on the 10 full-outbound connections). Those peers connections are already triggered at regular interval, which is not necessarily the case of outbound peer topology if the peers are stable and performant.
I guess a NTP-attacker could silently time shift the local clock of a victim, without this warning logic ever triggering if no new outbound or manual connections are ever opened during the exploitation.
I see your point but I think I would like to keep discussion about improving the warning condition to a future PR. The current adjusted time mechanism has the same flaw and is even worse as it stops sampling entirely after 200 samples.
Maybe we can collect some stats on how often outbound connections rotate and how many samples from which connections are actually useful to inform such a change.
I will add the release note.
stops sampling entirely after 200 samples
perhaps record this bug in the commit message, might be useful in the future?
Concept ACK
I would recommend to add an explicit release note, inciting nodes operators to monitor this new warning, especially if they’re running time-sensitive apps. Exploiting NTP to target bitcoin has been suggested both in NTP security literature and the time-dilation paper (I’ve never seen or heard any in the wild, contrary to some BGP attacks in other cryptos though I believe this is fairly practical). E.g
0new local clock monitoring
1==========================
2
3The notion of adjusted time relying on the median of other network nodes clocks is removed.
4Instead, a rolling sample of the time offset from local clock of each of the last 10 outbound
5peers is maintained and the median computed. If the yielding result is more or less than
670 min, a node warning is issued.
7
8This warning can be consumed by a call to `getblockchaininfo`. If the node is backing an
9application or protocol with stringent time-sensitive requirements and high funds at stake
10(e.g a lightning or time-enabled wallet) it is recommended to periodically run a monitoring
11job and in presence of warning automatically notify the operator.
200+ void Add(int64_t offset) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
201+ {
202+ AssertLockNotHeld(m_mutex);
203+ LOCK(m_mutex);
204+ m_offsets[m_index] = offset;
205+ m_index = (1 + m_index) % N;
6c65f61b625212e128e5db52cbd80a3b57b1450b
nit: your code assumes that N is less than the capacity of size_t
, otherwise this addition can overflow. You might consider adding a static assert for that too, just in case someone in the future decides to use TimeOffsets<bazillion>
3602@@ -3604,11 +3603,6 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
3603 remoteAddr, (mapped_as ? strprintf(", mapped_as=%d", mapped_as) : ""));
3604
3605 peer->m_time_offset = nTime - GetTime();
3606- if (!pfrom.IsInboundConn()) {
BLOCK_RELAY
, FEELER
, ADDR_FETCH
anymore, right?
28@@ -28,6 +29,12 @@ void SetfLargeWorkInvalidChainFound(bool flag)
29 fLargeWorkInvalidChainFound = flag;
30 }
31
32+void SetMedianTimeOffsetWarning()
33+{
34+ LOCK(g_warnings_mutex);
35+ g_timeoffset_warning = true;
Changed the approach.
This now still deletes all the adjusted time code and removes it from consensus code but keeps the warning condition for a large median time offset as it is on master. Improvements to the warning can be done in a follow up.
This should now be a very straight forward review, since we don’t need to discuss a new warning mechanism.
Looking at the diff between the two versions, I see these changes:
FEELER
, ADDRFETCH
, BLOCKRELAY
are also considered)int64_max/2
; but i don’t care that much)Not sure i see keeps warning condition for a large median time offset as it is on master
.
Generally, while I think it’s fine to leave (1) for the latter PR, re-introducing the bug (2) is somewhat unfortunate:
(3) should be kept for either version of course.
Removing adjusted time from consensus is a big win. Having time samples received from peers be involved in consensus checks is not great and not effective against NTP based attacks (https://github.com/bitcoin/bitcoin/pull/25908#issuecomment-1264632475, #25908 (comment)). The warning itself is not as interesting to me but most of the review on both #25908 and this PR is about the warning and its threshold.
I don’t want changing the warning (small win) to distract from removing adjusted time from consensus (big win).
an alternative would divide each element by 2 and then add without overflow risk
Done.
Removing adjusted time from consensus is a big win. Having time samples received from peers be involved in consensus checks is not great and not effective against NTP based attacks (#25908 (comment), #25908 (comment)). The warning itself is not as interesting to me but most of the review on both #25908 and this PR is about the warning and its threshold.
Currently, we’re adjusting our local clock from the median of the first 199 outbound peers we’re connecting to (only the first 199 due to the oddness bug in AddTimeData()
). In case of the occurence of a NTP-based attack (e.g a malicious network delay on traffic between the local system NTP client and its timeservers), this median-adjusted time mechanism can generate a salvatory warning towards the node operators (the "Please check that your computer's date and time are correct!"
). However, this mechanism comes as a security benefit only if the attack is active when you start your local bitcoind and until the first 199 outbound peers connection have been sampled (which might last for hours / days, given our only 10 outbound connections slots at the moment). So there is a robustness benefit brought by the current adjusted time mechanism and this is has been one of my main concern brought in #25908.
On the other hand, this mechanism can allow your outbound peers topology to correct your local time (as used for some consensus rules validation in ContextualCheckBlockHeader()
) in the boundaries of -/+ DEFAULT_MAX_TIME_ADJUSTEMENT
. This exploit assumes you’re able to corrupt or compromise 100 outbound peers (as it’s a median), which has to be done in considerations of all our hardenings in AddrMan
and some low-hashrate capabilities. Those 2 elements in my appreciation present a very high-bar (at least compared to state-of-art practical NTP-based attacks).
As such, I reasonably think maintaining the warning mechanism is valuable at the very least to maintain current robustness of our consensus validation code in face of NTP attacks. This always give the chance of node operators noticing time anomalies from their logs, which is welcome robustness if you’re operating a time-sensitive bitcoin infrastructure imo. I think we should be better to keep a large buffer of sampled outbound peers (e.g in the order of 100 outbound peers as right now) for this current change itself, and fix the oddness bug.
For sure, we can add additional robustness mechanisms in follow-up PRs, e.g “random drop” of sampeld peers time offset to add uncertainty in the eyes of adversaries targeting neutralization of the warning mechanism or including feelers connections in sampled offsets.
Another source of concern, the ajdusted-time is implemented by other full-nodes with at least some other real-world economic traffic. If we’re modifying our consensus-definition of time, and they stay on the old one, this could be source of consensus discrepancies in some edge scenarios. True, this could be a source of concern for non-upraded Bitcoin Core nodes.
From a deployment viewpoint, my recommendation is to notify on the maling list or bitcoin forums other bitcoin full-nodes implementations of this change, and some “rough” deployment timeline in function of Bitcoin Core’s own 27.0 or 28.0 release schedules. I think there is a lesson to be learnt from the old Bitcoin 0.8 LevelDB upgrade.
See, btcd code: https://github.com/btcsuite/btcd/blob/master/blockchain/mediantime.go#L74
0// medianTime provides an implementation of the MedianTimeSource interface.
1// It is limited to maxMedianTimeEntries includes the same buggy behavior as
2// the time offset mechanism in Bitcoin Core. This is necessary because it is
3// used in the consensus code.
4type medianTime struct {
5 mtx sync.Mutex
6 knownIDs map[string]struct{}
7 offsets []int64
8 offsetSecs int64
9 invalidTimeChecked bool
10}
228+ return sorted_copy[m_index / 2];
229+ } else {
230+ return (sorted_copy[m_index / 2] / 2) +
231+ (sorted_copy[m_index / 2 - 1] / 2);
232+ }
233+ }
sorted_copy
being an array of size N
, I think we’d be sorting uninitialized values if m_index < N
?Potential fix addressing all 3:
0diff --git a/src/net_processing.cpp b/src/net_processing.cpp
1index 664992dcab..9ade704fe9 100644
2--- a/src/net_processing.cpp
3+++ b/src/net_processing.cpp
4@@ -215,20 +215,24 @@ public:
5 /** Compute and return the median of the collected time offset samples. */
6 int64_t Median() const EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
7 {
8- AssertLockNotHeld(m_mutex);
9- LOCK(m_mutex);
10+ std::vector<int64_t> sorted_copy;
11+ {
12+ AssertLockNotHeld(m_mutex);
13+ LOCK(m_mutex);
14
15- // Only compute the median from 5 or more samples.
16- if (m_index < 4) return 0;
17+ // Only compute the median from 5 or more samples.
18+ if (m_index < 4) return 0;
19+
20+ sorted_copy.assign(m_offsets.begin(), m_offsets.begin() + m_index);
21+ }
22
23- std::array<int64_t, N> sorted_copy = m_offsets;
24 std::sort(sorted_copy.begin(), sorted_copy.end());
25+ auto mid{sorted_copy.size() / 2};
26
27- if (m_index % 2 == 0) {
28- return sorted_copy[m_index / 2];
29+ if (sorted_copy.size() % 2 == 0) {
30+ return (sorted_copy[mid] / 2) + (sorted_copy[mid - 1] / 2);
31 } else {
32- return (sorted_copy[m_index / 2] / 2) +
33- (sorted_copy[m_index / 2 - 1] / 2);
34+ return sorted_copy[mid];
35 }
36 }
37 };
I think the calculation is wrong: the average of the 2 middle elements should be taken in case of an even number of elements.
yes🤦
UB: sorted_copy being an array of size N, I think we’d be sorting uninitialized values if m_index < N?
No, m_offsets
is not uninitialized
No,
m_offsets
is not uninitialized
The array is initialized, ~but the items are not~, and afaik int64_t
doesn’t have a default constructor?
213+ }
214+
215+ /** Compute and return the median of the collected time offset samples. */
216+ int64_t Median() const EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
217+ {
218+ AssertLockNotHeld(m_mutex);
670@@ -672,9 +671,10 @@ static RPCHelpMan getnetworkinfo()
671 obj.pushKV("localservicesnames", GetServicesNames(services));
672 }
673 if (node.peerman) {
674- obj.pushKV("localrelay", !node.peerman->IgnoresIncomingTxs());
675+ auto peerman_info{node.peerman->GetInfo()};
const nit
0 const auto peerman_info{node.peerman->GetInfo()};
191+private:
192+ // Historic maximum number of time offsets we include in our median.
193+ static constexpr size_t N{199};
194+
195+ mutable Mutex m_mutex;
196+ std::array<int64_t, N> m_offsets GUARDED_BY(m_mutex){};
Wdyt about making the offsets typesafe with std::chrono::seconds
?
0diff --git a/src/net_processing.cpp b/src/net_processing.cpp
1index 664992dcab..4a7243b82f 100644
2--- a/src/net_processing.cpp
3+++ b/src/net_processing.cpp
4@@ -193,12 +193,12 @@ private:
5 static constexpr size_t N{199};
6
7 mutable Mutex m_mutex;
8- std::array<int64_t, N> m_offsets GUARDED_BY(m_mutex){};
9+ std::array<std::chrono::seconds, N> m_offsets GUARDED_BY(m_mutex){};
10 size_t m_index GUARDED_BY(m_mutex){0};
11
12 public:
13 /** Add a new time offset sample. */
14- void Add(int64_t offset) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
15+ void Add(std::chrono::seconds offset) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
16 {
17 AssertLockNotHeld(m_mutex);
18 LOCK(m_mutex);
19@@ -213,15 +213,15 @@ public:
20 }
21
22 /** Compute and return the median of the collected time offset samples. */
23- int64_t Median() const EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
24+ std::chrono::seconds Median() const EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
25 {
26 AssertLockNotHeld(m_mutex);
27 LOCK(m_mutex);
28
29 // Only compute the median from 5 or more samples.
30- if (m_index < 4) return 0;
31+ if (m_index < 4) return std::chrono::seconds{0};
32
33- std::array<int64_t, N> sorted_copy = m_offsets;
34+ std::array<std::chrono::seconds, N> sorted_copy = m_offsets;
35 std::sort(sorted_copy.begin(), sorted_copy.end());
36
37 if (m_index % 2 == 0) {
38@@ -3621,11 +3621,11 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
39
40 peer->m_time_offset = nTime - GetTime();
41 if (!pfrom.IsInboundConn()) {
42- m_outbound_time_offsets.Add(peer->m_time_offset);
43+ m_outbound_time_offsets.Add(std::chrono::seconds{peer->m_time_offset});
44
45 auto median = m_outbound_time_offsets.Median();
46- if (median > m_opts.time_offset_warn_threshold.count() ||
47- median < -m_opts.time_offset_warn_threshold.count()) {
48+ if (median > m_opts.time_offset_warn_threshold ||
49+ median < -m_opts.time_offset_warn_threshold) {
50 SetMedianTimeOffsetWarning();
51 }
52 }
53diff --git a/src/net_processing.h b/src/net_processing.h
54index 0fec728ebb..4fb280ed16 100644
55--- a/src/net_processing.h
56+++ b/src/net_processing.h
57@@ -46,7 +46,7 @@ struct CNodeStateStats {
58 };
59
60 struct PeerManagerInfo {
61- int64_t median_outbound_time_offset{0};
62+ std::chrono::seconds median_outbound_time_offset{0};
63 bool ignores_incoming_txs{false};
64 };
65
66diff --git a/src/rpc/net.cpp b/src/rpc/net.cpp
67index ea53a974f1..780f1383ed 100644
68--- a/src/rpc/net.cpp
69+++ b/src/rpc/net.cpp
70@@ -673,7 +673,7 @@ static RPCHelpMan getnetworkinfo()
71 if (node.peerman) {
72 auto peerman_info{node.peerman->GetInfo()};
73 obj.pushKV("localrelay", !peerman_info.ignores_incoming_txs);
74- obj.pushKV("timeoffset", peerman_info.median_outbound_time_offset);
75+ obj.pushKV("timeoffset", peerman_info.median_outbound_time_offset.count());
76 }
77 if (node.connman) {
78 obj.pushKV("networkactive", node.connman->GetNetworkActive());
3628+
3629+ auto median = m_outbound_time_offsets.Median();
3630+ if (median > m_opts.time_offset_warn_threshold.count() ||
3631+ median < -m_opts.time_offset_warn_threshold.count()) {
3632+ SetMedianTimeOffsetWarning();
3633+ }
This does seem to change warning behaviour, in that:
Is that intentional?
215+ int64_t Median() const EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
216+ {
217+ LOCK(m_mutex);
218+
219+ // Only compute the median from 5 or more samples.
220+ if (m_index < 4) return 0;
m_index
is 4, there are 4 samples but we will still compute the median, so the comment is inaccurate.
In order to be consistent with how the code used to be what needs updating is the code and not the comment, doesn’t it? Otherwise, we will be computing the median as low as with 4 elements when we used to need 5 before.
I don’t know what the motivation to require 5 data points previously was (why not 4 or 3?), but my impression by reading the comments in this PR is that we are trying no to change anyting with respect to how the median is computed and leave that for a followup
87@@ -84,6 +88,9 @@ class PeerManager : public CValidationInterface, public NetEventsInterface
88 /** Get statistics from node state */
89 virtual bool GetNodeStateStats(NodeId nodeid, CNodeStateStats& stats) const = 0;
90
91+ /** Get peer manager info. */
83c790e3abd7d2434894fb964c21796f1d024e9b
the comment could be more useful, rephrasing the meaning of this info.
I agree with @ariard above we don’t want a chain split after relaxing the consensus rule, perhaps this is worthy a discussion on the ML or irc meeting. I think it’s safe due to the buggy protection and whatnot, but I’d bring this up for others.
Otherwise the code looks good.
202+ {
203+ LOCK(m_mutex);
204+
205+ // Stop sampling after N samples.
206+ // TODO This maintains a historic bug of adjusted time and could be
207+ // changed in the future.
3626- AddTimeData(pfrom.addr, nTimeOffset);
3627+ m_outbound_time_offsets.Add(peer->m_time_offset);
3628+
3629+ auto median = m_outbound_time_offsets.Median();
3630+ if (median > m_opts.time_offset_warn_threshold.count() ||
3631+ median < -m_opts.time_offset_warn_threshold.count()) {
0 if (std::abs(median) > m_opts.time_offset_warn_threshold.count()) {
std::numeric_limits<int64_t>::min()
Concept ACK, and code LGTM 87c577e9742d7154826c755a7fe320f34fd54c81 (but needs rebase).
It is unfortunate that this is not a strict security improvement (by increasing the dependency on the local system clock, which can also be tampered with), but not having consensus rely on a majority of arbitrary peers (phrased very well here) is a worthwhile improvement and in my view probably safer. A wrong system clock should be pretty visible (and cause other issues) anyway, and hardening the system (clock) is not really a task for Core.
That said, I think having a good warning system is helpful, if indeed manipulating a user’s system clock is feasible (which I’m not an expert on), but I’m happy to be convinced otherwise on this.
perhaps this is worthy a discussion on the ML or irc meeting.
I agree.
My understanding of what nuking network-adjusted time affects:
PeerManagerImpl::CanDirectFetch()
)HeadersSyncState::m_max_commitments
which represents the maximum number of headers we accept from a peer3607@@ -3548,12 +3608,15 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
3608 peer->m_starting_height, addrMe.ToStringAddrPort(), fRelay, pfrom.GetId(),
3609 remoteAddr, (mapped_as ? strprintf(", mapped_as=%d", mapped_as) : ""));
3610
3611- int64_t nTimeOffset = nTime - GetTime();
3612- pfrom.nTimeOffset = nTimeOffset;
3613+ peer->m_time_offset = nTime - GetTime();
3614 if (!pfrom.IsInboundConn()) {
3615- // Don't use timedata samples from inbound peers to make it
3616- // harder for others to tamper with our adjusted time.
222+
223+ std::array<int64_t, N> sorted_copy = m_offsets;
224+ std::sort(sorted_copy.begin(), sorted_copy.begin() + m_index);
225+
226+ if (m_index % 2 == 0) {
227+ return (sorted_copy[m_index / 2] / 2) +
timedata
. If the two relevant offsets are, for example, 3 and 5, the median with this code would be 3.
sorted_copy[m_index / 2] + sorted_copy[m_index / 2 - 1]) / 2))
before seeing this comment since the current approach feels a bit strange. If we keep this I’d say at least add a comment pointing out this is to prevent overflow
We already assume the median can be rounded down, so I don’t think extra rounding changes any assumptions, and therefore I wouldn’t overcomplicate things, it’s just not worth it imo. I’d be okay with casting to a double first, but it’s not strictly better as we’re trading off precision at the higher ranges (which I also don’t think is important at all).
0 if (m_index % 2 == 0) {
1 return (0.5 * sorted_copy[m_index / 2]) +
2 (0.5 * sorted_copy[m_index / 2 - 1]);
3 }
41@@ -41,6 +42,12 @@ struct CNodeStateStats {
42 bool m_addr_relay_enabled{false};
43 ServiceFlags their_services;
44 int64_t presync_height{-1};
45+ int64_t time_offset{0};
46+};
47+
48+struct PeerManagerInfo {
49+ int64_t median_outbound_time_offset{0};
50+ bool ignores_incoming_txs{false};
getnetworkinfo
if peerman is defined?
2184@@ -2185,10 +2185,6 @@ bool Chainstate::ConnectBlock(const CBlock& block, BlockValidationState& state,
2185 // upgrade from one software version to the next after a consensus rule
2186 // change is potentially tricky and issue-specific (see NeedsRedownload()
2187 // for one approach that was used for BIP 141 deployment).
2188- // Also, currently the rule against blocks more than 2 hours in the future
NodeClock
isn’t guaranteed to be monotonic?
This PR now only removes adjusted time from validation code while keeping all the code for the warning in place as is. A follow-up can change the warning (as it has many problems) but this PR will focus on the removal of adjusted time from consensus.
Re-implementing the same warning seemed unnecessary (even if it was less code) and prone to bugs.
3804@@ -3805,7 +3805,7 @@ static bool ContextualCheckBlockHeader(const CBlockHeader& block, BlockValidatio
3805 return state.Invalid(BlockValidationResult::BLOCK_INVALID_HEADER, "time-too-old", "block's timestamp is too early");
3806
3807 // Check timestamp
3808- if (block.Time() > now + std::chrono::seconds{MAX_FUTURE_BLOCK_TIME}) {
3809+ if (block.Time() > NodeClock::now() + std::chrono::seconds{MAX_FUTURE_BLOCK_TIME}) {
This latest force-push seems to have accidentally reverted the necessary doc change in chain.h
:
0diff --git a/src/chain.h b/src/chain.h
1index f9121fb861..ab58fad5b9 100644
2--- a/src/chain.h
3+++ b/src/chain.h
4@@ -19,7 +19,7 @@
5
6 /**
7 * Maximum amount of time that a block timestamp is allowed to exceed the
8- * current network-adjusted time before the block will be accepted.
9+ * current time before the block will be accepted.
10 */
11 static constexpr int64_t MAX_FUTURE_BLOCK_TIME = 2 * 60 * 60;
12
Approach ACK
I think keeping this diff small and focusing on just the (non-warning) functional behaviour change is the best approach to reduce friction to get more high quality feedback on whether this is indeed a safety improvement (I lean towards yes). Code LGTM ff9039f6ea876bab2c40a06a93e0dd087f445fa2, but I think this warrants a release note since we are changing some security assumptions, making users aware of that seems like the way to go imo.
nit: latest force push regressed a couple of includes:
0diff --git a/src/headerssync.cpp b/src/headerssync.cpp
1index e1dfe10483..e14de004f5 100644
2--- a/src/headerssync.cpp
3+++ b/src/headerssync.cpp
4@@ -5,8 +5,8 @@
5 #include <headerssync.h>
6 #include <logging.h>
7 #include <pow.h>
8-#include <timedata.h>
9 #include <util/check.h>
10+#include <util/time.h>
11 #include <util/vector.h>
12
13 // The two constants below are computed using the simulation script in
14diff --git a/src/net_processing.cpp b/src/net_processing.cpp
15index b426b889cf..8c43377875 100644
16--- a/src/net_processing.cpp
17+++ b/src/net_processing.cpp
18@@ -41,6 +41,7 @@
19 #include <txrequest.h>
20 #include <util/check.h>
21 #include <util/strencodings.h>
22+#include <util/time.h>
23 #include <util/trace.h>
24 #include <validation.h>
25
26diff --git a/src/node/miner.cpp b/src/node/miner.cpp
27index 89498fb976..87f40e993f 100644
28--- a/src/node/miner.cpp
29+++ b/src/node/miner.cpp
30@@ -20,8 +20,8 @@
31 #include <policy/policy.h>
32 #include <pow.h>
33 #include <primitives/transaction.h>
34-#include <timedata.h>
35 #include <util/moneystr.h>
36+#include <util/time.h>
37 #include <validation.h>
38
39 #include <algorithm>
40diff --git a/src/rpc/mining.cpp b/src/rpc/mining.cpp
41index 069a451089..929247d32b 100644
42--- a/src/rpc/mining.cpp
43+++ b/src/rpc/mining.cpp
44@@ -27,11 +27,11 @@
45 #include <script/descriptor.h>
46 #include <script/script.h>
47 #include <script/signingprovider.h>
48-#include <timedata.h>
49 #include <txmempool.h>
50 #include <univalue.h>
51 #include <util/strencodings.h>
52 #include <util/string.h>
53+#include <util/time.h>
54 #include <util/translation.h>
55 #include <validation.h>
56 #include <validationinterface.h>
57diff --git a/src/timedata.h b/src/timedata.h
58index 90428d071c..17e7539700 100644
59--- a/src/timedata.h
60+++ b/src/timedata.h
61@@ -5,8 +5,6 @@
62 #ifndef BITCOIN_TIMEDATA_H
63 #define BITCOIN_TIMEDATA_H
64
65-#include <util/time.h>
66-
67 #include <algorithm>
68 #include <cassert>
69 #include <chrono>
This PR now only removes adjusted time from validation code while keeping all the code for the warning in place as is. A follow-up can change the warning (as it has many problems) but this PR will focus on the removal of adjusted time from consensus.
Re-implementing the same warning seemed unnecessary (even if it was less code) and prone to bugs.
Does this mean that the adjusted time is now being collected and used exclusively to raise the warning?
Does this mean that the adjusted time is now being collected and used exclusively to raise the warning?
This was the case before the force-push, too. In this current approach, the diff is much smaller so review can focus more on whether we’re okay removing network-adjusted time, instead of on the adjusted time calculation refactoring.
ACK ff9039f6ea876bab2c40a06a93e0dd087f445fa2
After a few attempts, I haven’t found a way to trigger a consensus fork through adjusted time. It’s not like this could “infect” half of the network (legacy) through block propagation or something. Because of that, if there was/is a legacy way to exploit it (say an overflow in nTimeOffset
calculation), this new code won’t make it worse.
lgtm ACK ff9039f6ea876bab2c40a06a93e0dd087f445fa2 🤽
Signature:
0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
2trusted comment: lgtm ACK ff9039f6ea876bab2c40a06a93e0dd087f445fa2 🤽
3hjCKZbc2j48mFkBbdMp6XYwvC+JpHi9gtElrJyy4RJVJjbMvsAntWQAHbs4leVk6OCxxLaX5X9PRIWR3wRQ/DA==
ACK ff9039f6ea876bab2c40a06a93e0dd087f445fa2
I would prefer the commit message to contain a rationale for the change, though. It’s currently empty.
I now believe that the network-adjusted time implementation in master does not offer any security guarantees against an attacker trying to get a node out of consensus. The existing NTP vulnerabilities (which seem to be very real, even if currently apparently not exploited too often) can just as well be used by an attacker when network-adjusted time is used, given that we limit the network adjustment to 70 minutes. As such, this PR does indeed seem like a strong improvement to me.
Ensuring an accurate clock is a system responsibility, and should not be a Bitcoin Core concern. Removing this logic simplifies the code, and removes the assumption that we need a majority of honest outbound nodes. Moreover, it seems trivial to implement an external daemon that monitors Bitcoin network time and e.g. adjusts the local clock or shuts down Bitcoin Core when a large offset is detected, for users that wish to harden their system this way.
After a few attempts, I haven’t found a way to trigger a consensus fork through adjusted time.
Unclear what has been tested actually (e.g setting sampled time of peers in the past and playing with blocks timestamps). That said I think it’s a very edge scenario, and most susceptible to affect non-upgraded things like btcd.
I now believe that the network-adjusted time implementation in master does not offer any security guarantees against an attacker trying to get a node out of consensus. The existing NTP vulnerabilities (which seem to be very real, even if currently apparently not exploited too often) can just as well be used by an attacker when network-adjusted time is used, given that we limit the network adjustment to 70 minutes.
NTP has its own data / control plane. Our peers connections going through TCP. You might compromise NTP stack but not TCP and vice-versa.
Ensuring an accurate clock is a system responsibility, and should not be a Bitcoin Core concern
With argument of that kind we can remove assumevalid
/ assumeutxo
and say perf issues poor host not core concern.
More we’re able to compensate weakness from network infrastructure level, better core node ecosystem stronger.
Moreover, it seems trivial to implement an external daemon that monitors Bitcoin network time and e.g. adjusts the local clock or shuts down Bitcoin Core when a large offset is detected, for users that wish to harden their system this way.
I’ll let as an exercise to the reader to define what should be the authoritative source of time of this external monitoring daemon :) My belief we should conserve a a warning system based on other peers nversion announcements. @dergoegge I think it can be good if you notify btcd and other validation software carrying a minimum of economic traffic of this change, at least to avoid future issues like the bip68 consensus discrepancy you found recently.
My belief we should conserve a a warning system based on other peers nversion announcements.
Yes, this is exactly what this pull request did. See also the description, which said: “while the warning to users if their clock is out of sync with the rest of the network remains.”
See also: https://bitcoinops.org/en/newsletters/2024/02/07/#bitcoin-core-28956
Yes, this is exactly what this pull request did. See also the description, which said: “while the warning to users if their clock is out of sync with the rest of the network remains.”
That I know. It was more related to using an external daemon and naively using the same NTP source than your Bitcoin Core instance. There is no such thing as Bitcoin network time, just a sampling of peers nversion time. Yes it works if this external daemon consumes Core warning messages and takes corrective actions on this. And even that the correction frequency have to be weighted carefully (are you scaling the correction frequency on internal clock itself or received peers msgs ?)
FYI, some NTP attacks were actually on targeting NTP pool monitoring infrastructure, see this paper.