Summary
Add an experimental backpressure mechanism that defers low-priority P2P message processing when the RPC work queue is under pressure, improving RPC tail latency under sustained P2P load.
Problem
The single-threaded message handler (-msghand thread) processes all P2P messages sequentially. Under sustained low-priority P2P traffic (tx relay, addr gossip), RPC tail latency degrades significantly because:
- Single-threaded bottleneck - The msghand thread can saturate a single CPU core at 100%, leaving other cores idle
- No priority differentiation - Low-priority P2P messages (INV, TX, ADDR) compete equally with RPC work
- RPC starvation - When RPC work queue fills up, new requests are rejected with “Work queue depth exceeded”
This affects users running RPC-heavy workloads (wallets, block explorers, Lightning nodes) alongside P2P relay.
Solution
Introduce a minimal, opt-in backpressure policy:
Request Flow
```mermaid
sequenceDiagram
    participant Peer as P2P Peer
    participant PM as PeerManager
    participant Monitor as RpcLoadMonitor
    participant HTTP as HTTP Server
    participant RPC as RPC Client

    Note over HTTP,Monitor: RPC queue fills up
    RPC->>HTTP: Multiple RPC requests
    HTTP->>Monitor: OnQueueDepthSample(depth=80, cap=100)
    Monitor->>Monitor: State: NORMAL → ELEVATED

    Note over Peer,PM: P2P message arrives
    Peer->>PM: INV (tx announcements)
    PM->>Monitor: GetState()
    Monitor-->>PM: ELEVATED
    PM->>PM: IsLowPriorityMessage("inv") = true
    PM->>PM: RequeueMessageForProcessing()
    Note over PM: Message deferred to back of queue

    Note over Peer,PM: Critical message arrives
    Peer->>PM: HEADERS
    PM->>Monitor: GetState()
    Monitor-->>PM: ELEVATED
    PM->>PM: IsLowPriorityMessage("headers") = false
    PM->>PM: ProcessMessage()
    Note over PM: Critical messages always processed

    Note over HTTP,Monitor: RPC queue drains
    HTTP->>Monitor: OnQueueDepthSample(depth=40, cap=100)
    Monitor->>Monitor: State: ELEVATED → NORMAL
    Note over PM: Resume normal P2P processing
```
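The gate step in the flow above can be sketched in a few lines (an illustration only, not the actual Bitcoin Core code; the inline classifier and the `MaybeProcess` helper are stand-ins for the interfaces described in Components below):

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical load states mirroring the RpcLoadMonitor description.
enum class RpcLoadState { NORMAL, ELEVATED, CRITICAL };

// Stand-in classifier; the real helper lives in net_processing.cpp
// and covers the full message-type lists given below.
bool IsLowPriorityMessage(const std::string& type) {
    return type == "tx" || type == "inv" || type == "addr";
}

// Returns true if the front message was processed, false if it was
// deferred to the back of the peer's queue (defer-to-tail, never drop).
bool MaybeProcess(std::deque<std::string>& peer_queue, RpcLoadState state) {
    if (peer_queue.empty()) return false;
    std::string msg = peer_queue.front();
    peer_queue.pop_front();
    if (state != RpcLoadState::NORMAL && IsLowPriorityMessage(msg)) {
        peer_queue.push_back(msg);  // requeue for later, preserving delivery
        return false;
    }
    // ... ProcessMessage(msg) would run here ...
    return true;
}
```

Under elevated load a deferred `inv` moves behind a pending `headers`, so the critical message is processed first while the low-priority one is retried later.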
Components
- **RpcLoadMonitor** - A lock-free state machine that tracks RPC queue depth and exposes load state (`NORMAL`, `ELEVATED`, `CRITICAL`) with hysteresis to prevent oscillation.
- **Backpressure Gate** - A check in `PeerManagerImpl::ProcessMessages()` that defers low-priority P2P messages when RPC load is elevated.
- **Message Classification** - Clear separation between:
  - Low-priority (deferrable): `TX`, `INV(tx)`, `GETDATA(tx)`, `MEMPOOL`, `ADDR`, `ADDRV2`, `GETADDR`
  - Critical (never throttled): `HEADERS`, `BLOCK`, `CMPCTBLOCK`, `BLOCKTXN`, `GETHEADERS`, `GETBLOCKS`, handshake/control messages
- **Defer-to-tail** - Deferred messages are requeued to the back of the peer's message queue, not dropped. This preserves eventual delivery while prioritizing RPC responsiveness.
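The classification above could be expressed as a simple set lookup (a sketch, not the actual `IsLowPriorityMessage()` in `net_processing.cpp`; note that the real policy only treats `INV`/`GETDATA` as deferrable for tx payloads, which this type-only sketch ignores):

```cpp
#include <cassert>
#include <set>
#include <string>

// Message types the PR describes as deferrable, keyed by the
// lowercase wire command name. Everything else (headers, block,
// cmpctblock, handshake/control traffic, ...) is never throttled.
bool IsLowPriorityMessage(const std::string& msg_type) {
    static const std::set<std::string> kLowPriority{
        "tx", "inv", "getdata", "mempool", "addr", "addrv2", "getaddr"};
    return kLowPriority.count(msg_type) > 0;
}
```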
Changes
New Files
- `src/node/rpc_load_monitor.h` - `RpcLoadState` enum, `RpcLoadMonitor` interface, `AtomicRpcLoadMonitor` implementation
Modified Files
- `src/net_processing.h` - Add `experimental_rpc_priority` and `rpc_load_monitor` to `PeerManager::Options`
- `src/net_processing.cpp` - Backpressure gate in `ProcessMessages()`, `IsLowPriorityMessage()` helper
- `src/net.h` - Add `RequeueMessageForProcessing()` to `CNode`
- `src/net.cpp` - Implement `RequeueMessageForProcessing()`
- `src/httpserver.h` - Add `SetHttpServerRpcLoadMonitor()`
- `src/httpserver.cpp` - Call `OnQueueDepthSample()` at enqueue/dispatch points
- `src/node/peerman_args.cpp` - Parse `-experimental-rpc-priority` flag
- `src/init.cpp` - Create `RpcLoadMonitor`, wire to HTTP server and PeerManager
New Flag
```
-experimental-rpc-priority=<0|1> (default: 0)

Enable experimental RPC-aware P2P backpressure policy.
When enabled, low-priority P2P messages may be deferred
during RPC queue overload to improve RPC latency.
```
Policy Details
State Machine
```mermaid
stateDiagram-v2
    [*] --> NORMAL

    NORMAL --> ELEVATED : queue ≥ 75%
    NORMAL --> CRITICAL : queue ≥ 90%

    ELEVATED --> CRITICAL : queue ≥ 90%
    ELEVATED --> NORMAL : queue < 50%

    CRITICAL --> ELEVATED : queue < 70%

    note right of NORMAL : Process all messages
    note right of ELEVATED : Defer low-priority P2P
    note right of CRITICAL : Defer low-priority P2P
```
Hysteresis prevents rapid state oscillation under fluctuating load.
Thresholds
| Transition | Condition |
|---|---|
| NORMAL → ELEVATED | queue_depth ≥ 75% capacity |
| NORMAL → CRITICAL | queue_depth ≥ 90% capacity |
| ELEVATED → CRITICAL | queue_depth ≥ 90% capacity |
| ELEVATED → NORMAL | queue_depth < 50% capacity |
| CRITICAL → ELEVATED | queue_depth < 70% capacity |
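Under these thresholds, the transition logic might look like the following sketch (assuming the default thresholds; the PR's `AtomicRpcLoadMonitor` is lock-free via atomics, which this single-threaded illustration omits):

```cpp
#include <cassert>

enum class RpcLoadState { NORMAL, ELEVATED, CRITICAL };

// Hysteresis state machine over RPC queue utilization, mirroring the
// threshold table: enter ELEVATED/CRITICAL at 75%/90% utilization,
// but only exit at 50%/70%, so fluctuating load does not oscillate.
class LoadMonitorSketch {
public:
    RpcLoadState GetState() const { return m_state; }

    void OnQueueDepthSample(int depth, int capacity) {
        if (capacity <= 0) return;  // edge case: ignore invalid capacity
        const double util = static_cast<double>(depth) / capacity;
        switch (m_state) {
        case RpcLoadState::NORMAL:
            if (util >= 0.90) m_state = RpcLoadState::CRITICAL;
            else if (util >= 0.75) m_state = RpcLoadState::ELEVATED;
            break;
        case RpcLoadState::ELEVATED:
            if (util >= 0.90) m_state = RpcLoadState::CRITICAL;
            else if (util < 0.50) m_state = RpcLoadState::NORMAL;
            break;
        case RpcLoadState::CRITICAL:
            if (util < 0.70) m_state = RpcLoadState::ELEVATED;
            break;
        }
    }

private:
    RpcLoadState m_state{RpcLoadState::NORMAL};
};
```

For example, a sample at 60% utilization keeps an ELEVATED monitor elevated (the exit threshold is 50%), even though 60% would not have triggered the state from NORMAL.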
Behavior by State
| State | Low-priority P2P | Critical P2P | RPC |
|---|---|---|---|
| NORMAL | Process normally | Process normally | Process normally |
| ELEVATED | Defer to tail | Process normally | Process normally |
| CRITICAL | Defer to tail | Process normally | Process normally |
Performance Results
A/B test with concurrent P2P INV flood (~108K entries) and RPC workload (12 threads, 45s duration):
Baseline (flag=0)
| Metric | Value |
|---|---|
| RPC p50 | 1.925ms |
| RPC p95 | 9.320ms |
| RPC p99 | 17.974ms |
| RPC calls/sec | 3,919 |
| P2P INV msgs | 3,394 |
With Policy (flag=1)
| Metric | Value |
|---|---|
| RPC p50 | 1.845ms |
| RPC p95 | 7.755ms |
| RPC p99 | 15.977ms |
| RPC calls/sec | 4,286 |
| P2P INV msgs | 3,372 |
Improvement
| Metric | Change |
|---|---|
| RPC p50 | -4.2% (better) |
| RPC p95 | -16.8% (better) |
| RPC p99 | -11.1% (better) |
| Throughput | +9.4% |
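The Change column follows directly from the two tables above; each entry is relative change, (with_policy − baseline) / baseline, where a negative latency delta means improvement:

```cpp
#include <cassert>
#include <cmath>

// Relative change between a baseline and a policy-enabled measurement,
// as a percentage. Negative for latencies means the policy was faster.
double PercentChange(double baseline, double with_policy) {
    return (with_policy - baseline) / baseline * 100.0;
}

// Round to one decimal place for table display.
double Round1(double x) { return std::round(x * 10.0) / 10.0; }
```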
Testing
Unit Tests (src/test/rpc_load_monitor_tests.cpp)
`rpc_load_monitor_tests` suite (12 tests):
- State transitions (NORMAL → ELEVATED → CRITICAL)
- Hysteresis behavior
- Thread safety
- Edge cases (zero/negative capacity)
- Custom threshold configuration
Functional Test (test/functional/feature_rpc_p2p_backpressure_ab.py)
- A/B comparison with P2P INV flood workload
- Measures RPC latency percentiles (p50/p95/p99)
- Verifies no RPC errors under load
- Outputs JSON metrics for analysis
Test Commands
```shell
# Unit tests
build/bin/test_bitcoin --run_test=rpc_load_monitor_tests

# Functional A/B test
python3 test/functional/feature_rpc_p2p_backpressure_ab.py
```
Limitations and Future Work
- Experimental - Feature is opt-in and may change based on feedback
- Overhead - Even when no backpressure is triggered, the policy adds ~1% overhead from per-message state checks
- Tuning - Threshold values are initial estimates; may need adjustment based on real-world data
- Message granularity - Currently classifies by message type; could be refined to inspect INV/GETDATA contents
Backwards Compatibility
- No consensus changes
- No P2P protocol changes
- No behavior change when flag is disabled (default)
- Existing tests pass
Historical Context
This problem has been discussed in various forms:
- Mining pools (AntPool) reported ProcessMessage CPU saturation causing block delays
- Requests to split the msghand thread into multiple threads
- Analysis of CPU time spent in ProcessMessages() per peer
This PR provides a lightweight, opt-in mitigation without requiring architectural changes to the message handler threading model.