net_processing: add opt-in RPC-aware P2P backpressure gate #34898

morozow wants to merge 4 commits into bitcoin:master from morozow:pr/rpc-p2p-backpressure-exp, changing 15 files (+981 −0)
  1. morozow commented at 9:47 am on March 23, 2026: none

    Summary

    Add an experimental backpressure mechanism that defers low-priority P2P message processing when the RPC work queue is under pressure, improving RPC tail latency under sustained P2P load.

    Problem

    The single-threaded message handler (the msghand thread) processes all P2P messages sequentially. Under sustained low-priority P2P traffic (tx relay, addr gossip), RPC tail latency degrades significantly because:

    1. Single-threaded bottleneck - The msghand thread can saturate a single CPU core at 100%, leaving other cores idle
    2. No priority differentiation - Low-priority P2P messages (INV, TX, ADDR) compete equally with RPC work
    3. RPC starvation - When RPC work queue fills up, new requests are rejected with “Work queue depth exceeded”

    This affects users running RPC-heavy workloads (wallets, block explorers, Lightning nodes) alongside P2P relay.

    Solution

    Introduce a minimal, opt-in backpressure policy:

    Request Flow

    sequenceDiagram
        participant Peer as P2P Peer
        participant PM as PeerManager
        participant Monitor as RpcLoadMonitor
        participant HTTP as HTTP Server
        participant RPC as RPC Client

        Note over HTTP,Monitor: RPC queue fills up
        RPC->>HTTP: Multiple RPC requests
        HTTP->>Monitor: OnQueueDepthSample(depth=80, cap=100)
        Monitor->>Monitor: State: NORMAL → ELEVATED

        Note over Peer,PM: P2P message arrives
        Peer->>PM: INV (tx announcements)
        PM->>Monitor: GetState()
        Monitor-->>PM: ELEVATED
        PM->>PM: IsLowPriorityMessage("inv") = true
        PM->>PM: RequeueMessageForProcessing()
        Note over PM: Message deferred to back of queue

        Note over Peer,PM: Critical message arrives
        Peer->>PM: HEADERS
        PM->>Monitor: GetState()
        Monitor-->>PM: ELEVATED
        PM->>PM: IsLowPriorityMessage("headers") = false
        PM->>PM: ProcessMessage()
        Note over PM: Critical messages always processed

        Note over HTTP,Monitor: RPC queue drains
        HTTP->>Monitor: OnQueueDepthSample(depth=40, cap=100)
        Monitor->>Monitor: State: ELEVATED → NORMAL
        Note over PM: Resume normal P2P processing
    

    Components

    1. RpcLoadMonitor - A lock-free state machine that tracks RPC queue depth and exposes load state (NORMAL, ELEVATED, CRITICAL) with hysteresis to prevent oscillation.

    2. Backpressure Gate - A check in PeerManagerImpl::ProcessMessages() that defers low-priority P2P messages when RPC load is elevated.

    3. Message Classification - Clear separation between:

      • Low-priority (deferrable): TX, INV (tx), GETDATA (tx), MEMPOOL, ADDR, ADDRV2, GETADDR
      • Critical (never throttled): HEADERS, BLOCK, CMPCTBLOCK, BLOCKTXN, GETHEADERS, GETBLOCKS, handshake/control messages
    4. Defer-to-tail - Deferred messages are requeued to the back of the peer’s message queue, not dropped. This preserves eventual delivery while prioritizing RPC responsiveness.
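    The classification and defer-to-tail decision described above can be sketched as follows. This is an illustrative Python model, not the PR's C++ code; the set contents follow the lists in this summary, and names such as should_defer and process_or_requeue are hypothetical stand-ins for IsLowPriorityMessage() and RequeueMessageForProcessing().

    ```python
    # Sketch of the message classification and backpressure gate.
    # Note: for GETDATA the PR distinguishes tx requests from block
    # requests; here the whole type is classified for brevity.

    # Deferrable under RPC pressure.
    LOW_PRIORITY = {"tx", "inv", "getdata", "mempool", "addr", "addrv2", "getaddr"}

    # Never throttled: block propagation and handshake/control traffic.
    CRITICAL = {"headers", "block", "cmpctblock", "blocktxn", "getheaders",
                "getblocks", "version", "verack", "ping", "pong"}

    def should_defer(msg_type: str, rpc_state: str) -> bool:
        """Defer only low-priority messages, and only while the RPC load
        state is ELEVATED or CRITICAL."""
        if rpc_state == "NORMAL":
            return False
        return msg_type in LOW_PRIORITY

    def process_or_requeue(queue: list, rpc_state: str) -> None:
        """Deferred messages go to the back of the queue, not dropped."""
        msg = queue.pop(0)
        if should_defer(msg, rpc_state):
            queue.append(msg)  # defer-to-tail preserves eventual delivery
        # else: the message would be handed to ProcessMessage()
    ```

    Because deferred messages are requeued rather than dropped, a sustained ELEVATED state delays low-priority traffic but never loses it.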

    Changes

    New Files

    • src/node/rpc_load_monitor.h - RpcLoadState enum, RpcLoadMonitor interface, AtomicRpcLoadMonitor implementation

    Modified Files

    • src/net_processing.h - Add experimental_rpc_priority and rpc_load_monitor to PeerManager::Options
    • src/net_processing.cpp - Backpressure gate in ProcessMessages(), IsLowPriorityMessage() helper
    • src/net.h - Add RequeueMessageForProcessing() to CNode
    • src/net.cpp - Implement RequeueMessageForProcessing()
    • src/httpserver.h - Add SetHttpServerRpcLoadMonitor()
    • src/httpserver.cpp - Call OnQueueDepthSample() at enqueue/dispatch points
    • src/node/peerman_args.cpp - Parse -experimental-rpc-priority flag
    • src/init.cpp - Create RpcLoadMonitor, wire to HTTP server and PeerManager
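    The httpserver wiring amounts to sampling the work-queue depth at the enqueue and dispatch points and pushing each sample to the monitor. A minimal Python model of that hook (class and method names here are hypothetical, not the PR's identifiers):

    ```python
    # Illustrative model of the HTTP work queue sampling its own depth
    # at each enqueue/dispatch transition, mirroring the
    # OnQueueDepthSample(depth, cap) calls described above.
    from collections import deque

    class WorkQueue:
        def __init__(self, capacity: int, monitor) -> None:
            self.capacity = capacity
            self.items: deque = deque()
            self.monitor = monitor

        def _sample(self) -> None:
            # One sample per transition point, as in the PR description.
            self.monitor.on_queue_depth_sample(len(self.items), self.capacity)

        def enqueue(self, item) -> bool:
            if len(self.items) >= self.capacity:
                return False  # corresponds to "Work queue depth exceeded"
            self.items.append(item)
            self._sample()
            return True

        def dispatch(self):
            item = self.items.popleft()
            self._sample()
            return item

    class RecordingMonitor:
        """Stand-in for RpcLoadMonitor that just records samples."""
        def __init__(self) -> None:
            self.samples = []
        def on_queue_depth_sample(self, depth: int, cap: int) -> None:
            self.samples.append((depth, cap))
    ```

    Sampling at both ends means the monitor sees depth rise as requests arrive and fall as workers drain the queue, which is what drives the state transitions below.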

    New Flag

    -experimental-rpc-priority=<0|1>  (default: 0)
        Enable experimental RPC-aware P2P backpressure policy.
        When enabled, low-priority P2P messages may be deferred
        during RPC queue overload to improve RPC latency.
    

    Policy Details

    State Machine

    stateDiagram-v2
        [*] --> NORMAL

        NORMAL --> ELEVATED : queue ≥ 75%
        NORMAL --> CRITICAL : queue ≥ 90%

        ELEVATED --> CRITICAL : queue ≥ 90%
        ELEVATED --> NORMAL : queue < 50%

        CRITICAL --> ELEVATED : queue < 70%

        note right of NORMAL : Process all messages
        note right of ELEVATED : Defer low-priority P2P
        note right of CRITICAL : Defer low-priority P2P
    

    Hysteresis prevents rapid state oscillation under fluctuating load.

    Thresholds

    Transition             Condition
    NORMAL → ELEVATED      queue_depth ≥ 75% capacity
    NORMAL → CRITICAL      queue_depth ≥ 90% capacity
    ELEVATED → CRITICAL    queue_depth ≥ 90% capacity
    ELEVATED → NORMAL      queue_depth < 50% capacity
    CRITICAL → ELEVATED    queue_depth < 70% capacity
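    The hysteresis logic in the table above can be modeled directly. The PR's AtomicRpcLoadMonitor is a lock-free C++ implementation; this Python sketch reproduces only the transition rules (class and method names are illustrative):

    ```python
    # Model of the RPC load state machine with hysteresis: the threshold
    # to leave an elevated state is lower than the threshold to enter it,
    # so fluctuating load does not cause rapid oscillation.
    NORMAL, ELEVATED, CRITICAL = "NORMAL", "ELEVATED", "CRITICAL"

    class RpcLoadMonitorModel:
        def __init__(self) -> None:
            self.state = NORMAL

        def on_queue_depth_sample(self, depth: int, cap: int) -> str:
            if cap <= 0:              # edge case covered by the unit tests
                return self.state
            pct = 100 * depth / cap
            if self.state == NORMAL:
                if pct >= 90:
                    self.state = CRITICAL
                elif pct >= 75:
                    self.state = ELEVATED
            elif self.state == ELEVATED:
                if pct >= 90:
                    self.state = CRITICAL
                elif pct < 50:
                    self.state = NORMAL
            else:  # CRITICAL
                if pct < 70:
                    self.state = ELEVATED
            return self.state
    ```

    For example, once ELEVATED is entered at 75%, the queue must drain below 50% (not merely 75%) before normal processing resumes; the 25-point band is the hysteresis.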

    Behavior by State

    State       Low-priority P2P    Critical P2P        RPC
    NORMAL      Process normally    Process normally    Process normally
    ELEVATED    Defer to tail       Process normally    Process normally
    CRITICAL    Defer to tail       Process normally    Process normally

    Performance Results

    A/B test with concurrent P2P INV flood (~108K entries) and RPC workload (12 threads, 45s duration):

    Baseline (flag=0)

    Metric           Value
    RPC p50          1.925 ms
    RPC p95          9.320 ms
    RPC p99          17.974 ms
    RPC calls/sec    3,919
    P2P INV msgs     3,394

    With Policy (flag=1)

    Metric           Value
    RPC p50          1.845 ms
    RPC p95          7.755 ms
    RPC p99          15.977 ms
    RPC calls/sec    4,286
    P2P INV msgs     3,372

    Improvement

    Metric        Change
    RPC p50       -4.2% (better)
    RPC p95       -16.79% (better)
    RPC p99       -11.11% (better)
    Throughput    +9.4%
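    The improvement percentages follow directly from the two tables; this snippet recomputes them from the baseline and flag=1 numbers:

    ```python
    # Recompute the reported deltas from the A/B measurements above.
    baseline = {"p50": 1.925, "p95": 9.320, "p99": 17.974, "rps": 3919}
    policy   = {"p50": 1.845, "p95": 7.755, "p99": 15.977, "rps": 4286}

    def pct_drop(base: float, new: float) -> float:
        """Relative latency reduction, in percent."""
        return 100 * (base - new) / base

    p50  = round(pct_drop(baseline["p50"], policy["p50"]), 1)   # 4.2
    p95  = round(pct_drop(baseline["p95"], policy["p95"]), 2)   # 16.79
    p99  = round(pct_drop(baseline["p99"], policy["p99"]), 2)   # 11.11
    tput = round(100 * (policy["rps"] - baseline["rps"]) / baseline["rps"], 1)  # 9.4
    ```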

    Testing

    Unit Tests (src/test/rpc_load_monitor_tests.cpp)

    • rpc_load_monitor_tests suite (12 tests):
      • State transitions (normal→elevated→critical)
      • Hysteresis behavior
      • Thread safety
      • Edge cases (zero/negative capacity)
      • Custom threshold configuration

    Functional Test (test/functional/feature_rpc_p2p_backpressure_ab.py)

    • A/B comparison with P2P INV flood workload
    • Measures RPC latency percentiles (p50/p95/p99)
    • Verifies no RPC errors under load
    • Outputs JSON metrics for analysis

    Test Commands

    # Unit tests
    build/bin/test_bitcoin --run_test=rpc_load_monitor_tests

    # Functional A/B test
    python3 test/functional/feature_rpc_p2p_backpressure_ab.py
    

    Limitations and Future Work

    1. Experimental - Feature is opt-in and may change based on feedback
    2. Overhead - Even without P2P pressure, the policy adds ~1% overhead from per-message state checks
    3. Tuning - Threshold values are initial estimates; may need adjustment based on real-world data
    4. Message granularity - Currently classifies by message type; could be refined to inspect INV/GETDATA contents

    Backwards Compatibility

    • No consensus changes
    • No P2P protocol changes
    • No behavior change when flag is disabled (default)
    • Existing tests pass

    Historical Context

    This problem has been discussed in various forms:

    • Mining pools (AntPool) reported ProcessMessage CPU saturation causing block delays
    • Requests to split the msghand thread into multiple threads
    • Analysis of CPU time spent in ProcessMessages() per peer

    This PR provides a lightweight, opt-in mitigation without requiring architectural changes to the message handler threading model.

  2. net_processing: add opt-in RPC-aware P2P backpressure gate
    Add an experimental backpressure mechanism that defers low-priority P2P
    message processing when the RPC work queue is under pressure, improving
    RPC tail latency under sustained P2P load.
    
    Problem:
    The single-threaded message handler can saturate CPU processing P2P
    messages, causing RPC latency spikes and 'Work queue depth exceeded'
    errors under heavy P2P traffic.
    
    Solution:
    - Add RpcLoadMonitor interface with lock-free atomic implementation
    - Add backpressure gate in ProcessMessages() to defer low-priority P2P
    - Add RequeueMessageForProcessing() to CNode for defer-to-tail behavior
    - Wire HTTP queue depth sampling to RpcLoadMonitor
    - Add -experimental-rpc-priority flag (default: off)
    
    Low-priority messages (TX, INV, GETDATA for tx, ADDR, MEMPOOL) are
    deferred when RPC queue depth exceeds thresholds. Critical messages
    (HEADERS, BLOCK, CMPCTBLOCK, handshake) are never throttled.
    
    State machine uses hysteresis to prevent oscillation:
    - NORMAL -> ELEVATED: queue >= 75%
    - ELEVATED -> NORMAL: queue < 50%
    - ELEVATED -> CRITICAL: queue >= 90%
    - CRITICAL -> ELEVATED: queue < 70%
    
    A/B test results with P2P INV flood (~108K entries):
    - RPC p95 latency: -16.79% (better)
    - RPC p99 latency: -11.11% (better)
    - RPC throughput: +9.4%
    
    No consensus or P2P protocol changes. Feature is opt-in and experimental.
    a1c08cc9ce
  3. docs: Add RPC-P2P backpressure issue documentation and A/B test
    - Add comprehensive issue documentation explaining RPC latency degradation
    under sustained P2P traffic with root cause analysis
    - Document proposed backpressure solution with state machine design and
    threshold values for queue pressure monitoring
    - Include preliminary A/B test results showing p95/p99 latency improvements
    - Update functional test docstring to remove specific issue reference
    - Provides context for opt-in RPC-aware P2P backpressure gate implementation
    76ccb3fc6f
  4. DrahtBot commented at 9:48 am on March 23, 2026: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept NACK stickies-v

    If your review is incorrectly listed, please copy-paste <!--meta-tag:bot-skip--> into the comment that the bot should ignore.

  5. test(feature_rpc_p2p_backpressure_ab): Use assert_greater_than_or_equal for latency checks
    - Import assert_greater_than_or_equal utility function
    - Replace direct assertions with assert_greater_than_or_equal for p95 latency comparison
    - Replace direct assertions with assert_greater_than_or_equal for p99 latency comparison
    - Improves test readability and provides clearer assertion semantics for threshold validation
    afb0c61462
  6. DrahtBot added the label CI failed on Mar 23, 2026
  7. test(test_runner): Add RPC-P2P backpressure A/B test to suite
    - Add feature_rpc_p2p_backpressure_ab.py to BASE_SCRIPTS test list
    - Position test in functional test suite after mining_getblocktemplate_longpoll.py
    - Enable A/B testing of RPC-aware P2P backpressure gate functionality
    f46fac7594
  8. maflcko commented at 1:42 pm on March 23, 2026: member

    Add an experimental backpressure mechanism that defers low-priority P2P message processing when the RPC work queue is under pressure, improving RPC tail latency under sustained P2P load.

    If your machine can’t handle the RPC load, I don’t think the solution is to starve the P2P. The correct fix would be to:

    • Reduce the load on the RPC.
    • Upgrade your machine to handle the load.
    • Create a flame graph, or other benchmarks to find the RPC bottleneck and make it faster, or create an issue.

    If you have too many P2P peers for your machine to handle, you can reduce the number of peers or otherwise reduce traffic (see the reduce traffic docs)

  9. stickies-v commented at 3:04 pm on March 23, 2026: contributor
    Concept NACK, agreed with the rationale outlined above. This is adding a lot of unnecessary complexity.
  10. maflcko commented at 3:40 pm on March 23, 2026: member
    Also, this is a low-effort LLM generated pull, in any case.
  11. maflcko closed this on Mar 23, 2026

  12. morozow commented at 4:05 pm on March 23, 2026: none

    Thanks for the feedback. I understand the concern about P2P priority. If anyone encounters RPC latency issues under P2P load in production, I’d be interested to hear about the use case.

    To clarify: the docs and tests may be LLM-generated, but the core logic is small, targeted, and integrated exactly where it is needed. The problem it addresses is potentially significant, so "low-effort" seems too abstract a characterization.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-03-30 00:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me