I think that if a reorg fails due to an error while disconnecting blocks, the node should shut down.
My reasoning is that if we are trying to switch to a more-work chain and something goes wrong before we can even begin validating that chain, then the problem is not with the chain we're trying to switch to but with our own hardware or software. Shutting down and having the user deal with the issue is better than continuing on when we know there's a more-work chain that we're unable to validate.
This recently came up on testnet, where unpatched nodes are unable to reorg away from the invalid chain containing the block with a duplicate-input transaction. Rather than crashing or shutting down, they continue on the invalid chain, with messages like this in the debug log:
ERROR: DisconnectTip(): DisconnectBlock 00000000210004840364b52bc5e455d888f164e4264a4fec06a514b67e9d5722 failed
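
To make the proposal concrete, here is a rough sketch of the distinction I have in mind. It is a toy model, not the actual validation code: `ToyBlock`, `ReorgResult`, and `TryReorg` are hypothetical names invented for illustration, and the `disconnect_ok` / `valid` flags just simulate outcomes. The point is only that a failure while disconnecting our own blocks is reported differently from a failure while validating the new chain.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical outcome of a reorg attempt in this toy model.
enum class ReorgResult {
    Switched,         // now on the more-work chain
    NewChainInvalid,  // a candidate block failed validation; stay on the old chain
    FatalLocalError,  // could not disconnect our own blocks; caller should shut down
};

// Hypothetical stand-in for a block. The flags simulate outcomes only.
struct ToyBlock {
    std::string hash;
    bool disconnect_ok;  // would undoing this block succeed?
    bool valid;          // would this block pass validation when connected?
};

// Disconnect `active` back to `fork_height` blocks, then connect `candidate`.
ReorgResult TryReorg(std::vector<ToyBlock>& active,
                     std::size_t fork_height,
                     const std::vector<ToyBlock>& candidate)
{
    const std::vector<ToyBlock> original = active;

    // Step 1: disconnect blocks from the tip down to the fork point.
    while (active.size() > fork_height) {
        if (!active.back().disconnect_ok) {
            // We never got as far as looking at the new chain, so this says
            // nothing about its validity. Treat it as a local fault.
            return ReorgResult::FatalLocalError;
        }
        active.pop_back();
    }

    // Step 2: connect the candidate chain, validating as we go.
    for (const ToyBlock& block : candidate) {
        if (!block.valid) {
            // The new chain is bad; going back to the old one is legitimate.
            active = original;
            return ReorgResult::NewChainInvalid;
        }
        active.push_back(block);
    }
    return ReorgResult::Switched;
}
```

In this sketch the caller would request shutdown on `FatalLocalError` and only keep running on the old chain for `NewChainInvalid`. Today both cases effectively end up as "keep running on the old chain", which is the behavior I'm arguing against.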