CVE-2012-2459, possible code and performance improvement #19598

MithrilMan commented at 11:40 AM on July 27, 2020: none

ok so, in my effort to create a fullnode from scratch both for learning purpose and need (not a topic for discussion, thanks), I was looking at the code that check Merkle Root to not be vulnerable to CVE-2012-2459.

That vulnerability basically allows you to create a block that has a valid merkle root but contains duplicate transactions, causing a node that receive such block before the correct one, to be stuck on a fork (because that block will be flagged as incorrect and having the same hash of a correct block will prevent the node to ask again for that block, causing a node to be stuck.

Anyway, this is an old CVE that was fixed back in 2012 and code changed during time. Now in bitcoin core the code is not that pretty and the method to compute merkle root are polluted from a boolean parameter that when read back reflects if the block has been mutated (malleated) or not.

The code is this

uint256 ComputeMerkleRoot(std::vector<uint256> hashes, bool* mutated) {
    bool mutation = false;
    while (hashes.size() > 1) {
        if (mutated) {
            for (size_t pos = 0; pos + 1 < hashes.size(); pos += 2) {
                if (hashes[pos] == hashes[pos + 1]) mutation = true;
            }
        }
        if (hashes.size() & 1) {
            hashes.push_back(hashes.back());
        }
        SHA256D64(hashes[0].begin(), hashes[0].begin(), hashes.size() / 2);
        hashes.resize(hashes.size() / 2);
    }
    if (mutated) *mutated = mutation;
    if (hashes.size() == 0) return uint256();
    return hashes[0];
}

now thinking about the problem, I think I found a better approach that is O(log n) and allows to have a better code too (without having to pass the mutated flag around)... if it works and the logic isn't flawled... so I'd like to know your thought about this. my paint skill can show you some of the logic:

A, B, C... etc... are transactions

the vertical line is what I call "safe point", basically all transactions before that safe point are guaranteed to be not duplicable (this is one of the assumption I do and one thing to check if it's correct)

then, based on how merkle root is computed and how the CVE uses that to do nasty things, you can see some example of malleated blocks

I created then a gist, containing a LINQPad code that can be run as it is and produce outputs to see if the logc is correct and can spot malleated blocks.

https://gist.github.com/MithrilMan/27985e4f5bcc3853e792aa39631b9647

this is an output example of that gist

The core logic relies into checking the last transaction against the previous, moving exponentially to the left at each iteration (see the picture with the vertical blue arrows showing which element it checks, or see the linqpad result where it's explicitly detailed)

I handle both the cases where the tx count is even or odd. To me it seems to work but would like to have some more eyes on that to see if the logic sounds correct or not. Note how for 9000 tx I have just to compare 14 transactions instead of thousands like current bitcoin code is doing. Apart from that in my case it has another pros that allow me to split the markle computation and markle cve check in two different places, without having to use a solution like that mutated boolean parameter

(actually I can't setup an environment to check if my intuition is correct and benchmark it so I can't provide numbers, but I'm more interested about the check logic itself)

MithrilMan added the label Feature on Jul 27, 2020

MithrilMan commented at 1:45 PM on July 27, 2020: none

for those who doesn't have linqpad I adapted the script to be run on dotnetfiddle here

https://dotnetfiddle.net/wT87D2

sipa commented at 6:55 AM on September 16, 2020: member

I think you're right that an O(log n) solution is possible.
I don't understand what you mean by "without having to use a solution like that mutated boolean parameter". That's just an interface question - a different algorithm would still need a way to report malleation back to the caller. It could do that in the same way, or a different way, but that shouldn't depend on the algorithm.
The performance of the malleability check in the existing code is absolutely negligible. It's one extra 32-byte comparison (~nanoseconds) added per inner hash (~100s of nanoseconds at the very least). Even if a better solution is available, I don't think we should touch this critical piece of code unless benchmarks show a significant improvement (which I doubt).
I can't follow your code, but I think this observation captures is: if the malleability trick is exploited, it must mean that at some level in the tree there is an even number of nodes where the real tree had an odd number - and the last entry was duplicated. This means that at every level we should only check the last two nodes for equality, rather than all pairs of two nodes.

This should do it (untested):

uint256 ComputeMerkleRoot(std::vector<uint256> hashes, bool* mutated) {
    bool mutation = false;
    while (hashes.size() > 1) {
        if (mutated && !mutation && hashes.size() % 2 == 0) {
            if (hashes[hashes.size() - 1] == hashes[hashes.size() - 2]) mutation = true;
        }
        if (hashes.size() & 1) {
            hashes.push_back(hashes.back());
        }
        SHA256D64(hashes[0].begin(), hashes[0].begin(), hashes.size() / 2);
        hashes.resize(hashes.size() / 2);
    }
    if (mutated) *mutated = mutation;
    if (hashes.size() == 0) return uint256();
    return hashes[0];
}

tryphe commented at 7:21 AM on September 16, 2020: contributor

Concept ACK, wrt sipa's fourth bullet point.

sipa commented at 4:19 PM on September 16, 2020: member

FWIW, my point is that we shouldn't change anything; I'm just showing something that I think could work in the 4th point.

MithrilMan commented at 12:57 PM on September 18, 2020: none

The performance of the malleability check in the existing code is absolutely negligible. It's one extra 32-byte comparison (~nanoseconds) added per inner hash (~100s of nanoseconds at the very least). Even if a better solution is available, I don't think we should touch this critical piece of code unless benchmarks show a significant improvement (which I doubt).

I can't follow your code, but I think this observation captures is: if the malleability trick is exploited, it must mean that at some level in the tree there is an even number of nodes where the real tree had an odd number - and the last entry was duplicated. This means that at every level we should only check the last two nodes for equality, rather than all pairs of two nodes.

No, implication is bigger. In my implementation I check only the lower level of the tree saving lot of checks and the checks isn't about having even/odd node numbers this is the copy&pasted code from my gist

// not a bitcoin double hash, not important for POC code
public string ComputeMerkleRoot(IList<string> hashes)
{
	if (hashes.Count == 0) return String.Empty;

	bool oddHashes = (hashes.Count & 1) == 1;
	//ensure to allocate one more item if hashes are odd.
	List<string> hashesList = new List<string>(oddHashes ? hashes.Count + 1 : hashes.Count);

	for (int i = 0; i < hashes.Count; i++)
	{
		hashesList.Add(hashes[i]);
	}

	// if odd, duplicate last element
	if (oddHashes)
	{
		hashesList.Add(hashes[hashes.Count - 1]);
	}

	var hashAlgo = HashAlgorithm.Create("SHA256");
	int elementsCount = hashesList.Count;
	while (elementsCount > 1)
	{
		int newHashPosition = 0;
		for (int pos = 0; pos + 1 < elementsCount; pos += 2)
		{
			string pairOfHashes = hashesList[pos] + hashesList[pos + 1];

			hashesList[newHashPosition++] = ASCIIEncoding.Unicode.GetString(hashAlgo.ComputeHash(ASCIIEncoding.Unicode.GetBytes(pairOfHashes)));
		}

		if (newHashPosition > 1 && (newHashPosition & 1) == 1)
		{
			hashesList[newHashPosition] = hashesList[newHashPosition - 1];
			newHashPosition++;
		}

		hashesList.RemoveRange(newHashPosition, elementsCount - newHashPosition);
		elementsCount = newHashPosition;
	}

	return BitConverter.ToString(ASCIIEncoding.Unicode.GetBytes(hashesList[0])).Replace("-", "");
}

I can comment/explain the code if it's not easy to follow. This code that computes merklee tree is implemented without the boolean flag that return if it's malleated or not (separation of concerns)

MithrilMan commented at 1:05 PM on September 18, 2020: none

and this is the code that can check if there is a malleability in place (thus no need to have ComputeMerkleRoot polluted with that boolean flag like I said in point 2

List<string> transactions = GetTransactions(transactionSet).Select(c => c.ToString()).ToList();
uint transactionsCount = ((uint)transactions.Count);
bool transactionCountIsOdd = (transactionsCount & 1) == 1;
uint higherBitPosition = (uint)(sizeof(uint) * 8 - (BitOperations.LeadingZeroCount(transactionsCount - 1) + 1));
uint safePoint = ((uint)Math.Pow(2, higherBitPosition));
uint itemsToConsider = transactionsCount - safePoint;

string transactionToFind = transactions.Last();
int transactionIndexToCompare = transactions.Count - 2;
int expStep = transactionCountIsOdd ? 2 : 1;

int numberOfChecks = 0;
while (expStep < itemsToConsider)
{
	numberOfChecks++;
	$"comparing {transactionToFind} with {transactions[transactionIndexToCompare]}".Dump();
	if (transactions[transactionIndexToCompare] == transactionToFind)
	{
		"".PadRight(120, '-').Dump($"FOUND MALLEABLE BLOCK: {transactionToFind} found both on last position and at position {transactionIndexToCompare}. Checks performed: {numberOfChecks})");
		return;
	}

	transactionIndexToCompare -= expStep;
	expStep <<= 1;
}

MithrilMan commented at 1:22 PM on September 18, 2020: none

let me try to summarize some concept behind this (sorry if I fail to use correct terminology here and there):

it's impossible to malleate a block duplicating less than the max pow of 2 transaction count (this is why I visualized the vertical line that I named "safe point". e.g. if you have a block that have 10 transactions, you know that it can be malleable just by duplicating the last 2 (because the first 8 belongs to a different node up in the hierarchy
In order to not have to check every level of the tree, I do a simple (smart?) thing: I choose a specific element of comparison that may indicate that a duplication took place and it's the last transaction hash. string transactionToFind = transactions.Last();

Once I have it, I start then a loop where, at each iteration, I check if that element is the same as the one at a different index int transactionIndexToCompare where transactionIndexToCompare starts from the second last transaction and move exponentially backward at each iteration (there is a caveat if the number of transaction is odd or even, handled by int expStep = transactionCountIsOdd ? 2 : 1; before start the loop

of course if I find these 2 elements that are equal it means a malleability took place

also note that I stop the loop when the index pass the so called "safe point", saving even more comparisons.

TLTR: my implementation is O(Log n) over a list of transaction hashes, doesn't need to be part of the merkle tree computation method and save lot of checks other than applying the separation of concerns

MarcoFalke commented at 1:19 PM on September 20, 2020: member

The performance of the malleability check in the existing code is absolutely negligible. It's one extra 32-byte comparison (~nanoseconds) added per inner hash (~100s of nanoseconds at the very least). Even if a better solution is available, I don't think we should touch this critical piece of code unless benchmarks show a significant improvement (which I doubt).

Tend to agree with this. Will close this for now until there is a proven (wall-clock) time performance improvement on IBD or some other real-world load.

MarcoFalke closed this on Sep 20, 2020

bitcoin locked this on Feb 15, 2022