Obfuscate database files #6613

issue laanwj openend this issue on September 1, 2015
  1. laanwj commented at 2:03 pm on September 1, 2015: member

    To avoid problems on Windows with Anti-Virus software, there needs to be an option to obfuscate the keys/values written to the database files, especially the UTXO database. See #4069 for discussion.

    It should be really simple, just enough to make it useless to put AV signatures in transactions. E.g. generate a random key on first start, store the key in the database, then XOR all subsequent data read/written to the database with that.

    Possibly this obfuscation could include the block files as well, although I’ve never heard of problems with those - the most likely explanation is that AV software doesn’t consider files above a certain size.

  2. laanwj added the label Windows on Sep 1, 2015
  3. laanwj added the label UTXO Db and Indexes on Sep 1, 2015
  4. jgarzik commented at 2:35 pm on September 1, 2015: contributor
    +1
  5. jamesob commented at 1:14 am on September 6, 2015: member

    I’d like to try tackling this one, but I’m a little confused about where to start. Is the basic approach here something like

    • On construction of CCoinsViewDB, check db to ensure that some value exists under key xor_obfuscate (or similar); if not, randomly generate one and write it to db
    • On CCoinsViewDB::BatchWrite(), use db.Read('xor_obfuscate') to XOR the value of the serialized Coin objects being written
    • On CCoinsViewDB::GetCoins(), use the xor_obfuscate value to reverse the XOR in BatchWrite()
    • Add a flag to disable obfuscation

    Does that sound like the right tack, or am I off?

    In terms of testing, it looks like we’d get some incidental coverage out of test/coins_test.cpp, but are there any supplemental testcases that should be written?

  6. luke-jr commented at 5:09 am on September 6, 2015: member
    Where are you handling un-obfuscated reads (eg, those written with older versions) after initialising the xor_obfuscate key?
  7. dcousens commented at 2:28 pm on September 6, 2015: contributor
    Concept ACK
  8. sipa commented at 2:38 pm on September 6, 2015: member

    jamesob: I would implement it at the LevelDB wrapper layer, rather than CCoinsViewDB, as that would be more generic and IMHO easier.

    The xor value could be cached inside the wrapper object - there is no need to read it over and over again. You’d read it when opening the database, and if it doesn’t exist, generate and write it.

    As Luke mentions, you also need to deal with non-obfuscated values. I suppose we can institute a rule that a key value cannot start with a null byte, and then store obfuscated values using (nullbyte + key) as key. LevelDB is efficient for large amounts of keys that start with the same sequence of bytes, so this would not hurt storage. You do need some mechanism for erasing the old unobfuscated key when the new one is written. This may have some performance impact…

    Also, would it be overkill to use AES-CTR instead?

  9. jamesob commented at 5:27 am on September 7, 2015: member

    @luke-jr @sipa good points, thanks. @sipa, I think given the discussion in #4069 AES might be more than we need right now, and definitely has different performance characteristics than XOR does. Maybe something like that will be called for down the road, but for now it seems like an XOR should do the job.

    It would be nice if we could somehow indicate obfuscation in the value itself, so that way we won’t have to do two reads (and a potential write) for correction, but I think that may be a little too tricky, and we should just keep this as simple as performance constraints will allow.

    Should we generalize the the obfuscation mechanism to the CLevelDBWrapper, maybe adding an obfuscate bool parameter to its constructor? or should we keep this fairly contained as a one-off solution for CCoinsViewDB? If anyone has any strong opinions, I’d like to hear them. Otherwise I’m going to move forward with a draft of the more general approach.

  10. laanwj commented at 9:45 am on September 7, 2015: member

    Where are you handling un-obfuscated reads (eg, those written with older versions) after initialising the xor_obfuscate key?

    I’d say: make a database either obfuscated or not obfuscated. Partially obfuscated databases are as incompatible with older versions as completely obfuscated ones (plus there is danger it would work partially and error out later), so there is no use for the middle road.

    A new database would be ‘born’ obfuscated. An old database would be kept unobfuscated unless the user gives a command line option to obfuscate, which would obfuscate the entire database in one go in an upgrade process. This loses backward compatibility.

    The two cases can be distinguished by the presence/non-presence of the obfuscation key, or e.g. add a MIN_VERSION like in the wallet. I think this makes implementation a lot easier and less error-prone.

    Also, would it be overkill to use AES-CTR instead?

    As this is meant as a fast, lightweight obfuscation mechanism, using AES seems like overkill. The goal is to avoid trivial signature matching. I don’t see what using a stronger crypto-system would add.

  11. jamesob commented at 6:06 pm on September 7, 2015: member
    @laanwj any examples of “upgrade processes” in the codebase that I can reference for this implementation?
  12. pstratem commented at 1:38 am on September 8, 2015: contributor

    @sipa “LevelDB is efficient for large amounts of keys that start with the same sequence of bytes, so this would not hurt storage.”

    Does leveldb merge prefixes or something?

  13. sipa commented at 1:39 am on September 8, 2015: member
    @pstratem Yes.
  14. pstratem commented at 1:45 am on September 8, 2015: contributor
    @sipa it looks like there are already some special keys in the chainstate database?
  15. pstratem commented at 1:54 am on September 8, 2015: contributor

    @laanwj There’s certainly a disadvantage to not supporting mixed obfuscation databases.

    Specifically if this is switched to the default then everybody is going to need to do the upgrade procedure, which is potentially slow and expensive.

  16. sipa commented at 1:56 am on September 8, 2015: member
    @pstratem What special keys?
  17. pstratem commented at 2:01 am on September 8, 2015: contributor

    @sipa ‘B’ is the best block ’l’ is the last block

    Everything else seems to be prefix || hash -> value

  18. sipa commented at 2:04 am on September 8, 2015: member
    All keys are serialize(something), whether that something is a character or something more complex.
  19. jgarzik commented at 2:09 am on September 8, 2015: contributor

    Transition thinking…

    • Mixed obfuscation creates a mixed situation that lingers forever.
    • Mixed obfuscation does not eliminate the problem for a user, even after the user upgrades.
    • A one-time upgrade is painful only if done in the foreground
  20. pstratem commented at 2:12 am on September 8, 2015: contributor
    @sipa right but the serialization of a single charater is … a single character
  21. pstratem commented at 2:14 am on September 8, 2015: contributor
    @jgarzik The upgrade being done in the background seems optimal.
  22. sipa commented at 2:24 am on September 8, 2015: member
    Doing this in the background is pretty hard, as you need to prevent writing entries that a concurrent modification may be updating.
  23. jgarzik commented at 2:28 am on September 8, 2015: contributor

    @sipa Then don’t do it that way. You also said -reindex was hard :)

    e.g. create a unified view of new + old database; old db under the hood becomes read-only, all updates go to new db.

    There are other approaches as well.

  24. sipa commented at 2:28 am on September 8, 2015: member
    @jgarzik I agree, a partial state does not fix the problem. A UTXO entry that is not touched is never overwritten, so creating a UTXO with a virus signature in it, and then never spending it would leave AV software detecting the file as problematic until reindex anyway.
  25. sipa commented at 2:36 am on September 8, 2015: member
    @jgarzik Didn’t say it was impossible, but it’s a non-trivial thing to do that requires careful testing (UTXO database errors can lead to consensus failure). I think for now I’m perfectly fine with an opt-in approach. We can add updating later.
  26. jamesob commented at 2:42 am on September 8, 2015: member
    So: opt-in, atomic obfuscation for now, more sophisticated obfuscation-by-default scheme later?
  27. jgarzik commented at 2:42 am on September 8, 2015: contributor

    IMO:

    • NAK mixed obfuscation - all or none is cleaner, less complex and better reflects the future
    • if all-or-none, obfuscation could easily become encryption with a swap of layers, without touching low-level db or upper-level bitcoin.
    • turn on by default for new db or -reindex
  28. sipa commented at 2:46 am on September 8, 2015: member
    Enabling by default on -reindex should be easy to do in this PR.
  29. sipa commented at 2:47 am on September 8, 2015: member
    Another issue: we need a clean failure when downgrading. I don’t think the chainstate has a version number, so that’s problematic.
  30. jgarzik commented at 2:50 am on September 8, 2015: contributor
    Presumably the failure mode is database-appears-totally-empty?
  31. sipa commented at 2:53 am on September 8, 2015: member
    Hmm, no, it would read random garbage and try to deserialize it.
  32. jamesob commented at 2:56 am on September 8, 2015: member
    What’s the usecase for a downgrade? When would that actually happen?
  33. sipa commented at 2:58 am on September 8, 2015: member
    People tend to try a new version but then find a problem, so need to revert to an earlier version of the client.
  34. gmaxwell commented at 3:20 am on September 8, 2015: contributor

    My thought was that if there is no obf key, we set one and set it to all zeros. Tada. Everyone is “obfuscated”. On reindex we should reset the key. Downgrade works fine, so long as you haven’t reindexed since. If you do downgrade after one you’ll fail right away, on the chainstate check … and it’ll tell you to reindex!

    Log the OBF key so that when troubleshooting we can see if you were still AV exposed.

    I do not think more complexity than this is justified. The users this helps are primarily ones where the software currently immediately fails. If you suffer a latent AV incident, the result will be to need a reindex in any case… which will helpfully fix you here too. Considering how much data corruption problems we currently have with LevelDB on windows already, this sounds pretty complete.

  35. pstratem commented at 3:24 am on September 8, 2015: contributor
    @gmaxwell Ha that’s a nice solution.
  36. jamesob commented at 3:24 am on September 8, 2015: member
    @gmaxwell very slick :). I’ll get to implementing.
  37. dcousens commented at 3:49 am on September 8, 2015: contributor
    @gmaxwell ha, :+1: good solution.
  38. laanwj closed this on Oct 6, 2015

  39. kazcw referenced this in commit 7a539dec7c on Apr 22, 2016
  40. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-21 21:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me