SAV reported bitcoind infected w Silly.218, crashed bitcoind (likely false positive) #4069

ghost opened this issue on April 19, 2014
  1. ghost commented at 4:00 am on April 19, 2014: none

    OS: Win 7 x64 Bitcoin core: 0.9.1

    While running bitcoind.exe --reindex:

    c:\Program Files\Bitcoin\daemon>bitcoind.exe --reindex
    Error: System error: Database I/O error

    Symantec Anti Virus detected Silly.218 in the file chainstate\052878.sst (other names under which the malware is known: Virus.DOS.Dutch_Tiny.163.a (Kaspersky), Silly.218 (Symantec), Tiny-Family #3 (Avira)). Hash of the detected pattern: E272F4FF4AD99D1C48C4888990893FC6193DB1CB9849C69B1710069BBD047E0D

    The file was automatically quarantined and deleted (I can’t change the SAV default settings), which crashed bitcoind and corrupted the Bitcoin database.

    An Internet search on this yielded no results. This looks like a false positive.

  2. unknown renamed this:
    SAV reported bitcoind to have Silly.218, crashed bitcoind (likely false positive)
    SAV reported bitcoind infected w Silly.218, crashed bitcoind (likely false positive)
    on Apr 19, 2014
  3. gmaxwell commented at 6:01 am on April 19, 2014: contributor
    Please report the misbehavior to your anti-virus vendor.
  4. laanwj commented at 7:21 am on April 19, 2014: member

    Ugh. I really hate this.

    The only workaround (without the AV vendors cleaning up their act) would be to start obfuscating the database and block files, but that looks even more suspicious.

  5. laanwj added the label Windows on Apr 19, 2014
  6. gmaxwell commented at 7:28 am on April 19, 2014: contributor

    The alternative seems to be software randomly corrupting the installs, not much of a tradeoff. We could at least have a setting to disable the obfuscation (we’ll need to support obfuscation being off for ‘legacy files’ in any case). At least then they’d have to be specifically targeting Bitcoin.

    If it makes you feel better, obfuscating the block files would address some fringe risks. E.g., say someone discovers some disk sector value that a popular brand of disk always stores incorrectly (don’t laugh, I’ve had silicon flaws in switches result in packets that they cannot forward, due to the bit pattern making it lose sync with internal buses)… and exploits it to fork the network.

  7. KyrosKrane commented at 11:20 am on April 19, 2014: none
    I got hit with this same issue just a bit ago. The file it hit in my directory was a different file, but it was also a .sst file in the chainstate directory. I ended up telling Norton Antivirus to ignore my bitcoin data directory entirely. Is the right way to recover from this now to reindex? Or what’s the right way to restore the .sst files?
  8. laanwj commented at 11:27 am on April 19, 2014: member

    Yes, reindex.

    We should at least add a note to the next release notes that people must add the bitcoin data directory to the ignored directories of their AV software.

  9. leofidus commented at 1:55 pm on April 19, 2014: none

    that people must add the bitcoin data directory to the ignored directories of their AV software

    Couldn’t that lead to malware installing itself in the data directory? Bitcoin (indirectly) helping malware is certainly a headline we want to avoid.

  10. KyrosKrane commented at 4:22 pm on April 19, 2014: none
    Ideally, I guess you’d want to exclude *.sst files in the chainstate directory only (as that’s what seems to be triggering this alert). Norton, stupidly, can’t do that. It can exclude all .sst files, everywhere on your PC, OR it can exclude a particular folder/directory. It can’t do both.
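    The narrower rule described here (skip only *.sst files inside the chainstate directory, keep scanning everything else) is easy to express as a path pattern. A minimal sketch, assuming a hypothetical scanner hook called is_excluded; the example paths are made up, and this does not model how Norton or any other product is actually configured:

```python
from pathlib import PureWindowsPath

def is_excluded(path: str) -> bool:
    """Hypothetical exclusion rule: skip only *.sst files that sit directly
    inside a directory named 'chainstate'. Windows paths compare
    case-insensitively, and a relative pattern is matched from the right."""
    return PureWindowsPath(path).match("chainstate/*.sst")

# Usage
print(is_excluded(r"C:\Users\alice\AppData\Roaming\Bitcoin\chainstate\052878.sst"))  # True
print(is_excluded(r"C:\Users\alice\Downloads\052878.sst"))  # False: not in chainstate
print(is_excluded(r"C:\Users\alice\AppData\Roaming\Bitcoin\blocks\blk00000.dat"))  # False
```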
  11. ghost commented at 9:28 pm on April 19, 2014: none
    Well, that might be a task the Bitcoin Foundation could get involved with administratively, IMO: contact AV vendors and discuss whitelisting the database files. In any case, the files are not executable, so I can’t see why they would be scanning and flagging them.
  12. laanwj commented at 4:15 pm on April 20, 2014: member

    @leofidus Indeed, they could. Making the virus scanner ignore .sst files sounds like a better suggestion. It’s a fairly rare extension and is never used for executables.

    That would be enough. Somehow the blk*.dat and rev*.dat files currently avoid ‘detection’ as they are padded to large sizes.

  13. c64e5e9e-08ce-4f7d-be39-92cf89188b45 commented at 11:43 pm on April 21, 2014: none

    Something arguably needs to be done; the 30+ virus vendors in common use won’t all whitelist these files. @gmaxwell If you go ahead with obfuscating the blocks in some way, be sure to be in contact with the Armory folks, as this will completely break their platform on Windows. @laanwj Virus scanners ignore large files while scanning. Padding out the .sst files to 32 MB would maintain compatibility with current parsers while escaping the virus scanners.

    The files not being executable actually has nothing to do with it; malware authors get around this on Windows by creating file links that attempt to execute any file. This was used on Reddit recently: a “.txt” file supposedly containing “leaked Bitstamp passwords” came with a link containing a command that executed the “.txt” file. It’s a fairly unpleasant trick that gets even seasoned users, so I’m not sure a global whitelist of those files would be advisable. Even apparently safe files can be dangerous on Windows.

    In reality, virus scanners are the dumbest of the dumb. Even just XORing the files would do. There’s no comfortable way of doing it, though: either every platform obfuscates the blocks in the next version at the expense of massive I/O for the whole network, or Windows users end up with block files that are incompatible with Linux and OS X. In my opinion, just padding out the .sst files is the least destructive change.
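    As a rough illustration of just XORing the files: a sketch that XORs data with a short random key. The function name and the 8-byte key length are assumptions for illustration, not anything Bitcoin Core defines. Because (b ^ k) ^ k == b, the same function both obfuscates and restores the data, and with a non-zero key the bytes a scanner matches on are no longer stored as-is.

```python
import os
from itertools import cycle

def xor_obfuscate(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with a repeating key.

    Applying the function twice with the same key returns the original data.
    An empty key leaves the data untouched (e.g. for legacy, unobfuscated files).
    """
    if not key:
        return data
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

# Usage: a hypothetical 16-byte "signature" embedded in a record.
signature = b"\xde\xad\xbe\xef" * 4
record = b"A" * 10 + signature + b"B" * 10

key = os.urandom(8)                 # a different random key for every node
stored = xor_obfuscate(record, key)
assert xor_obfuscate(stored, key) == record
print(signature in xor_obfuscate(record, b"\x5a" * 8))  # False
```

    With a fixed, publicly known key an attacker could simply pre-XOR the pattern, so in practice the key would have to be chosen randomly per installation.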

  14. c64e5e9e-08ce-4f7d-be39-92cf89188b45 commented at 3:32 am on April 22, 2014: none
    Just for fun: there are about 8,000 reachable nodes on the network at the time of writing. Assuming that a large portion of the network is unreachable (NAT, filtering, intermittent, just not listening), it’s probably safe to assume there are at least 50,000 nodes with the complete blockchain. If we XOR just the chainstate, we cause 50,000 × 430 MB of disk writes, or 50,000 × 430 × 2 MB of reads and writes combined, somewhere in the region of 43 TB. If we XOR the entire blockchain on disk, we cause 50,000 × 21,000 × 2 MB of I/O, around 2.1 PB of reads and writes across the wider Bitcoin network. Incredible.
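    The estimate can be reproduced with a few lines of arithmetic. The node count and the 430 MB and 21,000 MB per-node sizes are the comment’s own 2014 assumptions; MB is treated as 10^6 bytes for the TB and PB figures.

```python
NODES = 50_000          # assumed nodes holding the full chain (estimate from the comment)
CHAINSTATE_MB = 430     # assumed chainstate size per node, in MB
BLOCKS_MB = 21_000      # assumed block data size per node, in MB

def network_io_mb(nodes: int, size_mb: int, passes: int) -> int:
    """Total MB moved if every node processes size_mb of data `passes` times
    (1 = write only, 2 = read everything and write it back)."""
    return nodes * size_mb * passes

chainstate_writes = network_io_mb(NODES, CHAINSTATE_MB, passes=1)
chainstate_rw = network_io_mb(NODES, CHAINSTATE_MB, passes=2)
blocks_rw = network_io_mb(NODES, BLOCKS_MB, passes=2)

print(chainstate_writes / 1e6, "TB written")         # 21.5 TB written
print(chainstate_rw / 1e6, "TB read + written")      # 43.0 TB read + written
print(blocks_rw / 1e9, "PB read + written")          # 2.1 PB read + written
# If MB is read as MiB (2**20 bytes), the block total is about 1.96 PiB.
print(round(blocks_rw / 2**30, 2), "PiB")            # 1.96 PiB
```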
  15. gmaxwell commented at 3:34 am on April 22, 2014: contributor
    My assumption is that we’d store the chainstate key as a record in chainstate, and if there is no key, or its size is zero, we do nothing. On newly created or reindexed chainstates we’d set it to a random value… so we wouldn’t rewrite it for anyone. If someone has problems with their AV, they’d be told to reindex, and doing so would set a key.
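    The scheme described here (read a key record; if it is missing or empty, do nothing; only a freshly created or reindexed chainstate gets a random key) might look roughly like the following. A plain dict stands in for the database, and the record name and 8-byte key length are made-up placeholders, not Bitcoin Core’s actual choices.

```python
import os
from itertools import cycle

KEY_RECORD = b"obfuscate_key"   # hypothetical record name
KEY_LEN = 8                     # hypothetical key length in bytes

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key; an empty key is a no-op."""
    if not key:
        return data
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def load_or_create_key(db: dict, newly_created: bool) -> bytes:
    """Return the obfuscation key stored in db.

    Existing (legacy) databases without a key keep an empty key, so nothing is
    rewritten. Only a new or reindexed database gets a fresh random key.
    The key record itself is stored un-obfuscated so it can be read back.
    """
    key = db.get(KEY_RECORD, b"")
    if not key and newly_created:
        key = os.urandom(KEY_LEN)
        db[KEY_RECORD] = key
    return key

def put(db: dict, key: bytes, name: bytes, value: bytes) -> None:
    """Store value under name, obfuscated with key."""
    db[name] = xor_bytes(value, key)

def get(db: dict, key: bytes, name: bytes) -> bytes:
    """Read the value stored under name and undo the obfuscation."""
    return xor_bytes(db[name], key)

# Usage: a legacy database stays plain; a reindexed one gets a key.
legacy = {b"coin": b"plain"}
assert load_or_create_key(legacy, newly_created=False) == b""
fresh = {}
k = load_or_create_key(fresh, newly_created=True)
put(fresh, k, b"coin", b"value")
assert get(fresh, k, b"coin") == b"value"
```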
  16. c64e5e9e-08ce-4f7d-be39-92cf89188b45 commented at 4:15 am on April 22, 2014: none
    Some sort of sane error when the inevitable “oh crap some of my databases are gone” moment happens wouldn’t go astray either. Currently it’s not clear what has even happened, just that Bitcoin Core broke and the virus protection went nuts deleting “viruses”.
  17. laanwj commented at 7:00 am on April 22, 2014: member
    @c64e5e9e-08ce-4f7d-be39-92cf89188b45 I don’t believe padding is a sustainable solution. As time moves on and storage media become faster and larger, viruses may use this trick too and AV vendors will likely increase the maximum file size. That’s a cat-and-mouse game we don’t want to play. So that leaves XORing. @gmaxwell Good point on only doing it for new databases/reindexes. That answers the eternal ‘what about legacy support’ question.
  18. leo-bogert commented at 9:45 pm on April 23, 2014: none

    Please think about what a virus is: Executable code. There are only 3 ways in which data in a file can be executable on any operating system:

    • The file contains a program header. Notice that the file extension is not a program header; the header is contained inside the file. On Windows it’s typically a PE header (http://en.wikipedia.org/wiki/Portable_Executable). The trick for executing .TXT files likely only works if they contain such a header. [But then it is not even an exploit: if I recall correctly, you are, for example, free to use the regular Windows CreateProcess() API to execute files without an EXE extension.] So, as our SST files do not contain a program header, this case does not match.
    • The file does not contain a program header, but the operating system still executes its content automatically due to exploitable bugs. This has certainly been used in viruses, but it is again not our problem: the bug in the OS is to blame and must be fixed.
    • The file does not contain a program header, but there is a different file on the computer, commonly called a “loader”, which does contain a program header, thereby gets executed, and contains code to load the non-executable SST file into its own address space and execute it. This is the only case which could match here in theory. But there is no reason for a loader program on a computer to randomly pick up our SST files and load the very particular area which contains the virus for execution. This would have to be done intentionally; you don’t just execute random file content. So the loader is the malicious root of execution, and as long as the virus scanner detects and deletes the loader, the non-executable SST file is completely harmless. There is no reason for the virus scanner to care about the non-executable file. So even in this only matching case, there would still be a strong indication that the virus scanner is just wrong here and that this is a false positive.
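    The first case can be checked mechanically: a Windows PE image starts with a DOS header beginning with the bytes MZ, and the little-endian 32-bit field at offset 0x3C (e_lfanew) points to the 4-byte PE signature. A minimal sketch of that check; a real loader validates far more, and signature-matching scanners like the ones in this issue do not look for the header at all.

```python
import struct

def has_pe_header(data: bytes) -> bool:
    """Return True if data looks like the start of a Windows PE image:
    a DOS header beginning with b"MZ" whose e_lfanew field (a little-endian
    uint32 at offset 0x3C) points at the 4-byte signature b"PE\\0\\0"."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return data[e_lfanew:e_lfanew + 4] == b"PE\0\0"

# Usage: build a minimal fake header and compare it with arbitrary bytes.
fake_exe = bytearray(0x80)
fake_exe[:2] = b"MZ"
struct.pack_into("<I", fake_exe, 0x3C, 0x40)
fake_exe[0x40:0x44] = b"PE\0\0"
print(has_pe_header(bytes(fake_exe)))   # True
print(has_pe_header(b"\x00" * 0x80))    # False
```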

    Conclusion: You are wasting your time discussing this. Contact Symantec to fix the false positive. They are obliged to do so: After all a virus scanner divides the world into “good” and “bad” software, which is a really judgmental thing to do, and someone who is that judgmental has a moral obligation to only judge against something which is really malicious, which Bitcoin isn’t in any way. It is their job to fix this.

  19. c64e5e9e-08ce-4f7d-be39-92cf89188b45 commented at 11:10 pm on April 23, 2014: none
    @leo-bogert It’s not only Symantec giving false positives. At my last count there are at least 50 virus vendors in common use (looking at virustotal.com’s scan database), most of which will probably have these incredibly old virus definitions in them somewhere. I doubt even a small fraction of them will bother to respond.
  20. gmaxwell commented at 11:29 pm on April 23, 2014: contributor

    AV companies have also shown absolutely no hesitation in explicitly and intentionally marking general bitcoin miner software as malware, just because some malware has taken it along for the ride.

    When you’re talking about software written by people who think it’s acceptable to corrupt a database because they didn’t like a particular 16-byte sequence nested inside of it, you can’t make too many assumptions. (I can’t wait until someone legally changes their name to one of these sequences and we find out that all sorts of government databases didn’t have functioning backups… :) ) Of course, people should report the false positives in any case, but I don’t think false positive reporting is a long-term solution.

  21. c64e5e9e-08ce-4f7d-be39-92cf89188b45 commented at 8:07 am on April 24, 2014: none
    #4086 was a duplicate, this time occurring with the Microsoft Security Essentials antivirus package.
  22. pygy commented at 8:45 am on May 16, 2014: none

    But why XOR the whole block chain for everyone?

    Rather than just storing the random XOR key, you could store a (block, key) tuple in a custom file.

    The first time the new client is run, it would start XORing at the block that is the source of the current issue.

    Fresh installs would XOR the whole chain.
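    One way to read this proposal: keep a single (first block, key) record; blocks before the first block stay in the legacy plain format, and blocks from that point on are XORed with the key. A fresh install would simply use 0 as the first block. The record layout, the names, and the example heights below are hypothetical.

```python
import os
from dataclasses import dataclass
from itertools import cycle

@dataclass(frozen=True)
class ObfuscationStart:
    """Hypothetical (block, key) record stored in its own file."""
    first_block: int   # blocks at or above this height are obfuscated
    key: bytes

def _xor(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def encode_block(height: int, raw: bytes, start: ObfuscationStart) -> bytes:
    """Obfuscate a block for storage, leaving legacy heights untouched."""
    if height < start.first_block or not start.key:
        return raw
    return _xor(raw, start.key)

# XOR is its own inverse, so decoding applies exactly the same rule.
decode_block = encode_block

# Usage: an upgraded node starts obfuscating at a made-up height 300,000.
upgraded = ObfuscationStart(first_block=300_000, key=os.urandom(8))
assert encode_block(299_999, b"old block", upgraded) == b"old block"
stored = encode_block(300_000, b"new block", upgraded)
assert decode_block(300_000, stored, upgraded) == b"new block"
fresh_install = ObfuscationStart(first_block=0, key=os.urandom(8))
```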

  23. ghost commented at 10:09 am on May 16, 2014: none
    I don’t imagine XORing will help, since a joker could engineer some data to look wrong when XOR’d too. Surely this is something the up-to-now useless Bitcoin Foundation can start to deal with, and actually lobby these companies about. They have the funds to employ staff to do just that.
  24. leofidus commented at 10:34 am on May 16, 2014: none
    If the value you XOR against is chosen randomly and stored e.g. at the beginning of the block file, it would work.
  25. laanwj commented at 12:38 pm on May 16, 2014: member
    Yes, the XOR key would need to be different for everyone; otherwise it can be trivially worked around.
  26. whitslack commented at 3:31 pm on May 16, 2014: contributor
    Whatever you guys decide to do, would you please put in a build option to disable your workaround. I have no problems with virus scanners, as I don’t run one (I use Linux), and I value being able to scan my block chain data files with external tools. I don’t want the data obfuscated.
  27. wyager commented at 8:34 pm on May 16, 2014: contributor

    In fact, if this must be implemented, please keep it disabled by default. Either only build it into Windows builds, or leave it as a checkbox (checked by default only on Windows machines).

    We should not compromise the usefulness of our own software to facilitate the awfulness of some other software.

  28. gmaxwell commented at 8:40 pm on May 16, 2014: contributor
    @wyager Doing this in no way compromises the usefulness of the software. It’s generally protective for users, for the same reason Freenet does the same thing, and this kind of encoding was previously proposed back in 2011, long before people were complaining about crazy AV on Windows.
  29. wyager commented at 8:43 pm on May 16, 2014: contributor

    It compromises the usefulness of the software for the same reason @whitslack mentioned. If the chain files are XORed, it disrupts the ability of external analysis tools to work on the blockchain.

    Can you explain what you mean re. freenet? I was under the impression that freenet files were not easily decryptable by the user, which gave them plausible deniability. That isn’t what we’re trying to do here.

  30. gmaxwell commented at 8:47 pm on May 16, 2014: contributor
    What needs to be adjusted here is not the blockchain, it’s already ignored by AV tools, but the chainstate indexes which already cannot be read by external tools. Any such tools could also decode the data. In any case, increasing the complexity for ‘accidental’ decodes can reduce some problems in jurisdictions with strict liability for possessing some kinds of data. (though generally our anti-spam/anti-dos rules are also protective there, people would like to relax them in the future…)
  31. wyager commented at 9:01 pm on May 16, 2014: contributor
    Fair enough on the technical side, but I’m highly skeptical that XORing illicit data with a known key provides legal protection in any jurisdiction, especially when that data is explicitly intended to be decoded on the same machine later on.
  32. whitslack commented at 1:49 am on May 17, 2014: contributor

    In any case, increasing the complexity for ‘accidental’ decodes can reduce some problems in jurisdictions with strict liability for possessing some kinds of data.

    In order to have any chance of changing the legal status of a full node, you would have to XOR the block chain data files too. And even XOR’ing data with a one-time pad is not sufficient to avoid legal liability for distributing prohibited content. Ask yourself: if you have a web server serving up the latest Pixar movie XOR’d with a pseudorandom bitstream, the seed for which is known, could you be convicted of piracy? Answer: definitely yes.

  33. luke-jr commented at 4:30 am on May 17, 2014: member
    @whitslack You can only be convicted of piracy if the prosecutor can convince a jury that you attacked and robbed a ship at sea. In any case, I don’t think there’s any jurisdiction with strict liability for possessing illegally copied data (most don’t even consider it a crime unless there was a profit), so that’s a bit off-topic for this issue. Bitcoin nodes never store/distribute data other than financial transactions anyway. To claim that transactions generated deterministically from illegal data carry over its colour is already a pretty big stretch and probably wouldn’t hold up in a court. The problem is that automated tools, which can produce false positives, are often used to detect viruses and data that carries strict liability for possession. XORing the data should stop the false positives.
  34. whitslack commented at 4:48 am on May 17, 2014: contributor
    My point was that XOR’ing the data won’t change the color of the bits. If having a BitTorrent magnet URI to child pornography embedded in the block chain data files on your hard drive puts you in violation of some law somewhere, then merely obfuscating the link won’t absolve you of legal liability.
  35. wyager commented at 7:01 am on May 17, 2014: contributor

    I think @whitslack is absolutely correct here. Encrypting data with a known key will probably never help in any conceivable legal situation.

    The important thing here is the technical merit of this kind of obfuscation, not any negligible wetware side effects it might have.

  36. whitslack commented at 7:14 am on May 17, 2014: contributor

    The important thing here is the technical merit of this kind of obfuscation

    I agree, and I see no technical merit on non-Windows systems. I only see overhead. That’s why I’m asking that any act of kowtowing to the stupidity of the anti-virus vendors be made optional. I’d really prefer not to have to maintain my own fork of Bitcoin Core just so I can keep its CPU usage down.

    Addendum: I’ll have no complaint if it’s a build-time option rather than a runtime option or if it’s not possible to switch between obfuscated and unobfuscated without a reindex.

  37. luke-jr commented at 7:27 am on May 17, 2014: member

    Without obfuscation: Law enforcement runs a scan on your Linux desktop and is told (by automated software) that you have child pornography; they arrest you, and you get to pay $$$ for a decent lawyer to make the case that it was a false positive; you’re fired from your job and the mass media slanders your good name; charges are eventually dropped months later.

    With obfuscation: Law enforcement runs a scan on your Linux desktop and doesn’t find anything. You go about your life.

  38. jgarzik commented at 7:32 am on May 17, 2014: contributor

    There is privacy merit.

    Additionally, very similar to what @gmaxwell points out, the Linux community has discussed a possible intelligence capability: hard drive firmware that triggers different behavior if it sees certain magic sectors stored to disk.

    Pretty easy to trigger such a mechanism via blockchain, targeting all bitcoin full node users.

  39. whitslack commented at 7:36 am on May 17, 2014: contributor

    Keep in mind that obfuscation can actually cause new false positives despite ameliorating others. A decade or so ago, I released a piece of freeware for Windows that was detected as containing a virus. It was a false positive triggered by my use of UPX to compress the application code in the EXE. After decompression (at runtime), there was no malicious code, but the compressed payload just so happened to match a viral pattern.

    Obfuscating the SST files will not solve the problem of retarded virus scanners blindly scanning all files for byte patterns that match known malware. What it will do, however, is spread the damage more evenly over the Bitcoin-using population, rather than having every Windows-resident copy of the block chain damaged simultaneously.

  40. jgarzik commented at 7:46 am on May 17, 2014: contributor
    1. Obfuscation clearly results in lower chance of plaintext occurrence, versus compressed files (or non-obfuscation).
    2. Obfuscation clearly results in lower chance of compressed virus signatures randomly appearing, versus compressed data. Comparisons with compressed data storage are not valid.
    3. I see a lot of fact-free hand-waving over costs.
  41. karelbilek commented at 8:19 am on May 17, 2014: contributor

    I am not following the whole conversation, but isn’t this an issue of the AV vendors instead of Bitcoin issue?

    (Sorry if it’s a stupid or offhand remark. But it’s a false positive on the AV vendors’ side; nothing is wrong on the Bitcoin side.)

  42. rebroad commented at 10:18 am on May 17, 2014: contributor

    The answer is surely to ensure that the files that can contain the virus signatures are put into a directory with no other files, and the installation instructions help guide people into adding that directory to the exclusion directory in their antivirus software. It might also be helpful to have a user-interactive function to write the EICAR virus test file to that directory to ensure that it has been configured correctly before downloading the blockchain.
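    The self-test could be sketched as: write the standard EICAR test string into the data directory, wait, and see whether the file survives. The probe file name and the default wait time are assumptions; some scanners deny access instead of deleting, which this sketch treats the same as “not excluded”. The string is split across two literals only so that the source file itself is less likely to be flagged.

```python
import os
import time

# Standard EICAR antivirus test string (68 bytes), split so the source itself
# is less likely to be flagged by a scanner.
EICAR = (r"X5O!P%@AP[4\PZX54(P^)7CC)7}$"
         "EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*")

def directory_is_excluded(directory: str, wait_seconds: float = 5.0) -> bool:
    """Return True if an EICAR probe file written to `directory` is still
    intact after `wait_seconds`, i.e. the antivirus apparently leaves the
    directory alone. The probe file is removed afterwards."""
    probe = os.path.join(directory, "av-exclusion-probe.txt")  # hypothetical name
    with open(probe, "w", newline="") as f:
        f.write(EICAR)
    time.sleep(wait_seconds)
    try:
        with open(probe, newline="") as f:
            return f.read() == EICAR
    except OSError:
        return False  # deleted, quarantined, or access denied
    finally:
        try:
            os.remove(probe)
        except OSError:
            pass
```

    A GUI wrapper would call directory_is_excluded on the data directory before the initial block download and, if it returns False, point the user at their AV’s exclusion settings.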

    Ideally, antivirus software should exclude this directory anyway, as scanning the blockchain like this is an unnecessary waste of system resources.

  43. whitslack commented at 10:22 am on May 17, 2014: contributor
    @rebroad I agree with you, but I can see the other side of the argument, that stupid users will screw up the permissions on the directory, and other processes will download malware and stash it there, knowing that Bitcoin’s data directory is a “safe” place for Bad Things™. Then the headlines will read, “new worm exploits Bitcoin vulnerability, causes $2B in damages.”
  44. leofidus commented at 11:03 am on May 17, 2014: none
    I have not implemented it, so I can’t say for sure, but I doubt that a simple XOR would produce any notable overhead when compared to the time it takes to read the file from disk or write it to disk.
  45. whitslack commented at 11:08 am on May 17, 2014: contributor
    @leofidus The chainstate database could be cached entirely in RAM. I would bet that XOR’ing every byte in every read and every write will contribute non-trivial overhead. I’ll shut up if the actual numbers prove otherwise, but I hate the idea of intentionally adding overhead to software where I perceive no value in exchange.
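    For anyone wanting actual numbers, a rough micro-benchmark of XOR throughput is easy to write. This Python sketch XORs a buffer with a repeating 8-byte key using big-integer arithmetic and reports MB/s, which can be compared against the disk’s read/write rate on the same machine. It is only a stand-in: Bitcoin Core is C++, so absolute figures will differ, and buffer size and repeat count are arbitrary choices.

```python
import os
import time

def xor_buffer(data: bytes, keystream: bytes) -> bytes:
    """XOR two equal-length buffers at once via Python's big integers."""
    if len(data) != len(keystream):
        raise ValueError("data and keystream must have the same length")
    mixed = int.from_bytes(data, "little") ^ int.from_bytes(keystream, "little")
    return mixed.to_bytes(len(data), "little")

def xor_throughput_mb_s(size: int = 8 * 1024 * 1024, repeats: int = 5) -> float:
    """Time `repeats` XOR passes over `size` random bytes; return MB/s."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    key = os.urandom(8)
    data = os.urandom(size)
    keystream = (key * (size // len(key) + 1))[:size]  # repeat key to full length
    start = time.perf_counter()
    for _ in range(repeats):
        out = xor_buffer(data, keystream)
    elapsed = max(time.perf_counter() - start, 1e-9)   # avoid dividing by zero
    if xor_buffer(out, keystream) != data:             # XOR is self-inverse
        raise RuntimeError("round trip failed")
    return size * repeats / elapsed / 1e6

print(f"{xor_throughput_mb_s():.0f} MB/s")
```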
  46. leofidus commented at 11:15 am on May 17, 2014: none

    In a software where (within limitations) anyone can store data on my disk, I perceive anything that obfuscates this data at least from non-specialised, automated tools as a benefit, as @luke-jr already pointed out.

    To those more knowledgeable in this area of the code: how often are the chainstate files read and written?

  47. wyager commented at 8:42 pm on May 17, 2014: contributor

    @luke-jr @leofidus The idea that obfuscating blockchain data would somehow foil the police is ridiculous. They aren’t that stupid. A simple XOR won’t hide the fact that you have a copy of the blockchain on your computer. If the blockchain contained any illegal data, it is likely that any automated tools would search for the existence of the blockchain in general, regardless of any mediocre obfuscation applied to it.

    We should drop this false premise of legal protection via obfuscation. If Bitcoin is deemed illegal for any reason, applying half-assed obfuscatory protection will not help the end user. Let’s leave it up to the end user to choose how to protect their data from legal inquisition, if necessary.

  48. luke-jr commented at 9:00 pm on May 17, 2014: member
    @wyager Nobody here is concerned about Bitcoin or the blockchain being made illegal. As you say, obfuscation won’t “fix” that, and isn’t intended to. What it intends to solve is the problem of false positives in software looking for other things. The police aren’t stupid, but they presumably don’t want to waste their time arresting someone for CP only to find out later they were merely running Bitcoin.
  49. wyager commented at 10:14 pm on May 17, 2014: contributor

    @luke-jr That’s marginally more sensible, but let’s get real, who is going to use magic numbers in this way? The automated systems that police use to detect illicit files don’t use magic numbers; they hash entire files and then compare them to a database of known hashes.
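    The difference between the two detection styles discussed in this thread is easy to show. In this sketch (the sample bytes and function names are illustrative only), whole-file hashing flags only an exact copy of a known file, while byte-signature matching, the approach that hit the chainstate files in this issue, flags the same bytes even when they are embedded inside a larger database file.

```python
import hashlib

def whole_file_match(data: bytes, known_sha256: set) -> bool:
    """Flag data only if its SHA-256 equals a known file's hash."""
    return hashlib.sha256(data).hexdigest() in known_sha256

def signature_match(data: bytes, signatures: list) -> bool:
    """Flag data if any known byte sequence occurs anywhere inside it."""
    return any(sig in data for sig in signatures)

known_file = b"\x13\x37" * 8                    # stand-in for a known bad file
known_hashes = {hashlib.sha256(known_file).hexdigest()}
database_file = b"header" + known_file + b"more records"  # same bytes, embedded

print(whole_file_match(known_file, known_hashes))      # True
print(whole_file_match(database_file, known_hashes))   # False
print(signature_match(database_file, [known_file]))    # True
```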

    It’s very unlikely that someone could embed illegal content in the blockchain in such a way that this would cause a false positive.

    I think this should be at the very bottom of our list of concerns.

  50. luke-jr commented at 10:21 pm on May 17, 2014: member
    @wyager Sorry, but you’re wrong. The Namecoin blockchain is already confirmed to flag as CP by police scanning software.
  51. wyager commented at 11:00 pm on May 17, 2014: contributor

    Source?

    Also, the situation with namecoin is very different, because it’s easy to put contiguous blocks of arbitrary data. That is more or less impossible with Bitcoin right now.

  52. wtogami commented at 0:54 am on May 18, 2014: contributor

    Do virus scanners scan all files or only particular file extensions? Will the change from .sst to .ldb alone mitigate this issue?

    Google apparently changed the extension from .sst to .ldb because of how Windows treats .sst files.

    https://groups.google.com/forum/#!topic/leveldb/u9izbG-pDis “We are going to change the extension of LevelDB’s immutable table files. The extension is currently .sst. It will soon be .ldb. This is to prevent Windows System Restore from treating the table files as system files[1] and backing them up independent of the manifest.”

    On Windows, .ldb is apparently used only for Microsoft Access multi-user lock files.

  53. laanwj commented at 8:55 am on May 18, 2014: member

    Police scanning software scans for signatures of known files, comparable to virus scanners. It can also scan the raw disk data, not per-file - in the case of deleted files. There can be plugins to look into specific obfuscated data, but obviously no one is going to write one to decode the block chain as that is public data.

    Avoiding false positives there would involve XORing both the block chain files as well as indexes.

    The virus scanners on the other hand only look in the index files - for now. I don’t think ldb versus sst matters as neither is inherently executable.

    • The foremost priority is obfuscating those files, not the block chain ones (that can be done later and independently if needed, and should indeed be optional).
    • This will not affect any external software.
    • It is only a small amount of data, so those few extra instructions won’t overheat your CPU.

    I don’t think further bike-shedding is required here, except if it is about the technical implementation.

  54. whitslack commented at 10:00 am on May 18, 2014: contributor

    The virus scanners on the other hand only look in the index files - for now. I don’t think ldb versus sst matters as neither is inherently executable.

    Why do virus scanners only look at the index files and not the block chain data files? If there’s a virus pattern that appears in the index, isn’t it likely also to appear in the block chain? If virus scanners are scanning the index, then clearly they don’t care about executable versus non-executable, so wouldn’t they also be scanning the block chain data files? If you’re going to push forward with obfuscating, then you would have to do everything.

  55. gmaxwell commented at 10:04 am on May 18, 2014: contributor
    @whitslack They ignore files over 32 MB, apparently universally. Blocks are stored in preallocated 128 MB files. (We’ve intentionally had trigger sequences in the blocks themselves for a couple of years on testnet, specifically to catch this kind of stuff early, but it’s only the recent introduction of short trigger strings in the chainstate that has caused issues; that’s been the source of 100% of reports so far.)
  56. laanwj commented at 10:11 am on May 18, 2014: member
    @whitslack If you had actually read the discussion in the issue, you would have already known that, as it came up at the beginning already.
  57. whitslack commented at 10:15 am on May 18, 2014: contributor

    @laanwj Sorry. I have been reading this whole thread since the beginning, but I’ve been reading it incrementally as the comments roll in, and I didn’t remember the comment about the 32-MiB threshold. It was only mentioned once, and I didn’t commit it to memory as I didn’t understand its significance until now.

    So why are we still considering obfuscation? Just increase the LevelDB shard file size to something larger than 32 MiB and call it a day.

  58. gmaxwell commented at 10:20 am on May 18, 2014: contributor
    I expect that would be very harmful to LevelDB performance, if it would even let us do that. What LevelDB is storing is not “shards”. LevelDB looks like a log which is constantly recompacted in an octave structure. The recompacting has overhead (but in exchange, the operation is locally append-only).
  59. laanwj commented at 10:21 am on May 18, 2014: member

    AFAIK, that won’t work, as LevelDB doesn’t pad the files that aren’t ‘full’ yet, so some files could still flag the virus scanner.

    It would be a fragile solution either way. Obfuscation is still the way to go eventually, but for the index files it has much higher priority than for the block chain files, as it actually causes problems at this point (and they need different implementations).

  60. gmaxwell commented at 10:23 am on May 18, 2014: contributor
    I at least considered this desirable long before anyone was reporting any problems, just not a priority.
  61. whitslack commented at 10:24 am on May 18, 2014: contributor
    LevelDB manages a set of small-ish data files, right? They have a .sst file extension in the version of LevelDB used by Bitcoin Core. Insert one call to SetEndOfFile right after these files are created. Heck, you might not even have to write bytes into the space if the virus scanners are only looking at apparent file sizes. (I’m assuming NTFS supports sparse files, but it’s been a long time since I did any Windows programming.)
  62. sipa commented at 5:38 pm on May 18, 2014: member

    I don’t feel like changing the default sstable file size for LevelDB (it requires changing the compile-time flag, which would mean diverging from upstream code, which we prefer to avoid).

    I don’t see the problem with obfuscation. The data in there is already in a very custom format that AFAIK is not read by any external tools (which would require Bitcoin Core to not be running, for starters). XORing (or even applying a simple non-cryptographic stream cipher) would have negligible, unmeasurably small CPU overhead compared to the signature checking and other operations we already do.

  63. jgarzik commented at 6:33 am on May 19, 2014: contributor

    I don’t see any problem with obfuscation, either. XOR’ing data that is likely already in a cache, even in the case of the larger block files, is likely to be cheap. It seems likely to be lost in the noise, even on an underpowered VPS. It is nothing compared to the disk traffic going on in the same code region.

    No, the police aren’t stupid. Yes, this obfuscation will be obvious to anyone – and easily transformable by software that reads bitcoin data.

    Nevertheless, obfuscation may help in a non-technical sense. It potentially provides plausible deniability, because anyone who wishes to read the data, en masse or in small bites, must take a proactive, conscious, intent-ful action to un-obfuscate the data in question. Standard caveat: IANAL

    Obfuscation seems useful general protection against stupid software. Stupid software exists on Linux, just as it does on Windows. A Windows virus scanner is simply an illustrative example of a larger pattern.

    Never forget that we are talking about data that is generated by random, possibly malicious remote parties. Malicious attacks may include both technical and non-technical (reputational) attacks on bitcoin.

  64. Diapolo commented at 7:06 am on May 19, 2014: none
    While we are at it, is there any built-in compression available in LevelDB that could help reduce the blockchain size on disk ;)?
  65. gmaxwell commented at 7:08 am on May 19, 2014: contributor
    Diapolo, no, it increases the size. Sipa’s encoding is severely over-engineered and is already a nearly optimal domain-specific compressor. :)
  66. Diapolo commented at 7:11 am on May 19, 2014: none
    Good to know :), thanks.
  67. jgarzik commented at 7:15 am on May 19, 2014: contributor
    Obfuscation raises the bar. It prevents tripping dumb generic data scanners. You must write a bitcoin-specific scanner. Usefully, that is not a big hurdle for people already in the bitcoin space.
  68. laanwj commented at 8:06 am on May 19, 2014: member

    @Diapolo LevelDB has built-in support for ‘snappy’ compression. That will obfuscate a bit, though it provides no guarantee: I’m not certain about the specific algorithm, but in my experience many compressed files have fragments of recognizable data in them. And someone who knows the compression algorithm can aim for a particular compressed data pattern. Note that the LevelDB files aren’t that large; the bulk of the stored data is the block chain files themselves.

    As for compressing the block chain, the 15-20% that can be gained in some cases is not going to make the difference for anyone in being able to run a full node or not. And unlike XORing, (de)compression overhead is not trivial.

  69. luke-jr commented at 8:07 am on May 19, 2014: member
    Don’t we remove snappy right now?
  70. laanwj commented at 8:10 am on May 19, 2014: member
    @luke-jr Yes, we compile without it. That saves some memory, as we don’t use it.
  71. gmaxwell commented at 8:11 am on May 19, 2014: contributor
    My recollection is that snappy makes the chainstate larger. You get compression on the blocks with a stream compressor because the compression spans many transactions and blocks, and the savings come primarily from pubkey reuse; the blocks are also stored raw, whereas the chainstate uses a hyper-optimized serialization. (I also expect that the figures are bloated up somewhat due to a single, now mostly defunct, notorious pubkey reuser. :) )
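    To illustrate where stream-compression gains come from, here is a toy Python experiment using the standard `zlib` module. The records are synthetic (a 33-byte stand-in for a compressed public key followed by 40 random bytes), so this does not model real block data; the function names are hypothetical. When every record reuses one key, DEFLATE can replace each repeat with a short back-reference; with fresh random keys there is almost nothing to exploit:

```python
import os
import zlib


def fake_block_file(num_txs: int, reuse_key: bool) -> bytes:
    """Concatenate synthetic records: a 33-byte 'pubkey' plus 40 random bytes.

    With reuse_key=True every record carries the same pubkey, mimicking
    address reuse; otherwise each record gets a fresh random one.
    """
    shared_key = os.urandom(33)
    records = []
    for _ in range(num_txs):
        key = shared_key if reuse_key else os.urandom(33)
        records.append(key + os.urandom(40))
    return b"".join(records)


def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data, 9)) / len(data)


reused = fake_block_file(5000, reuse_key=True)
unique = fake_block_file(5000, reuse_key=False)
print(f"reused pubkeys: {compressed_ratio(reused):.2f}")
print(f"unique pubkeys: {compressed_ratio(unique):.2f}")
```

    The random payload bytes stay incompressible in both cases; only the repeated keys shrink, which is the effect that a compact per-output serialization like the chainstate's leaves little room for.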
  72. jafaber commented at 12:47 pm on May 19, 2014: none
    Just wondering what common practice is for XORing. Obviously you don’t want to XOR with 0. Would you just pick a random non-zero mask? Or would it be useful to require at least a few 1 bits in there? Or maybe at least one 1 bit in each byte of the mask? (Also considering the NSA attack of a magic number triggering the hard disk into doing something nasty.)
  73. leofidus commented at 3:44 pm on May 19, 2014: none

    If you want full protection against advanced pattern recognition (which could be done by hard disk firmware), you would probably want to XOR against the byte stream of a fast pseudorandom number generator and save the seed of that generator.

    But I suspect anything beyond XORing against a random non-zero 64-bit mask is overengineering this. Requiring at least one 1 bit per byte is a nice bonus though to protect against random short pattern matches.
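    The two options above can be sketched in Python. `make_mask` draws an 8-byte (64-bit) mask in which every byte is non-zero, using the standard `secrets` module; `keystream_xor` shows the PRNG-keystream alternative, with Python’s `random.Random` standing in for “a fast pseudorandom number generator”. Both names are hypothetical, and neither is a cryptographic cipher:

```python
import random
import secrets


def make_mask(length: int = 8) -> bytes:
    """Random XOR mask in which no byte is 0x00 (hypothetical helper).

    A zero byte in a repeating mask would leave every data byte at that
    position (mod length) unobfuscated, so each mask byte is drawn from
    1..255, which guarantees at least one 1 bit per byte.
    """
    return bytes(secrets.randbelow(255) + 1 for _ in range(length))


def keystream_xor(data: bytes, seed: int) -> bytes:
    """XOR data with a keystream from a seeded, NON-cryptographic PRNG.

    Only the seed needs to be stored. Decoding a chunk from the middle of
    a file requires regenerating the stream from the start.
    """
    rng = random.Random(seed)
    return bytes(b ^ rng.getrandbits(8) for b in data)


mask = make_mask()
assert len(mask) == 8 and 0 not in mask
msg = b"pattern a scanner might match"
assert keystream_xor(keystream_xor(msg, seed=42), seed=42) == msg
```

    The trade-off: a repeating mask lets any file offset be decoded directly, while a keystream avoids repeating every 8 bytes but needs the stream regenerated (or a seekable generator) to read from the middle of a file.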

  74. sipa commented at 3:48 pm on May 19, 2014: member
    What do we protect against? Accidental matches with overzealous byte sequence matchers, or intentional attempts to insert data that matches?
  75. laanwj commented at 4:08 pm on May 19, 2014: member
    @sipa Both, really; people have put virus signatures in the block chain on purpose, and will likely see it as a challenge.
  76. jgarzik commented at 4:12 pm on May 19, 2014: contributor
    Both, by necessity
  77. dscotese commented at 4:45 pm on June 4, 2014: contributor

    Sorry to dredge up this old thread, but I think the user should be able to tell the bitcoin client that one of its files needs to be retrieved from the network, perhaps from random peers. That way, after the AV software breaks a file because it thought it was bad (which it may have been), the user can get that file again. This means adding a halt-and-repair-file function to the bitcoin client. The function would ask for the path to the bad file, request a new copy from peers, and then use some blockchain data to validate it. It could then compare the new file with the old one to tell the user whether the AV report was a false positive (no difference) or the file had actually been altered.

    I haven’t (yet) thought about this long enough to see what problems it might cause, but I can’t see any off the top of my head.

  78. leofidus commented at 7:37 pm on June 4, 2014: none

    @dscotese You can already restore the file by running Bitcoin Core with the -rescan option (you might have to delete the bad file first). I don’t think anything more specific is required, and there are more important features to add.

    Is somebody working on obfuscating the files?

  79. dscotese commented at 9:03 pm on June 4, 2014: contributor
    @leofidus I see what you mean. I was under the mistaken impression that my client (which is running right now, because I got that Silly.218 problem) was retrieving the blockchain again. Actually, it’s just re-indexing, but that is taking a few hours. If there were a protocol to get a chainstate file from someone else, that would take a couple of minutes, and then the client could do the same thing it does at every startup (validate all the data). That would take a few minutes instead of a few hours. If -rescan would accomplish that, why wasn’t it mentioned before? Laanwj suggested that -reindex was the best way to fix it once the .sst file was deleted.
  80. sipa commented at 9:25 pm on June 4, 2014: member

    -rescan is an option to scan the blockchain for missing wallet transactions. It is only needed if there’s (only) something wrong with the wallet, and for many versions now that case has almost always been detected automatically. Unless people are manually editing their wallet files, -rescan is not a useful suggestion to give.

    -reindex rebuilds the verification database, by processing the block files you have locally already as if they were received from network, redoing the validation.

    Yes, you can copy that database from someone (just copy the chainstate/ directory, while the client is not running), but be aware that you must absolutely trust the person who is giving it to you (ideally: it is yourself), as they may make your client believe anything (up to and including making you believe you received payments that conjure coins out of thin air).

  81. whitslack commented at 9:21 pm on June 5, 2014: contributor

    @dscotese: You can’t just download a missing/corrupted chainstate file from someone else. The only units of data that are guaranteed to be identical across peers are the Bitcoin blocks themselves. The block chain data files, the block index files, the transaction index files, and the chainstate database files are all going to vary from one machine to the next, just due to the variation in the messages they’ve received from the network, their uptime, their particular schedule of software upgrades, etc. Your indices and chainstate files are not interchangeable with those of another peer.

    Also, what @sipa said about being able to copy the entire database from another node is only true if the other node is the same architecture. You can’t copy the database from an x86 to an ARM, for instance.

  82. rebroad commented at 12:46 pm on June 10, 2014: contributor

    In the same way that bitcoin leaves things to third party tools to rotate its log files, it can be left to third party tools (e.g. truecrypt) to encrypt the blockchain files.

    Web caches will also contain CP; how do they deal with this?

  83. sipa commented at 12:51 pm on June 10, 2014: member

    Truecrypt will hide the data from someone who gets physical offline access to your hard drive.

    It does not hide anything from running processes (how would you see the block files in your OS if it couldn’t access them?).

  84. laanwj commented at 12:51 pm on June 10, 2014: member
    @rebroad How does that solve the issue here? If you use third party tools to encrypt the files, the virus scanner will still pick it up (at least if it is some kind of mounted fs). And requiring every user with a virus scanner to run a third-party tool is ridiculous anyway.
  85. ghost commented at 10:43 am on June 18, 2014: none
    @rebroad, web caches are simply excluded from scanning.
  86. rebroad commented at 4:01 pm on July 5, 2014: contributor
    Ok, so the answer is to mention during setup that virus scanners need to exclude the directory containing the blockchain, ideally with assistance for those less technically savvy.
  87. ghost commented at 6:59 pm on July 5, 2014: none
    Personally I don’t think that the answer is to document excluding the directory (I would have offered it at the beginning if I did), but if others agree, that’s fine by me: let’s call it an answer and the issue can be closed. It’s more of a workaround than an answer, and in many environments it’s simply not possible to exclude (ignore) anything at will. For example, my system is a corporate notebook on which I occasionally run bitcoind (when I’m at home), and I’m not allowed to make any modifications to the AV client settings.
  88. voisine commented at 7:17 pm on July 5, 2014: none

    Hopefully including virus signatures in the blockchain wasn’t part of some nefarious plan to make it easier to hide bitcoin stealing malware on windows desktops.

    Aaron Voisine breadwallet.com

  89. dexX7 commented at 11:16 am on September 6, 2015: contributor

    @wyager: In fact, if this must be implemented, please keep this disabled by default. Either only build this into windows builds, or leave it as a checkbox (if checked by default, only on windows machines).

    Just as a slightly related side note: in July I received a report that ClamXav on OS X appears to cause DB corruptions as well. Based on the logs, the synchronization failed after block 294007 due to an error in the block header. I haven’t verified the report, but apparently the DB corruption was not a one-time event, and it was resolved after disabling the AV.

  90. laanwj commented at 3:54 pm on October 6, 2015: member
    Should be fixed by #6650, which is merged for 0.12.
  91. laanwj closed this on Oct 6, 2015

  92. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-21 21:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me