ptschip
commented at 1:33 PM on November 9, 2015:
contributor
Compress Blocks before sending achieves an average of 21% block compression.
When blocks are almost full and at the highest zlib setting, level 9, there is an average 22% block compression, which takes 0.19 seconds. At level 6 (which has been set as the default) there is 21% block compression, but it takes only 0.09 seconds. Decompression is very fast in all cases, averaging only 0.008 seconds.
NOTE: all block data used to gather these numbers was from mainnet, and compression was done on a 4-year-old laptop with a Celeron processor. (Current i7 processors are roughly 8 times more powerful.)
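The level-6 vs level-9 tradeoff described above can be sketched with Python's stdlib `zlib` (an illustrative stand-in for the PR's C++ code; the sample "block" bytes are hypothetical, not real mainnet data):

```python
import time
import zlib

# Hypothetical stand-in for a serialized block: repetitive data, which
# compresses roughly like real blocks containing reused scripts.
block = (b"\x01\x00\x00\x00" + b"scriptPubKey-reuse" * 40) * 500

for level in (6, 9):
    t0 = time.perf_counter()
    compressed = zlib.compress(block, level)
    elapsed = time.perf_counter() - t0
    ratio = 100.0 * (1 - len(compressed) / len(block))
    print(f"level {level}: {ratio:.1f}% smaller, {elapsed * 1000:.2f} ms")

# Decompression is cheap regardless of which level produced the stream.
assert zlib.decompress(zlib.compress(block, 9)) == block
```

Actual ratios and timings depend entirely on the input data and hardware; the point is only that level 9 trades extra CPU time for a marginal size gain over level 6.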
laanwj added the label P2P on Nov 9, 2015
ptschip force-pushed on Nov 9, 2015
paveljanik
commented at 1:47 PM on November 9, 2015:
contributor
Do we want this as a default protocol feature? Isn't it better to make it optional and use it only if available/announced?
paveljanik
commented at 1:48 PM on November 9, 2015:
contributor
paveljanik
commented at 2:00 PM on November 9, 2015:
This hunk can be removed.
in src/main.h (commit 7be5ee57bf, outdated)
82 | @@ -83,6 +83,8 @@ static const unsigned int DATABASE_WRITE_INTERVAL = 60 * 60;
83 | static const unsigned int DATABASE_FLUSH_INTERVAL = 24 * 60 * 60;
84 | /** Maximum length of reject messages. */
85 | static const unsigned int MAX_REJECT_MESSAGE_LENGTH = 111;
86 | +/** 0 = no compression , 9 = maximum compression. */
paveljanik
commented at 2:00 PM on November 9, 2015:
Please remove space before comma.
jonasschnelli
commented at 2:09 PM on November 9, 2015:
contributor
Compression/decompression before/after transmitting data through the internet is in general a good idea. Very likely the CPU cost is tiny, but it would be nice to see a benchmark of a node serving compressed blocks for IBD to another node.
Two things which move this PR towards NACK territory for me:
adding transmission compression should happen in a different, deeper layer and should be independent from the used p2p command
compression should be optional (version bits?)
Maybe apache's mod_deflate could be a point of inspiration.
ptschip force-pushed on Nov 9, 2015
jmcorgan
commented at 2:46 PM on November 9, 2015:
contributor
Agree this capability should be advertised so it only comes into effect between mutually supporting nodes; also, even as-is it should default to compression level 0 to allow the node operator to turn on if desired. Concept ACK, implementation NACK.
ptschip force-pushed on Nov 9, 2015
ptschip
commented at 3:42 PM on November 9, 2015:
contributor
On 09/11/2015 6:10 AM, Jonas Schnelli wrote:
Compression/decompression before/after transmitting data through the
internet in general is a good idea. Very likely the CPU cost is
tiny, but it would be nice to see a benchmark of a node serving
compressed blocks for IBD to another node.
I'll work on this. Currently my only benchmark is with a 600MB
blockchain: it takes 3:47 to sync without compression and 1:25 with
compression. However, those numbers are quite optimistic, since the
blocks in my blockchain, although full blocks, compress down very small.
Two things which moves this PR towards NACK territory for me:
adding transmission compression should happen in a different,
deeper layer and should be independent from the used p2p command
Perhaps a good idea...I think the only place for it would be
CDataStream....then we could do ss.compress(), ss.decompress(). I'll
work on that.
compression should be optional (version bits?)
Compression is optional by setting -compressionlevel=0 (this bypasses
compression entirely and uses the current code for block sending).
version bits? I'm not sure how that would make anything better. I was
getting the remote node's protocol version during the handshake and
using 70011 or higher to determine whether nodes accept block
compression or not.
Maybe apache's mod_deflate could be a point of inspiration.
—
Reply to this email directly or view it on GitHub
#6973 (comment).
ptschip force-pushed on Nov 9, 2015
sipa
commented at 4:20 PM on November 9, 2015:
member
I don't think a 90ms delay before a block can be transmitted is acceptable.
Is there a way to cache the compressed result and relay that, if the peer
accepts compressed blocks? There may be other compression algorithms which
compress much faster (miniLZO?) that are more appropriate.
Compression may not be useful for all messages, if done on a per-message
basis, as most compression algorithms don't offer significant compression
for small data amounts.
We need a way for peers to advertise support for compressed messages or a
compressed network link.
This will need a BIP, but it looks interesting and reasonably simple to do.
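Sipa's caching suggestion (compress once, serve the same bytes to every compression-capable peer) can be sketched as follows; `CompressedBlockCache` and the placeholder payload are illustrative names, not anything from the PR:

```python
import zlib

class CompressedBlockCache:
    """Sketch of caching a block's compressed form so the compression
    cost is paid once, not once per peer (illustrative only)."""

    def __init__(self):
        self._cache = {}  # block hash -> compressed bytes

    def get(self, block_hash, raw_block, level=6):
        if block_hash not in self._cache:
            self._cache[block_hash] = zlib.compress(raw_block, level)
        return self._cache[block_hash]

cache = CompressedBlockCache()
raw = b"\x00" * 100_000          # placeholder block payload
first = cache.get("00ab...", raw)   # compresses
second = cache.get("00ab...", raw)  # served from cache, no recompression
assert first is second
assert zlib.decompress(first) == raw
```

This does not address the first-relay latency sipa raises (the initial compression still sits on the critical path), but it removes the per-peer cost for every subsequent relay.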
ptschip
commented at 6:55 PM on November 9, 2015:
contributor
On 09/11/2015 8:21 AM, Pieter Wuille wrote:
I don't think a 90ms delay before a block can be transmitted is
acceptable.
Is there a way to cache the compressed result and relay that, if the peer
accepts compressed blocks? There may be other compression algorithms which
compress much faster (miniLZO?) that are more appropriate.
The 90ms delay is just for compression of large blocks; small blocks
can be 10 or 20 milliseconds, and these numbers come from a very slow
laptop. Also the transmission should be much faster and make up for the
90ms compression, particularly when network latency is high. For
example, a block that normally takes 1 second to transmit should save
roughly 200ms in transmission, a net gain of 200 - 90 = 110 millis, but I'll have some better numbers for that forthcoming.
Compression may not be useful for all messages, if done on a per-message
basis, as most compression algorithms don't offer significant compression
for small data amounts.
I was surprised to see that even on small blocks of 181 bytes the compression
is still 20% using zlib. So we could theoretically apply this to
transactions as well, but let's first get it all working for blocks.
We need a way for peers to advertize support for compressed messages or a
compressed network link.
The current build is backward compatible and uses the protocol version
to determine whether nodes can handle compression/decompression, but it
does assume in most cases that compression would be ON. To get around
that we could send blocks with a different message string, such
as pfrom->PushMessage("cmp_block", block); as opposed to
pfrom->PushMessage("block", block); that way the receiving node will
know whether this is a compressed or uncompressed block and act accordingly.
IMO, other than using the protocol version, I'm not sure we need to
advertise that we can accept compressed blocks.
This will need a BIP, but it looks interesting and reasonably simple
to do.
ok, will do
—
Reply to this email directly or view it on GitHub
#6973 (comment).
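The negotiation ptschip describes, using either the peer's protocol version or a distinct `cmp_block` message, can be sketched like this (the `push_block` helper and `send` callback are hypothetical; 70011 is the version he mentions):

```python
import zlib

COMPRESSION_PROTO_VERSION = 70011  # version ptschip mentions; illustrative

def push_block(peer_version, raw_block, send):
    """Send a block as 'cmp_block' to peers new enough to understand
    compression, falling back to the plain 'block' message otherwise."""
    if peer_version >= COMPRESSION_PROTO_VERSION:
        send("cmp_block", zlib.compress(raw_block, 6))
    else:
        send("block", raw_block)

sent = []
push_block(70011, b"blockdata", lambda cmd, payload: sent.append((cmd, payload)))
push_block(70002, b"blockdata", lambda cmd, payload: sent.append((cmd, payload)))
assert sent[0][0] == "cmp_block" and sent[1][0] == "block"
assert zlib.decompress(sent[0][1]) == b"blockdata"
```

Using a separate message string makes the stream self-describing, so a receiver never has to guess from version numbers whether a `block` payload is compressed.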
gmaxwell
commented at 8:57 PM on November 9, 2015:
contributor
Interesting to see someone trying this.
Matt's relay network protocol can achieve much better compaction (e.g. sending a 1MB block in a couple of kilobytes); and considering zlib's security non-track-record... this doesn't excite me greatly. Then one must consider the potential adverse effect on the system from punishing people for failing to reuse addresses, which is another impact Matt's protocol does not have.
Given the approach, these numbers seemed suspect to me, but I can verify them*. When we tested gzipping single blocks years ago we got much worse results... I wonder how much of this is due to spam-attack transactions (where pubkeys/txouts are repeated unusually often)?
I'd like to better understand where the compression is coming from, and I think it's critical that we do so: The only significant gains this should really be getting is from reused pubkeys, which is quite concerning. Protocol aware compression could likely do better (e.g. blocks can't spend outputs that don't exist.) without that problem.
Better than compressing blocks may be compressing the stream of transactions, as doing that is not exclusive with the relay-network-protocol style of compression; though it would continue the problem of punishing people who do not behave in a fungibility-destroying manner.
For mining, block relay performance is a consideration; adding 90ms is a considerable fraction of the total processing time. For non-latency-critical relay (e.g. nodes not involved anywhere near mining), a process that used mempool set reconciliation on just the txids, with more round trips to send the unknowns, would minimize bandwidth; but (again) it wouldn't be exclusive with compressing the transactions which are sent further.
This would perhaps be more interesting if we also considered a flag to ask peers to never inv/send loose transactions... a blocks only node would not gain from set reconciliation.
sipa
commented at 10:27 PM on November 9, 2015:
member
@gmaxwell My vague recollection is that gzip gave 20% years ago, actually.
gmaxwell
commented at 12:31 AM on November 10, 2015:
contributor
@sipa I thought it gave gains when compressing many blocks, and almost nothing on single blocks? (It gained from reuse, but reuse within a block was rare.) I could be misremembering. (Incidentally, concatenating those 101 blocks above and compressing them with xz as a single file gets 31% compression.)
ptschip renamed this from "Zlib Block Compression for block relay" to "Compress Blocks before sending" on Nov 14, 2015
ptschip
commented at 5:03 PM on November 14, 2015:
contributor
@gmaxwell After running several tests, the data shows there is about a 20% compression benefit and, surprisingly, a slight improvement in performance, particularly when latency is present. It seems that in the past this was looked into but dropped; I'm wondering whether back then gzip was used for the comparison rather than the zlib library directly. Gzip, while based on zlib, adds additional metadata to the output, and maybe that's where the difference lies.
As far as moving forward, I'd like to keep working on this, see whether there is benefit to enhancing the code further, and try out different compression libraries. But I'm a little unclear on where this should go from here. I'm wondering whether this really needs a BIP, given that the compression feature can be turned fully on/off through config settings and it's backward compatible with previous versions (advertised as a service). It doesn't seem that this is a major change to Bitcoin but rather just another configurable feature. I'd like to get your guidance on this.
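The gzip-vs-zlib framing difference ptschip speculates about is concrete: both wrap the same DEFLATE algorithm, but gzip adds a larger header plus a CRC32/length trailer. A quick stdlib sketch (the sample bytes are illustrative):

```python
import gzip
import zlib

data = b"some serialized block bytes " * 100

z = zlib.compress(data, 9)   # 2-byte header + Adler-32 trailer
g = gzip.compress(data, 9)   # ~10-byte header + CRC32/length trailer

# Same DEFLATE body, different framing: gzip output is slightly larger.
print(len(z), len(g))
assert len(g) > len(z)
```

The framing overhead is a dozen or so bytes per stream, so it matters for tiny messages but cannot by itself explain a large difference in measured ratios on full blocks; compression level and input data are the bigger variables.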
jgarzik
commented at 5:06 PM on November 14, 2015:
contributor
Protocol changes for behavior that appears on the network, even optional ones, need a BIP.
You're absolutely right, it is something that is easily made configurable, and there will probably be additional thought and discussion given to the default setting(s) for such a protocol feature and an implementation's use of it.
ptschip
commented at 8:52 PM on November 14, 2015:
contributor
@jgarzik Thanks for the clarification. Just have one more question. Do I write up a formal BIP proposal first and then get a BIP number assigned or get a BIP number and then write the formal proposal?
gmaxwell
commented at 9:08 PM on November 14, 2015:
contributor
First figure out what you want to propose in detail while discussing with others, then write the document, post it as a draft (so people can comment), then make a pull request to the bips repository adding it and request a number; one will be assigned there.
I suggest trying to drop the standard compressor and using a simple custom one: for example, one that keeps a little 256-entry LFU cache for txout scriptPubKeys and a 256-entry cache for scriptSigs (the first push if there are two, and the first two pushes if there are more than two). I think such a construction could get equal (or perhaps better) compression than zlib and be much faster. (Other small improvements would be similarly coding nSequences, txver, nLockTime, and vin, but these are necessarily small improvements that will only save a few bytes per transaction.)
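The cache-based custom compressor gmaxwell sketches can be illustrated with a toy dictionary coder: repeated scripts are replaced by a one-byte index into a small LFU cache. Everything here (`ScriptCache`, the `\xff` literal escape, the framing) is a hypothetical illustration, not his design or the relay network protocol:

```python
from collections import Counter

class ScriptCache:
    """Toy LFU substitution coder: cache up to 256 recently seen scripts
    and emit a 1-byte back-reference for repeats (illustrative framing;
    a real codec would need unambiguous escaping for index 0xff)."""

    def __init__(self, size=256):
        self.size = size
        self.slots = []        # cached scripts; list index doubles as the code
        self.freq = Counter()  # LFU bookkeeping

    def encode(self, script):
        self.freq[script] += 1
        if script in self.slots:
            return bytes([self.slots.index(script)])  # 1-byte back-reference
        if len(self.slots) < self.size:
            self.slots.append(script)
        else:
            # evict the least frequently used cached script
            victim = min(self.slots, key=lambda s: self.freq[s])
            self.slots[self.slots.index(victim)] = script
        return b"\xff" + script  # literal, escape-prefixed

cache = ScriptCache()
spk = b"OP_DUP OP_HASH160 <pubkeyhash> OP_EQUALVERIFY OP_CHECKSIG"
first = cache.encode(spk)
repeat = cache.encode(spk)
assert len(repeat) == 1  # a reused script costs a single byte
```

Note that, exactly as gmaxwell and the earlier discussion point out, all of the gain here comes from script reuse, so such a scheme rewards address reuse just as zlib does.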
jgarzik
commented at 9:23 PM on November 14, 2015:
contributor
"meat before number assignment" - you want demo implementation and BIP-draft-ptschip-compression written before requesting number, ideally.
ptschip
commented at 9:26 PM on November 14, 2015:
contributor
sounds good...thanks.
On 14/11/2015 1:23 PM, Jeff Garzik wrote:
"meat before number assignment" - you want demo implementation and
|BIP-draft-ptschip-compression| written before requesting number, ideally.
—
Reply to this email directly or view it on GitHub
#6973 (comment).
ptschip force-pushed on Nov 18, 2015
ptschip force-pushed on Nov 19, 2015
ptschip force-pushed on Nov 23, 2015
ptschip force-pushed on Nov 28, 2015
ptschip force-pushed on Nov 30, 2015
Datastream compression for Blocks and Tx's
Compress blocks and transactions using the
LZO1x Library before sending. Blocks
and transactions are concatenated where
possible.
8b4a62e308
ptschip force-pushed on Nov 30, 2015
paveljanik
commented at 8:17 PM on January 20, 2016:
contributor
Needs rebase.
Any progress on this?
bgorlick
commented at 9:23 PM on January 26, 2016:
none
I would like to suggest Brotli as a compression technique. I've done some brief analysis; it could possibly provide a 20% improvement in compression and a speed bump.
rebroad
commented at 4:43 PM on February 1, 2016:
contributor
Can compression get significantly better if many blocks are compressed together?
laanwj
commented at 3:17 PM on February 2, 2016:
member
@rebroad This was always the case; e.g. compressing a blkXXXX.dat file gets a better compression ratio than compressing a separate block. This is mainly due to repeating outputs, for example from gambling games. I don't have statistics on by how much, though.
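The cross-block redundancy laanwj describes is easy to demonstrate: compressing a concatenation lets the compressor reference repeated scripts across block boundaries. The sample "blocks" below are hypothetical bytes, not real chain data:

```python
import zlib

# Hypothetical blocks sharing repeated output scripts (illustrative data).
blocks = [b"header%d" % i + b"reused-output-script" * 50 for i in range(20)]

individually = sum(len(zlib.compress(b, 6)) for b in blocks)
together = len(zlib.compress(b"".join(blocks), 6))

# Compressing the concatenation exploits redundancy across blocks,
# which per-block compression cannot see.
print(individually, together)
assert together < individually
```

This is why a blkXXXX.dat file compresses better than its blocks compressed one at a time, and why per-message compression (as in this PR) leaves some of that gain on the table.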
laanwj added the label Brainstorming on Feb 4, 2016
laanwj
commented at 11:28 AM on March 24, 2016:
member
Closing this for now: progress seems to be stuck, both at the BIP discussion and implementation side.
This is a metadata mirror of the GitHub repository bitcoin/bitcoin.
generated: 2026-04-22 21:15 UTC