Possibly more efficient blocks? #7412

issue yuilleb opened this issue on January 25, 2016
  1. yuilleb commented at 5:12 AM on January 25, 2016: none

    Hi, This is my first time posting anything to github, so go easy on me, but I'd like to help :). Since block size seems to be an important issue at the moment, I was wondering if the following would help.

    A potential place to save space would be to look into stop bit encoding for variable length integers that have a higher frequency of being larger than 0xfd. Stop bit encoding works by using the 8th bit of each byte as a flag that you've reached the last byte of the integer data (so 7 bits of each byte is used for the integer data). This is really nice as once you get out pass 0xfd the var_int encoding seems rather inefficient, unless you fill the entire int16, int32, or int64. For instance if the value to be encoded is 0xfe, var_int will use 3 bytes, while stop bit will only require 2 bytes. Both have a storage space of 3 bytes for full 16bit integers, 5 bytes for full 32bit integers, and 9 bytes for full 64bit integers. FIX/FAST protocol makes use of stop bit encoding for integer compression, if you'd like a reference. The nice thing about var_int is how much space it saves for values up to 0xfd, so that's why I'd think stop bit encoding would only be helpful for values which tend to normally be larger in size than 0xfd.

    Another place I could imagine using a compression scheme like stop bit encoding would be for the "lock_time" field. I'm not sure how often this field is used, but I could imagine if it's not used very frequently we are wasting 3 extra Bytes per transaction for nothing. There also seems to be an issue once the block number is larger than 500000000. What I would purpose instead is creating a single byte called "tx_opt" which is used as bit flags for turning on or off fields within the transaction. Two of the option bits could be used to signal the presence of either a "lock_block" field or a "lock_time" field. If neither flags are set then the transaction ends after the last read "tx_out". This saves at most 3 bytes from the transaction data when a lock_time or lock_block are not used. It also provides some flag space for further uses later on within each transaction. I was also noticing that many transactions tend to have only 1 input and 1 output, which if this is common enough, we could perhaps have a flag or two that set weather the "tx_in count" or "tx_out count" are specified (the default value if they are not included being 1).

    I would also recommend doing something similar with the "sequence" field in the "tx_in" format, if applicable, as this reads like it is more of an optional field than a required field so it might make more sense to have another "tx_in_opt" field which uses a bit flag to signal the presence of the "sequence" field.

    I hope these ideas are useful. I would like to contribute to bitcoin core more, but first I have to spend some time learning how github works.

    Brandon

  2. jonasschnelli added the label Consensus on Jan 25, 2016
  3. laanwj commented at 4:42 PM on January 25, 2016: member

    Hello Brandon, welcome to the Bitcoin Core repository.

    Changing the block or transaction format in such fundamental ways is generally a hard fork, breaking compatibility with all older software. This makes it impractical.

    Another option is to support optional block compression over the P2P protocol, but don't change local handling; see #6973.

    Though isavings add up to a lot (you'd need to quantify them) they could be part of a future hardfork. See also https://en.bitcoin.it/wiki/Hardfork_Wishlist

    Note that some of the fields which have been effectively ignored in the past have recently gained a more important role:

    • nLockTime is set by the wallet for every transaction to protect against fee sniping
    • nSequence in the input is used to signal opting in to RBF
  4. yuilleb commented at 7:02 PM on January 25, 2016: none

    Thanks for the response. I see, having to do a hard fork is troublesome. Is a hard fork likely to ever occur again that it would be worth taking the time to test the space savings of these ideas?

  5. bityogi commented at 7:44 PM on January 25, 2016: none

    @yuilleb My understanding is that even core expects a hard-fork to eventually happen. But they have relegated it behind other more important priorities to make such a fork as safe as possible. Here's a link to the road-map.

    We’ll continue to set the stage for non-bandwidth-increase-based scaling, while building additional tools that would make bandwidth increases safer long term. (Emphasis mine)

    I think by 'bandwidth increases' @gmaxwell means block-size. But that is just my interpretation.

  6. yuilleb commented at 8:25 PM on January 25, 2016: none

    @bityogi thanks for the follow up. Since there is a chance of a hard fork down the road, I'll write some code to transform blocks into the suggested format and see what the numbers look like. @laanwj I spent some time reading #6973. My guess is gzip/deflate is able to perform well because there are so many occurrences of 0x00 bytes and 0xff bytes. My intention is to remove these bytes altogether so more transactions can fit within a single block, before compression is applied for relaying.

    If someone does have any related ideas, please share them here so I can include those in my test code so we can see the results. Feel free to close this if no one else has any ideas related to this.

  7. bgorlick commented at 6:43 AM on January 26, 2016: none

    @yuilleb & @laanwj Another data compression library & format of interest along similar lines I've been exploring more recently is Brotli. Have you looked at it?

  8. yuilleb commented at 8:55 PM on January 26, 2016: none

    @bgorlick I have not heard of this compression technique, but thanks for the info. This should probably be suggested in #6973 as an optional compression format.

  9. yuilleb commented at 9:18 PM on January 26, 2016: none

    I ran some tests with my test code and found that if you enable the following options, the block size is decreased by about 3.5% to 4.1%. All initial testing was done originally on blk00426.dat.

    The test program was run with the following command: ./txcomp --allinputslocked --txinopt --varintindex --stopbitlocktime --varinttxver blk00426.dat

    --allinputslocked If all inputs have their sequence fields locked (set to 0xffffffff) then the locktime field is not included. --txinopt A single option byte is included in each tx input using bit flags to signal if the sequence field is locked (set to 0xffffffff) or is set to a value other than 0. If no flags are set then the sequence field is not included. --varintindex The index field is encoded as a var_int. --stopbitlocktime The lock time field is encoded as a stop bit integer. --varinttxver The transaction version field is encoded as a var_int.

    On the following files here are the original file sizes followed by their new format sizes: blk00426.dat 133412113 128700565 blk00037.dat 133991669 128130719 blk00137.dat 133895931 128389420

    I have decided not to run the tests over any other out of sample files so as not to have any curve fitting in case other options are explored.

    I also found that the average transaction input count appears to be about 2 inputs as well as the average transaction output count. The average script length seems to be about 72 bytes (scripts seem to be where the most compression could take place).

    I'm not sure if these findings warrant modifying transaction formats in a future hard fork scenario, but if anyone thinks there is merit here please let me know and I'll continue working on this.

  10. laanwj added the label Resource usage on Feb 16, 2016
  11. laanwj added the label Brainstorming on Apr 28, 2016
  12. laanwj commented at 12:42 PM on June 20, 2016: member

    I think this should be moved to a BIP, it is too wide of scope to be discussed here. Also, compact blocks already makes block relay much more efficient, so it is unclear what the added benefit of this will still be.

  13. laanwj closed this on Jun 20, 2016

  14. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-22 18:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me