new bip: compact client side filtering #609

pull Roasbeef wants to merge 14 commits into bitcoin:master from Roasbeef:master changing 34 files +1614 −0
  1. Roasbeef commented at 11:42 pm on November 9, 2017: contributor

    This BIP describes a new light client node type for Bitcoin as well as the modifications to current full-nodes required to support this new type of light client. The light client mode described in this BIP is meant to supersede BIP 37 as it provides a greater degree of privacy, utility, and also reduces the resources required for full-nodes to service this new light client mode compared to BIP 37. The light client mode described in this BIP can be seen as a “reversal” of BIP 37: rather than the light clients sending filters to full-nodes, full-nodes send filters to light clients. Unlike BIP 37, we don’t utilize bloom filters. Instead, we utilize a compact filter (more efficient than bloom filters) which leverages Golomb-Rice coding for compression. Additionally, blocks are downloaded as a whole (from any source), rather than directly from peers as fragments with merkle-branches proving their authenticity.

    We’ve tested the implementation of both a light client implementing this BIP, as well as the serving full-node on Bitcoin’s testnet over the past several months. Additionally, this new operating mode also serves as the basis for our light weight Lightning node.

  2. new bip: compact client side filtering 83b83c78e1
  3. in gcs_light_client.mediawiki:11 in 83b83c78e1 outdated
     6+        Alex Akselrod <alex@akselrod.org>
     7+Comments: ???
     8+Comments-URI: ???
     9+Type: Standards Track
    10+Created: 05-24-2017
    11+License: PD
    


    luke-jr commented at 8:38 am on November 10, 2017:
    Please choose one of the recommended, or at least acceptable licenses (and also add a Copyright section below).

    Roasbeef commented at 7:14 am on November 30, 2017:
    Dunzo, we chose CCO-1.0.
  4. in gcs_light_client.mediawiki:22 in 83b83c78e1 outdated
    17+modifications to current full-nodes required to support this new type of light
    18+client. The light client mode described in this BIP is meant to supersede BIP
    19+37 as it provides a greater degree of privacy, utility, and also reduces the
    20+resources required for full-nodes to service this new light client mode
    21+compared to BIP 37. The light client mode described in this BIP can be seen as
    22+a "reversal"[1] of BIP 37: rather than the light clients sending filters to
    


    luke-jr commented at 8:39 am on November 10, 2017:
    [1] should be <ref>https://github.com/bitcoin/bips/blob/master/bip-0037.mediawiki</ref> (and use <references/> below) to make it auto-linkable. (suggestion, not mandatory)

    Roasbeef commented at 7:14 am on November 30, 2017:
    Done.
  5. luke-jr changes_requested
  6. luke-jr commented at 8:43 am on November 10, 2017: member

    Needs to address backward compatibility, at least briefly (eg, just say there isn’t any).

    We probably shouldn’t copy the test vectors here. The external repo is good for that.

  7. in gcs_light_client.mediawiki:157 in 83b83c78e1 outdated
    152+** <code>k</code> acts as fixed length encoding for the length of a run.
    153+** This value acts as the maximum encodable run-length.
    154+* Transmission of runs of 1's is omitted.
    155+* Two 1's in a row are denoted by a zero-length run of zero.
    156+
    157+As an example, consider the following sequence of bits: 
    


    jimpo commented at 10:05 am on November 11, 2017:

    This should mention that k=4 in the example.

    Also, I’m confused by the RLE scheme. How are runs of more than 2 1’s represented? What would be the encoding of the below example with s/0{20}/0{15} 1 0{5}/?


    TheBlueMatt commented at 7:24 pm on November 13, 2017:
    Yea, this is underspecified and also probably not entirely relevant here - the point of the Specification section is to describe what you do, not some alternative ways to encode the data.

    Roasbeef commented at 7:14 am on November 30, 2017:
    Opted to move this to a “Background” appendix at the end of the document.
  8. in gcs_light_client.mediawiki:97 in 83b83c78e1 outdated
    92+== Preliminaries ==
    93+
    94+Before we specify the details of our proposal, we'll first go over a few
    95+preliminaries which will aid in the understanding our proposal.
    96+
    97+By <code>[]byte</code> we refer to a slice (or array) of bytes. This value is
    


    jimpo commented at 10:07 am on November 11, 2017:
    I feel like “vector” is more appropriate than “array” (not just in a C++ sense).

    Roasbeef commented at 7:14 am on November 30, 2017:
    Modified wording to say “vector” instead of array.
  9. in gcs_light_client.mediawiki:169 in 83b83c78e1 outdated
    164+<pre>
    165+1110 1001 0000 1111 0101 1111 1111 0000 0000 1011
    166+</pre>
    167+
    168+RLE allows one to efficiently encode a data stream in a lossless manner. Due
    169+the the encoding of runs, RLE works best when encoding a set with a high degree
    


    jimpo commented at 5:57 pm on November 13, 2017:
    typo: Due to the encoding
  10. in gcs_light_client.mediawiki:201 in 83b83c78e1 outdated
    196+  where q is (n / m)
    197+   and  r is n % m
    198+</pre>
    199+
    200+[https://en.wikipedia.org/wiki/Golomb_coding Golomb Coding] encodes the two
    201+values (<code>q</code> and <code>m</code> for a given integer <code>n</code> as a two-tuple. The first value
    


    jimpo commented at 6:00 pm on November 13, 2017:

    typo: (q and r) for a given integer n

    Also missing closing parenthesis.


    Roasbeef commented at 7:14 am on November 30, 2017:
    Fixed.
  11. in gcs_light_client.mediawiki:239 in 83b83c78e1 outdated
    234+        k-bits 
    235+
    236+    c*m + r
    237+</pre>
    238+
    239+To aide in understanding we provide the following examples of using Golomb-Rice
    


    jimpo commented at 6:02 pm on November 13, 2017:
    typo: aid. aide is a noun.

    Roasbeef commented at 7:14 am on November 30, 2017:
    Fixed.
  12. in gcs_light_client.mediawiki:297 in 83b83c78e1 outdated
    292+<code>F</code> constricts the range of the hashed values accordingly in order to
    293+achieve our desired false positive rate.
    294+
    295+In addition, to help optimize the algorithm, we use a fast range algorithm[7],
    296+multiplying the hashed value by F and taking only the top 64 bits. This fairly
    297+distributes the values over F without division and can be done with fewer cycles
    


    jimpo commented at 6:11 pm on November 13, 2017:
    Doesn’t mention what strategy this algorithm uses fewer cycles than.

    Roasbeef commented at 7:14 am on November 30, 2017:
    Elaborated a bit more.
  13. in gcs_light_client.mediawiki:344 in 83b83c78e1 outdated
    339+        // P.
    340+        let remainder = (value - last_value) & (P - 1)
    341+
    342+        // Compute the difference between this value and the last one, divided
    343+        // by P. This is our quotient.
    344+        let quotient = (value - last_value - remainder) >> fp
    


    jimpo commented at 6:17 pm on November 13, 2017:
    Is it necessary to subtract the remainder? Just the shift is sufficient, no?
  14. in gcs_light_client.mediawiki:486 in 83b83c78e1 outdated
    481+light clients will need to ''preferentially'' peer to full-nodes that support
    482+the features outlined in this BIP.
    483+
    484+The 6th service bit will now be dedicated to signaling support for the
    485+features described within this BIP: 
    486+* <code>CFNodeCF = 1 << 6</code>
    


    jimpo commented at 6:21 pm on November 13, 2017:
    SFNodeCF

    Roasbeef commented at 7:14 am on November 30, 2017:
    Fixed.
  15. in gcs_light_client.mediawiki:607 in 83b83c78e1 outdated
    602+As it's feasible that in the future, this document is extended to encompass
    603+additional filter encoding algorithms or filter contents, we define a new p2p
    604+message that allows light clients to ascertain which filters a node supports.
    605+
    606+The <code>getcftypes</code> message is an ''empty message'' whose command string is:
    607+<code>getcftypes</code>
    


    jimpo commented at 6:45 pm on November 13, 2017:
    Mentioning the command string seems redundant.
  16. in gcs_light_client.mediawiki:184 in 83b83c78e1 outdated
    179+However, in our case due to the hashing of items within the compact filter,
    180+we'll be dealing with items that are ''uniformly distributed''. We can use this
    181+fact to leverage a more efficient encoding scheme based on the distribution of
    182+the length of a run. The [https://en.wikipedia.org/wiki/Geometric_distribution
    183+Geometric Distribution] represents the probabilities of a number of failures
    184+before the first success in a series of Bernoulli trials (yes/no experiments).
    


    TheBlueMatt commented at 7:27 pm on November 13, 2017:
    This paragraph probably doesn’t belong in the “Specification” section, motivation section seems much more appropriate, or maybe a “Background Math” section.

    Roasbeef commented at 7:14 am on November 30, 2017:
    Great suggestion, move the bulk of the initial section to a “Background Math” section at the end of the document.
  17. in gcs_light_client.mediawiki:502 in 83b83c78e1 outdated
    497+
    498+A <code>Normal</code> filter is intended to contain all the items that a light client
    499+needs to sync a basic Bitcoin wallet. In order to facilitate this use-case, for
    500+each transaction, normal filters contain:
    501+* The outpoints of each input within a transaction.
    502+* The data-pushes contained within the public key script of each output within the transaction.
    


    TheBlueMatt commented at 7:34 pm on November 13, 2017:
    I dont think “public key script” is the way we normally refer to it….I’d say stick with either “scriptPubKey” or “the output script”.

    Roasbeef commented at 7:14 am on November 30, 2017:
    Modified to use “output script”.

    TheBlueMatt commented at 5:08 pm on November 30, 2017:
    No?
  18. in gcs_light_client.mediawiki:115 in 83b83c78e1 outdated
    110+
    111+We define the concept of an abstract bit stream instantiated by the function:
    112+<code>new_bit_stream</code> The <code>bit_stream</code> has two functions that
    113+operate on it, <code>unary_encode(stream, n)</code> and
    114+<code>write_bits_big_endian(stream, n, k)</code> where <code>unary_encode(steam,
    115+n)</code> emits n (an integer) to the stream in unary, and
    


    TheBlueMatt commented at 7:38 pm on November 13, 2017:
    This is technically underspecified - “unary” is not a normative encoding, and can be 0-indexed, 1-indexed and can encode as 0s followed by a 1 or 1s followed by a 0.

    Roasbeef commented at 7:14 am on November 30, 2017:
    Updated to make more explicit.
  19. in gcs_light_client.mediawiki:330 in 83b83c78e1 outdated
    325+distributed, the deltas between these values will be Geometrically Distributed,
    326+meaning that Golomb-Rice coding will be optimal for this use-case [6].
    327+
    328+The following routine describes the compression process:
    329+<pre>
    330+gcs_compress(sorted_set, fp) -> []byte:
    


    TheBlueMatt commented at 8:17 pm on November 13, 2017:
    Wait, sorted_set? No where else does anything reference that the set must be sorted? Sorted in what order?

    Roasbeef commented at 7:14 am on November 30, 2017:
    The hashed_set_construct function above this returns a sorted set. Update to point this out, and make it explicit in the section above gcs_compress.
  20. in gcs_light_client.mediawiki:540 in 83b83c78e1 outdated
    535+    for tx in block.transactions:
    536+        let txid = tx.hash()
    537+        raw_items.append(txid)
    538+
    539+        for output in tx.outputs:
    540+            let output_bytes = extract_push_datas(output.script)
    


    TheBlueMatt commented at 8:22 pm on November 13, 2017:
    extract_push_datas is never well-defined - only referenced in a general sense.

    Roasbeef commented at 7:14 am on November 30, 2017:
    Added a new section explaining what’s meant by extract_push_datas. I’ve left a level of abstraction though, as otherwise I’d need to start specifying Script itself…so it currently assumes a certain level of familiarity with Bitcoin (as do all BIPs).
  21. in gcs_light_client.mediawiki:560 in 83b83c78e1 outdated
    555+    let hashed_items = []
    556+    for raw_item in raw_items:
    557+        let hashed_item = (siphash_key(siphash_key, raw_item) * F) >> 64
    558+        hashed_items.append(hashed_item)
    559+
    560+    hashed_items.sort()
    


    TheBlueMatt commented at 8:23 pm on November 13, 2017:
    Sort order is not defined.

    Roasbeef commented at 7:14 am on November 30, 2017:
    Defined.
  22. in gcs_light_client.mediawiki:633 in 83b83c78e1 outdated
    628+
    629+
    630+==== Compact Filter Header Chain ====
    631+
    632+As the filters described in this BIP ''are not'' consensus critical, meaning
    633+each filter is validated by full-nodes and committed into blocks by miners, we
    


    TheBlueMatt commented at 8:28 pm on November 13, 2017:
    Wait, huh? I think you mean meaning filters are not validated by full-nodes and not committed to in blocks.

    TheBlueMatt commented at 8:30 pm on November 13, 2017:
    More importantly, dunno why you have to specify this? It is binding insofar as the protocol says it must be X, and light clients will think you’re broken if its not X.

    Roasbeef commented at 7:14 am on November 30, 2017:
    Ah yeh, nice catch! Modified to say “not”. Specifying it to give the reader the rationale to why the header chain exists.
  23. in gcs_light_client.mediawiki:648 in 83b83c78e1 outdated
    643+<pre>
    644+filter_header(n: uint) -> [32]byte = 
    645+   let zero_hash [32]byte = {0*32}
    646+
    647+   if n == 0:
    648+       double-sha-256(genesis_block.prevblock || filter(0))
    


    TheBlueMatt commented at 8:33 pm on November 13, 2017:
    Soooo…0? Please just say 32 bytes of 0s.

    Roasbeef commented at 7:14 am on November 30, 2017:
    Fixed.
  24. in gcs_light_client.mediawiki:653 in 83b83c78e1 outdated
    648+       double-sha-256(genesis_block.prevblock || filter(0))
    649+
    650+   match filter(n):
    651+      case Some:
    652+          double-sha-256(filter_header(n-1) || double-sha-256(filter(n)))
    653+      case None:
    


    TheBlueMatt commented at 8:35 pm on November 13, 2017:
    Huh? Can you make it a bit clearer that you’re saying if the filter has data in it? case Some, case None reads as nonsense to me (and you say this before you describe it in text).

    TheBlueMatt commented at 8:36 pm on November 13, 2017:
    Also, why the special case here? Why not just sha256 of the empty string?

    Roasbeef commented at 7:15 am on November 30, 2017:

    Added further text explaining the two cases. Also defined the whole match, some, none usage at the top in the preliminaries (borrowed from more functional languages).

    No particular reason, but this way 32-byte are always hashed current filter header.

  25. in gcs_light_client.mediawiki:671 in 83b83c78e1 outdated
    666+"hash" of said filter is simply "32 zeroes". 
    667+
    668+This filter header chain can be utilized by light clients to gain a greater
    669+degree of security against bamboozling full-nodes during their initial chain
    670+sync. In addition to fetching all the bitcoin headers, light clients
    671+implementing this BIP should also fetch all the ''filter headers'' from ''each'' of
    


    TheBlueMatt commented at 8:45 pm on November 13, 2017:
    I think this section may be clearer if you use RFC words (which you appear to already be doing) and be clear that you are. Also, what is the advantage of fetching the filter headerS from each peer? It would be less bandwidth and just as much cross-checking to only fetch the top block’s filter header after you’ve synced the headers chain (which is probably a much better approach in any case as you can first make sure you’re on the best chain, and then download a filters chain, protecting you from some types of DoS issues). You can then do a backwards sync of the filters. If there is a mismatch you do a binary search or otherwise to figure out which peer is lying.

    Roasbeef commented at 7:15 am on November 30, 2017:
    Modified to make the usage of RFC works explicit. The section below assumes the client has fully synced, as is fetching from tip. As a result, they already have the entire header chain. The naive way would be just to fetch the entire filter from each peer. This algo avoids that and only fetches the filter in the case a bamboozlement attempt by a peer.
  26. in gcs_light_client.mediawiki:682 in 83b83c78e1 outdated
    677+verify purported filter authenticity when fetching the next set of headers from
    678+chain tip. Instead of fetching the filter ''from each peer'' (which wastes
    679+bandwidth), a light client instead does the following:
    680+
    681+<pre>
    682+verify_from_tip(tip_block_hash: [32]byte):
    


    TheBlueMatt commented at 8:46 pm on November 13, 2017:
    I dont know why this needs to be in the normative “Specification” section - clients can use this how they feel, and I think this is probably not the best approach anyway. You may wish to say something like clients SHOULD avoid syncing the full filter chain from all peers, but instead should only download from multiple peers when an inconsistency is detected.

    Roasbeef commented at 7:15 am on November 30, 2017:
    Moved to the “Implementation Notes” section.
  27. in gcs_light_client.mediawiki:703 in 83b83c78e1 outdated
    698+
    699+        // Otherwise, syncing continues as normal: fetch filter to see if it
    700+        // matches any relevant items.
    701+</pre>
    702+
    703+Light clients should persistently commit all filter headers to disk, as when
    


    TheBlueMatt commented at 8:47 pm on November 13, 2017:
    I dont think this should be normative, but if it is, I’d suggest using RFC words and use MAY here instead of SHOULD.

    Roasbeef commented at 7:15 am on November 30, 2017:
    Modified to use MAY.
  28. in gcs_light_client.mediawiki:857 in 83b83c78e1 outdated
    852+This value was chosen as during simulations it was the value that minimized the
    853+bandwidth utilized by the expected number of blocks downloaded due to false
    854+positives, and the bandwidth used to download the filters themselves. The code along with a demo used for the paramter tuning can be found [here]
    855+
    856+
    857+=== Protocol Version Bump ===
    


    TheBlueMatt commented at 8:52 pm on November 13, 2017:
    I do not believe this is neccessary? There is a new service bit, which should be sufficient.

    Roasbeef commented at 7:15 am on November 30, 2017:
    AFAIK, all new extensions to the p2p protocol have been accompanied by a protocol version bump? (or at least ones that add new messages)
  29. in gcs_light_client.mediawiki:114 in 83b83c78e1 outdated
    109+<code>c = 2</code> and <code>d = 4</code>.
    110+
    111+We define the concept of an abstract bit stream instantiated by the function:
    112+<code>new_bit_stream</code> The <code>bit_stream</code> has two functions that
    113+operate on it, <code>unary_encode(stream, n)</code> and
    114+<code>write_bits_big_endian(stream, n, k)</code> where <code>unary_encode(steam,
    


    TheBlueMatt commented at 8:56 pm on November 13, 2017:
    I think we need to much more seriously consider whether we want to, for the first time, introduce big endian at the purely-p2p layer - it is nice at a bit level, but it kinda sucks to have yet more endianness confusion in Bitcoin’s P2P protocol.

    Roasbeef commented at 7:14 am on November 30, 2017:
    Big endian just seemed natural initially. It’s not as if we’re using big-endian to encode a field that was previously encoded using little-endian. This is a new data-structure all together.
  30. TheBlueMatt commented at 8:59 pm on November 13, 2017: contributor
    I found this spec very confusing…I could probably stumble my way through implementing it purely because of the test vectors, but I would certainly miss parts of it the first few times around. Also, there is way too much background information in the “Specification” section - it doesn’t read as a specification, it reads as a general goal of what you want it to do.
  31. in gcs_light_client.mediawiki:296 in 83b83c78e1 outdated
    291+Using <code>N</code> and <code>P</code> we compute <code>F = N * P</code>
    292+<code>F</code> constricts the range of the hashed values accordingly in order to
    293+achieve our desired false positive rate.
    294+
    295+In addition, to help optimize the algorithm, we use a fast range algorithm[7],
    296+multiplying the hashed value by F and taking only the top 64 bits. This fairly
    


    jimpo commented at 6:17 pm on November 15, 2017:
    64 bits appears as a magic number here. I would note that the 64 bits comes from the SipHash output size.

    Roasbeef commented at 7:15 am on November 30, 2017:
    Specified.
  32. fixup! new bip: compact client side filtering af5e1376ef
  33. fixup! new bip: compact client side filtering 59fd92e556
  34. fixup! new bip: compact client side filtering 6a2f9cdac7
  35. fixup! new bip: compact client side filtering 33b6f49f08
  36. fixup! new bip: compact client side filtering 0a68479a0b
  37. Roasbeef commented at 7:17 am on November 30, 2017: contributor
    @luke-jr AFAICT other BIPs do contain test vectors in folders named the same as the BIP. For example BIP-00039 contains the word list used. Added a section for backwards compatability. @TheBlueMatt thanks for the thorough review! Admittedly the initial section reads more like a blog post. Following your suggestion I’ve adopted RFC words where applicable throughout, and created a new section (“Mathematical Background”) that houses what use to be the opening portion of the specification.
  38. fixup! new bip: compact client side filtering 5d71117b3d
  39. fixup! new bip: compact client side filtering 3728ea0db5
  40. fixup! new bip: compact client side filtering de4c6c9cab
  41. fixup! new bip: compact client side filtering 3c31af657f
  42. fixup! new bip: compact client side filtering 3ee99e66a9
  43. fixup! new bip: compact client side filtering ba82315713
  44. fixup! new bip: compact client side filtering bfe336c8e8
  45. fixup! new bip: compact client side filtering d52f586a13
  46. evoskuil commented at 7:30 am on November 30, 2017: contributor

    The light client mode described in this BIP is meant to supersede BIP 37 as it provides a greater degree of privacy…

    As far as I can tell the privacy improvement is never quantified. It would seem that the independent downloading of less than all blocks (which is ultimately the objective) is the only source of privacy loss. However, given that privacy is a primary objective, it would be helpful if this analysis was explicit. I am specifically interested in whether there is any privacy loss in the filtering aspect (as compared to the operation of a full node).

  47. in gcs_light_client.mediawiki:405 in d52f586a13
    400+
    401+
    402+==== Peer to Peer Service Bit ====
    403+
    404+To start, we reserve a currently unutilized service bit. This is required as
    405+light clients SHOULD ''preferentially'' peer to full-nodes that support the
    


    TheBlueMatt commented at 5:05 pm on November 30, 2017:
    I think SHOULD is maybe wrong here - some light clients will almost certainly have to only connect to nodes that have the bit set. You could replace with MAY preferentially, or only connect to, as required, or just drop it.
  48. in gcs_light_client.mediawiki:408 in d52f586a13
    403+
    404+To start, we reserve a currently unutilized service bit. This is required as
    405+light clients SHOULD ''preferentially'' peer to full-nodes that support the
    406+features outlined in this BIP.
    407+
    408+The 6th service bit will now be dedicated to signaling support for the
    


    TheBlueMatt commented at 5:05 pm on November 30, 2017:
    I guess terminology…its kinda the 7th bit.
  49. in gcs_light_client.mediawiki:409 in d52f586a13
    404+To start, we reserve a currently unutilized service bit. This is required as
    405+light clients SHOULD ''preferentially'' peer to full-nodes that support the
    406+features outlined in this BIP.
    407+
    408+The 6th service bit will now be dedicated to signaling support for the
    409+features described within this BIP: 
    


    TheBlueMatt commented at 5:16 pm on November 30, 2017:
    This is hugely under-defined - can you set the BIP if you’re not a full node (ie is NODE_NETWORK implied or is that separate and you can look for eg NODE_NETWORK_LIMITED & NODE_CF)? Do you have to have implemented both currently-defined filter types (No is better for maintainability, but that has huge useability implications)? What does this bit mean without NODE_NETWORK? Maybe you always have to return the filter chain hashes but dont need to return the filters themselves?

    luke-jr commented at 6:57 pm on November 30, 2017:
    I agree. It would be ideal if pruned nodes could set this and still be usable by light wallets (I might even suggest that clients SHOULD support using pruned nodes… many users want pruned nodes, and their own node should be the expected server)
  50. in gcs_light_client.mediawiki:410 in d52f586a13
    405+light clients SHOULD ''preferentially'' peer to full-nodes that support the
    406+features outlined in this BIP.
    407+
    408+The 6th service bit will now be dedicated to signaling support for the
    409+features described within this BIP: 
    410+* <code>SFNodeCF = 1 << 6</code>
    


    TheBlueMatt commented at 5:16 pm on November 30, 2017:
    What is the SF prefix here? I think generally BIPs have used Bitcoin Core’s NODE_* constant naming (and generally use all-caps constant naming).
  51. in gcs_light_client.mediawiki:539 in d52f586a13
    534+    hashed_items.sort()
    535+
    536+    gcs_compress(hashed_items, fp)
    537+</pre>
    538+
    539+==== Filter Capability Querying ====
    


    TheBlueMatt commented at 5:19 pm on November 30, 2017:
    I think this is kinda poorly thought-through. As noted in the service bit discussion section, its not clear to me how you’d actually use this in practice. Lets say a new filter type is added to include segwit/p2sh redeemScript pushes, now a client that wants that filter has to go around connecting to every node it can find with the service bit until it finds one that supports the new filter type?
  52. in gcs_light_client.mediawiki:770 in d52f586a13
    765+along with a demo used for the parameter tuning can be found [here https://github.com/Roasbeef/bips/blob/83b83c78e189be898573e0bfe936dd0c9b99ecb9/gcs_light_client/gentestvectors.go]
    766+
    767+
    768+=== Protocol Version Bump ===
    769+
    770+As this BIP defines new peer-to-peer behavior, we bump the protocol version by
    


    TheBlueMatt commented at 5:21 pm on November 30, 2017:
    As noted previously (but github maybe lost the comment?), given the service bit inclusion, this is uneccessary. Would strongly prefer you not bump the protocol version.
  53. in gcs_light_client.mediawiki:869 in d52f586a13
    864+likely the older filters would only be fetched for historical rescans. It is
    865+also likely that the past few hundred filters will be fetched mostly frequently
    866+by smaller devices (phones, laptops, etc) periodically coming back online after a
    867+period of inactivity.
    868+
    869+It is possible to implementations of this BIP to serve ''other''
    


    TheBlueMatt commented at 5:22 pm on November 30, 2017:
    I have no idea what this sentence is saying?
  54. in gcs_light_client.mediawiki:874 in d52f586a13
    869+It is possible to implementations of this BIP to serve ''other''
    870+implementations to a degree. All headers (regular bitcoin, regular, extended)
    871+can be served, and also any on-disk filters can also be served to other light
    872+clients.
    873+
    874+Key import and rescan: with lazy <code>filters</code> fetching, having a start
    


    TheBlueMatt commented at 5:23 pm on November 30, 2017:
    What is “lazy filters fetching”?
  55. in gcs_light_client.mediawiki:594 in d52f586a13
    589+       double-sha-256(genesis_block.prevblock || filter(0))
    590+
    591+   match filter(n): 
    592+      // If the filter isn't empty, then we hash the filter itself into the
    593+      // header chain.
    594+      case Some:
    


    TheBlueMatt commented at 5:28 pm on November 30, 2017:
    Maybe I’m just not a Go person, but I find this pseudocode very hard to read… match filter(n) case Some case None where Some and None are never previously defined in the document? Without the comments it would be very hard to decipher…maybe something like if (filter(n) has elements): ?
  56. in gcs_light_client.mediawiki:595 in d52f586a13
    590+
    591+   match filter(n): 
    592+      // If the filter isn't empty, then we hash the filter itself into the
    593+      // header chain.
    594+      case Some:
    595+          double-sha-256(filter_header(n-1) || double-sha-256(filter(n)))
    


    TheBlueMatt commented at 5:37 pm on November 30, 2017:
    Given this construction I see no reason to differentiate between the “some data” and “no data” case - double-sha-256(null string) is well-defined and you’re concatenating hashes so it doesn’t lead to any confusion.
  57. in gcs_light_client.mediawiki:597 in d52f586a13
    592+      // If the filter isn't empty, then we hash the filter itself into the
    593+      // header chain.
    594+      case Some:
    595+          double-sha-256(filter_header(n-1) || double-sha-256(filter(n)))
    596+
    597+      // Otherwise, if the filter is empty (created from a block with a single
    


    TheBlueMatt commented at 5:39 pm on November 30, 2017:
    This is confusing as its only true for the Extended filter, not the Normal filter (as the Normal filter would contain at least the coinbase’ txid and output script data pushes).
  58. in gcs_light_client.mediawiki:589 in d52f586a13
    584+filter_header(n: uint) -> [32]byte = 
    585+   // The zero hash is 32 bytes of 0's.
    586+   let zero_hash [32]byte = {0*32}
    587+
    588+   if n == 0:
    589+       double-sha-256(genesis_block.prevblock || filter(0))
    


    TheBlueMatt commented at 5:40 pm on November 30, 2017:
    Heh, since you redid the section now this is even more confusing….genesis_block.prevblock reads to me like a reference to its actual previous block (which is undefined) not the previous block hash field in the header. Maybe swap back to zero_hash instead since its now clear that this is only for the genesis block or just put “hash” in there? Sorry for the back-and-forth.
  59. in gcs_light_client.mediawiki:611 in d52f586a13
    606+The filter header for the genesis block uses the hash stored in the prevblock
    607+field of the genesis block header itself, as there's no prior filter header
    608+(by definition).
    609+
    610+Due to the nature of filter construction, it's possible to construct a block
    611+such that an "empty" filter will be produced. This is the case of a coinbase
    


    TheBlueMatt commented at 5:42 pm on November 30, 2017:
    Same comment as above, if I understand the construction correctly, good to note that this is only true for the Extended filter.
  60. in gcs_light_client.mediawiki:633 in d52f586a13
    628+| NumBlockLocators
    629+| uint64
    630+| Number of block locators.
    631+|-
    632+| NumBlockLocators * 32
    633+| BlockLocatorHashes
    


    TheBlueMatt commented at 5:44 pm on November 30, 2017:
    Why use block locators here? It seems like you’re endeavoring to allow people to fetch the filters chain entirely separately from headers, but I dont know why anyone would ever want to do this - you still need the headers themselves to check things like PoW.
  61. in gcs_light_client.mediawiki:652 in d52f586a13
    647+
    648+The <code>BlockLocators</code> within the message are to be interpreted
    649+identically to the <code>BlockLocators</code> within Bitcoin's
    650+<code>getheaders</code> and <code>getblocks</code> messages <ref>https://en.bitcoin.it/wiki/Protocol_documentation</ref>.
    651+
    652+The <code>cfheaders></code> message MUST be sent in response to a
    


    TheBlueMatt commented at 5:50 pm on November 30, 2017:
    This seems too restrictive (and conflicts with the note that locators “have the same semantics as getheaders” above. There are cases where getheaders messages are currently not responded to (hashStop set but locator empty, which has very different semantics as well as if a node thinks its in IBD so its headers response is going to be useless…note obviously things are easier to define if you drop the locator/stop thing in favor of a simpler startHash/blockCount request and restrict requests to hashes for blocks which were recently received in headers messages from the same node (something you’re gonna get from essentially all your peers anyway).
  62. in gcs_light_client.mediawiki:663 in d52f586a13
    658+! Descriptions
    659+! Data Type
    660+! Comments
    661+|-
    662+| 32
    663+| StopHash
    


    TheBlueMatt commented at 6:16 pm on November 30, 2017:
    This is racy - lets say you request some filter headers (eg from a peer just finishing re-syncing after restart), so the request ends with a block hash that you dont have the header for, but the network converges on a different block…now you have a filter chain ending in a block that you dont have a header for and possibly cant get, but needlessly so. Switching things to backwards download (eg making getcfheaders take a block hash and then a number of blocks before that block to return filter headers for) should make this much less of an issue. Or, ideally, just make it a getcfheader request instead of getcfheaders, simplifes a lot of things and you can still do a backwards walk with a binary search to find the conflicting header, something you’ll want to do either way.
  63. in gcs_light_client.mediawiki:670 in d52f586a13
    665+| Block hash for the last filter header returned, for locating the filter headers in the blockchain.
    666+|-
    667+| 1
    668+| FilterType
    669+| byte
    670+| Byte identifying the type of filter headers being returned.
    


    TheBlueMatt commented at 6:16 pm on November 30, 2017:
    Need to specify somewhere that this needs to be the same value as the corresponding request.
  64. in gcs_light_client.mediawiki:708 in d52f586a13
    703+| FilterType
    704+| byte
    705+| Byte identifying the type of filter requested.
    706+|}
    707+
    708+The <code>cfilter</code> message MUST be sent in response to a
    


    TheBlueMatt commented at 6:21 pm on November 30, 2017:
    This is too restrictive. Need to specify what the restrictions are on the blockhash field (eg that it is something for which you’ve recently received a headers message from the peer for or so). Might also consdier clarifying that you should respond with the data for the same block and filter type as the request :p.
  65. luke-jr commented at 6:27 pm on November 30, 2017: member

    @Roasbeef I see you’re doing a lot of revising of stuff. Let me know when you’d like this merged. (others commenting should note that this is just a draft, and merging does not preclude acting on comments in future revisions)

    There’s no hard rule against test vectors in the repo. My opinion on where they belong is an opinion.

  66. in gcs_light_client.mediawiki:749 in d52f586a13
    744+client in order to properly incrementally decode, query, and validate
    745+(reconstruct from Bitcoin block) a compact filter. The parameter <code>N</code>
    746+cannot be known ahead of time, therefore we define the serialization of a
    747+compact filter of type <code>0x00</code> and <code>0x01</code> as:
    748+<pre>
    749+N || raw_filter_bytes
    


    TheBlueMatt commented at 6:31 pm on November 30, 2017:
    Why not just put this in the message encoding? If a future new filter type no longer needs the N to be encoded, it can redefine the message encoding then.
  67. in gcs_light_client.mediawiki:823 in d52f586a13
    818+
    819+        if len(filter_headers) != 1:
    820+            // Peers have conflicting filters. The light client should fetch
    821+            // each unique filter from the set of peers AND fetch the block. The
    822+            // light client can then verify which filter header is correct, and
    823+            // BAN the offending peers.
    


    TheBlueMatt commented at 6:33 pm on November 30, 2017:
    Something something binary search?
  68. in gcs_light_client.mediawiki:571 in d52f586a13
    566+|}
    567+
    568+
    569+==== Compact Filter Header Chain ====
    570+
    571+As the filters described in this BIP ''are not'' consensus critical, meaning
    


    TheBlueMatt commented at 6:35 pm on November 30, 2017:
    I might suggest you strike this portion - you later say that clients should detect and ban nodes that dont have a normative filter for each block, so its a kind of consensus, even if not “consensus critical” per se.
  69. in gcs_light_client.mediawiki:842 in d52f586a13
    837+<code>4-bytes</code>. Clients are able to verify that a filter will be
    838+<code>null</code> before requesting it (as it will just be the prior filter
    839+header hashed with zero bytes). Clients can take this fact into account in
    840+order to save a round trip for <code>null</code> headers.
    841+
    842+Light clients implementing this proposal SHOULD utilize the
    


    TheBlueMatt commented at 6:35 pm on November 30, 2017:
    This seems very strong. Why SHOULD? Hell, why even MAY? It seems very unrelated.
  70. in gcs_light_client.mediawiki:755 in d52f586a13
    750+</pre>
    751+where <code>N</code> is serialized as a 32-bit big-endian integer.
    752+
    753+However, there exists a special case of a <code>null</code> filter. This this
    754+case an empty byte slice is transmitted rather than consuming
    755+<code>4-bytes</code> to encode the size of zero.
    


    TheBlueMatt commented at 6:36 pm on November 30, 2017:
    Why the special case here? You’re already sending many bytes anyway, dropping the extra 4 just seems…over optimization?
  71. in gcs_light_client.mediawiki:792 in d52f586a13
    787+MAY implement both protocols to serve both types of light clients.
    788+
    789+
    790+== Implementation Notes ==
    791+
    792+This filter header chain SHOULD be utilized by light clients to gain a greater
    


    TheBlueMatt commented at 6:37 pm on November 30, 2017:
    Maybe s/by light clients/in place of BIP 37/
  72. in gcs_light_client.mediawiki:796 in d52f586a13
    791+
    792+This filter header chain SHOULD be utilized by light clients to gain a greater
    793+degree of security against bamboozling full-nodes during their initial chain
    794+sync. In addition to fetching all the bitcoin headers, light clients
    795+implementing this BIP SHOULD also fetch all the ''filter headers'' from
    796+''each'' of their connected peers. With these headers, light clients SHOULD
    


    TheBlueMatt commented at 6:39 pm on November 30, 2017:
    I’d say they SHOULD only fetch the top header after doing a headers sync from all of their peers except one, from which they build the whole header tree.
  73. TheBlueMatt commented at 6:39 pm on November 30, 2017: contributor
    Just reviewed the P2P part again.
  74. in gcs_light_client.mediawiki:1031 in d52f586a13
    1026+However, current constructions of cryptographic accumulators require an initial
    1027+trusted set up. Additionally, accumulators based on the Strong-RSA Assumption
    1028+require mapping set items to prime representatives in the associated group
    1029+which can be preemptively expensive.
    1030+
    1031+==== Matrix Based Probabilistic Set Datastructures ====
    


    jamesob commented at 11:52 pm on November 30, 2017:
    “Datastructures” should probably be two words.
  75. jimpo cross-referenced this on Dec 8, 2017 from issue Build tx index in parallel with validation by jimpo
  76. jonasschnelli cross-referenced this on Jan 4, 2018 from issue Compact Client Side Filtering for Light Clients (Neutrino server-side support) by Sjors
  77. jimpo cross-referenced this on Jan 17, 2018 from issue BIPs 157 & 158: Block Filtering stuff by jimpo
  78. Roasbeef commented at 10:26 pm on January 17, 2018: contributor

    Closing this in favor of #636.

    Thanks to everyone who gave valuable review comments, especially @TheBlueMatt for his comments on the P2P aspect which led to notable improvements as proposed by @jimpo.

  79. Roasbeef closed this on Jan 17, 2018


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bips. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-27 08:10 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me