policy: make unstructured annex standard #27926

pull joostjager wants to merge 1 commits into bitcoin:master from joostjager:std-unstructured-annex changing 3 files +84 −16
  1. joostjager commented at 11:23 am on June 21, 2023: none

    This PR opens up the annex in a limited way:

    • If any one of the inputs commits to an annex, all inputs must. This is a mechanism to opt-in to annex use, preventing participants in a multi-party transaction from adding an unexpected annex which would increase the tx weight and lower the fee rate. See https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2023-June/021736.html

    • Allow empty annexes, to let users opt-in to annex usage with the minimum number of bytes.

    • Additionally, only allow annexes that start with 0x00. This keeps open the possibility to support structured annexes such as for example the tlv format proposed in https://github.com/bitcoin/bips/pull/1381. A potential future structured format would start with a non-zero byte.

    • Limit the maximum size of unstructured annex data to 256 bytes. Including the annex tag 0x50 and the unstructured data tag 0x00, this makes for a maximum total witness element size of 258 bytes. This upper limit somewhat protects participants in a multi-party transaction that uses the annex against annex inflation.

      The constant 256 is chosen so that it provides enough space to accommodate several schnorr signatures. This is useful for implementing annex covenants with multiple presigned transactions. For a PoC of a single transaction annex covenant, see https://github.com/joostjager/annex-covenants

    Todo:

    • fix/add tests

    Related:

  2. DrahtBot commented at 11:23 am on June 21, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #28246 (wallet: Use CTxDestination in CRecipient instead of just scriptPubKey by achow101)
    • #28244 (Break up script/standard.{h/cpp} by achow101)
    • #26291 (Update MANDATORY_SCRIPT_VERIFY_FLAGS by ajtowns)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label TX fees and policy on Jun 21, 2023
  4. in src/policy/policy.cpp:337 in 0f67fca200 outdated
    331@@ -289,6 +332,12 @@ bool IsWitnessStandard(const CTransaction& tx, const CCoinsViewCache& mapInputs)
    332             }
    333         }
    334     }
    335+
    336+    // If any inputs commit to an annex, all inputs must commit to an annex.
    337+    if (annex_input_count > 0 && tx.vin.size() != tx.vin.size()) {
    


    Crypt-iQ commented at 1:26 pm on June 21, 2023:
    should this be if (annex_input_count > 0 && tx.vin.size() != annex_input_count)?

    joostjager commented at 1:46 pm on June 21, 2023:
    fixed
  5. joostjager force-pushed on Jun 21, 2023
  6. joostjager force-pushed on Jun 21, 2023
  7. DrahtBot added the label CI failed on Jun 21, 2023
  8. Crypt-iQ commented at 3:56 pm on June 21, 2023: contributor
  9. joostjager commented at 4:41 pm on June 21, 2023: none
    @Crypt-iQ thanks for pointing that out. I need to get used to this repository still. Will definitely fix tests, but looking for concept ack first.
  10. joostjager marked this as a draft on Jun 22, 2023
  11. luke-jr commented at 2:37 pm on June 24, 2023: member
    Use case?
  12. joostjager commented at 8:38 pm on June 25, 2023: none
    @luke-jr annex covenants, see link in PR description
  13. in src/policy/policy.cpp:230 in b4421da7c7 outdated
    225+        return false;
    226+    }
    227+
    228+    // Calculate the size of the unstructured data by subtracting
    229+    // one for the 0x00 signaling byte.
    230+    size_t unstructured_annex_size = annex_size - 1;
    


    ariard commented at 0:29 am on June 26, 2023:
    I think here we might prefer to not discount the 0x00 signaling byte from the 265 byte anti-annex-inflation-attacks limit, otherwise it sounds to me we’re introducing a coupling effect between the max witness size one can sign for and the tag signaling byte format itself. Such tag signaling byte format might acquire consensus semantic in the future, and I think we would prefer to reserve the evolvability in function of consensus changes here.

    joostjager commented at 9:46 am on June 26, 2023:

    My thought was that it might be good to choose a number of actually useable bytes that is a multiple of a commonly used data structure sizes such as 32 bytes. Therefore 256. It is an arbitrary number though that it supposed to strike a balance between the limiting the potency of annex inflation attacks and having enough space to make the annex useful.

    Are you proposing to limit the annex size including any tags to 257 bytes? I see the conceptual difference, but there wouldn’t be a functional difference for now?


    joostjager commented at 9:51 am on June 26, 2023:
    With the constant defined as suggested below, this suggestion does seem to make reading easier. Changed.

    ariard commented at 6:58 pm on July 4, 2023:

    Are you proposing to limit the annex size including any tags to 257 bytes? I see the conceptual difference, but there wouldn’t be a functional difference for now?

    Yes, for now it doesn’t make a functional difference. The issue I have in mind is the following, assuming the unstructured annex format is deployed and we see applications building on top of it, the 256 (or 257 now) annex size limit will have to become a max-size data payload, which is going to leak in the all application toolchains (in the same sense than TCP’s 64k bytes max frame is often used as the limit for application transport design).

    As of today, the signaling byte size is of 1 byte, however if in the future we have a consensus definition of the tag being more than 1 byte, we might have to swallow the 255 bytes reserved for the application data-payload. Not swallowing on the 255 bytes would mean the consensus-aware application will have to pay the witness cost for the 0x00 signaling byte and this byte will have to be discarded by parsing logic.

    So if those observations are correct, I think the constant could have some safety margin padding for changes of the “structured/unstructured guard” signaling byte size. One can think 256 byte reserved for application-data + 32 byte of padding, from which this 1st byte of padding will be the current 0x00 one ? And I think that way we can guarantee to the application a multiple of a 32-byte based data structure.


    joostjager commented at 7:25 am on July 5, 2023:

    I am not sure if I fully get what you mean. Can you give a concrete example of a future extended format that might interfere with the 0x00 + unstructured proposal? My thought was to simply reserve all non-0x00 first bytes for future use.

    When adding 32 bytes of padding, do you mean that initially (in a world with only unstructured annex data), 31 of those bytes are wasted but paid for?


    ariard commented at 0:26 am on July 24, 2023:

    Can you give a concrete example of a future extended format that might interfere with the 0x00 + unstructured proposal?

    Sure any future serialization format where the semantic “tag” is encoded on more than 8-bit to allow more than 256 types of payload without having to picked up which are the “most-used” 256 one when the consensus change. This is actually the serialization format on which the TLV proposal of the annex (with some other discussed tweaks for the length), see comment above VarIntMode in src/serialize.h.

    From my understanding, if we’re following your approach where non-0x00 first bytes are reserved for future use, and let’s say 0x01 is for TLV semantic of the annex field, it means all the annex payload with consensus enforcement will have to pay the feerate cost of the 1 byte marker (either 0x00 or 0x01).

    So I’m thinking the advantage of multi-bytes padding we reserve a quantitatively significant number of bytes for the eventual adoption of a multi-bytes serialization format. This comes at the burden of unstructured data having to pay for the cost of the reserve byte padding toda while making tomorrow potential consensus enforced annex payload cheaper, I think.

    Of course 32-byte might be considered to high a reserve byte padding to make the usage of the annex attractive for unstructured data applications. Yet the question we might have to answer is how much consensus “tag” we might need to preserve for now, let’s say in the future we have an automated consensus mechanism where people can “burn” satoshis / coins to assign an identifier to an annex tag (weird use-cases that has been proposed in the early days of Bitcoin for namespaces experiment iirc). I don’t know the answer to this “how much tag" question.

  14. in src/policy/policy.cpp:235 in b4421da7c7 outdated
    230+    size_t unstructured_annex_size = annex_size - 1;
    231+
    232+    // Unstructured annexes are nonstandard if they are larger than
    233+    // 256 bytes. This limits the potential of annex inflation
    234+    // attacks.
    235+    if (unstructured_annex_size > 256) {
    


    ariard commented at 0:51 am on June 26, 2023:

    I think we can constify the value, e.g PER_INPUT_MAX_ANNEX_SIZE, generally this is unclear if we would like a per-input annex limit and a max transaction all annexes limits, e.g MAX_ANNEX_BUDGET.

    Splitting the limit enables flow where inputs can be combined in a non-deterministic order, e.g BIP174’s combiner, and reject in function of “witness weight slots” that can be distributed by such combiner, which makes sense to coordinate flows between hardware signers.


    joostjager commented at 9:58 am on June 26, 2023:

    Added MAX_PER_INPUT_ANNEX_SIZE.

    Not sure if a per tx maximum is necessary. I’d say that opting in to annex usage has some risks to be aware of. Maybe you don’t want to use it for multi-party protocols at all because of that. To me, MAX_PER_INPUT_ANNEX_SIZE is already debatable when you have opt-in, and MAX_ANNEX_BUDGET feels even more like an unnecessary precaution. In the end, this is all policy anyway and not a real protection against tx outside of these boundaries appearing.


    ariard commented at 7:15 pm on July 4, 2023:

    Maybe you don’t want to use it for multi-party protocols at all because of that.

    I think that’s a good question, there is an annex inflation risk concerning the multi-party protocol users and there is a new CPU DoS risk that is faced by node operators due to the increased in annex data size. I think this CPU DoS risk is coming from the fact that with BIP341, there is a sha_annex in the transaction digest if the annex is present and therefore you have a new SHA-256 ops and data bytes to hash that have to get done by the nodes.

    I don’t think this CPU DoS risk, if correct, is that much an increase in DoS surface, though a MAX_ANNEX_BUDGET can be a good conservative “belt-and-suspender” DoS limits, like we have a lot in Bitcoin Core.

    In the end, this is all policy anyway and not a real protection against tx outside of these boundaries appearing.

    Yes if you’re considering the risk faced by multi-party protocol users, policy is a shallow protection for transaction appearing outside those boundaries by being directly relayed to the miners. However, if you’re considering the CPU DoS protection for node operators this is a real protection, especially in a world where you might have a lot of hobbyist nodes running on resources-constraint hosts.


    joostjager commented at 7:39 am on July 5, 2023:

    I don’t think this CPU DoS risk, if correct, is that much an increase in DoS surface, though a MAX_ANNEX_BUDGET can be a good conservative “belt-and-suspender” DoS limits, like we have a lot in Bitcoin Core.

    If it isn’t that much of an increase, is it worth adding extra code for it? More code, more things to go wrong. In the end, they are also paying for this tx in sats and the computation cost only increases linearly with the number of inputs.


    ariard commented at 0:31 am on July 24, 2023:

    If it isn’t that much of an increase, is it worth adding extra code for it?

    From my experience of DoS discussion on the Core-side, even if we’re able to come with sound evaluation of the worst-case inputs payload one might submit during a time period, the host performance are always an unknown as you have low perf Umbrel and Raspy, and somehow preserving their processing capabilities has always been pursued during Bitcoin Core development for technical decentralization purpose, from my understanding.

    There was a lot of dumb DoS vectors in the early days of Bitcoin Core, see https://en.bitcoin.it/wiki/Common_Vulnerabilities_and_Exposures. The codebase has still a reasonable number of conservative belt-and-suspender policy checks. Even if you’re paying for this DoSy malicious input in term of fees, they might inflict more damages than the attacker cost (e.g knocking-out nodes during a fork deployment).

  15. in src/policy/policy.cpp:304 in b4421da7c7 outdated
    305+                // An annex is present. Remove it from the stack and save it for
    306+                // standardness checks.
    307+                const auto& annex = SpanPopBack(stack);
    308+
    309+                // Check that the annex is standard.
    310+                if (!IsAnnexStandard(annex)) {
    


    ariard commented at 0:55 am on June 26, 2023:
    I think it makes sense to have a annexrelay option in src/init.cpp with DEFAULT_ANNEXRELAY_ENABLE=false, therefore to have node operators running nodes on performance-constrained hosts to opt-out from DoS concerns arising from the processing of the annex itself.

    joostjager commented at 9:59 am on June 26, 2023:
    Which DoS concerns are you referring to? Since #24007 hasn’t landed yet, the tx can only be accepted once?

    ariard commented at 7:17 pm on July 4, 2023:
    See the CPU DoS concerned mentioned above about BIP341’s sha_annex, if annexes start to be relayed.

    joostjager commented at 7:43 am on July 5, 2023:

    Yes, that’s clear now. Thanks.

    Same as with MAX_ANNEX_BUDGET, my feeling is that this safeguard isn’t necessary because the extra cost is only linear and paid for.

  16. in src/policy/policy.cpp:333 in b4421da7c7 outdated
    331@@ -289,6 +332,12 @@ bool IsWitnessStandard(const CTransaction& tx, const CCoinsViewCache& mapInputs)
    332             }
    333         }
    334     }
    335+
    336+    // If any inputs commit to an annex, all inputs must commit to an annex.
    337+    if (annex_input_count > 0 && annex_input_count != tx.vin.size()) {
    


    ariard commented at 1:02 am on June 26, 2023:
    Should we introduce a limit on the number of inputs supported in a transaction opted-in annex ? To make it easier to reason on worst-case scenario from a second-layer perspective in case of multi-party flow e.g dual-funding. This number of inputs could be tied to the transaction’s nVersion field.

    joostjager commented at 10:02 am on June 26, 2023:
    My answer here would be the same as in #27926 (review). I think you can argue that opt-in is enough protection and multi-party flows with annexes may not be a good idea. Especially if there are no concrete plans for multi-party protocols that require the unique properties of the annex.

    ariard commented at 7:26 pm on July 4, 2023:

    Especially if there are no concrete plans for multi-party protocols that require the unique properties of the annex.

    If you’re taking the use-case of unstructured data discussed on the mailing list, I think there is still a leak in term of fee-bumping reserves than a user must provision for, in case of the worst annex inflation attacks being done, under known limits of PER_INPUT_MAX_ANNEX_SIZE (and eventual MAX_ANNEX_BUDGET).

    Of course, as you’re raising you can always have an adversarial transaction being broadcast directly to the mining pools mempools, though in a world where miners select the block template (i.e more with Stratum V2 deployment), this is a high technical bar for an annex inflation adversary. So the “default” policy limit on the max annex size is far from second-order considerations on the level of fee-bumping reserves than one must maintained to prevent timevalue DoS.

    (I don’t deny this is quite a sophisticated discussion to encompass the impact on fee-bumping reserves, still the kind of realistic threat model we’re considering for LN today).


    joostjager commented at 7:46 am on July 5, 2023:

    If you’re taking the use-case of unstructured data discussed on the mailing list

    Which one do you refer to exactly, the timelocked vaults using presigned txes? For that there is only a single party, so the inflation attack doesn’t apply?


    ariard commented at 0:37 am on July 24, 2023:

    For the timelocked vaults, yes there is a single party using presigned txes so it shouldn’t apply. For the other use-case e.g anchoring symbolic data in the chain, I think you might have multiple parties contributing to the transaction, though I’m less knowledgeable about those use-cases to be honest.

    If there is the current annex deployment is not targeted for multi-party protocols, yes we don’t have discuss of the impact of annex DoS on fee-bumping reserve.

  17. in src/policy/policy.cpp:211 in b4421da7c7 outdated
    207@@ -208,11 +208,45 @@ bool AreInputsStandard(const CTransaction& tx, const CCoinsViewCache& mapInputs)
    208     return true;
    209 }
    210 
    211+bool IsAnnexStandard(const std::vector<unsigned char>& annex)
    


    ariard commented at 1:07 am on June 26, 2023:
    If we can move the annex policy-only code in its own src/policy/annex like we have at the image of src/policy/packages.h for code modularity. I think in the discussion of #1381 there has been question to reuse the annex in the future for Grafroot.

    joostjager commented at 10:04 am on June 26, 2023:
    Happy to do it if others agree that the current annex check isn’t too minimal for its own file.
  18. ariard commented at 1:17 am on June 26, 2023: member
    IIRC from #19645 and beyond, witnesses could be concatenate on the flight by transaction-relaying node, therefore ensuring the best feerate transaction propagates. As it’s opening it’s own wormhole of DoS issues, better to pursue the conversation on the mailing list (like doing or not sounds to impact L2/transaction-relay propagation and hardware signers, I believe).
  19. joostjager force-pushed on Jun 26, 2023
  20. joostjager force-pushed on Jun 26, 2023
  21. make unstructured annex standard e71f9399d2
  22. joostjager force-pushed on Jun 26, 2023
  23. joostjager commented at 3:36 pm on June 26, 2023: none
    Updated feature_taproot.py tests
  24. DrahtBot removed the label CI failed on Jun 26, 2023
  25. ariard commented at 7:34 pm on July 4, 2023: member

    Left a new round of reviews, I think the following conceptual concerns are pending:

    • padding of the signaling byte to preserve evolvability for future consensus validation of the annex data
    • adding a MAX_ANNEX_BUDGET or not to mitigate potential CPU DoS concerns for full-nodes, and potential impact for fee-bumping reserves
    • opted-in of annexrelay for node operators
    • annex policy limits versioned on the transaction’s nVersion field
    • “all-or-nothing” annex opted-in mechanism

    I still have to reply on the latest mail post about standardizing the annex, which I’m aiming to do so. I think at that stage the proposal would benefit from an “applications" BIP draft that would give one or two use-cases in detail, and therefore serves as technical tie-breaker for the conceptual concerns raised above.

    edited to add one concern from the mailing list

  26. joostjager commented at 7:53 am on July 5, 2023: none

    I think at that stage the proposal would benefit from an “applications" BIP draft that would give one or two use-cases in detail, and therefore serves as technical tie-breaker for the conceptual concerns raised above.

    I agree that use cases can help guide the decision making in this PR. That’s why I created the demo/explanation mentioned in the PR description: https://github.com/joostjager/annex-covenants.

  27. ariard commented at 1:02 am on July 24, 2023: member

    I agree that use cases can help guide the decision making in this PR. That’s why I created the demo/explanation mentioned in the PR description: https://github.com/joostjager/annex-covenants.

    I understand how the spend transaction can be re-built from the annex-covenant stateless tool, and on my side I can opine on the interest of using the chain as a reliable backup fro critical information enabling vaulted funds to be rebuilt, only from standard tools.

    I still have a concern on the economics of such a proposal as “archive” data like witness might be discarded by full-nodes, so if the scheme become generalized we might see this subset of validation data pruned by the majority of full-nodes, at the very least when it’s older than X. I think the incentives to store historical chain data for regular full-nodes are quite low, and one could argue a scheme could be introduced where this “unstructured data” must be re-paid for by a new spend every Y, therefore generating on-chain fees.

    I think “unstructured data” is still compelling to improve self-custody solutions like vaults, especially if we can have a future where all you need to backup is a least 12 words (and here mnemotechnic can be used if physical storage is a concern) and access to the chain data to safe-guard $$$ of satoshis value and standard tools like the PoC you built.

    Yet I’m not sure this “reliable backup” advantage for advanced custody solutions (where the locking script is not composed only of standard scripts and keys e.g taproot script-path spend) is well-understood across the ecosystem, cf Dave Harding comment on the mailing list thread “don’t use the block chain to store backup data”.

  28. joostjager commented at 2:10 pm on August 10, 2023: none
    Clearly data backed up in this way isn’t guaranteed forever and forever free. However, for specific users this might still represent the most feasible option available. At the very least chain backup can serve as a safety net in case a conventional backup that is fully under control of the user somehow gets lost or becomes inaccessible.
  29. ariard commented at 3:55 am on August 12, 2023: member
    Chain backup servicing as a safety net definitely is interesting for witness and chain of transactions state or scarce collectible, when the value of the coins locked or “digital items” is worth more than a range of block miners fees. At least in term of Bitcoin crypto-economics IMHO, the good equilibrium with full-nodes resources I don’t know.
  30. ariard commented at 3:59 am on August 12, 2023: member
    @fanquake If you can reference this PR and this mailing list thread: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2023-June/thread.html started by the author of this PR in #28130 as I think the conversation is capturing good technical insights in practical ways to improve data carriage and archiving in the Bitcoin ecosystem. I was thinking to do it in #28130 though you locked the PR before.
  31. DrahtBot added the label Needs rebase on Aug 17, 2023
  32. DrahtBot commented at 1:50 pm on August 17, 2023: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

  33. ChrisMartl commented at 10:00 am on September 3, 2023: none

    @joostjager Could you please provide an analysis statement, which effects or how this proposal could increment the exploit exposure for the Bitcoin system due to the loose flexibility of Bitcoin’s script (predicative processing)?

    Bitcoin has this predicate issue (since genesis) and it is necessary to perform a risk analysis about this for every proposed change.

  34. DrahtBot commented at 1:32 am on December 2, 2023: contributor

    There hasn’t been much activity lately and the patch still needs rebase. What is the status here?

    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
  35. DrahtBot commented at 0:44 am on March 1, 2024: contributor

    ⌛ There hasn’t been much activity lately and the patch still needs rebase. What is the status here?

    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
  36. maflcko commented at 7:14 am on March 1, 2024: member
    Closing for now due to inactivity since August. Leave a comment below, if you are working on this again and want it reopened.
  37. maflcko closed this on Mar 1, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me