translations: 28.x update pulled in random strings? #30897

fanquake opened this issue on September 13, 2024
  1. fanquake commented at 1:31 pm on September 13, 2024: member

    Translations were updated for 28.x in #30715.

    Looks like that update pulled in things that are not translations. i.e:

    https://github.com/bitcoin/bitcoin/blob/e43ce250c6fb65cc0c8903296c8ab228539d2204/src/qt/locale/bitcoin_gl_ES.ts#L6

    ?

  2. fanquake added this to the milestone 28.0 on Sep 13, 2024
  3. hebasto commented at 1:39 pm on September 13, 2024: member

    Translations were updated for 28.x in #30715.

    Looks like that update pulled in things that are not translations. i.e:

    https://github.com/bitcoin/bitcoin/blob/e43ce250c6fb65cc0c8903296c8ab228539d2204/src/qt/locale/bitcoin_gl_ES.ts#L6

    ?

    This is a poor / malicious translation.

    https://app.transifex.com/bitcoin/bitcoin/translate/#gl_ES/qt-translation-028x/508593963

  4. maflcko commented at 1:41 pm on September 13, 2024: member

    Aren’t LLMs capable of translation? With all the hype around them I wonder if a script can be written to check that each translation pair is a valid translation. With 4o-mini the cost should also be trivial.

    (Edit: To clarify, I don’t mean that translation should be done by the LLM, just that the validity check yes/no could be considered to be done by one, as an additional check)
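A minimal sketch of what such a yes/no validity check might look like. Everything here is an assumption for illustration: `build_prompt`, `flag_invalid`, and the `ask_model` callable are hypothetical names, and no particular LLM API is implied. The caller would supply `ask_model` as a function that sends a prompt to whichever model is chosen and returns its text answer.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (English source, translation)


def build_prompt(source: str, translation: str, language: str) -> str:
    """Build a yes/no question about one translation pair (hypothetical wording)."""
    return (
        f"Is the following a plausible {language} translation of the English text?\n"
        f"English: {source}\n"
        f"Translation: {translation}\n"
        "Answer with exactly one word: yes or no."
    )


def flag_invalid(pairs: Iterable[Pair], language: str,
                 ask_model: Callable[[str], str]) -> List[Pair]:
    """Return the pairs for which the model did not answer 'yes'.

    `ask_model` is a placeholder supplied by the caller; anything other
    than an answer starting with 'yes' is treated as a flag for review.
    """
    flagged = []
    for source, translation in pairs:
        answer = ask_model(build_prompt(source, translation, language))
        if not answer.strip().lower().startswith("yes"):
            flagged.append((source, translation))
    return flagged
```

Flagged pairs would still need human review; the model only narrows down what to look at.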

  5. pablomartin4btc commented at 4:14 pm on September 13, 2024: member

    In the meantime, is it too ugly to add a regex into update-translations.py? (or a function like the existing ones that already parse the strings and validate their format)

    import re

    # Regex patterns for malicious content and symbols (just an example).
    # Note: \uFFFD is the Unicode replacement character.
    MALICIOUS_PATTERN = re.compile(r'[\x00-\x1F\x7F-\x9F<>&\'";`\\\uFFFD]|'
                                   r'(\.\./|\.\.\\|\%2e%2e/|\%2e%2e\\|'
                                   r'\$HOME/|\%USERPROFILE\%|\%APPDATA\%|\$USER|\$PATH|'
                                   r';|&&|\|\||\||&|\\|>|--)', re.UNICODE | re.IGNORECASE)
    

    It won’t detect random non-sense translations but at least it’s a step forward while we find the 4o-mini alternative.
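As a rough illustration of how a pattern check could be run over a Qt `.ts` file, the sketch below parses the `<message>` elements and reports translations that match a pattern their source string does not. The `SUSPICIOUS` pattern and the `find_suspicious` helper are hypothetical, and the sample XML is made up; this is not how update-translations.py is actually structured.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: URLs or the Unicode replacement character.
SUSPICIOUS = re.compile(r'https?://|www\.|\ufffd', re.IGNORECASE)


def find_suspicious(ts_xml: str):
    """Return (source, translation) pairs whose translation matches
    SUSPICIOUS while the English source does not."""
    root = ET.fromstring(ts_xml)
    hits = []
    for message in root.iter('message'):
        source = message.findtext('source') or ''
        translation = message.findtext('translation') or ''
        if SUSPICIOUS.search(translation) and not SUSPICIOUS.search(source):
            hits.append((source, translation))
    return hits


# Made-up sample in the general shape of a Qt Linguist .ts file.
SAMPLE = '''<TS version="2.1" language="gl_ES">
<context>
    <name>AddressBookPage</name>
    <message>
        <source>Create a new address</source>
        <translation>Visit http://example.invalid now</translation>
    </message>
    <message>
        <source>&amp;New</source>
        <translation>&amp;Novo</translation>
    </message>
</context>
</TS>'''

for source, translation in find_suspicious(SAMPLE):
    print(f"suspicious: {source!r} -> {translation!r}")
```

Comparing against the source string avoids flagging translations that legitimately repeat a URL present in the English text.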

  6. fanquake closed this on Sep 16, 2024

  7. fanquake referenced this in commit 37679b856c on Sep 16, 2024
