scripts: attempt to fix some translations in update-translations.py #16644

pull GChuf wants to merge 1 commits into bitcoin:master from GChuf:patch-1 changing 1 files +73 −13
  1. GChuf commented at 4:19 PM on August 18, 2019: contributor

    The script now tries to fix some of the translations which would otherwise be omitted - that is translations of strings containing format specifiers. Also added a counter for some statistics about translations fixed and languages removed due to minimum translated strings requirement. In addition, minimum required strings for translations to be accepted bumped from 10 to 100.

  2. GChuf commented at 4:21 PM on August 18, 2019: contributor

    If code style needs to be changed or anything like that, let me know. But the script does the job. See the results below:

    Screenshot_63

    p.s. just found I missed something after looking at the screenshot. Updated the code.

  3. DrahtBot added the label Scripts and tools on Aug 18, 2019
  4. in contrib/devtools/update-translations.py:182 in e81ab00396 outdated
     178 | @@ -169,30 +179,63 @@ def postprocess_translations(reduce_diff_hacks=False):
     179 |                          continue
     180 |                      errors = []
     181 |                      valid = check_format_specifiers(source, translation, errors, numerus) and not contains_bitcoin_addr(translation, errors)
     182 | -
     183 | +					
    


    emilengler commented at 7:11 PM on August 18, 2019:

    Please remove this whitespace


    GChuf commented at 8:38 PM on August 18, 2019:

    thanks, removed

  5. in contrib/devtools/update-translations.py:15 in 2069ecd48e outdated
      13 |  - post-process them into valid and committable format
      14 |    - remove invalid control characters
      15 |    - remove location tags (makes diffs less noisy)
      16 | -
      17 | +  - attempt to fix some translations
      18 | +  
    


    fanquake commented at 12:53 AM on August 19, 2019:

    Please check and fix any issues present in the linter output, such as trailing white space here.

  6. in contrib/devtools/update-translations.py:33 in 2069ecd48e outdated
      29 | @@ -29,7 +30,7 @@
      30 |  # Directory with locale files
      31 |  LOCALE_DIR = 'src/qt/locale'
      32 |  # Minimum number of messages for translation to be considered at all
      33 | -MIN_NUM_MESSAGES = 10
      34 | +MIN_NUM_MESSAGES = 100
    


    fanquake commented at 12:59 AM on August 19, 2019:

    Why 100?


    GChuf commented at 10:37 AM on August 19, 2019:

    See my comment below

  7. in contrib/devtools/update-translations.py:143 in 2069ecd48e outdated
     135 | @@ -131,6 +136,11 @@ def contains_bitcoin_addr(text, errors):
     136 |      return False
     137 |  
     138 |  def postprocess_translations(reduce_diff_hacks=False):
     139 | +    global source_f
     140 | +    global tf #translations fixed
     141 | +    global lr #languages removed
     142 | +    tf = 0
     143 | +    lr=0
    


    fanquake commented at 1:01 AM on August 19, 2019:

    Please be consistent with formatting.

  8. in contrib/devtools/update-translations.py:192 in 2069ecd48e outdated
     193 | +                    if not valid:
     194 | +                       if translation_node.text != None: # only attempt to fix if translation is not a NoneType object
     195 | +                           translation_node.text = fix_string(translation_node.text) #fix most common mistakes by replacing symbols
     196 | +                           translation_f = split_format_specifiers(find_format_specifiers(translation_node.text))
     197 | +                           if source_f == translation_f: # check if translation is acceptable after fixing it. if not, try to fix more.
     198 | +                               if translation_node.text[0] != "%" and translation_node.text.find(' %') == -1 and translation_node.text.find('(%') == -1 and translation_node.text.find('\'%') == -1 and translation_node.text.find('\"%') == -1: #if the translation seems okay, add spaces before % if needed - only if certain strings are not found and if '%' is not the first symbol in a string.
    


    fanquake commented at 1:02 AM on August 19, 2019:

    Instead of adding a very long comment at the end of an already long line of code, you could you bring the it back over above the this line and break it up over multiple lines if needed.

  9. fanquake commented at 1:11 AM on August 19, 2019: member

    Left some general comments. Please squash and push a single commit after addressing them all locally. @laanwj is probably best suited to review this, as he'll know the rationale behind the original MIN_NUM_MESSAGES values etc.

    I had been thinking this script could be moved to the maintainer tools repo, as it basically only receives random Python cleanups here.

  10. fanquake requested review from laanwj on Aug 19, 2019
  11. laanwj commented at 9:05 AM on August 19, 2019: member

    Thanks for trying to improve, however

    • I don't think MIN_NUM_MESSAGES should be increased. This threshold is there to ignore useless translations that have only three or so messages. Translating 10 often-seen messages can already be useful. This is even more true for translations that have a language code and a country for a certain dialect (as they overlay an existing translation).

    • The place to make more comprehensive fixes to messages is not here. This amounts to second-guessing translators. Please make the fixes on Transifex itself where possible, so people learn how to translate the messages.

    The counters could be useful, I suppose.

    I had been thinking this script could be moved to the maintainer tools repo, as it basically only receives random Python cleanups here.

    Agree.

  12. in contrib/devtools/update-translations.py:4 in 2069ecd48e outdated
       0 | @@ -1,17 +1,18 @@
       1 |  #!/usr/bin/env python3
       2 |  # Copyright (c) 2014 Wladimir J. van der Laan
       3 |  # Distributed under the MIT software license, see the accompanying
       4 | -# file COPYING or http://www.opensource.org/licenses/mit-license.php.
       5 | +# file COPYING or http://www.opensource.org/licenses/mit-license.php
    


    laanwj commented at 9:28 AM on August 19, 2019:

    Please do not make a random change to the license header here. Keep it the same as in every single other file.


    GChuf commented at 10:12 AM on August 19, 2019:

    Sorry for not mentioning my reasoning - working in ubuntu text editor, the link was only displayed properly (italics) when the dot was removed. But I see now other headers all contain dots at the end. Will add the dot back.

  13. GChuf commented at 10:36 AM on August 19, 2019: contributor

    @laanwj About the minimum number of translations, I agree 10 often seen messages can be useful. Whether or not these translations are the ones often seen or not (my thinking was that they're not), it would be best to keep the old requirement of 10 translations anyway.

    I didn't understand your second point - what do you mean by comprehensive fixes and second-guessing? The script only fixes strings with wrong format specifiers (but only in the strings that would otherwise be left out by the script itself!). The translation remains the same, all it does is, for example, change 1% to %1. So out of ~250 translations this script removes, it now fixes ~180 of them, without messing with the actual translations.

    However, I did fix some messages on transifex while doing this, but fixing format specifiers is a tedious job and people will continue to "translate" them poorly even after good examples are given. In arabic language, for example, where they write from right to left, there are a lot of occurrences where the format specifier is inverted. Typos also happen constantly.

    And about transifex and translations, I'd like to reference this issue here: #16591 There are a lot of duplicate translations, I believe some of them need to be removed. I'd like to help with that.

  14. laanwj commented at 12:12 PM on August 19, 2019: member

    The translation remains the same, all it does is, for example, change 1% to %1.

    I think that's a good aim, but what if someone means 1%, say, for a percentage? That's my fear what I mean with second-guessing. I'm often annoyed by tools that try to 'intelligently' guess what I'm trying to do and get it wrong which makes everything so much more complex.

    Although you alleviate this concern somewhat by only fixing messages that are detected as broken in the first place. So I'm not against this, just wary. The preferred place to fix messages is upstream, thanks for fixing a few there too :)

  15. GChuf commented at 12:58 PM on August 19, 2019: contributor

    I understand what you mean now, but yes, I think the overlap between broken translations with format specifiers and translations using percentages is very little (I never found any). In addition, it would have to be exactly 1% or 2%, not x%. Plus, anything that comes out fixed, is a translation that was not included up until now. I think the script works perfectly 99% of the time for the purpose it was built, but I understand the concerns.

    I intend to manually fix some or all of the translations in transifex which cant be fixed by this script, although I still don't understand why some translations are NoneType objects - I didnt manage to fix those strings in python.

  16. laanwj commented at 1:34 PM on August 19, 2019: member

    Yes, doing the work manually is time-consuming.

    Which gives me an idea, could we batch-fix these (after looking over them once to make sure we're not doing something stupid) through the Transifex API? That would avoid having to patch them at download time (with much less oversight).

  17. GChuf commented at 2:04 PM on August 19, 2019: contributor

    Squashed and addressed all comments. I don't know about transifex API, but I agree that it's a good idea if it's possible. Although I would argue patching them in the future with this script would still be useful - unless you were to batch fix all the translations every time, but I imagine this to be done on rare occasions.

  18. scripts: attempt to fix some translations in update-translations.py
    The script now tries to fix some of the translations which would otherwise be omitted - that is translations of strings containing format specifiers.
    Also added a counter for some statistics about translations fixed and languages removed due to minimum translated strings requirement.
    01de86f7e6
  19. DrahtBot commented at 4:39 AM on August 20, 2019: member

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #16665 (scripts: move update-translations.py to maintainer-tools repo by fanquake)
    • #16539 (wallet: lower -txmaxfee default from 0.1 to 0.01 BTC by Sjors)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  20. GChuf commented at 8:33 PM on August 20, 2019: contributor

    @laanwj I cleaned up the translations for 0.18 completely. The script now reports 3 mismatches total (all from fully reviewed french language).

    I was also thinking about cleaning some unnecessary locales on transifex (like moving (it_IT) to (it) and removing (it_IT) completely, but I'd need more privileges, of course.

    I am also thinking about cleaning 0.17 translations, but only if they get included in the 0.17.2, else it's pointless (also @fanquake). In the last 2 days some 0.17 translations were updated, but only a few, manually by me. They were not run through the script. So there are some improvements not included in commit for 0.17.2, but I'd like to clean everything if it gets included.

    Please let me know what you think.

  21. fanquake commented at 1:02 AM on August 21, 2019: member

    I am also thinking about cleaning 0.17 translations, but only if they get included in the 0.17.2, else it's pointless

    If you update additional translations (on Transifex) for 0.17.2 I can pull them into #16638.

  22. laanwj referenced this in commit a7c27f0941 on Aug 24, 2019
  23. fanquake referenced this in commit db67101c74 on Aug 25, 2019
  24. fanquake commented at 3:30 AM on August 25, 2019: member

    @GChuf Thanks for your work here, however the script has been moved to the maintainer tools repo.

    Can you please re-open your pull request over there?

  25. fanquake closed this on Aug 25, 2019

  26. GChuf commented at 11:06 AM on August 25, 2019: contributor

    Yes, I'll open a PR request there

  27. sidhujag referenced this in commit 2ae55d4ae7 on Aug 25, 2019
  28. GChuf deleted the branch on Aug 26, 2019
  29. ShengguangXiao referenced this in commit cca66300e8 on Aug 28, 2020
  30. DrahtBot locked this on Dec 16, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-26 09:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me