Koha Test Wiki MW Canasta on Koha Portainer

Test major Koha Wiki changes or bug fixes here without fear of breaking the production wiki.

For the current Koha Wiki, visit https://wiki.koha-community.org .

ICU Chains Library

ICU Chains Configuration and Customization

After completing the steps in ICU_chains_configuration, you may need to modify words-icu.xml or phrases-icu.xml to suit the grammar of the language being searched.

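For reference, see:

  • http://www.indexdata.com/zebra/doc/icuchain-files.html (Zebra ICU chain files)
  • http://www.indexdata.com/yaz/doc/yaz-icu.html (the yaz-icu utility)
  • http://userguide.icu-project.org/transforms/general/rules (ICU transform rule syntax)
  • http://www.unicode.org/cldr/charts/latest/transforms/index.html (CLDR transform charts)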
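
Most of the chains on this page share the same skeleton, so it may help to see it once with each step annotated. The following is a sketch only: the comments reflect a reading of the YAZ and ICU documentation linked above, and the locale value "en" is a placeholder that is not taken from any chain on this page.

```xml
<icu_chain locale="en"><!-- placeholder locale, replace with your own -->
  <!-- Replace each apostrophe with a space. -->
  <transliterate rule="\'>\ "/>
  <!-- Delete a hyphen that directly follows a number character. -->
  <transliterate rule="[:Number:] { '-' > '' "/>
  <!-- Remove control characters. -->
  <transform rule="[:Control:] Any-Remove"/>
  <!-- Split the input into tokens at ICU line-break boundaries. -->
  <tokenize rule="l"/>
  <!-- Remove whitespace and punctuation from each token. -->
  <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
  <!-- Decompose, strip combining marks, then recompose. -->
  <transform rule="NFD"/>
  <transform rule="[:Nonspacing Mark:] Remove"/>
  <transform rule="NFC"/>
  <!-- Language-specific transliterate rules go here,
       as in the Arabic and Kurdish chains below. -->
  <!-- Mark the term at this point as the form returned for display. -->
  <display/>
  <!-- Lowercase the term. -->
  <casemap rule="l"/>
</icu_chain>
```

Because the steps run in order, where a language-specific rule is placed decides whether it sees the original text or text that has already been normalized.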

Languages

Arabic

  • Developer: Yasserkad
  • File: words-icu.xml
  • Language: Arabic
  • Locale: ar
  • Status: Complete
  <icu_chain locale="ar">
    <transliterate rule="\'>\ "/>
    <transliterate rule="[:Number:] { '-' > '' "/>
    <transform rule="[:Control:] Any-Remove"/>
    <tokenize rule="l"/>
    <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
    <transform rule="NFD"/>
    <transform rule="[:Nonspacing Mark:] Remove"/>
    <transform rule="NFC"/>
    <transliterate rule="{ الا > ا "/>
    <transliterate rule="{ الأ > أ "/>
    <transliterate rule="{ الإ > إ "/>
    <transliterate rule="{ الآ > آ "/>
    <transliterate rule="{ الب > ب "/>
    <transliterate rule="{ الت > ت "/>
    <transliterate rule="{ الث > ث "/>
    <transliterate rule="{ الج > ج "/>
    <transliterate rule="{ الح > ح "/>
    <transliterate rule="{ الخ > خ "/>
    <transliterate rule="{ الد > د "/>
    <transliterate rule="{ الذ > ذ "/>
    <transliterate rule="{ الر > ر "/>
    <transliterate rule="{ الز > ز "/>
    <transliterate rule="{ الس > س "/>
    <transliterate rule="{ الش > ش "/>
    <transliterate rule="{ الص > ص "/>
    <transliterate rule="{ الض > ض "/>
    <transliterate rule="{ الط > ط "/>
    <transliterate rule="{ الظ > ظ "/>
    <transliterate rule="{ الع > ع "/>
    <transliterate rule="{ الغ > غ "/>
    <transliterate rule="{ الف > ف "/>
    <transliterate rule="{ الق > ق "/>
    <transliterate rule="{ الك > ك "/>
    <transliterate rule="{ الل > ل "/>
    <transliterate rule="{ الم > م "/>
    <transliterate rule="{ الن > ن "/>
    <transliterate rule="{ اله > ه "/>
    <transliterate rule="{ الو > و "/>
    <transliterate rule="{ الي > ي "/>
    <display/>
    <casemap rule="l"/>
  </icu_chain>

Chinese / zh_TW

    <icu_chain locale="zh_TW.UTF-8">
      <transliterate rule="\'>\ "/>
      <transliterate rule="[:Number:] { '-' > '' "/>
      <transform rule="[:Control:] Any-Remove"/>
      <tokenize rule="l"/>
      <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
      <transform rule="NFD"/>
      <transform rule="[:Nonspacing Mark:] Remove"/>
      <transform rule="NFC"/>
      <display/>
      <casemap rule="l"/>
    </icu_chain>

Kurdish (کوردی)

  • Developer: D.Roshani
  • File: words-icu.xml
  • Language: Kurdish (کوردی)
  • Locale: ku
  • Status: Untested
  <icu_chain locale="ku">
    <transliterate rule="\'>\ "/>
    <transliterate rule="[:Number:] { '-' > '' "/>
    <transform rule="[:Control:] Any-Remove"/>
    <tokenize rule="l"/>
    <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
    <transform rule="NFD"/>
    <transform rule="[:Nonspacing Mark:] Remove"/>
    <transform rule="NFC"/>
    <transliterate rule="{ ئ > ئـ "/>
    <transliterate rule="{ ئا > ا "/>
    <transliterate rule="{ بێ > ب "/>
    <transliterate rule="{ پێ > پ "/>
    <transliterate rule="{ تێ > ت "/>
    <transliterate rule="{ جێ > ج "/>
    <transliterate rule="{ چێ > چ "/>
    <transliterate rule="{ حێ > ح "/>
    <transliterate rule="{ خێ > خ "/>
    <transliterate rule="{ دال > د "/>
    <transliterate rule="{ رێ > ر "/>
    <transliterate rule="{ ڕێ > ڕ "/>
    <transliterate rule="{ زێ > ز "/>
    <transliterate rule="{ ژێ > ژ "/>
    <transliterate rule="{ سێ > س "/>
    <transliterate rule="{ شێ > ش "/>
    <transliterate rule="{ عین > ع "/>
    <transliterate rule="{ غین > غ "/>
    <transliterate rule="{ فێ > ف "/>
    <transliterate rule="{ ڤێ > ڤ "/>
    <transliterate rule="{ قێ > ق "/>
    <transliterate rule="{ کێ > ک "/>
    <transliterate rule="{ گێ > گ "/>
    <transliterate rule="{ لێ > ل "/>
    <transliterate rule="{ ڵێ > ڵ "/>
    <transliterate rule="{ لام > م "/>
    <transliterate rule="{ نوون > ن "/>
    <transliterate rule="{ هە > ھ "/>
    <transliterate rule="{ ئه > ە "/>
    <transliterate rule="{ ئو > و "/>
    <transliterate rule="{ ئۆ > ۆ "/>
    <transliterate rule="{ ئوو > وو "/>
    <transliterate rule="{ ئی > ی "/>
    <transliterate rule="{ ئێ > ێ "/>
    <display/>
    <casemap rule="l"/>
  </icu_chain>

Polish

  • Developer: Fsomers
  • File: words-icu.xml
  • Language: Polish
  • Locale: pl
  • Status: Incomplete
  <transliterate rule="{ ą > a "/>
  <transliterate rule="{ Ą > a "/>
  <transliterate rule="{ ć > c "/>
  <transliterate rule="{ Ć > c "/>
  <transliterate rule="{ ę > e "/>
  <transliterate rule="{ Ę > e "/>
  <transliterate rule="{ ł > l "/>
  <transliterate rule="{ Ł > l "/>
  <transliterate rule="{ ń > n "/>
  <transliterate rule="{ Ń > n "/>
  <transliterate rule="{ ó > o "/>
  <transliterate rule="{ Ó > o "/>
  <transliterate rule="{ ś > s "/>
  <transliterate rule="{ Ś > s "/>
  <transliterate rule="{ ź > z "/>
  <transliterate rule="{ Ź > z "/>
  <transliterate rule="{ ż > z "/>
  <transliterate rule="{ Ż > z "/>

A complete, tested <icu_chain locale="pl"> block is still needed here.
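
As a starting point, the rules above could be slotted into the same skeleton that the Arabic and Kurdish chains use. This is an untested sketch, not a confirmed configuration: placing the Polish rules after the NFC step is an assumption copied from those chains. In that position the NFD and Nonspacing Mark steps should already have reduced every letter in the list except ł and Ł, which have no canonical decomposition, so most of the rules may turn out to be redundant.

```xml
<icu_chain locale="pl">
  <transliterate rule="\'>\ "/>
  <transliterate rule="[:Number:] { '-' > '' "/>
  <transform rule="[:Control:] Any-Remove"/>
  <tokenize rule="l"/>
  <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
  <transform rule="NFD"/>
  <transform rule="[:Nonspacing Mark:] Remove"/>
  <transform rule="NFC"/>
  <!-- Polish rules from the list above; placement here is an assumption. -->
  <transliterate rule="{ ą > a "/>
  <transliterate rule="{ Ą > a "/>
  <transliterate rule="{ ć > c "/>
  <transliterate rule="{ Ć > c "/>
  <transliterate rule="{ ę > e "/>
  <transliterate rule="{ Ę > e "/>
  <transliterate rule="{ ł > l "/>
  <transliterate rule="{ Ł > l "/>
  <transliterate rule="{ ń > n "/>
  <transliterate rule="{ Ń > n "/>
  <transliterate rule="{ ó > o "/>
  <transliterate rule="{ Ó > o "/>
  <transliterate rule="{ ś > s "/>
  <transliterate rule="{ Ś > s "/>
  <transliterate rule="{ ź > z "/>
  <transliterate rule="{ Ź > z "/>
  <transliterate rule="{ ż > z "/>
  <transliterate rule="{ Ż > z "/>
  <display/>
  <casemap rule="l"/>
</icu_chain>
```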

Swedish

  • Developer: Gaetan Boisson, Fridolin Somers
  • File: words-icu.xml
  • Language: Swedish
  • Locale: sv-SE
  • Status: Untested
  • Notes:
    <icu_chain locale="sv-SE">
      <transform rule="[^åäöÅÄÖ] NFD"/><!-- do not strip diacritics from these characters -->
    </icu_chain>
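
To show where the filtered NFD step might sit, here is an untested sketch that swaps it into the skeleton used by the other chains on this page. Everything except the filtered transform is an assumption borrowed from those chains. The filter only protects precomposed å, ä and ö, so input that arrives already decomposed would likely still lose its marks in the following step.

```xml
<icu_chain locale="sv-SE">
  <transliterate rule="\'>\ "/>
  <transliterate rule="[:Number:] { '-' > '' "/>
  <transform rule="[:Control:] Any-Remove"/>
  <tokenize rule="l"/>
  <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
  <!-- Decompose everything except å, ä, ö and their capitals. -->
  <transform rule="[^åäöÅÄÖ] NFD"/>
  <!-- The protected letters are still precomposed, so they survive this step. -->
  <transform rule="[:Nonspacing Mark:] Remove"/>
  <transform rule="NFC"/>
  <display/>
  <casemap rule="l"/>
</icu_chain>
```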

Thai

    <icu_chain locale="th">
      <transliterate rule="\'>\ "/>
      <transliterate rule="[:Number:] { '-' > '' "/>
      <transform rule="[:Control:] Any-Remove"/>
      <tokenize rule="l"/>
      <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
      <transform rule="NFD"/>
      <transform rule="[:Nonspacing Mark:] Remove"/>
      <transform rule="NFC"/>
      <display/>
      <casemap rule="l"/>
    </icu_chain>