ICU Chains Library
ICU Chains Configuration and Customization
After completing the steps in ICU_chains_configuration, it may be necessary to modify words-icu.xml or phrases-icu.xml to suit the grammar of the language being searched.
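See the Zebra documentation on ICU chain files (http://www.indexdata.com/zebra/doc/icuchain-files.html), the YAZ yaz-icu documentation (http://www.indexdata.com/yaz/doc/yaz-icu.html), the ICU guide to transform rules (http://userguide.icu-project.org/transforms/general/rules) and the CLDR transform charts (http://www.unicode.org/cldr/charts/latest/transforms/index.html).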
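Most of the chains on this page share the same baseline steps and differ only in the language-specific rules placed before <display/>. The following is an annotated sketch of that baseline. The steps are copied from the chains listed under Languages; the locale "en" is only a placeholder, and the comments reflect a reading of the YAZ and ICU documentation rather than tested behaviour.

```xml
<icu_chain locale="en"><!-- placeholder locale, replace with your own -->
  <!-- replace apostrophes with spaces -->
  <transliterate rule="\'>\ "/>
  <!-- delete a hyphen that directly follows a number character -->
  <transliterate rule="[:Number:] { '-' > '' "/>
  <!-- remove control characters -->
  <transform rule="[:Control:] Any-Remove"/>
  <!-- split the input into tokens (rule "l" is assumed to mean line-break boundaries) -->
  <tokenize rule="l"/>
  <!-- strip whitespace and punctuation from each token -->
  <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
  <!-- decompose, drop combining marks, then recompose -->
  <transform rule="NFD"/>
  <transform rule="[:Nonspacing Mark:] Remove"/>
  <transform rule="NFC"/>
  <!-- language-specific transliterate rules would go here -->
  <!-- the term as it stands at this point is kept for display -->
  <display/>
  <!-- lowercase the term for indexing -->
  <casemap rule="l"/>
</icu_chain>
```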
Languages
Arabic
- Developer: Yasserkad
- File: words-icu.xml
- Language: Arabic
- Locale: ar
- Status: Complete
<icu_chain locale="ar">
<transliterate rule="\'>\ "/>
<transliterate rule="[:Number:] { '-' > '' "/>
<transform rule="[:Control:] Any-Remove"/>
<tokenize rule="l"/>
<transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
<transform rule="NFD"/>
<transform rule="[:Nonspacing Mark:] Remove"/>
<transform rule="NFC"/>
<transliterate rule="{ الا > ا "/>
<transliterate rule="{ الأ > أ "/>
<transliterate rule="{ الإ > إ "/>
<transliterate rule="{ الآ > آ "/>
<transliterate rule="{ الب > ب "/>
<transliterate rule="{ الت > ت "/>
<transliterate rule="{ الث > ث "/>
<transliterate rule="{ الج > ج "/>
<transliterate rule="{ الح > ح "/>
<transliterate rule="{ الخ > خ "/>
<transliterate rule="{ الد > د "/>
<transliterate rule="{ الذ > ذ "/>
<transliterate rule="{ الر > ر "/>
<transliterate rule="{ الز > ز "/>
<transliterate rule="{ الس > س "/>
<transliterate rule="{ الش > ش "/>
<transliterate rule="{ الص > ص "/>
<transliterate rule="{ الض > ض "/>
<transliterate rule="{ الط > ط "/>
<transliterate rule="{ الظ > ظ "/>
<transliterate rule="{ الع > ع "/>
<transliterate rule="{ الغ > غ "/>
<transliterate rule="{ الف > ف "/>
<transliterate rule="{ الق > ق "/>
<transliterate rule="{ الك > ك "/>
<transliterate rule="{ الل > ل "/>
<transliterate rule="{ الم > م "/>
<transliterate rule="{ الن > ن "/>
<transliterate rule="{ اله > ه "/>
<transliterate rule="{ الو > و "/>
<transliterate rule="{ الي > ي "/>
<display/>
<casemap rule="l"/>
</icu_chain>
Chinese / zh_TW
- Developer: Thomas -- https://groups.google.com/forum/#!msg/kohataiwan/BlGak5iVvgE/u3-37wepdmYJ
- File: words-icu.xml
- Language: Chinese / zh_TW
- Locale: zh_TW
- Status: Untested
<icu_chain locale="zh_TW.UTF-8">
<transliterate rule="\'>\ "/>
<transliterate rule="[:Number:] { '-' > '' "/>
<transform rule="[:Control:] Any-Remove"/>
<tokenize rule="l"/>
<transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
<transform rule="NFD"/>
<transform rule="[:Nonspacing Mark:] Remove"/>
<transform rule="NFC"/>
<display/>
<casemap rule="l"/>
</icu_chain>
Kurdish (کوردی)
- Developer: D.Roshani
- File: words-icu.xml
- Language: Kurdish (کوردی)
- Locale: ku
- Status: Untested
<icu_chain locale="ku">
<transliterate rule="\'>\ "/>
<transliterate rule="[:Number:] { '-' > '' "/>
<transform rule="[:Control:] Any-Remove"/>
<tokenize rule="l"/>
<transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
<transform rule="NFD"/>
<transform rule="[:Nonspacing Mark:] Remove"/>
<transform rule="NFC"/>
<transliterate rule="{ ئ > ئـ "/>
<transliterate rule="{ ئا > ا "/>
<transliterate rule="{ بێ > ب "/>
<transliterate rule="{ پێ > پ "/>
<transliterate rule="{ تێ > ت "/>
<transliterate rule="{ جێ > ج "/>
<transliterate rule="{ چێ > چ "/>
<transliterate rule="{ حێ > ح "/>
<transliterate rule="{ خێ > خ "/>
<transliterate rule="{ دال > د "/>
<transliterate rule="{ رێ > ر "/>
<transliterate rule="{ ڕێ > ڕ "/>
<transliterate rule="{ زێ > ز "/>
<transliterate rule="{ ژێ > ژ "/>
<transliterate rule="{ سێ > س "/>
<transliterate rule="{ شێ > ش "/>
<transliterate rule="{ عین > ع "/>
<transliterate rule="{ غین > غ "/>
<transliterate rule="{ فێ > ف "/>
<transliterate rule="{ ڤێ > ڤ "/>
<transliterate rule="{ قێ > ق "/>
<transliterate rule="{ کێ > ک "/>
<transliterate rule="{ گێ > گ "/>
<transliterate rule="{ لێ > ل "/>
<transliterate rule="{ ڵێ > ڵ "/>
<transliterate rule="{ لام > م "/>
<transliterate rule="{ نوون > ن "/>
<transliterate rule="{ هە > ھ "/>
<transliterate rule="{ ئه > ە "/>
<transliterate rule="{ ئو > و "/>
<transliterate rule="{ ئۆ > ۆ "/>
<transliterate rule="{ ئوو > وو "/>
<transliterate rule="{ ئی > ی "/>
<transliterate rule="{ ئێ > ێ "/>
<display/>
<casemap rule="l"/>
</icu_chain>
Polish
- Developer: Fsomers
- File: words-icu.xml
- Language: Polish
- Locale: pl
- Status: Incomplete
<transliterate rule="{ ą > a "/>
<transliterate rule="{ Ą > a "/>
<transliterate rule="{ ć > c "/>
<transliterate rule="{ Ć > c "/>
<transliterate rule="{ ę > e "/>
<transliterate rule="{ Ę > e "/>
<transliterate rule="{ ł > l "/>
<transliterate rule="{ Ł > l "/>
<transliterate rule="{ ń > n "/>
<transliterate rule="{ Ń > n "/>
<transliterate rule="{ ó > o "/>
<transliterate rule="{ Ó > o "/>
<transliterate rule="{ ś > s "/>
<transliterate rule="{ Ś > s "/>
<transliterate rule="{ ź > z "/>
<transliterate rule="{ Ź > z "/>
<transliterate rule="{ ż > z "/>
<transliterate rule="{ Ż > z "/>
Note: only the transliteration rules are listed so far. A complete <icu_chain locale="pl"> definition is still needed here.
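As a starting point, the rules above could be wrapped in a full chain. The following is an untested sketch, not a verified configuration: it assumes the same baseline steps and the same rule placement (after the NFC step) as the Arabic and Kurdish chains on this page. With that placement, the NFD and Nonspacing Mark steps have probably already reduced every letter except ł and Ł, which have no canonical decomposition, so most of the explicit rules would be redundant but harmless.

```xml
<icu_chain locale="pl">
  <!-- baseline steps, assumed to match the other chains on this page -->
  <transliterate rule="\'>\ "/>
  <transliterate rule="[:Number:] { '-' > '' "/>
  <transform rule="[:Control:] Any-Remove"/>
  <tokenize rule="l"/>
  <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
  <transform rule="NFD"/>
  <transform rule="[:Nonspacing Mark:] Remove"/>
  <transform rule="NFC"/>
  <!-- Polish rules from this section, unchanged -->
  <transliterate rule="{ ą > a "/>
  <transliterate rule="{ Ą > a "/>
  <transliterate rule="{ ć > c "/>
  <transliterate rule="{ Ć > c "/>
  <transliterate rule="{ ę > e "/>
  <transliterate rule="{ Ę > e "/>
  <transliterate rule="{ ł > l "/>
  <transliterate rule="{ Ł > l "/>
  <transliterate rule="{ ń > n "/>
  <transliterate rule="{ Ń > n "/>
  <transliterate rule="{ ó > o "/>
  <transliterate rule="{ Ó > o "/>
  <transliterate rule="{ ś > s "/>
  <transliterate rule="{ Ś > s "/>
  <transliterate rule="{ ź > z "/>
  <transliterate rule="{ Ź > z "/>
  <transliterate rule="{ ż > z "/>
  <transliterate rule="{ Ż > z "/>
  <display/>
  <casemap rule="l"/>
</icu_chain>
```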
Swedish
- Developer: Gaetan Boisson, Fridolin Somers
- File: words-icu.xml
- Language: Swedish
- Locale: sv-SE
- Status: Untested
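- Notes: å, ä and ö (in both cases) are excluded from NFD decomposition so that they keep their diacritics.
<icu_chain locale="sv-SE">
<transform rule="[^åäöÅÄÖ] NFD"/><!-- do not strip diacritics from these characters -->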
</icu_chain>
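The filtered NFD step only has an effect when it is followed by a step that removes combining marks. The following is an untested sketch of how it might fit into a full chain, assuming the same baseline steps as the other chains on this page, with the filtered NFD replacing the plain one. Input that arrives already decomposed (for example a followed by a combining ring) would not be protected by this filter.

```xml
<icu_chain locale="sv-SE">
  <!-- baseline steps, assumed to match the other chains on this page -->
  <transliterate rule="\'>\ "/>
  <transliterate rule="[:Number:] { '-' > '' "/>
  <transform rule="[:Control:] Any-Remove"/>
  <tokenize rule="l"/>
  <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
  <!-- decompose everything except å, ä, ö so they keep their diacritics -->
  <transform rule="[^åäöÅÄÖ] NFD"/>
  <!-- drop the combining marks of all characters that were decomposed -->
  <transform rule="[:Nonspacing Mark:] Remove"/>
  <transform rule="NFC"/>
  <display/>
  <casemap rule="l"/>
</icu_chain>
```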
Thai
- Developer: Ajahn Ratanawanno -- http://lists.indexdata.dk/pipermail/zebralist/2015-August/002630.html
- File: words-icu.xml
- Language: Thai
- Locale: th
- Status: Untested
- Notes: According to the mailing list thread linked above, "The result in searching in Thai seems to be much better but still not really satisfy me if I compare to http://koha.library.tu.ac.th/, maybe I don't have search by keyword enable like their library." This chain may therefore need further work. The administrators at http://koha.library.tu.ac.th/ may be able to share their words-icu.xml.
<icu_chain locale="th">
<transliterate rule="\'>\ "/>
<transliterate rule="[:Number:] { '-' > '' "/>
<transform rule="[:Control:] Any-Remove"/>
<tokenize rule="l"/>
<transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
<transform rule="NFD"/>
<transform rule="[:Nonspacing Mark:] Remove"/>
<transform rule="NFC"/>
<display/>
<casemap rule="l"/>
</icu_chain>