All notable changes to this project will be documented in this file. This project adheres to Semantic Versioning. It follows some conventions.
- fixed analysis of ewts string starting with space returning no token
- added PaBaFilter and use it in the constructor
- removed bo from the list of stop words to avoid side effects with the PaBaFilter
- fixed fetching of jar file
- fixed offsets of ewts conversion
- fixed resource inclusion
- possibility to index DTS and ALA-LC encodings
- fixed offsets of ewts conversion
- fixed offsets of ewts conversion
- Tibetan lexicon now included in the .jar file
- maven option `-DincludeDeps=true` to include `ewts-converter`
- possibility to specify a stop word list file in the constructor
- `fromEwts` replaced by `inputMode` in constructor, see README.md, allowing DTS and ALA-LC Transliteration schemas
- fixed constructor
- possibility to index EWTS text
- Maven packaging
- TibAffixedFilter: handle affixed འིའོ, འམ and འང
- TibAffixedFilter: when removing an affixed particle, keep the suffix འ if it was in the original syllable (ex: དགའི -> དགའ)
- all: adaptation to Lucene 6.4.1
- TibSyllableTokenizer: consider characters in range `Ux0F84`-`Ux0F8F` to be part of the syllable