Word Delimiter Filter arguments:

generateWordParts (integer, default 1): If non-zero, splits words at delimiters. For example: "CamelCase", "hot-spot" → "Camel", "Case", "hot", "spot"
generateNumberParts (integer, default 1): If non-zero, splits numeric strings at delimiters: "1947-32" → "1947", "32"
splitOnCaseChange (integer, default 1): If 0, words are not split on camel-case changes: "BugBlaster-XL" → "BugBlaster", "XL". Example 1 below illustrates the default (non-zero) splitting behavior.
splitOnNumerics (integer, default 1): If 0, words are not split on transitions from alpha to numeric: "FemBot3000" → "Fem", "Bot3000"
catenateWords (integer, default 0): If non-zero, maximal runs of word parts will be joined: "hot-spot-sensor's" → "hotspotsensor"
catenateNumbers (integer, default 0): If non-zero, maximal runs of number parts will be joined: "1947-32" → "194732"
catenateAll (0/1, default 0): If non-zero, runs of word and number parts will be joined: "Zap-Master-9000" → "ZapMaster9000"
preserveOriginal (integer, default 0): If non-zero, the original token is preserved: "Zap-Master-9000" → "Zap-Master-9000", "Zap", "Master", "9000"
protected (optional): The pathname of a file that contains a list of protected words that should be passed through without splitting.
stemEnglishPossessive (integer, default 1): If 1, strips the possessive 's from each subword.

Synonym Filter arguments:

synonyms: The synonyms file. Each line takes one of two forms:
A comma-separated list of words. If the token matches any of the words, then all the words in the list are substituted, which will include the original token.
Two comma-separated lists of words with the symbol "=>" between them. If the token matches any word on the left, then the list on the right is substituted. The original token will not be included unless it is also in the list on the right.
ignoreCase (optional; default: false): If true, synonyms will be matched case-insensitively.
expand (optional; default: true): If true, a synonym will be expanded to all equivalent synonyms. If false, all equivalent synonyms will be reduced to the first in the list.
format (optional; default: solr): Controls how the synonyms will be parsed. The short names solr (for SolrSynonymParser) and wordnet (for WordnetSynonymParser) are supported. Alternatively, you may supply the name of your own SynonymMap.Builder subclass.
tokenizerFactory (optional; default: WhitespaceTokenizerFactory): The name of the tokenizer factory to use when parsing the synonyms file. Arguments with the name prefix "tokenizerFactory." will be supplied as init params to the specified tokenizer factory.
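To make the word delimiter options above more concrete, here is a minimal pure-Python sketch of the splitting and joining rules. It is not Solr's implementation, which is Java code in Lucene. The function `word_delimit` and its snake_case keyword arguments are hypothetical stand-ins for the arguments listed above. The sketch also makes several simplifying assumptions: generateWordParts and generateNumberParts are always on, only ASCII letters and digits count as word characters, only a straight apostrophe is recognized for the possessive, and `protected` is not modeled.

```python
import re
from itertools import groupby
from typing import List


def _split_run(run: str, split_on_case_change: bool, split_on_numerics: bool) -> List[str]:
    """Split one delimiter-free run at lower->upper and letter<->digit transitions."""
    parts: List[str] = []
    current = run[0]
    for prev, ch in zip(run, run[1:]):
        case_break = split_on_case_change and prev.islower() and ch.isupper()
        numeric_break = split_on_numerics and prev.isalpha() != ch.isalpha()
        if case_break or numeric_break:
            parts.append(current)
            current = ch
        else:
            current += ch
    parts.append(current)
    return parts


def word_delimit(token: str, *, split_on_case_change: bool = True,
                 split_on_numerics: bool = True, catenate_words: bool = False,
                 catenate_numbers: bool = False, catenate_all: bool = False,
                 preserve_original: bool = False,
                 stem_english_possessive: bool = True) -> List[str]:
    """Hypothetical, simplified emulation of the word delimiter arguments."""
    text = token
    if stem_english_possessive:
        # Drop 's at the end of each subword (straight apostrophe only in this sketch).
        text = re.sub(r"'s(?![A-Za-z0-9])", "", text)
    # Every non-alphanumeric character is treated as a delimiter.
    runs = [r for r in re.split(r"[^A-Za-z0-9]+", text) if r]
    parts: List[str] = []
    for run in runs:
        parts.extend(_split_run(run, split_on_case_change, split_on_numerics))

    out = [token] if preserve_original else []
    out.extend(parts)
    # Join maximal runs of adjacent word parts or number parts (first char decides type).
    for is_number, group in groupby(parts, key=lambda p: p[0].isdigit()):
        group = list(group)
        wanted = catenate_numbers if is_number else catenate_words
        if wanted and len(group) > 1:
            out.append("".join(group))
    if catenate_all and len(parts) > 1:
        out.append("".join(parts))
    return out
```

For example, `word_delimit("1947-32", catenate_numbers=True)` yields "1947", "32", "194732", which matches the catenateNumbers example. The real filter also assigns token positions and offsets to the output tokens. This sketch returns only a flat list of strings.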
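The two synonym line forms, together with expand and ignoreCase, can be sketched as a tiny parser plus a lookup step. The code below is a hypothetical illustration and not Solr's SolrSynonymParser. The names `parse_synonyms` and `apply_synonyms` and the sample rules are made up. The sketch handles only single-word synonyms and lowercases only the lookup key when `ignore_case` is set. It also assumes that blank lines and lines starting with "#" are skipped.

```python
from typing import Dict, List


def _words(csv: str) -> List[str]:
    """Split a comma-separated list and trim whitespace around each word."""
    return [w.strip() for w in csv.split(",") if w.strip()]


def parse_synonyms(lines: List[str], expand: bool = True,
                   ignore_case: bool = False) -> Dict[str, List[str]]:
    """Build a word -> replacement-list map from simplified Solr-style lines."""
    mapping: Dict[str, List[str]] = {}

    def add(word: str, targets: List[str]) -> None:
        key = word.lower() if ignore_case else word
        current = mapping.setdefault(key, [])
        for t in targets:
            if t not in current:  # keep order, drop duplicates
                current.append(t)

    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumed: blank lines and '#' comments are ignored
        if "=>" in line:
            # Explicit mapping: each left-hand word is replaced by the right-hand list.
            left, right = line.split("=>", 1)
            for word in _words(left):
                add(word, _words(right))
        else:
            # Equivalent synonyms: expand to all of them, or reduce to the first one.
            words = _words(line)
            targets = words if expand else words[:1]
            for word in words:
                add(word, targets)
    return mapping


def apply_synonyms(tokens: List[str], mapping: Dict[str, List[str]],
                   ignore_case: bool = False) -> List[str]:
    """Replace each token with its synonyms; unmatched tokens pass through unchanged."""
    out: List[str] = []
    for tok in tokens:
        key = tok.lower() if ignore_case else tok
        out.extend(mapping.get(key, [tok]))
    return out
```

With the rules `["couch, sofa, divan", "teh => the"]`, the token "sofa" becomes "couch", "sofa", "divan". The original token is kept because it appears in the equivalence list. The token "teh" becomes only "the", because an explicit mapping drops the original token unless it appears on the right.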