Package org.apache.lucene.search.spell
Class WordBreakSpellChecker
- java.lang.Object
  - org.apache.lucene.search.spell.WordBreakSpellChecker
public class WordBreakSpellChecker extends Object
A spell checker whose sole function is to offer suggestions by combining multiple terms into one word and/or breaking terms into multiple words.
-
-
Nested Class Summary
Nested Classes
- static class WordBreakSpellChecker.BreakSuggestionSortMethod: Determines the order in which word break suggestions are listed.
-
Field Summary
Fields
- static Term SEPARATOR_TERM: Term that can be used to prohibit adjacent terms from being combined.
-
Constructor Summary
Constructors
- WordBreakSpellChecker(): Creates a new spell checker with default configuration values.
-
Method Summary
Methods
- int getMaxChanges(): Returns the maximum number of changes to perform on the input.
- int getMaxCombineWordLength(): Returns the maximum length of a combined suggestion.
- int getMaxEvaluations(): Returns the maximum number of word combinations to evaluate.
- int getMinBreakWordLength(): Returns the minimum size of a broken word.
- int getMinSuggestionFrequency(): Returns the minimum frequency a term must have to be part of a suggestion.
- void setMaxChanges(int maxChanges): The maximum number of changes (word breaks or combinations) to make on the original term(s).
- void setMaxCombineWordLength(int maxCombineWordLength): The maximum length of a suggestion made by combining one or more original terms.
- void setMaxEvaluations(int maxEvaluations): The maximum number of word combinations to evaluate.
- void setMinBreakWordLength(int minBreakWordLength): The minimum length to break words down to.
- void setMinSuggestionFrequency(int minSuggestionFrequency): The minimum frequency a term must have to be included as part of a suggestion.
- SuggestWord[][] suggestWordBreaks(Term term, int maxSuggestions, IndexReader ir, SuggestMode suggestMode, WordBreakSpellChecker.BreakSuggestionSortMethod sortMethod): Generate suggestions by breaking the passed-in term into multiple words.
- CombineSuggestion[] suggestWordCombinations(Term[] terms, int maxSuggestions, IndexReader ir, SuggestMode suggestMode): Generate suggestions by combining one or more of the passed-in terms into single words.
-
-
-
Field Detail
-
SEPARATOR_TERM
public static final Term SEPARATOR_TERM
Term that can be used to prohibit adjacent terms from being combined
-
-
Method Detail
-
suggestWordBreaks
public SuggestWord[][] suggestWordBreaks(Term term, int maxSuggestions, IndexReader ir, SuggestMode suggestMode, WordBreakSpellChecker.BreakSuggestionSortMethod sortMethod) throws IOException
Generate suggestions by breaking the passed-in term into multiple words. The scores returned are equal to the number of word breaks needed, so a lower score is generally preferred over a higher score.
- Parameters:
suggestMode - default = SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX
sortMethod - default = WordBreakSpellChecker.BreakSuggestionSortMethod.NUM_CHANGES_THEN_MAX_FREQUENCY
- Returns:
one or more arrays of words formed by breaking up the original term
- Throws:
IOException - If there is a low-level I/O error.
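The scoring rule described above (score = number of breaks, lower preferred) can be illustrated with a small standalone sketch. This is not Lucene's implementation: it assumes a plain `Map` of term frequencies in place of an `IndexReader`, ignores `SuggestMode` and `maxEvaluations`, and the class and method names (`WordBreakSketch`, `suggestBreaks`) are hypothetical. The ordering is only meant to mirror what the name `NUM_CHANGES_THEN_MAX_FREQUENCY` suggests.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Illustrative only: a hypothetical stand-in, not the Lucene class. */
final class WordBreakSketch {

    /**
     * Returns every way to split {@code term} into dictionary words using at most
     * {@code maxChanges} breaks. The score of a result is (number of words - 1).
     */
    static List<List<String>> suggestBreaks(String term, Map<String, Integer> freq,
                                            int maxChanges, int minBreakWordLength,
                                            int minSuggestionFrequency) {
        List<List<String>> out = new ArrayList<>();
        if (maxChanges >= 1) {
            collect(term, freq, maxChanges, Math.max(1, minBreakWordLength),
                    minSuggestionFrequency, new ArrayList<>(), out);
        }
        // Fewer breaks first; ties go to the split whose most frequent piece is more frequent.
        out.sort(Comparator
                .comparingInt((List<String> words) -> words.size())
                .thenComparingInt(words -> -maxFreq(words, freq)));
        return out;
    }

    private static void collect(String rest, Map<String, Integer> freq, int breaksLeft,
                                int minLen, int minFreq,
                                List<String> prefix, List<List<String>> out) {
        // Try each split point that leaves both pieces at least minLen characters long.
        for (int i = minLen; i <= rest.length() - minLen; i++) {
            String left = rest.substring(0, i);
            String right = rest.substring(i);
            if (freq.getOrDefault(left, 0) < minFreq) {
                continue; // the left piece is not frequent enough to appear in a suggestion
            }
            prefix.add(left);
            if (freq.getOrDefault(right, 0) >= minFreq) {
                List<String> words = new ArrayList<>(prefix);
                words.add(right);
                out.add(words);
            }
            if (breaksLeft > 1) {
                // Spend another break on the remainder.
                collect(right, freq, breaksLeft - 1, minLen, minFreq, prefix, out);
            }
            prefix.remove(prefix.size() - 1);
        }
    }

    private static int maxFreq(List<String> words, Map<String, Integer> freq) {
        int max = 0;
        for (String w : words) {
            max = Math.max(max, freq.getOrDefault(w, 0));
        }
        return max;
    }

    public static void main(String[] args) {
        Map<String, Integer> freq = Map.of("note", 5, "book", 9, "no", 3, "tebook", 1);
        // One break allowed, pieces of at least 2 characters, frequency of at least 1.
        System.out.println(suggestBreaks("notebook", freq, 1, 2, 1));
        // prints [[note, book], [no, tebook]]
    }
}
```

Both results have a score of 1 (one break); "note book" sorts first because its most frequent piece ("book", 9) outranks that of "no tebook" ("no", 3). Raising the minimum piece length to 3 in this sketch drops the second split.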
-
suggestWordCombinations
public CombineSuggestion[] suggestWordCombinations(Term[] terms, int maxSuggestions, IndexReader ir, SuggestMode suggestMode) throws IOException
Generate suggestions by combining one or more of the passed-in terms into single words. Each returned CombineSuggestion contains both a SuggestWord and an array detailing which passed-in terms were involved in creating the combination. The scores returned are equal to the number of word combinations needed, which is one less than the length of the array CombineSuggestion.originalTermIndexes. Generally, a suggestion with a lower score is preferred over one with a higher score.
To prevent two adjacent terms from being combined (for instance, if one is mandatory and the other is prohibited), separate the two terms with SEPARATOR_TERM.
When suggestMode equals SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX, each suggestion will include at least one term not in the index.
When suggestMode equals SuggestMode.SUGGEST_MORE_POPULAR, each suggestion will have the same or better frequency than the most popular included term.
- Returns:
an array of words generated by combining original terms
- Throws:
IOException - If there is a low-level I/O error.
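The combination rules above (adjacent terms only, a separator blocks joining, score = originalTermIndexes length minus one, and the two suggest-mode filters) can be sketched in plain Java. This is an illustration under assumptions, not Lucene's code: a `Map` of frequencies stands in for the index, the minimum suggestion frequency is fixed at the documented default of 1, and `WordCombineSketch`, `Mode`, `Combination`, and `SEPARATOR` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative only: a hypothetical stand-in, not the Lucene class. */
final class WordCombineSketch {

    /** Hypothetical marker playing the role of SEPARATOR_TERM. */
    static final String SEPARATOR = "\u0000";

    /** Simplified stand-in for the suggest modes described in the text. */
    enum Mode { ALWAYS, WHEN_NOT_IN_INDEX, MORE_POPULAR }

    /** A combined word plus the positions of the input terms that formed it. */
    static final class Combination {
        final String word;
        final int[] originalTermIndexes;

        Combination(String word, int[] originalTermIndexes) {
            this.word = word;
            this.originalTermIndexes = originalTermIndexes;
        }

        /** Number of joins performed: one less than the number of terms used. */
        int score() {
            return originalTermIndexes.length - 1;
        }
    }

    static List<Combination> suggestCombinations(String[] terms, Map<String, Integer> freq,
                                                 Mode mode, int maxChanges,
                                                 int maxCombineWordLength) {
        List<Combination> out = new ArrayList<>();
        for (int start = 0; start < terms.length; start++) {
            if (terms[start].equals(SEPARATOR)) {
                continue;
            }
            StringBuilder combined = new StringBuilder(terms[start]);
            int maxOriginalFreq = freq.getOrDefault(terms[start], 0);
            boolean anyMissing = maxOriginalFreq == 0;
            // Extend the run one adjacent term at a time; each join counts as one change.
            for (int end = start + 1; end < terms.length && end - start <= maxChanges; end++) {
                if (terms[end].equals(SEPARATOR)) {
                    break; // never combine across a separator
                }
                combined.append(terms[end]);
                if (combined.length() > maxCombineWordLength) {
                    break;
                }
                int f = freq.getOrDefault(terms[end], 0);
                maxOriginalFreq = Math.max(maxOriginalFreq, f);
                anyMissing = anyMissing || f == 0;

                int combinedFreq = freq.getOrDefault(combined.toString(), 0);
                if (combinedFreq < 1) {
                    continue; // the combined word must exist in the dictionary
                }
                if (mode == Mode.WHEN_NOT_IN_INDEX && !anyMissing) {
                    continue; // needs at least one original term absent from the dictionary
                }
                if (mode == Mode.MORE_POPULAR && combinedFreq < maxOriginalFreq) {
                    continue; // must be at least as frequent as the most popular original term
                }
                int[] idx = new int[end - start + 1];
                for (int k = 0; k < idx.length; k++) {
                    idx[k] = start + k;
                }
                out.add(new Combination(combined.toString(), idx));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> freq = Map.of("notebook", 7, "bookshelf", 4, "book", 9, "shelf", 2);
        String[] terms = {"note", "book", "shelf"};
        for (Combination c : suggestCombinations(terms, freq, Mode.ALWAYS, 1, 20)) {
            System.out.println(c.word + " score=" + c.score());
        }
    }
}
```

In this sketch, switching the mode to `WHEN_NOT_IN_INDEX` keeps only "notebook", because "note" is the only input term missing from the dictionary, and placing `SEPARATOR` between "note" and "book" prevents that pair from being joined at all.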
-
getMinSuggestionFrequency
public int getMinSuggestionFrequency()
Returns the minimum frequency a term must have to be part of a suggestion.
- See Also:
setMinSuggestionFrequency(int)
-
getMaxCombineWordLength
public int getMaxCombineWordLength()
Returns the maximum length of a combined suggestion.
- See Also:
setMaxCombineWordLength(int)
-
getMinBreakWordLength
public int getMinBreakWordLength()
Returns the minimum size of a broken word.
- See Also:
setMinBreakWordLength(int)
-
getMaxChanges
public int getMaxChanges()
Returns the maximum number of changes to perform on the input.
- See Also:
setMaxChanges(int)
-
getMaxEvaluations
public int getMaxEvaluations()
Returns the maximum number of word combinations to evaluate.
- See Also:
setMaxEvaluations(int)
-
setMinSuggestionFrequency
public void setMinSuggestionFrequency(int minSuggestionFrequency)
The minimum frequency a term must have to be included as part of a suggestion. Default = 1. Not applicable when used with SuggestMode.SUGGEST_MORE_POPULAR.
- See Also:
getMinSuggestionFrequency()
-
setMaxCombineWordLength
public void setMaxCombineWordLength(int maxCombineWordLength)
The maximum length of a suggestion made by combining one or more original terms. Default = 20.
- See Also:
getMaxCombineWordLength()
-
setMinBreakWordLength
public void setMinBreakWordLength(int minBreakWordLength)
The minimum length to break words down to. Default = 1.
- See Also:
getMinBreakWordLength()
-
setMaxChanges
public void setMaxChanges(int maxChanges)
The maximum number of changes (word breaks or combinations) to make on the original term(s). Default = 1.
- See Also:
getMaxChanges()
-
setMaxEvaluations
public void setMaxEvaluations(int maxEvaluations)
The maximum number of word combinations to evaluate. Default = 1000. A higher value might improve result quality. A lower value might improve performance.
- See Also:
getMaxEvaluations()
-
-