Package org.apache.lucene.analysis.query
Class QueryAutoStopWordAnalyzer
- java.lang.Object
  - org.apache.lucene.analysis.Analyzer
    - org.apache.lucene.analysis.AnalyzerWrapper
      - org.apache.lucene.analysis.query.QueryAutoStopWordAnalyzer
- All Implemented Interfaces:
- Closeable, AutoCloseable
public final class QueryAutoStopWordAnalyzer extends AnalyzerWrapper
An Analyzer used primarily at query time to wrap another analyzer and provide a layer of protection that prevents very common words from being passed into queries.
For very large indexes, the cost of reading TermDocs for a very common word can be high. This analyzer was created after experience with a 38 million document index in which one term appeared in around 50% of documents, causing TermQueries for that term to take 2 seconds.
- Since:
- 3.1
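The idea described above can be sketched in plain Java. This is a minimal illustration, not Lucene's implementation: it assumes the per-field document frequencies are already available in a map (the real class reads them from an IndexReader), and the class name AutoStopWordSketch and its methods findStopWords and filterQueryTokens are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: derive stop words from document frequencies,
// then drop them from query tokens. Not part of Lucene.
final class AutoStopWordSketch {
    private AutoStopWordSketch() {}

    // For each field, collect the terms whose document frequency
    // is strictly greater than maxDocFreq.
    static Map<String, Set<String>> findStopWords(
            Map<String, Map<String, Integer>> docFreqByField, int maxDocFreq) {
        Map<String, Set<String>> stopWordsByField = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> field : docFreqByField.entrySet()) {
            Set<String> stopWords = new TreeSet<>();
            for (Map.Entry<String, Integer> term : field.getValue().entrySet()) {
                if (term.getValue() > maxDocFreq) {
                    stopWords.add(term.getKey());
                }
            }
            stopWordsByField.put(field.getKey(), stopWords);
        }
        return stopWordsByField;
    }

    // Remove the field's stop words from the query tokens, preserving order.
    // Fields without computed stop words pass through unchanged.
    static List<String> filterQueryTokens(
            String fieldName, List<String> tokens, Map<String, Set<String>> stopWordsByField) {
        Set<String> stopWords =
                stopWordsByField.getOrDefault(fieldName, Collections.emptySet());
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Illustrative document frequencies for a single field named "body".
        Map<String, Integer> body = new HashMap<>();
        body.put("the", 950);
        body.put("lucene", 12);
        body.put("index", 300);
        Map<String, Map<String, Integer>> docFreqByField = new HashMap<>();
        docFreqByField.put("body", body);

        Map<String, Set<String>> stopWords = findStopWords(docFreqByField, 400);
        System.out.println(stopWords.get("body"));
        System.out.println(
                filterQueryTokens("body", Arrays.asList("the", "lucene", "index"), stopWords));
    }
}
```

Keeping the stop words per field mirrors the per-field lookup that getStopWords(String fieldName) exposes: a term that is very common in one field may still be selective in another.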
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
Field Summary
Fields
- static float defaultMaxDocFreqPercent
Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
Constructor Summary
Constructors
- QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader)
  Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than defaultMaxDocFreqPercent.
- QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, float maxPercentDocs)
  Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than the given maxPercentDocs.
- QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, int maxDocFreq)
  Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency greater than the given maxDocFreq.
- QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, Collection<String> fields, float maxPercentDocs)
  Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency percentage greater than the given maxPercentDocs.
- QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, Collection<String> fields, int maxDocFreq)
  Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency greater than the given maxDocFreq.
Method Summary
Methods
- Term[] getStopWords()
  Provides information on which stop words have been identified for all fields.
- String[] getStopWords(String fieldName)
  Provides information on which stop words have been identified for a field.
- protected Analyzer getWrappedAnalyzer(String fieldName)
- protected Analyzer.TokenStreamComponents wrapComponents(String fieldName, Analyzer.TokenStreamComponents components)
Methods inherited from class org.apache.lucene.analysis.AnalyzerWrapper
attributeFactory, createComponents, getOffsetGap, getPositionIncrementGap, initReader, initReaderForNormalization, normalize, wrapReader, wrapReaderForNormalization, wrapTokenStreamForNormalization
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getReuseStrategy, normalize, tokenStream, tokenStream
Field Detail
defaultMaxDocFreqPercent
public static final float defaultMaxDocFreqPercent
- See Also:
- Constant Field Values
Constructor Detail
QueryAutoStopWordAnalyzer
public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader) throws IOException
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than defaultMaxDocFreqPercent.
- Parameters:
- delegate - Analyzer whose TokenStream will be filtered
- indexReader - IndexReader to identify the stopwords from
- Throws:
- IOException - Can be thrown while reading from the IndexReader
QueryAutoStopWordAnalyzer
public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, int maxDocFreq) throws IOException
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency greater than the given maxDocFreq.
- Parameters:
- delegate - Analyzer whose TokenStream will be filtered
- indexReader - IndexReader to identify the stopwords from
- maxDocFreq - Document frequency above which a term is treated as a stopword
- Throws:
- IOException - Can be thrown while reading from the IndexReader
QueryAutoStopWordAnalyzer
public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, float maxPercentDocs) throws IOException
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than the given maxPercentDocs.
- Parameters:
- delegate - Analyzer whose TokenStream will be filtered
- indexReader - IndexReader to identify the stopwords from
- maxPercentDocs - The maximum percentage of index documents (expressed as a fraction between 0.0 and 1.0) that may contain a term; a term found in more documents than this is considered a stop word
- Throws:
- IOException - Can be thrown while reading from the IndexReader
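The relationship between maxPercentDocs and an absolute document-frequency cutoff can be sketched as follows. This is a hedged illustration only: it assumes the fraction is turned into a count by multiplying by the number of documents and truncating, which the text above does not state. The class MaxPercentDocsSketch, its methods, and the 0.4f fraction used in main are hypothetical choices for the example; the range check is the sketch's own addition.

```java
// Hypothetical sketch of turning a fraction of documents into an
// absolute document-frequency cutoff. Not part of Lucene.
final class MaxPercentDocsSketch {
    private MaxPercentDocsSketch() {}

    // Assumption: the cutoff is numDocs * maxPercentDocs, truncated to an int.
    static int toMaxDocFreq(int numDocs, float maxPercentDocs) {
        if (maxPercentDocs < 0.0f || maxPercentDocs > 1.0f) {
            // Validation added by this sketch to match the documented 0.0 to 1.0 range.
            throw new IllegalArgumentException("maxPercentDocs must be between 0.0 and 1.0");
        }
        return (int) (numDocs * maxPercentDocs);
    }

    // A term is a stop word when it occurs in strictly more documents than the cutoff.
    static boolean isStopWord(int docFreq, int maxDocFreq) {
        return docFreq > maxDocFreq;
    }

    public static void main(String[] args) {
        // Figures from the class description: a 38 million document index
        // with one term present in around 50% of documents.
        int numDocs = 38_000_000;
        int docFreq = 19_000_000;
        int cutoff = toMaxDocFreq(numDocs, 0.4f); // 0.4f is an illustrative fraction
        System.out.println("cutoff = " + cutoff);
        System.out.println("stop word? " + isStopWord(docFreq, cutoff));
    }
}
```

Under this assumption, the float constructors behave like the int constructors once the fraction has been scaled by the index size, so a threshold expressed as a fraction adapts as the index grows while an absolute maxDocFreq does not.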
QueryAutoStopWordAnalyzer
public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, Collection<String> fields, float maxPercentDocs) throws IOException
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency percentage greater than the given maxPercentDocs.
- Parameters:
- delegate - Analyzer whose TokenStream will be filtered
- indexReader - IndexReader to identify the stopwords from
- fields - Selection of fields to calculate stopwords for
- maxPercentDocs - The maximum percentage of index documents (expressed as a fraction between 0.0 and 1.0) that may contain a term; a term found in more documents than this is considered a stop word
- Throws:
- IOException - Can be thrown while reading from the IndexReader
QueryAutoStopWordAnalyzer
public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, Collection<String> fields, int maxDocFreq) throws IOException
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency greater than the given maxDocFreq.
- Parameters:
- delegate - Analyzer whose TokenStream will be filtered
- indexReader - IndexReader to identify the stopwords from
- fields - Selection of fields to calculate stopwords for
- maxDocFreq - Document frequency above which a term is treated as a stopword
- Throws:
- IOException - Can be thrown while reading from the IndexReader
Method Detail
getWrappedAnalyzer
protected Analyzer getWrappedAnalyzer(String fieldName)
- Specified by:
- getWrappedAnalyzer in class AnalyzerWrapper
wrapComponents
protected Analyzer.TokenStreamComponents wrapComponents(String fieldName, Analyzer.TokenStreamComponents components)
- Overrides:
- wrapComponents in class AnalyzerWrapper
getStopWords
public String[] getStopWords(String fieldName)
Provides information on which stop words have been identified for a field.
- Parameters:
- fieldName - The field for which the identified stop words will be returned
- Returns:
- the stop words identified for a field
getStopWords
public Term[] getStopWords()
Provides information on which stop words have been identified for all fields.
- Returns:
- the stop words (as terms)