Class StandardQueryParser
- java.lang.Object
  - org.apache.lucene.queryparser.flexible.core.QueryParserHelper
    - org.apache.lucene.queryparser.flexible.standard.StandardQueryParser
- All Implemented Interfaces:
CommonQueryParserConfiguration
- Direct Known Subclasses:
PrecedenceQueryParser
public class StandardQueryParser extends QueryParserHelper implements CommonQueryParserConfiguration
The StandardQueryParser is a pre-assembled query parser that supports most features of the classic Lucene query parser, allows dynamic configuration of some of its features (like multi-field expansion or wildcard query restrictions) and adds support for new query types and expressions.
The StandardQueryParser is an extension of the QueryParserHelper with reasonable defaults for syntax tree parsing (StandardSyntaxParser), node processor pipeline (StandardQueryNodeProcessorPipeline) and node tree to Query builder (StandardQueryTreeBuilder).
Typical usage, including configuration tweaks:
    StandardQueryParser qpHelper = new StandardQueryParser();
    // The configuration setters are exposed on the parser itself.
    qpHelper.setAllowLeadingWildcard(true);
    qpHelper.setAnalyzer(new WhitespaceAnalyzer());
    Query query = qpHelper.parse("apache AND lucene", "defaultField");
Supported query syntax
Standard query parser borrows most of its syntax from the classic query parser but adds more features and expressions on top of that syntax.
A query consists of clauses, field specifications, grouping and Boolean operators and interval functions. We will discuss them in order.
Basic clauses
A query must contain one or more clauses. A clause can be a literal term, a phrase, a wildcard expression or another kind of expression.
The following are some examples of simple one-clause queries:
test
    selects documents containing the word test (term clause).
"test equipment"
    phrase search; selects documents containing the phrase test equipment (phrase clause).
"test failure"~4
    proximity search; selects documents containing the words test and failure within 4 words (positions) from each other. The provided "proximity" is technically translated into "edit distance" (maximum number of atomic word-moving operations required to transform the document's phrase into the query phrase).
tes*
    prefix wildcard matching; selects documents containing words starting with tes, such as: test, testing or testable.
/.est(s|ing)/
    documents containing words matching the provided regular expression, such as resting or nests.
nest~2
    fuzzy term matching; documents containing words within 2-edits distance (2 additions, removals or replacements of a letter) from nest, such as test, net or rests.
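The clause types listed above can be explored with a small sketch like the following. It assumes Lucene 9.x with the queryparser and analysis-common modules on the classpath; the field name body and the class name BasicClauseDemo are hypothetical, chosen only for illustration.

```java
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
import org.apache.lucene.search.Query;

public class BasicClauseDemo {
    /** Parses one clause against the hypothetical "body" field and reports the built query class. */
    static String queryType(String clause) {
        StandardQueryParser parser = new StandardQueryParser(new WhitespaceAnalyzer());
        try {
            Query query = parser.parse(clause, "body");
            return query.getClass().getSimpleName();
        } catch (QueryNodeException e) {
            // Wrapped only to keep the sketch short; real code may prefer to propagate it.
            throw new IllegalArgumentException("Cannot parse: " + clause, e);
        }
    }

    public static void main(String[] args) {
        String[] clauses = {
            "test", "\"test equipment\"", "\"test failure\"~4", "tes*", "/.est(s|ing)/", "nest~2"
        };
        for (String clause : clauses) {
            System.out.println(clause + " -> " + queryType(clause));
        }
    }
}
```

Printing the query class is just a convenient way to see which kind of Lucene query each clause is turned into.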
Field specifications
Most clauses can be prefixed by a field name and a colon: the clause will then apply to that field only. If the field specification is omitted, the query parser will expand the clause over all fields specified by a call to setMultiFields(CharSequence[]) or will use the default field provided in the call to parse(String, String).
The following are some examples of field-prefixed clauses:
title:test
    documents containing test in the title field.
title:(die OR hard)
    documents containing die or hard in the title field.
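A minimal sketch of the two fallbacks for clauses without a field prefix follows. It assumes Lucene 9.x; the field names title and body and the class name FieldExpansionDemo are hypothetical. Passing null as the default field is what lets the multi-field expansion apply.

```java
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;

public class FieldExpansionDemo {
    /**
     * Parses queryText and returns Query.toString(). When multiFields is non-empty,
     * the default field is left null so that unprefixed clauses are expanded over them.
     */
    static String render(String queryText, String... multiFields) {
        StandardQueryParser parser = new StandardQueryParser(new WhitespaceAnalyzer());
        String defaultField = "body"; // hypothetical default field
        if (multiFields.length > 0) {
            parser.setMultiFields(multiFields);
            defaultField = null;
        }
        try {
            return parser.parse(queryText, defaultField).toString();
        } catch (QueryNodeException e) {
            throw new IllegalArgumentException("Cannot parse: " + queryText, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("title:test"));            // explicit field prefix
        System.out.println(render("test"));                  // falls back to the default field
        System.out.println(render("test", "title", "body")); // expanded over both fields
    }
}
```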
Boolean operators and grouping
You can combine clauses using Boolean AND, OR and NOT operators to form more complex expressions, for example:
test AND results
    selects documents containing both the word test and the word results.
test OR suite OR results
    selects documents with at least one of test, suite or results.
title:test AND NOT title:complete
    selects documents containing test and not containing complete in the title field.
title:test AND (pass* OR fail*)
    grouping; use parentheses to specify the precedence of terms in a Boolean clause. Query will match documents containing test in the title field and a word starting with pass or fail in the default search fields.
title:(pass fail skip)
    shorthand notation; documents containing at least one of pass, fail or skip in the title field.
title:(+test +"result unknown")
    shorthand notation; documents containing both test and result unknown in the title field.
Note that the Boolean operators must be written in all caps; otherwise they are parsed as regular terms.
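The effect of operator capitalization can be checked with a sketch like this one. It assumes Lucene 9.x and the default OR operator; the field name body and the class name BooleanOperatorDemo are hypothetical.

```java
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;

public class BooleanOperatorDemo {
    /** Parses queryText and renders it relative to the hypothetical default field "body". */
    static String render(String queryText) {
        StandardQueryParser parser = new StandardQueryParser(new WhitespaceAnalyzer());
        try {
            // toString(field) omits the field prefix for terms in that field.
            return parser.parse(queryText, "body").toString("body");
        } catch (QueryNodeException e) {
            throw new IllegalArgumentException("Cannot parse: " + queryText, e);
        }
    }

    public static void main(String[] args) {
        // Upper-case AND is an operator: both terms become required (+).
        System.out.println(render("test AND results"));
        // Lower-case "and" is an ordinary term: three optional clauses.
        System.out.println(render("test and results"));
    }
}
```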
Range operators
To search for ranges of textual or numeric values, use square or curly brackets, for example:
name:[Jones TO Smith]
    inclusive range; selects documents whose name field has any value between Jones and Smith, including boundaries.
score:{2.5 TO 7.3}
    exclusive range; selects documents whose score field is between 2.5 and 7.3, excluding boundaries.
score:{2.5 TO *]
    one-sided range; selects documents whose score field is larger than 2.5.
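The bracket styles can be inspected on the parsed query, as in the sketch below. It assumes Lucene 9.x and no points configuration, so ranges are built as textual TermRangeQuery instances; the class name RangeDemo and the output format are hypothetical.

```java
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
import org.apache.lucene.search.TermRangeQuery;
import org.apache.lucene.util.BytesRef;

public class RangeDemo {
    /** Parses a textual range clause and describes its bounds and their inclusiveness. */
    static String describe(String queryText) {
        StandardQueryParser parser = new StandardQueryParser(new WhitespaceAnalyzer());
        try {
            // Assumption: without a points configuration the parser builds a TermRangeQuery.
            TermRangeQuery range = (TermRangeQuery) parser.parse(queryText, "body");
            return range.getField() + ": "
                + bound(range.getLowerTerm(), range.includesLower()) + " .. "
                + bound(range.getUpperTerm(), range.includesUpper());
        } catch (QueryNodeException e) {
            throw new IllegalArgumentException("Cannot parse: " + queryText, e);
        }
    }

    private static String bound(BytesRef term, boolean inclusive) {
        if (term == null) {
            return "*"; // open-ended side of the range
        }
        return term.utf8ToString() + (inclusive ? " (inclusive)" : " (exclusive)");
    }

    public static void main(String[] args) {
        System.out.println(describe("name:[Jones TO Smith]"));
        System.out.println(describe("score:{2.5 TO 7.3}"));
    }
}
```

Note that in this sketch score is compared as text; numeric comparison needs a points configuration for the field.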
Term boosting
Terms, quoted terms, term range expressions and grouped clauses can have a floating-point weight boost applied to them to increase their score relative to other clauses. For example:
jones^2 OR smith^0.5
    prioritize documents with the jones term over matches on the smith term.
field:(a OR b NOT c)^2.5 OR field:d
    apply the boost to a sub-query.
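A boosted clause can be observed on the parsed query, as in this minimal sketch. It assumes Lucene 9.x, where a boosted clause is wrapped in a BoostQuery; the field name name and the class name BoostDemo are hypothetical.

```java
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.Query;

public class BoostDemo {
    /** Returns the boost applied to the top-level query, or 1.0 when there is none. */
    static float boostOf(String queryText) {
        StandardQueryParser parser = new StandardQueryParser(new WhitespaceAnalyzer());
        try {
            Query query = parser.parse(queryText, "name");
            if (query instanceof BoostQuery) {
                return ((BoostQuery) query).getBoost();
            }
            return 1.0f;
        } catch (QueryNodeException e) {
            throw new IllegalArgumentException("Cannot parse: " + queryText, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(boostOf("jones^2"));
        System.out.println(boostOf("jones"));
    }
}
```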
Special character escaping
Most search terms can be put in double quotes, making special-character escaping unnecessary. If the search term contains the quote character (or cannot be quoted for some reason), any character can be escaped with a backslash. For example:
\:\(quoted\+term\)\:
    a single search term :(quoted+term): with escape sequences. An alternative quoted form would be simpler: ":(quoted+term):".
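Escaping does not have to be done by hand. The sketch below assumes Lucene 9.x and uses QueryParserUtil.escape from the same package to add the backslashes, then parses the result back into a single term; the field name body and the class name EscapeDemo are hypothetical.

```java
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
import org.apache.lucene.queryparser.flexible.standard.QueryParserUtil;
import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
import org.apache.lucene.search.TermQuery;

public class EscapeDemo {
    /** Adds a backslash before every character that is part of the query syntax. */
    static String escape(String rawTerm) {
        return QueryParserUtil.escape(rawTerm);
    }

    /** Escapes rawTerm, parses it, and returns the text of the resulting single term. */
    static String roundTrip(String rawTerm) {
        StandardQueryParser parser = new StandardQueryParser(new WhitespaceAnalyzer());
        try {
            // Assumption: the escaped input contains no whitespace, so it yields one TermQuery.
            TermQuery query = (TermQuery) parser.parse(escape(rawTerm), "body");
            return query.getTerm().text();
        } catch (QueryNodeException e) {
            throw new IllegalArgumentException("Cannot parse: " + rawTerm, e);
        }
    }

    public static void main(String[] args) {
        String raw = ":(quoted+term):";
        System.out.println(escape(raw));
        System.out.println(roundTrip(raw));
    }
}
```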
Minimum-should-match constraint for Boolean disjunction groups
A minimum-should-match operator can be applied to a disjunction Boolean query (a query with only "OR"-subclauses) and forces the query to match documents with at least the provided number of these subclauses. For example:
(blue crab fish)@2
    matches all documents with at least two terms from the set [blue, crab, fish] (in any order).
((yellow OR blue) crab fish)@2
    sub-clauses of a Boolean query can themselves be complex queries; here the min-should-match selects documents that match at least two of the provided three sub-clauses.
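The constraint ends up on the built Boolean query, which the following sketch reads back. It assumes a Lucene 9.x release that supports the @N syntax described above; the field name body and the class name MinShouldMatchDemo are hypothetical.

```java
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;

public class MinShouldMatchDemo {
    /** Returns the minimum-should-match value of the parsed query, or 0 if it is not Boolean. */
    static int minShouldMatch(String queryText) {
        StandardQueryParser parser = new StandardQueryParser(new WhitespaceAnalyzer());
        try {
            Query query = parser.parse(queryText, "body");
            if (query instanceof BooleanQuery) {
                return ((BooleanQuery) query).getMinimumNumberShouldMatch();
            }
            return 0;
        } catch (QueryNodeException e) {
            throw new IllegalArgumentException("Cannot parse: " + queryText, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(minShouldMatch("(blue crab fish)@2"));
        System.out.println(minShouldMatch("((yellow OR blue) crab fish)@2"));
    }
}
```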
Interval function clauses
Interval functions are a powerful tool to express search needs in terms of one or more contiguous fragments of text and their relationship to one another. All interval clauses start with the fn: prefix (possibly prefixed by a field specification). For example:
fn:ordered(quick brown fox)
    matches all documents (in the default field or in multi-field expansion) with at least one ordered sequence of quick, brown and fox terms.
title:fn:maxwidth(5 fn:atLeast(2 quick brown fox))
    matches all documents in the title field where at least two of the three terms (quick, brown and fox) occur within five positions of each other.
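As a rough sketch, both interval examples can be parsed like any other query. This assumes a Lucene 9.x release with interval function support and the lucene-queries module on the classpath, where such clauses are built as IntervalQuery instances; the field name body and the class name IntervalDemo are hypothetical.

```java
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.queries.intervals.IntervalQuery;
import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
import org.apache.lucene.search.Query;

public class IntervalDemo {
    /** Reports whether the given clause is built as an interval query. */
    static boolean isIntervalQuery(String queryText) {
        // Interval functions analyze their terms, so an analyzer is set up front.
        StandardQueryParser parser = new StandardQueryParser(new WhitespaceAnalyzer());
        try {
            Query query = parser.parse(queryText, "body");
            return query instanceof IntervalQuery;
        } catch (QueryNodeException e) {
            throw new IllegalArgumentException("Cannot parse: " + queryText, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isIntervalQuery("fn:ordered(quick brown fox)"));
        System.out.println(isIntervalQuery("title:fn:maxwidth(5 fn:atLeast(2 quick brown fox))"));
        System.out.println(isIntervalQuery("quick brown fox"));
    }
}
```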
-
-
Constructor Summary
Constructors
Constructor, followed by its description:
StandardQueryParser()
    Constructs a StandardQueryParser object.
StandardQueryParser(Analyzer analyzer)
    Constructs a StandardQueryParser object and sets an Analyzer to it.
-
Method Summary
Modifier and type with method, followed by its description where one is given:
boolean getAllowLeadingWildcard()
Analyzer getAnalyzer()
DateTools.Resolution getDateResolution()
    Returns the default DateTools.Resolution used for certain field when no DateTools.Resolution is defined for this field.
Map<CharSequence,DateTools.Resolution> getDateResolutionMap()
    Returns the field to DateTools.Resolution map used to normalize each date field.
StandardQueryConfigHandler.Operator getDefaultOperator()
    Gets implicit operator setting, which will be either StandardQueryConfigHandler.Operator.AND or StandardQueryConfigHandler.Operator.OR.
boolean getEnablePositionIncrements()
Map<String,Float> getFieldsBoost()
    Returns the field to boost map used to set boost for each field.
float getFuzzyMinSim()
    Get the minimal similarity for fuzzy queries.
int getFuzzyPrefixLength()
    Get the prefix length for fuzzy queries.
Locale getLocale()
    Returns current locale, allowing access by subclasses.
CharSequence[] getMultiFields()
    Returns the fields used to expand the query when the field for a certain query is null
MultiTermQuery.RewriteMethod getMultiTermRewriteMethod()
int getPhraseSlop()
    Gets the default slop for phrases.
Map<String,PointsConfig> getPointsConfigMap()
TimeZone getTimeZone()
Query parse(String query, String defaultField)
    Overrides QueryParserHelper.parse(String, String) so it casts the return object to Query.
void setAllowLeadingWildcard(boolean allowLeadingWildcard)
    Set to true to allow leading wildcard characters.
void setAnalyzer(Analyzer analyzer)
void setDateResolution(DateTools.Resolution dateResolution)
    Sets the default DateTools.Resolution used for certain field when no DateTools.Resolution is defined for this field.
void setDateResolutionMap(Map<CharSequence,DateTools.Resolution> dateRes)
    Sets the DateTools.Resolution used for each field
void setDefaultOperator(StandardQueryConfigHandler.Operator operator)
    Sets the boolean operator of the QueryParser.
void setEnablePositionIncrements(boolean enabled)
    Set to true to enable position increments in result query.
void setFieldsBoost(Map<String,Float> boosts)
    Sets the boost used for each field.
void setFuzzyMinSim(float fuzzyMinSim)
    Set the minimum similarity for fuzzy queries.
void setFuzzyPrefixLength(int fuzzyPrefixLength)
    Set the prefix length for fuzzy queries.
void setLocale(Locale locale)
    Set locale used by date range parsing.
void setMultiFields(CharSequence[] fields)
    Set the fields a query should be expanded to when the field is null
void setMultiTermRewriteMethod(MultiTermQuery.RewriteMethod method)
    By default QueryParser uses MultiTermQuery.CONSTANT_SCORE_BLENDED_REWRITE when creating a PrefixQuery, WildcardQuery or TermRangeQuery.
void setPhraseSlop(int defaultPhraseSlop)
    Sets the default slop for phrases.
void setPointsConfigMap(Map<String,PointsConfig> pointsConfigMap)
void setTimeZone(TimeZone timeZone)
String toString()
Methods inherited from class org.apache.lucene.queryparser.flexible.core.QueryParserHelper
getQueryBuilder, getQueryConfigHandler, getQueryNodeProcessor, getSyntaxParser, setQueryBuilder, setQueryConfigHandler, setQueryNodeProcessor, setSyntaxParser
-
-
-
-
Constructor Detail
-
StandardQueryParser
public StandardQueryParser()
Constructs a StandardQueryParser object.
-
StandardQueryParser
public StandardQueryParser(Analyzer analyzer)
Constructs a StandardQueryParser object and sets an Analyzer to it. The same as:
    StandardQueryParser qp = new StandardQueryParser();
    qp.setAnalyzer(analyzer);
- Parameters:
analyzer - the analyzer to be used by this query parser helper
-
-
Method Detail
-
parse
public Query parse(String query, String defaultField) throws QueryNodeException
Overrides QueryParserHelper.parse(String, String) so it casts the return object to Query. For more reference about this method, check QueryParserHelper.parse(String, String).
- Overrides:
parse in class QueryParserHelper
- Parameters:
query - the query string
defaultField - the default field used by the text parser
- Returns:
- the object built from the query
- Throws:
QueryNodeException - if something goes wrong during any of the three phases
-
getDefaultOperator
public StandardQueryConfigHandler.Operator getDefaultOperator()
Gets implicit operator setting, which will be either StandardQueryConfigHandler.Operator.AND or StandardQueryConfigHandler.Operator.OR.
-
setDefaultOperator
public void setDefaultOperator(StandardQueryConfigHandler.Operator operator)
Sets the boolean operator of the QueryParser. In default mode (StandardQueryConfigHandler.Operator.OR) terms without any modifiers are considered optional: for example capital of Hungary is equal to capital OR of OR Hungary.
In StandardQueryConfigHandler.Operator.AND mode terms are considered to be in conjunction: the above mentioned query is parsed as capital AND of AND Hungary.
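The two modes can be compared side by side with a sketch like the following. It assumes Lucene 9.x; the field name body, the class name DefaultOperatorDemo and the boolean requireAll parameter are hypothetical conveniences.

```java
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
import org.apache.lucene.queryparser.flexible.standard.config.StandardQueryConfigHandler.Operator;

public class DefaultOperatorDemo {
    /** Parses queryText with AND (requireAll) or OR as the implicit operator. */
    static String render(String queryText, boolean requireAll) {
        StandardQueryParser parser = new StandardQueryParser(new WhitespaceAnalyzer());
        parser.setDefaultOperator(requireAll ? Operator.AND : Operator.OR);
        try {
            // Rendered relative to "body" so the field prefix is omitted.
            return parser.parse(queryText, "body").toString("body");
        } catch (QueryNodeException e) {
            throw new IllegalArgumentException("Cannot parse: " + queryText, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("capital of Hungary", false)); // optional terms
        System.out.println(render("capital of Hungary", true));  // required terms (+)
    }
}
```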
-
setAllowLeadingWildcard
public void setAllowLeadingWildcard(boolean allowLeadingWildcard)
Set to true to allow leading wildcard characters.
When set, * or ? are allowed as the first character of a PrefixQuery and WildcardQuery. Note that this can produce very slow queries on big indexes.
Default: false.
- Specified by:
setAllowLeadingWildcard in interface CommonQueryParserConfiguration
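A small sketch of the difference this setting makes, assuming Lucene 9.x: with the default (false) a query starting with a wildcard is rejected with a QueryNodeException, and with true it parses. The field name body and the class name LeadingWildcardDemo are hypothetical.

```java
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;

public class LeadingWildcardDemo {
    /** Returns true if queryText parses under the given leading-wildcard setting. */
    static boolean parses(String queryText, boolean allowLeadingWildcard) {
        StandardQueryParser parser = new StandardQueryParser(new WhitespaceAnalyzer());
        parser.setAllowLeadingWildcard(allowLeadingWildcard);
        try {
            parser.parse(queryText, "body");
            return true;
        } catch (QueryNodeException e) {
            // Rejected by the parser, for example because of a leading wildcard.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("*est", false));
        System.out.println(parses("*est", true));
        System.out.println(parses("tes*", false));
    }
}
```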
-
setEnablePositionIncrements
public void setEnablePositionIncrements(boolean enabled)
Set to true to enable position increments in result query.
When set, result phrase and multi-phrase queries will be aware of position increments. Useful when e.g. a StopFilter increases the position increment of the token that follows an omitted token.
Default: false.
- Specified by:
setEnablePositionIncrements in interface CommonQueryParserConfiguration
-
getEnablePositionIncrements
public boolean getEnablePositionIncrements()
- Specified by:
getEnablePositionIncrements in interface CommonQueryParserConfiguration
- See Also:
setEnablePositionIncrements(boolean)
-
setMultiTermRewriteMethod
public void setMultiTermRewriteMethod(MultiTermQuery.RewriteMethod method)
Description copied from interface: CommonQueryParserConfiguration
By default QueryParser uses MultiTermQuery.CONSTANT_SCORE_BLENDED_REWRITE when creating a PrefixQuery, WildcardQuery or TermRangeQuery. This implementation is generally preferable because it a) runs faster, b) does not let the scarcity of terms unduly influence the score, and c) avoids any IndexSearcher.TooManyClauses exception. However, if your application really needs to use the old-fashioned BooleanQuery expansion rewriting and the above points are not relevant, then use this to change the rewrite method. As another alternative, if you prefer all terms to be rewritten as a filter up-front, you can use MultiTermQuery.CONSTANT_SCORE_REWRITE. For more information on the different rewrite methods available, see the MultiTermQuery documentation.
- Specified by:
setMultiTermRewriteMethod in interface CommonQueryParserConfiguration
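As an illustration, the following sketch switches the rewrite method and checks that a parsed prefix query picked it up. It assumes a Lucene 9.x release in which MultiTermQuery exposes getRewriteMethod(); the field name body and the class name RewriteMethodDemo are hypothetical.

```java
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.Query;

public class RewriteMethodDemo {
    /** Configures CONSTANT_SCORE_REWRITE and reports whether a prefix query uses it. */
    static boolean appliesConstantScoreRewrite() {
        StandardQueryParser parser = new StandardQueryParser(new WhitespaceAnalyzer());
        parser.setMultiTermRewriteMethod(MultiTermQuery.CONSTANT_SCORE_REWRITE);
        try {
            Query query = parser.parse("tes*", "body");
            return query instanceof MultiTermQuery
                && ((MultiTermQuery) query).getRewriteMethod() == MultiTermQuery.CONSTANT_SCORE_REWRITE;
        } catch (QueryNodeException e) {
            throw new IllegalArgumentException("Cannot parse prefix query", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(appliesConstantScoreRewrite());
    }
}
```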
-
getMultiTermRewriteMethod
public MultiTermQuery.RewriteMethod getMultiTermRewriteMethod()
- Specified by:
getMultiTermRewriteMethod in interface CommonQueryParserConfiguration
- See Also:
setMultiTermRewriteMethod(org.apache.lucene.search.MultiTermQuery.RewriteMethod)
-
setMultiFields
public void setMultiFields(CharSequence[] fields)
Set the fields a query should be expanded to when the field is null.
- Parameters:
fields - the fields used to expand the query
-
getMultiFields
public CharSequence[] getMultiFields()
Returns the fields used to expand the query when the field for a certain query is null.
- Returns:
- the fields used to expand the query
-
setFuzzyPrefixLength
public void setFuzzyPrefixLength(int fuzzyPrefixLength)
Set the prefix length for fuzzy queries. Default is 0.
- Specified by:
setFuzzyPrefixLength in interface CommonQueryParserConfiguration
- Parameters:
fuzzyPrefixLength - The fuzzyPrefixLength to set.
-
setPointsConfigMap
public void setPointsConfigMap(Map<String,PointsConfig> pointsConfigMap)
-
getPointsConfigMap
public Map<String,PointsConfig> getPointsConfigMap()
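These two methods carry no description here. As a hedged sketch of how they are commonly used: the map associates a field name with a PointsConfig (a NumberFormat plus a numeric type), so that range clauses on that field are built as numeric point queries rather than textual ranges. The example assumes Lucene 9.x; the field name score and the class name PointsConfigDemo are hypothetical.

```java
import java.text.NumberFormat;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
import org.apache.lucene.queryparser.flexible.standard.config.PointsConfig;
import org.apache.lucene.search.PointRangeQuery;
import org.apache.lucene.search.Query;

public class PointsConfigDemo {
    /** Reports whether queryText is built as a numeric point range query. */
    static boolean isPointRange(String queryText, boolean configurePoints) {
        StandardQueryParser parser = new StandardQueryParser(new WhitespaceAnalyzer());
        if (configurePoints) {
            Map<String, PointsConfig> points = new HashMap<>();
            // Treat the hypothetical "score" field as a double-valued point field.
            points.put("score",
                new PointsConfig(NumberFormat.getNumberInstance(Locale.ROOT), Double.class));
            parser.setPointsConfigMap(points);
        }
        try {
            Query query = parser.parse(queryText, "body");
            return query instanceof PointRangeQuery;
        } catch (QueryNodeException e) {
            throw new IllegalArgumentException("Cannot parse: " + queryText, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isPointRange("score:{2.5 TO 7.3}", true));
        System.out.println(isPointRange("score:{2.5 TO 7.3}", false));
    }
}
```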
-
setLocale
public void setLocale(Locale locale)
Set locale used by date range parsing.
- Specified by:
setLocale in interface CommonQueryParserConfiguration
-
getLocale
public Locale getLocale()
Returns current locale, allowing access by subclasses.
- Specified by:
getLocale in interface CommonQueryParserConfiguration
-
setTimeZone
public void setTimeZone(TimeZone timeZone)
- Specified by:
setTimeZone in interface CommonQueryParserConfiguration
-
getTimeZone
public TimeZone getTimeZone()
- Specified by:
getTimeZone in interface CommonQueryParserConfiguration
-
setPhraseSlop
public void setPhraseSlop(int defaultPhraseSlop)
Sets the default slop for phrases. If zero, then exact phrase matches are required. Default value is zero.
- Specified by:
setPhraseSlop in interface CommonQueryParserConfiguration
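The default slop can be read back from a parsed phrase, as in this minimal sketch. It assumes Lucene 9.x and that an explicit ~N on a phrase takes precedence over the default; the field name body and the class name PhraseSlopDemo are hypothetical.

```java
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
import org.apache.lucene.search.PhraseQuery;

public class PhraseSlopDemo {
    /** Parses a phrase clause with the given default slop and returns the effective slop. */
    static int slopOf(String phraseClause, int defaultSlop) {
        StandardQueryParser parser = new StandardQueryParser(new WhitespaceAnalyzer());
        parser.setPhraseSlop(defaultSlop);
        try {
            // Assumption: a multi-word quoted clause is built as a PhraseQuery.
            PhraseQuery query = (PhraseQuery) parser.parse(phraseClause, "body");
            return query.getSlop();
        } catch (QueryNodeException e) {
            throw new IllegalArgumentException("Cannot parse: " + phraseClause, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(slopOf("\"test failure\"", 0));   // exact phrase
        System.out.println(slopOf("\"test failure\"", 3));   // default slop applies
        System.out.println(slopOf("\"test failure\"~4", 3)); // explicit slop in the query
    }
}
```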
-
setAnalyzer
public void setAnalyzer(Analyzer analyzer)
-
getAnalyzer
public Analyzer getAnalyzer()
- Specified by:
getAnalyzer in interface CommonQueryParserConfiguration
-
getAllowLeadingWildcard
public boolean getAllowLeadingWildcard()
- Specified by:
getAllowLeadingWildcard in interface CommonQueryParserConfiguration
- See Also:
setAllowLeadingWildcard(boolean)
-
getFuzzyMinSim
public float getFuzzyMinSim()
Get the minimal similarity for fuzzy queries.
- Specified by:
getFuzzyMinSim in interface CommonQueryParserConfiguration
-
getFuzzyPrefixLength
public int getFuzzyPrefixLength()
Get the prefix length for fuzzy queries.
- Specified by:
getFuzzyPrefixLength in interface CommonQueryParserConfiguration
- Returns:
- Returns the fuzzyPrefixLength.
-
getPhraseSlop
public int getPhraseSlop()
Gets the default slop for phrases.
- Specified by:
getPhraseSlop in interface CommonQueryParserConfiguration
-
setFuzzyMinSim
public void setFuzzyMinSim(float fuzzyMinSim)
Set the minimum similarity for fuzzy queries. Default is defined on FuzzyQuery.defaultMaxEdits.
- Specified by:
setFuzzyMinSim in interface CommonQueryParserConfiguration
-
setFieldsBoost
public void setFieldsBoost(Map<String,Float> boosts)
Sets the boost used for each field.
- Parameters:
boosts - a collection that maps a field to its boost
-
getFieldsBoost
public Map<String,Float> getFieldsBoost()
Returns the field to boost map used to set boost for each field.
- Returns:
- the field to boost map
-
setDateResolution
public void setDateResolution(DateTools.Resolution dateResolution)
Sets the default DateTools.Resolution used for a field when no DateTools.Resolution is defined for that field.
- Specified by:
setDateResolution in interface CommonQueryParserConfiguration
- Parameters:
dateResolution - the default DateTools.Resolution
-
getDateResolution
public DateTools.Resolution getDateResolution()
Returns the default DateTools.Resolution used for a field when no DateTools.Resolution is defined for that field.
- Returns:
- the default DateTools.Resolution
-
getDateResolutionMap
public Map<CharSequence,DateTools.Resolution> getDateResolutionMap()
Returns the field to DateTools.Resolution map used to normalize each date field.
- Returns:
- the field to DateTools.Resolution map
-
setDateResolutionMap
public void setDateResolutionMap(Map<CharSequence,DateTools.Resolution> dateRes)
Sets the DateTools.Resolution used for each field.
- Parameters:
dateRes - a collection that maps a field to its DateTools.Resolution
-
-