Class TeeSinkTokenFilter
All Implemented Interfaces:
Closeable, AutoCloseable, Unwrappable<TokenStream>
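This token filter sets aside attribute states that have already been analyzed, which is useful when multiple fields share many common analysis steps and then go their separate ways. It is also useful for tasks such as entity extraction or proper noun analysis as part of the analysis workflow, saving those tokens for use in another field.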
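// EntityDetect and URLDetect are placeholder TokenFilter classes, not part of Lucene.
// d is assumed to be the Document being indexed.
Document d = new Document();
// The tee: every token it emits is also cached for its sinks.
TeeSinkTokenFilter source1 = new TeeSinkTokenFilter(new WhitespaceTokenizer());
// Sinks are TokenStreams, so they can be declared as such.
TokenStream sink1 = source1.newSinkTokenStream();
TokenStream sink2 = source1.newSinkTokenStream();
TokenStream final1 = new LowerCaseFilter(source1);
TokenStream final2 = new EntityDetect(sink1);
TokenStream final3 = new URLDetect(sink2);
// Add the tee-backed field first so it is consumed before the sinks.
d.add(new TextField("f1", final1));
d.add(new TextField("f2", final2));
d.add(new TextField("f3", final3));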
In this example, sink1 and sink2 both receive the tokens from source1
after whitespace tokenization and then apply further token filtering, e.g. detecting
entities and URLs.
NOTE: It is important that tees are consumed before sinks, so you should add
them to the document before the sinks. In the above example, f1 is added before the other
fields, so by the time they are processed it has already been consumed, which is the correct
way to index the three streams. If for some reason you cannot ensure that, call consumeAllTokens() before adding the sinks to document fields.
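The ordering requirement in the note above can be illustrated with a small stand-alone model. This is a simplified sketch in plain Java, not Lucene code and not how the library is implemented: the class TeeModel and its methods next(), consumeAll() and readSink() are hypothetical names chosen for illustration, and tokens are modelled as plain strings rather than attribute states. It assumes only what the note states, namely that a sink can see no more than the tee has already passed along.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for the tee/sink relationship; NOT Lucene code.
class TeeModel {
    private final Iterator<String> input;                  // upstream tokens
    private final List<String> cached = new ArrayList<>(); // shared with all sinks

    TeeModel(List<String> tokens) {
        this.input = tokens.iterator();
    }

    // Emits the next token and caches it; returns null once input is exhausted.
    String next() {
        if (!input.hasNext()) {
            return null;
        }
        String token = input.next();
        cached.add(token);
        return token;
    }

    // Analogue of consumeAllTokens(): drain the input so the cache is complete.
    void consumeAll() {
        while (next() != null) {
            // nothing to do; next() fills the cache
        }
    }

    // A sink replays only what has been cached at the moment it is read.
    List<String> readSink() {
        return new ArrayList<>(cached);
    }
}
```

In this model, reading a sink before the tee has been consumed yields an empty list, and reading it after consumeAll() yields every input token. That mirrors why the tee-backed field is added first, or consumeAllTokens() is called, before the sinks are indexed.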
Nested Class Summary
Nested Classes
Modifier and Type     Class                                 Description
static final class    TeeSinkTokenFilter.SinkTokenStream    TokenStream output from a tee.

Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
Field Summary
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input

Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
Constructor Summary
Constructors
Method Summary
Modifier and Type    Method                  Description
void                 consumeAllTokens()      TeeSinkTokenFilter passes all tokens to the added sinks when itself is consumed.
final void           end()
boolean              incrementToken()
TokenStream          newSinkTokenStream()    Returns a new TeeSinkTokenFilter.SinkTokenStream that receives all tokens consumed by this stream.
void                 reset()

Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, unwrap

Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
Constructor Details
TeeSinkTokenFilter
Method Details
newSinkTokenStream
Returns a new TeeSinkTokenFilter.SinkTokenStream that receives all tokens consumed by this stream.
consumeAllTokens
TeeSinkTokenFilter passes all tokens to the added sinks when it is itself consumed. To be sure that all tokens from the input stream are passed to the sinks, you can call this method. This instance is exhausted after this method returns, but all sinks are instantly available.
Throws:
IOException
incrementToken
Specified by:
incrementToken in class TokenStream
Throws:
IOException
end
Overrides:
end in class TokenFilter
Throws:
IOException
reset
Overrides:
reset in class TokenFilter
Throws:
IOException