Subject: Re: Can I just add ShingleFilter to my analyzer used for indexing and searching
On 21/02/2012 14:37, Steven A Rowe wrote:
Hi Paul,

Lucene QueryParser splits on whitespace and then sends individual words one-by-one to
be analyzed. All analysis components that do their work based on more than one word,
including ShingleFilter and SynonymFilter, are borked by this. (There is a JIRA
issue open for the QueryParser problem:
<https://issues.apache.org/jira/browse/LUCENE-2605>.)
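To make the effect concrete, here is a toy sketch in plain Java. It does not use the Lucene API: shingle() is a hypothetical stand-in for a bigram ShingleFilter that also emits unigrams, and parseThenShingle() only imitates the split-on-whitespace-then-analyze behaviour described above. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ShingleSplitDemo {
    // Toy stand-in for a shingle filter: emits each word plus each adjacent word pair.
    static List<String> shingle(String text) {
        String[] words = text.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            out.add(words[i]);
            if (i + 1 < words.length) {
                out.add(words[i] + " " + words[i + 1]);
            }
        }
        return out;
    }

    // Imitates the query parser: split on whitespace first,
    // then analyze each word in isolation.
    static List<String> parseThenShingle(String query) {
        List<String> out = new ArrayList<>();
        for (String word : query.trim().split("\\s+")) {
            out.addAll(shingle(word)); // the filter only ever sees one word
        }
        return out;
    }

    public static void main(String[] args) {
        // Index-time view: the whole text reaches the filter, so pairs are formed.
        System.out.println(shingle("please divide this"));
        // Query-time view: each word is analyzed alone, so no pairs are formed.
        System.out.println(parseThenShingle("please divide this"));
    }
}
```

At index time the filter sees the whole text and produces word pairs; at query time it only ever sees one word at a time, so the pairs that were indexed are never queried.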

There is a workaround involving PositionFilter described on the Solr wiki:
<http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory>.
Essentially, include PositionFilter after ShingleFilter in your analyzer, then wrap
queries in quotes before sending them to QueryParser.
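The "wrap queries in quotes" step could look roughly like the sketch below. It is plain Java with a hypothetical helper name (asPhrase), and it assumes that backslash is the escape character in the query syntax, so embedded backslashes and double quotes are escaped before wrapping. The analyzer setup itself is not shown.

```java
class PhraseWrapDemo {
    // Hypothetical helper: turn raw user input into a single quoted phrase.
    // Backslashes are escaped first so that the escapes added for the
    // double quotes are not themselves escaped again.
    static String asPhrase(String userInput) {
        String escaped = userInput
                .replace("\\", "\\\\")
                .replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // The quoted string is what would then be handed to the query parser.
        System.out.println(asPhrase("please divide this"));
        System.out.println(asPhrase("say \"hi\""));
    }
}
```

Note that wrapping everything in quotes treats the whole input as one phrase, so any operators or field prefixes the user typed lose their special meaning.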

CommonGramsFilter does the emit-only-shingles-containing-stopwords thing, but
in Lucene/Solr 3.x, it's in Solr (solr-core-3.X.jar, to be exact), not Lucene;
you can use it in your application by including the solr-core jar as a
dependency. In trunk, which will be released as Lucene/Solr 4.0,
CommonGramsFilter has been moved to the analyzers-common module.

Steve


Thanks Steve. As our user interface allows access to the full Lucene query syntax, I'll hold off on this for now.

Paul
