Solr: Interactions between the defaultOperator, q.op, and mm parameters
The Solr parameters defaultOperator, q.op, and mm are used to configure how optional clauses should be handled in search queries. The interactions between these parameters have changed over time in Solr, resulting in unexpected search results. Users have documented issues in both blog posts and multi-year Jira issues. Further complications may arise through legacy configuration tweaks that were once appropriate in earlier releases but are now causing unexpected behaviors. I intend to detail the relevant history of these parameters and their associated functionality, so that users may better configure Solr to suit their search requirements.
All historical Solr functionality assumptions in this document will use 1.1.0 as the baseline release. For example, it is assumed that the standard and DisMax query parsers have always been available because they were both available in the 1.1.0 release, even though it’s more likely that the query parsers were not created simultaneously.
Relevant Solr History
Solr was started by Yonik Seeley as a closed-source project in 2004. In 2006, the project was transitioned to the Apache Incubator, at which point it was open-sourced. The first Apache release was 1.1.0 on December 22, 2006. Solr releases were semantically versioned through 1.4.1, released June 25, 2010. Solr then merged with the Lucene project, and subsequent versioning followed the Lucene versioning. The first release to use Lucene’s versioning was 3.1.0, released March 31, 2011. As of this document’s creation, the latest Solr release is 7.3.0, released April 4, 2018.
The Solr project has closely tracked changes in its release notes and has included the relevant issue keys when appropriate. These issue keys are numerical and prefixed with SOLR-; e.g., SOLR-1234. The Solr community currently uses Jira for bug and issue tracking. Where appropriate, historical references in the document will link to the relevant issue.
As of today, Solr supports three main query parsers:
- Standard, also known as the “Lucene” parser
- DisMax
- Extended DisMax (eDisMax)
The Standard and DisMax query parsers have always been available in Solr. The Standard query parser is oriented toward Lucene query syntax (wildcards, ranges, boolean operators, etc.). The DisMax query parser is oriented toward simple phrases, similar to a Google search. The Solr documentation describes DisMax syntax as such:
The DisMax query parser supports an extremely simplified subset of the Lucene QueryParser syntax. As in Lucene, quotes can be used to group phrases, and +/- can be used to denote mandatory and optional clauses. All other Lucene query parser special characters (except AND and OR) are escaped to simplify the user experience. The DisMax query parser takes responsibility for building a good query from the user’s input using Boolean clauses containing DisMax queries across fields and boosts specified by the user.
The eDisMax query parser, an improved version of the DisMax query parser, was added in 3.1.0. eDisMax supports all functionality available in DisMax as well as a greater subset of the Lucene query syntax. It is safe to think of eDisMax as a combination of the Standard and DisMax parsers. Unless otherwise noted, all further DisMax references will be interchangeable with eDisMax.
defaultOperator and q.op
The defaultOperator parameter sets the query parser’s default operator. It was defined in the schema.xml file as an attribute in the solrQueryParser element. The valid options are AND or OR. The default value is OR.
The q.op parameter implements the defaultOperator functionality as a client-facing query parameter. If q.op is specified, it will override the defaultOperator value.
defaultOperator and q.op were originally intended for use with the Standard query parser. Both have been available since the initial Solr release. defaultOperator was deprecated in 3.6.0 in favor of the q.op parameter, and was completely removed in 7.0.0. q.op continues to be supported.
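Because q.op is an ordinary request parameter, a client can override the configured default on a per-query basis. The following is a minimal sketch of how such a request URL might be built; the host, port, and core name are placeholders, and build_select_url is a hypothetical helper rather than part of any Solr client library.

```python
from urllib.parse import urlencode

# Hypothetical base URL; the host, port, and core name are placeholders.
SOLR_SELECT_URL = "http://localhost:8983/solr/example_core/select"


def build_select_url(query, default_operator=None):
    """Build a select URL, adding q.op only when the client overrides the default."""
    params = {"q": query, "defType": "lucene"}
    if default_operator is not None:
        if default_operator not in ("AND", "OR"):
            raise ValueError("q.op must be AND or OR")
        params["q.op"] = default_operator
    return SOLR_SELECT_URL + "?" + urlencode(params)


# Without q.op, the server-side default operator applies.
print(build_select_url("ancient art"))
# With q.op=AND, both terms are required for this request only.
print(build_select_url("ancient art", default_operator="AND"))
```

Leaving q.op out of the request keeps whatever default the server is configured with, which is why the helper adds the parameter only on request.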
mm
The mm parameter is a DisMax parameter that makes it possible to require a certain minimum number of optional clauses to match. These clauses are further explained in the mm documentation:
When processing queries, Lucene/Solr recognizes three types of clauses: mandatory, prohibited, and “optional” (also known as “should” clauses). By default, all words or phrases specified in the q parameter are treated as “optional” clauses unless they are preceded by a “+” or a “-“.
The syntax for mm may be expressed as a number, a percentage, or a combination of a number and percentage. The default value is 100%, meaning that all clauses must match.
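To make the arithmetic concrete, here is a simplified sketch of how a plain number or percentage could be turned into a required clause count. It follows the semantics described in the Solr documentation (percentages are rounded down; a negative value states how many clauses may be missing), but min_should_match is a hypothetical helper, not Solr's own code. It does not handle the combined conditional form, and the final clamping step is an assumption of this sketch.

```python
def min_should_match(optional_clause_count, mm):
    """Return how many optional clauses must match for a simple mm value."""
    spec = mm.strip()
    if "<" in spec:
        raise ValueError("conditional mm expressions are not handled in this sketch")
    is_percent = spec.endswith("%")
    value = int(spec[:-1]) if is_percent else int(spec)
    if is_percent:
        # Percentages are rounded down before use.
        amount = (optional_clause_count * abs(value)) // 100
    else:
        amount = abs(value)
    # A negative value states how many clauses may be missing.
    required = optional_clause_count - amount if value < 0 else amount
    # Clamping to the valid range is an assumption of this sketch.
    return max(0, min(optional_clause_count, required))


print(min_should_match(3, "50%"))   # three clauses at 50%
print(min_should_match(4, "50%"))   # four clauses at 50%
print(min_should_match(4, "-25%"))  # up to 25% of four clauses may be missing
```

Rounding down is why a three-term query at 50% needs only one matching clause while a four-term query needs two.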
mm only applies to top-level clauses; sub-groupings of clauses through parentheses are not governed by mm. For example, the query A B (C D) with an mm of 100% will only require three terms, with C and D together counting as the third term. This applies to DisMax and eDisMax. This functionality is not included in the Solr documentation; it was found in a comment on SOLR-2649.
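The top-level counting can be illustrated with a small sketch that splits a query on whitespace outside of parentheses. This is a deliberate simplification for illustration only: top_level_clauses is a hypothetical helper, it ignores quoted phrases and field prefixes, and it is not how Solr actually parses queries.

```python
def top_level_clauses(query):
    """Split a query on whitespace that is not inside parentheses."""
    clauses = []
    current = []
    depth = 0
    for char in query:
        if char == "(":
            depth += 1
        elif char == ")":
            depth = max(0, depth - 1)
        if char.isspace() and depth == 0:
            # Whitespace at depth zero ends the current top-level clause.
            if current:
                clauses.append("".join(current))
                current = []
        else:
            current.append(char)
    if current:
        clauses.append("".join(current))
    return clauses


clauses = top_level_clauses("A B (C D)")
print(clauses, len(clauses))
```

Only the number of top-level clauses feeds into the mm calculation; the contents of the parenthesized group are evaluated separately.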
There are no references to the initial implementation of the mm parameter in the Solr release notes, and it is therefore assumed that mm has always been part of the DisMax query parser.
Interactions between the parameters
DisMax mm configurable by defaultOperator and q.op
Initially, the defaultOperator and q.op parameters were ignored by DisMax query parsers; DisMax would only use the mm parameter when determining optional clause behavior. In release 4.0.0, this behavior was changed so that defaultOperator and q.op could influence optional clauses:
The default logic for the ‘mm’ param of the ‘dismax’ QParser has been changed. If no ‘mm’ param is specified (either in the query, or as a default in solrconfig.xml) then the effective value of the ‘q.op’ param (either in the query or as a default in solrconfig.xml or from the ‘defaultOperator’ option in schema.xml) is used to influence the behavior. If q.op is effectively “AND” then mm=100%. If q.op is effectively “OR” then mm=0%. Users who wish to force the legacy behavior should set a default value for the ‘mm’ param in their solrconfig.xml file.
SOLR-1889 justifies the change as a logical default behavior, especially considering the atypical location of defaultOperator in the schema.xml file, implying a global behavior and not a query parser specific one. However, defaultOperator was deprecated by SOLR-2724 in 3.6.0 before the DisMax behavior was changed. If issue keys are numbered sequentially, the defaultOperator deprecation issue (SOLR-2724) was opened and closed while the DisMax behavior change (SOLR-1889) was active. This may explain why both the global-oriented DisMax behavior and the parsing-oriented deprecation were implemented.
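The fallback order described in the 4.0.0 release note quoted above can be sketched as a small function. The function and argument names are hypothetical, and the sketch only models the decision order from the note (explicit mm, then q.op, then the schema-level defaultOperator, then the OR default); it is not Solr's implementation.

```python
def effective_dismax_mm(mm=None, q_op=None, schema_default_operator=None):
    """Return the mm value DisMax would effectively use from 4.0.0 onward."""
    # An explicit mm (from the query or solrconfig.xml) always wins.
    if mm is not None:
        return mm
    # Otherwise fall back to q.op, then defaultOperator, then the OR default.
    operator = q_op or schema_default_operator or "OR"
    return "100%" if operator.upper() == "AND" else "0%"


print(effective_dismax_mm())                               # nothing configured
print(effective_dismax_mm(schema_default_operator="AND"))  # legacy schema.xml setting
print(effective_dismax_mm(mm="50%", q_op="AND"))           # explicit mm wins
```

The second call shows how a long-forgotten defaultOperator in schema.xml can silently turn every DisMax query into an all-terms-required query once mm is left unset.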
eDisMax boolean operators cause mm to be ignored
The original implementation of eDisMax would ignore mm if the query used any boolean operator except AND (+, -, OR, and NOT), leaving every clause without an explicit operator optional; DisMax was not affected by this behavior. However, ignoring mm could be inappropriate for the query; consider the following eDisMax queries in a cultural heritage Solr index:
Type | Terms | q.op / mm | Row Count | Notes |
---|---|---|---|---|
Lucene | Ancient Art Sculpture | OR | 2,612,261 | |
Lucene | Ancient Art Sculpture Marble | OR | 2,621,552 | Additional optional term increases row count |
Lucene | Ancient Art Sculpture Marble -Greek | OR | 2,613,535 | Negated term reduces row count |
Lucene | Ancient Art Sculpture | AND | 3,324 | |
Lucene | Ancient Art Sculpture Marble | AND | 142 | Additional required term reduces row count |
Lucene | Ancient Art Sculpture Marble -Greek | AND | 93 | Negated term reduces row count |
eDisMax | Ancient Art Sculpture | 50% | 2,612,261 | mm is evaluated to 1 required term; matches Lucene q.op=OR query |
eDisMax | Ancient Art Sculpture Marble | 50% | 296,640 | Additional term increases minimum match from 1 to 2, reduces row count appropriately |
eDisMax | Ancient Art Sculpture Marble -Greek | 50% | 2,613,535 | Unexpected results: negated term causes mm to be ignored; matches Lucene q.op=OR query |
This led to the multi-year issue SOLR-2649, which started on July 12, 2011. A patch was included in Solr 5.5.0 that significantly changed eDisMax handling, and the issue was marked resolved on December 15, 2015. The issues SOLR-8812 and SOLR-9174 were opened to address unset mm parameters, and the subsequent patch was included in Solr 5.5.3. These changes are summarized as follows:
Original Behavior | New Behavior |
---|---|
The default operator (q.op) is hardcoded to OR | The q.op and defaultOperator parameters affect how boolean operators are evaluated |
The mm parameter is ignored if any boolean operators except AND are present | If the mm parameter value is set, it is always used, regardless of the presence of boolean operators. If the mm parameter is not set and the query has boolean operators, a default mm value of 0% is used. |
Jason Hellman also presents a thorough summary with examples in his article Edismax Queries in a post-Solr 5.5 World: The AND, the OR, and the Ugly.
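The before-and-after behavior summarized above can be sketched as two small functions. Both are hypothetical illustrations rather than Solr code: operator detection is reduced to a whitespace token check, exempting AND in the newer path is an assumption carried over from the original behavior, and the fallback to q.op when no operators are present follows the 4.0.0 release note quoted earlier.

```python
def has_non_and_operator(query):
    """Detect OR, NOT, or a leading + / - using a naive token check."""
    for token in query.split():
        if token in ("OR", "NOT") or token[0] in "+-":
            return True
    return False


def legacy_edismax_mm(query, mm):
    """Original behavior: return the applied mm, or None when mm is ignored."""
    return None if has_non_and_operator(query) else mm


def current_edismax_mm(query, mm=None, q_op="OR"):
    """Newer behavior: an explicit mm is always used."""
    if mm is not None:
        return mm
    if has_non_and_operator(query):
        return "0%"
    # Assumption: with no operators present, mm is derived from q.op.
    return "100%" if q_op == "AND" else "0%"


query = "Ancient Art Sculpture Marble -Greek"
print(legacy_edismax_mm(query, "50%"))   # mm ignored because of the negated term
print(current_edismax_mm(query, "50%"))  # mm applied despite the negated term
```

The last two lines reproduce the situation in the final row of the query table: the same request carries mm=50%, but only the newer behavior honors it.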
Practical Considerations
- The mm parameter will affect all eDisMax queries. mm can be considered as having been moved from a ‘back-end’ parameter to a ‘front-end’ one; that is, a default mm value that had been set in solrconfig.xml may now be impractical.
- Boolean operators in eDisMax queries are now subject to mm values, which may significantly affect queries. The query A OR B parses two optional terms in both q.op=OR and q.op=AND, but an mm=100% that was previously ignored will now require both terms to be present. A thorough understanding of how Solr parses boolean operators in queries is required; read Chris “Hoss” Hostetter’s Boolean Operators for Solr Users for an excellent overview.
- A defaultOperator or q.op parameter set to AND may significantly affect eDisMax queries, especially with boolean operators. The query A OR B C with q.op=AND will mark A and B as optional and C as required, but an mm=100% value will require all three terms. As mm only applies to top-level clauses, (A OR B) C will reproduce the original behavior. Chris “Hoss” Hostetter, an active Solr contributor, advises users not to change the default operator in a January 3, 2012 comment from the blog post Boolean Operators for Solr Users.
- Solr instances that have upgraded through multiple versions should be checked for the deprecated defaultOperator parameter.
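As a starting point for the last check, the following sketch looks for a defaultOperator attribute on a solrQueryParser element in schema XML. The sample schema content and the find_default_operator helper are made up for illustration; a real schema.xml will contain far more than this.

```python
import xml.etree.ElementTree as ET

# Made-up sample resembling a schema.xml that still carries the legacy setting.
LEGACY_SCHEMA = """<schema name="example" version="1.5">
  <solrQueryParser defaultOperator="AND"/>
</schema>"""


def find_default_operator(schema_xml):
    """Return the defaultOperator value if one is present, otherwise None."""
    root = ET.fromstring(schema_xml)
    for element in root.iter("solrQueryParser"):
        value = element.get("defaultOperator")
        if value is not None:
            return value
    return None


operator = find_default_operator(LEGACY_SCHEMA)
if operator is not None:
    print("Deprecated defaultOperator found:", operator)
```

For a file on disk, ET.parse(path).getroot() could replace the ET.fromstring call. Any value found this way is worth reviewing against the mm and q.op defaults in solrconfig.xml before upgrading.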