When you are responsible for a search engine, you should know as much as you can about search relevance. Most people don’t need to learn every detail of how search works, but you should at least have a basic understanding of recall and precision. This article focuses on precision and recall with regard to search relevance.
What is Relevance?
• Are you able to find all the documents you are looking for?
• How many irrelevant documents are returned?
• How well are the documents ranked?
Precision vs. Recall
Precision and recall are two fundamental measures of search relevance. Given a particular query and the set of documents returned by the search engine (the result set), these measures are defined as follows:
- Precision is the percentage of documents in the result set that are relevant.
- Recall is the percentage of relevant documents that are returned in the result set.
Put another way, precision is the number of relevant documents retrieved divided by the total number of documents retrieved. Recall is the number of relevant documents retrieved divided by the total number of relevant documents.
The ideal outcome with Elasticsearch is that a search retrieves all of the relevant documents and only the relevant documents. To optimize recall specifically, you need to retrieve as many documents as possible that could be relevant, which usually means using the simplest filters and queries.
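To make the two ratios concrete, here is a minimal sketch in Python. The document IDs and relevance judgments are made up for illustration; in practice the relevant set comes from human judgments or click data.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document IDs in the result set
    relevant:  document IDs judged relevant for the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents returned, 3 of them relevant,
# out of 6 relevant documents in the whole index.
retrieved_ids = ["d1", "d2", "d3", "d4"]
relevant_ids = ["d1", "d2", "d3", "d7", "d8", "d9"]
precision, recall = precision_recall(retrieved_ids, relevant_ids)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.50
```

Returning 0.0 for an empty result set or an empty relevant set is a convention chosen here to avoid dividing by zero.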
Recall
Recall is usually not a problem when it comes to search relevance, as most search algorithms tend to err on the side of returning more documents. They also use relevance scores to rank the results, which gives the impression of precision.
Perfect recall, however, can be achieved by using the “match all” query type provided by Elasticsearch, because it matches every document in the index. When you combine this query type with pre- and post-search filters, or with other query functions that sift and rank documents based on another query or defined logic, you can easily achieve perfect recall.
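As a rough sketch of that combination, the snippet below builds a search body as a plain Python dictionary. The `match_all`, `bool`, `filter`, and `post_filter` keys are part of the Elasticsearch query DSL; the helper name and the `status` field are assumptions made up for this example.

```python
import json


def match_all_search(filters=(), post_filter=None):
    """Build a search body that matches every document, then narrows by filters."""
    body = {
        "query": {
            "bool": {
                "must": {"match_all": {}},  # every document is a candidate
                "filter": list(filters),    # yes/no clauses, no effect on score
            }
        }
    }
    if post_filter is not None:
        # Applied to the hits after the query (and any aggregations) have run.
        body["post_filter"] = post_filter
    return body


# Hypothetical field name and value.
search_body = match_all_search(filters=[{"term": {"status": "published"}}])
print(json.dumps(search_body, indent=2))
```

The resulting dictionary can be sent as the request body of a search call with whichever client you use.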
If you want to query multiple fields at one time, you can use Dis Max through the multi-match query. Dis Max can also be used directly, which allows for the usage of analyzers for analyzed fields.
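The two routes can be sketched as query bodies built in Python. With its default `best_fields` type, a `multi_match` query is executed as a `dis_max` over one `match` query per field, so the two bodies below are intended to be roughly equivalent. The field names and the tie breaker value are assumptions chosen for illustration.

```python
def multi_match_query(text, fields, tie_breaker=0.3):
    """Dis Max through multi_match (best_fields is the default type)."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields),
                "type": "best_fields",
                "tie_breaker": tie_breaker,
            }
        }
    }


def dis_max_query(text, fields, tie_breaker=0.3):
    """Dis Max used directly: one match query per field."""
    return {
        "query": {
            "dis_max": {
                "queries": [{"match": {field: text}} for field in fields],
                "tie_breaker": tie_breaker,
            }
        }
    }


# Hypothetical field names.
via_multi_match = multi_match_query("snake care", ["title", "body"])
direct = dis_max_query("snake care", ["title", "body"])
```

Writing the `dis_max` form by hand is more verbose, but each sub-query can then be tuned separately, for example with its own analyzer or boost.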
There is another relevant term here that you should understand: “fuzziness.” Fuzziness lets you specify the number of edits that can be made to a term while still counting it as a match. For example, if you use the auto setting for fuzziness, a term that contains 3-5 characters can have one of its characters edited and still be considered a match.
This is an incredibly useful feature that can drastically improve your recall, as it is almost certain that not every document you consider relevant will be a perfect match for your term. This approach is especially useful when you have designed your queries with a bucketed approach.
There is, nonetheless, a downside to fuzziness, which can be a big hit to your precision in Elasticsearch. While fuzziness helps you catch all relevant documents, you will also catch a lot of documents that aren’t relevant to your search. For example, if your term is “snake” and you use fuzziness, you could get back documents that feature the word “shake,” which is most likely not what your search was intended to find. To understand why, it helps to know what Elasticsearch does when fuzziness is selected: it expands the set of possible matching tokens based on the Levenshtein distance algorithm, and this can produce many additional matches beyond the original search term. This may be fine if it is what you intended.
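The “snake” and “shake” case can be reproduced with a small sketch of plain Levenshtein distance and the auto rule described above. This is a simplified illustration, not the engine's implementation: Elasticsearch works on analyzed tokens and by default also counts a swap of two adjacent characters as a single edit. The function names are made up for this example.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def auto_max_edits(term):
    """Edits allowed by the auto setting: 0 for 1-2 chars, 1 for 3-5, 2 for longer."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_matches(term, candidate):
    """True if candidate is within the auto edit budget of term."""
    return levenshtein(term, candidate) <= auto_max_edits(term)


print(levenshtein("snake", "shake"))    # 1
print(fuzzy_matches("snake", "shake"))  # True: one substitution, within budget
print(fuzzy_matches("snake", "stare"))  # False: two substitutions needed
```

Because “snake” has five characters, the auto setting allows one edit, which is exactly what it takes to reach “shake.”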
Precision
When you orient your strategy around precision, the goal is to make sure that the documents which are retrieved are as relevant to your search as they can be. You want to apply as many rules as you can in order to fine-tune your queries in an effort to bring back only the results you consider relevant.
Filters are extremely useful for precision when you couple them with queries. A filter can act on its own as a basic yes/no test for retrieving results from the index, but when it is coupled with your queries, the results are generally much more efficient and optimal. Boolean queries are built from “must,” “should,” “must not,” and “filter” clauses.
The relevancy score here is impacted by the number of conditionals that match. The scoring for a boolean search can also be affected by how your Elasticsearch query is structured, because the hierarchy of the conditionals is taken into account. Boolean queries usually demand some type of advanced search user interface in order for your users to construct their query rules.
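A boolean query of this kind can be sketched as a Python dictionary. The `bool` clause names (`must`, `should`, `must_not`, `filter`) and `minimum_should_match` come from the Elasticsearch query DSL; the helper and the field names are assumptions made up for this example.

```python
def bool_query(must=(), should=(), must_not=(), filters=(), minimum_should_match=None):
    """Assemble a bool query, leaving out any clause type that is empty."""
    clauses = {}
    if must:
        clauses["must"] = list(must)          # required, contributes to the score
    if should:
        clauses["should"] = list(should)      # optional, raises the score on match
    if must_not:
        clauses["must_not"] = list(must_not)  # excludes documents, no scoring
    if filters:
        clauses["filter"] = list(filters)     # required, no scoring
    if minimum_should_match is not None:
        clauses["minimum_should_match"] = minimum_should_match
    return {"query": {"bool": clauses}}


# Hypothetical fields: title, tags, status.
precise_query = bool_query(
    must=[{"match": {"title": "snake"}}],
    should=[{"match": {"tags": "reptile"}}],
    must_not=[{"term": {"status": "draft"}}],
    filters=[{"term": {"status": "published"}}],
)
```

Nesting one `bool` query inside another clause is how the hierarchy of conditionals is expressed, and it changes how much each conditional contributes to the final score.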
In order to make sure that the result sets are as relevant and precise as possible, minimum thresholds should be applied that will exclude any documents that do not meet the standard that has been specified.
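One way to apply such a threshold is the `min_score` parameter of the search request body, which drops hits whose score falls below the given value. The sketch below adds it to an existing search body; the helper name, the field, and the threshold of 2.0 are assumptions. Scores are not on a fixed scale and vary from query to query, so a cutoff like this generally has to be tuned per use case.

```python
def with_min_score(search_body, min_score):
    """Return a copy of search_body that excludes hits scoring below min_score."""
    body = dict(search_body)  # shallow copy so the caller's dict is not modified
    body["min_score"] = min_score
    return body


# Hypothetical query and threshold.
base_body = {"query": {"match": {"title": "snake care"}}}
thresholded = with_min_score(base_body, 2.0)
```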
Configuring Elasticsearch can seem complex, but when you know what to look for, which tools to use, and how to use them, it becomes much simpler. Your goal is both recall and precision, and there is a multitude of tools and features you can use to achieve both.
Filters, fuzziness, dis max, match all, and boolean queries can all be used to help balance your search precision and recall. With a little practice, you will be able to tune both so that your users retrieve the relevant documents they need.
Tips for Improving Precision and Recall
Recall can be improved by “widening the net”:
- Using should clauses instead of musts, or ORs instead of ANDs
- Adding fuzziness, stemming, or synonyms
The drawback is that these changes tend to make precision worse.
Precision can be improved by making searches “more exact”:
- Using must clauses instead of shoulds
- Using phrase queries instead of ANDs or ORs
The drawback is that these changes tend to make recall worse.
Search relevancy tuning is needed to find the right balance for each case.
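The trade-off can be seen on a toy example. The sketch below is a tiny in-memory stand-in for a search engine, not Elasticsearch itself: it matches whole words, treating AND like must clauses and OR like should clauses. The documents and relevance judgments are invented for illustration.

```python
# Toy corpus and hypothetical relevance judgments for the query "snake care".
docs = {
    "d1": "snake care guide for beginners",
    "d2": "snake habitat and diet",
    "d3": "python programming tutorial",
    "d4": "lizard care basics",
}
relevant = {"d1", "d2"}


def search(terms, require_all):
    """Return IDs of documents containing all terms (AND) or any term (OR)."""
    combine = all if require_all else any
    return {
        doc_id
        for doc_id, text in docs.items()
        if combine(term in text.split() for term in terms)
    }


def evaluate(retrieved):
    """Return (precision, recall) against the relevant set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


and_results = search(["snake", "care"], require_all=True)   # like must clauses
or_results = search(["snake", "care"], require_all=False)   # like should clauses
print(sorted(and_results), evaluate(and_results))  # ['d1'] (1.0, 0.5)
print(sorted(or_results), evaluate(or_results))
```

The AND search returns only d1, so precision is perfect but half of the relevant documents are missed. The OR search finds both relevant documents but also pulls in d4, so recall rises to 1.0 while precision drops to two thirds.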