Improving the ‘Relevance’ of Website Search Queries using Elasticsearch

Relevance is essential when you want to serve users data that meets their needs. It is key to any business: a good way to engage your audience is to satisfy your customers' needs by giving them what they are searching for.

Bad search can noticeably reduce the traffic on your platform, because users have been trained to expect results of the quality they get from Google. That standard is not easy to reach, but with Elasticsearch you can get closer to it by tuning your search results using multiple factors.

Scoring in Elasticsearch

Elasticsearch uses a scoring system to rank the query results that are displayed to users. Scoring is based on how well fields match the search query, along with other configuration that we will talk about later in this article. Elasticsearch calculates a score for each document and uses it as a relevance factor to sort the documents: the higher the score, the more relevant the document. Each clause in the query contributes to the score of the document.

The Practical Scoring Function

Since Elasticsearch is built on the Lucene library, it calculates the relevance score with Lucene’s Practical Scoring Function. This function combines the boolean model, term frequency (TF), inverse document frequency (IDF), and the vector space model for multi-term queries to collect the matching documents and score them as it goes.

For example, a multi-term query like

GET /my_index/doc/_search
{
  "query": {
    "match": {
      "text": "quick fox"
    }
  }
}

is rewritten internally to look like this:

GET /my_index/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {"term": { "text": "quick" }},
        {"term": { "text": "fox"   }}
      ]
    }
  }
}
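The rewrite above can be mimicked in a few lines of Python. This is only a sketch: the helper name match_to_bool is hypothetical, and lowercasing plus whitespace splitting stands in for the field's real analyzer, which Elasticsearch applies before building the term clauses.

```python
import json


def match_to_bool(field, text):
    """Expand a match query into a bool/should of term queries.

    Assumption: lowercasing and splitting on whitespace approximates
    the analyzer configured for the field.
    """
    terms = text.lower().split()
    should = [{"term": {field: term}} for term in terms]
    return {"query": {"bool": {"should": should}}}


rewritten = match_to_bool("text", "quick fox")
print(json.dumps(rewritten, indent=2))
```

Because every term ends up in a should clause, a document can match on just one of the terms, and each matching clause adds to its score.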

When a document matches the query, Lucene calculates the score by combining the score of each matching term. This scoring calculation is done by the practical scoring function.

score(q,d)  =  
           queryNorm(q)  
          · coord(q,d)    
          · ∑ (           
                tf(t in d)   
              · idf(t)²      
              · t.getBoost() 
              · norm(t,d)    
            ) (t in q)

score(q,d) is the relevance score of document d for query q.
queryNorm(q) is the query normalization factor.
coord(q,d) is the coordination factor.
∑ ( ... ) (t in q) is the sum of the weights for each term t in the query q for document d.
- tf(t in d) is the term frequency for term t in document d.
- idf(t) is the inverse document frequency for term t.
- t.getBoost() is the boost that has been applied to the query.
- norm(t,d) is the field-length norm, combined with the index-time field-level boost, if any.

Now, let’s get more familiar with each of the scoring mechanisms that make up the Practical Scoring Function:

Term frequency (tf): This counts the number of times the term appears in a field of a document and returns its square root:

tf = sqrt(termFreq)

The idea behind term frequency is that the more times a term appears in a document, the more relevant the document is.

Inverse document frequency (idf): This is one plus the natural log (as in “logarithm”, not “log file”) of the number of documents in the index divided by one more than the number of documents that contain the term:

idf = 1 + ln(maxDocs/(docFreq + 1))

A common term such as ‘the’, which appears in almost all documents, should be considered less important than terms that appear in fewer documents. This is the reason for calculating inverse document frequency.
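To get a feel for the two formulas above, here is a small worked example in Python. The corpus size and document frequencies are made-up numbers chosen for illustration, and the function names are just labels for the formulas as written in this article.

```python
import math


def tf(term_freq):
    """Classic term frequency: square root of the raw count."""
    return math.sqrt(term_freq)


def idf(max_docs, doc_freq):
    """Classic inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


# Hypothetical index of 1,000 documents.
common = idf(max_docs=1000, doc_freq=990)  # a word such as "the"
rare = idf(max_docs=1000, doc_freq=9)      # a word found in only 9 documents

print(f"tf(1) = {tf(1):.2f}, tf(4) = {tf(4):.2f}")
print(f"idf(common) = {common:.3f}")
print(f"idf(rare)   = {rare:.3f}")
```

The square root dampens term frequency, so four occurrences count only twice as much as one, while the rare term receives a much larger idf than the common one.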

Coordination (coord): Counts the number of terms from the query that appear in the document. With the coordination mechanism, if we have a 3-term query and a document contains 2 of those terms, then it will be scored higher than a document that has only 1 of those terms.

Field length normalization (norm): This is the inverse square root of the number of terms in the field:

norm = 1/sqrt(numFieldTerms)

With field length normalization, a document whose field contains the query term among only a few terms is considered more relevant than a document whose field contains the same term among many more terms.
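The two factors above can be sketched as follows. The coord formula used here (matching terms divided by total query terms) is an assumption based on Lucene's classic similarity, since this article only describes it in words; the function names are illustrative.

```python
import math


def coord(matching_terms, query_terms):
    """Fraction of the query terms that appear in the document."""
    return matching_terms / query_terms


def field_norm(num_field_terms):
    """Field-length norm: 1 / sqrt(number of terms in the field)."""
    return 1.0 / math.sqrt(num_field_terms)


# A 3-term query: a document matching 2 terms beats one matching 1 term.
two_of_three = coord(2, 3)
one_of_three = coord(1, 3)

# The same term counts for more in a 4-term title than in a 100-term body.
short_field = field_norm(4)
long_field = field_norm(100)

print(f"coord: {two_of_three:.3f} vs {one_of_three:.3f}")
print(f"norm:  {short_field:.2f} vs {long_field:.2f}")
```

Both factors are multipliers, so a short field that contains most of the query terms is rewarded twice.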

Query normalization (queryNorm): This is typically the inverse square root of the sum of the squared weights of the terms in the query. It is an attempt to make the scores of different queries comparable.
Index boost: This is a percentage or absolute number used to boost any field at index time. Note that in practice an index boost is combined with the field length normalization so that only a single number will be stored for both in the index; however, Elasticsearch strongly recommends against using index-level boosts since there are many adverse effects associated with this mechanism.
Query boost: This is a percentage or absolute number that can be used to boost any query clause at query time. Query boosting allows us to indicate that some part(s) of the query should be more important than other parts.
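Putting the pieces together, the following Python sketch evaluates the Practical Scoring Function on a toy corpus. It is a simplified illustration, not Lucene's implementation: the corpus, the function name practical_score, and the whitespace tokenization are all assumptions, and it ignores details such as the one-byte encoding of the norm.

```python
import math


def tf(freq):
    return math.sqrt(freq)


def idf(max_docs, doc_freq):
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def practical_score(query_terms, doc_terms, doc_freqs, max_docs, boosts=None):
    """score(q,d) = queryNorm * coord * sum(tf * idf^2 * boost * norm)."""
    boosts = boosts or {}
    # queryNorm: 1 / sqrt(sum of squared (idf * boost) term weights).
    weights = [idf(max_docs, doc_freqs[t]) * boosts.get(t, 1.0) for t in query_terms]
    query_norm = 1.0 / math.sqrt(sum(w * w for w in weights))
    matched = [t for t in query_terms if t in doc_terms]
    coord = len(matched) / len(query_terms)
    norm = 1.0 / math.sqrt(len(doc_terms))
    total = sum(
        tf(doc_terms.count(t)) * idf(max_docs, doc_freqs[t]) ** 2
        * boosts.get(t, 1.0) * norm
        for t in matched
    )
    return query_norm * coord * total


# Hypothetical three-document corpus, tokenized on whitespace.
corpus = {
    "a": "the quick brown fox".split(),
    "b": "the quick dog sleeps all day long".split(),
    "c": "a slow green turtle".split(),
}
query = ["quick", "fox"]
doc_freqs = {t: sum(t in doc for doc in corpus.values()) for t in query}
scores = {
    name: practical_score(query, doc, doc_freqs, len(corpus))
    for name, doc in corpus.items()
}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(name, round(score, 4))
```

Document a ranks first because it matches both terms in a short field, document b matches only one term in a longer field, and document c does not match at all.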

Explain API:

Now we know how Elasticsearch’s Practical Scoring Function works. Before going any further, let’s talk about an important tool for debugging the score of any query in Elasticsearch: “explain”.

The Explain API returns information about why a specific document matches (or doesn’t match) a query. For example:

GET /twitter/_explain/0
{
  "query": {
    "match": { "message": "elasticsearch" }
  }
}

The API returns the following response:

{
   "_index":"twitter",
   "_type":"_doc",
   "_id":"0",
   "matched":true,
   "explanation":{
      "value":1.6943598,
      "description":"weight(message:elasticsearch in 0) [PerFieldSimilarity], result of:",
      "details":[
         {
            "value":1.6943598,
            "description":"score(freq=1.0), computed as boost * idf * tf from:",
            "details":[
               {
                  "value":2.2,
                  "description":"boost",
                  "details":[]
               },
               {
                  "value":1.3862944,
                  "description":"idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                  "details":[
                     {
                        "value":1,
                        "description":"n, number of documents containing term",
                        "details":[]
                     },
                     {
                        "value":5,
                        "description":"N, total number of documents with field",
                        "details":[]
                     }
                  ]
               },
               {
                  "value":0.5555556,
                  "description":"tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                  "details":[
                     {
                        "value":1.0,
                        "description":"freq, occurrences of term within document",
                        "details":[]
                     },
                     {
                        "value":1.2,
                        "description":"k1, term saturation parameter",
                        "details":[]
                     },
                     {
                        "value":0.75,
                        "description":"b, length normalization parameter",
                        "details":[]
                     },
                     {
                        "value":3.0,
                        "description":"dl, length of field",
                        "details":[]
                     },
                     {
                        "value":5.4,
                        "description":"avgdl, average length of field",
                        "details":[]
                     }
                  ]
               }
            ]
         }
      ]
   }
}

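As you can see, the response gives a detailed view of how each factor contributes to the score for a query. Note that the idf and tf formulas in this response (with the parameters k1 and b) belong to BM25, the default similarity in Elasticsearch since version 5.0, so they differ from the classic formulas described earlier. The Explain API therefore helps you understand both whether a document matches the search query and why it received its score.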
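The numbers in the response can be checked by hand. The Python sketch below plugs the values from the response into the formulas quoted in its description strings; the function names are illustrative, and small differences in the last digit are possible because Lucene works with single-precision floats.

```python
import math


def bm25_idf(n, N):
    """idf as quoted in the response: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


def bm25_tf(freq, k1, b, dl, avgdl):
    """tf as quoted: freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


boost = 2.2
idf = bm25_idf(n=1, N=5)
tf = bm25_tf(freq=1.0, k1=1.2, b=0.75, dl=3.0, avgdl=5.4)
score = boost * idf * tf

print(f"idf   = {idf:.7f}")
print(f"tf    = {tf:.7f}")
print(f"score = {score:.7f}")
```

Multiplying the three leaf values reproduces the top-level score, which is a handy sanity check when reading longer explanations.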

Factors Affecting Relevance :

Now let’s take a look at other factors that help us in tuning the relevancy.

Indexing & Mapping : How you index your data has a major impact on your search. By default, Elasticsearch indexes all fields and dynamically maps each one to a matching data type. You can also define the mapping explicitly, depending on whether you want to use a field for full-text search or for keyword (exact-value) search.

For example :

PUT /my-index
{
  "mappings": {
    "properties": {
      "age":    { "type": "integer" },  
      "email":  { "type": "keyword"  }, 
      "name":   { "type": "text"  }     
    }
  }
}

Here, age is an integer field, email is a keyword field, and name is a text field.
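The practical difference between the text and keyword types can be simulated in plain Python. This is a rough model rather than Elasticsearch code: analyze_text is a hypothetical stand-in for the standard analyzer (which also handles punctuation and more), and the sample document is made up.

```python
def analyze_text(value):
    """Rough stand-in for an analyzer: lowercase and split on whitespace."""
    return value.lower().split()


def matches_text(field_value, query):
    """Full-text match: any analyzed query token appears in the field's tokens."""
    field_tokens = set(analyze_text(field_value))
    return any(token in field_tokens for token in analyze_text(query))


def matches_keyword(field_value, query):
    """Keyword match: the stored value must equal the query exactly."""
    return field_value == query


doc = {"age": 31, "email": "jane.doe@example.com", "name": "Jane Doe"}

name_hit = matches_text(doc["name"], "jane")
email_exact = matches_keyword(doc["email"], "jane.doe@example.com")
email_partial = matches_keyword(doc["email"], "jane")

print(name_hit, email_exact, email_partial)
```

A text field is broken into tokens, so a partial, case-insensitive query can match it, while a keyword field only matches its complete value.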

Boosting : Fields can be boosted to raise the relevance score of the documents that matter most in a specific scenario. Consider a blog website where you want documents whose title matches the search query to rank above documents where only the body matches. For such scenarios, boosting fields can play an important role.

Boosting can be done at both index time and query time. See the example below:

Index time boosting :

PUT blog-index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "boost": 2 
      },
      "body": {
        "type": "text"
      }
    }
  }
}

Query time boosting :

POST _search
{
    "query": {
        "match" : {
            "title": {
                "query": "quick brown fox",
                "boost": 2
            }
        }
    }
}

In both cases, when the query matches the title field, that match contributes twice as much to the score as an equivalent match on the body field.
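When a query should search the title and the body together, the boost can be attached per field. The Python sketch below builds such a request body using the multi_match query and the field^boost syntax from the Query DSL; the helper name boosted_search and the field names are assumptions for illustration, and the resulting dictionary would still need to be sent to the _search endpoint by a client of your choice.

```python
import json


def boosted_search(text, field_boosts):
    """Build a multi_match body where each field carries a query-time boost."""
    fields = [
        field if boost == 1 else f"{field}^{boost}"
        for field, boost in field_boosts.items()
    ]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


body = boosted_search("quick brown fox", {"title": 2, "body": 1})
print(json.dumps(body, indent=2))
```

Because the boost lives in the request, it can be adjusted between searches without touching the index.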

Index-time boosting is often not advisable, for the following reasons:

Changing index-time boost values requires reindexing all documents, whereas query-time boost values can be changed at any time to reach the required relevancy.
Index-time boosts are stored as part of the norm, which is only one byte. This reduces the resolution of the field length normalization factor which can lead to lower quality relevance calculations.

Synonyms : Setting up synonyms is useful when you want multiple words to be treated as the same concept and to retrieve the same documents. For example, when users search for any of the terms [ ‘water’, ‘rain’, ‘ocean’ ], you may want them to get the same set of documents because, in your case, the terms all share the same meaning.

Similar to boosting, synonyms can be applied at index time or at query time. Let’s look at the differences:

Index-time synonyms result in a bigger index, because all synonyms must be indexed, whereas query-time synonyms have no effect on index size.
Search scoring, which relies on term statistics, might suffer with index-time synonyms because the synonyms are also counted, which skews the statistics for less common words. With query-time synonyms, the term statistics in the corpus stay the same.
Changing the synonym rules requires reindexing with index-time synonyms, which is not the case with query-time synonyms.
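As an illustration of the query-time approach, the Python sketch below assembles an index definition in which synonyms are applied only by the search analyzer. It relies on the synonym_graph token filter and the search_analyzer mapping parameter; the names my_synonyms, my_search_analyzer and the body field are hypothetical, and the dictionary is only built and printed here, not sent to a cluster.

```python
import json


def synonym_index_body(synonyms):
    """Index body that expands synonyms at query time only."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "my_synonyms": {"type": "synonym_graph", "synonyms": synonyms}
                },
                "analyzer": {
                    "my_search_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "my_synonyms"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "body": {
                    "type": "text",
                    # Documents are indexed without synonym expansion...
                    "analyzer": "standard",
                    # ...and synonyms are applied only to the search terms.
                    "search_analyzer": "my_search_analyzer",
                }
            }
        },
    }


index_body = synonym_index_body(["water, rain, ocean"])
print(json.dumps(index_body, indent=2))
```

With this layout the indexed terms, and therefore the term statistics, are the same as they would be without synonyms.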

Summary:

Elasticsearch is a very powerful tool and provides an entire ecosystem around search. This blog only covers the basic capabilities of Elasticsearch for improving search relevance, as this is often the foundation of a great user experience in website search.

The following are the important things covered in the blog:

How Elasticsearch scores documents – The Practical Scoring Function

Factors that can help in tuning the relevance:
Indexing & Mapping
Boosting
Synonyms

Read more about the above discussed topics on Elastic Blogs & Documentation.

References:

Elasticsearch Reference

Elastic Blogs