Relevance Search with Elastic – Part 2

Ajit Gadge I Senior Database Consultant, Ashnik
Singapore, 13 Jul 2018

In my previous article, Customizing relevancy in Elastic – Part 1, I talked about the factors that matter when searching the web or your own databases. I also discussed the various algorithms used to search your content. One of the most widely used methods is TF-IDF, which I covered in detail with examples in that article, if you’d like to refer to it.

Continuing the Elastic series, today I’m writing about how to search word and text data in your databases using Elastic.

To showcase this, I have loaded a sample data set of about 100,000 documents containing data on the employees of an organization. Each document contains FirstName, LastName, Designation, Salary, DateOfJoining, Address and so on. I downloaded this data from the web and ingested it into my Elasticsearch cluster.

Take a look below:

[Screenshot: sample employee documents in the index]

1. Multi Field search 

Many times, we know what to search for but are not sure where to search. For example, here I would like to search for the name ‘Scott’, but I am not sure whether Scott is a FirstName or a LastName. So how do I go about it? I have to search for Scott in both fields efficiently.

I can write a DSL query that searches for the word ‘Scott’ in both fields or, for that matter, in any field of the given documents.

Query =>

GET companydatabase/_search
{
  "query": {
    "multi_match": {
      "query": "Scott",
      "fields": [
        "FirstName",
        "LastName"
      ],
      "type": "best_fields"
    }
  }
}

Result =>

[Screenshot: results of the multi-field search]

As you can see above, ‘Scott’ appears in FirstName as well as LastName. Note that the search is case-insensitive, so it doesn’t matter whether you search for “SCOTT” or “Scott” (assuming the fields use the default standard analyzer, which lowercases text at both index and search time).

Look at the _score field. It is calculated by Elasticsearch’s relevance scoring, which builds on the TF-IDF method we discussed last time (since Elasticsearch 5.0 the default similarity is BM25, a refinement of TF-IDF).

Also note that I have specified “type”: “best_fields”. Here the FirstName and LastName fields compete with each other (the word “Scott” can occur in both), and each document is scored by its single best-matching field; that score decides where it appears in our list. Please see the example for more details here.
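To make the best_fields behaviour concrete, here is a minimal Python sketch of how one document’s score could be combined from its per-field scores. It assumes the documented dis_max-style rule (the best field’s score, plus tie_breaker times the scores of the other matching fields). The function name and the sample scores are hypothetical and are not taken from the cluster above.

```python
def best_fields_score(field_scores, tie_breaker=0.0):
    """Combine per-field scores the way a best_fields multi_match is documented to.

    field_scores: hypothetical mapping of field name -> score for one document.
    tie_breaker:  0.0 (the default) means only the best field counts.
    """
    if not field_scores:
        return 0.0
    best = max(field_scores.values())
    others = sum(field_scores.values()) - best
    return best + tie_breaker * others


# Illustrative scores only; real values depend on the index contents.
doc = {"FirstName": 6.0, "LastName": 2.0}
print(best_fields_score(doc))                   # only the best field counts
print(best_fields_score(doc, tie_breaker=0.5))  # the other field adds half its score
```

With the default tie_breaker of 0.0, a document that matches in both fields scores no higher than one that matches only in its best field, which is why the fields are said to compete.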

2. Boosting your results in search

In the same example, I would now like to search for the word ‘Scott’ in FirstName and LastName, but give priority to the FirstName field over LastName. I can append “^” (caret) and a number to a field name to multiply that field’s score by that number.

Query =>

GET companydatabase/_search
{
  "query": {
    "multi_match": {
      "query": "Scott",
      "fields": [
        "FirstName^2",
        "LastName"
      ],
      "type": "best_fields"
    }
  }
}

Result =>

[Screenshot: results with FirstName boosted]

This time the _score for matches on FirstName is boosted by 2, so they appear ahead of matches on LastName. Observe that the _score is now double the earlier _score.
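If you build such queries from application code, the boost suffix is easy to generate. The following is a small Python sketch that only assembles the request body as a dictionary; the helper names boosted_fields and multi_match_body are my own, and sending the body to the cluster is left to whichever client you use.

```python
import json


def boosted_fields(boosts):
    """Turn {"FirstName": 2, "LastName": 1} into ["FirstName^2", "LastName"]."""
    fields = []
    for name, boost in boosts.items():
        # A boost of 1 is the neutral default, so the suffix can be omitted.
        fields.append(name if boost == 1 else f"{name}^{boost}")
    return fields


def multi_match_body(text, boosts, match_type="best_fields"):
    """Build the request body for a multi_match query with per-field boosts."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": boosted_fields(boosts),
                "type": match_type,
            }
        }
    }


body = multi_match_body("Scott", {"FirstName": 2, "LastName": 1})
print(json.dumps(body, indent=2))
```

Keeping the boosts in one mapping makes it simple to tune them later without touching the query structure.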

3. Phrase Prefix

Sometimes we are not sure whether we want to search for ‘Scott’ or whether it might be ‘Scotting’ or ‘Scotten’. In this case, you can use “type”: “phrase_prefix” so that the search treats the query as a prefix and also matches words that begin with it. Here’s how:

Query =>

GET companydatabase/_search
{
  "query": {
    "multi_match": {
      "query": "Scott",
      "fields": [
        "FirstName",
        "LastName"
      ],
      "type": "phrase_prefix"
    }
  }
}

Result =>

[Screenshot: results of the phrase_prefix search]
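To illustrate what a phrase prefix match means, here is a rough Python sketch that mimics the idea on plain strings: every query word except the last must match exactly and in order, while the last word only has to be a prefix. The tokenizer is a simplified stand-in for a real analyzer and the function names are hypothetical; Elasticsearch itself works on the inverted index, not like this.

```python
import re


def tokens(text):
    """Lowercase and split on non-alphanumerics (a rough stand-in for an analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_phrase_prefix(field_value, query):
    """True if the query words occur adjacent and in order in field_value,
    with the last query word allowed to match as a prefix."""
    doc, q = tokens(field_value), tokens(query)
    if not q:
        return False
    for start in range(len(doc) - len(q) + 1):
        window = doc[start:start + len(q)]
        if window[:-1] == q[:-1] and window[-1].startswith(q[-1]):
            return True
    return False


for name in ["Scott", "Scotten", "Prescott"]:
    print(name, matches_phrase_prefix(name, "Scott"))
```

Note that ‘Prescott’ does not match, because the query is treated as the beginning of a word, not as an arbitrary substring.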

4. Fuzziness

We might make a spelling mistake while searching, and that’s okay. You can tolerate such spelling mistakes by adding fuzziness to the search.

Let’s say, for example, I would like to search for records that contain the word “Douglasville”. I can write a query that looks for exactly that word.

Query =>

GET companydatabase/_search
{
  "query": {
    "match": {
      "Address": "Douglasville"
    }
  }
}

It gives me results with hits on 122 documents.

In the same query, if I make a spelling error and search for “Douglasvilla” instead of “Douglasville”, I get 0 documents.

But if I add fuzziness, the query tolerates spelling errors within an edit distance of 1 (one character changed, added or removed) and still gives me the results.

Query =>

GET companydatabase/_search
{
  "query": {
    "match": {
      "Address": {
        "query": "Douglasvilla",
        "fuzziness": 1
      }
    }
  }
}

Result =>

[Screenshot: results of the fuzzy search for “Douglasvilla”]

Observe the result: with the fuzziness option added, I still get 122 hits despite the spelling error.

You can also search for multiple words with fuzziness. The fuzziness applies to each word separately, so if I search the Address field with 2 words and misspell both of them, a fuzziness of 1 is still enough.

If I want to search for “Buttonwood” along with “Douglasville”, here is my query.

Query =>

GET companydatabase/_search
{
  "query": {
    "match": {
      "Address": {
        "query": "Butonwood Douglasvilla",
        "fuzziness": 1
      }
    }
  }
}

Result =>

[Screenshot: results of the fuzzy search on two misspelled words]

Observe the result.

You can raise the fuzziness to 2, which is the maximum edit distance Elasticsearch accepts, but be careful: a larger edit distance can confuse short words, such as “hi” with “him” or “jim”.
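The edit distance behind fuzziness can be checked by hand. Below is a minimal Python sketch of the distance calculation, counting an insertion, deletion, substitution or swap of two adjacent characters as one edit each. It is only an illustration of the measure; Elasticsearch does not compute it this way internally, and the function names are my own.

```python
def edit_distance(a, b):
    """Optimal string alignment distance: insertions, deletions, substitutions
    and swaps of two adjacent characters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


def fuzzy_match(term, candidate, fuzziness=1):
    """True if candidate is within `fuzziness` edits of term (case-insensitive)."""
    return edit_distance(term.lower(), candidate.lower()) <= fuzziness


pairs = [("Douglasvilla", "Douglasville"), ("Butonwood", "Buttonwood"),
         ("hi", "him"), ("hi", "jim")]
for typed, indexed in pairs:
    print(typed, indexed, edit_distance(typed.lower(), indexed.lower()))
```

Both misspellings from the queries above are one edit away from the indexed words, while “hi” and “jim” only meet at a distance of 2, which is why a higher fuzziness starts to pull in unrelated short words.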

5. Must match with a filter

In the same example, I would now like to search for FirstName ‘Scott’, but only for those records of people who live in ‘Floral Park’. You can write this as a bool query in which FirstName must match ‘Scott’, with a filter on Address for ‘Floral Park’.

You can offer these kinds of facets while searching and thus narrow the search down to the most relevant results.

Query =>

GET companydatabase/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "FirstName": "Scott"
        }
      },
      "filter": {
        "match": {
          "Address": "Floral Park"
        }
      }
    }
  }
}

Result:

[Screenshot: results of the must match with filter]

Look at the result: earlier there were 46 documents with ‘Scott’; now, with this filter on Address, there is only 1 document. You can narrow down your results by applying such filters.
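Here is a hedged Python sketch that assembles the same kind of bool query as a dictionary. The helper name must_with_filter and its require_all_filter_words flag are my own additions. The flag switches the filter to the long form of match with “operator”: “and”, because a match on “Floral Park” by default accepts documents containing either word; whether you need that depends on your data.

```python
import json


def must_with_filter(must_field, must_text, filter_field, filter_text,
                     require_all_filter_words=False):
    """Build a bool query: `must` contributes to _score, `filter` only narrows."""
    filter_value = filter_text
    if require_all_filter_words:
        # The long form of match can require every word ("and") instead of
        # the default of any word ("or").
        filter_value = {"query": filter_text, "operator": "and"}
    return {
        "query": {
            "bool": {
                "must": {"match": {must_field: must_text}},
                "filter": {"match": {filter_field: filter_value}},
            }
        }
    }


body = must_with_filter("FirstName", "Scott", "Address", "Floral Park",
                        require_all_filter_words=True)
print(json.dumps(body, indent=2))
```

Because the Address clause sits under filter, it decides only whether a document is included; the _score still comes from the FirstName match alone.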

6. Sorting your results

As you have seen, results are sorted by the _score field by default, which puts the best matches first. But what if you would like to sort the results by a field of your own choosing?
Below are a few examples:

Search on FirstName with “Scott” but sort by DateOfJoining.

Query =>

GET companydatabase/_search
{
  "query": {
    "match": {
      "FirstName": "Scott"
    }
  },
  "sort": [
    {
      "DateOfJoining": {
        "order": "desc"
      }
    }
  ]
}

Result =>

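[Screenshot: results sorted by DateOfJoining]

In the above result, observe that _score is null: when results are sorted by a field, Elasticsearch does not compute scores by default. The top hit is a trainee, but he is the most recent joiner among these documents.

You can sort on multiple fields as well.

Sort on DateOfJoining and Salary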

Query =>

GET companydatabase/_search
{
  "query": {
    "match": {
      "FirstName": "Scott"
    }
  },
  "sort": [
    {
      "DateOfJoining": {
        "order": "desc"
      }
    },
    {
      "Salary": {
        "order": "asc"
      }
    }
  ]
}
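With more than one sort field, the second field only matters when two documents tie on the first. The Python sketch below reproduces that ordering locally on a few records. The records are invented purely for illustration and do not come from my data set.

```python
from datetime import date

# Hypothetical hits, invented for illustration only.
hits = [
    {"FirstName": "Scott", "DateOfJoining": date(2017, 5, 1), "Salary": 70000},
    {"FirstName": "Scott", "DateOfJoining": date(2018, 3, 9), "Salary": 52000},
    {"FirstName": "Scott", "DateOfJoining": date(2018, 3, 9), "Salary": 48000},
]


def sort_hits(records):
    """Order by DateOfJoining descending, then Salary ascending for ties."""
    # Python's sort is stable, so sort by the secondary key first,
    # then by the primary key.
    by_salary = sorted(records, key=lambda r: r["Salary"])
    return sorted(by_salary, key=lambda r: r["DateOfJoining"], reverse=True)


for r in sort_hits(hits):
    print(r["DateOfJoining"], r["Salary"])
```

The two records that share a joining date come first, ordered by the lower salary, and the older record comes last regardless of its salary.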

7. Highlights

While searching, we may want to highlight the matched text in certain fields so that it stands out when the results are displayed on a web page. You can do that by adding a “highlight” section to the query.

Query =>

GET companydatabase/_search
{
  "query": {
    "match_phrase": {
      "FirstName": "Scott"
    }
  },
  "highlight": {
    "fields": {
      "FirstName": {}
    }
  }
}

Result =>

[Screenshot: results with ‘Scott’ highlighted]

Observe in the above result that ‘Scott’ is highlighted (by default, the matched text is wrapped in <em> tags).

So these were some methods for searching words and text data in your databases using Elastic. I hope this helps you build an application that provides relevant search using Elasticsearch.
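On the application side, the highlighted fragments arrive in a highlight object next to each hit. The following Python sketch pulls them out of a response that has already been parsed into a dictionary. The sample response is a trimmed, hypothetical one with invented _id values, and the helper name is my own.

```python
def highlighted_fragments(response, field):
    """Collect the highlight fragments for `field` from a search response dict."""
    fragments = []
    for hit in response.get("hits", {}).get("hits", []):
        # Hits without a highlight for this field simply contribute nothing.
        fragments.extend(hit.get("highlight", {}).get(field, []))
    return fragments


# Hypothetical, trimmed-down response shaped like a search result.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"FirstName": "Scott"},
             "highlight": {"FirstName": ["<em>Scott</em>"]}},
            {"_id": "2", "_source": {"FirstName": "Scott"}},
        ]
    }
}

print(highlighted_fragments(sample_response, "FirstName"))
```

The fragments can then be inserted into your web page as they are, with the <em> tags styled however you like.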

If you need any enterprise-level guidance on Elastic products or suite, do get in touch with us. Our team will be happy to help!

Ajit Gadge – Senior Database Consultant


Ajit brings over 16 years of solid experience in solution architecting and implementation. He has successfully delivered solutions on various database technologies including PostgreSQL, MySQL and SQL Server. He derives his strength from his passion for database technologies and his love of pre-sales engagement with customers. Ajit has successfully engaged with customers from across the globe, including South East Asia, the USA and India. His commitment and ability to find solutions have earned him great respect from customers. Prior to Ashnik, he worked with EnterpriseDB for 7 years and helped it grow in South East Asia and India.

