Query Optimization in MongoDB

I was inspired to write this article during one project, where I should increase the performance of NoSQL query. At the end, we will be able to see whether tuning the query helps us to go up the speed of execute.

I use the following technologies:

  • MongoDB as a database;
  • MongoDB Compass as a database management system;

What we have?!

I have encountered the problem where I needed to retrieve a particular part of data in MongoDB. The collection name contains about 160 million documents. Later on, you will see how much time it takes without tunning.

Also, it is important to emphasize that each document has several sub-documents or array of sub-documents. We can also check the point of view here that projection helps to go up the amount of query time.

The structure of document
Query for searching data between two dates

I’m going to use the above query, in order to retrieve data with specified parameters like n, d.r, t during one period(between two dates). By the way, I forgot to mention that names of the field were changed, because it helped us to decrease the size of the collection name and raise the speed of query a bit as well.

Let’s go!!!

I am going to run a query in Explain Plan tab, in order to see detailed information about the execution of the winning plan.

Result of above query

Here we can see in the blue circle that this query required 1797175 ms(1797,175 seconds, 29,9 minutes) to be executed. There are several parameters in query performance summary. But, for now, we need Documents Returned and Documents Examined. According to the result, the query returned 370815 documents among examined 160391316. Documents Examined is a number of documents examined during query execution. In our case, it equals 160391316, which means it scanned the entire collection to find matched 370815 documents.
Perfectly, Documents Returned and Documents Examined must be equal!

The description of solution

There are many, various articles and tips how to speed up the query performance. But, I am going to share with you a result of my own experience how I increased time execution for this specific query.
I just followed best practices of indexing in MongoDB.

Detailed result

According to the detailed result, it is apparent that the query used COLLSCAN stage, which is used for a collection scan.

But, we want to achieve IXSCAN stage. Due to the name of stage, it is clear that one of the key role to optimize a query is an index! It is very important to index fields that are used in queries, especially if you have a large dataset in the collection.

Creating Compound Index

Actually, creating the index is a simple thing and there are sufficient information how to do it on the Internet. I will just show my indexes that I created specially for this query.

The list of indexes

There are only compound indexes in the illustrated above picture, because we use several fields in query. By the way, _id index was already created by default by MongoDB.
P.S. If you are going to create an index in collection, where the count of documents are closer to today’s size, then the creating index will take a lot of time.

The result of compound index

So let’s see the result after adding indexes.

The query execution after adding index

It is incredible! The speed of execution rocketed rapidly. Now it takes 121ms (FETCH) and 249ms(IXSCAN). In sum, it gives 360ms with comparison of 1797175 ms(1797,175 seconds, 29,9 minutes) without index.

It was our first improvement. Let’s do a second one.

Using Covered Query

Covered queries return results from an index directly without having to access the source documents, and are therefore very efficient.

For a query to be covered all the fields needed for filtering, sorting and/or being returned to the client must be present in an index. In order to figure out, whether the query is covered, totalDocsExamined should be 0. The common gotcha during using covered query is _id field. It returns automatically by default, so it should be explicitly excluded from the result.

So how can we use covered query?

That’s simple. In order to get covered query, it should be used projection. The projection allows us to limit fields in the result by selecting them. We used t, n, d.r in the above query. Hence, we have to project them, but, be aware of default _id field. It has to be excluded.

Example of projection

The result of covered query

So let what will be after executing such query in MongoDB.

Covered query (1)
Covered query (2)

As it is shown, this is a covered query. There’re several proofs. Firstly, the Documents Examined equals 0. Some paragraphs above, we discussed that. Secondly, there is an explicit prompt as green mark(Query covered by index). Thirdly, PROJECTION_DEFAULT indicates that projection works well. In sum, it took 4ms including PROJECTION_DEFAULT(2ms) and IXSCAN(2ms) stages. Now, it became faster than before.

Conclusion

In conclusion, we made sure that using index and covered query were very useful for performance of query. But, it is important to remember of some disadvantages of indexes in MongoDB and use it wisely.

Software Developer | Upwork | Freelancer

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store