“Performance degrades after migrating to MongoDB”. Really? Read this.
Sameer Kumar | Senior Solution Architect, Ashnik
During the last couple of months, I have been busy travelling across ASEAN and India to meet customers and prospects. I could see that MongoDB has created quite a buzz. The fact that it co-exists well with other evolving open source technologies such as PostgreSQL and Hadoop has made it even more loved among data scientists and big data groups. But surprisingly, I also came across a group of “unhappy” and “dissatisfied” users of MongoDB, and of NoSQL data stores in general. They complained about how performance degraded when they migrated away from some of the popular relational databases. They were upset that simple queries which returned data in a few seconds from a relational database were taking a lot of time in MongoDB.
This shocked me a bit. I could not hold back and asked them a few questions about how they did the migration and transformation. They were quite puzzled by my questions, and most of them reacted: “What? What transformations?”. That is when I realized they had mistaken their relational entities for the ‘real world’ entities we generally refer to in programming, more precisely in the object-oriented programming world.
I went ahead and explained a few things about MongoDB:
1. The flexible design – Let’s consider a customer data model. Customer is an entity in the programming world, and address is an attribute of that entity. In the relational world we end up breaking that entity down into two ‘relational’ entities, customer and customer_address, bound by a foreign key. To fetch customer data you then need to join these two entities, which relational databases do quite efficiently. This is something MongoDB does not do efficiently, and in fact does not need to. MongoDB offers you a flexible schema in which the address can be stored as an attribute within the customer entity, just the way you program it. Most of us end up translating each normalized relational entity into a collection in MongoDB, which pushes the burden of joining result sets onto the application. This is bound to degrade your performance.
2. Secondary indexes – Relational databases are really good at letting us create secondary indexes, which improve the performance of queries that filter on those columns. MongoDB has the same facility. Most of us either don’t know this or forget to create the appropriate secondary indexes.
3. Sharding need not always help – Just as partitioning in a relational database is not something you should do for every table, sharding is not meant for every collection in MongoDB. You must choose wisely, based on projected data growth, which collections to shard and what your shard key should be. Sharding unnecessarily, or choosing a wrong shard key, may impact your performance negatively.
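The embedding described in point 1 can be sketched in a few lines. The customer and address rows, field names, and values below are all hypothetical, but they show how child rows from a normalized customer_address table fold into a single MongoDB-style document:

```python
# Hypothetical rows, as they might come out of relational "customer"
# and "customer_address" tables joined by a foreign key.
customer_row = {"customer_id": 42, "name": "Alice"}
address_rows = [
    {"customer_id": 42, "type": "home", "city": "Singapore"},
    {"customer_id": 42, "type": "office", "city": "Bangalore"},
]

def embed_addresses(customer, addresses):
    """Fold the child rows into the parent to form one embedded document."""
    doc = dict(customer)
    doc["addresses"] = [
        # Drop the foreign key: inside the parent document it is redundant.
        {k: v for k, v in addr.items() if k != "customer_id"}
        for addr in addresses
        if addr["customer_id"] == customer["customer_id"]
    ]
    return doc

document = embed_addresses(customer_row, address_rows)
```

A query for the customer now returns the addresses in the same read, with no join performed by either the database or the application.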
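For point 2, a secondary index is essentially a precomputed map from a field's value to the documents containing it. In MongoDB itself you would create one with db.collection.createIndex(); the toy Python sketch below (with made-up field names) only illustrates why a lookup through such a map beats scanning every document:

```python
# A toy collection of documents; the fields are made up for illustration.
orders = [{"order_id": i, "status": "open" if i % 3 else "closed"}
          for i in range(9)]

# Without an index, a query on "status" must scan every document.
def find_scan(docs, status):
    return [d for d in docs if d["status"] == status]

# An index is a precomputed value-to-documents map, so a lookup
# touches only the matching documents.
def build_index(docs, field):
    index = {}
    for d in docs:
        index.setdefault(d[field], []).append(d)
    return index

status_index = build_index(orders, "status")
```

The scan does work proportional to the collection size; the indexed lookup does not, which is exactly the gap users notice once collections grow.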
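And for point 3, the difference a shard key makes can be simulated. The sketch below is a deliberately simplified model, not MongoDB's actual sharding logic: with a ranged strategy on a monotonically increasing key (say, a timestamp), recent inserts all land on one shard, while a hashed key spreads them across shards:

```python
import hashlib

NUM_SHARDS = 4

def hashed_shard_for(key):
    """Toy hashed sharding: hash the key and bucket it into a shard."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def ranged_shard_for(key, max_key=1000):
    """Toy ranged sharding: each shard owns a contiguous key range."""
    return min(key * NUM_SHARDS // max_key, NUM_SHARDS - 1)

recent_keys = range(900, 1000)   # the most recent 10% of inserts
ranged_targets = {ranged_shard_for(k) for k in recent_keys}
hashed_targets = {hashed_shard_for(k) for k in recent_keys}
```

Under the ranged scheme every recent insert hits the single shard owning the top range, a classic hotspot; the hashed key distributes the same writes across the cluster.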
Once they understood that there are a few fundamental differences between MongoDB and purely relational databases, and that these differences hugely affect the way you design your database, the prospects were happy to engage with us and work with our team to revisit their MongoDB trial and give it a fair second chance. So far these POCs are looking promising, and some of them involve a hybrid setup of MongoDB working with PostgreSQL and probably Hadoop too. Stay tuned and we will keep you posted as we explore the NoSQL world with the JSON datatype.
- Sameer Kumar is a Database Solution Architect working with Ashnik. Sameer and his team work towards providing enterprise-level solutions based on open source and cloud technologies. He has experience in various roles, ranging from development to performance tuning, and from installation and migration to support and services.