MongoDB takes your 3 Scalability Issues, head-on!
Sameer Kumar | Senior Solution Architect, Ashnik
I have been meeting customers who liked MongoDB in their development and POC environments. They are very happy with the flexible schema, which is why they find MongoDB the best fit for their R&D labs: whenever they want to try out new application features, they first develop them on MongoDB. But MongoDB is also a choice for handling large-scale databases and high-velocity data, so scalability needs to be addressed in the production environment, and this is where we get the most queries. In this article, let me address the key features of MongoDB that help build a scalable system.
Today, scalability is one of the key considerations when choosing a solution for production deployment. MongoDB is designed to scale up as well as scale out. It can take advantage of the resources at hand, and it can be asked to spread a huge database across several servers. What’s more, it can also do smart data tiering, keeping the most relevant data (often just a few GBs) out of all the data (a few hundred GBs or maybe TBs) on the fastest machines.
We can categorize today’s scaling issues into three classes: scaling up, scaling out and archiving. Let’s see how MongoDB is geared to solve these issues:
1. Scaling Up – Scaling up means adding more CPU and memory resources to the existing server. MongoDB does not create a private cache the way traditional database systems do. It relies on OS memory-mapped files and tries to keep as much of the frequently accessed data in memory as possible. This lets it use as much memory as is available. Hence, if you give MongoDB more memory, it will try to use it, provided your database is large relative to your RAM.
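To make the idea of memory-mapped files a little more concrete, here is a minimal Python sketch of the general OS facility. It is not MongoDB's internal code; the temporary file and the helper name `read_via_mmap` are assumptions made up for illustration. The point is only that the program reads the file as if it were memory, and the operating system decides which pages to keep in RAM.

```python
import mmap
import os
import tempfile


def read_via_mmap(path, offset, length):
    """Read `length` bytes at `offset` through a memory mapping of the file."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. The OS pages data in on demand and
        # keeps recently used pages cached while free memory is available.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


# Hypothetical data file standing in for a database file.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"0123456789" * 1000)

# Slice the file like an in-memory byte string.
chunk = read_via_mmap(demo_path, 10, 10)
os.remove(demo_path)
```

Because the caching is left to the operating system, adding RAM to the server enlarges the pool of pages that can stay in memory without any change to the application.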
2. Scaling Out – After a certain limit, scaling up does not produce the same gains as it did initially. Moreover, the cost of scaling up generally rises exponentially and offsets the gains. In such situations one thinks about scaling out, that is, adding more servers. MongoDB fits very well in this equation. From day one it was designed to scale out, and it has a great feature for doing so: sharding. Sharding helps you distribute your data across various nodes. If you need high write throughput, a sharded MongoDB cluster is what you should look at, because the write load can be balanced across multiple nodes. Provided the shard key has been chosen wisely, even read queries will perform better, since they do not have to traverse all the documents stored on a single server.
This way, with the sharding feature, MongoDB helps you scale out as your business grows. You can add servers on the fly and MongoDB will take care of balancing the data among them. The best part is that you don’t have to build any intelligence into your application for this. All of it (which document goes to which node, which data is fetched from which server, and so on) is taken care of by MongoDB.
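As a rough illustration of how a shard key can decide where a document lives, the sketch below routes documents by hashing the key. This is a deliberately simplified model and not MongoDB's actual mechanism, which splits data into chunks and rebalances them. The shard names, the `user_id` key and the modulo routing are all assumptions chosen for the example.

```python
import hashlib
from collections import Counter

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names


def route(shard_key_value, shards=SHARDS):
    """Pick a shard from a hash of the shard key (simplified modulo routing)."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[bucket]


def distribute(docs, key):
    """Count how many documents each shard would receive."""
    placement = Counter()
    for doc in docs:
        placement[route(doc[key])] += 1
    return placement


# Hypothetical documents keyed by user_id.
docs = [{"user_id": i, "amount": i * 10} for i in range(3000)]
placement = distribute(docs, "user_id")

# A query on one key value only needs to visit one shard.
target = route(42)
```

With a key that has many distinct values, the writes spread across all shards, and a lookup by that key touches a single shard. A key with few distinct values would pile documents onto a few shards, which is why the choice of shard key matters.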
3. Data Archiving – No matter how much you invest in infrastructure and resources, sooner or later you are bound to face the problem of data volume. This typically happens because we keep piling up data without timely clean-up or archival. But historical data is often needed for back-dated queries, reports or calculations. If we archive that data onto different systems or servers, it takes tremendous effort to pull it back, correlate it with live data and generate the required reports.
MongoDB solves this problem quite effectively. You can shard data based on tags or tiers. The data which is most relevant, or hot, can be kept on a high-end system with SSD or flash disks. Data which is warm, or frequently accessed, can be kept on moderately sized servers with faster spinning disks. Data which is cold, or rarely accessed, can be kept on low-capacity servers with slower disks. As the ‘hot’ data becomes ‘warm’ and eventually ‘cold’, MongoDB moves it to the appropriate shard, and hence to a server which fits its tier. This facility not only makes archiving easy but also makes it easy to retrieve the archived data.
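The tiering idea can be sketched in a few lines of Python. This is only a model of the concept under stated assumptions: the age thresholds (30 and 365 days), the tier table and the shard names are hypothetical, and in a real cluster the mapping is expressed as tags and tag ranges on the shard key rather than in application code.

```python
from datetime import date

# Hypothetical tiers: (tag, maximum age in days, shard name).
# None means "no upper limit", so the last tier catches everything older.
TIERS = [
    ("hot", 30, "ssd-shard"),
    ("warm", 365, "fast-disk-shard"),
    ("cold", None, "slow-disk-shard"),
]


def tier_for(created, today):
    """Return the (tag, shard) a document belongs to, based on its age."""
    age_days = (today - created).days
    for tag, max_age, shard in TIERS:
        if max_age is None or age_days <= max_age:
            return tag, shard
    raise ValueError("TIERS must end with a catch-all tier")


order_date = date(2014, 12, 20)

# The same document drifts from tier to tier as time passes.
on_day_12 = tier_for(order_date, date(2015, 1, 1))
after_six_months = tier_for(order_date, date(2015, 6, 20))
after_three_years = tier_for(order_date, date(2017, 12, 20))
```

The document itself never changes; only its age relative to the tier boundaries does. A report that spans several years can still query all tiers through the same database, which is the advantage over archiving to a separate system.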
I will end on the note that MongoDB has the potential to tackle all three of these scalability issues. It not only tackles them but does so more easily than we had ever imagined. In my coming write-ups, I will try to share how MongoDB lives up to the other key expectation for production deployment: availability.
- Sameer Kumar is the Database Solution Architect working with Ashnik. Sameer and his team work towards providing enterprise-level solutions based on open source and cloud technologies. He has experience in various roles, ranging from development to performance tuning and from installation and migration to support and services.