
If you use these 3 features of MongoDB, you don’t need to use Redis


17-Jan-2015

I was recently talking to one of my good friends who is working on his startup idea. He is a hardcore programmer who explores various technologies and algorithms before adopting them. The idea he is working on is really cool and involves geospatial calculations, the A* algorithm for shortest-route computation, heuristics and fuzzy-match algorithms. He told me that all of this is powered in the backend by MongoDB and that he is really happy with it. In his words, “if I had to do these things in MySQL, I would probably have given up long back”. Comparing his experience of the two databases, he felt that the greatest advantage of MongoDB over MySQL was its flexibility, without compromising on the data types he needed, along with its rich query operators. He is a man who loves Python, and he is happy with MongoDB because he does not have to write queries. He can do everything he wants in Python code, and a Python dictionary maps so naturally to a MongoDB document! To him this means improved productivity.

During our discussion he mentioned that he was happily getting along with MongoDB until he realized he had to cache certain data. That is when he introduced Redis into the equation. His dependency on Redis increased further when he realized he needed to store lifetime-aware data and data with a cap on its size. When he took me through more details of his solution and architecture, I realized that he did not necessarily need Redis here. I am an advocate of the right tool for the right job, but at the same time I am a person who would not hesitate to compromise a little to avoid having multiple components in a solution. Though Redis may be the best datastore for these operations, they can easily be done in MongoDB as well, and my friend can avoid learning and maintaining multiple technologies. He was really thrilled when I told him about three features of MongoDB which he was not aware of. Let me share them with you.

1. Capped Collections

Capped collections are collections with a fixed size. They work more or less like a circular queue or buffer, and they are really good for high-throughput insertion and for retrieval or deletion based on insertion order. They offer the advantage of automatically removing the oldest entries to make room for new values. And since a capped collection preserves insertion order, you can avoid creating an index for fetching documents in inserted order, which in turn improves insert throughput by avoiding indexing overhead. This is really useful for maintaining event/error logs, activity logs, the last 10 transactions and similar data: the recent data can be kept in a capped collection while historical data is maintained in regular, larger collections. While considering capped collections for your solution, please be aware of their limitations: updates must fit in place, an update that would expand a document beyond its original size fails, you cannot delete individual documents from a capped collection, and capped collections cannot be sharded. To know more about capped collections in MongoDB, refer to this documentation.
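As a sketch of how this looks in the mongo shell (collection and field names here are illustrative, and the commands assume a running mongod you are connected to):

```javascript
// Create a capped collection of about 1 MB that holds at most 1000 documents.
// "size" (in bytes) is mandatory for capped collections; "max" is optional.
db.createCollection("recent_events", { capped: true, size: 1048576, max: 1000 })

// Inserts work as usual; once the collection is full, the oldest
// documents are silently overwritten to make room.
db.recent_events.insert({ event: "login", user: "alice", at: new Date() })

// A find() without a sort returns documents in insertion order.
// Use a reverse natural-order sort to get the newest entries first,
// e.g. "the last 10 transactions" -- no extra index needed:
db.recent_events.find().sort({ $natural: -1 }).limit(10)
```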

2. Time-To-Live Collections

Time-to-live (TTL) collections are a mechanism for giving documents a limited lifetime. This is a very common requirement: caching session information that should be invalidated after an hour, storing authentication tokens which must be renewed every 2 hours, or storing log events which need immediate attention (but will not be of any use if you look at them after an hour). This feature was first added in MongoDB v2.2. All you need to do is create a TTL index on the collection; you can either set an exact expiry date and time per document or set an age (an interval, e.g. 3600 seconds). This gives a great level of flexibility to application programmers who are building functionality for discarding data after its purpose has been served (based on age or date-time). There are a few caveats, though, that you should be aware of before designing a solution around this feature. The most important consideration is the data type of the field being indexed: if the indexed field is not present in a document, or holds a value that is neither a BSON date nor an array of BSON dates, that document will never expire. There are also limitations on the index itself: it cannot be on the _id field and it cannot be a compound index, and a capped collection cannot have a TTL index. In one of our future MongoDB blogs we will be talking about TTL indexes in greater detail. Here is a link for quick reference in the MongoDB tutorials.
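Both flavours can be sketched in the mongo shell as below (names are illustrative, and a running mongod is assumed; note that MongoDB's background expiry task runs periodically, so documents are removed some time after they become eligible, not at the exact instant):

```javascript
// Age-based expiry: documents are removed roughly 3600 seconds after
// the BSON date stored in "createdAt" -- e.g. one-hour session caching.
db.sessions.ensureIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })
db.sessions.insert({ token: "abc123", createdAt: new Date() })

// Exact-time expiry: with expireAfterSeconds set to 0, each document
// expires at the date it carries in the indexed field.
db.tokens.ensureIndex({ expireAt: 1 }, { expireAfterSeconds: 0 })
db.tokens.insert({ token: "xyz789", expireAt: new Date("2015-02-01T00:00:00Z") })
```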

3. MongoDB as a pure in-memory store

It’s not just my friend: many people move frequently accessed data out of MongoDB because they need very fast response times for fetch and retrieve requests on that data, and they store it in Redis or a similar database. Though Redis is really good at handling such requirements, you are forced into a simple key-value data structure and have to live with a very restrictive set of queries. Moreover, introducing a new technology always adds overhead for the team maintaining it. I am pretty sure my friend will realize this when he launches his services in a full-blown way. Actually you can achieve this in MongoDB too, and it is quite simple to do. You can refer to this blogpost for the complete details and steps. In brief, all you need to do is use a RAM disk as your --dbpath while starting up your MongoDB instance. The blog suggests setting up a replica set, but if you are using this database purely as a performance boost for read-only queries, you can probably skip that and start the instance without any oplog or journal. You can store documents as usual, with any data types and hierarchy, but if you wish to use this as a pure key-value store, nothing stops you. You can either store all keys as fields and values as field data in a single large document, e.g. {key1: value1, key2: value2…}, or store each key-value pair as a { _id: key, value: "data" } document within the same collection. And not to forget: you can still run complex queries on this collection and get faster results.
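The second layout, one document per key, can be sketched like this in the mongo shell (collection and key names are illustrative, and a mongod started with a RAM-disk --dbpath is assumed):

```javascript
// Key-value usage: the key goes into _id, so lookups are exact-match
// reads served by the mandatory unique index on _id.
db.cache.save({ _id: "user:42:profile", value: { name: "Bob", plan: "pro" } })

// Fetch by key:
db.cache.findOne({ _id: "user:42:profile" })

// Unlike a plain key-value store, you can still query inside the values:
db.cache.find({ "value.plan": "pro" })
```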

There are many such hidden gems and unconventional ways of using the technology that you already have in your data centre. At Ashnik we focus on getting the best out of what you already have rather than suggesting a technology overhaul. This has helped our customers focus on expanding their business operations and adding new offerings without needing additional technology. If you are also looking at optimizing your current investment with small changes to your IT ecosystem, or changes which are transparent, you should be talking to us.

– Sameer Kumar, Database Solution Architect | Singapore
