- SG: +65 64383504
- IN: 022 25771219
- IN: 022 25792714
- IN: +91 9987536436
Achieving scalability of NoSQL and Complexity of SQL
Remember my article on archiving data from OLTP database to a MongoDB store ? The story was about a customer (a Bank) who wanted to archive data in a distributed and sharded data store. That blog was about how MongoDB can help store this data in a way customer desired to.
There is one more aspect to this requirement. The data that was so far being considered to be an overhead from maintenance and archiving perspective can be a real goldmine of important information. The data being archived can be a useful source of information e.g. in case of banks the GL or transaction postings can be useful for studying the spending and earning cycles of the customers. This can help banka upsell and cross sell other products to the same customer. This can be done by running analytics, aggregation and map reduce jobs. The team was quite excited when I shared this aspect of looking at the ‘archived-data’ as ‘big-data’.
But there was one challenge. The team working on the maintenance of this data and running ad-hoc reports to answer customer inquiries, didn’t have MongoDB skills. Moreover looking at the ‘big-data’ aspect of this story, one would need to do joins with several other tables e.g. customer details, account details to mine the information from Transaction Logs. These two challenges looked impossible to overcome … until I looked beyond MongoDB.
During our team call we reaslied that apparently this was not the only customer who needed a solution. In the past we have come across several such cases where customer is using MongoDB for write transaction or for collecting data from several sources and then they push that data to PostgreSQL or other relational databasesto internal users for querying or reporting. Someone tagged this as ‘a classic case where the customer wants horizontal scalability and still wants to stick with SQL queries’. And that is when the missing piece of the puzzle struck me. It occurred to me that I can use PostgreSQL to be in the front of MongoDB and use Foreign Data Wrappers (FDW) to build a logical-data hub in PostgreSQL. Foreign Data Wrappers in PostgreSQL are based on SQL/MED (SQL Management of External Data) specifications under SQL-2003. FDWs provide a way for PostgreSQL to interact with external data and manage them. This data could be relational database or a non-relational data store e.g. MongoDB, Hadoop or even filesystem. Refer to the PostgreSQL wiki page for more details.
With FDW the MongoDB collections would appear in PostgreSQL as Foreign Tables. These tables can be queried with SQL Language. They can be joined with other tables in PostgreSQL or other foreign tables. The best part which I am loving about PostgreSQL foreign Data wrappers is that PostgreSQL can maintain its own statistics on these foreign tables. Moreover certain predicates can be pushed down to underlying data store e.g. in case of MongoDB a where clause will be converted into a filter condition. The upcoming releases of FDW is even more promising as it would be possible to push down the complex aggregations as well to the underlying data store. This is one of the reasons why we love PostgreSQL so much at Ashnik. It continuously innovates and tries to address real life problems.
Our team downloaded MongoDB FDW for PostgreSQL and started playing around with it. We can now seethat it has potential to abstract the underlying noSQL data store from application and users. Users can insert data using INSERT statement just like how they would insert into a RDBMS, but it gets translated into a MongoDB insert statement. I can use complex SQLs, JOINs and aggregation on this data and combine it with other entities which might be in PostgreSQL or other data stores. And to do all this I don’t really have to compromise on Horizontal scalability which is one of key requirements today (as in my case).
This amalgamation of PostgreSQL and MongoDB via Foreign Data Wrappers has opened doors for us to build a logical data hub in PostgreSQL (without requiring to do actual data replication). It allows us to take full advantage of Horizontal Scalability of MongoDB along with complex querying capabilities of PostgreSQL. Here is a sample logical diagram of how we expect things to work with such integrated solutions. If this seems relevant to you, you know how to reach us! If you would like to know more about MongoDB FDW, you can check out this github page.