Achieving scalability of NoSQL and Complexity of SQL

Remember my article on archiving data from OLTP database to a MongoDB store ? The story was about a customer (a Bank) who wanted to archive data in a distributed and sharded data store. That blog was about how MongoDB can help store this data in a way customer desired to.
There is one more aspect to this requirement. The data that was so far being considered to be an overhead from maintenance and archiving perspective can be a real goldmine of important information. The data being archived can be a useful source of information e.g. in case of banks the GL or transaction postings can be useful for studying the spending and earning cycles of the customers. This can help banka upsell and cross sell other products to the same customer. This can be done by running analytics, aggregation and map reduce jobs. The team was quite excited when I shared this aspect of looking at the ‘archived-data’ as ‘big-data’.
But there was one challenge. The team working on the maintenance of this data and running ad-hoc reports to answer customer inquiries, didn’t have MongoDB skills. Moreover looking at the ‘big-data’ aspect of this story, one would need to do joins with several other tables e.g. customer details, account details to mine the information from Transaction Logs. These two challenges looked impossible to overcome … until I looked beyond MongoDB.
During our team call we reaslied that apparently this was not the only customer who needed a solution. In the past we have come across several such cases where customer is using MongoDB for write transaction or for collecting data from several sources and then they push that data to PostgreSQL or other relational databasesto internal users for querying or reporting. Someone tagged this as ‘a classic case where the customer wants horizontal scalability and still wants to stick with SQL queries’. And that is when the missing piece of the puzzle struck me. It occurred to me that I can use PostgreSQL to be in the front of MongoDB and use Foreign Data Wrappers (FDW) to build a logical-data hub in PostgreSQL. Foreign Data Wrappers in PostgreSQL are based on SQL/MED (SQL Management of External Data) specifications under SQL-2003. FDWs provide a way for PostgreSQL to interact with external data and manage them. This data could be relational database or a non-relational data store e.g. MongoDB, Hadoop or even filesystem. Refer to the PostgreSQL wiki page for more details.
With FDW the MongoDB collections would appear in PostgreSQL as Foreign Tables. These tables can be queried with SQL Language. They can be joined with other tables in PostgreSQL or other foreign tables. The best part which I am loving about PostgreSQL foreign Data wrappers is that PostgreSQL can maintain its own statistics on these foreign tables. Moreover certain predicates can be pushed down to underlying data store e.g. in case of MongoDB a where clause will be converted into a filter condition. The upcoming releases of FDW is even more promising as it would be possible to push down the complex aggregations as well to the underlying data store. This is one of the reasons why we love PostgreSQL so much at Ashnik. It continuously innovates and tries to address real life problems.
Our team downloaded MongoDB FDW for PostgreSQL and started playing around with it. We can now seethat it has potential to abstract the underlying noSQL data store from application and users. Users can insert data using INSERT statement just like how they would insert into a RDBMS, but it gets translated into a MongoDB insert statement. I can use complex SQLs, JOINs and aggregation on this data and combine it with other entities which might be in PostgreSQL or other data stores. And to do all this I don’t really have to compromise on Horizontal scalability which is one of key requirements today (as in my case).
This amalgamation of PostgreSQL and MongoDB via Foreign Data Wrappers has opened doors for us to build a logical data hub in PostgreSQL (without requiring to do actual data replication). It allows us to take full advantage of Horizontal Scalability of MongoDB along with complex querying capabilities of PostgreSQL. Here is a sample logical diagram of how we expect things to work with such integrated solutions. If this seems relevant to you, you know how to reach us! If you would like to know more about MongoDB FDW, you can check out this github page.

Postgres makes in-road in South East Asia: pgDay Asia 2016

Apr 13, 2016 | 4 MIN READ

EnterpriseDB® Announces Improved Database Integration, Scalability and Data A...

Jan 12, 2016 | 4 MIN READ

Exciting Times Ahead – Hybrid Data Cluster with Postgres, MongoDB and H...

Apr 15, 2015 | 3 MIN READ

Cookie	Duration	Description
cookielawinfo-checbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Building a Robust ELK Cluster on Amazon EKS, Challenges and Solutions Read More!

Revolutionize Your CX with
Unified Observability

CloudOps Automation tool for Infrastructure monitoring and deployment.

Indonesia’s top digital credit service provider leverages Ashnik’s PostgreSQL expertise and services

Revolutionize Your CX with Unified Observability

Automate and monitor your PostgreSQL with ease.

The CloudOps Automation Tool for easy Infrastructure deployment and monitoring

Maximize Potential of Your Data with Streaming Data Pipeline Architecture

End-to-End Traceability and Unified Observability for the Modern Infrastructure

Watch: How to auto-scale in deployments using Kubernetes(K8s): A Technical Demo