Hybrid Data Cluster With Postgres, MongoDB And Hadoop

Database Platform | Apr 15, 2015

3 min read

Exciting Times Ahead – Hybrid Data Cluster with Postgres, MongoDB and Hadoop

There is something really exciting happening in PostgreSQL world. Recently with this patch commit: cb1ca4d800621dcae67ca6c799006de99fa4f0a5 in PostgreSQL/Allow foreign tables to participate in inheritance Tom Lane pushed an important piece of work in Postgres (by Shigeru Hanada and Etsuro Fujita).

The change means that now you will be able to create a foreign table which participates in inheritance hierarchy. PostgreSQL already facilitates table partitioning and partition pruning via table inheritance. Now the same can be extended to Foreign Tables, i.e. you can have your partitions residing on different servers. Huh? Isn’t that sharding? Yes you got it right it is sharding. Though this is something which will be available in PostgreSQL v9.5, people have already started to play around with it .

In one of my previous posts I outlined how one can use Foreign Data Wrappers to store data in a underlying sharded MongoDB store and use Postgres’ SQL Interface. But this new feature of v9.5 is something even more exciting. You can have a sharded Postgres cluster where each shard could be on different servers and a central server/primary shard can be used for controlling the data insertion. So far the examples that I have seen around are primarily using postgres_fdw, but the possibilities are endless. In ‘not so distant future’ we may have other FDW too being capable of this. i.e. I can have one of my child tables on MongoDB and other one in Postgres or on Hadoop.

Now doesn’t Hadoop or MongoDB already have sharding capabilities? Why would I mix it up with PostgreSQL? The answer is ‘simplicity’. I just want to keep things simple for my users. My users understand SQL pretty well and they find SQL to be quite intuitive for them. So I can have my most recent data sitting on Postgres and as it becomes cold I might start moving it to MongoDB and then to Hadoop. I can use the oldest set of data on Hadoop for doing complex analytics – to churn out a model for predictive analysis etc and I can continue to use that data for answering rare queries on historical data. Or I can use a current ‘data store’ data according to its nature of a platform which is best suited for it and then logically combine them under one entity (a table) in PostgreSQL (using FDW and child tables).

Whatever be your use case the big advantage is, users (who prefer SQL) will still look at this data as one single logical table in a Postgres Database. When they fire a query Postgres optimizer will go ahead and figure out whether it needs to be fetched from local child table or child table linked to MongoDB or child table linked to Hadoop. There are two other things being discussed and actively pursued in Postgres world-

Pushdown of aggregates, predicates and joins i.e. I would be able to push down the aggregations queries meant for MongoDB nodes to Mongo server.
Parallel scans which means if the data for my query is spread across different child tables or foreign child tables I can scan them in parallel.

These possibilities make it really intersting. Imagine you are able to harness capabilities of MongoDB and Hadoop in parallel and combine the resulting set without doing any bit of programming at your end. This all sounds like a fiction but as you track the recent developments and discussions in Postgres community it all looks possible in near future. Postgres community has been thriving to be in mainstream and address recent trends and demands in data-world. The community is well positioned to address the needs of hybrid architecture which is going to pick up very soon. So organizations would look for combining all the bits and pieces of their data – whether ‘Big’ or ‘Large’, ‘hot’ or ‘warm’, ‘BI-data’ or ‘BA-data’, ‘Hadoop’ or ‘Mongo’, ‘structured’ or ‘unstructured’. Postgres with its FDW, inheritance and parallel scan capabilities will be a key player not just for being the data store but also for being the logical data-hub.

Exciting times are ahead!

Implementing Hadoop: 7 Common Mistakes and How to Avoid Them

Mar 10, 2017 | 8 MIN READ

Postgres makes in-road in South East Asia: pgDay Asia 2016

Apr 13, 2016 | 4 MIN READ

Achieving scalability of NoSQL and Complexity of SQL

Mar 20, 2015 | 3 MIN READ

Cookie	Duration	Description
cookielawinfo-checbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Bolt.new, Bolt.DIY & DeepSeek-V3: AI Transforming DevOps from Development to Deployment - Watch Now!

Revolutionize Your CX with
Unified Observability

CloudOps Automation tool for Infrastructure monitoring and deployment.

Indonesia’s top digital credit service provider leverages Ashnik’s PostgreSQL expertise and services

Revolutionize Your CX with Unified Observability

Automate and monitor your PostgreSQL with ease.

The CloudOps Automation Tool for easy Infrastructure deployment and monitoring

Maximize Potential of Your Data with Streaming Data Pipeline Architecture

End-to-End Traceability and Unified Observability for the Modern Infrastructure

Watch: How to auto-scale in deployments using Kubernetes(K8s): A Technical Demo

Exciting Times Ahead – Hybrid Data Cluster with Postgres, MongoDB and Hadoop

Read More

Implementing Hadoop: 7 Common Mistakes and How to Avoid Them

Postgres makes in-road in South East Asia: pgDay Asia 2016

Achieving scalability of NoSQL and Complexity of SQL

Products