No Comments

Modernizing Enterprise Data Warehouse solutions using open source

Monika Agrawal | Solution Consultant, Ashnik
Singapore, 11 Apr 2018

by , , No Comments


How to modernize Enterprise Data Warehouse solutions using open source?

Enterprise Data Warehouses have been around for more than 30 years and have become an essential part of any business intelligence operation. It is basically the science of storing and retrieving data so that some meaningful analysis can be performed on that data to corroborate / support a business decision or to predict a business outcome, while also offering secure access to only authorized users. And, by modernizing age-old EDW systems through the use of open source, we can transform it to turn efficient and swell. It’s already been making an encouraging difference in the overall performance of many enterprises’ IT architectures and businesses.
In this blog, I will take you through the advantages of having an efficient EDW and how you can implement one for your IT architecture – using a concoction of the most fitting open source technologies.

Having access to an effective enterprise data warehouse (EDW) provides several significant benefits:

  1. Data warehouses offer augmented support for data tracking, managing, and analysing.
  2. Data warehouses are capable of tracking and modifying marketing campaigns, for faster, more accurate evaluation of campaign effectiveness.
  3. It allows users to store as much data as needed with regards to a large variety of parameters. Also capable of drawing data from multiple, often-unrelated sources.
  4. EDW refines data, eliminating excess or redundant information, with improved data quality.
  5. Gives user the ability to examine information within the platform itself. This brings data manipulation to a minimum and data integrity to its highest level. Thus, allowing decisions to be made with the most accurate data possible.
  6. EDW has been known to reduce cost associated with data analytics, and to increase company ROIs.

Designing Enterprise Data Architecture using open source technologies
There are a several software/ technology stacks that can help build enterprise data warehouse architecture solutions that could fit perfectly with your existing systems and processes. I have explained a few such stacks and solution overviews here:

  1. 1. Using Pentaho Data Integration and EDB PostgreSQL

When we deal with structured data, integration tool like Pentaho and RDBMS like PostgreSql can be used to build an enterprise data warehouse. Pentaho Data Integration is the component of Pentaho responsible for the Extract, Transform and Load (ETL) processes. It is most frequently used in data warehouses environments including migrating data between applications or databases. Pentaho is used for data provisioning from the source systems, data enrichment, transformation and loading into databases.

Enterprise PostgreSQL is a commercial package that bundles open source PostgreSQL and additional tools provided by EDB. It comes as single compiled binary installer which can be easily deployed and updated in future as new patches are released. EDB PostgreSQL is used for deploying the data warehouse and reporting solution where you deal with structured data, be it for OLTP (online Transaction processing) or OLAP (online Analytical processing). EDB PostgreSql also bundles tools which are helpful for maintaining high availability, back up and monitoring of Data warehouse.

The combination of Pentaho and EDB Postgres to create an enterprise data warehouse is a judicial choice where low cost and high-performance balance is needed.

  1. 2. Using Pentaho Data Integration and Hadoop

When we deal with unstructured data, Hadoop addresses the limitations of traditional computing, helps businesses overcome real challenges, and powers new types of Big Data analytics. The goal is to break the information silos in the enterprise by bringing all the data into a single place for analysis without the restrictions of schema, security, or authorization. All the existing data as well as incremental data from the various source systems can be loaded in Hadoop file system for data analytics. Moreover, Hadoop complements the existing PostgreSQL Data warehouse. The ETL offloading from current EDW platform to parallel, open source, cost-effective, scale-out environment like Hadoop is a viable option. For example, processing XML files in Hadoop using Pig or Hive is an efficient solution.

To ingest data from the various source systems to Hadoop data lake, the Pentaho Data Integration ETL tool can be used to simplify the creation and orchestration of big data pipelines. Pentaho make it easier to harness the power of Hadoop, faster. Hadoop works in parallel to a data warehouse for faster processing.

  1. 3. Using Pentaho Data Integration and Couchbase

When we deal with creating or storing document-based data, NoSql plays a significant role. Couchbase has range of new capabilities including full text search and real-time analytics to simplify the way developers and architects build Digital applications. The Couchbase empowers developers to run complex, ad-hoc queries on operational JSON data stored in Couchbase to get truly real-time insights using the new distributed parallel analytics engine. This analytics engine is delivered as an independently-scalable service using Couchbase’s breakthrough Multi-Dimensional Scaling technology, enabling users to run analytics on the same data platform without any impact to their business operations. EDB Postgres is a sensible choice when it came to generating complex JSON documents along with the high data volume for processing. Pentaho can help to integrate the legacy application data to the digital application to provide the perfect user experience. The DI tool can also help in the dataflow orchestration as well as to provide real time job facility to respond to high volumes of incoming requests. When it comes to applications with two-way synchronization along with online & offline capabilities, the Couchbase Suite performs an exemplary job.

These are some of the high-level enterprise data warehouse solutions a business can implement as per their enterprise needs – using open source technologies along with its enterprise support. If you’re looking to top up reporting and dashboarding requirements on the EDW, then Pentaho also has a Business Analytics Suite which works very effectually.

I will be deep-diving into each of the above solutions in my upcoming blogs. Watch this space to stay tuned!

And, for any EDW solutions and implementation needs, you may write to We have a plethora of enterprise-grade open source technologies that can help you drive your business to success with the right solutioning.


  • Monika is a Solution Consultant with Ashnik. She is an experienced professional in data mining methodology and integration tools with more than7 years of experience in Business Intelligence, Integration Services and Database Engineering. Having worked for some of the global companies, she has been instrumental in leading and executing multiple projects and POCs in the capacity of product consultation, analytics and solution designing for gaming, media, FMCG, and chemicals, payment, and logistics industries.

More From Monika Agrawal | Solution Consultant, Ashnik :
Tags: , , , , , , , , , , , , , , , , , , , , , ,