Tag Results

Over the last 18 months or so, we at Pentaho have witnessed the hype train around Spark crank into full gear. The huge interest in Spark is, of course, justified. As a data processing engine, Spark can scream because it leverages in-memory computing. Spark is flexible – able to handle streaming, machine learning, and batch workloads in a single application. Finally, Spark is developer-friendly: it works with popular programming languages and is simpler to use than traditional Hadoop MapReduce. Many […]
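To make the "single application" point concrete, here is a minimal PySpark sketch that runs a batch aggregation and an MLlib clustering job against the same SparkSession. The file path events.csv and the columns day, x, and y are hypothetical stand-ins, not anything from the post:

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

# One SparkSession serves both workloads below.
spark = SparkSession.builder.appName("unified-workloads").getOrCreate()

# Batch: load raw events (hypothetical file and columns) and aggregate.
events = spark.read.csv("events.csv", header=True, inferSchema=True)
events.groupBy("day").count().show()

# Machine learning: cluster the same DataFrame with Spark MLlib,
# with no second system or separate job framework involved.
vectors = VectorAssembler(inputCols=["x", "y"], outputCol="features").transform(events)
model = KMeans(k=3, featuresCol="features").fit(vectors)
print(model.clusterCenters())

spark.stop()
```

A streaming query could be started from the same session via spark.readStream, which is what distinguishes Spark from stitching together separate batch, streaming, and ML systems.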

A blueprint for big data success – What is the “Filling the Data Lake” blueprint? The blueprint for filling the data lake refers to a flexible, scalable, and repeatable modern data onboarding process for ingesting big data into Hadoop data lakes. It streamlines the ingestion of a wide variety of source data for business users, reduces dependence on hard-coded data movement procedures, and simplifies regular data movement at scale into the data lake. The “Filling the Data Lake” blueprint provides developers […]
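The excerpt does not show how the blueprint is implemented, but the core idea it describes – driving ingestion from metadata rather than hard-coding one movement job per source – can be sketched generically. In this hypothetical Python sketch, the sources.json file, its fields, and the landing path are all assumptions for illustration:

```python
# Hypothetical sketch of metadata-driven ingestion: source definitions
# live in a config file, so onboarding a new source means editing data,
# not writing another hard-coded data movement procedure.
import json
import shutil
from pathlib import Path

LAKE_ROOT = Path("/data/lake/raw")  # assumed landing zone in the lake

def ingest(source: dict) -> None:
    """Land one source file in the lake, partitioned by source name."""
    dest = LAKE_ROOT / source["name"]
    dest.mkdir(parents=True, exist_ok=True)
    shutil.copy(source["path"], dest / Path(source["path"]).name)

def main() -> None:
    # One generic loop replaces N per-source, hand-written jobs.
    sources = json.loads(Path("sources.json").read_text())
    for source in sources:
        ingest(source)

if __name__ == "__main__":
    main()
```

The repeatability the excerpt emphasizes comes from this shape: the movement logic is written once, and scale comes from adding entries to the metadata rather than code.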