Impulse Data Warehousing and OLAP Solution Outperforms Google BigQuery by 3x

Benchmark Summary

Benchmarking Method

  1. We generated the Star Schema Benchmark (SSB) data using a publicly available data generation tool, https://github.com/lemire/StarSchemaBenchmark.
  2. The data was loaded using the Impulse scalable ETL platform. We configured Impulse to store the data in HDFS in Parquet format.
  3. Impulse was configured to load the data into the data warehouse at the end of the ETL process.
  4. We uploaded the same Parquet dataset to a Google Cloud Storage (GCS) bucket, created the tables in BigQuery, and loaded the data into BigQuery from that bucket.
  5. We used the 13 queries provided in the original SSB paper, https://www.cs.umb.edu/~poneil/StarSchemaB.pdf.
  6. All 13 queries were executed on both Impulse and BigQuery, and the query execution times were recorded. Each query was run 5 times and the results were aggregated.
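The repeat-and-aggregate step can be sketched as a small timing harness. This is only an illustration, not the tooling used for this benchmark: `run_query` is a hypothetical callable standing in for whatever client submits SQL to Impulse or BigQuery, and summarizing by mean, median, min, and max is an assumption, since the aggregation method is not specified above.

```python
import statistics
import time
from typing import Callable, Dict


def time_query(run_query: Callable[[str], object], sql: str, repeats: int = 5) -> Dict[str, float]:
    """Run one query `repeats` times and summarize wall-clock latency in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(sql)  # hypothetical: blocks until the engine returns the result
        timings.append(time.perf_counter() - start)
    return {
        "runs": len(timings),
        "mean_s": statistics.mean(timings),      # assumed aggregation
        "median_s": statistics.median(timings),  # less sensitive to a cold first run
        "min_s": min(timings),
        "max_s": max(timings),
    }


def benchmark(run_query: Callable[[str], object], queries: Dict[str, str],
              repeats: int = 5) -> Dict[str, Dict[str, float]]:
    """Time every named query against one engine."""
    return {name: time_query(run_query, sql, repeats) for name, sql in queries.items()}


if __name__ == "__main__":
    # Stand-in runner so the sketch executes without a database connection.
    def fake_runner(sql: str) -> None:
        time.sleep(0.01)

    results = benchmark(fake_runner, {"Q1.1": "SELECT 1", "Q1.2": "SELECT 2"})
    for name, stats in results.items():
        print(name, stats)
```

Running the same `benchmark` call once per engine, with the same query dictionary, keeps the comparison symmetric. Reporting the median next to the mean makes it easier to spot a single slow run caused by caching or cluster warm-up.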

Data Preparation

Impulse Data Warehousing Cluster Details

Query Execution Performance Result

Price-Performance Comparison

Impulse Data Warehouse Features

  1. A fast, scalable database for ad hoc analytics and online analytical processing (OLAP).
  2. A fully integrated ETL platform that ingests data from a wide variety of sources and formats, transforms it, and builds an automated pipeline that loads and indexes the data for interactive and ad hoc queries.
  3. A fully integrated web-based visualization and BI engine for creating interactive dashboards.
  4. JDBC and RESTful APIs for connecting with third-party BI tools.

Author, inventor and thought leader in computer vision, machine learning, and AI. 4 US Patents.

Sam Ansari
