Impulse Data Warehousing and OLAP Solution Outperforms Google BigQuery by 3x

Benchmark Summary

Benchmarking Method

  1. We prepared the Star Schema Benchmark (SSB) data using a publicly available data generation tool, https://github.com/lemire/StarSchemaBenchmark.
  2. We loaded the data using the Impulse scalable ETL platform, configured to store it in HDFS in Parquet format.
  3. We configured Impulse to load the data into the data warehouse at the end of the ETL process.
  4. We uploaded the Parquet dataset to a Google Cloud Storage (GCS) bucket, created tables in BigQuery, and loaded the data into BigQuery from that bucket.
  5. We used the 13 queries provided in the original SSB paper, https://www.cs.umb.edu/~poneil/StarSchemaB.pdf.
  6. We executed all 13 queries on both Impulse and BigQuery and recorded the query execution times. Each query was run 5 times and the results were aggregated.
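
The timing loop in the last step can be sketched roughly as follows. This is a minimal illustration, not the harness used for the benchmark: `run_query` is a hypothetical callable standing in for whichever client submits SQL to Impulse or BigQuery, and the choice of mean, median, min, and max as the aggregates is an assumption, since the method above only says the results were aggregated.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(run_query: Callable[[str], object],
              queries: Dict[str, str],
              repeats: int = 5) -> Dict[str, Dict[str, float]]:
    """Run each query `repeats` times and summarize wall-clock latency in seconds."""
    results: Dict[str, Dict[str, float]] = {}
    for name, sql in queries.items():
        timings: List[float] = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(sql)  # hypothetical: blocks until the engine returns the result
            timings.append(time.perf_counter() - start)
        # Assumed aggregates; the article does not say which statistic was reported.
        results[name] = {
            "mean_s": statistics.mean(timings),
            "median_s": statistics.median(timings),
            "min_s": min(timings),
            "max_s": max(timings),
        }
    return results
```

Calling `benchmark(client_fn, {"Q1.1": "...", "Q1.2": "..."})` once per engine with the same query texts would give two comparable sets of per-query statistics. Measuring wall-clock time around the client call includes network and client overhead, which may or may not match how the published numbers were collected.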

Data Preparation

Impulse Data Warehousing Cluster Details

Query Execution Performance Result

Price-Performance Comparison

Impulse Data Warehouse Features

  1. A blazing-fast, scalable database for ad hoc analytics and online analytical processing (OLAP).
  2. A fully integrated ETL platform that ingests data from a wide variety of sources and formats, transforms it, and builds an automated pipeline to load and index the data for interactive and ad hoc queries.
  3. A fully integrated, web-based visualization and BI engine for creating interactive dashboards.
  4. JDBC and RESTful APIs for connecting third-party BI tools.

Author, inventor and thought leader in computer vision, machine learning, and AI. 4 US Patents.

Sam Ansari
