Data warehouse upgrade

Tom Weiss, Sat 04 March 2017

A customer had developed a data pipeline based on AWS technology that was slowing down as their client base grew.

They had grown well beyond five million devices, and with a two-year retention window the volume of data was causing significant slow-downs in both loading data and running queries against the datasets. Most of the time, analysts could not query the data at all because loading consumed all of the available processing time.

As well as being slow, the system was costly: it relied on expensive infrastructure, in this case Redshift DC nodes, to provide a brute-force solution to the data loading and analysis problem.

The engineering team that had originally built the platform was busy developing new product features and did not have time to revisit this project. We were hired to review the technical solution, redesign the database from the ground up, and deliver a new platform.

We stayed with an AWS solution, this time based on Redshift DS nodes, which are significantly more cost-effective, and we optimized the database schema to make better use of Redshift's distribution and sort keys so that the available hardware was used as efficiently as possible. We often find that development teams build cloud deployments with time-to-market in mind and never revisit them to optimize. Hosting costs can then grow rapidly as the data scales, yet often just a few design changes can reduce them significantly.
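To make the schema point concrete, here is a minimal sketch of the kind of table definition involved. It is not the client's actual schema: the table name, column names, and the helper `build_events_ddl` are all hypothetical, chosen only for illustration. It assumes a device event table that is distributed on the device identifier, so rows for one device land on the same slice, and sorted on event time, so queries restricted to a time range can skip blocks outside it.

```python
def build_events_ddl(table, columns, dist_key, sort_keys):
    """Return a Redshift CREATE TABLE statement with explicit distribution
    and sort keys. Hypothetical helper for illustration only."""
    names = [name for name, _ in columns]
    # Fail early if a key refers to a column that is not in the table.
    missing = [key for key in [dist_key, *sort_keys] if key not in names]
    if missing:
        raise ValueError(f"unknown key column(s): {missing}")
    column_sql = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} (\n    {column_sql}\n)\n"
        f"DISTSTYLE KEY\n"
        f"DISTKEY ({dist_key})\n"
        f"COMPOUND SORTKEY ({', '.join(sort_keys)});"
    )


# Assumed example table: one row per device event.
ddl = build_events_ddl(
    "device_events",
    [
        ("device_id", "BIGINT"),
        ("event_time", "TIMESTAMP"),
        ("event_type", "VARCHAR(32)"),
        ("payload", "VARCHAR(1024)"),
    ],
    dist_key="device_id",
    sort_keys=["event_time", "device_id"],
)
print(ddl)
```

Which columns make good keys depends on the actual join and filter patterns, so the choices above should be read as one plausible starting point rather than a recommendation.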

By the time we had completed the migration, we had reduced the client's hosting costs, fixed the throughput problems, and enabled more sophisticated queries and analysis on the database. The client used the hosting savings to set up an offshore DBA service, which we operate, providing proactive database optimization, remote query support, troubleshooting, and database updates.
