A customer had developed a data pipeline based on AWS technology that was becoming slow as their client base scaled.
They had grown well beyond five million devices, and with a retention window of two years the data volume was causing significant slow-downs in both loading data and running queries against the datasets. Most of the time, analysts could not query the data at all because the database's processing capacity was consumed by data loading.
As well as being slow, the system was costly because it relied on expensive infrastructure - in this case, Redshift DC (dense compute) nodes - as a brute-force solution to the data loading and analysis problem.
The engineering team that had initially built the platform was engaged on new product development and did not have time to revisit the project. We were hired to review the technical solution, re-design the database from the ground up, and deliver a new platform.
We continued with an AWS solution, this time based on Redshift DS (dense storage) nodes, which are significantly more cost-effective, and we redesigned the database schema to make better use of Redshift's distribution and sort keys, ensuring the most efficient use of the available hardware. We commonly find that development teams create cloud deployments with time-to-market in mind and never revisit them to optimize; hosting costs can then grow rapidly as usage scales, while a few design changes can cut them significantly.
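To illustrate the kind of schema change involved, the sketch below assembles Redshift DDL for a device-events table with a distribution key and a compound sort key. The table and column names are hypothetical, invented for illustration; they are not the client's actual schema.

```python
# Hypothetical example: Redshift DDL illustrating distribution and
# sort keys. All identifiers here are invented for illustration.

def device_events_ddl(table: str = "device_events") -> str:
    """Return CREATE TABLE DDL using a DISTKEY and a compound SORTKEY."""
    return f"""
CREATE TABLE {table} (
    device_id   BIGINT       NOT NULL,  -- DISTKEY: co-locates each device's rows on one slice
    event_time  TIMESTAMP    NOT NULL,  -- leading SORTKEY column: prunes blocks on time-range scans
    event_type  VARCHAR(32)  NOT NULL,
    payload     VARCHAR(4096)
)
DISTSTYLE KEY
DISTKEY (device_id)
COMPOUND SORTKEY (event_time, device_id);
""".strip()

print(device_events_ddl())
```

Distributing on the join/aggregation key avoids data redistribution between nodes at query time, and sorting on the timestamp lets time-bounded analyst queries skip most data blocks.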
By the time we had completed the migration, we had reduced the client's hosting costs, fixed the throughput problems, and enabled more sophisticated queries and analysis on the database. The client reinvested the hosting savings in an off-shore DBA service, operated by us, providing proactive database optimization, remote query support, troubleshooting, and database updates.