Three-hour Oracle queries, down to under a minute

A data analytics company that hit the ceiling of its database

NITS Solutions, a data analytics company in Novi, Michigan, helps businesses find the opportunities hidden inside their data. They were an early cloud adopter and already ran their software on AWS, and they tend to treat a performance problem as a reason to find a better approach rather than something to live with.

The problem that brought them to us showed up in a real client dataset: three years of vehicle repair records for several major automakers. Queries against that data had once returned in minutes. As the dataset grew and the questions got more complex, the same queries had stretched to two or three hours, and some were no longer practical to run at all.

Why Oracle ran out of room

The database underneath was Oracle, and the limit they were hitting was structural rather than a tuning problem. A relational database like Oracle scales vertically: it runs a query as one large process bounded by the memory and processing power of a single server. As datasets grow and queries get more complex, there is eventually no bigger machine left to move to. NITS had reached that point on a 128 GB server, where some queries ran for hours and others would not complete at all. With real client data affected, they decided it was time for a different architecture rather than a larger server.

How they found us, and what we built

NITS’ lead architect and database analyst came to the Michigan AWS meetup which, at the time, we ran. After comparing us against several other AWS partners, NITS asked us to help them reach the scalability and query performance their own customers needed.

We set a deliberately narrow goal for the engagement: prove out a horizontally scaling technology that could outperform Oracle on NITS’ real data, and get their engineers able to use it themselves. The technology we chose was Apache Spark, running on Amazon EMR. Where Oracle ran a query as a single large process, Spark splits the work into many smaller processes across many servers, so adding capacity becomes a matter of adding machines rather than buying a bigger one. We rewrote the queries that had been timing out in Oracle to run on Spark against the same vehicle-repair dataset.

For the visualization side, we used the ELK stack of Elasticsearch, Logstash, and Kibana, which gave NITS real-time charts and graphs along with the ability to build new views on demand through its interfaces.

This was a proof of concept, and we treated it like one. NITS embedded one of their engineers in our team for the duration so the knowledge would stay in-house, and the six weeks of work were aimed at demonstrating the approach on production-scale data and handing it over, not at replacing their entire database in one move.

What changed

Rewritten on Spark and EMR, the queries that had taken two to three hours on Oracle came back in just over a minute. Queries that had not been possible on the 128 GB Oracle machine became routine. With the ELK stack in place, NITS could generate real-time visualizations and answer new questions as they came up rather than waiting on long-running jobs.

“RightBrain exceeded our expectations both with its knowledge of AWS and big data technologies. Our developers are now making great progress with Spark, and we look forward to engaging with RightBrain again in the near future,” NITS told us.

The point of the six weeks was not only faster queries. It was leaving NITS with a scaling approach their engineers understood and could carry forward, which is the part that outlasts any single project.

Stack

Apache Spark · Amazon EMR · AWS · Elasticsearch · Logstash · Kibana