Building systems that automatically adjust to workloads and data



There are different ways to achieve self-adjustment. With any system, you have a bunch of knobs and a bunch of design choices. If you take Redshift, you can tune the buffer size, you can create materialized views, and you can create different types of sort orders. Database administrators can adjust these knobs and make design choices, based on their workloads, to get better performance.

The first form of self-adjustment is to make those decisions automatically. You have, let’s say, a machine learning model that observes the workload and figures out how to adjust these knobs and what materialized views and sort keys to create. Redshift already does this, for example, with a feature called Automated Materialized Views, which accelerates query performance.
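
As a rough illustration of "observe the workload, then set the knob", the sketch below recommends a sort key by counting which column the observed queries filter on most often. It is a toy heuristic under stated assumptions, not how Redshift makes this decision: the function name `recommend_sort_key`, the `min_share` threshold, and the column names are all hypothetical.

```python
from collections import Counter


def recommend_sort_key(observed_filters, min_share=0.5):
    """Return the column filtered on most often, or None if no column
    appears in at least `min_share` of the observed queries.

    observed_filters: one list of filtered column names per query
    (a hypothetical stand-in for a real workload log).
    """
    if not observed_filters:
        return None
    counts = Counter()
    for columns in observed_filters:
        counts.update(set(columns))  # count each column once per query
    column, hits = counts.most_common(1)[0]
    return column if hits / len(observed_filters) >= min_share else None


# Hypothetical workload: three of four queries filter on order_date.
workload = [
    ["order_date"],
    ["order_date", "region"],
    ["customer_id"],
    ["order_date"],
]
recommendation = recommend_sort_key(workload)
```

The threshold keeps the advisor from acting on a workload with no dominant access pattern. A machine learning model would replace the simple count with a learned prediction, but the loop of observing and then adjusting is the same.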

The next step is that, in some cases, components can be replaced with novel techniques that allow either more customization or tuning in ways that weren’t previously possible.

To give you an example, in the case of data layouts, current systems mainly support partitioning data by one attribute, which could be a composite key. The reason is that the developers of these systems always assumed that someone would eventually have to make these design choices manually. Thus, in the past, the tendency was to reduce the number of tuning parameters as much as possible.

This, of course, changes the moment you have automatic tuning techniques using machine learning, which can explore the space much more efficiently. And now maybe the opposite is true: providing more degrees of freedom and more knobs is a good thing, as they offer more potential for customization and, thus, better performance.
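
To make the "more knobs" argument concrete, here is a minimal sketch of an automatic search over two-attribute layouts. It assumes a table with two integer columns uniformly distributed over `[0, DOMAIN)`, a grid layout that splits each column into equal-width cells, and a fixed budget of cells. Everything here (the names, the uniformity assumption, the exhaustive search) is illustrative and is not taken from any specific system.

```python
from itertools import product

DOMAIN = 1000  # assumed: both columns hold integers in [0, DOMAIN)


def cells_touched(lo, hi, cells):
    """Number of equal-width cells that the inclusive range [lo, hi] overlaps."""
    return hi * cells // DOMAIN - lo * cells // DOMAIN + 1


def scanned_fraction(layout, query):
    """Fraction of grid cells a range query must read (uniform data assumed)."""
    cells_x, cells_y = layout
    x_lo, x_hi, y_lo, y_hi = query
    touched = cells_touched(x_lo, x_hi, cells_x) * cells_touched(y_lo, y_hi, cells_y)
    return touched / (cells_x * cells_y)


def best_layout(workload, max_cells=16):
    """Try every (cells_x, cells_y) within the budget and keep the cheapest."""
    candidates = [
        (cx, cy)
        for cx, cy in product(range(1, max_cells + 1), repeat=2)
        if cx * cy <= max_cells
    ]

    def average_cost(layout):
        return sum(scanned_fraction(layout, q) for q in workload) / len(workload)

    return min(candidates, key=average_cost)


# Queries are (x_lo, x_hi, y_lo, y_hi). One filters only on x, the other only on y.
mixed_workload = [(100, 109, 0, 999), (0, 999, 500, 509)]
layout = best_layout(mixed_workload)
```

For this mixed workload, a layout that partitions on only one column forces one of the two queries to scan everything, which is why a search that is free to split both columns finds a cheaper option. A real system would replace the exhaustive loop with a smarter search, since the space grows quickly with more columns.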

The third self-adjustment method is where you deeply embed machine learning models into a component of the system to give you much better performance than is currently possible.

Every database, for example, has a query optimizer that takes a SQL query and translates it into an execution plan, which describes how to actually run that query. This query optimizer is a complex piece of software, which requires very carefully tuned heuristics and cost models to figure out how best to do this translation. The state of the art now is to treat this as a deep-learning problem. At that stage, we talk about learned components.
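
The idea of a learned cost model can be sketched in a few lines. The example below deliberately swaps the deep-learning model for a two-feature linear least-squares fit so that it stays self-contained. The features (`rows_scanned`, `rows_joined`), the plan names, and the runtimes are all made up for illustration.

```python
def fit_cost_model(samples):
    """Least-squares fit of runtime ~ w_scan * rows_scanned + w_join * rows_joined.

    samples: list of ((rows_scanned, rows_joined), observed_runtime).
    Solves the 2x2 normal equations with Cramer's rule.
    """
    a11 = sum(s * s for (s, j), _ in samples)
    a12 = sum(s * j for (s, j), _ in samples)
    a22 = sum(j * j for (s, j), _ in samples)
    b1 = sum(s * t for (s, j), t in samples)
    b2 = sum(j * t for (s, j), t in samples)
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("features are collinear; need more varied samples")
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


def predict_cost(weights, features):
    """Predicted runtime of a plan from its feature vector."""
    return sum(w * f for w, f in zip(weights, features))


def choose_plan(candidates, weights):
    """Pick the candidate plan with the lowest predicted cost."""
    return min(candidates, key=lambda name: predict_cost(weights, candidates[name]))


# Hypothetical execution history: (plan features, measured runtime).
history = [((100, 10), 150.0), ((200, 0), 200.0), ((50, 40), 250.0)]
weights = fit_cost_model(history)

# Hypothetical candidate plans for a new query.
plans = {"hash_join": (300, 20), "nested_loop": (100, 80)}
chosen = choose_plan(plans, weights)
```

The hand-tuned constants of a classical cost model are replaced here by weights fitted to observed runtimes. A learned optimizer in the sense described above would use a far richer plan representation and a neural model, but it plays the same role of predicting the cost and then picking the cheapest plan.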

The ultimate goal is to build a system out of learned components and to have everything tuned in a holistic way. There’s a model monitoring the workload, watching the system, and making the right adjustments, potentially in ways no human is able to.
