Authors:
(1) Limeng Zhang, Centre for Research on Engineering Software Technologies (CREST), The University of Adelaide, Australia;
(2) M. Ali Babar, Centre for Research on Engineering Software Technologies (CREST), The University of Adelaide, Australia.
1.1 Configuration Parameter Tuning Challenges and 1.2 Contributions
3 Overview of Tuning Framework
4 Workload Characterization and 4.1 Query-level Characterization
4.2 Runtime-based Characterization
5 Feature Pruning and 5.1 Workload-level Pruning
5.2 Configuration-level Pruning
7 Configuration Recommendation and 7.1 Bayesian Optimization
10 Discussion and Conclusion, and References
Abstract—Faced with the challenges of big data, modern cloud database management systems are designed to efficiently store, organize, and retrieve data, supporting optimal performance, scalability, and reliability for complex data processing and analysis. However, achieving good performance in modern databases is non-trivial, as they are notorious for having dozens of configurable knobs, such as hardware setup, software setup, and database physical and logical design, that control runtime behaviors and impact database performance. To find the configuration that achieves optimal performance, extensive research has been conducted on automatic parameter tuning in DBMS. This paper provides a comprehensive survey of predominant configuration tuning techniques, including Bayesian optimization-based, neural network-based, reinforcement learning-based, and search-based solutions. Moreover, it investigates the fundamental aspects of the parameter tuning pipeline, including tuning objective, workload characterization, feature pruning, knowledge from experience, configuration recommendation, and experimental settings. We highlight technique comparisons and corresponding solutions for each component, and introduce the experimental settings used for performance evaluation. Finally, we conclude this paper and present future research opportunities. This paper aims to assist future researchers and practitioners in gaining a better understanding of automatic parameter tuning in cloud databases by presenting state-of-the-art solutions, research directions, and evaluation benchmarks.
In an increasingly digitized age, vast and diverse volumes of data are generated from various sources, including mobile devices, social media platforms, sensors, and more. Faced with this data explosion, cloud database management systems (DBMS) for data storage, coupled with big data analytics frameworks (BDAF), have emerged as powerful solutions for handling and processing massive and intricate data sets in a scalable and flexible manner. This makes them invaluable tools for organizations grappling with the challenges of big data and digital transformation [1], [2].
However, achieving good performance in modern DBMSs is non-trivial. Modern DBMSs have hundreds of configurable knobs, covering hardware setup, software configuration, and database physical and logical design, that affect their performance [3]–[6]. An efficient parameter configuration can strike a balance between resource utilization, query responsiveness, and cost-effectiveness, while an inappropriate configuration can lead to significant performance degradation and inefficient use of system resources [3], [6]–[13].