Paper Title
A Scalable and Cloud-Native Hyperparameter Tuning System
Authors
Abstract
In this paper, we introduce Katib: a scalable, cloud-native, and production-ready hyperparameter tuning system that is agnostic to the underlying machine learning framework. Though multiple hyperparameter tuning systems are available, this is the first one that caters to the needs of both users and administrators of the system. We present the motivation and design of the system and contrast it with existing hyperparameter tuning systems, especially in terms of multi-tenancy, scalability, fault tolerance, and extensibility. Katib can be deployed on local machines or hosted as a service in on-premises data centers or in private/public clouds. We demonstrate the advantages of our system using experimental results as well as real-world production use cases. Katib has active contributors from multiple companies and is open-sourced at https://github.com/kubeflow/katib under the Apache 2.0 license.