Paper Title

Fast Bayesian Optimization of Needle-in-a-Haystack Problems using Zooming Memory-Based Initialization (ZoMBI)

Authors

Siemenn, Alexander E., Ren, Zekun, Li, Qianxiao, Buonassisi, Tonio

Abstract

Needle-in-a-Haystack problems exist across a wide range of applications, including rare disease prediction, ecological resource management, fraud detection, and material property optimization. A Needle-in-a-Haystack problem arises when there is an extreme imbalance of optimum conditions relative to the size of the dataset. For example, only $0.82\%$ of the $146$k total materials in the open-access Materials Project database have a negative Poisson's ratio. However, current state-of-the-art optimization algorithms are not designed to find solutions to these challenging multidimensional Needle-in-a-Haystack problems, resulting in slow convergence to a global optimum or pigeonholing into a local minimum. In this paper, we present a Zooming Memory-Based Initialization algorithm, entitled ZoMBI. ZoMBI actively extracts knowledge from the previously best-performing evaluated experiments to iteratively zoom in the sampling search bounds toward the global optimum "needle," and then prunes the memory of low-performing historical experiments to accelerate compute times, reducing the algorithm time complexity from $O(n^3)$ to $O(\phi^3)$ for $\phi$ forward experiments per activation, which tends toward a constant $O(1)$ over several activations. Additionally, ZoMBI implements two custom adaptive acquisition functions to further guide the sampling of new experiments toward the global optimum. We validate the algorithm's optimization performance on three real-world datasets exhibiting Needle-in-a-Haystack behavior and further stress-test it on an additional 174 analytical datasets. The ZoMBI algorithm demonstrates compute-time speed-ups of 400x compared to traditional Bayesian optimization and efficiently discovers optima in under 100 experiments that are up to 3x more highly optimized than those discovered by the similar methods MiP-EGO, TuRBO, and HEBO.
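
The zoom-and-prune loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: plain uniform random sampling stands in for the Gaussian-process surrogate and the adaptive acquisition functions, and the names `zoom_bounds`, `zoom_and_prune_search`, `activations`, `forward`, and `m` are hypothetical, chosen only for this example. It shows just two ideas: shrinking the search bounds to the region spanned by the best-performing points in memory, and discarding low-performing points so the retained set stays small.

```python
import random


def zoom_bounds(memory, m):
    """Return per-dimension (low, high) bounds spanned by the m lowest-valued points.

    memory is a list of (x, y) pairs: x is a tuple of floats, y the objective value
    (lower is better). Hypothetical helper, not taken from the paper.
    """
    best = sorted(memory, key=lambda pair: pair[1])[:m]
    dims = len(best[0][0])
    return [
        (min(x[d] for x, _ in best), max(x[d] for x, _ in best))
        for d in range(dims)
    ]


def zoom_and_prune_search(objective, bounds, activations=5, forward=20, m=5, seed=0):
    """Toy zoom-and-prune minimization.

    ASSUMPTION: uniform random sampling replaces the surrogate model and
    acquisition functions that a Bayesian optimizer would use.
    """
    rng = random.Random(seed)
    memory = []   # (x, y) pairs retained between activations
    best = None   # best (x, y) seen over the whole run
    for _ in range(activations):
        # "Forward experiments": sample inside the current (zoomed) bounds.
        for _ in range(forward):
            x = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
            y = objective(x)
            memory.append((x, y))
            if best is None or y < best[1]:
                best = (x, y)
        # Prune: keep only the m best points, so memory never exceeds m + forward.
        memory = sorted(memory, key=lambda pair: pair[1])[:m]
        # Zoom: the next activation samples only within the span of those points.
        bounds = zoom_bounds(memory, m)
    return best, bounds


if __name__ == "__main__":
    # A narrow minimum near (0.3, 0.3) on the unit square, used only as a demo.
    def needle(x):
        return sum((xi - 0.3) ** 2 for xi in x)

    best, final_bounds = zoom_and_prune_search(needle, [(0.0, 1.0), (0.0, 1.0)])
    print("best point:", best)
    print("final bounds:", final_bounds)
```

Because the retained memory is capped at `m` points, the amount of data carried into each activation does not grow with the total number of experiments. That is the intuition behind the reduction from $O(n^3)$ to $O(\phi^3)$ stated in the abstract, although this sketch fits no surrogate and therefore does not incur that cost itself.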
