Testing of Machine Learning Models with Limited Samples: An Industrial Vacuum Pumping Application

Paper Title

Testing of Machine Learning Models with Limited Samples: An Industrial Vacuum Pumping Application

Authors

Ayan Chatterjee, Bestoun S. Ahmed, Erik Hallin, Anton Engman

Abstract

There is often a scarcity of training data for machine learning (ML) classification and regression models in industrial production, especially for time-consuming or sparsely run manufacturing processes. A majority of the limited ground-truth data is used for training, while a handful of samples are left for testing. Here, the number of test samples is inadequate to properly evaluate the robustness of the ML models under test for classification and regression. Furthermore, the output of these ML models may be inaccurate or even fail if the input data differ from the expected. This is the case for ML models used in the Electroslag Remelting (ESR) process in the refined steel industry to predict the pressure in a vacuum chamber. A vacuum pumping event that occurs once a workday generates a few hundred samples in a year of pumping for training and testing. In the absence of adequate training and test samples, this paper first presents a method to generate a fresh set of augmented samples based on vacuum pumping principles. Based on the generated augmented samples, three test scenarios and one test oracle are presented to assess the robustness of an ML model used for production on an industrial scale. Experiments are conducted with real industrial production data obtained from Uddeholms AB steel company. The evaluations indicate that Ensemble and Neural Network are the most robust when trained on augmented data using the proposed testing strategy. The evaluation also demonstrates the proposed method's effectiveness in checking and improving ML algorithms' robustness in such situations. The work improves software testing's state-of-the-art robustness testing in similar settings. Finally, the paper presents an MLOps implementation of the proposed approach for real-time ML model prediction and action on the edge node and automated continuous delivery of ML software from the cloud.
