Paper Title

An Investigation of Smart Contract for Collaborative Machine Learning Model Training

Authors

Shengwen Ding, Chenhui Hu

Abstract

Machine learning (ML) has penetrated various fields in the era of big data. The advantage of collaborative machine learning (CML) over most conventional ML lies in the joint effort of decentralized nodes or agents, which results in better model performance and generalization. Because training ML models requires a massive amount of good-quality data, it is necessary to address concerns about data privacy and to ensure data quality. To this end, we investigate the integration of CML and smart contracts. Built on blockchain, smart contracts enable automatic execution of data preservation and validation, as well as continuous CML model training. In our simulation experiments, we define incentive mechanisms on the smart contract and investigate important factors such as the number of features in the dataset (num_words), the size of the training data, and the cost for data holders to submit data. We then determine how these factors affect three performance metrics: the accuracy of the trained model, the gap between the model's accuracies before and after simulation, and the time taken to exhaust the bad agent's balance. For instance, our experimental results show that increasing num_words leads to higher model accuracy and eliminates the negative influence of malicious agents in a shorter time. Statistical analyses show that, with the help of smart contracts, the influence of invalid data is efficiently diminished and model robustness is maintained. We also discuss gaps in existing research and propose possible directions for future work.
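To make the "time to use up the balance of bad agent" metric concrete, the following is a minimal sketch of a deposit-based incentive loop. It is not the authors' mechanism: the `Agent` class, the `reference_label` validation rule, the refund policy, and all numeric values are hypothetical choices made only for illustration. Each agent stakes a deposit per submission; the deposit is refunded when the submitted label passes validation and forfeited otherwise.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    balance: int   # tokens the agent can still stake (hypothetical unit)
    honest: bool   # dishonest agents submit flipped labels


def reference_label(x: float) -> int:
    """Stand-in for the contract's validation step (assumed threshold rule)."""
    return 1 if x >= 0.5 else 0


def simulate(agents: list[Agent], samples: list[float], deposit: int) -> dict[str, int]:
    """Return, per agent, the round in which its balance dropped below the deposit."""
    exhausted_at: dict[str, int] = {}
    for round_no, x in enumerate(samples, start=1):
        for agent in agents:
            if agent.balance < deposit:
                continue  # cannot afford another submission
            label = reference_label(x) if agent.honest else 1 - reference_label(x)
            agent.balance -= deposit          # stake the deposit
            if label == reference_label(x):
                agent.balance += deposit      # refund data that passes validation
            if agent.balance < deposit:
                exhausted_at[agent.name] = round_no
    return exhausted_at


agents = [Agent("good", 50, True), Agent("bad", 50, False)]
print(simulate(agents, [0.1, 0.9, 0.4, 0.7, 0.2, 0.8], deposit=10))  # {'bad': 5}
```

In this toy setting the honest agent always recovers its stake, while the malicious agent loses one deposit per round and is locked out after five rounds. Raising the deposit shortens that time, which is the kind of relationship between submission cost and bad-agent exhaustion that the abstract describes investigating.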
