Paper Title
NN-EMD: Efficiently Training Neural Networks using Encrypted Multi-Sourced Datasets
Paper Authors
Paper Abstract
Training a machine learning model over an encrypted dataset is a promising approach to privacy-preserving machine learning. However, efficiently training a deep neural network (DNN) model over encrypted data is extremely challenging for two reasons: first, it requires large-scale computation over huge datasets; second, existing solutions for computation over encrypted data, such as homomorphic encryption, are inefficient. Further, to enhance the performance of a DNN model, we also need huge training datasets composed of data from multiple sources that may not have pre-established trust relationships with each other. We propose a novel framework, NN-EMD, to train a DNN over multiple encrypted datasets collected from multiple sources. Toward this, we propose a set of secure computation protocols using hybrid functional encryption schemes. We evaluate the performance of our framework with regard to training time and model accuracy on the MNIST dataset. Compared to other existing frameworks, our proposed NN-EMD framework significantly reduces training time while providing comparable model accuracy and privacy guarantees and supporting multiple data sources. Furthermore, despite the privacy-preserving setting NN-EMD introduces, the depth and complexity of the neural network do not affect the training time.
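The abstract does not spell out the cryptographic building block, but functional encryption for inner products is the standard primitive behind this style of secure training: a server holding a function key for a weight vector w learns only the weighted sum <x, w> of an encrypted input x, which is exactly the computation a neural network's first layer needs. The sketch below is a minimal, toy-sized illustration in the style of the DDH-based scheme of Abdalla et al.; it is not the paper's full hybrid protocol, the group parameters are deliberately tiny and insecure, and all names are illustrative.

```python
# Toy inner-product functional encryption (DDH-style, for illustration only).
# A real deployment would use a vetted crypto library and large parameters.
import random

Q = 2879          # safe prime, Q = 2*P + 1 (toy-sized, NOT secure)
P = (Q - 1) // 2  # prime order of the quadratic-residue subgroup of Z_Q*
G = 4             # generator of that order-P subgroup

def setup(n):
    """Master secret s_1..s_n and public key h_i = G^{s_i} for length-n vectors."""
    msk = [random.randrange(P) for _ in range(n)]
    mpk = [pow(G, s, Q) for s in msk]
    return mpk, msk

def encrypt(mpk, x):
    """Encrypt integer vector x component-wise: ct_i = h_i^r * G^{x_i}."""
    r = random.randrange(P)
    ct0 = pow(G, r, Q)
    cts = [pow(h, r, Q) * pow(G, xi, Q) % Q for h, xi in zip(mpk, x)]
    return ct0, cts

def keygen(msk, w):
    """Function key for weight vector w: sk_w = <s, w> mod P."""
    return sum(s * wi for s, wi in zip(msk, w)) % P

def decrypt(ct, sk_w, w, bound):
    """Recover <x, w> (and nothing else about x) via a small discrete log."""
    ct0, cts = ct
    num = 1
    for c, wi in zip(cts, w):
        num = num * pow(c, wi, Q) % Q
    target = num * pow(ct0, P - sk_w, Q) % Q  # equals G^{<x, w>}
    for v in range(bound):                    # brute-force log; fine for toy values
        if pow(G, v, Q) == target:
            return v
    raise ValueError("inner product out of range")

mpk, msk = setup(3)
x = [2, 0, 1]                # a data owner's private input (e.g., pixel values)
w = [3, 1, 4]                # first-layer weights held by the training server
ct = encrypt(mpk, x)
sk_w = keygen(msk, w)
print(decrypt(ct, sk_w, w, bound=P))  # -> 10 == 2*3 + 0*1 + 1*4
```

The design point this illustrates: unlike homomorphic encryption, where every arithmetic operation is evaluated under encryption, a functional-encryption key releases the needed inner product in plaintext, so the rest of the forward and backward pass runs at native speed. This is consistent with the abstract's claim that network depth does not affect NN-EMD's training time.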