Paper title

Covariance matrix preparation for quantum principal component analysis

Authors

Max Hunter Gordon, M. Cerezo, Lukasz Cincio, Patrick J. Coles

Abstract

Principal component analysis (PCA) is a dimensionality reduction method in data analysis that involves diagonalizing the covariance matrix of the dataset. Recently, quantum algorithms have been formulated for PCA based on diagonalizing a density matrix. These algorithms assume that the covariance matrix can be encoded in a density matrix, but a concrete protocol for this encoding has been lacking. Our work aims to address this gap. Assuming amplitude encoding of the data, with the data given by the ensemble $\{p_i, |\psi_i\rangle\}$, one can easily prepare the ensemble average density matrix $\overline{\rho} = \sum_i p_i |\psi_i\rangle \langle \psi_i|$. We first show that $\overline{\rho}$ is precisely the covariance matrix whenever the dataset is centered. For quantum datasets, we exploit global phase symmetry to argue that there always exists a centered dataset consistent with $\overline{\rho}$, and hence $\overline{\rho}$ can always be interpreted as a covariance matrix. This provides a simple means for preparing the covariance matrix for arbitrary quantum datasets or centered classical datasets. For uncentered classical datasets, our method is so-called "PCA without centering", which we interpret as PCA on a symmetrized dataset. We argue that this closely corresponds to standard PCA, and we derive equations and inequalities that bound the deviation of the spectrum obtained with our method from that of standard PCA. We numerically illustrate our method for the MNIST handwritten digit dataset. We also argue that PCA on quantum datasets is natural and meaningful, and we numerically implement our method for molecular ground-state datasets.
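
The relationship between the ensemble average density matrix and the covariance matrix can be checked classically on a toy example. The sketch below is not the authors' code: it assumes real-valued unit vectors standing in for amplitude-encoded states, and every name in it (ensemble_average, covariance, symmetrize, the toy data) is hypothetical. It checks two textbook identities: the covariance plus the outer product of the mean equals the ensemble average, and the covariance of the sign-symmetrized dataset (each vector paired with its negative) equals the ensemble average.

```python
# Illustrative sketch only: a classical, pure-Python check on toy data.
# Assumption: real-valued unit vectors play the role of amplitude-encoded states.

def outer(u, v):
    """Outer product u v^T as a nested list."""
    return [[a * b for b in v] for a in u]

def weighted_sum(mats, weights):
    """Return sum_i weights[i] * mats[i] for equally sized square matrices."""
    n = len(mats[0])
    return [[sum(w * m[r][c] for m, w in zip(mats, weights))
             for c in range(n)] for r in range(n)]

def mean_vector(vectors, probs):
    """Weighted mean mu = sum_i p_i x_i."""
    dim = len(vectors[0])
    return [sum(p * v[k] for v, p in zip(vectors, probs)) for k in range(dim)]

def ensemble_average(vectors, probs):
    """rho_bar = sum_i p_i x_i x_i^T (no centering)."""
    return weighted_sum([outer(v, v) for v in vectors], probs)

def covariance(vectors, probs):
    """Q = sum_i p_i (x_i - mu)(x_i - mu)^T (standard, centered)."""
    mu = mean_vector(vectors, probs)
    centered = [[x - m for x, m in zip(v, mu)] for v in vectors]
    return weighted_sum([outer(c, c) for c in centered], probs)

def symmetrize(vectors, probs):
    """Pair every vector with its negative and halve each probability."""
    sym_vectors = vectors + [[-x for x in v] for v in vectors]
    sym_probs = [p / 2 for p in probs] * 2
    return sym_vectors, sym_probs

def max_abs_diff(a, b):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Hypothetical toy dataset: three unit vectors in R^2 with probabilities.
states = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
probs = [0.5, 0.3, 0.2]

rho_bar = ensemble_average(states, probs)
mu = mean_vector(states, probs)

# Uncentered data: Q + mu mu^T reproduces rho_bar.
q_raw = covariance(states, probs)
mu_outer = outer(mu, mu)
q_plus_mean = [[q + m for q, m in zip(rq, rm)] for rq, rm in zip(q_raw, mu_outer)]
gap_raw = max_abs_diff(rho_bar, q_plus_mean)

# Symmetrized data has zero mean, so its covariance is rho_bar itself.
sym_states, sym_probs = symmetrize(states, probs)
gap_sym = max_abs_diff(rho_bar, covariance(sym_states, sym_probs))

trace_rho = sum(rho_bar[k][k] for k in range(len(rho_bar)))
print(gap_raw, gap_sym, trace_rho)
```

Both gaps should vanish up to floating-point rounding, and the trace of rho_bar should be 1 because the toy vectors are normalized. The sketch says nothing about how the state is prepared on a quantum device; it only illustrates the linear-algebra identity behind "PCA without centering".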
