Paper Title

EPEM: Efficient Parameter Estimation for Multiple Class Monotone Missing Data

Authors

Thu Nguyen, Duy H. M. Nguyen, Huy Nguyen, Binh T. Nguyen, Bruce A. Wade

Abstract

The problem of monotone missing data has been studied extensively over the last two decades and has many applications in fields such as bioinformatics and statistics. Commonly used imputation techniques require multiple iterations through the data before converging. Moreover, those approaches may introduce extra noise and bias into the subsequent modeling. In this work, we derive exact formulas and propose a novel algorithm, EPEM, to compute the maximum likelihood estimators (MLEs) for a multiple-class, monotone missing dataset when the covariance matrices of all classes are assumed to be equal. We then illustrate an application of the proposed method in Linear Discriminant Analysis (LDA). Because the computation is exact, EPEM does not require multiple iterations through the data as imputation approaches do, and is therefore expected to be much less time-consuming than those methods. Empirical results support this: EPEM reduced error rates significantly and required a short computation time compared to several imputation-based approaches. We also release all code and data from our experiments in a GitHub repository as a contribution to the research community working on this problem.
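
To make the term "monotone missing data" concrete, here is a minimal sketch of a check for that pattern. It is not taken from the EPEM repository: the helper name `is_monotone_missing` is hypothetical, and the sketch assumes that missing values are encoded as NaN and that the columns are already in the order in which missingness accumulates. Under those assumptions, a dataset is monotone missing when, in every row, a missing entry is followed only by missing entries.

```python
import numpy as np

def is_monotone_missing(X):
    """Return True if the NaN pattern of X is monotone in the given column order.

    Hypothetical helper for illustration only: once a value is missing in a
    row, every later column of that row must be missing as well.
    """
    missing = np.isnan(np.asarray(X, dtype=float))
    # Mark every position at or after the first missing entry in each row.
    at_or_after_first_gap = np.logical_or.accumulate(missing, axis=1)
    # The pattern is monotone exactly when no observed value follows a gap.
    return bool(np.array_equal(at_or_after_first_gap, missing))

# Rows lose trailing features only, so the pattern is monotone.
X_monotone = [[1.0, 2.0, 3.0],
              [1.0, 2.0, np.nan],
              [1.0, np.nan, np.nan]]

# An observed value follows a missing one, so the pattern is not monotone.
X_general = [[1.0, np.nan, 3.0]]

print(is_monotone_missing(X_monotone))  # True
print(is_monotone_missing(X_general))   # False
```

A check like this only describes the data layout the abstract refers to; it does not compute the estimators themselves.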
