Paper title
Application Performance Modeling via Tensor Completion
Paper authors
Paper abstract
Performance tuning, software/hardware co-design, and job scheduling are among the many tasks that rely on models to predict application performance. We propose and evaluate low-rank tensor decomposition for modeling application performance. We discretize the input and configuration domains of an application using regular grids. Application execution times that map to the same grid cell are averaged and represented by a single tensor element. We show that low-rank canonical-polyadic (CP) tensor decomposition is effective in approximating these tensors. We further show that this decomposition enables accurate extrapolation to unobserved regions of an application's parameter space. We then employ tensor completion to optimize a CP decomposition given a sparse set of observed execution times. We compare against alternative piecewise/grid-based models and supervised learning models for six applications and demonstrate that CP decomposition optimized using tensor completion offers higher prediction accuracy and memory efficiency for high-dimensional performance modeling.
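The pipeline the abstract describes (bin measurements onto a regular grid, average per cell, then fit a low-rank CP model to the observed cells only) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `grid_average`, `cp_complete`, and `cp_reconstruct` are hypothetical, and the optimizer used here (alternating least squares with a small ridge term) is an assumption, since the abstract does not say how the completion problem is solved.

```python
import numpy as np


def grid_average(configs, times, edges):
    """Average execution times per grid cell (hypothetical helper).

    configs: (n_samples, n_params) array of parameter values.
    times:   (n_samples,) array of measured execution times.
    edges:   list of 1-D bin-edge arrays, one per parameter.
    Returns the per-cell mean tensor and a boolean mask of observed cells.
    """
    shape = tuple(len(e) - 1 for e in edges)
    sums = np.zeros(shape)
    counts = np.zeros(shape)
    # np.digitize gives 1-based bin indices; shift and clip to stay on the grid.
    idx = tuple(
        np.clip(np.digitize(configs[:, d], edges[d]) - 1, 0, shape[d] - 1)
        for d in range(len(edges))
    )
    np.add.at(sums, idx, times)
    np.add.at(counts, idx, 1)
    mask = counts > 0
    means = np.zeros(shape)
    means[mask] = sums[mask] / counts[mask]
    return means, mask


def cp_complete(tensor, mask, rank, n_iters=50, reg=1e-6, seed=0):
    """Fit CP factors to the observed entries only (assumed optimizer: ridge ALS)."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((s, rank)) for s in tensor.shape]
    coords = np.argwhere(mask)  # indices of observed cells, C order
    vals = tensor[mask]         # observed values, same order as coords
    for _ in range(n_iters):
        for mode in range(tensor.ndim):
            # For each observed cell, multiply the factor rows of all other modes.
            design = np.ones((len(vals), rank))
            for other in range(tensor.ndim):
                if other != mode:
                    design *= factors[other][coords[:, other]]
            # Each row of this mode's factor is an independent small least-squares fit.
            for i in range(tensor.shape[mode]):
                rows = coords[:, mode] == i
                if not rows.any():
                    continue  # no observations in this slice; keep the current row
                A = design[rows]
                gram = A.T @ A + reg * np.eye(rank)
                factors[mode][i] = np.linalg.solve(gram, A.T @ vals[rows])
    return factors


def cp_reconstruct(factors):
    """Rebuild the dense tensor as a sum of rank-one outer products."""
    out = np.zeros(tuple(f.shape[0] for f in factors))
    for r in range(factors[0].shape[1]):
        comp = factors[0][:, r]
        for f in factors[1:]:
            comp = np.multiply.outer(comp, f[:, r])
        out += comp
    return out
```

A typical use would call `grid_average` on raw measurements, pass the resulting tensor and mask to `cp_complete`, and read predictions for unobserved cells from `cp_reconstruct(factors)`. The memory argument in the abstract is visible in the shapes: the factors store `rank * sum(tensor.shape)` numbers, whereas the dense grid stores `prod(tensor.shape)`.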