Paper title

Boosted top tagging and its interpretation using Shapley values

Authors

Biplob Bhattacherjee, Camellia Bose, Amit Chakraborty, Rhitaja Sengupta

Abstract

Top tagging has emerged as a fast-evolving subject due to the top quark's significant role in probing physics beyond the Standard Model. For the reconstruction of top jets, machine learning models have shown a substantial improvement in classification performance compared to previous methods. In this work, we build top taggers using $N$-Subjettiness ratios and several Energy Correlation observables as input features to train the eXtreme Gradient BOOSTed decision tree (XGBOOST). The study finds that tighter parton-level matching leads to more accurate tagging. However, in real experimental data, where the parton-level information is unknown, this matching cannot be done. We train the XGBOOST models without performing this matching and show that this difference impacts the taggers' effectiveness. Additionally, we test the tagger under different simulation conditions, including changes in center-of-mass energy, parton distribution functions (PDFs), and pileup effects, demonstrating its robustness with performance deviations of less than 1%. Furthermore, we use the SHapley Additive exPlanation (SHAP) framework to calculate the importance of the features of the trained models. This helps us estimate how much each feature contributes to the model's prediction and which regions of each input variable matter most. Finally, we combine all the tagger variables to form a hybrid tagger and interpret the results using the Shapley values.
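
To make the idea of a Shapley value concrete, the following is a minimal sketch, not the authors' code and not the algorithm used by the SHAP library for tree models. It computes exact Shapley values by brute force over all feature coalitions for a hypothetical three-feature toy score. The function names (`shapley_values`, `score`, `value`), the feature values, and the baseline are all assumptions made for illustration. Features outside a coalition are replaced by a fixed baseline value, which is one common convention among several.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values for a set function value(frozenset) -> float."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical event and baseline (illustrative numbers, not from the paper)
x = [0.3, 0.8, 0.5]
baseline = [0.6, 0.4, 0.5]


def score(z):
    # Toy stand-in for a tagger output, with an interaction between z[0] and z[1]
    return 1.0 - z[0] + 0.5 * z[1] + z[0] * z[1]


def value(coalition):
    # Features in the coalition take the event's values; the rest stay at baseline
    z = [x[j] if j in coalition else baseline[j] for j in range(len(x))]
    return score(z)


phi = shapley_values(value, len(x))
print(phi)
```

By construction the contributions sum to the difference between the score for the event and the score for the baseline, and a feature that does not change the score receives zero. The brute-force loop scales exponentially with the number of features, so it is only practical for small toy cases like this one.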
