Paper Title

YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss

Paper Authors

Debapriya Maji, Soyeb Nagori, Manu Mathew, Deepak Poddar

Paper Abstract

We introduce YOLO-Pose, a novel heatmap-free approach for joint detection and 2D multi-person pose estimation in an image, based on the popular YOLO object detection framework. Existing heatmap-based two-stage approaches are sub-optimal as they are not end-to-end trainable and training relies on a surrogate L1 loss that is not equivalent to maximizing the evaluation metric, i.e. Object Keypoint Similarity (OKS). Our framework allows us to train the model end-to-end and optimize the OKS metric itself. The proposed model learns to jointly detect bounding boxes for multiple persons and their corresponding 2D poses in a single forward pass, thus bringing in the best of both top-down and bottom-up approaches. The proposed approach does not require the post-processing step of bottom-up approaches that groups detected keypoints into a skeleton, since each bounding box has an associated pose, resulting in an inherent grouping of the keypoints. Unlike top-down approaches, multiple forward passes are not needed, since all persons and their poses are localized in a single inference. YOLO-Pose achieves new state-of-the-art results on the COCO validation set (90.2% AP50) and test-dev set (90.3% AP50), surpassing all existing bottom-up approaches in a single forward pass without flip test, multi-scale testing, or any other test-time augmentation. All experiments and results reported in this paper are without any test-time augmentation, unlike traditional approaches that use flip testing and multi-scale testing to boost performance. Our training code will be made publicly available at https://github.com/TexasInstruments/edgeai-yolov5 and https://github.com/TexasInstruments/edgeai-yolox.
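
For reference, the Object Keypoint Similarity (OKS) metric that the abstract says the loss is built around is defined by the COCO keypoint evaluation as a Gaussian-weighted similarity between predicted and ground-truth keypoints, averaged over the labeled keypoints of a person. The sketch below is a minimal NumPy illustration of that metric, not code from the authors' repositories; the function name and argument layout are assumptions, while the per-keypoint sigmas are the standard COCO values.

```python
import numpy as np

# COCO keypoint-evaluation sigmas for the 17 person keypoints
# (nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles).
COCO_SIGMAS = np.array([
    0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072,
    0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089])

def object_keypoint_similarity(pred_kpts, gt_kpts, gt_vis, gt_area,
                               sigmas=COCO_SIGMAS):
    """OKS between one predicted pose and one ground-truth pose.

    pred_kpts, gt_kpts: (K, 2) arrays of (x, y) keypoint coordinates.
    gt_vis:             (K,) visibility flags; only labeled keypoints (v > 0) count.
    gt_area:            ground-truth object area, used as the scale term s^2.
    """
    d2 = np.sum((pred_kpts - gt_kpts) ** 2, axis=-1)      # squared distances d_i^2
    k2 = (2.0 * sigmas) ** 2                               # per-keypoint constants k_i^2
    e = d2 / (2.0 * gt_area * k2 + np.finfo(float).eps)   # Gaussian exponent per keypoint
    labeled = gt_vis > 0
    if not labeled.any():
        return 0.0
    return float(np.exp(-e[labeled]).mean())               # average over labeled keypoints
```

An OKS-based keypoint loss can then be formed directly from this quantity, e.g. `1.0 - object_keypoint_similarity(...)` averaged over matched person instances. YOLO-Pose's actual loss additionally handles keypoint confidence and unlabeled keypoints, so this function only illustrates the metric being optimized, not the authors' exact formulation.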
