Hierarchical Neural Architecture Search for Deep Stereo Matching

Paper Title

Hierarchical Neural Architecture Search for Deep Stereo Matching

Authors

Xuelian Cheng, Yiran Zhong, Mehrtash Harandi, Yuchao Dai, Xiaojun Chang, Tom Drummond, Hongdong Li, Zongyuan Ge

Abstract

To reduce human effort in neural network design, Neural Architecture Search (NAS) has been applied with remarkable success to various high-level vision tasks such as classification and semantic segmentation. The underlying idea of NAS is straightforward: by giving the network the ability to choose among a set of operations (e.g., convolutions with different filter sizes), one can find an architecture that is better adapted to the problem at hand. So far, however, low-level geometric vision tasks such as stereo matching have not shared in this success. This is partly because state-of-the-art deep stereo matching networks, designed by humans, are already very large. Directly applying NAS to such massive structures is computationally prohibitive with currently available mainstream computing resources. In this paper, we propose the first end-to-end hierarchical NAS framework for deep stereo matching by incorporating task-specific human knowledge into the neural architecture search framework. Specifically, following the gold standard pipeline for deep stereo matching (i.e., feature extraction, feature volume construction, and dense matching), we optimize the architectures of the entire pipeline jointly. Extensive experiments show that our searched network outperforms all state-of-the-art deep stereo matching architectures and ranks first in accuracy on the KITTI stereo 2012, KITTI stereo 2015, and Middlebury benchmarks, as well as first on the SceneFlow dataset, with a substantial improvement in network size and inference speed. The code is available at https://github.com/XuelianCheng/LEAStereo.
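The idea of letting a network "choose among a set of operations" can be illustrated with a small sketch. The abstract does not say how the choice is made, so the example below assumes a continuous relaxation of the kind commonly used in differentiable NAS: each candidate operation gets a learnable score, the layer output is the softmax-weighted sum of all candidates, and the highest-scoring candidate is kept after the search. The candidate set (`CANDIDATE_KERNELS`), the 1-D box filters, and the function names are all hypothetical and are not taken from the LEAStereo code.

```python
import math

def conv1d_same(signal, kernel):
    """Zero-padded 1-D correlation that keeps the input length (odd kernel sizes)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate operations: box filters with sizes 1, 3 and 5.
CANDIDATE_KERNELS = {
    "conv1": [1.0],
    "conv3": [1.0 / 3.0] * 3,
    "conv5": [0.2] * 5,
}

def mixed_op(signal, alphas):
    """Softmax-weighted sum of every candidate's output (the relaxed 'choice')."""
    weights = softmax(alphas)
    outputs = [conv1d_same(signal, k) for k in CANDIDATE_KERNELS.values()]
    return [
        sum(w * o[i] for w, o in zip(weights, outputs))
        for i in range(len(signal))
    ]

def select_op(alphas):
    """After the search, keep only the candidate with the largest score."""
    names = list(CANDIDATE_KERNELS)
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return names[best]
```

In a real search the scores `alphas` would be trained together with the network weights. Here they are plain numbers, so `select_op([0.1, 2.0, -1.0])` simply picks the size-3 filter.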

Paper Title