Paper Title
Fast Autofocusing using Tiny Transformer Networks for Digital Holographic Microscopy
Paper Authors
Paper Abstract
The numerical wavefront backpropagation principle of digital holography confers unique extended-focus capabilities, without mechanical displacements along the z-axis. However, determining the correct focusing distance is a non-trivial and time-consuming issue. A deep learning (DL) solution is proposed to cast autofocusing as a regression problem and is tested over both experimental and simulated holograms. Single-wavelength digital holograms were recorded by a Digital Holographic Microscope (DHM) with a 10$\mathrm{x}$ microscope objective from a patterned target moving in 3D over an axial range of 92 $\mu$m. Tiny DL models are proposed and compared, such as a tiny Vision Transformer (TViT), a tiny VGG16 (TVGG) and a tiny Swin-Transformer (TSwinT). The proposed tiny networks are compared with their original versions (ViT/B16, VGG16 and Swin-Transformer Tiny) and with the main neural networks used in digital holography, such as LeNet and AlexNet. The experiments show that the predicted focusing distance $Z_R^{\mathrm{Pred}}$ is inferred with an average accuracy of 1.2 $\mu$m, compared with the DHM depth of field of 15 $\mu$m. Numerical simulations show that all tiny models give $Z_R^{\mathrm{Pred}}$ with an error below 0.3 $\mu$m. Such a prospect would significantly improve the current capabilities of computer vision position sensing in applications such as 3D microscopy for life sciences or micro-robotics. Moreover, all models reach an inference time on CPU below 25 ms per inference. In terms of robustness to occlusions, TViT, owing to its Transformer architecture, is the most robust.
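As context for the abstract's "numerical wavefront backpropagation", digital holography typically refocuses with the angular spectrum method: the hologram's spectrum is multiplied by the free-space transfer function for a propagation distance z, so a negative z backpropagates the wavefront without any mechanical displacement. Below is a minimal NumPy sketch of that standard technique, not code from the paper; the 532 nm wavelength, 5.5 µm pixel pitch, and z value are illustrative assumptions.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, pixel_pitch, z):
    """Propagate a complex wavefront by distance z (meters) with the
    angular spectrum method. Negative z backpropagates, which is how
    digital holography refocuses numerically."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pixel_pitch)  # spatial frequencies (1/m)
    fy = np.fft.fftfreq(ny, d=pixel_pitch)
    FX, FY = np.meshgrid(fx, fy)
    # Free-space transfer function; evanescent components are suppressed.
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    kz = (2 * np.pi / wavelength) * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * z * kz) * (arg > 0)
    return np.fft.ifft2(np.fft.fft2(field) * H)

# Example: refocus a toy hologram at a predicted focusing distance.
hologram = np.random.randn(256, 256) + 1j * np.random.randn(256, 256)
z_pred = -45e-6  # hypothetical prediction, mid-range of a 92 um axial span
refocused = angular_spectrum_propagate(hologram, 532e-9, 5.5e-6, z_pred)
```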
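Likewise, "casting autofocusing as a regression problem" amounts to training a network that maps a hologram to a single scalar focusing distance under an L2 loss, rather than classifying discrete focus bins. The PyTorch sketch below illustrates this formulation with a hypothetical tiny CNN; TinyAutofocusNet is a stand-in, not one of the paper's TViT/TVGG/TSwinT architectures, and only the 92 µm target range is taken from the abstract.

```python
import torch
import torch.nn as nn

class TinyAutofocusNet(nn.Module):
    """Minimal CNN regressor: hologram image in, focusing distance out."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # global pooling -> (B, 32, 1, 1)
        )
        self.head = nn.Linear(32, 1)          # regression output: z in um

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = TinyAutofocusNet()
holograms = torch.randn(8, 1, 224, 224)       # batch of hologram intensities
z_true = torch.empty(8, 1).uniform_(0, 92)    # labels over a 92 um axial range
loss = nn.functional.mse_loss(model(holograms), z_true)
loss.backward()                                # one regression training step
```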