YOLACT: currently the most popular open-source real-time instance segmentation library on GitHub

YOLACT is a real-time instance segmentation paper accepted at ICCV 2019.

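The authors recently extended the YOLACT algorithm into YOLACT++ ("Better Real-time Instance Segmentation"). Its ResNet-50 model runs at 33.5 fps on a Titan Xp and reaches 34.1 mAP on COCO test-dev, and the code is open source.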

The authors are from the University of California, Davis.

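The authors present a simple, fully convolutional model for real-time (> 30 fps) instance segmentation. It achieves competitive results on MS COCO, evaluated on a single Titan Xp, and is significantly faster than any previous state-of-the-art approach.

Moreover, this result is obtained after training on only one GPU.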

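The authors accomplish this by breaking instance segmentation into two parallel subtasks:

(1) generating a set of prototype masks;

(2) predicting a set of mask coefficients for each instance.

The instance masks are then produced by linearly combining the prototypes with the mask coefficients.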

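Because this process does not depend on repooling, the approach produces very high-quality masks and exhibits temporal stability for free. The authors also analyze the emergent behavior of the prototypes and show that, despite being fully convolutional, they learn to localize instances on their own in a translation-variant manner.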
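The linear combination described above can be illustrated with a small sketch. It assumes the paper's formulation, in which each instance mask is the sigmoid of a coefficient-weighted sum of the prototypes. The function names and the toy values are hypothetical, and plain Python lists are used only to keep the example dependency-free. A real implementation would use a single matrix multiplication in a tensor library.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def assemble_masks(prototypes, coefficients):
    """Combine k prototype masks into one mask per instance.

    prototypes:   list of k prototype masks, each an h x w list of floats
    coefficients: list of n coefficient vectors, each of length k
    returns:      list of n masks, each h x w, with values in (0, 1)
    """
    height, width = len(prototypes[0]), len(prototypes[0][0])
    masks = []
    for coeffs in coefficients:
        mask = [
            [
                sigmoid(sum(c * proto[y][x] for c, proto in zip(coeffs, prototypes)))
                for x in range(width)
            ]
            for y in range(height)
        ]
        masks.append(mask)
    return masks


# Toy example (made-up values): k = 2 prototypes on a 2 x 2 grid, n = 2 instances.
prototypes = [
    [[4.0, -4.0], [4.0, -4.0]],  # responds to the left column
    [[4.0, 4.0], [-4.0, -4.0]],  # responds to the top row
]
coefficients = [
    [1.0, 0.0],   # instance 0: left column only
    [1.0, -1.0],  # instance 1: left column minus top row
]
masks = assemble_masks(prototypes, coefficients)

# Threshold at 0.5 to obtain binary masks.
binary = [[[value > 0.5 for value in row] for row in mask] for mask in masks]
```

Negative coefficients let one prototype subtract from another, which is how a small set of shared prototypes can describe many different instances.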

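The authors also propose Fast NMS, a drop-in replacement for standard NMS that is 12 ms faster and incurs only a marginal performance penalty. Finally, by incorporating deformable convolutions into the backbone network, optimizing the prediction head with better anchor scales and aspect ratios, and adding a novel fast mask re-scoring branch, the YOLACT++ model achieves 34.1 mAP on MS COCO at 33.5 fps, which is fairly close to state-of-the-art approaches while still running in real time.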
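To make the Fast NMS idea more concrete, here is a minimal sketch. It assumes the behavior described in the paper: a box is removed if it overlaps any higher-scoring box beyond the threshold, even if that higher-scoring box was itself removed. This is what allows all decisions to be computed at once from a pairwise IoU matrix. The sketch uses plain loops for readability rather than the batched matrix operations that make the real version fast, and the function names and boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fast_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for rank, idx in enumerate(order):
        # Largest overlap with ANY higher-scoring box, suppressed or not.
        max_iou = max(
            (iou(boxes[idx], boxes[order[j]]) for j in range(rank)),
            default=0.0,
        )
        if max_iou <= iou_threshold:
            keep.append(idx)
    return keep


# Made-up boxes: 0 overlaps 1, 1 overlaps 2, 0 barely overlaps 2, 3 is far away.
boxes = [(0, 0, 10, 10), (4, 0, 14, 10), (8, 0, 18, 10), (100, 100, 110, 110)]
scores = [0.9, 0.8, 0.7, 0.6]
kept = fast_nms(boxes, scores, iou_threshold=0.4)
```

In this example, sequential NMS would keep box 2, because box 1 is already gone by the time box 2 is checked. The relaxed rule removes box 2 as well. Removing slightly too many boxes in cases like this is the source of the small accuracy cost mentioned above.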

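The figure below compares the speed and accuracy of YOLACT/YOLACT++ with other instance segmentation algorithms:

As the figure shows, the YOLACT family has a large speed advantage, and YOLACT++ improves accuracy over YOLACT.

These results are not post-processed and were run in real time on a GPU.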

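YOLACT network architecture:

Figure 2: YOLACT architecture. Blue/yellow indicates low/high values in the prototypes, gray nodes indicate functions that are not trained, and k = 4 in this example. The architecture is built on RetinaNet [25] using ResNet-101 + FPN.

YOLACT evaluation results on COCO's test-dev set. The base model achieves 29.8 mAP at 33.0 fps. The confidence threshold is set to 0.3 for all images.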

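A more detailed comparison with other algorithms on the COCO dataset:

Table 1: MS COCO results. The authors compare against state-of-the-art methods for mask mAP and speed on COCO test-dev, and include several ablations of the base model that vary the backbone network and image size. Backbone architectures are denoted as network-depth-features, where R and D refer to ResNet and DarkNet, respectively. The base model, YOLACT-550 with ResNet-101, is 3.9x faster than the previous fastest approach with competitive mask mAP. The YOLACT++-550 model with ResNet-50 runs at the same speed while improving on the base model by 4.3 mAP. Compared with Mask R-CNN, YOLACT++-R-50 is 3.9x faster and falls behind by only 1.6 mAP.

YOLACT/YOLACT++ achieves the fastest speed while still delivering good segmentation accuracy.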

The authors have released several trained models:

Here is how to use them:

Quantitative results on COCO

# Quantitatively evaluate a trained model on the entire validation set. Make sure you have COCO downloaded as above.
# This should get 29.92 validation mask mAP last time I checked.
python eval.py --trained_model=weights/yolact_base_54_800000.pth

# Output a COCOEval json to submit to the website or to use the run_coco_eval.py script.
# This command will create './results/bbox_detections.json' and './results/mask_detections.json' for detection and instance segmentation respectively.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --output_coco_json

# You can run COCOEval on the files created in the previous command. The performance should match my implementation in eval.py.
python run_coco_eval.py

# To output a coco json file for test-dev, make sure you have test-dev downloaded from above and go
python eval.py --trained_model=weights/yolact_base_54_800000.pth --output_coco_json --dataset=coco2017_testdev_dataset

Qualitative results on COCO

# Display qualitative results on COCO. From here on I'll use a confidence threshold of 0.15.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --display

Benchmarking on COCO

# Run just the raw model on the first 1k images of the validation set
python eval.py --trained_model=weights/yolact_base_54_800000.pth --benchmark --max_images=1000

Images

# Display qualitative results on the specified image.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --image=my_image.png

# Process an image and save it to another file.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --image=input_image.png:output_image.png

# Process a whole folder of images.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --images=path/to/input/folder:path/to/output/folder
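If you want to drive these image commands from a script, a small helper like the one below may be convenient. It is only a sketch: `build_eval_command` is a hypothetical name, and the helper uses just the flags shown above (`--trained_model`, `--score_threshold`, `--top_k`, `--image`) together with the `input:output` convention for saving the result. It builds the argument list without running anything.

```python
import shlex


def build_eval_command(weights, input_path, output_path=None,
                       score_threshold=0.15, top_k=15):
    """Build the eval.py argument list for a single image.

    If output_path is given, the 'input:output' form is used so the
    result is saved to a file instead of being displayed.
    """
    target = input_path if output_path is None else f"{input_path}:{output_path}"
    return [
        "python", "eval.py",
        f"--trained_model={weights}",
        f"--score_threshold={score_threshold}",
        f"--top_k={top_k}",
        f"--image={target}",
    ]


cmd = build_eval_command(
    "weights/yolact_base_54_800000.pth",
    "input_image.png",
    "output_image.png",
)
# Print the command in a form that can be pasted into a shell.
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, assuming the weights file has already been downloaded.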

Video

# Display a video in real-time. '--video_multiframe' will process that many frames at once for improved performance.
# If you want, use '--display_fps' to draw the FPS directly on the frame.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=my_video.mp4

# Display a webcam feed in real-time. If you have multiple webcams pass the index of the webcam you want instead of 0.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=0

# Process a video and save it to another file. This uses the same pipeline as the ones above now, so it's fast!
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=input_video.mp4:output_video.mp4

Training

# Trains using the base config with a batch size of 8 (the default).
python train.py --config=yolact_base_config

# Trains yolact_base_config with a batch_size of 5. For the 550px models, 1 batch takes up around 1.5 gigs of VRAM, so specify accordingly.
python train.py --config=yolact_base_config --batch_size=5

# Resume training yolact_base with a specific weight file and start from the iteration specified in the weight file's name.
python train.py --config=yolact_base_config --resume=weights/yolact_base_10_32100.pth --start_iter=-1

# Use the help option to see a description of all available command line arguments
python train.py --help
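The resume command above reads the starting iteration from the weight file's name. The sketch below shows how such a name could be split apart. It assumes the pattern `<config>_<epoch>_<iteration>.pth`. The article only confirms that the name contains the iteration, so treating the middle number as the epoch is an assumption, and `parse_weight_name` is a hypothetical helper, not part of the repository.

```python
import re
from pathlib import Path


def parse_weight_name(path):
    """Split 'name_<epoch>_<iteration>.pth' into (name, epoch, iteration).

    Assumes the last two underscore-separated fields are integers.
    """
    stem = Path(path).stem
    match = re.fullmatch(r"(?P<name>.+)_(?P<epoch>\d+)_(?P<iteration>\d+)", stem)
    if match is None:
        raise ValueError(f"unexpected weight file name: {path}")
    return match["name"], int(match["epoch"]), int(match["iteration"])


name, epoch, iteration = parse_weight_name("weights/yolact_base_10_32100.pth")
print(name, epoch, iteration)
```

A check like this can help confirm which iteration a checkpoint corresponds to before resuming a long training run.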

Some example segmentation results:

Paper:

https://arxiv.org/pdf/1912.06218.pdf

Code:

https://github.com/dbolya/yolact
