NMS Survey

define the problem:
   (1) single class (pedestrian) or multi-class?   single class (pedestrian)
   (2) in Mask R-CNN (Faster R-CNN), NMS is applied in both training and inference. Does our target replace only the inference part, or both parts?    inference only
   (3) at the moment, are we mainly focusing on single images or video frames?    single images
   (4) are we focusing only on detection, or on both detection and segmentation?  detection, but segmentation is needed to help detection.
   (5) shall we use tight_boxes or loose_boxes as gt_annots?  (the choice makes a difference for gt_masks)

related papers:

cat1: pairwise (clustering):

1. Non-maximum suppression for object detection by passing messages between windows_Rasmus Rothe_ACCV2014
https://www.vision.ee.ethz.ch/publications/papers/proceedings/eth_biwi_01126.pdf

based on affinity propagation clustering (APC);
a latent structured SVM (LSSVM) is used to learn the weights of APC.


cat2: learning based:

2. Learning non-maximum suppression_Jan Hosang_CVPR2017
https://arxiv.org/abs/1705.02950

3. A convnet for non-maximum suppression_Jan Hosang_GCPR2016
https://arxiv.org/abs/1511.06437

cat3: other alternatives:

4. Improving object detection with one line of code_ICCV2017
https://arxiv.org/abs/1704.04503

soft-NMS: decays the detection score of each remaining box as a function of its overlap with the selected box (the larger the overlap, the stronger the decay), instead of setting the score to zero as in standard NMS.
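
The score-decay idea can be sketched in a few lines of NumPy. This is only a minimal illustration, not the paper's reference code: the function names (`iou_one_to_many`, `soft_nms`), the `[x1, y1, x2, y2]` box format, and the default values for `sigma`, `iou_thresh` and `score_thresh` are assumptions chosen for the example. The `"hard"` branch is included only to show how standard NMS fits the same loop.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def soft_nms(boxes, scores, method="gaussian", sigma=0.5,
             iou_thresh=0.3, score_thresh=0.001):
    """Return (indices in selection order, rescored copy of scores).

    All parameter names and defaults here are illustrative assumptions.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    remaining = np.arange(len(scores))
    keep = []
    while remaining.size > 0:
        # Select the highest-scoring box among those still in play.
        best = remaining[np.argmax(scores[remaining])]
        keep.append(int(best))
        remaining = remaining[remaining != best]
        if remaining.size == 0:
            break
        overlap = iou_one_to_many(boxes[best], boxes[remaining])
        if method == "gaussian":
            # Smooth decay: more overlap means a smaller multiplier.
            decay = np.exp(-(overlap ** 2) / sigma)
        elif method == "linear":
            # Decay only boxes whose overlap exceeds the threshold.
            decay = np.where(overlap > iou_thresh, 1.0 - overlap, 1.0)
        else:
            # "hard": standard NMS, overlapping boxes get a zero score.
            decay = np.where(overlap > iou_thresh, 0.0, 1.0)
        scores[remaining] *= decay
        # Drop boxes whose score has decayed below the cut-off.
        remaining = remaining[scores[remaining] > score_thresh]
    return keep, scores


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
keep_soft, scores_soft = soft_nms(boxes, scores, method="gaussian")
keep_hard, scores_hard = soft_nms(boxes, scores, method="hard")
```

In this toy input the second box overlaps the first heavily: with `method="hard"` it is dropped outright, while with `method="gaussian"` it is kept with a reduced score, which is the behaviour that matters for crowded pedestrian scenes.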