Instance-level segmentation survey

##-------------------------------------------------single frame----------------------------------------------------##

1. Path Aggregation Network for Instance Segmentation

PANet: Path Aggregation Network for Instance Segmentation_Shu Liu_CVPR 2018
paper link: https://arxiv.org/abs/1803.01534
source code: code will be available (Caffe-based)

1> Framework:
2> Contributions:
Bottom-up network structure
(1) Bottom-up Path Augmentation: shortens the information path between the lower and topmost feature levels for reliable information passing.
(2) Adaptive Feature Pooling: pools features from all feature levels for each proposal.
(3) Fully-connected Fusion: a complementary path is added to enrich the features for each proposal.
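
A toy sketch of contributions (1) and (2), in pure Python on 1-D feature lists. Everything here is an assumption for illustration: `downsample`, `bottom_up_path` and `fuse_levels` are hypothetical names, the stride-2 slicing stands in for the learned convolutions, and RoI pooling itself is skipped (the per-level features are taken as already pooled). It is not the authors' implementation.

```python
def downsample(feature):
    """Halve the resolution by keeping every second element.

    Assumption: this stands in for a learned stride-2 convolution.
    """
    return feature[::2]


def bottom_up_path(pyramid):
    """Build augmented levels from a feature pyramid (finest level first).

    The first augmented level is the finest pyramid level; every further
    level is the downsampled previous augmented level plus the pyramid
    level of the same resolution (element-wise sum).
    """
    augmented = [list(pyramid[0])]
    for lateral in pyramid[1:]:
        reduced = downsample(augmented[-1])
        if len(reduced) != len(lateral):
            raise ValueError("each level must be half the size of the previous one")
        augmented.append([a + b for a, b in zip(reduced, lateral)])
    return augmented


def fuse_levels(pooled_per_level, op=max):
    """Fuse the features pooled for one proposal from every level.

    pooled_per_level: one equally sized feature list per level.
    op: element-wise reduction, e.g. max or sum.
    """
    length = len(pooled_per_level[0])
    if any(len(f) != length for f in pooled_per_level):
        raise ValueError("pooled features must have the same size")
    return [op(values) for values in zip(*pooled_per_level)]


if __name__ == "__main__":
    levels = bottom_up_path([[1, 2, 3, 4], [10, 20], [100]])
    print(levels)
    print(fuse_levels([[1, 5, 2], [4, 0, 3]]))
```

The point of the sketch: low-level information reaches the coarsest level through a short chain of sums, and each proposal sees features from every level instead of one assigned level.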

3> Performance:
(1) 3~4 points better than Mask R-CNN on COCO and Cityscapes.
(2) 1st place for instance segmentation and 2nd place for detection in the COCO 2017 Challenge.

4> Disadvantages:
(1) possibly slow.
(2) computationally expensive and memory-intensive.


2. Pose2Seg:

Pose2Seg: Human Instance Segmentation Without Detection
paper link: https://arxiv.org/abs/1803.10683
code: no code available

workflow: pose -> segmentation
method category: bottom-up instance segmentation
key contribution: proposes AffineAlign, an alignment module based on human pose
observation: human pose can serve as a guide for human segmentation
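
A minimal sketch of the idea behind pose-based alignment: fit a transform that maps detected keypoints onto a pose template, then use it to bring the person into a canonical frame. This is an assumption-laden simplification, not the paper's AffineAlign: it fits only a similarity transform (rotation, uniform scale, translation) by least squares, and `fit_similarity` / `apply_similarity` are hypothetical names. Points are treated as complex numbers so the fit has a closed form.

```python
def fit_similarity(pose, template):
    """Least-squares similarity transform mapping pose onto template.

    pose, template: equally long lists of (x, y) keypoints.
    Returns complex numbers (a, b) such that template ~= a * z + b,
    where z = x + 1j * y. abs(a) is the scale, the angle of a the rotation.
    """
    z = [complex(x, y) for x, y in pose]
    w = [complex(x, y) for x, y in template]
    if len(z) != len(w) or not z:
        raise ValueError("pose and template need the same non-zero length")
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    spread = sum(abs(p - z_mean) ** 2 for p in z)
    if spread == 0:
        raise ValueError("pose keypoints must not all coincide")
    a = sum((q - w_mean) * (p - z_mean).conjugate() for p, q in zip(z, w)) / spread
    b = w_mean - a * z_mean
    return a, b


def apply_similarity(a, b, point):
    """Map one (x, y) point with the fitted transform."""
    mapped = a * complex(*point) + b
    return (mapped.real, mapped.imag)


if __name__ == "__main__":
    pose = [(0, 0), (1, 0), (0, 1)]
    # Template = pose scaled by 2, rotated by 90 degrees, shifted by (3, 4).
    template = [(3, 4), (3, 6), (1, 4)]
    a, b = fit_similarity(pose, template)
    print(a, b, apply_similarity(a, b, (1, 1)))
```

In a real pipeline the fitted transform would be used to warp the feature map (or the image crop) rather than individual points.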


details: a 17-channel part confidence map and a 22-channel PAF (part affinity field) map

Performance:
1. for medium and large objects on the COCO dataset, performance is similar to Mask R-CNN.
2. using ground-truth keypoints as input, Pose2Seg achieves better performance than Mask R-CNN under heavy occlusion.


3. Pixelwise Instance Segmentation with a Dynamically Instantiated Network

paper link: https://arxiv.org/abs/1704.02386
code: no source code available

Steps:
    1. The semantic segmentation result, along with the outputs of an object detector, is used to compute the unary potentials of a Conditional Random Field (CRF) defined over object instances.
    2. Mean field inference is performed in this random field to obtain the Maximum a Posteriori (MAP) estimate, which is the final labelling.
    3. Although the network consists of two conceptually different parts (a semantic segmentation module and an instance segmentation network), the entire pipeline is fully differentiable, given object detections, and is trained end-to-end.
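
A rough sketch of the first step only, under stated assumptions: a single box-based unary term where a pixel scores for a detection if it lies inside the box, weighted by the semantic probability of the detection's class and by the detection score. The pairwise terms and mean field inference are omitted and replaced by a plain per-pixel argmax, so this is not the paper's full model. `box_unaries` and `label_pixels` are hypothetical names, and the data layout (nested lists of dicts) is chosen for readability.

```python
def box_unaries(semantic_probs, detections):
    """Compute per-pixel unary scores for every detection.

    semantic_probs[y][x]: dict mapping class name -> probability.
    detections: list of (class_name, score, (x0, y0, x1, y1)); the box
    covers x0 <= x < x1 and y0 <= y < y1.
    Returns unary[y][x] = list with one score per detection.
    """
    unary = []
    for y, row in enumerate(semantic_probs):
        out_row = []
        for x, probs in enumerate(row):
            scores = []
            for cls, score, (x0, y0, x1, y1) in detections:
                inside = x0 <= x < x1 and y0 <= y < y1
                scores.append(probs.get(cls, 0.0) * score if inside else 0.0)
            out_row.append(scores)
        unary.append(out_row)
    return unary


def label_pixels(unary):
    """Pick the best detection index per pixel, or -1 if none scores > 0.

    Assumption: stands in for MAP inference, ignoring pairwise terms.
    """
    labels = []
    for row in unary:
        labels.append([
            max(range(len(s)), key=s.__getitem__) if s and max(s) > 0 else -1
            for s in row
        ])
    return labels


if __name__ == "__main__":
    probs = [[{"person": 0.9}, {"person": 0.8},
              {"person": 0.1, "car": 0.9}, {"sky": 1.0}]]
    dets = [("person", 0.5, (0, 0, 2, 1)), ("car", 1.0, (1, 0, 3, 1))]
    print(label_pixels(box_unaries(probs, dets)))
```

Because the number of labels equals the number of detections, the label space changes per image, which is what the "dynamically instantiated" part of the title refers to.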

##-------------------------------------------------two frames----------------------------------------------------##

##------------------------------------------------multiple frames-------------------------------------------------##
1. MaskRNN (can be combined with the 'pick-mask' idea)
MaskRNN: Instance Level Video Object Segmentation_Yuanting Hu_NIPS 2017


Input: ground-truth masks of the first frame and the color image
Performance: Improves instance segmentation on DAVIS (originally for VOT) by about 4 percent.
Advantages: Makes use of temporal information (consecutive frames) and
Disadvantages: Cannot handle MOT (where the number of objects varies across frames)
Details: 2 frames (in theory); 7 frames (in practice)

2. Detect-and-Track: Efficient Pose Estimation in Videos

Detect-and-Track: Efficient Pose Estimation in Videos_Rohit Girdhar_CVPR 2018

paper link: https://arxiv.org/abs/1712.09184

Performance:
Achieves state-of-the-art performance on the ICCV 2017 PoseTrack keypoint tracking challenge.

