Instance-level segmentation survey
##-------------------------------------------------single frame----------------------------------------------------##
1. Path Aggregation Network for Instance Segmentation
PANet: Path Aggregation Network for Instance Segmentation_Shu Liu_CVPR 2018
paper link: https://arxiv.org/abs/1803.01534
source code: code will be available (Caffe-based)
1> Framework:
2> Contributions:
bottom-up network structure
(1) Bottom-up Path Augmentation: shortens the information path between the lower and topmost feature levels for reliable information passing
(2) Adaptive Feature Pooling: pools features from all feature levels for each proposal
(3) Fully-connected Fusion: a complementary path is added to enrich the features for each proposal.
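A minimal sketch of the Adaptive Feature Pooling idea in (2), not the PANet implementation. It assumes each feature level has already been pooled to a grid of the same size for one proposal, and that the grids are fused element-wise (max by default, sum as an alternative). The name `fuse_levels` and the toy numbers are hypothetical.

```python
def fuse_levels(pooled_per_level, op=max):
    """Fuse same-sized pooled grids (one per feature level) element-wise.

    pooled_per_level: list of 2D lists, one grid per pyramid level,
    all pooled for the same proposal. op is the fusion operator
    (max or sum); this choice is an assumption of the sketch.
    """
    first = pooled_per_level[0]
    height, width = len(first), len(first[0])
    for grid in pooled_per_level:
        if len(grid) != height or any(len(row) != width for row in grid):
            raise ValueError("all pooled grids must share one shape")
    return [
        [op(grid[r][c] for grid in pooled_per_level) for c in range(width)]
        for r in range(height)
    ]


# Toy 2x2 grids pooled from three levels for a single proposal.
levels = [
    [[0.1, 0.9], [0.4, 0.2]],
    [[0.5, 0.3], [0.1, 0.8]],
    [[0.2, 0.2], [0.7, 0.1]],
]
fused = fuse_levels(levels)
print(fused)  # [[0.5, 0.9], [0.7, 0.8]]
```

Each output cell takes its value from whichever level responds most strongly, so a proposal is not tied to a single level chosen by its size.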
3> Performance:
(1) 3~4 points better than Mask-RCNN on COCO and Cityscapes.
(2) 1st place for instance segmentation and 2nd place for detection on COCO.
4> Disadvantages:
(1) may be slow.
(2) computationally expensive and memory-intensive.
2. Pose2Seg
Pose2Seg: Human Instance Segmentation Without Detection
paper link: https://arxiv.org/abs/1803.10683
code: no code available
workflow: pose -> segmentation
method category: bottom-up instance segmentation
key contribution: proposes an alignment module based on human pose, called AffineAlign
observation: human pose can serve as a guide for human segmentation
details: 17-channel part confidence map and a 22-channel PAFs map
Performance:
1. for medium and large objects on the COCO dataset, performance is similar to Mask-RCNN.
2. using ground-truth keypoints (gt_keypoints) as input, Pose2Seg achieves better performance than Mask-RCNN under heavy occlusion.
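A rough sketch of the alignment idea behind AffineAlign, not the paper's procedure. It fits a least-squares similarity transform (scale, rotation, translation) that maps a person's keypoints onto a pose template. Restricting the fit to a similarity transform is a simplification chosen for this sketch; the function names and the toy keypoints are hypothetical.

```python
def fit_similarity(src, dst):
    """Least-squares similarity transform mapping src points onto dst.

    Returns (a, b, tx, ty) such that
        x' = a*x - b*y + tx
        y' = b*x + a*y + ty
    """
    n = len(src)
    if n != len(dst) or n < 2:
        raise ValueError("need at least two matching point pairs")
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    num_a = num_b = den = 0.0
    for (x, y), (u, v) in zip(src, dst):
        x, y, u, v = x - sx, y - sy, u - dx, v - dy  # center both sets
        num_a += x * u + y * v
        num_b += x * v - y * u
        den += x * x + y * y
    if den == 0.0:
        raise ValueError("source points are all identical")
    a, b = num_a / den, num_b / den
    # Translation moves the transformed source centroid onto the target one.
    tx = dx - (a * sx - b * sy)
    ty = dy - (b * sx + a * sy)
    return a, b, tx, ty


def apply_similarity(params, point):
    """Apply the transform returned by fit_similarity to one (x, y) point."""
    a, b, tx, ty = params
    x, y = point
    return (a * x - b * y + tx, b * x + a * y + ty)


# Toy example: three keypoints of a detected pose and a template pose.
pose = [(3.0, 4.0), (3.0, 6.0), (1.0, 4.0)]
template = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
params = fit_similarity(pose, template)
aligned = [apply_similarity(params, p) for p in pose]
```

In the alignment setting, the same transform would be used to warp the image features of each person into the template frame before segmentation, so that every instance reaches the segmentation module in a normalized pose.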
3. Pixelwise Instance Segmentation with a Dynamically Instantiated Network
paper link: https://arxiv.org/abs/1704.02386
code: no source code available
Steps:
1. We use the semantic segmentation result, along with the outputs of an object detector, to compute the unary potentials of a Conditional Random Field (CRF) defined over object instances.
2. We perform mean field inference in this random field to obtain the Maximum a Posteriori (MAP) estimate, which is our labelling.
3. Although our network consists of two conceptually different parts (a semantic segmentation module and an instance segmentation network), the entire pipeline is fully differentiable, given object detections, and trained end-to-end.
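The first step above can be sketched as follows. This covers unary potentials only and is not the paper's network. It assumes a box-style unary: inside a detection's box, the potential for that instance is the semantic-segmentation probability of the detected class at that pixel times the detection score, and zero outside the box. The pairwise terms and mean field inference are omitted, so each pixel simply takes its highest-scoring instance. All names and numbers are hypothetical.

```python
def box_unaries(sem_probs, detections):
    """Per-pixel, per-instance unary scores from segmentation and detections.

    sem_probs[r][c] is a dict mapping class name -> probability.
    detections is a list of (class_name, score, (x0, y0, x1, y1)),
    with the box covering columns x0..x1-1 and rows y0..y1-1.
    """
    height, width = len(sem_probs), len(sem_probs[0])
    unaries = [[[0.0] * len(detections) for _ in range(width)]
               for _ in range(height)]
    for k, (label, score, (x0, y0, x1, y1)) in enumerate(detections):
        for r in range(max(0, y0), min(height, y1)):
            for c in range(max(0, x0), min(width, x1)):
                unaries[r][c][k] = sem_probs[r][c].get(label, 0.0) * score
    return unaries


def label_pixels(unaries):
    """Pick the best instance per pixel, or -1 when every score is zero."""
    labels = []
    for row in unaries:
        out = []
        for scores in row:
            best = max(range(len(scores)), key=scores.__getitem__) if scores else -1
            out.append(best if best >= 0 and scores[best] > 0.0 else -1)
        labels.append(out)
    return labels


# Toy 1x4 image with two overlapping "person" detections.
sem_probs = [[{"person": 0.9}, {"person": 0.8}, {"person": 0.7}, {"person": 0.0}]]
detections = [
    ("person", 0.9, (0, 0, 2, 1)),  # instance 0 covers columns 0..1
    ("person", 0.5, (1, 0, 4, 1)),  # instance 1 covers columns 1..3
]
labels = label_pixels(box_unaries(sem_probs, detections))
print(labels)  # [[0, 0, 1, -1]]
```

Because the number of instance labels equals the number of detections, the label space changes from image to image, which is the sense in which the network is "dynamically instantiated".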
##-------------------------------------------------two frames----------------------------------------------------##
##------------------------------------------------multiple frames-------------------------------------------------##
1. MaskRNN (can be combined with the 'pick-mask' idea)
MaskRNN: Instance Level Video Object Segmentation_Yuanting Hu_NIPS2017
Input: ground-truth masks (Gt_masks) of the first frame and the color image
Performance: improves instance segmentation on DAVIS (originally for VOT) by about 4 percent.
Advantages: makes use of temporal information (consecutive frames) and
Disadvantages: cannot handle MOT (a variable number of objects across frames)
Details: 2 frames (in theory), 7 frames (in practice)
2. Detect-and-Track: Efficient Pose Estimation in Videos
Detect-and-Track: Efficient Pose Estimation in Videos_Rohit Girdhar_CVPR 2018
paper link: https://arxiv.org/abs/1712.09184
source code: https://github.com/facebookresearch/DetectAndTrack (check the video)
Performance:
Achieves state-of-the-art performance on the ICCV 2017 PoseTrack keypoint tracking challenge.