Understanding the proposal_layer.py layer
Published: 2019-06-23


The proposal_layer uses the trained RPN to generate region proposals for Fast R-CNN.

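The whole proposal_layer pipeline:

1. Generate all anchors and apply the four predicted coordinate transformations to turn them into proposals. (As in the older approach, anchors are generated by sliding over every position of the last feature map. Multiplying those positions by 16, the feature stride, maps them back to the input image and gives the initial region proposals. Bounding-box regression is then applied to each of them to produce better coordinates, which are the final region proposal coordinates.)
2. Clip every proposal to the image boundary (clip_boxes clamps the coordinates; it does not discard the box).
3. Remove every proposal whose width or height is smaller than min_size.
4. Sort all proposals by score, from highest to lowest.
5. Keep the pre_nms_topN highest-scoring proposals. This is the selection made before NMS.
6. Apply NMS.
7. Keep the post_nms_topN highest-scoring proposals. This is the selection made after NMS.

The result is the set of region proposals that is passed to the Fast R-CNN network.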
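Steps 2 to 7 can be tried out on toy data without Caffe. The following is only a minimal NumPy sketch: `select_proposals`, `filter_small` and `greedy_nms` are hypothetical helper names, the local `clip_boxes` is a stand-in with the same name as the repository function, and the default values are taken from the "e.g." hints in the layer's comments, not from the real config.

```python
import numpy as np

def clip_boxes(boxes, im_shape):
    """Clamp (x1, y1, x2, y2) boxes to an image of shape (height, width)."""
    h, w = im_shape
    boxes = boxes.copy()
    boxes[:, 0::2] = np.clip(boxes[:, 0::2], 0, w - 1)  # x1, x2
    boxes[:, 1::2] = np.clip(boxes[:, 1::2], 0, h - 1)  # y1, y2
    return boxes

def filter_small(boxes, min_size):
    """Indices of boxes whose width and height are both >= min_size."""
    ws = boxes[:, 2] - boxes[:, 0] + 1
    hs = boxes[:, 3] - boxes[:, 1] + 1
    return np.where((ws >= min_size) & (hs >= min_size))[0]

def greedy_nms(boxes, scores, thresh):
    """Plain greedy NMS: keep the best box, drop boxes with IoU > thresh, repeat."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(i)
        iw = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]) + 1)
        ih = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]) + 1)
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= thresh]
    return np.array(keep, dtype=np.int64)

def select_proposals(proposals, scores, im_shape, min_size=16,
                     pre_nms_top_n=6000, post_nms_top_n=300, nms_thresh=0.7):
    proposals = clip_boxes(proposals, im_shape)                        # step 2
    keep = filter_small(proposals, min_size)                           # step 3
    proposals, scores = proposals[keep], scores[keep]
    order = scores.argsort()[::-1][:pre_nms_top_n]                     # steps 4 and 5
    proposals, scores = proposals[order], scores[order]
    keep = greedy_nms(proposals, scores, nms_thresh)[:post_nms_top_n]  # steps 6 and 7
    return proposals[keep], scores[keep]

boxes = np.array([
    [10, 10, 60, 60],      # kept
    [12, 12, 62, 62],      # overlaps the first box heavily, suppressed by NMS
    [100, 100, 103, 103],  # only 4x4 pixels, removed by the size filter
    [150, 150, 260, 260],  # sticks out of the 200x200 image, clipped
], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.95, 0.7], dtype=np.float32)

rois, roi_scores = select_proposals(boxes, scores, im_shape=(200, 200))
print(rois)
print(roi_scores)
```

Note that the tiny box has the highest score but never reaches NMS, because the size filter runs before sorting.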

# --------------------------------------------------------
# Faster R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick and Sean Bell
# --------------------------------------------------------

import caffe
import numpy as np
import yaml
from fast_rcnn.config import cfg
from generate_anchors import generate_anchors
from fast_rcnn.bbox_transform import bbox_transform_inv, clip_boxes
from fast_rcnn.nms_wrapper import nms

DEBUG = False

class ProposalLayer(caffe.Layer):
    """
    Outputs object detection proposals by applying estimated bounding-box
    transformations to a set of regular boxes (called "anchors").
    """

    def setup(self, bottom, top):
        # parse the layer parameter string, which must be valid YAML
        layer_params = yaml.load(self.param_str_)

        self._feat_stride = layer_params['feat_stride']
        anchor_scales = layer_params.get('scales', (8, 16, 32))
        self._anchors = generate_anchors(scales=np.array(anchor_scales))
        self._num_anchors = self._anchors.shape[0]

        if DEBUG:
            print 'feat_stride: {}'.format(self._feat_stride)
            print 'anchors:'
            print self._anchors

        # rois blob: holds R regions of interest, each is a 5-tuple
        # (n, x1, y1, x2, y2) specifying an image batch index n and a
        # rectangle (x1, y1, x2, y2)
        top[0].reshape(1, 5)

        # scores blob: holds scores for R regions of interest
        if len(top) > 1:
            top[1].reshape(1, 1, 1, 1)

    def forward(self, bottom, top):
        # Algorithm:
        #
        # for each (H, W) location i
        #   generate A anchor boxes centered on cell i
        #   apply predicted bbox deltas at cell i to each of the A anchors
        # clip predicted boxes to image
        # remove predicted boxes with either height or width < threshold
        # sort all (proposal, score) pairs by score from highest to lowest
        # take top pre_nms_topN proposals before NMS
        # apply NMS with threshold 0.7 to remaining proposals
        # take after_nms_topN proposals after NMS
        # return the top proposals (-> RoIs top, scores top)

        assert bottom[0].data.shape[0] == 1, \
            'Only single item batches are supported'

        cfg_key = str(self.phase) # either 'TRAIN' or 'TEST'
        # number of top-scoring proposals kept before NMS
        pre_nms_topN  = cfg[cfg_key].RPN_PRE_NMS_TOP_N
        # number of top-scoring proposals kept after NMS
        post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N
        nms_thresh    = cfg[cfg_key].RPN_NMS_THRESH
        min_size      = cfg[cfg_key].RPN_MIN_SIZE

        # the first set of _num_anchors channels are bg probs
        # the second set are the fg probs, which we want
        scores = bottom[0].data[:, self._num_anchors:, :, :]
        # the four predicted regression deltas for every anchor
        bbox_deltas = bottom[1].data
        im_info = bottom[2].data[0, :]

        if DEBUG:
            print 'im_size: ({}, {})'.format(im_info[0], im_info[1])
            print 'scale: {}'.format(im_info[2])

        # 1. Generate proposals from bbox deltas and shifted anchors
        # as in anchor_target_layer, the height and width of the last
        # feature map are read from the shape of the score blob
        height, width = scores.shape[-2:]

        if DEBUG:
            print 'score map size: {}'.format(scores.shape)

        # Enumerate all shifts
        shift_x = np.arange(0, width) * self._feat_stride
        shift_y = np.arange(0, height) * self._feat_stride
        shift_x, shift_y = np.meshgrid(shift_x, shift_y)
        shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                            shift_x.ravel(), shift_y.ravel())).transpose()

        # Enumerate all shifted anchors:
        #
        # add A anchors (1, A, 4) to
        # cell K shifts (K, 1, 4) to get
        # shift anchors (K, A, 4)
        # reshape to (K*A, 4) shifted anchors
        A = self._num_anchors
        K = shifts.shape[0]
        anchors = self._anchors.reshape((1, A, 4)) + \
                  shifts.reshape((1, K, 4)).transpose((1, 0, 2))
        # as in anchor_target_layer: all anchor coordinates,
        # one row per anchor and 4 columns
        anchors = anchors.reshape((K * A, 4))

        # Transpose and reshape predicted bbox transformations to get them
        # into the same order as the anchors:
        #
        # bbox deltas will be (1, 4 * A, H, W) format
        # transpose to (1, H, W, 4 * A)
        # reshape to (1 * H * W * A, 4) where rows are ordered by (h, w, a)
        # in slowest to fastest order
        # give bbox_deltas the same (N, 4) layout as anchors
        bbox_deltas = bbox_deltas.transpose((0, 2, 3, 1)).reshape((-1, 4))

        # Same story for the scores:
        #
        # scores are (1, A, H, W) format
        # transpose to (1, H, W, A)
        # reshape to (1 * H * W * A, 1) where rows are ordered by (h, w, a)
        # scores become a single column whose rows follow the anchor order
        scores = scores.transpose((0, 2, 3, 1)).reshape((-1, 1))

        # Convert anchors into proposals via bbox transformations
        # apply bbox_deltas to the anchors to obtain the proposals
        proposals = bbox_transform_inv(anchors, bbox_deltas)

        # 2. clip predicted boxes to image
        proposals = clip_boxes(proposals, im_info[:2])

        # 3. remove predicted boxes with either height or width < threshold
        # (NOTE: convert min_size to input image scale stored in im_info[2])
        keep = _filter_boxes(proposals, min_size * im_info[2])
        proposals = proposals[keep, :]
        scores = scores[keep]

        # 4. sort all (proposal, score) pairs by score from highest to lowest
        # 5. take top pre_nms_topN (e.g. 6000)
        order = scores.ravel().argsort()[::-1]
        if pre_nms_topN > 0:
            order = order[:pre_nms_topN]
        proposals = proposals[order, :]
        scores = scores[order]

        # 6. apply nms (e.g. threshold = 0.7)
        # 7. take after_nms_topN (e.g. 300)
        # 8. return the top proposals (-> RoIs top)
        keep = nms(np.hstack((proposals, scores)), nms_thresh)
        if post_nms_topN > 0:
            keep = keep[:post_nms_topN]
        proposals = proposals[keep, :]
        scores = scores[keep]

        # Output rois blob
        # Our RPN implementation only supports a single input image, so all
        # batch inds are 0
        batch_inds = np.zeros((proposals.shape[0], 1), dtype=np.float32)
        blob = np.hstack((batch_inds, proposals.astype(np.float32, copy=False)))
        top[0].reshape(*(blob.shape))
        top[0].data[...] = blob

        # [Optional] output scores blob
        if len(top) > 1:
            top[1].reshape(*(scores.shape))
            top[1].data[...] = scores

    def backward(self, top, propagate_down, bottom):
        """This layer does not propagate gradients."""
        pass

    def reshape(self, bottom, top):
        """Reshaping happens during the call to forward."""
        pass

def _filter_boxes(boxes, min_size):
    """Remove all boxes with any side smaller than min_size."""
    ws = boxes[:, 2] - boxes[:, 0] + 1
    hs = boxes[:, 3] - boxes[:, 1] + 1
    keep = np.where((ws >= min_size) & (hs >= min_size))[0]
    return keep
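The anchor enumeration in forward() relies on NumPy broadcasting, which is easier to follow on a tiny feature map. This is a sketch with assumed values: `base_anchors` holds two made-up boxes and is not the output of generate_anchors, and the 2 x 3 feature map is arbitrary. `shifts.reshape((K, 1, 4))` produces the same array as the layer's `reshape((1, K, 4)).transpose((1, 0, 2))`.

```python
import numpy as np

feat_stride = 16
height, width = 2, 3  # toy feature map size

# Hypothetical base anchors (A = 2) in (x1, y1, x2, y2) form, placed on the first cell.
base_anchors = np.array([[-8, -8, 23, 23],
                         [-24, -24, 39, 39]])

# One (dx, dy, dx, dy) shift per feature map cell, row-major: x changes fastest.
shift_x = np.arange(width) * feat_stride
shift_y = np.arange(height) * feat_stride
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                    shift_x.ravel(), shift_y.ravel())).transpose()  # (K, 4)

A = base_anchors.shape[0]
K = shifts.shape[0]

# (1, A, 4) + (K, 1, 4) broadcasts to (K, A, 4): every anchor at every cell.
anchors = base_anchors.reshape((1, A, 4)) + shifts.reshape((K, 1, 4))
anchors = anchors.reshape((K * A, 4))

# Row k * A + a holds base anchor a shifted to cell k, where k = h * width + w.
h, w, a = 1, 2, 1
row = (h * width + w) * A + a
print(anchors.shape)  # (12, 4)
print(anchors[row])   # base_anchors[1] moved by (32, 16)
```

Because a varies fastest inside each cell, the anchor rows end up in the same (h, w, a) order that the later transpose and reshape give to the deltas and scores.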

This is the prototxt for this layer:

layer {
  name: 'proposal'
  type: 'Python'
  bottom: 'rpn_cls_prob_reshape'
  bottom: 'rpn_bbox_pred'
  bottom: 'im_info'
  top: 'rois'
  top: 'scores'
  python_param {
    module: 'rpn.proposal_layer'
    layer: 'ProposalLayer'
    param_str: "'feat_stride': 16"
  }
}

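As you can see, bottom[1] is rpn_bbox_pred.

So bbox_deltas = bottom[1].data in the code above holds the four coordinate transformation values predicted by the trained network. Training the RPN means learning to predict these four deltas for each anchor, not the four box coordinates directly.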
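To make "deltas rather than coordinates" concrete, here is a minimal sketch of how four deltas (dx, dy, dw, dh) move and rescale an anchor, using the usual R-CNN box parameterization. `decode_boxes` is a hypothetical helper, and it assumes continuous coordinates (width = x2 - x1); the repository's bbox_transform_inv may differ in pixel-offset details.

```python
import numpy as np

def decode_boxes(anchors, deltas):
    """Apply (dx, dy, dw, dh) deltas to (x1, y1, x2, y2) anchors."""
    anchors = anchors.astype(np.float64)
    widths = anchors[:, 2] - anchors[:, 0]
    heights = anchors[:, 3] - anchors[:, 1]
    ctr_x = anchors[:, 0] + 0.5 * widths
    ctr_y = anchors[:, 1] + 0.5 * heights

    dx, dy, dw, dh = deltas[:, 0], deltas[:, 1], deltas[:, 2], deltas[:, 3]
    pred_ctr_x = ctr_x + dx * widths   # center shift is relative to the anchor size
    pred_ctr_y = ctr_y + dy * heights
    pred_w = widths * np.exp(dw)       # size change is predicted in log space
    pred_h = heights * np.exp(dh)

    return np.stack((pred_ctr_x - 0.5 * pred_w,
                     pred_ctr_y - 0.5 * pred_h,
                     pred_ctr_x + 0.5 * pred_w,
                     pred_ctr_y + 0.5 * pred_h), axis=1)

anchors = np.array([[0, 0, 16, 16],
                    [0, 0, 16, 16]])
deltas = np.array([[0.0, 0.0, 0.0, 0.0],           # all zeros: anchor is unchanged
                   [0.5, 0.0, np.log(2.0), 0.0]])  # shift right by half a width, double the width
decoded = decode_boxes(anchors, deltas)
print(decoded)
```

Predicting relative offsets keeps the regression targets on a similar scale for small and large anchors, which is one common motivation for this parameterization.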

 

# bbox deltas will be (1, 4 * A, H, W) format
# transpose to (1, H, W, 4 * A)
# reshape to (1 * H * W * A, 4) where rows are ordered by (h, w, a)
# in slowest to fastest order
bbox_deltas = bbox_deltas.transpose((0, 2, 3, 1)).reshape((-1, 4))

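This part of the code is worth understanding properly. The bbox deltas are the four transformation values to be learned. They are produced by a convolutional layer, rpn_bbox_pred, whose output is a feature map of shape (1, 4 × number of anchors, H, W). To apply this feature map to the generated anchors, the two arrays must have matching shapes before they can be added or combined in any other way. So this code reshapes the bbox deltas into the same layout as anchors: many rows and 4 columns. For the deltas, the 4 columns are the offsets (dx, dy, dw, dh) for the box center x, center y, width and height; for the anchors they are the corner coordinates (x1, y1, x2, y2).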
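The effect of the transpose and reshape can be checked on a small fake tensor whose values encode their own position. This is a sketch with assumed toy sizes (A = 2, H = 2, W = 3); `row_index` is a hypothetical helper used only for the check.

```python
import numpy as np

A, H, W = 2, 2, 3

# Fake rpn_bbox_pred output: channel c at location (h, w) stores 1000*c + 10*h + w.
c, h, w = np.meshgrid(np.arange(4 * A), np.arange(H), np.arange(W), indexing="ij")
bbox_deltas = (1000 * c + 10 * h + w)[np.newaxis]  # shape (1, 4*A, H, W)

# Same two operations as in the layer.
rows = bbox_deltas.transpose((0, 2, 3, 1)).reshape((-1, 4))  # shape (H*W*A, 4)

def row_index(h, w, a):
    """Row of the flattened array that holds anchor a at location (h, w)."""
    return (h * W + w) * A + a

print(rows.shape)               # (12, 4)
print(rows[row_index(0, 0, 0)])  # channels 0..3 at (0, 0)
print(rows[row_index(0, 0, 1)])  # channels 4..7 at (0, 0)
print(rows[row_index(1, 2, 1)])  # channels 4..7 at (1, 2)
```

Each row collects the 4 consecutive channels that belong to one anchor at one location, which is why the channel axis has to be moved to the end before reshaping.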

 

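Note: anchors, bbox deltas and scores all have one row per (location, anchor) pair. Anchors and bbox deltas have 4 columns, while scores has a single column. The rows of all three are ordered by (h, w, a) from slowest to fastest: a changes fastest, then w, and h changes last. The first row is (h=0, w=0, a=0), the second is (h=0, w=0, a=1), and only after all A anchors at a location have been listed does w advance; h advances once a whole row of the feature map is done.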
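One practical use of the (h, w, a) order is turning a row number back into a feature map location and anchor index. The sketch below uses random numbers as a stand-in for the foreground scores, with assumed toy sizes; `np.unravel_index` does the conversion.

```python
import numpy as np

A, H, W = 3, 2, 4
rng = np.random.default_rng(0)
fg_scores = rng.random((1, A, H, W))  # stand-in for the foreground probabilities

# Same flattening as in the layer: (1, A, H, W) -> (1, H, W, A) -> (H*W*A, 1).
flat = fg_scores.transpose((0, 2, 3, 1)).reshape((-1, 1))

# Row index of the best score, decoded with h slowest and a fastest.
best_row = int(flat.ravel().argmax())
h, w, a = np.unravel_index(best_row, (H, W, A))

print(best_row, (int(h), int(w), int(a)))
print(flat[best_row, 0] == fg_scores[0, a, h, w])  # same element in both layouts
```

The same row index also selects the matching anchor and delta rows, since all three arrays share this ordering.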

Reposted from: http://cptqa.baihongyu.com/
