1. Preface

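In object detection, Faster RCNN is about as well known as a model gets. It contains a sub-network, the RPN (Region Proposal Network), that generates candidate regions on the feature map. But how does this network actually work? There are plenty of explanations online, yet most of them leave things hazy, so the most direct route is to read the code. That is what this post does.
First, let's look at the structure of the RPN inside Faster RCNN. The RPN attaches directly to the feature output of the classification network through one convolution layer, rpn_conv/3x3, followed by two convolution layers, rpn_cls_score and rpn_bbox_pred, which produce the foreground/background classification and the predicted boxes respectively. A Python layer, AnchorTargetLayer, then produces the classification labels and box regression targets for the anchors. After that, the proposal layer generates the candidate ROIs, and ROI Pooling normalizes them to a common size for the subsequent stages. The overall structure is shown in the figure below: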
[Figure: overall structure of the RPN in Faster RCNN]
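The figure gives an intuitive but coarse picture of the RPN; what actually happens inside is still unclear. So let's go through the code, starting with what each file in the RPN module does.
(1) generate_anchors.py
Starting from the base anchor [0, 0, 15, 15], generates anchors with different aspect ratios and scales.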
Generates a regular grid of multi-scale, multi-aspect anchor boxes.
(2) proposal_layer.py
Converts RPN outputs (per-anchor scores and bbox regression estimates) into object proposals.
(3) anchor_target_layer.py
Generates training targets/labels for each anchor. Classification labels are 1 (object), 0 (not object) or -1 (ignore).
Bbox regression targets are specified when the classification label is > 0.
(4) proposal_target_layer.py
Generates training targets/labels for each object proposal: classification labels 0 - K (bg or object class 1, … , K)
and bbox regression targets in that case that the label is > 0.
(5) generate.py
Generate object detection proposals from an imdb using an RPN.
With a rough picture of the RPN's structure and of the files in the RPN module, we can now read the implementation and see what it actually does.
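Before moving on, here is a minimal sketch of the idea behind item (1): starting from the base anchor [0, 0, 15, 15], enumerate aspect ratios and then scales. The function name `sketch_generate_anchors` and the default ratios (0.5, 1, 2) are assumptions made for illustration, not a copy of generate_anchors.py; the scales (8, 16, 32) are the defaults that appear in the setup code later in this post.

```python
import numpy as np

def sketch_generate_anchors(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    """Return (len(ratios) * len(scales), 4) anchors as (x1, y1, x2, y2)."""
    # Base anchor [0, 0, base_size - 1, base_size - 1]: width, height, centre.
    w = h = float(base_size)
    x_ctr = y_ctr = 0.5 * (base_size - 1)
    anchors = []
    for ratio in ratios:
        # Change height / width to `ratio` while keeping the area roughly fixed.
        ws = np.round(np.sqrt(w * h / ratio))
        hs = np.round(ws * ratio)
        for scale in scales:
            sw, sh = ws * scale, hs * scale
            # Re-centre the scaled box on the base anchor's centre.
            anchors.append([x_ctr - 0.5 * (sw - 1), y_ctr - 0.5 * (sh - 1),
                            x_ctr + 0.5 * (sw - 1), y_ctr + 0.5 * (sh - 1)])
    return np.array(anchors)

anchors = sketch_generate_anchors()
print(anchors.shape)
print(anchors)
```

With three ratios and three scales this yields the 9 anchors per feature-map position that the rest of the post refers to.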

2. The RPN network

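This part uses anchor_target_layer.py and generate_anchors.py. generate_anchors.py produces the anchors the model needs and also contains a few helper functions; it is not the focus here and is not covered further. The main file to read is anchor_target_layer.py.
First, the layer's setup function: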

def setup(self, bottom, top):
    layer_params = yaml.load(self.param_str_)
    anchor_scales = layer_params.get('scales', (8, 16, 32))  # scale parameters
    self._anchors = generate_anchors(scales=np.array(anchor_scales))  # the default 9 anchors
    self._num_anchors = self._anchors.shape[0]
    self._feat_stride = layer_params['feat_stride']

    # allow boxes to sit over the edge by a small amount
    # With 0, any anchor that crosses the image boundary, even slightly, is discarded
    self._allowed_border = layer_params.get('allowed_border', 0)

    height, width = bottom[0].data.shape[-2:]
    if DEBUG:
        print 'AnchorTargetLayer: height', height, 'width', width

    A = self._num_anchors
    # labels: object / not-object classification
    top[0].reshape(1, 1, A * height, width)
    # bbox_targets
    top[1].reshape(1, A * 4, height, width)
    # bbox_inside_weights
    top[2].reshape(1, A * 4, height, width)
    # bbox_outside_weights
    top[3].reshape(1, A * 4, height, width)
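To make the four reshape calls concrete, the sketch below computes the blob shapes for one feature-map size. The helper name `anchor_target_top_shapes` and the 38 x 50 feature map are assumptions chosen for illustration; only the shape formulas come from the setup code.

```python
def anchor_target_top_shapes(num_anchors, height, width):
    """Shapes that setup() gives the four top blobs for an H x W feature map."""
    A = num_anchors
    return {
        "labels": (1, 1, A * height, width),
        "bbox_targets": (1, A * 4, height, width),
        "bbox_inside_weights": (1, A * 4, height, width),
        "bbox_outside_weights": (1, A * 4, height, width),
    }

# Hypothetical example: 9 anchors on a 38 x 50 feature map.
shapes = anchor_target_top_shapes(num_anchors=9, height=38, width=50)
for name, shape in shapes.items():
    print(name, shape)
```

The labels blob holds one value per anchor, while the three bbox blobs hold four values per anchor.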

Next comes the core forward function. It first generates, over the feature map, the full set of anchors to work with:

# 1. Generate proposals from bbox deltas and shifted anchors
# Offsets along x, one per feature-map column (width of them)
shift_x = np.arange(0, width) * self._feat_stride
# Offsets along y, one per feature-map row (height of them)
shift_y = np.arange(0, height) * self._feat_stride
# After meshgrid, shift_x and shift_y are both 2-D arrays of shape (height, width).
# Elements at the same position form one (x, y) offset in the image, measured
# relative to the 9 anchors at the top-left corner, so there are width*height
# offset pairs. Adding every pair to the 9 initial anchors gives all the anchors,
# width*height*9 in total, stored in all_anchors.
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                    shift_x.ravel(), shift_y.ravel())).transpose()  # shape (width*height, 4)
# add A anchors (1, A, 4) to
# cell K shifts (K, 1, 4) to get
# shift anchors (K, A, 4)
# reshape to (K*A, 4) shifted anchors
A = self._num_anchors
K = shifts.shape[0]  # K = width*height
# Expand the 9 base anchors into K*A anchors, the total number of anchors
all_anchors = (self._anchors.reshape((1, A, 4)) +
               shifts.reshape((1, K, 4)).transpose((1, 0, 2)))
all_anchors = all_anchors.reshape((K * A, 4))
total_anchors = int(K * A)  # total number of anchors
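The broadcasting step is easier to follow on a tiny example. The sketch below repeats the same shift-and-add logic as a standalone function; the name `shift_anchors`, the two base anchors and the 2 x 3 feature map are assumptions made for illustration.

```python
import numpy as np

def shift_anchors(base_anchors, height, width, feat_stride):
    """Place every base anchor at every feature-map cell; returns (K * A, 4)."""
    shift_x = np.arange(width) * feat_stride
    shift_y = np.arange(height) * feat_stride
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)  # both (height, width)
    shifts = np.stack([shift_x.ravel(), shift_y.ravel(),
                       shift_x.ravel(), shift_y.ravel()], axis=1)  # (K, 4)
    # (1, A, 4) + (K, 1, 4) broadcasts to (K, A, 4)
    all_anchors = base_anchors[np.newaxis, :, :] + shifts[:, np.newaxis, :]
    return all_anchors.reshape(-1, 4)

# Hypothetical input: A = 2 base anchors on a 2 x 3 feature map with stride 16.
base = np.array([[0, 0, 15, 15], [-8, -8, 23, 23]])
all_anchors = shift_anchors(base, height=2, width=3, feat_stride=16)
print(all_anchors.shape)
print(all_anchors)
```

Rows are ordered cell by cell (row-major over the feature map), with the A anchors of one cell stored next to each other.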

With this many anchors, some inevitably extend past the image boundary, so they have to be filtered out:

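# only keep anchors inside the image
# Anchors inside the image are the valid ones; those outside the boundary are dropped
inds_inside = np.where(
    (all_anchors[:, 0] >= -self._allowed_border) &
    (all_anchors[:, 1] >= -self._allowed_border) &
    (all_anchors[:, 2] < im_info[1] + self._allowed_border) &  # width
    (all_anchors[:, 3] < im_info[0] + self._allowed_border)    # height
)[0]
# keep only inside anchors (this is the `anchors` array used below)
anchors = all_anchors[inds_inside, :]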
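The effect of allowed_border can be checked on a handful of boxes. This is a small sketch of the same test as a free function; the name `inside_image` and the 40 x 60 image are assumptions made for illustration.

```python
import numpy as np

def inside_image(all_anchors, im_height, im_width, allowed_border=0):
    """Indices of anchors lying inside the image, up to `allowed_border` pixels."""
    border = allowed_border
    keep = ((all_anchors[:, 0] >= -border) &
            (all_anchors[:, 1] >= -border) &
            (all_anchors[:, 2] < im_width + border) &
            (all_anchors[:, 3] < im_height + border))
    return np.where(keep)[0]

# Hypothetical anchors on a 40 (height) x 60 (width) image.
anchors = np.array([[0, 0, 15, 15], [-8, -8, 23, 23], [24, 8, 55, 39]])
strict = inside_image(anchors, im_height=40, im_width=60)
relaxed = inside_image(anchors, im_height=40, im_width=60, allowed_border=8)
print(strict, relaxed)
```

With a border of 0 the second anchor is dropped because its top-left corner is at (-8, -8); allowing 8 pixels of slack keeps it.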

Initialize the labels for the usable anchors; the meaning of each label value is given in the comment:

# label: 1 is positive, 0 is negative, -1 is dont care
# Object / not-object label for each anchor inside the image;
# its length is the number of anchors that passed the filter
labels = np.empty((len(inds_inside), ), dtype=np.float32)
labels.fill(-1)

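With the anchors in place, the next step is to relate them to the ground-truth (gt) boxes. The relation is measured by the overlap between each anchor and each gt box (the IoU computed by bbox_overlaps), and each anchor's object/not-object label is set from this measure.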

# overlaps between the anchors and the gt boxes
# overlaps (ex, gt): a 2-D array of shape (number of anchors, number of gt_boxes)
overlaps = bbox_overlaps(
    np.ascontiguousarray(anchors, dtype=np.float),
    np.ascontiguousarray(gt_boxes, dtype=np.float))
argmax_overlaps = overlaps.argmax(axis=1)  # for each anchor, the gt it overlaps most
max_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps]  # that overlap value
gt_argmax_overlaps = overlaps.argmax(axis=0)  # for each gt, the anchor that overlaps it most
gt_max_overlaps = overlaps[gt_argmax_overlaps,
                           np.arange(overlaps.shape[1])]  # the largest overlap for each gt
gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]

# Anchors whose overlap is below the 0.3 threshold are labeled 0
if not cfg.TRAIN.RPN_CLOBBER_POSITIVES:
    # assign bg labels first so that positive labels can clobber them
    labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0

# fg label: for each gt, anchor with highest overlap
# The anchor that overlaps a gt box the most is labeled 1
labels[gt_argmax_overlaps] = 1

# fg label: above threshold IOU
# Anchors whose overlap with a gt box reaches the 0.7 threshold are also labeled 1
labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1

if cfg.TRAIN.RPN_CLOBBER_POSITIVES:
    # assign bg labels last so that negative labels can clobber positives
    labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
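The three labeling rules can be tried on a toy example. The sketch below is not the library code: `iou_matrix` is a hypothetical NumPy stand-in for bbox_overlaps (assuming inclusive pixel coordinates, so widths are x2 - x1 + 1), and `assign_labels` follows the order used when RPN_CLOBBER_POSITIVES is off, with the 0.3 and 0.7 thresholds mentioned in the comments above.

```python
import numpy as np

def iou_matrix(boxes, gts):
    """IoU between every box (N, 4) and every gt (M, 4); returns (N, M)."""
    b = boxes[:, np.newaxis, :].astype(float)
    g = gts[np.newaxis, :, :].astype(float)
    iw = np.minimum(b[..., 2], g[..., 2]) - np.maximum(b[..., 0], g[..., 0]) + 1
    ih = np.minimum(b[..., 3], g[..., 3]) - np.maximum(b[..., 1], g[..., 1]) + 1
    inter = np.clip(iw, 0, None) * np.clip(ih, 0, None)
    area_b = (b[..., 2] - b[..., 0] + 1) * (b[..., 3] - b[..., 1] + 1)
    area_g = (g[..., 2] - g[..., 0] + 1) * (g[..., 3] - g[..., 1] + 1)
    return inter / (area_b + area_g - inter)

def assign_labels(anchors, gt_boxes, neg_thresh=0.3, pos_thresh=0.7):
    """Label anchors 1 (fg), 0 (bg) or -1 (ignored)."""
    overlaps = iou_matrix(anchors, gt_boxes)
    labels = np.full(len(anchors), -1, dtype=np.float32)
    max_overlaps = overlaps.max(axis=1)      # best IoU of each anchor
    gt_max_overlaps = overlaps.max(axis=0)   # best IoU achieved for each gt
    labels[max_overlaps < neg_thresh] = 0                       # background
    labels[np.where(overlaps == gt_max_overlaps)[0]] = 1        # best anchor per gt
    labels[max_overlaps >= pos_thresh] = 1                      # high-IoU anchors
    return labels

anchors = np.array([[0, 0, 9, 9], [0, 0, 9, 4], [20, 20, 29, 29], [40, 40, 49, 49]])
gt_boxes = np.array([[0, 0, 9, 9], [20, 20, 29, 39]])
labels = assign_labels(anchors, gt_boxes)
print(labels)
```

In this example the third anchor only reaches an IoU of 0.5, below the positive threshold, but it is still labeled 1 because it is the best match for the second gt box. The second anchor also has an IoU of 0.5 yet stays at -1, since it is neither a best match nor above 0.7.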

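The paper says that 256 anchors are sampled at random from all anchors: 128 foreground and 128 background. Note that anchors labeled -1 count as neither foreground nor background.
The code below has two parts: the first picks 128 anchors from all foreground anchors, and the second picks 128 from all background anchors. If there are fewer than 128 foreground anchors, all of the foreground anchors are kept and the shortfall is made up with background anchors. This is the same way Fast RCNN samples ROIs.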
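As a self-contained illustration of the sampling rule just described, here is a minimal sketch. The function name `subsample_labels`, the seeded `np.random.default_rng` generator and the default values (batch size 256, foreground fraction 0.5) are assumptions based on the description above, not the layer's own code.

```python
import numpy as np

def subsample_labels(labels, batch_size=256, fg_fraction=0.5, seed=0):
    """Keep at most fg_fraction * batch_size positives and fill the rest of the
    batch with negatives; every anchor that is not sampled gets label -1."""
    rng = np.random.default_rng(seed)
    labels = labels.copy()

    num_fg = int(fg_fraction * batch_size)
    fg_inds = np.where(labels == 1)[0]
    if len(fg_inds) > num_fg:
        disable = rng.choice(fg_inds, size=len(fg_inds) - num_fg, replace=False)
        labels[disable] = -1

    # Background fills whatever the foreground did not use.
    num_bg = batch_size - int(np.sum(labels == 1))
    bg_inds = np.where(labels == 0)[0]
    if len(bg_inds) > num_bg:
        disable = rng.choice(bg_inds, size=len(bg_inds) - num_bg, replace=False)
        labels[disable] = -1
    return labels

# Plenty of foreground: 128 fg and 128 bg survive.
sampled = subsample_labels(np.concatenate([np.ones(200), np.zeros(500), -np.ones(100)]))
# Too little foreground: all 50 fg are kept and bg makes up the other 206.
few_fg = subsample_labels(np.concatenate([np.ones(50), np.zeros(500)]))
print(int((sampled == 1).sum()), int((sampled == 0).sum()))
print(int((few_fg == 1).sum()), int((few_fg == 0).sum()))
```

Disabling the surplus anchors by setting them to -1, rather than removing them, keeps the label array aligned with the anchor array.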

# subsample positive labels if we have too many
# If too many anchors ended up labeled 1, subsample them:
# keep 128 of the anchors labeled 1 and set the label of the rest to -1
num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE)  # sampling limit
fg_inds = np.where(labels == 1)[0]
if len(fg_inds) > num_fg:
    disable_inds = npr.choice(fg_inds, size=