Task 2: Optimizing a MobileNet-based pretrained model

article/2025/8/24 23:11:38

## Project background

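The mobilenet_v2_0.75_224 pretrained model was trained on 224×224 images. The goal is to train it further on the ImageNet 100M dataset so that it becomes a pretrained model suited to 448×448 input.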

## Questions to think through
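1. How to use and import the pretrained model
2. How to train on the server
3. How to prepare the training set
4. How to evaluate the training result: first run 448×448 images through the 224×224 pretrained model, record the metrics, and use them as a baseline.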
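The baseline in step 4 only makes sense if the same metric is computed before and after fine-tuning. Below is a minimal sketch of top-k accuracy in plain Python. It assumes the logits were already produced by running the 224×224 checkpoint on 448×448 images. The helper name top_k_accuracy is hypothetical and is not part of TensorFlow or slim.

```python
from typing import Sequence


def top_k_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int], k: int = 1) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(logits) != len(labels) or not labels:
        raise ValueError("need the same, non-zero number of logit rows and labels")
    hits = 0
    for scores, label in zip(logits, labels):
        # Class indices sorted by descending score; sorted() is stable, so ties keep index order.
        ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


if __name__ == "__main__":
    # Toy batch: 3 images, 3 classes (slim's ImageNet checkpoints have 1001 outputs).
    logits = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    labels = [1, 1, 0]
    print("top-1:", top_k_accuracy(logits, labels, k=1))
    print("top-2:", top_k_accuracy(logits, labels, k=2))
```

After fine-tuning, the same function is run on the new model's logits for the same 448×448 images, so the two numbers are directly comparable.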

## Approach
The slim library already wraps the interfaces for downloading data and training models, so they can be used directly.

https://blog.csdn.net/angelbeats11/article/details/79009858
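1. Following the slim docs, download a dataset and convert it to TFRecord format; here it is flowers.
2. Following the slim docs, download the pretrained models, here Inception v3 and MobileNet v2. Inception v3 was picked because slim's fine-tuning example is built on Inception v3, so it is run first.
3. Fine-tune Inception v3 on flowers.
4. Fine-tune MobileNet v2 on flowers (this ran into problems 1, 2 and 3 below).
5. Download the ImageNet dataset, convert it to TFRecord, and fine-tune on the server.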
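Steps 1 and 3 boil down to two slim scripts. The sketch below only builds their command lines in Python and does not run them. The flag names follow the slim README. The directory paths and the helper names download_flowers_cmd and finetune_cmd are placeholders.

```python
import shlex
from typing import List, Optional, Sequence


def download_flowers_cmd(dataset_dir: str) -> List[str]:
    """Step 1: download the flowers dataset and convert it to TFRecord files."""
    return ["python", "download_and_convert_data.py",
            "--dataset_name=flowers",
            f"--dataset_dir={dataset_dir}"]


def finetune_cmd(model_name: str, dataset_dir: str, checkpoint_path: str, train_dir: str,
                 exclude_scopes: Sequence[str], train_image_size: Optional[int] = None) -> List[str]:
    """Steps 3/4: fine-tune a pretrained checkpoint on flowers."""
    args = ["python", "train_image_classifier.py",
            f"--train_dir={train_dir}",
            "--dataset_name=flowers",
            "--dataset_split_name=train",
            f"--dataset_dir={dataset_dir}",
            f"--model_name={model_name}",
            f"--checkpoint_path={checkpoint_path}"]
    if exclude_scopes:
        # Variables under these scopes are not restored from the checkpoint (new logits layer).
        args.append("--checkpoint_exclude_scopes=" + ",".join(exclude_scopes))
    if train_image_size is not None:
        # Needed when the network defines no default_image_size (problem 1 below).
        args.append(f"--train_image_size={train_image_size}")
    return args


if __name__ == "__main__":
    # Placeholder paths; replace with the real directories on the server.
    print(shlex.join(download_flowers_cmd("/tmp/flowers")))
    print(shlex.join(finetune_cmd(
        "inception_v3", "/tmp/flowers", "/tmp/checkpoints/inception_v3.ckpt", "/tmp/flowers-models",
        ["InceptionV3/Logits", "InceptionV3/AuxLogits"])))
```

Returning an argument list rather than one string avoids quoting problems. The list can also be passed to subprocess.run unchanged.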

## Problems encountered

1. In train_image_classifier.py, the MobileNet network has no default_image_size set, so train_image_size must be specified explicitly. Comparing with the code of the other networks under slim/nets makes this clear.
2. The pretrained model is mobilenet_v2_0.5, so a mapping for v2_0.5 has to be added to nets_factory. Otherwise the corresponding network definition cannot be found.
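3. The pretrained models provided by slim are trained on ImageNet 2012, so by default the logits layer has 1001 outputs (1000 classes plus a background class; for Inception v3 the layer is 2048×1001). To fine-tune on another dataset (here flowers, which has only 5 classes), adjust the labels_offset parameter: labels_offset = dataset.num_classes - network.num_classes = 5 - 1001 = -996.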
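4. The ImageNet website (http://image-net.org) could not be reached, so the slim download script could not be run directly. Instead, the TFRecord files have to be built from ImageNet JPEG files that were already downloaded, and then slim's script is called to fine-tune. The conversion also needs imagenet_lsvrc_2015_synsets.txt and imagenet_metadata.txt. References:
   - https://blog.csdn.net/gavin__zhou/article/details/80242998
   - https://blog.csdn.net/s_sunnyy/article/details/78909427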
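5. Encoding issues: the server runs Python 3.6, which strictly separates str and bytes, so older code kept raising errors.
   - https://www.cnblogs.com/chownjy/p/6625299.html
   - A TFRecord file is binary, so when writing with _bytes_feature a str must first be converted to bytes: bytes(s, 'utf-8')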
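Problems 1 to 3 meet in one place: choosing the network, its input size, and its number of outputs. The pure-Python sketch below mirrors that logic in train_image_classifier.py, as far as we can tell from the slim script. The network is built with dataset.num_classes - labels_offset outputs, every label is reduced by labels_offset, and train_image_size falls back to default_image_size. NetworkFn, NETWORKS, resolve and shift_labels are hypothetical names, and no TensorFlow is involved.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class NetworkFn:
    """Hypothetical stand-in for a slim network function (no TensorFlow involved)."""
    name: str
    num_classes: int
    default_image_size: Optional[int] = None


# Problem 2: a name -> builder registry in the spirit of nets_factory.
# Keys and default sizes here are illustrative assumptions.
NETWORKS: Dict[str, Callable[[int], NetworkFn]] = {
    "inception_v3": lambda n: NetworkFn("inception_v3", n, default_image_size=299),
    "mobilenet_v2": lambda n: NetworkFn("mobilenet_v2", n, default_image_size=224),
    # Without this extra entry, asking for "mobilenet_v2_050" fails.
    # Problem 1: this variant carries no default_image_size.
    "mobilenet_v2_050": lambda n: NetworkFn("mobilenet_v2_050", n),
}


def resolve(model_name: str, dataset_num_classes: int, labels_offset: int = 0,
            train_image_size: Optional[int] = None) -> Tuple[NetworkFn, int]:
    if model_name not in NETWORKS:
        raise ValueError(f"unknown network: {model_name}")
    # Problem 3: the network gets (dataset classes - labels_offset) outputs,
    # so a negative offset can make it match the checkpoint's 1001-way logits.
    net = NETWORKS[model_name](dataset_num_classes - labels_offset)
    # Problem 1: an explicit size wins; otherwise fall back to the network default.
    image_size = train_image_size or net.default_image_size
    if image_size is None:
        raise ValueError(f"{model_name} has no default_image_size; pass train_image_size")
    return net, image_size


def shift_labels(labels: List[int], labels_offset: int) -> List[int]:
    """Each dataset label is shifted by subtracting labels_offset."""
    return [y - labels_offset for y in labels]


if __name__ == "__main__":
    net, size = resolve("mobilenet_v2_050", dataset_num_classes=5, labels_offset=-996, train_image_size=224)
    print(net.num_classes, size)       # 1001 224
    print(shift_labels([0, 4], -996))  # [996, 1000]
```

With labels_offset = -996, flowers labels 0 to 4 land on logits 996 to 1000, so the full 1001-way logits layer can be restored unchanged. The more common alternative is --checkpoint_exclude_scopes, which skips restoring the logits layer and gives it 5 fresh outputs instead.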

## Doing

ImageNet dataset:
https://pan.baidu.com/s/17gQSrqD2j921HEMYVGVZ7Q#list/path=%2Fdatasets%2FILSVRC2012&parentPath=%2Fdatasets
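Problems 4 and 5 both come up when converting the downloaded ImageNet JPEGs. Below is a minimal pure-Python sketch of the label bookkeeping, assuming the file layouts described in the comments:

- Each synset in imagenet_lsvrc_2015_synsets.txt gets an integer label starting at 1. Index 0 is left for the background class, which as far as we know is what slim's converter does.
- imagenet_metadata.txt supplies the human-readable name.
- Every string field is encoded to bytes before it would be wrapped in a tf.train.BytesList.

The function names and record keys are hypothetical, and the actual TFRecord writing is left out.

```python
from typing import Dict, Iterable, Union


def build_label_map(synset_lines: Iterable[str]) -> Dict[str, int]:
    """WordNet ID -> integer label, in the order of imagenet_lsvrc_2015_synsets.txt.
    Labels start at 1 so that 0 stays free for a background class (1000 + 1 = 1001 logits)."""
    synsets = [line.strip() for line in synset_lines if line.strip()]
    return {synset: index for index, synset in enumerate(synsets, start=1)}


def parse_metadata(metadata_lines: Iterable[str]) -> Dict[str, str]:
    """imagenet_metadata.txt lines are assumed to look like 'n01440764<TAB>tench, Tinca tinca'."""
    human = {}
    for line in metadata_lines:
        synset, sep, text = line.rstrip("\n").partition("\t")
        if sep:
            human[synset] = text
    return human


def to_bytes(value: Union[str, bytes]) -> bytes:
    """Problem 5: under Python 3 a bytes feature needs bytes, so str must be encoded first."""
    return value if isinstance(value, bytes) else bytes(value, "utf-8")


def make_record(filename: str, synset: str, label_map: Dict[str, int], human: Dict[str, str]) -> dict:
    """Fields for one image; the key names are modeled on slim's ImageNet converter (assumption)."""
    return {
        "image/filename": to_bytes(filename),
        "image/class/synset": to_bytes(synset),
        "image/class/text": to_bytes(human[synset]),
        "image/class/label": label_map[synset],
    }


if __name__ == "__main__":
    label_map = build_label_map(["n01440764\n", "n01443537\n"])
    human = parse_metadata(["n01440764\ttench, Tinca tinca\n"])
    print(make_record("n01440764_10026.JPEG", "n01440764", label_map, human))
```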
Loading TF pretrained models
- https://blog.csdn.net/huachao1001/article/details/78501928
- https://blog.csdn.net/lujiandong1/article/details/53301994
- https://blog.csdn.net/laolu1573/article/details/66971800
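Fine-tune a pretrained model
- https://github.com/tensorflow/models/tree/master/research/slim#Data
- Models pretrained on different datasets have different logits layers, because the number of target classes differs. The logits layer therefore must not be restored from the checkpoint; slim handles this with --checkpoint_exclude_scopes.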
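The core of --checkpoint_exclude_scopes is prefix matching on variable names. Below is a pure-Python sketch of that split, based on how we understand slim applies the flag. It is not the slim code itself, and the variable names are illustrative.

```python
from typing import List, Tuple


def split_restore_vars(var_names: List[str], checkpoint_exclude_scopes: str) -> Tuple[List[str], List[str]]:
    """Return (to_restore, to_skip): a variable is skipped when its name starts with
    any of the comma-separated scopes."""
    exclusions = [s.strip() for s in checkpoint_exclude_scopes.split(",") if s.strip()]
    to_restore, to_skip = [], []
    for name in var_names:
        if any(name.startswith(scope) for scope in exclusions):
            to_skip.append(name)      # freshly initialized, sized for the new num_classes
        else:
            to_restore.append(name)   # loaded from the pretrained checkpoint
    return to_restore, to_skip


if __name__ == "__main__":
    names = [
        "InceptionV3/Conv2d_1a_3x3/weights",
        "InceptionV3/Logits/Conv2d_1c_1x1/weights",
        "InceptionV3/AuxLogits/Conv2d_2b_1x1/weights",
    ]
    restore, skip = split_restore_vars(names, "InceptionV3/Logits,InceptionV3/AuxLogits")
    print(restore)  # ['InceptionV3/Conv2d_1a_3x3/weights']
    print(skip)
```

"InceptionV3/Logits" does not match "InceptionV3/AuxLogits", so both scopes must be listed to drop both heads.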
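Image size preprocessing
- An RGB color image can be viewed as a 3-D matrix whose entries at different positions are pixel values. Images are not stored as these raw numbers but in a compressed, encoded form. Restoring an image to a 3-D matrix is therefore decoding, and the reverse is encoding.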
- tf.image.resize_images
- tf.image.resize_image_with_crop_or_pad
- https://blog.csdn.net/chaipp0607/article/details/73029923
- Image cropping: bounding box crop
  - https://blog.csdn.net/tz_zs/article/details/77920116
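The two TF functions above behave differently. tf.image.resize_images rescales the whole image, while tf.image.resize_image_with_crop_or_pad keeps the pixel scale and only crops or pads around the centre. The pure-Python sketch below mimics the latter on a single-channel image. It assumes the usual centre-offset rule, where any odd leftover pixel goes to the bottom or right side. It is an illustration, not the TensorFlow implementation.

```python
from typing import List

Image = List[List[int]]  # single-channel image stored as rows of pixel values


def crop_or_pad(image: Image, target_h: int, target_w: int, fill: int = 0) -> Image:
    """Centrally crop dimensions larger than the target and pad smaller ones with `fill`."""
    h, w = len(image), len(image[0])
    # Crop offsets are non-zero only where the image is larger than the target.
    top = max((h - target_h) // 2, 0)
    left = max((w - target_w) // 2, 0)
    cropped = [row[left:left + target_w] for row in image[top:top + target_h]]
    ch, cw = len(cropped), len(cropped[0])
    # Pad offsets are non-zero only where the (cropped) image is smaller than the target.
    pad_top = (target_h - ch) // 2
    pad_left = (target_w - cw) // 2
    out = [[fill] * target_w for _ in range(target_h)]
    for r in range(ch):
        for c in range(cw):
            out[pad_top + r][pad_left + c] = cropped[r][c]
    return out


if __name__ == "__main__":
    print(crop_or_pad([[1, 2], [3, 4]], 4, 4))  # the 2x2 block sits in the centre, zeros around it
    big = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(crop_or_pad(big, 2, 2))  # [[5, 6], [9, 10]]
```

For the 224 to 448 change, resize_images upscales the content, whereas crop_or_pad surrounds a 224×224 image with a border. The two choices therefore give the network quite different inputs.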
Fine-tuning hyperparameters
- Weight L2 regularization (weight decay) and learning rate decay: https://blog.csdn.net/program_developer/article/details/80867468
- label_smoothing (label-smoothing regularization, or LSR): https://blog.csdn.net/edogawachia/article/details/78552257
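The three knobs above reduce to short formulas, sketched here in plain Python with hypothetical function names. As far as we know:

- The decay formula is the one documented for tf.train.exponential_decay.
- The smoothing formula is what the label_smoothing argument of tf.losses.softmax_cross_entropy applies.
- sum(w²)/2 is what tf.nn.l2_loss computes.

slim exposes these as --learning_rate, --learning_rate_decay_factor, --num_epochs_per_decay, --weight_decay and --label_smoothing. The sketch does not reproduce slim's computation of decay_steps from epochs.

```python
import math
from typing import List, Sequence


def exponential_decay(lr0: float, step: int, decay_steps: int, decay_rate: float,
                      staircase: bool = True) -> float:
    """lr0 * decay_rate ** (step / decay_steps); staircase=True floors the exponent."""
    exponent = step / decay_steps
    if staircase:
        exponent = math.floor(exponent)
    return lr0 * decay_rate ** exponent


def l2_penalty(weights: Sequence[float], weight_decay: float) -> float:
    """Weight decay term added to the loss: weight_decay * sum(w^2) / 2."""
    return weight_decay * sum(w * w for w in weights) / 2


def smooth_labels(label: int, num_classes: int, smoothing: float) -> List[float]:
    """Label smoothing: (1 - smoothing) * one_hot + smoothing / num_classes."""
    base = smoothing / num_classes
    return [1.0 - smoothing + base if k == label else base for k in range(num_classes)]


if __name__ == "__main__":
    for step in (0, 100, 250):
        print(step, exponential_decay(0.01, step, decay_steps=100, decay_rate=0.94))
    print(smooth_labels(1, num_classes=5, smoothing=0.1))
    print(l2_penalty([3.0, 4.0], weight_decay=0.00004))
```

With smoothing = 0.1 and 5 classes, the target for the true class becomes 0.92 and every other class gets 0.02. The targets still sum to 1.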

## References

Official resources
- https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet_example.ipynb
- Introduction to the slim module API in TensorFlow: https://blog.csdn.net/guvcolie/article/details/77686555

## Practice
### Saving and importing models

https://cv-tricks.com/tensorflow-tutorial/save-restore-tensorflow-models-quick-complete-tutorial/
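Save model:
- Define a Saver
  - saver = tf.train.Saver()
  - To save only some of the variables: saver = tf.train.Saver([w1, w2])
- Save to a path
  - saver.save(sess, os.path.join(checkpoint_dir, 'modelname.ckpt'))
- A saved model consists of three parts: modelname.ckpt.meta (the graph structure), modelname.ckpt.index, and modelname.ckpt.data-00000-of-00001 (the variable values). A checkpoint file in the same directory records the most recent save.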

Official introduction:
- https://github.com/tensorflow/models/tree/master/research/slim#Pretrained
- https://www.2cto.com/kf/201706/649266.html
- https://blog.csdn.net/wuguangbin1230/article/details/79222564
- https://blog.csdn.net/c20081052/article/details/81295942
Usage tutorial:
- https://blog.csdn.net/u014061630/article/details/80632736
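Import model:
- Create the network
  - The model structure is stored in the .meta file; reading it rebuilds the graph
    - saver = tf.train.import_meta_graph('my_test_model-1000.meta')
- Load the parameters
  - The parameter values are stored in the checkpoint (.data/.index) files; restoring reads them into the graph
    - saver.restore(sess, tf.train.latest_checkpoint('./'))
  - To fetch a specific tensor (or, with get_operation_by_name, an op):
    - graph = tf.get_default_graph()
    - graph.get_tensor_by_name('op_to_restore:0')
      - the name must be set consistently when the graph is defined (via the name argument)
      - :0 is the output index, i.e. the first output of the op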
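Two details from the list above, sketched in plain Python:

- A tensor name such as 'op_to_restore:0' is the op name plus the index of one of its outputs.
- tf.train.latest_checkpoint finds the newest save by reading the small text file named checkpoint in the directory.

The helpers below are hypothetical illustrations of those conventions, not TensorFlow's implementation. The sample file content assumes the usual model_checkpoint_path format.

```python
import re
from typing import Optional, Tuple


def split_tensor_name(name: str) -> Tuple[str, int]:
    """'scope/op_to_restore:0' -> ('scope/op_to_restore', 0): op name plus output index."""
    op_name, sep, index = name.rpartition(":")
    if not sep or not index.isdigit():
        raise ValueError(f"not a tensor name: {name!r}")
    return op_name, int(index)


def latest_checkpoint_prefix(checkpoint_state: str) -> Optional[str]:
    """Read model_checkpoint_path from the text of a 'checkpoint' state file."""
    match = re.search(r'^model_checkpoint_path:\s*"([^"]*)"', checkpoint_state, re.MULTILINE)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(split_tensor_name("op_to_restore:0"))  # ('op_to_restore', 0)
    state = ('model_checkpoint_path: "my_test_model-1000"\n'
             'all_model_checkpoint_paths: "my_test_model-1000"\n')
    print(latest_checkpoint_prefix(state))  # my_test_model-1000
```

The returned prefix (not a single file) is what saver.restore expects. TensorFlow resolves the .index and .data files from it.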
Reading data from TFRecord files
- https://blog.csdn.net/happyhorizion/article/details/77894055

https://cv-tricks.com/tensorflow-tutorial/training-convolutional-neural-network-for-image-classification/
OpenCV
- https://blog.csdn.net/qq_41185868/article/details/79675875
- cv2.resize()
  - https://blog.csdn.net/william_hehe/article/details/79604082
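A common pitfall with cv2.resize() is argument order. dsize is (width, height), while an image array is indexed by rows first, i.e. its shape is (height, width, channels). The pure-Python nearest-neighbour sketch below uses the same dsize order so the difference is visible. It is not OpenCV's implementation, and its pixel choices may differ from cv2.INTER_NEAREST.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixels: image[y][x]


def resize_nearest(image: Image, dsize: Tuple[int, int]) -> Image:
    """Nearest-neighbour resize; dsize is (width, height), the order cv2.resize uses."""
    dst_w, dst_h = dsize
    src_h, src_w = len(image), len(image[0])
    return [
        [image[min(y * src_h // dst_h, src_h - 1)][min(x * src_w // dst_w, src_w - 1)]
         for x in range(dst_w)]
        for y in range(dst_h)
    ]


if __name__ == "__main__":
    img = [[1, 2, 3], [4, 5, 6]]           # height 2, width 3
    out = resize_nearest(img, (6, 4))      # width 6, height 4
    print(len(out), len(out[0]))           # 4 6
    for row in out:
        print(row)
```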
TensorFlow
- Use with blocks wherever possible (e.g. with tf.Session() as sess:)
- graph.get_operations()
Padding
- https://blog.csdn.net/weicao1990/article/details/80282341
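TensorFlow's 'SAME' and 'VALID' padding follow simple size rules, sketched below in plain Python. For 'SAME', the output size is ceil(in / stride) and any odd padding pixel goes after. For 'VALID', the output size is ceil((in - kernel + 1) / stride). The helper names are hypothetical. The feature_map_size example assumes five stride-2 'SAME' layers, i.e. MobileNet v2's total output stride of 32. It shows that moving from 224 to 448 input doubles the final feature map from 7×7 to 14×14.

```python
from typing import Tuple


def conv_output_size(in_size: int, kernel: int, stride: int, padding: str) -> int:
    """Spatial output size for TensorFlow-style 'SAME' / 'VALID' padding (integer ceil)."""
    if padding == "SAME":
        return (in_size + stride - 1) // stride
    if padding == "VALID":
        return (in_size - kernel + stride) // stride
    raise ValueError(f"unknown padding {padding!r}")


def same_padding(in_size: int, kernel: int, stride: int) -> Tuple[int, int]:
    """(pad_before, pad_after) for 'SAME'; the extra pixel, if any, goes after."""
    out = conv_output_size(in_size, kernel, stride, "SAME")
    total = max((out - 1) * stride + kernel - in_size, 0)
    return total // 2, total - total // 2


def feature_map_size(input_size: int, num_stride2_layers: int = 5) -> int:
    """Apply several stride-2, 3x3 'SAME' convolutions (assumed layout)."""
    size = input_size
    for _ in range(num_stride2_layers):
        size = conv_output_size(size, kernel=3, stride=2, padding="SAME")
    return size


if __name__ == "__main__":
    print(conv_output_size(224, 3, 2, "SAME"), conv_output_size(224, 3, 2, "VALID"))  # 112 111
    print(same_padding(224, 3, 2))                        # (0, 1)
    print(feature_map_size(224), feature_map_size(448))   # 7 14
```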