2019-11-24


Today's work report

1. Ran the SSD model and started training.

Training environment

Item                       Command used to check
16 GB RAM                  free
GeForce RTX 2070, 8 GB     nvidia-smi
Ubuntu 18.04.3 LTS         cat /etc/issue
PyTorch 1.0.1.post2        print(torch.__version__)
CUDA 10.0.130              cat /usr/local/cuda/version.txt
cuDNN 7.5.0                cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
Python 3.6.8               python -V

Problems and solutions

(1) The batch size is too large for the available GPU memory, which causes: RuntimeError: CUDA error: out of memory.

# Edit the code to lower batch_size
# /ssd.pytorch/train.py, line 32
parser.add_argument('--batch_size', default=16, type=int,
                    help='Batch size for training')
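
Because the snippet above defines --batch_size through argparse, the value can also be overridden at launch time instead of editing the default. The following is a minimal sketch of that behaviour; build_parser is a hypothetical stand-in for the parser in train.py and reproduces only this one flag.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the parser in train.py; only the
    # --batch_size flag from the snippet above is reproduced here.
    parser = argparse.ArgumentParser(description='batch size override demo')
    parser.add_argument('--batch_size', default=16, type=int,
                        help='Batch size for training')
    return parser


# With no flag the default is used; passing --batch_size overrides it.
default_args = build_parser().parse_args([])
small_args = build_parser().parse_args(['--batch_size', '8'])
print(default_args.batch_size, small_args.batch_size)
```

Under the same assumption, running python train.py --batch_size 8 would lower the batch size for a single run without touching the source file.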

(2) Tensor shape mismatch (reported in the repository's Issues): RuntimeError: The shape of the mask [32, 8732] at index 0 does not match the shape of the indexed tensor [279424, 1] at index 0

# 1
# /ssd.pytorch/layers/modules/multibox_loss.py, line 97
# Two statements need to be reordered. Original code:
loss_c[pos] = 0  # filter out pos boxes for now
loss_c = loss_c.view(num, -1)
# Swapped order:
loss_c = loss_c.view(num, -1)
loss_c[pos] = 0  # filter out pos boxes for now
# 2
# /ssd.pytorch/layers/modules/multibox_loss.py, line 115
# Cast the values to double
N = num_pos.data.sum().double()
loss_l = loss_l.double()
loss_c = loss_c.double()
loss_l /= N
loss_c /= N
# 3
# /ssd.pytorch/train.py
# Change how the scalar values are read (use .item())
# Line 187
loc_loss += loss_l.item()
# Line 188
conf_loss += loss_c.item()
# Line 192
print('iter ' + repr(iteration) + ' || Loss: %.4f ||' % (loss.item()), end=' ')
# Line 195
update_vis_plot(iteration, loss_l.item(), loss_c.item() ,
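
The reordering in fix 1 works because the flattened loss has shape [num * num_priors, 1] (32 * 8732 = 279424 in the error above), while the pos mask has shape [num, num_priors]. The sketch below reproduces the mismatch with toy sizes. The tensors here are made up for illustration and are not taken from multibox_loss.py; newer PyTorch releases may raise IndexError rather than RuntimeError, so both are caught.

```python
import torch

num, num_priors = 4, 6  # toy sizes; the log above had 32 and 8732

# Flattened per-box loss, shaped like loss_c before the view() call.
loss_c = torch.rand(num * num_priors, 1)

# Mask of shape [num, num_priors]; the first box of each image is "positive".
pos = torch.arange(num_priors).repeat(num, 1) == 0

# Original order: the mask and the tensor disagree in shape, so indexing fails.
try:
    loss_c.clone()[pos] = 0
    mismatch_raised = False
except (RuntimeError, IndexError):
    mismatch_raised = True

# Swapped order: reshape first so both are [num, num_priors], then mask.
loss_c = loss_c.view(num, -1)
loss_c[pos] = 0
print(mismatch_raised, tuple(loss_c.shape))
```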

(3) Iteration stops (reported in the repository's Issues): StopIteration error

# /ssd.pytorch/train.py
# Change how batch_iterator is advanced
# Line 164
images, targets = next(batch_iterator)
# Delete the line above and replace it with:
try:
    images, targets = next(batch_iterator)
except StopIteration:
    batch_iterator = iter(data_loader)
    images, targets = next(batch_iterator)
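
The same restart-on-exhaustion pattern can be tried without PyTorch. This is a minimal sketch in which a plain list stands in for data_loader; endless_batches and fake_loader are hypothetical names used only for this demo.

```python
def endless_batches(data_loader):
    """Yield batches forever, restarting the iterator when it runs out."""
    batch_iterator = iter(data_loader)
    while True:
        try:
            batch = next(batch_iterator)
        except StopIteration:
            # The loader is exhausted: start a new pass over the data.
            batch_iterator = iter(data_loader)
            batch = next(batch_iterator)
        yield batch


# A plain list stands in for the DataLoader (an assumption for this demo).
fake_loader = [('img0', 'tgt0'), ('img1', 'tgt1'), ('img2', 'tgt2')]
batches = endless_batches(fake_loader)
seen = [next(batches) for _ in range(5)]
print(seen)
```

The fourth and fifth items come from a second pass over the list, which is what the try/except in train.py achieves when the iteration count exceeds one epoch.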

https://www.jianshu.com/p/b69c28064cf5
