
In this example, we try training a network by driving Caffe's Solver interface from Python.
Environment setup
```python
from pylab import *
%matplotlib inline
```
```python
caffe_root = '/home/ldy/workspace/caffe/'  # this file should be run from {caffe_root}/examples (otherwise change this line)
```
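pycaffe itself must be importable before any of the later cells will run. A minimal setup, assuming the standard Caffe source layout under `caffe_root`:

```python
import sys
sys.path.insert(0, caffe_root + 'python')  # make pycaffe importable
import caffe
```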
- Download the training data and convert it into LMDB, as in the sketch below.
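A sketch of the data cell, using the MNIST helper scripts that ship with Caffe (paths assume the standard layout):

```python
# run scripts from caffe root
import os
os.chdir(caffe_root)
# download the raw MNIST data
!data/mnist/get_mnist.sh
# convert it into train/test LMDBs
!examples/mnist/create_mnist.sh
# back to the examples directory
os.chdir('examples')
```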
Downloading...
Creating lmdb...
I0505 20:49:32.535013 18388 db_lmdb.cpp:35] Opened lmdb examples/mnist/mnist_train_lmdb
I0505 20:49:32.535306 18388 convert_mnist_data.cpp:88] A total of 60000 items.
I0505 20:49:32.535323 18388 convert_mnist_data.cpp:89] Rows: 28 Cols: 28
I0505 20:49:32.547651 18388 db_lmdb.cpp:101] Doubling LMDB map size to 2MB ...
I0505 20:49:32.556696 18388 db_lmdb.cpp:101] Doubling LMDB map size to 4MB ...
I0505 20:49:32.578054 18388 db_lmdb.cpp:101] Doubling LMDB map size to 8MB ...
I0505 20:49:32.627709 18388 db_lmdb.cpp:101] Doubling LMDB map size to 16MB ...
I0505 20:49:32.718138 18388 db_lmdb.cpp:101] Doubling LMDB map size to 32MB ...
I0505 20:49:32.960189 18388 db_lmdb.cpp:101] Doubling LMDB map size to 64MB ...
I0505 20:49:33.271764 18388 convert_mnist_data.cpp:108] Processed 60000 files.
I0505 20:49:33.403015 18390 db_lmdb.cpp:35] Opened lmdb examples/mnist/mnist_test_lmdb
I0505 20:49:33.403692 18390 convert_mnist_data.cpp:88] A total of 10000 items.
I0505 20:49:33.403733 18390 convert_mnist_data.cpp:89] Rows: 28 Cols: 28
I0505 20:49:33.423638 18390 db_lmdb.cpp:101] Doubling LMDB map size to 2MB ...
I0505 20:49:33.439213 18390 db_lmdb.cpp:101] Doubling LMDB map size to 4MB ...
I0505 20:49:33.470553 18390 db_lmdb.cpp:101] Doubling LMDB map size to 8MB ...
I0505 20:49:33.525192 18390 db_lmdb.cpp:101] Doubling LMDB map size to 16MB ...
I0505 20:49:33.546480 18390 convert_mnist_data.cpp:108] Processed 10000 files.
Done.
Building the network
Define the network architecture and save it as lenet_auto_train.prototxt (the training net) and lenet_auto_test.prototxt (the test net).
```python
from caffe import layers as L, params as P
```
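The nets themselves are generated with `caffe.NetSpec`. A sketch of the generator, following the standard Caffe LeNet example; the layer names and parameters line up with the prototxt printed below (note `scale=1./255`, which shows up as 0.00392...):

```python
def lenet(lmdb, batch_size):
    # our version of LeNet: a series of linear and simple nonlinear transformations
    n = caffe.NetSpec()
    n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB, source=lmdb,
                             transform_param=dict(scale=1./255), ntop=2)  # scale pixels to [0, 1)
    n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20, weight_filler=dict(type='xavier'))
    n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
    n.pool2 = L.Pooling(n.conv2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.fc1 = L.InnerProduct(n.pool2, num_output=500, weight_filler=dict(type='xavier'))
    n.relu1 = L.ReLU(n.fc1, in_place=True)
    n.score = L.InnerProduct(n.relu1, num_output=10, weight_filler=dict(type='xavier'))
    n.loss = L.SoftmaxWithLoss(n.score, n.label)
    return n.to_proto()

with open('mnist/lenet_auto_train.prototxt', 'w') as f:
    f.write(str(lenet('mnist/mnist_train_lmdb', 64)))
with open('mnist/lenet_auto_test.prototxt', 'w') as f:
    f.write(str(lenet('mnist/mnist_test_lmdb', 100)))
```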
Inspect the training net definition (the test net is identical except that it reads mnist_test_lmdb with batch size 100):
```python
!cat mnist/lenet_auto_train.prototxt
```
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
transform_param {
scale: 0.00392156862745
}
data_param {
source: "mnist/mnist_train_lmdb"
batch_size: 64
backend: LMDB
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 20
kernel_size: 5
weight_filler {
type: "xavier"
}
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
convolution_param {
num_output: 50
kernel_size: 5
weight_filler {
type: "xavier"
}
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "fc1"
type: "InnerProduct"
bottom: "pool2"
top: "fc1"
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "fc1"
top: "fc1"
}
layer {
name: "score"
type: "InnerProduct"
bottom: "fc1"
top: "score"
inner_product_param {
num_output: 10
weight_filler {
type: "xavier"
}
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "score"
bottom: "label"
top: "loss"
}
Inspect the solver (learning) parameters; the file is already saved on disk:
```python
!cat mnist/lenet_auto_solver.prototxt
```
# The train/test net protocol buffer definition
train_net: "mnist/lenet_auto_train.prototxt"
test_net: "mnist/lenet_auto_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "mnist/lenet"
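Under the "inv" policy, Caffe decays the learning rate as base_lr * (1 + gamma * iter) ^ (-power). A quick sanity check with the values above (plain Python, purely for illustration):

```python
base_lr, gamma, power = 0.01, 1e-4, 0.75

def inv_lr(it):
    # Caffe's "inv" learning-rate policy
    return base_lr * (1 + gamma * it) ** (-power)

print(inv_lr(0))      # 0.01 at iteration 0
print(inv_lr(10000))  # ~0.0059 by max_iter
```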
Loading and inspecting the solver
```python
caffe.set_device(0)
caffe.set_mode_gpu()
```
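The solver config above does not name a solver type, so the default stochastic gradient descent solver applies; loading it also instantiates the train and test nets:

```python
# load the solver and create the train and test nets
solver = caffe.SGDSolver('mnist/lenet_auto_solver.prototxt')
```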
- Inspect the network's intermediate (blob) and weight shapes
```python
# each output is (batch size, feature dim, spatial dim)
[(k, v.data.shape) for k, v in solver.net.blobs.items()]
```
[('data', (64, 1, 28, 28)),
('label', (64,)),
('conv1', (64, 20, 24, 24)),
('pool1', (64, 20, 12, 12)),
('conv2', (64, 50, 8, 8)),
('pool2', (64, 50, 4, 4)),
('fc1', (64, 500)),
('score', (64, 10)),
('loss', ())]
```python
# just print the weight sizes (we'll omit the biases)
# note fc1's input dimension is 800 = 50 * 4 * 4, i.e. pool2 flattened
[(k, v[0].data.shape) for k, v in solver.net.params.items()]
```
[('conv1', (20, 1, 5, 5)),
('conv2', (50, 20, 5, 5)),
('fc1', (500, 800)),
('score', (10, 500))]
- Before taking off, let's check that both the training and test networks actually contain our data.
```python
solver.net.forward()  # train net
solver.test_nets[0].forward()  # test net (there can be more than one)
```
{'loss': array(2.3089799880981445, dtype=float32)}
```python
# we use a little trick to tile the first eight training images
imshow(solver.net.blobs['data'].data[:8, 0].transpose(1, 0, 2).reshape(28, 8*28), cmap='gray'); axis('off')
print('train labels:', solver.net.blobs['label'].data[:8])
```
train labels: [ 5. 0. 4. 1. 9. 2. 1. 3.]
```python
imshow(solver.test_nets[0].blobs['data'].data[:8, 0].transpose(1, 0, 2).reshape(28, 8*28), cmap='gray'); axis('off')
print('test labels:', solver.test_nets[0].blobs['label'].data[:8])
```
test labels: [ 7. 2. 1. 0. 4. 1. 4. 9.]
Training
- First take a single step of minibatch SGD and see what happens.
```python
solver.step(1)
```
After this one step, let's check whether the filters of the first convolutional layer changed. Below, the gradients of the 20 filters are tiled as a 4×5 grid of 5×5 kernels:
```python
imshow(solver.net.params['conv1'][0].diff[:, 0].reshape(4, 5, 5, 5)
       .transpose(0, 2, 1, 3).reshape(4*5, 5*5), cmap='gray'); axis('off')
```
(-0.5, 24.5, 19.5, -0.5)
The non-zero gradients above show that weight updates are flowing. Now let's train for a while, recording a few quantities along the way so we can monitor progress and decide when to stop iterating; the loop is sketched below.
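A sketch of the solver loop behind the output that follows, in the spirit of the standard LeNet notebook: 200 iterations, logging the train loss at every step, the scores on the first test batch, and a full test-set evaluation every 25 iterations:

```python
%%time
niter = 200
test_interval = 25
# losses will also be stored in the log
train_loss = zeros(niter)
test_acc = zeros(int(np.ceil(float(niter) / test_interval)))
output = zeros((niter, 8, 10))

# the main solver loop
for it in range(niter):
    solver.step(1)  # one step of SGD by Caffe

    # store the train loss
    train_loss[it] = solver.net.blobs['loss'].data

    # store the output on the first test batch
    # (start the forward pass at conv1 to avoid reloading data)
    solver.test_nets[0].forward(start='conv1')
    output[it] = solver.test_nets[0].blobs['score'].data[:8]

    # run a full test every so often
    if it % test_interval == 0:
        print('Iteration', it, 'testing...')
        correct = 0
        for test_it in range(100):
            solver.test_nets[0].forward()
            correct += sum(solver.test_nets[0].blobs['score'].data.argmax(1)
                           == solver.test_nets[0].blobs['label'].data)
        test_acc[it // test_interval] = correct / 1e4
```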
Iteration 0 testing...
Iteration 25 testing...
Iteration 50 testing...
Iteration 75 testing...
Iteration 100 testing...
Iteration 125 testing...
Iteration 150 testing...
Iteration 175 testing...
CPU times: user 1min 15s, sys: 15.3 s, total: 1min 31s
Wall time: 1min 18s
- Plot the train loss and test accuracy, as sketched below.
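A sketch of the plotting cell, assuming the `train_loss`, `test_acc`, `niter`, and `test_interval` values from the loop above; loss and accuracy share the x-axis on twin y-axes:

```python
_, ax1 = subplots()
ax2 = ax1.twinx()
ax1.plot(arange(niter), train_loss)
ax2.plot(test_interval * arange(len(test_acc)), test_acc, 'r')
ax1.set_xlabel('iteration')
ax1.set_ylabel('train loss')
ax2.set_ylabel('test accuracy')
ax2.set_title('Test Accuracy: {:.2f}'.format(test_acc[-1]))
```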
<matplotlib.text.Text at 0x7feabeae91d0>
- Because we stored the scores on the first test batch, we can watch how the predictions evolved with training. Below we plot, for each of the eight images, the score assigned to each label at every iteration (see the sketch after this paragraph). Only one digit is reproduced here; the others look similar.
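A sketch of the visualization loop, assuming the `output` array recorded during training (x-axis: iteration, y-axis: label, brightness: raw score):

```python
for i in range(8):
    figure(figsize=(2, 2))
    imshow(solver.test_nets[0].blobs['data'].data[i, 0], cmap='gray')
    figure(figsize=(10, 2))
    imshow(output[:50, i].T, interpolation='nearest', cmap='gray')
    xlabel('iteration')
    ylabel('label')
```

These are raw scores; to see probabilities instead, push each column through a softmax, e.g. exp of the scores divided by the per-column sum of exponentials.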
Experimenting with the architecture and the solver
```python
train_net_path = 'mnist/custom_auto_train.prototxt'
test_net_path = 'mnist/custom_auto_test.prototxt'
solver_config_path = 'mnist/custom_auto_solver.prototxt'
```
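From here everything is open to experimentation. A sketch in the spirit of the original notebook: start from a bare linear classifier, generate a solver config programmatically through the SolverParameter protobuf (field names mirror lenet_auto_solver.prototxt above), and train; the spots marked EDIT HERE are the knobs to turn. The solver `type` string assumes a reasonably recent Caffe.

```python
### define a net to experiment with
def custom_net(lmdb, batch_size):
    n = caffe.NetSpec()
    # keep this data layer for all variants
    n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB, source=lmdb,
                             transform_param=dict(scale=1./255), ntop=2)
    # EDIT HERE: a single InnerProduct layer is a multiway logistic regression;
    # substitute conv/pool/ReLU stacks to try richer architectures
    n.score = L.InnerProduct(n.data, num_output=10, weight_filler=dict(type='xavier'))
    # keep this loss layer for all variants
    n.loss = L.SoftmaxWithLoss(n.score, n.label)
    return n.to_proto()

with open(train_net_path, 'w') as f:
    f.write(str(custom_net('mnist/mnist_train_lmdb', 64)))
with open(test_net_path, 'w') as f:
    f.write(str(custom_net('mnist/mnist_test_lmdb', 100)))

### define the solver programmatically
from caffe.proto import caffe_pb2
s = caffe_pb2.SolverParameter()
s.train_net = train_net_path
s.test_net.append(test_net_path)
s.test_iter.append(100)    # test on 100 batches each time
s.test_interval = 500
s.base_lr = 0.01           # EDIT HERE to try different learning rates
s.momentum = 0.9
s.weight_decay = 5e-4
s.lr_policy = 'inv'
s.gamma = 1e-4
s.power = 0.75
s.display = 100
s.max_iter = 10000
s.snapshot = 5000
s.snapshot_prefix = 'mnist/custom'
s.type = 'SGD'             # EDIT HERE: e.g. 'Adam' or 'Nesterov'

with open(solver_config_path, 'w') as f:
    f.write(str(s))

### load and train as before
solver = caffe.get_solver(solver_config_path)
solver.step(500)  # train for a while, then inspect blobs/params as above
```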