Ldy's Blog

Generative Adversarial Networks

2016-11-27T02:00:00.000Z

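A core goal of artificial intelligence today is to give machines the ability to understand our world on their own. As humans, it is easy to forget how much we already know about it: we live in a three-dimensional environment where objects interact, move, and collide; we know which animals fly and which eat grass. This vast and growing body of information is now readily accessible to machines. The key question is how to design models and algorithms that let machines analyze and understand the treasure hidden in this data.

Generative models are currently regarded as one of the most promising approaches to this goal. A generative model is trained on a large amount of data from a particular domain (images, sentences, sounds, and so on) so that it can produce outputs resembling the training data. The intuition behind this is captured by the following quote.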

“What I cannot create, I do not understand.” —Richard Feynman

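A generative model is a neural network with far fewer parameters than the amount of training data, so to produce outputs that resemble the training data it is forced to discover the essential structure of the data. There are several ways to train generative models; here we focus on adversarial training.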

Adversarial Training

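As mentioned above, adversarial training is one way to train a generative model. It introduces a discriminative model that competes against the generative model. Let us look at how the generative model $G(z)$ and the discriminative model $D(x)$ interact.

The generative model tries to imitate the training data: given random noise as input, it produces samples that resemble the training data.
The discriminative model tries to tell whether a sample is real training data or was produced by the generative model.

Together, the generative and discriminative models form the framework known as a GAN. The figure below shows the inputs and outputs of the two models: the generative model takes random noise $z$ ($z \sim p_z$) and produces a synthetic sample, while the discriminative model receives both real training data and generated samples and judges whether each input is real.

That describes the structure of a GAN, but what is its optimization objective? How does training make the generative model produce outputs that resemble real data? The objective is simple. In short:

The discriminative model tries to predict 1 for real data and 0 for generated data.
The generative model tries to make the discriminative model predict 1 for the data it generates, so that the discriminator can no longer tell generated data from real data and the fakes pass for the real thing.

Next we describe more formally how a GAN is trained. First, some notation: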

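| Symbol | Meaning |
| --- | --- |
| $p_z$ | Distribution of the input random noise $z$ |
| $p_{data}$ | Unknown data distribution of the real input samples |
| $p_g$ | Distribution of the generative model's output samples; the goal of a GAN is $p_g=p_{data}$ |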

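The training objective of the discriminative model $D(x)$:

For every input $x \sim p_{data}$, make $D(x)$ as large as possible;
for every input $x \nsim p_{data}$, make $D(x)$ as small as possible.

The generative model $G(z)$ is trained to produce samples that fool the discriminative model $D$, so its objective is to maximize $D(G(z))$: feed the generator's output to the discriminator and make the discriminator predict that it is real. Maximizing $D(G(z))$ is equivalent to minimizing $1-D(G(z))$, because the output of $D$ lies between 0 and 1, ideally 1 for real data and 0 otherwise.

Combining the training objectives of the two models gives the GAN optimization objective:

$$\min_G \max_D \; {\mathbb E}_{x\sim p_{\rm data}} [\log D(x)]+{\mathbb E}_{z\sim p_z}[\log (1-D(G(z)))] $$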
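To make the objective concrete, here is a minimal NumPy sketch that estimates the value function on a minibatch of discriminator outputs. The function name `gan_value` and the small `eps` guard against `log(0)` are assumptions made for illustration; they are not part of the original tutorial code.

```python
import numpy as np

def gan_value(d_real, d_fake, eps=1e-12):
    """Minibatch estimate of E[log D(x)] + E[log(1 - D(G(z)))]."""
    d_real = np.asarray(d_real, dtype=float)  # D(x) on real samples
    d_fake = np.asarray(d_fake, dtype=float)  # D(G(z)) on generated samples
    return np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))

# A confident discriminator scores higher than one that outputs 0.5 everywhere.
confident = gan_value([0.9, 0.95], [0.05, 0.1])
fooled = gan_value([0.5, 0.5], [0.5, 0.5])
```

When the discriminator outputs 0.5 for every sample, the value equals $-\log 4$, which is the value the discriminator is pushed towards once it can no longer separate real from generated data.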

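To summarize, GANs are inspired by the two-player zero-sum game of game theory, in which the payoffs of the two players sum to zero or to a constant: whatever one player gains, the other loses. In a GAN the two players are the generative model and the discriminative model. The generative model G captures the distribution of the sample data; the discriminative model is a binary classifier that estimates the probability that a sample comes from the training data rather than from the generator. G and D are usually nonlinear mappings such as multilayer perceptrons or convolutional neural networks. The input to the generative model is random noise z drawn from a simple distribution (for example a Gaussian), and its output is a generated image of the same size as the training images. When a generated sample is fed to D, D wants to output a low probability (classifying it as generated), while G wants to fool D into outputting a high probability (misclassifying it as real). This is what creates the competition.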

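GAN Implementation

A simple TensorFlow implementation of a GAN on one-dimensional data: genadv_tutorial
The one-dimensional training data is distributed as shown below: a normal distribution with mean -1 and $\sigma =1$.

Let us connect the code to the theory above. The discriminative model maximizes the expression below, where $D_1(x)$ evaluates real data and $D_2(G(z))$ evaluates generated data. $D_1$ and $D_2$ share parameters; in other words, they are the same discriminative model.

$$\log(D_1(x))+\log(1-D_2(G(z)))$$

The corresponding Python code: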

batch = tf.Variable(0)
obj_d = tf.reduce_mean(tf.log(D1) + tf.log(1 - D2))
# Maximizing obj_d is the same as minimizing 1 - obj_d.
opt_d = (tf.train.GradientDescentOptimizer(0.01)
         .minimize(1 - obj_d, global_step=batch, var_list=theta_d))

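To optimize $G$, we want to maximize $D_2(x')$ (that is, to fool $D$), where $x' = G(z)$ is a generated sample, so the objective for $G$ is:

$$\log(D_2(G(z)))$$

The corresponding Python code: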

batch = tf.Variable(0)
obj_g = tf.reduce_mean(tf.log(D2))
# Maximizing obj_g is the same as minimizing 1 - obj_g.
opt_g = (tf.train.GradientDescentOptimizer(0.01)
         .minimize(1 - obj_g, global_step=batch, var_list=theta_g))

With the objectives defined, the main training loop is:

# Algorithm 1, Goodfellow et al. 2014
for i in range(TRAIN_ITERS):
    x= np.random.normal(mu,sigma,M) # sample minibatch from p_data
    z= np.random.random(M)  # sample minibatch from noise prior
    sess.run(opt_d, {x_node: x, z_node: z}) # update discriminator D
    z= np.random.random(M) # sample noise prior
    sess.run(opt_g, {z_node: z}) # update generator G

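The results are shown below. The left plot shows the state before training, where the distribution of generated data is far from that of the training data. The right plot shows the state after training: the two distributions are much closer, and the discriminator's output is around 0.5, which indicates that the generative model has succeeded in fooling the discriminative model.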

DCGAN

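DCGAN is an improved GAN model. Its generative model takes as input a vector of 100 random numbers drawn from a uniform distribution (usually called the code) and outputs a 64x64x3 image ($G(z)$ in the figure below). As the code changes gradually, the image produced by the generative model changes gradually as well. The generative model in the figure consists mainly of deconvolution layers, and the discriminative model consists of plain convolution layers that finally output the probability $P(x)$ that the input image is real.

The figure below shows how the images produced by DCGAN change as the number of iterations grows.

Once the network is trained, the generative and discriminative models are both useful in their own right. A trained discriminative model can be used to extract features for classification tasks. Feeding random vectors to the generative model produces some very interesting images: as shown below, when the input moves smoothly through the input space, the output image changes smoothly too.

Another very interesting property is that simple arithmetic on the generator's input vectors carries over to the learned features in the output, as shown below.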

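Training GANs and Improvements

Although the images generated by the GANs above look good, GAN training is in fact very unstable.
A common problem in practice is that the discriminative model converges much faster than the generative model. The usual remedy is to train the generative model for a few extra steps so that it can catch up with the discriminative model. The catch is that the two models depend on each other during training, and ideally both should improve together at every step. Ideally the discriminator's output settles around 0.5, which means it can no longer tell real data from data produced by the generative model.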

Improved GANs

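The paper Improved Techniques for Training GANs proposes a number of ways to improve GAN training. One of them is feature matching. Previously the discriminative model only judged whether its input came from the real data or from the generative model. Feature matching gives the generative model a new objective: the statistics of the generated images should match those of the real data. Let $f(x)$ denote the output of an intermediate layer of the discriminative model. The new objective is defined as $|| \mathbb{E}_{x \sim p_{data}}f(x) - \mathbb{E}_{z \sim p_z}f(G(z))||^2_2$; in other words, the distance between real and synthetic images at an intermediate layer of the discriminator should be as small as possible. This keeps the generative model from overfitting to the current discriminative model.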
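The feature matching objective can be sketched in a few lines of NumPy. This is an illustration only: `feature_matching_loss` is a hypothetical helper, and `f_real` and `f_fake` stand for intermediate discriminator features of a real and a generated minibatch, with the expectations replaced by minibatch means.

```python
import numpy as np

def feature_matching_loss(f_real, f_fake):
    """|| mean f(x) - mean f(G(z)) ||_2^2 for feature arrays of shape (batch, features)."""
    diff = np.mean(f_real, axis=0) - np.mean(f_fake, axis=0)
    return float(np.sum(diff ** 2))

f_real = np.array([[1.0, 2.0], [3.0, 4.0]])   # feature mean is [2, 3]
f_fake = np.array([[0.0, 1.0], [2.0, 1.0]])   # feature mean is [1, 1]
loss = feature_matching_loss(f_real, f_fake)  # (2 - 1)^2 + (3 - 1)^2
```

The loss only compares batch statistics, so it is zero whenever the two batches have the same mean features, even if the individual samples differ.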

InfoGAN

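At this point you may wonder: what if I want a GAN to generate images with specific attributes? An ordinary GAN takes random noise as input and outputs a correspondingly random image, and we have no control over how the input noise maps to the output image. During training this can also lead the generative model to favor one particular kind of image that fools the discriminative model more easily, rather than learning the distribution of the training data, which clearly hurts training. InfoGAN was proposed to address this. It augments the input noise with class information and with latent variables that control image features (such as the rotation and stroke thickness of MNIST digits), so that the generator's input is no longer pure random noise. Even so, the generative model might ignore this extra information and still treat its input as noise unrelated to the output. InfoGAN therefore defines the mutual information between the generator's input and output: the higher the mutual information, the stronger the link between input and output.

The three figures below show images generated by controlling, respectively, the class information, the digit rotation, and the stroke thickness in the input. InfoGAN produces quite good results.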

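Other Applications

GANs have many other interesting applications, such as generating an image from a sentence, as shown below. You may know karpathy's image captioning work; a GAN can run that process in reverse.

There is also "image completion", which finds the best content to fill in based on the remaining parts of an image.

And there is the image enhancement example below, which is a bit like removing a mosaic; the results are quite good :-D.

Summary

Yann LeCun has said that adversarial learning was the deep learning technique that excited him most in 2016. Adversarial learning is indeed an effective approach to unsupervised learning, which has long been one of the "ultimate goals" pursued by AI researchers.

References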

Generative Adversarial Networks

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Improved Techniques for Training GANs

InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Transposed Convolution, Fractionally Strided Convolution or Deconvolution

2016-10-29T02:00:00.000Z

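The concept of deconvolution first appeared in Zeiler's 2010 paper Deconvolutional networks, although the paper did not use that name; the term was formally adopted in the follow-up work (Adaptive deconvolutional networks for mid and high level feature learning). After its success in visualizing neural networks, deconvolution was taken up by more and more work, such as scene segmentation and generative models. Deconvolution also goes by several other names, such as Transposed Convolution and Fractionally Strided Convolution.

This post has two goals:
1. Explain the relationship between convolution layers and deconvolution layers;
2. Work out how the input and output feature sizes of a deconvolution layer are related.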

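## Convolutional Layer

Convolution layers should be familiar to everyone. To keep the discussion simple, we assume:
- 2-D discrete convolution ($N = 2$)
- Square input features ($i_1 = i_2 = i$)
- Square kernel ($k_1 = k_2 = k$)
- The same stride along each dimension ($s_1 = s_2 = s$)
- The same padding along each dimension ($p_1 = p_2 = p$)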

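The figure below shows a convolution with parameters $(i=5,k=3,s=2,p=1)$; the output feature size is $(o_1 = o_2 = o = 3)$.

The figure below shows a convolution with parameters $(i=6,k=3,s=2,p=1)$; the output feature size is again $(o_1 = o_2 = o = 3)$.

From these two examples we can summarize how the output size of a convolution layer depends on the input size and the kernel parameters:
$$o = \left\lfloor \frac{i + 2p - k}{s} \right\rfloor + 1.$$
Here $\lfloor x \rfloor$ denotes rounding $x$ down.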
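The formula is easy to check in code. The helper below is a small sketch (the name `conv_output_size` is made up for this example) that evaluates it for the two cases above and for the stride-1 case used later in the post.

```python
def conv_output_size(i, k, s, p):
    """Output size o = floor((i + 2p - k) / s) + 1 of a square convolution."""
    return (i + 2 * p - k) // s + 1

o_a = conv_output_size(i=5, k=3, s=2, p=1)
o_b = conv_output_size(i=6, k=3, s=2, p=1)
o_c = conv_output_size(i=4, k=3, s=1, p=0)
```

Note that the inputs of size 5 and 6 give the same output size, because the floor discards the leftover row and column when the stride does not divide evenly.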

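Deconvolution Layer

Before introducing deconvolution, let us look at the relationship between convolution and matrix multiplication.

Convolution as Matrix Multiplication

Consider the simple convolution below, with parameters $(i=4,k=3,s=1,p=0)$ and output size $o=2$.

For this convolution, we unroll the 3×3 kernel shown above into the sparse [4,16] matrix $\mathbf{C}$ below, where a nonzero element $w_{i,j}$ is the kernel entry in row $i$ and column $j$.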

$$
\mathbf{C} =
\begin{pmatrix}
w_{0,0} & w_{0,1} & w_{0,2} & 0 & w_{1,0} & w_{1,1} & w_{1,2} & 0 &
w_{2,0} & w_{2,1} & w_{2,2} & 0 & 0 & 0 & 0 & 0 \\
0 & w_{0,0} & w_{0,1} & w_{0,2} & 0 & w_{1,0} & w_{1,1} & w_{1,2} &
0 & w_{2,0} & w_{2,1} & w_{2,2} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & w_{0,0} & w_{0,1} & w_{0,2} & 0 &
w_{1,0} & w_{1,1} & w_{1,2} & 0 & w_{2,0} & w_{2,1} & w_{2,2} & 0 \\
0 & 0 & 0 & 0 & 0 & w_{0,0} & w_{0,1} & w_{0,2} &
0 & w_{1,0} & w_{1,1} & w_{1,2} & 0 & w_{2,0} & w_{2,1} & w_{2,2}
\end{pmatrix}
$$

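We then unroll the 4×4 input feature into a [16,1] matrix $\mathbf{X}$, so that $\mathbf{Y = CX}$ is a [4,1] output matrix; rearranging it into a 2×2 feature gives the final result. This shows that a convolution layer can be computed as a matrix multiplication. Note that deep learning frameworks do not actually compute convolutions with this particular transformation, because it involves many useless multiplications by zero. For how Caffe computes convolutions, see Implementing convolution as a matrix multiplication.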
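The construction can be verified numerically. The sketch below builds the sparse matrix for a 3×3 kernel on a 4×4 input and compares the matrix product with a direct sliding-window computation. `conv_matrix` is a hypothetical helper written for this check, and it only covers the $s=1$, $p=0$ case discussed here.

```python
import numpy as np

def conv_matrix(w, i):
    """Unroll a k x k kernel into the (o*o, i*i) matrix C for stride 1, no padding."""
    k = w.shape[0]
    o = i - k + 1
    C = np.zeros((o * o, i * i))
    for r in range(o):              # output row
        for c in range(o):          # output column
            for a in range(k):      # kernel row
                for b in range(k):  # kernel column
                    C[r * o + c, (r + a) * i + (c + b)] = w[a, b]
    return C

rng = np.random.default_rng(0)
w = rng.standard_normal((3, 3))
x = rng.standard_normal((4, 4))

C = conv_matrix(w, 4)
y = (C @ x.reshape(16)).reshape(2, 2)  # Y = C X, reshaped to 2 x 2

# Direct sliding-window computation for comparison.
direct = np.array([[np.sum(x[r:r + 3, c:c + 3] * w) for c in range(2)]
                   for r in range(2)])

# Multiplying by the transpose maps a 2 x 2 feature back to a 4 x 4 one.
back = (C.T @ y.reshape(4)).reshape(4, 4)
```

The last line shows the shape behavior of the transpose: it takes a 4-element vector back to 16 elements, which is the size relationship a transposed convolution relies on.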

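From the analysis above, the forward pass of a convolution layer can be written as multiplication by the matrix $\mathbf{C}$, and it follows easily that the backward pass is multiplication by the transpose of $\mathbf{C}$.

Relationship Between Deconvolution and Convolution

As mentioned earlier, deconvolution is also called transposed convolution. The forward pass of a convolution layer is the backward pass of a deconvolution layer, and the backward pass of a convolution layer is the forward pass of a deconvolution layer. The forward and backward computations of a convolution layer multiply by $\mathbf{C}$ and $\mathbf{C^T}$ respectively, while those of a deconvolution layer multiply by $\mathbf{C^T}$ and $\mathbf{(C^T)^T}$, so the forward and backward passes are exactly swapped.

The figure below shows the deconvolution that corresponds to the convolution in the figure above; their input and output sizes are exactly reversed. Besides computing a deconvolution as the backward pass of a convolution, we can also (ignoring channels) compute it as a direct discrete convolution. This is only for illustration; it is not done this way in practice.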

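Again for illustration, we define the parameters of the deconvolution as follows:

- 2-D discrete convolution ($N = 2$)
- Square input features ($i'_1 = i'_2 = i'$)
- Square kernel ($k'_1 = k'_2 = k'$)
- The same stride along each dimension ($s'_1 = s'_2 = s'$)
- The same padding along each dimension ($p'_1 = p'_2 = p'$)

The figure below shows a deconvolution with parameters ($i'=2,k'=3,s'=1,p'=2$); the corresponding convolution has parameters $(i=4,k=3,s=1,p=0)$. The convolution and its deconvolution share $(k=k',s=s')$, but the deconvolution has extra padding $p'=2$. Comparing the two, the top-left input pixel of the convolution layer contributes only to the top-left output pixel, which is why the deconvolution layer needs $p'=k-p-1=2$. From the diagram, the input and output sizes of a deconvolution layer with $s=s'=1$ are related by:

$$o'=i'-k'+2p'+1=i'+(k-1)-2p$$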

Fractionally Strided Convolution

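As mentioned above, deconvolution is sometimes called fractionally strided convolution. For a convolution with stride $s>1$, we might expect the corresponding deconvolution to have stride $s'<1$. The figure below shows the deconvolution corresponding to a convolution with $i = 5, k = 3, s = 2, p = 1$ (the one in the first figure). The fractional stride can be understood as follows: insert $s-1$ zeros between neighboring input units, treat the result as the new input, and the stride $s'$ is then no longer fractional but 1. Combining this with the result above, the input/output relation of a fractionally strided convolution is:

$$ o' = s(i' - 1) + k - 2p $$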
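As a quick sanity check of this relation, the sketch below evaluates it for the two convolutions discussed in this post. The function name is made up for the example, and `k`, `s`, `p` are the parameters of the corresponding forward convolution, as in the formula.

```python
def transposed_conv_output_size(i_t, k, s, p):
    """o' = s * (i' - 1) + k - 2p, with k, s, p taken from the forward convolution."""
    return s * (i_t - 1) + k - 2 * p

# Inverse of the (i=4, k=3, s=1, p=0) convolution, whose output size is 2.
o_stride1 = transposed_conv_output_size(2, k=3, s=1, p=0)

# Inverse of the (i=5, k=3, s=2, p=1) convolution, whose output size is 3.
o_stride2 = transposed_conv_output_size(3, k=3, s=2, p=1)
```

The formula recovers the input sizes 4 and 5. It also returns 5 for the $i=6$ convolution from the first section, since that convolution produces the same output size of 3; the formula alone cannot distinguish the two cases.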

References

conv_arithmetic

Is the deconvolution layer the same as a convolutional layer?

Caffe Source Code Analysis

2016-10-09T02:00:00.000Z

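Introduction to Caffe

Caffe is an excellent deep learning framework, and there are already plenty of introductions to it online, so I will not repeat them here. As a C++ beginner who has been reading the Caffe source on and off for a month, I keep finding more things I do not understand, so I am recording and sharing what I learn on this blog. I have collected my own notes on the source, annotations from other people online, and resources I found along the way (including how to single-step through the code in an IDE and introductions to some of the third-party libraries Caffe uses) on GitHub: Caffe_Code_Analysis. Have a look if you are interested; I hope it helps.

Introductions to Caffe's code structure usually say that Caffe consists mainly of Blob, Layer, Net, and Solver.

Blob represents the data in the network, including training data, each layer's own parameters (weights, biases, and their gradients), and the data passed between layers. Blob data can be stored on both CPU and GPU and synchronized between the two.
Layer is an abstraction of the various kinds of layers in a neural network, including the familiar convolution and downsampling layers, fully connected layers, and activation layers. Every Layer implements forward and backward propagation and passes data through Blobs.
Net represents the whole network, built by connecting Layers one after another; it is the network model we construct.
Solver defines how the Net is optimized: it records the training process, saves the model parameters, and can interrupt and resume training. Custom Solvers make different optimization methods possible.

Even knowing that the code consists of these four parts, it is hard to know where to start reading. Below we walk through an example of training LeNet with Caffe and follow the code to see how Caffe initializes the network, runs forward and backward propagation, and finally produces a trained model.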

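Training LeNet

In the examples that ship with Caffe, the command for training LeNet is:

cd $CAFFE_ROOT
./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt

The first argument, build/tools/caffe, is Caffe's main executable, compiled from tools/caffe.cpp; the second argument, train, says that we want to train a network; the third argument is the protobuf description file of the solver. In Caffe, both the network model and its solver are defined through protobuf, with no code to write. Model parameters are also loaded and saved through protobuf, and even switching between CPU and GPU is done through configuration rather than hard-coding. For more on protobuf, see this post: http://alanse7en.github.io/caffedai-ma-jie-xi-2/.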

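Network Initialization

We start from the main function in caffe.cpp and watch how Caffe trains a network step by step. Outside main, caffe.cpp uses the RegisterBrewFunction macro after each function that implements a major feature to add the function's name and its function pointer to g_brew_map. The four functions are train(), test(), device_query(), and time().

At run time, main uses GetBrewFunction to look up the function pointer that matches the command-line argument and then calls it.

// caffe.cpp
return GetBrewFunction(caffe::string(argv[1]))();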
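The register-then-dispatch pattern can be pictured with a small Python analogue. This is only a sketch of the idea, not Caffe's C++ macro: `BREW_MAP`, `register_brew_function`, and `get_brew_function` are names made up here to mirror `g_brew_map`, `RegisterBrewFunction`, and `GetBrewFunction`, and the function bodies are placeholders.

```python
BREW_MAP = {}  # plays the role of g_brew_map: action name -> function

def register_brew_function(func):
    """Record func under its own name, as the C++ macro does after each function."""
    BREW_MAP[func.__name__] = func
    return func

@register_brew_function
def train():
    return "training"  # placeholder body for illustration

@register_brew_function
def device_query():
    return "querying devices"  # placeholder body for illustration

def get_brew_function(name):
    """Look up the function registered for a command-line action."""
    if name not in BREW_MAP:
        raise KeyError("unknown action: %s (available: %s)"
                       % (name, ", ".join(sorted(BREW_MAP))))
    return BREW_MAP[name]

result = get_brew_function("train")()  # same shape as GetBrewFunction(argv[1])()
```

The benefit of the design is that adding a new action only requires defining and registering one more function; the dispatch code in main stays unchanged.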

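In the LeNet example above, the second argument is train, so the function called is int train() in caffe.cpp, which we look at next. The train function contains the two lines below, which define a shared_ptr to a Solver. They call CreateSolver, a static member function of the SolverRegistry class, to obtain a pointer to a Solver and use it to construct the shared_ptr named solver. Thanks to C++ polymorphism, although solver points to the base class Solver, calling member functions through this smart pointer dispatches to the subclass (SGDSolver and so on).

// caffe.cpp
// solver_param is the third argument mentioned above: the model and solver definition file
shared_ptr<caffe::Solver<float> >
    solver(caffe::SolverRegistry<float>::CreateSolver(solver_param));

Because the default optimization type in caffe.proto is SGD, the code above instantiates an SGDSolver object. The SGDSolver class derives from Solver, and creating an SGDSolver object calls its constructor, shown below: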

//sgd_solvers.hpp
explicit SGDSolver(const SolverParameter& param)
    : Solver<Dtype>(param) { PreSolve(); }

As the code shows, the constructor of the parent class Solver is called first, as shown below. The Solver constructor initializes the network through Init(param).

//solver.cpp
template <typename Dtype>
Solver<Dtype>::Solver(const SolverParameter& param, const Solver* root_solver)
    : net_(), callbacks_(), root_solver_(root_solver),requested_early_exit_(false)
{
  Init(param);
}

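Init(param) in turn builds the training network and the test networks mainly through InitTrainNet() and InitTestNets().

There can be only one training network. InitTrainNet() first sets some basic parameters, such as setting the network state to TRAIN and checking that exactly one training network is specified, and then creates a Net object with the statement below. InitTestNets() is much the same as InitTrainNet(), so we skip it.

//solver.cpp
net_.reset(new Net<Dtype>(net_param));

Creating the Net object calls the constructor of the Net class, shown below. The constructor initializes the network structure through Init(param).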

//net.cpp
template <typename Dtype>
Net<Dtype>::Net(const NetParameter& param, const Net* root_net)
    : root_net_(root_net) {
  Init(param);
}

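Below is the core of the Init() function in net.cpp (details omitted). LayerRegistry<Dtype>::CreateLayer(layer_param) calls the static member function CreateLayer of the LayerRegistry class to obtain a shared_ptr to a Layer, and the pointer for each layer is stored in the container vector<shared_ptr<Layer<Dtype> > > layers_. In effect, this instantiates the concrete layer subclass, such as conv_layer (convolution layer) or pooling_layer (pooling layer), from each layer's layer_param. Instantiating a layer calls its constructor, but the layer constructors do not do much setup.

After that, Init() consists mainly of four parts:

AppendBottom: sets up the input data of each layer
AppendTop: sets up the output data of each layer
layers_[layer_id]->SetUp: allocates space for the input and output data set up above and sets up each layer's learnable parameters (weights and biases); this function is discussed in detail below
AppendParam: configures the learnable parameters allocated above, mainly the learning rate and regularization settings.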

//net.cpp Init()
for (int layer_id = 0; layer_id < param.layer_size(); ++layer_id) {  // param is the network parameter; layer_size() returns the number of layers
  const LayerParameter& layer_param = param.layer(layer_id);  // parameters of the current layer
  layers_.push_back(LayerRegistry<Dtype>::CreateLayer(layer_param));  // instantiate the layer from its parameters

  // The two loops below store pointers to this layer's bottom and top blobs in
  // bottom_vecs_ and top_vecs_. The blob instances themselves all live in blobs_.
  // For adjacent layers, the top blob of the earlier layer is the bottom blob of
  // the later one, so the same blob in blobs_ can be both a bottom and a top blob.
  for (int bottom_id = 0; bottom_id < layer_param.bottom_size(); ++bottom_id) {
    const int blob_id = AppendBottom(param, layer_id, bottom_id, &available_blobs, &blob_name_to_idx);
  }

  for (int top_id = 0; top_id < num_top; ++top_id) {
    AppendTop(param, layer_id, top_id, &available_blobs, &blob_name_to_idx);
  }

  // Call the layer's SetUp function with its input and output blobs; this sets the size of each blob.
  layers_[layer_id]->SetUp(bottom_vecs_[layer_id], top_vecs_[layer_id]);

  // Next, push pointers to each layer's parameters into params_, and in particular learnable_params_.
  const int num_param_blobs = layers_[layer_id]->blobs().size();
  for (int param_id = 0; param_id < num_param_blobs; ++param_id) {
    AppendParam(param, layer_id, param_id);  // AppendParam does the actual dirty work
  }
}

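At this point the initialization of the Net class is essentially complete. Now let us look at how layers_[layer_id]->SetUp configures each individual layer. In the SetUp() function of the Layer class, the setup is done mainly by the following two functions:
LayerSetUp(bottom, top): overridden by the layer-specific classes derived from Layer. It mainly allocates space for the weight parameters (including biases) and initializes them randomly.
Reshape(bottom, top): computes the shape of the output blobs from the input blobs and the weight parameters, and allocates the space.

//layer.hpp
// layer initialization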
void SetUp(const vector<Blob<Dtype>*>& bottom,   
    const vector<Blob<Dtype>*>& top) {
  InitMutex();
  CheckBlobCounts(bottom, top);
  LayerSetUp(bottom, top);
  Reshape(bottom, top);
  SetLossWeights(top);
}

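That essentially completes initialization. The overall flow is: create a Solver object, which calls the Solver constructor; the Solver constructor creates a Net instance; the Net constructor creates the individual Layer instances, all the way down to setting up each Blob. There are many more details, but this is the general flow.

Training

As described above, network initialization starts from the single statement below, which creates the solver pointer and then calls the constructors of the Solver, Net, Layer, and Blob classes one after another to initialize the whole network.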

//caffe.cpp
shared_ptr<caffe::Solver<float> >  // initialization
     solver(caffe::SolverRegistry<float>::CreateSolver(solver_param));

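Once initialization is done, training can begin. The code that starts training is shown below: the solver pointer calls Solve(), a member function of the Solver class (the names are a bit of a tongue twister).

// start optimization
solver->Solve();

Next, let us look at Solve(). It mainly calls another member function of Solver, Step(), which carries out the actual training iterations.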

//solver.cpp
template <typename Dtype>
void Solver<Dtype>::Solve(const char* resume_file) {
  ...
  int start_iter = iter_;
  ...
  // then call Step(), which runs the actual iterations
  Step(param_.max_iter() - iter_);
  ...
  LOG(INFO) << "Optimization Done.";
}

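Now let us look at the main code of Step(). It starts with a big loop that runs for the total number of iterations, and each iteration trains on iter_size x batch_size samples. This setting is useful when GPU memory is insufficient. For example, suppose I want batch_size = 128 with iter_size at its default of 1, but that runs out of memory. I can instead set batch_size=32 and iter_size=4, and each iteration still processes 128 samples.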
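Why accumulating over iter_size small batches matches one large batch can be checked with a toy model. The sketch below uses a linear least-squares model in NumPy, which is an assumption made purely for illustration; `mse_gradient` is a hypothetical helper and has nothing to do with Caffe's own code.

```python
import numpy as np

def mse_gradient(w, x, y):
    """Gradient of 0.5 * mean((x @ w - y)^2) with respect to w."""
    return x.T @ (x @ w - y) / len(y)

rng = np.random.default_rng(0)
x = rng.standard_normal((128, 5))
y = rng.standard_normal(128)
w = rng.standard_normal(5)

# One pass with batch_size = 128.
full = mse_gradient(w, x, y)

# Four passes with batch_size = 32, accumulated and then divided by iter_size.
iter_size, batch_size = 4, 32
accumulated = np.zeros_like(w)
for i in range(iter_size):
    rows = slice(i * batch_size, (i + 1) * batch_size)
    accumulated += mse_gradient(w, x[rows], y[rows])
accumulated /= iter_size
```

Because every small batch has the same size, the average of the four per-batch gradients equals the gradient over all 128 samples, so the update is the same while only 32 samples need to fit in memory at a time.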

//solver.cpp
template <typename Dtype>
void Solver<Dtype>::Step(int iters) {
  ...
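  // iterate
  while (iter_ < stop_iter) {
    ...
    // iter_size is also set in solver.prototxt; the effective batch size is
    // iter_size * the batch_size in the network definition, so the loss of one
    // iteration is the sum over iter_size passes divided by iter_size. Each pass
    // gets its loss from `Net::ForwardBackward`.
    // accumulate gradients over `iter_size` x `batch_size` instances
    for (int i = 0; i < param_.iter_size(); ++i) {
    /*
     * Calls into Net to run the forward and backward passes. The forward pass
     * computes the model's final output and the loss; the backward pass
     * computes the gradients of every layer and parameter.
     */
      loss += net_->ForwardBackward();
    }

    ...

    /*
     * Smooths the loss. Caffe trains with SGD, so we cannot feed all the data
     * into the model at once, and the loss on a subset of the data may differ
     * from the average loss over all samples. Averaging the loss with recent
     * losses, when needed, reduces the oscillation of the reported loss.
     */
    UpdateSmoothedLoss(loss, start_iter, average_loss);


    ...
    // Apply the gradient update. This function is not implemented in the base
    // class `Solver`; each subclass provides its own. The `SGDSolver`
    // implementation is analyzed below.
    ApplyUpdate();

    // increment the iteration count
    ++iter_;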
    ...

  }
}

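The Step() function above has three main parts:

`loss += net_->ForwardBackward();`

This line calls the member function ForwardBackward() through net_, the pointer to the Net class. Its code is shown below: it calls Forward(&loss) and Backward() to run forward and backward propagation.

// net.hpp
// run one forward pass and one backward pass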
Dtype ForwardBackward() {
  Dtype loss;
  Forward(&loss);
  Backward();
  return loss;
}

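Forward(&loss) eventually runs the code below. Net's Forward() function calls the Forward() member function of the Layer class for every layer in the network, and each concrete Layer subclass overrides Forward() to implement its own forward computation. Backward() works like Forward(): it calls each layer's Backward() function to compute that layer's gradients.

//net.cpp
for (int i = start; i <= end; ++i) {
// run the forward pass of each layer and collect its loss; in practice only the last layer has a nonzero loss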
  Dtype layer_loss = layers_[i]->Forward(bottom_vecs_[i], top_vecs_[i]);
  loss += layer_loss;
  if (debug_info_) { ForwardDebugInfo(i); }
}

`UpdateSmoothedLoss();`

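This function smooths the loss. Caffe trains with SGD, so we cannot feed all the data into the model at once, and the loss on a subset of the data may differ from the average loss over all samples. Averaging the current loss with the losses from recent iterations, when needed, reduces the oscillation of the reported loss.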
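The idea of loss smoothing can be sketched as a moving average over the most recent losses. The code below is a simplified illustration and not Caffe's implementation; `smoothed_losses` is a hypothetical helper, and its `average_loss` argument is assumed to play the role of the window length.

```python
from collections import deque

def smoothed_losses(losses, average_loss):
    """Running mean of the last `average_loss` values, reported after every iteration."""
    window = deque(maxlen=average_loss)  # old losses fall out automatically
    out = []
    for loss in losses:
        window.append(loss)
        out.append(sum(window) / len(window))
    return out

smoothed = smoothed_losses([4.0, 2.0, 3.0, 1.0], average_loss=2)
```

With a window of 2, each reported value is the mean of the current and previous loss, so a single noisy minibatch moves the reported loss only half as far.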

`ApplyUpdate();`

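This is a pure virtual function of the Solver class that derived classes must implement. The ApplyUpdate() implemented by SGDSolver, for example, is shown below. Its main steps are: compute the learning rate; normalize the gradients; add the gradient of the regularization term to the gradients obtained by backpropagation; compute the final update according to the SGD algorithm; and finally apply that update to the weights.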

template <typename Dtype>
void SGDSolver<Dtype>::ApplyUpdate() {
  CHECK(Caffe::root_solver());

  // GetLearningRate computes the learning rate of the current iteration from the configured lr_policy
  Dtype rate = GetLearningRate();

  // decide whether to print the current learning rate
  if (this->param_.display() && this->iter_ % this->param_.display() == 0) {
    LOG(INFO) << "Iteration " << this->iter_ << ", lr = " << rate;
  }

  // Avoid exploding gradients: if the L2 norm of the gradients exceeds a threshold, scale them down
  ClipGradients();

  // loop over all learnable network parameters
  for (int param_id = 0; param_id < this->net_->learnable_params().size();
       ++param_id) {
    // Divide the gradient of parameter param_id by iter_size, so that the
    // effective batch size is iter_size * the configured batch_size
    Normalize(param_id);

    // add the gradient of the regularization term to each parameter's gradient
    Regularize(param_id);

    // compute the SGD update value (momentum etc.)
    ComputeUpdateValue(param_id, rate);
  }
  // call `Net::Update` to update all parameters
  this->net_->Update();
}
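The sequence of steps in ApplyUpdate() can be mirrored in a few lines of NumPy. This is a hedged sketch of SGD with momentum and L2 regularization under assumed formulas, not a transcription of Caffe's code; `sgd_update` and its argument names are chosen for this example only.

```python
import numpy as np

def sgd_update(w, grad, history, rate, momentum=0.9, weight_decay=0.0005, iter_size=1):
    """One parameter update following the Normalize / Regularize / ComputeUpdateValue / Update order."""
    grad = grad / iter_size                      # Normalize: average the accumulated gradient
    grad = grad + weight_decay * w               # Regularize: add the L2 penalty gradient
    history = momentum * history + rate * grad   # ComputeUpdateValue: momentum step
    return w - history, history                  # Update: apply the step to the weights

w = np.array([1.0, -2.0])
history = np.zeros_like(w)
w, history = sgd_update(w, np.array([0.5, 0.5]), history,
                        rate=0.1, momentum=0.9, weight_decay=0.0)
```

With zero history and no weight decay, the first step is simply the learning rate times the gradient; on later steps the momentum term carries part of the previous update forward.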

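Once all iterations have run, training is complete. The above is a rough account of network initialization, forward propagation, backward propagation, and the gradient update when training a network with Caffe, with a great many details omitted. Much is left uncovered, such as the registration of Layer subclasses and the forward and backward implementations of individual layers, the registration of Solver subclasses, reading the network definition, saving the model, and more.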

Implementing convolution as a matrix multiplication

2016-10-01T02:00:00.000Z

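Convolution in CNNs

The convolution layer is arguably the most important layer in a CNN; its job is to convolve the input image. As shown below, the input image has dimensions $[c_0,h_0,w_0]$ and the kernel has dimensions $[c_1,c_0,h_k,w_k]$, where $c_0$ is not shown in the figure. A kernel can be viewed as $c_1$ three-dimensional filters of dimensions $[c_0,h_k,w_k]$. Besides these parameters, convolution usually involves some hyperparameters, such as the stride $S$ and the padding $P$.

From these parameters we get output features of dimensions $[c_1,h_1,w_1]$, where $h_1 = (h_0-h_k+2P)/S+1$ and $w_1 = (w_0-w_k+2P)/S+1$.

Computing a convolution is simple but not easy to explain in words, so we use code below.

Basic environment setup:

# The code runs in a Jupyter notebook
%load_ext cython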
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

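The convolution code is shown below. Imagine an image of size MxM and a kernel of size mxm. The kernel is multiplied element-wise with each mxm patch of the image and the products are summed into a single value; the kernel then moves by one stride and the same computation is repeated until the whole input image has been covered. The resulting values form the output feature. See the code for the details.

def conv_forward_naive(x, w, b, conv_param):
  out = None
  stride = conv_param['stride']
  pad = conv_param['pad']
  N, C, H, W = x.shape     # (batch, channels, height, width)
  F, C, HH, WW = w.shape   # (filters, channels, filter height, filter width)
  # Integer division keeps the output sizes usable as array shapes in Python 3.
  H_out = 1 + (H + 2 * pad - HH) // stride
  W_out = 1 + (W + 2 * pad - WW) // stride
  npad = ((0,0), (0,0), (pad,pad), (pad,pad))
  x_pad = np.pad(x, pad_width=npad, mode='constant', constant_values=0)
  out = np.zeros((N, F, H_out, W_out))
  for i in range(N):
      for j in range(F):
          for k in range(H_out):
              for z in range(W_out):
                  # Element-wise product of one input patch with filter j, summed, plus the bias.
                  out[i, j, k, z] = np.sum(x_pad[i, :, k*stride:k*stride+HH, z*stride:z*stride+WW] * w[j, :, :, :]) + b[j]
  cache = (x, w, b, conv_param)
  return out, cache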

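Let us test the convolution code above. We hand-craft two kernels (one extracting grayscale, one extracting edges), convolve two input images with them, and look at the output:

# scipy.misc.imread and imresize have been removed from recent SciPy releases;
# this snippet assumes an older SciPy that still provides them.
from scipy.misc import imread, imresize
kitten, puppy = imread('kitten.jpg'), imread('puppy.jpg')

d = kitten.shape[1] - kitten.shape[0]
kitten_cropped = kitten[:, d//2:-d//2, :]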

img_size = 200   # Make this smaller if it runs too slow
x = np.zeros((2, 3, img_size, img_size))
x[0, :, :, :] = imresize(puppy, (img_size, img_size)).transpose((2, 0, 1))
x[1, :, :, :] = imresize(kitten_cropped, (img_size, img_size)).transpose((2, 0, 1))

# Set up a convolutional weights holding 2 filters, each 3x3
w = np.zeros((2, 3, 3, 3))

# The first filter converts the image to grayscale.
# Set up the red, green, and blue channels of the filter.
w[0, 0, :, :] = [[0, 0, 0], [0, 0.3, 0], [0, 0, 0]]
w[0, 1, :, :] = [[0, 0, 0], [0, 0.6, 0], [0, 0, 0]]
w[0, 2, :, :] = [[0, 0, 0], [0, 0.1, 0], [0, 0, 0]]

# Second filter detects horizontal edges in the blue channel.
w[1, 2, :, :] = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]

# Vector of biases. We don't need any bias for the grayscale
# filter, but for the edge detection filter we want to add 128
# to each output so that nothing is negative.
b = np.array([0, 128])

# Compute the result of convolving each input in x with each filter in w,
# offsetting by b, and storing the results in out.
out, _ = conv_forward_naive(x, w, b, {'stride': 1, 'pad': 1})

def imshow_noax(img, normalize=True):
    """ Tiny helper to show images as uint8 and remove axis labels """
    if normalize:
        img_max, img_min = np.max(img), np.min(img)
        img = 255.0 * (img - img_min) / (img_max - img_min)
    plt.imshow(img.astype('uint8'))
    plt.gca().axis('off')

# Show the original images and the results of the conv operation
plt.subplot(2, 3, 1)
imshow_noax(puppy, normalize=False)
plt.title('Original image')
plt.subplot(2, 3, 2)
imshow_noax(out[0, 0])
plt.title('Grayscale')
plt.subplot(2, 3, 3)
imshow_noax(out[0, 1])
plt.title('Edges')
plt.subplot(2, 3, 4)
imshow_noax(kitten_cropped, normalize=False)
plt.subplot(2, 3, 5)
imshow_noax(out[1, 0])
plt.subplot(2, 3, 6)
imshow_noax(out[1, 1])
plt.show()

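After convolution, the output looks like this:

im2col

Running the code above, we find that convolving even these two images is fairly slow, and a CNN performs a huge number of convolutions, so we need a faster way to compute them. The figure below sketches how Caffe computes convolutions. From the naive implementation above we can see that a convolution takes the dot product of the filter with each local region of the input features. As the figure shows, we can use this property to turn the convolution into the product of two large matrices.

The operation that unrolls each region of the input image to be convolved into a column vector is usually called im2col, illustrated below.

The figure below is a concrete example; once you understand it, the approach above should be clear.

The im2col_cython function below implements im2col in Cython. For details on using Cython from Python, see: Python速度优化-Cython中numpy以及多线程的使用.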

%%cython
import cython
cimport numpy as np
import numpy as np
ctypedef fused DTYPE_t:
    np.float32_t
    np.float64_t

def im2col_cython(np.ndarray[DTYPE_t, ndim=4] x, int field_height,
                  int field_width, int padding, int stride):
    cdef int N = x.shape[0]
    cdef int C = x.shape[1]
    cdef int H = x.shape[2]
    cdef int W = x.shape[3]

    cdef int HH = (H + 2 * padding - field_height) // stride + 1
    cdef int WW = (W + 2 * padding - field_width) // stride + 1

    cdef int p = padding
    cdef np.ndarray[DTYPE_t, ndim=4] x_padded = np.pad(x,
            ((0, 0), (0, 0), (p, p), (p, p)), mode='constant')

    cdef np.ndarray[DTYPE_t, ndim=2] cols = np.zeros(
            (C * field_height * field_width, N * HH * WW),
            dtype=x.dtype)

    # Moving the inner loop to a C function with no bounds checking works, but does
    # not seem to help performance in any measurable way.

    cdef int c, ii, jj, row, yy, xx, i, col

    for c in range(C):
        for yy in range(HH):
            for xx in range(WW):
                for ii in range(field_height):
                    for jj in range(field_width):
                        # Row-major index inside one filter window (matches w.reshape(F, -1)).
                        row = c * field_width * field_height + ii * field_width + jj
                        for i in range(N):
                            col = yy * WW * N + xx * N + i
                            cols[row, col] = x_padded[i, c, stride * yy + ii, stride * xx + jj]
    return cols

Calling the im2col_cython function above to implement the convolution:

def conv_forward_im2col(x, w, b, conv_param):
  """
  A fast implementation of the forward pass for a convolutional layer
  based on im2col and col2im.
  """
  N, C, H, W = x.shape
  num_filters, _, filter_height, filter_width = w.shape
  stride, pad = conv_param['stride'], conv_param['pad']

  # Check dimensions
  assert (W + 2 * pad - filter_width) % stride == 0, 'width does not work'
  assert (H + 2 * pad - filter_height) % stride == 0, 'height does not work'

  # Create output
  out_height = (H + 2 * pad - filter_height) // stride + 1
  out_width = (W + 2 * pad - filter_width) // stride + 1
  out = np.zeros((N, num_filters, out_height, out_width), dtype=x.dtype)

  # x_cols = im2col_indices(x, w.shape[2], w.shape[3], pad, stride)
  x_cols = im2col_cython(x, w.shape[2], w.shape[3], pad, stride)
  res = w.reshape((w.shape[0], -1)).dot(x_cols) + b.reshape(-1, 1)

  out = res.reshape(w.shape[0], out.shape[2], out.shape[3], x.shape[0])
  out = out.transpose(3, 0, 1, 2)

  cache = (x, w, b, conv_param, x_cols)
  return out, cache

Testing the im2col-based convolution: the output images show that it gives the same result as the naive convolution.

out, _ = conv_forward_im2col(x, w, b, {'stride': 1, 'pad': 1})
# Show the original images and the results of the conv operation
plt.subplot(2, 3, 1)
imshow_noax(puppy, normalize=False)
plt.title('Original image')
plt.subplot(2, 3, 2)
imshow_noax(out[0, 0])
plt.title('Grayscale')
plt.subplot(2, 3, 3)
imshow_noax(out[0, 1])
plt.title('Edges')
plt.subplot(2, 3, 4)
imshow_noax(kitten_cropped, normalize=False)
plt.subplot(2, 3, 5)
imshow_noax(out[1, 0])
plt.subplot(2, 3, 6)
imshow_noax(out[1, 1])
plt.show()

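Now let us time the two methods. The naive convolution takes 2.19 s per loop, while the im2col version takes only 28.3 ms, roughly 77 times faster. Part of the gain comes from using Cython, but overall the convolution is much faster.

Although im2col speeds up the computation, it also uses more memory, because converting the input image into columns duplicates many elements.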
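The memory overhead can be estimated from the shape of the column matrix alone. The helper below is a sketch (the name `im2col_shape` is made up here) that reproduces the shape used for `cols` in `im2col_cython` and compares its element count with that of the input used in the test above.

```python
def im2col_shape(n, c, h, w, field_h, field_w, pad, stride):
    """Shape of the im2col matrix: (C * field_h * field_w, N * out_h * out_w)."""
    out_h = (h + 2 * pad - field_h) // stride + 1
    out_w = (w + 2 * pad - field_w) // stride + 1
    return (c * field_h * field_w, n * out_h * out_w)

# Two 3-channel 200x200 images, 3x3 filters, stride 1, pad 1 (the setup used above).
rows, cols = im2col_shape(2, 3, 200, 200, 3, 3, pad=1, stride=1)
ratio = rows * cols / (2 * 3 * 200 * 200)
```

For a 3×3 filter with stride 1, every input pixel is copied into up to nine columns, so the column matrix holds nine times as many elements as the input.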

%timeit conv_forward_naive(x, w, b, {'stride': 1, 'pad': 1})
%timeit conv_forward_im2col(x, w, b, {'stride': 1, 'pad': 1})

1 loop, best of 3: 2.19 s per loop
10 loops, best of 3: 28.3 ms per loop

References

Convolutional Neural Networks (CNNs / ConvNets)

深入理解Caffe源码（卷积实现详细分析）

Ways for Visualizing Convolutional Networks

2016-09-25T02:00:00.000Z

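In recent years convolutional neural networks (CNNs) have been hugely successful at object classification and recognition on massive datasets, but our understanding of why CNNs work so well, and of the features computed by their intermediate layers, lags far behind their application. Most of the time a CNN is a black box to us: we feed in data and labels, train, and get a fit to the results we want.

If we cannot work out why CNNs work so well, building a good CNN model comes down to trial and error. To gain an intuitive understanding of CNNs, much recent work has focused on visualizing them.

Current CNN visualization methods fall into two main groups:

(1) Forward visualization

Directly visualize the feature maps of each layer of a deep convolutional network in the forward pass and observe how their values change. In a successfully trained CNN, the feature maps become increasingly sparse as depth increases.

(2) Backward visualization

Propagate a signal backward to map the low-dimensional feature maps back to the original image space, showing which parts of the original image activate a feature map and hence what the feature map has learned from the image.

The rest of this post is organized around these two approaches.

The Model

Before describing specific visualization methods, let us introduce the model we use. It is a network fine-tuned from CaffeNet to classify 21 classes of optical remote sensing images; for details see CNN在光学遥感图像上的应用.

CaffeNet is essentially the Caffe implementation of AlexNet. To fit our classification task, the output layer is changed to 21 nodes.

The 21 classes of optical remote sensing images are shown below:

Forward Visualization

Feature visualization

Visualizing the features computed by a CNN is the first thing most people think of. The features extracted by the first layer can usually be related to the image, but in higher layers the features become more abstract and harder to interpret.

In the figure below, Input is the input image and Filter shows the parameters learned by the first convolution layer, which when visualized turn out to be edge-extracting filters. Output is the feature extracted by the first convolution layer; the figure shows that the input image yields edge features after passing through it.

Higher-layer filters, however, combine the features coming from earlier layers, and the resulting high-level features are hard to interpret. In the figure below, following the arrows, are the features extracted from the input image above by successively higher convolution layers: as the network gets deeper, the features become more abstract and sparser.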

t-SNE visualization

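To show how the features extracted by a CNN relate to one another, we can reduce them with t-SNE and plot them in a 2-D plane, as shown below. Images that look similar also end up close together in the plane after the reduction. We extract the features of the fc7 layer (also called the CNN code) and reduce them to 2-D vectors with t-SNE.

Occlusion Experiment

In the figure below, the left image is the input; note the black occluded region at the top. We slide the occluder across the input image step by step and record the output probability of the correct class for each occluded image. Naturally, when the occluder covers a key region of the input, the probability of the correct class drops sharply. The figure shows that when key parts of the airplane are covered, the probability that the CNN assigns to airport falls below 0.2. This indicates that the model really has learned the key parts of the object rather than relying only on context. Code for the occlusion experiment: occlusion_experiments.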
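The occlusion procedure itself is only a sliding window. The sketch below is an illustration, not the occlusion_experiments code: `occlusion_map` is a hypothetical helper, and `predict_prob` stands in for a function that returns the CNN's probability for the correct class. A toy scoring function replaces the network so that the example is self-contained.

```python
import numpy as np

def occlusion_map(image, predict_prob, patch=2, stride=2, fill=0.0):
    """Slide an occluding patch over a (H, W) image and record predict_prob each time."""
    h, w = image.shape
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heat = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            occluded = image.copy()
            occluded[r * stride:r * stride + patch,
                     c * stride:c * stride + patch] = fill
            heat[r, c] = predict_prob(occluded)
    return heat

# Toy stand-in for a CNN: the "probability" depends only on the top-left 2x2 block.
def toy_prob(img):
    return float(img[:2, :2].mean())

image = np.ones((4, 4))
heat = occlusion_map(image, toy_prob)
```

Low values in the resulting map mark the regions the model depends on; in the toy case only the top-left position lowers the score.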

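Backward Visualization

The forward visualization methods above are easy to understand, but they still cannot explain what the features extracted by the deep layers of a CNN really are, or which part of the input image they correspond to.

Gradient visualization

Before discussing visualization by differentiating with respect to the image, let us look at a linear classifier:

$$f(x_i)=Wx_i+b$$

$W$ holds the weights of the linear classifier, $x_i$ is an input image, and $b$ is the bias. In the figure below, $W$ has dimensions [3×4], where 3 is the number of classes and 4 means there is one weight scoring each pixel of the image; $x_i$ is an image unrolled into a column vector of dimensions [4×1]; and $b$ has dimensions [3×1]. $Wx_i+b$ is therefore a [3×1] vector of scores of the input image $x_i$ for each class, and the class with the highest score is the prediction. From this analysis, the values of $W$ determine the importance of the corresponding pixels: the more important a pixel is for a class, the larger its weight.

For a CNN, with its many layers of nonlinear functions, $f(x_i)$ is a highly nonlinear classifier. We can nevertheless treat it as a whole and approximate it by a linear classifier:

$$f(x_i) \approx Wx_i+b$$

We can then differentiate this expression at a given input image $x_0$ to obtain the weights $W$, and hence the importance of each pixel of that input image.

$$W = \frac{\partial f(x)}{\partial x}\Big\vert _{x_0}$$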
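For a purely linear classifier this derivative is exactly the weight row of the chosen class, which the sketch below confirms with a central-difference gradient. The weights, bias, and input are made-up numbers, and `numerical_saliency` is a hypothetical helper; with a real CNN the gradient would come from backpropagation rather than finite differences.

```python
import numpy as np

def numerical_saliency(score_fn, x0, cls, eps=1e-5):
    """Central-difference estimate of d score_fn(x)[cls] / dx at x0."""
    grad = np.zeros_like(x0)
    for j in range(x0.size):
        step = np.zeros_like(x0)
        step[j] = eps
        grad[j] = (score_fn(x0 + step)[cls] - score_fn(x0 - step)[cls]) / (2 * eps)
    return grad

# A 3-class linear classifier over 4 "pixels" (illustrative values).
W = np.array([[0.2, -0.5, 0.1, 2.0],
              [1.5, 1.3, 2.1, 0.0],
              [0.0, 0.25, 0.2, -0.3]])
b = np.array([1.1, 3.2, -1.2])
x0 = np.array([56.0, 231.0, 24.0, 2.0])

saliency = numerical_saliency(lambda x: W @ x + b, x0, cls=1)
```

The recovered gradient matches row 1 of `W`, so for this model the saliency of a pixel is simply its weight for the class in question.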

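The resulting image is shown below. It is faint, but on close inspection the outline of the airplane is visible.

Fooling a CNN

We have seen how differentiating with respect to the image gives the importance of each pixel. The same image gradient can be used to fool the CNN. In the figure below, the top-left image is the input, of class airplane. Given a target class, denseresidential, we run gradient ascent on the input image to maximize the output for the target class, adding the gradients to the input image until the CNN classifies it as the target class. In the figure, the two images at the top both look like airplane to the human eye and barely differ, yet the CNN classifies the second as denseresidential. In a sense, we have fooled the CNN.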

Class Model Visualisation

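For a trained CNN model, we can start from a randomly generated noise image and optimize it by gradient ascent on a class of interest to produce an image of that class.

More generally, let $I$ be a randomly generated noise image, $y$ the class of interest, and $s_y(I)$ the score the CNN assigns to image $I$ for class $y$. We want an image $I^*$ that scores highest for class $y$:

$$
I^* = \arg\max_I s_y(I) - R(I)
$$

Here $R$ is a regularization term. The problem can be solved by gradient ascent.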
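Gradient ascent on this objective can be demonstrated with a toy score function. The sketch below assumes a linear score $s_y(I) = w \cdot I$ and an L2 regularizer $R(I) = \lambda \|I\|^2$, both chosen only so that the optimum $w / (2\lambda)$ is known in closed form; it also starts from a zero image rather than noise to keep the example deterministic. A real CNN would supply the gradient of its class score by backpropagation.

```python
import numpy as np

def class_model_ascent(w, lam=0.5, lr=0.1, steps=200):
    """Maximise w . I - lam * ||I||^2 over I by plain gradient ascent."""
    image = np.zeros_like(w)
    for _ in range(steps):
        grad = w - 2.0 * lam * image  # gradient of the objective with respect to I
        image = image + lr * grad     # ascent step
    return image

w = np.array([1.0, -2.0, 0.5])
best = class_model_ascent(w)          # optimum is w / (2 * lam) = w for lam = 0.5
```

The regularizer is what keeps the image bounded: without it, the linear score could be increased forever by scaling the image up.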

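The generated images are shown below. They suggest that the classifier has a degree of rotation invariance and scale invariance with respect to the target.

Feature Inversion

To understand how a CNN learns and represents features, recent papers have also proposed reconstructing the original image from the extracted features. On top of a trained CNN, this can be done by differentiating with respect to the image. Specifically, given an image $I$, let $\phi_\ell(I)$ be the features extracted at layer $\ell$ of the convolutional network $\phi$. We look for an image $I^*$ whose features at layer $\ell$ of $\phi$ match those of $I$:

$$
I^* = \arg\min_{I'} \|\phi_\ell(I) - \phi_\ell(I')\|_2^2 + R(I')
$$

Here $\|\cdot\|_2^2$ is the squared Euclidean distance and $R$ is a regularization term.

The figure below shows reconstructions from features extracted at different layers. The deeper the layer, the more the reconstruction differs from the original image: while extracting features, a CNN also compresses the image down to its most essential characteristics, so reconstructions from later layers increasingly reflect the essence of the original image.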

DeepDream

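In the summer of 2015, Google released a new way of generating images from a neural network. The idea is simple: extract the features at some layer of the network, set the gradient at that layer equal to those features, and backpropagate to the image. The operation is usually applied at a convolution layer, so images of any resolution can be produced.

The process is as follows. We first feed an original image to the CNN:

Then we choose a layer whose features to amplify. If we choose high-level features, backpropagation gives the result below, with some complex patterns;

with low-level features we get lines and textures instead.

If we feed the output back in as input and repeat this a number of times, some patterns are reinforced, and the result looks a little creepy: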

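Deconvolution Visualization

As the name suggests, deconvolution is the opposite of convolution. Visualizing features with deconvolution can be understood as mapping the extracted features back to the input space of the original image. The deconvolution network is shown in the figure below, with the deconvolution network on the left and the convolution network on the right. Deconvolution layers in the former correspond to convolution layers in the latter, and unpooling layers correspond to pooling layers. The convolution network extracts features from the input image, while the deconvolution network maps features back to the input image.

The process is shown in the figure above.

Normal convolution (convnet):

See the black-boxed flow chart on the right of the figure. The pooled feature maps from the previous layer are convolved with this layer's filters to form this layer's convolutional features, which pass through a ReLU nonlinearity to give the rectified feature maps and then through this layer's max-pooling, completing the layer's convolution and pooling; the result is passed to the next layer. During max-pooling, the layer must record the position of the maximum within each pooling region.

Choosing the activation:

To understand a given pooled feature activation, first set all other activations in the feature maps to 0, then use the deconvnet to map the given activation back to the pixel layer.

Deconvolution (deconvnet):

Unpooling

As the name suggests, this reverses pooling. Pooling is not invertible, so unpooling is only an approximation: using the positions recorded during pooling, the incoming features are "placed" back at those positions to approximate the convolutional features before pooling, as in the colored part of the figure.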
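Recording the positions during pooling and reusing them during unpooling can be sketched as follows. This is a single-channel NumPy illustration with non-overlapping windows; the function names and the "switches" array layout are assumptions made for the example.

```python
import numpy as np

def max_pool_with_switches(x, size=2):
    """Non-overlapping max pooling on (H, W); also returns the flat argmax of each window."""
    h, w = x.shape
    oh, ow = h // size, w // size
    pooled = np.zeros((oh, ow))
    switches = np.zeros((oh, ow), dtype=int)
    for r in range(oh):
        for c in range(ow):
            window = x[r * size:(r + 1) * size, c * size:(c + 1) * size]
            switches[r, c] = int(np.argmax(window))   # position of the maximum
            pooled[r, c] = window.flat[switches[r, c]]
    return pooled, switches

def unpool(pooled, switches, size=2):
    """Put each pooled value back at its recorded position; everything else stays 0."""
    oh, ow = pooled.shape
    out = np.zeros((oh * size, ow * size))
    for r in range(oh):
        for c in range(ow):
            dr, dc = divmod(int(switches[r, c]), size)
            out[r * size + dr, c * size + dc] = pooled[r, c]
    return out

x = np.array([[1.0, 3.0, 2.0, 0.0],
              [4.0, 2.0, 1.0, 5.0],
              [0.0, 1.0, 7.0, 2.0],
              [2.0, 6.0, 3.0, 1.0]])
pooled, switches = max_pool_with_switches(x)
restored = unpool(pooled, switches)
```

The restored map has the maxima at their original positions and zeros elsewhere, which is why unpooling is only an approximation of the features before pooling.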

Filtering

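The transposed version of the convolution filters (in practice, the filters flipped horizontally and vertically) is used to compute the feature maps that preceded the convolution, which forms the reconstructed features. The image reconstructed from a single activation resembles a part of the original image.

The deconvolution and unpooling process is shown below:

Summary

CNN visualization shows that the lower convolution layers learn edges, color blobs, and similar information, while the higher layers combine the features extracted by the lower layers into more complex and more invariant features. The backward visualizations are all computed by differentiating with respect to the image: by choosing different objective functions and following the gradient, we obtain the visualization.

References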

Understanding deep image representations by inverting them.

Deep neural networks are easily fooled: High confidence predictions for unrecognizable images.

Deep inside convolutional networks: Visualising image classification models and saliency maps.

Understanding neural networks through deep visualization.

Visualizing and understanding convolutional networks.

Linux Deepin Note

2016-08-30T02:00:00.000Z

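System Backup and Restore

Deepin is a Linux distribution developed by Wuhan Deepin Technology Co., Ltd. It focuses not only on the system and desktop environment but also on the accompanying basic software; Deepin already ships quite a few applications of its own and works with many third-party vendors to release Linux versions of popular applications. The above is from Wikipedia. Overall, Deepin looks great and is beginner-friendly, but it is a little unstable, and I like to tinker, so it sometimes crashes. Backing up the system is therefore well worth doing.

Before backing up, let us review the directory structure of the Linux file system to see which directories need to be backed up and which do not.

Backup

Switch to root

sudo su

Go to the root directory

cd /

Run the archive command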

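tar -cvpzf /media/ldy/6482108A82/backup1.tgz --exclude=/proc --exclude=/lost+found --exclude=/tmp --exclude=/sys --exclude=/media --exclude=/home /

Explanation of the command:

tar: the standard Linux archiving program
cvpzf: options passed to tar

c creates a new archive
v prints information while processing
p preserves permissions
z compresses the archive with gzip; combined with x, it decompresses with gzip
f specifies the archive file to operate on

/media/ldy/6482108A82/backup1.tgz: write the archive to the mounted disk under the name backup1.tgz

exclude=/proc: exclude the /proc directory from the archive; the other exclude options work the same way. See the Linux directory structure above for why these are excluded. /home is excluded because I keep it on a separate partition; when reinstalling the system, choosing not to format the /home partition preserves the data

/: archive everything under the Linux root directory, apart from the excluded paths

Restore (not yet tested)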

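Switch to root

sudo su

Go to the root directory

cd /

Extract the archive to restore the system

tar xvpfz linuxbackup.tgz -C /

When it finishes, do not reboot right away. First recreate by hand the directories that were excluded from the backup, for example:

mkdir proc
mkdir lost+found
mkdir mnt
mkdir sys
mkdir tmp
mkdir media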

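The fsck Command

A couple of days ago my laptop suddenly lost power, which corrupted the /home partition. On boot I got:

Cannot open access to console , the root account is locked.

Solution

Boot from a Deepin installation USB stick. When the language selection screen appears, press ctrl+alt+F1 to get a tty, then run startx to enter live CD mode. Mount the root partition of the hard disk and edit its /etc/fstab, commenting out the entry for the /home partition as shown below. The mount command reads this file at startup to determine the mount options for devices and partitions; with the entry commented out, /home is not mounted at boot.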

# /dev/sda2
UUID=79813e75-eab0-42e4-b77c-daba9a9b7d01	/         	ext4      	rw,relatime,data=ordered	0 1

# /dev/sda6
#UUID=8b23af2a-2fd6-426e-8e63-f791378d8485	/home     	ext4      	rw,relatime,data=ordered	0 2

# /dev/sda5
UUID=730d40c7-946a-478e-bde9-9501ba156103	none      	swap      	defaults  	0 0

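After the change, leave live CD mode and boot the original system. Because the damaged /home partition is not mounted, the system boots, but not into the graphical interface. From the text console, run the command below to repair the damaged /home partition. /dev/sda6 is the device name of the /home partition; device names can be listed with fdisk -l.

sudo fsck -y /dev/sda6

Once the repair succeeds, uncomment the entry in /etc/fstab and reboot.

Mounting a Second Disk at Boot

The /etc/fstab file was introduced above; to mount another disk at boot, just edit this file.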

UUID

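Every partition and device has a unique UUID, generated by the file system creation tools (mkfs.*) when the file system is created.
The lsblk -f command shows the UUID of every device. In /etc/fstab, use the UUID= prefix: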

/etc/fstab
# <file system>                           <dir>         <type>    <options>             <dump> <pass>

tmpfs                                     /tmp          tmpfs     nodev,nosuid          0      0

UUID=24f28fc6-717e-4bcd-a5f7-32b959024e26 /     ext4              defaults,noatime      0      1
UUID=03ec5dd3-45c0-4f95-a363-61ff321a09ff /home ext4              defaults,noatime      0      2
UUID=4209c845-f495-4c43-8a03-5363dd433153 none  swap              defaults              0      0

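Field meanings

<file system>: the partition or storage device to mount.

<dir>: the mount point for <file system>.

<type>: the file system type of the device or partition. Many types are supported: ext2, ext3, ext4, reiserfs, xfs, jfs, smbfs, iso9660, vfat, ntfs, swap and auto. With auto, the mount command guesses the file system type, which is very useful for removable media such as CD-ROMs and DVDs.

<options>: mount options; the default set, defaults, is usually enough.

<dump>: used by the dump utility to decide when to make a backup. dump checks this value to decide whether the file system should be backed up. Allowed values are 0 and 1: 0 means ignore, 1 means back up. Most users do not have dump installed and should set this to 0.

<pass>: read by fsck to determine the order in which file systems are checked. Allowed values are 0, 1 and 2. The root file system should have the highest priority, 1; all other file systems to be checked should use 2. 0 means fsck will not check the device.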
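To make the six fields concrete, here is a small Python sketch that splits fstab lines into named fields and lists the order in which fsck would check them. FstabEntry and parse_fstab are hypothetical helpers written for this illustration, not part of any system tool.

```python
from collections import namedtuple

# One field per fstab column; "passno" stands for the <pass> column
FstabEntry = namedtuple("FstabEntry", "file_system dir type options dump passno")

def parse_fstab(text):
    """Parse fstab-style text into a list of FstabEntry tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        fields += ["0"] * (6 - len(fields))  # <dump> and <pass> default to 0
        fs, mount_dir, fstype, options, dump, passno = fields[:6]
        entries.append(FstabEntry(fs, mount_dir, fstype,
                                  options.split(","), int(dump), int(passno)))
    return entries

sample = """\
# <file system> <dir> <type> <options> <dump> <pass>
UUID=24f28fc6-717e-4bcd-a5f7-32b959024e26 /     ext4 defaults,noatime 0 1
UUID=03ec5dd3-45c0-4f95-a363-61ff321a09ff /home ext4 defaults,noatime 0 2
UUID=4209c845-f495-4c43-8a03-5363dd433153 none  swap defaults         0 0
"""
entries = parse_fstab(sample)

# File systems with pass 0 are never checked; the rest are checked in pass order
check_order = [e.dir for e in sorted(entries, key=lambda e: e.passno) if e.passno > 0]
print(check_order)  # ['/', '/home']
```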

Example

For example, to mount the device /dev/sdb5 automatically at boot, append the following to /etc/fstab.

# /dev/sdb5
UUID=6482108A821062BA /media/ldy/6482108A82 ntfs defaults 0 0

Implementation of Batch Normalization Layer

2016-08-18T02:00:00.000Z

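Data normalization

Before training a neural network, the input data is usually normalized. Why is this needed, and what does it buy us? A neural network is essentially learning the distribution of its data. If the training and test data follow different distributions, the network generalizes much worse. Likewise, if each batch of training data has a different distribution (mini-batch gradient descent), the network has to adapt to a new distribution at every iteration, which slows training considerably. That is why we normalize the data as a preprocessing step. Training a deep network is a complicated process: a small change in the first few layers is amplified as it propagates through the later layers. Whenever the input distribution of some layer changes, that layer has to adapt to the new distribution, so if the distributions keep shifting during training, training slows down.

To see how preprocessing speeds up training, consider the figure above, where the red points are 2-D data points. Each dimension of image data is usually a number between 0 and 255, so the points all fall in the first quadrant. Image data is also strongly correlated: if one pixel has gray value 30 (fairly dark), its neighbor is unlikely to exceed 100, or the image would look like noise. Because of this correlation, the points occupy only a small region of the first quadrant, forming the narrow, elongated distribution shown in the figure.

When a neural network is initialized, the weights W are sampled randomly. A typical neuron computes ReLU(Wx+b) = max(Wx+b, 0), which treats the data differently on the two sides of the hyperplane Wx+b=0: for ReLU, one side is collapsed to zero and the other is left unchanged.

A random Wx+b=0 appears as one of the random dashed lines in the figure. Note that the two green dashed lines are effectively useless; gradient descent may need many iterations before such lines split the data points in a useful way, as the purple dashed line does, and this inevitably slows convergence. And this is only a 2-D illustration where the data occupies one of four quadrants; what about hundreds, thousands or tens of thousands of dimensions? The data also fills only a small part of the first quadrant, so skipping preprocessing wastes a great deal of computation. In addition, a hyperplane that starts outside the data may hit a local optimum as soon as it enters the data, which can lead to overfitting.

If we now subtract the mean from the data, the points are no longer confined to the first quadrant. How much more likely is a random hyperplane to cut through the data? By a factor of 2^n! If we further apply a decorrelating transform such as PCA or ZCA whitening, the distribution is no longer elongated, and the chance that a random hyperplane is useful rises again substantially.

However, computing the eigendecomposition of the covariance matrix costs too much time and memory, so in practice we usually go no further than z-score normalization: subtract each dimension's mean and divide by its standard deviation. This gives the data a similar spread along every dimension, which widens the region the data covers and so makes more random hyperplanes useful.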
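As a minimal sketch of z-score normalization, the snippet below builds hypothetical all-positive, strongly correlated 2-D data (standing in for neighboring pixel values; the numbers are made up for illustration) and standardizes each dimension.

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical data: two correlated "pixel" dimensions, both positive
base = rng.uniform(50, 200, size=(500, 1))
X = np.hstack([base, base + rng.normal(0, 10, size=(500, 1))])

# z-score: subtract the per-dimension mean, divide by the per-dimension std
mean = X.mean(axis=0)
std = X.std(axis=0)
X_z = (X - mean) / std

# After normalization the points straddle the origin instead of
# sitting entirely inside the first quadrant
frac_negative = (X_z < 0).mean()
print(frac_negative)
```

Each dimension of X_z has zero mean and unit standard deviation, but the two dimensions remain correlated; removing that correlation is what PCA or ZCA whitening would add.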

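The batch normalization algorithm

Basic flow of the algorithm:

One option would be to normalize the data after ReLU = max(Wx+b, 0). However, the paper notes that early in training, while the decision boundaries are still changing rapidly, the statistics computed there are unstable, so it settles for normalizing right after Wx+b instead. Because the initial W is sampled from a standard Gaussian and W has far more elements than x, each dimension of Wx+b already has mean close to 0 and variance close to 1, so applying Batch Normalization after Wx+b gives more stable results.

The paper uses a z-score-like normalization: subtract each dimension's mean and divide by its standard deviation. Since training uses stochastic gradient descent, these means and variances can only be computed over the current mini-batch, hence the name Batch Normalization.

After the normalization step, the Google researchers add two more learnable parameters, gamma and beta, so that the normalization does not limit what the layer can represent: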

$$y_i=\gamma \hat{x}_i+ \beta$$

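Note that if gamma equals the standard deviation computed earlier and beta equals the mean, this transform restores the original data. In their model these two parameters are learned iteratively, just like each layer's W and b. Why add two learnable parameters right after normalizing? BN can be seen as a "new operation" added to the original model. This operation will most likely change a layer's original input, but it can also leave it unchanged, which is the "restore the original input" case. Since the model can now either change or keep the input, its capacity increases.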
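The "restore the original input" case can be checked numerically. This is a small sketch on random data (the shapes and eps value are arbitrary choices for illustration): with gamma set to the batch standard deviation and beta to the batch mean, the scale-and-shift step undoes the normalization.

```python
import numpy as np

rng = np.random.RandomState(1)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4))  # a batch of 8 samples, 4 features
eps = 1e-5

# Normalization step of batch normalization
mu = x.mean(axis=0)
var = x.var(axis=0)
x_hat = (x - mu) / np.sqrt(var + eps)

# Choosing gamma = std and beta = mean inverts the normalization
gamma = np.sqrt(var + eps)
beta = mu
y = gamma * x_hat + beta

restored = np.allclose(y, x)
print(restored)
```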

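Implementation

By the chain rule, we can break a complex computation into steps that are each easy to differentiate, and then chain them together to obtain the final derivative; see cs231n.

Forward pass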

import numpy as np

def batchnorm_forward(x, gamma, beta, eps):

  N, D = x.shape

  #step1: calculate mean
  mu = 1./N * np.sum(x, axis = 0)

  #step2: subtract mean vector of every trainings example
  xmu = x - mu

  #step3: following the lower branch - calculation denominator
  sq = xmu ** 2

  #step4: calculate variance
  var = 1./N * np.sum(sq, axis = 0)

  #step5: add eps for numerical stability, then sqrt
  sqrtvar = np.sqrt(var + eps)

  #step6: invert sqrtvar
  ivar = 1./sqrtvar

  #step7: execute normalization
  xhat = xmu * ivar

  #step8: now the two transformation steps
  gammax = gamma * xhat

  #step9
  out = gammax + beta

  #store intermediate
  cache = (xhat,gamma,xmu,ivar,sqrtvar,var,eps)

  return out, cache

Backward pass

def batchnorm_backward(dout, cache):

  #unfold the variables stored in cache
  xhat,gamma,xmu,ivar,sqrtvar,var,eps = cache

  #get the dimensions of the input/output
  N,D = dout.shape

  #step9
  dbeta = np.sum(dout, axis=0)
  dgammax = dout #not necessary, but more understandable

  #step8
  dgamma = np.sum(dgammax*xhat, axis=0)
  dxhat = dgammax * gamma

  #step7
  divar = np.sum(dxhat*xmu, axis=0)
  dxmu1 = dxhat * ivar

  #step6
  dsqrtvar = -1. /(sqrtvar**2) * divar

  #step5
  dvar = 0.5 * 1. /np.sqrt(var+eps) * dsqrtvar

  #step4
  dsq = 1. /N * np.ones((N,D)) * dvar

  #step3
  dxmu2 = 2 * xmu * dsq

  #step2
  dx1 = (dxmu1 + dxmu2)
  dmu = -1 * np.sum(dxmu1+dxmu2, axis=0)

  #step1
  dx2 = 1. /N * np.ones((N,D)) * dmu

  #step0
  dx = dx1 + dx2

  return dx, dgamma, dbeta

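Batch normalization for convolutional layers

One point deserves attention: for layers with weight sharing, such as convolutional layers, the mean and variance of Wx+b are computed over the entire feature map. For an activation of size batch_size * channel * height * width, one mean and one standard deviation are computed over all batch_size*height*width values of each channel, giving channel sets of parameters in total.

In other words, each channel is treated as one feature whose samples are all of its batch_size*height*width values, after which the fully connected batch normalization routine can be called directly.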

def spatial_batchnorm_forward(x, gamma, beta, bn_param):
  """
  Computes the forward pass for spatial batch normalization.

  Inputs:
  - x: Input data of shape (N, C, H, W)
  - gamma: Scale parameter, of shape (C,)
  - beta: Shift parameter, of shape (C,)
  - bn_param: Dictionary with the following keys:
    - mode: 'train' or 'test'; required
    - eps: Constant for numeric stability
    - momentum: Constant for running mean / variance. momentum=0 means that
      old information is discarded completely at every time step, while
      momentum=1 means that new information is never incorporated. The
      default of momentum=0.9 should work well in most situations.
    - running_mean: Array of shape (D,) giving running mean of features
    - running_var Array of shape (D,) giving running variance of features

  Returns a tuple of:
  - out: Output data, of shape (N, C, H, W)
  - cache: Values needed for the backward pass
  """
  out, cache = None, None
  N, C, H, W = x.shape
  x_flat = x.transpose(0, 2, 3, 1).reshape(-1, C)
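  # batchnorm_forward as defined above takes eps directly, so read it from bn_param
  # (the 1e-5 fallback is an assumed default)
  out_flat, cache = batchnorm_forward(x_flat, gamma, beta, bn_param.get('eps', 1e-5))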
  out = out_flat.reshape(N, H, W, C).transpose(0, 3, 1, 2)

  return out, cache


def spatial_batchnorm_backward(dout, cache):
  """
  Computes the backward pass for spatial batch normalization.

  Inputs:
  - dout: Upstream derivatives, of shape (N, C, H, W)
  - cache: Values from the forward pass

  Returns a tuple of:
  - dx: Gradient with respect to inputs, of shape (N, C, H, W)
  - dgamma: Gradient with respect to scale parameter, of shape (C,)
  - dbeta: Gradient with respect to shift parameter, of shape (C,)
  """
  dx, dgamma, dbeta = None, None, None                                   
  N, C, H, W = dout.shape
  dout_flat = dout.transpose(0, 2, 3, 1).reshape(-1, C)
  dx_flat, dgamma, dbeta = batchnorm_backward(dout_flat, cache)
  dx = dx_flat.reshape(N, H, W, C).transpose(0, 3, 1, 2)                        
  return dx, dgamma, dbeta
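The transpose-and-reshape step used by both spatial functions above can be checked in isolation. The sketch below, on random data with arbitrary small shapes, confirms that flattening to (N*H*W, C) yields exactly the per-channel statistics, and that the inverse reshape recovers the original layout.

```python
import numpy as np

rng = np.random.RandomState(2)
N, C, H, W = 4, 3, 5, 5
x = rng.normal(size=(N, C, H, W))

# Move channels last, then flatten: each row is one pixel, each column one channel
x_flat = x.transpose(0, 2, 3, 1).reshape(-1, C)

# Column means of the flat view equal the per-channel means of the 4-D tensor
mean_flat = x_flat.mean(axis=0)
mean_direct = x.mean(axis=(0, 2, 3))

# Undo the flattening, as the forward and backward passes do for their outputs
x_back = x_flat.reshape(N, H, W, C).transpose(0, 3, 1, 2)

print(x_flat.shape)
```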

References

Batch Normalization study notes

"Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift": reading notes and implementation

Why does Batch Normalization work so well in deep learning? (answer by 魏秀参)

Understanding the backward pass through Batch Normalization Layer

Shopping Reviews sentiment analysis

2016-07-20T02:00:00.000Z

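Sentiment analysis is a common application of natural language processing (NLP), in particular of classification methods that aim to extract the emotional content of text. In this sense, sentiment analysis can be seen as a way of quantifying qualitative data with sentiment scores. Although sentiment is largely subjective, quantitative sentiment analysis has many practical uses, such as companies analyzing consumer feedback on products or detecting negative comments in online reviews.

The simplest approach scores words by polarity. Every word in a sentence gets a score: +1 for positive words, -1 for negative ones. Summing the scores of all words gives the sentence's overall sentiment score. This approach clearly has many limitations, the most important being that it ignores context. For example, in this simple model "not" scores -1 and "good" scores +1, so the phrase "not good" is classified as neutral, even though "not good" is usually negative.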
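The word-polarity approach and its "not good" failure can be reproduced in a few lines. LEXICON is a made-up toy dictionary for this illustration, not a real sentiment lexicon.

```python
# Hypothetical toy lexicon: +1 for positive words, -1 for negative words
LEXICON = {"good": 1, "great": 1, "bad": -1, "not": -1}

def lexicon_score(sentence):
    """Sum the polarity of every word; unknown words count as 0."""
    return sum(LEXICON.get(word, 0) for word in sentence.lower().split())

print(lexicon_score("not good"))        # 0: scored as neutral despite being negative
print(lexicon_score("good and great"))  # 2
```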

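Another common approach treats text as a "bag of words". Each text is represented as a 1xN vector, where N is the size of the vocabulary. Each column corresponds to one word, and its value is the number of times that word occurs. For example, the phrase "bag of bag of words" is encoded as [2, 2, 1]. Such vectors can be fed to a machine learning classifier (such as logistic regression or a support vector machine) to predict the sentiment of unseen data. Note that this supervised approach needs data with known sentiment labels as a training set. Although it improves on the previous model, it still ignores context, and its vectors grow with the size of the vocabulary.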
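The bag-of-words encoding of the example phrase can be sketched with collections.Counter. The function name bag_of_words and the fixed vocabulary order are choices made for this illustration.

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.split())
    return [counts[word] for word in vocabulary]

vocabulary = ["bag", "of", "words"]
vector = bag_of_words("bag of bag of words", vocabulary)
print(vector)  # [2, 2, 1]
```

Word order is lost: "words of bag of bag" would produce the same vector, which is the context problem described above.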

Word2Vec and Doc2Vec

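Google developed a method called Word2Vec that captures context while reducing the size of the representation. Word2Vec actually covers two different methods: Continuous Bag of Words (CBOW) and Skip-gram. CBOW predicts the current word from its context; Skip-gram does the opposite and predicts the context from the current word. Both use a neural network as the classifier. Each word starts as a random N-dimensional vector; after training with CBOW or Skip-gram, the algorithm has learned a vector for each word.

In the figure above, $w(t)$ is the current word, and $w(t-2)$, $w(t-1)$, etc. are the context words.

These word vectors now capture contextual information. We can use simple vector arithmetic to uncover relationships between words (for example, "king" - "man" + "woman" ≈ "queen"). The word vectors can replace the bag of words as features for predicting the sentiment of unseen data. The advantage of this model is that it both accounts for context and compresses the representation (a word vector typically has around 300 dimensions, rather than the roughly 100,000 dimensions of a bag-of-words vector). Because the neural network extracts these features for us, very little manual work is needed.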

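Sentiment classification with SVM and Word2Vec

The training data is a corpus of more than 20,000 labeled Chinese reviews, collected and shared by 苏剑林, covering six domains.

We sample from the positive and negative sets to build training and test sets in an 8:2 ratio. We then train a Word2Vec model on the training set; the classifier's input for each review is the mean of the vectors of all its words. Word2Vec comes from the gensim library and the SVM classifier from sklearn, both in Python.

Load the files and tokenize

import numpy as np
import pandas as pd
import jieba
# sklearn.cross_validation became sklearn.model_selection in scikit-learn 0.18
from sklearn.cross_validation import train_test_split

# Load the files and tokenize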
def loadfile():
    neg=pd.read_excel('data/neg.xls',header=None,index=None)
    pos=pd.read_excel('data/pos.xls',header=None,index=None)

    cw = lambda x: list(jieba.cut(x))
    pos['words'] = pos[0].apply(cw)
    neg['words'] = neg[0].apply(cw)

    #print pos['words']
    #use 1 for positive sentiment, 0 for negative
    y = np.concatenate((np.ones(len(pos)), np.zeros(len(neg))))

    x_train, x_test, y_train, y_test = train_test_split(np.concatenate((pos['words'], neg['words'])), y, test_size=0.2)

    np.save('svm_data/y_train.npy',y_train)
    np.save('svm_data/y_test.npy',y_test)
    return x_train,x_test

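Compute the word vectors, and average the vectors of all words in each review to form that review's input

# Average the vectors of all words in a sentence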
def buildWordVector(text, size,imdb_w2v):
    vec = np.zeros(size).reshape((1, size))
    count = 0.
    for word in text:
        try:
            vec += imdb_w2v[word].reshape((1, size))
            count += 1.
        except KeyError:
            continue
    if count != 0:
        vec /= count
    return vec

# Compute the word vectors
def get_train_vecs(x_train,x_test):
    n_dim = 300
    #Initialize model and build vocab
    imdb_w2v = Word2Vec(size=n_dim, min_count=10)
    imdb_w2v.build_vocab(x_train)

    #Train the model over train_reviews (this may take several minutes)
    imdb_w2v.train(x_train)

    train_vecs = np.concatenate([buildWordVector(z, n_dim,imdb_w2v) for z in x_train])
    #train_vecs = scale(train_vecs)

    np.save('svm_data/train_vecs.npy',train_vecs)
    print train_vecs.shape
    #Train word2vec on test tweets
    imdb_w2v.train(x_test)
    imdb_w2v.save('svm_data/w2v_model/w2v_model.pkl')
    #Build test tweet vectors then scale
    test_vecs = np.concatenate([buildWordVector(z, n_dim,imdb_w2v) for z in x_test])
    #test_vecs = scale(test_vecs)
    np.save('svm_data/test_vecs.npy',test_vecs)
    print test_vecs.shape

Train the SVM model

# Train the SVM model
def svm_train(train_vecs,y_train,test_vecs,y_test):
    clf=SVC(kernel='rbf',verbose=True)
    clf.fit(train_vecs,y_train)
    joblib.dump(clf, 'svm_data/svm_model/model.pkl')
    print clf.score(test_vecs,y_test)

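With no feature engineering and minimal text preprocessing, the simple SVM classifier built with Scikit-Learn reaches about 80% accuracy. Interestingly, removing punctuation affects accuracy, which suggests that the Word2Vec model picks up information carried by symbols in the text. Handling individual words, training longer, doing more preprocessing and tuning the model's parameters can all improve accuracy. One drawback of the SVM approach is that averaging the word vectors of a sentence discards word order.

Sentiment classification with LSTM and Word2Vec

Human thinking does not start from scratch at every moment. When you read an article, you understand the current word based on your understanding of the previous ones; thoughts persist for a while. A traditional neural network cannot model this memory. For example, if you want to classify what is happening at each point in a movie, a traditional network has no clear way to use earlier scenes to inform the current one. Recurrent neural networks (RNNs) address this problem.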

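Unlike a traditional neural network, an RNN lets us operate on sequences of vectors: sequences in the input, in the output, or in the most general case both. In the figure below, each rectangle is a vector and each arrow is a function (such as a matrix multiplication). Input vectors are red, output vectors are blue, and green rectangles hold the RNN's state (more on this below). From left to right: (1) the vanilla model without an RNN, mapping a fixed-size input to a fixed-size output (e.g. image classification). (2) Sequence output (e.g. image captioning: an image in, a sequence of words out). (3) Sequence input (e.g. sentiment analysis: a piece of text is classified as positive or negative). (4) Sequence input and sequence output (e.g. machine translation: an RNN reads an English sentence and outputs it in French). (5) Synchronized sequence input and output (e.g. video classification, labeling every frame). Note that none of these cases places a fixed constraint on sequence length, because the recurrent transformation (green) is fixed and can be applied as many times as needed.

A plain RNN struggles to capture long-term dependencies because gradients explode or vanish exponentially as they propagate through the recurrence (the vanishing gradient problem); LSTM units handle this problem well.

LSTM stands for Long Short Term Memory networks. Its only difference from a traditional RNN lies in how each neuron is built. In a traditional RNN each neuron is much like a perceptron in an ordinary neural network, whereas in an LSTM each neuron is a "memory cell" containing an input gate, a forget gate and an output gate.

The point of this design is that the LSTM maintains two threads: a visible one, the data flow at the current time step (input from other cells and from the data), and a hidden one, the cell's own memory. The two interact and intertwine, like the two wicks in the lamp before the Buddha. A typical workflow: the input gate uses the current data flow to control how it interacts with the cell's memory; the forget gate then updates the cell's memory and data flow; finally the output gate produces output from the updated memory and data flow. The forget gate is one of the keys to the LSTM: it controls how gradients behave as they pass through it during training (thereby mitigating the RNN's vanishing/exploding gradient problem) while preserving long-term memory.

Implementation steps:

Load the training files and tokenize

# Load the training files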
def loadfile():
    neg=pd.read_excel('data/neg.xls',header=None,index=None)
    pos=pd.read_excel('data/pos.xls',header=None,index=None)

    combined=np.concatenate((pos[0], neg[0]))
    y = np.concatenate((np.ones(len(pos),dtype=int), np.zeros(len(neg),dtype=int)))

    return combined,y

# Tokenize the sentences and strip newline characters
def tokenizer(text):
    ''' Simple parser: remove the line breaks from each document,
        then segment it into words with jieba
    '''
    text = [jieba.lcut(document.replace('\n', '')) for document in text]
    return text

Build the word dictionary and return each word's index, its word vector, and the word indices of every sentence

def create_dictionaries(model=None,
                        combined=None):
    ''' Function performs a number of jobs:
        1- Creates a word to index mapping
        2- Creates a word to vector mapping
        3- Transforms the Training and Testing Dictionaries

    '''
    if (combined is not None) and (model is not None):
        gensim_dict = Dictionary()
        gensim_dict.doc2bow(model.vocab.keys(),
                            allow_update=True)
        w2indx = {v: k+1 for k, v in gensim_dict.items()}  # indices of all words that occur at least 10 times
        w2vec = {word: model[word] for word in w2indx.keys()}  # word vectors of all words that occur at least 10 times

        def parse_dataset(combined):
            ''' Words become integers
            '''
            data=[]
            for sentence in combined:
                new_txt = []
                for word in sentence:
                    try:
                        new_txt.append(w2indx[word])
                    except KeyError:  # word not in the vocabulary
                        new_txt.append(0)
                data.append(new_txt)
            return data
        combined=parse_dataset(combined)
        combined= sequence.pad_sequences(combined, maxlen=maxlen)  # word indices of every sentence; words occurring fewer than 10 times get index 0
        return w2indx, w2vec,combined
    else:
        print 'No data provided...'


# Train the Word2Vec model, then return each word's index, its word vector, and the word indices of every sentence
def word2vec_train(combined):

    model = Word2Vec(size=vocab_dim,
                     min_count=n_exposures,
                     window=window_size,
                     workers=cpu_count,
                     iter=n_iterations)
    model.build_vocab(combined)
    model.train(combined)
    model.save('lstm_data/Word2vec_model.pkl')
    index_dict, word_vectors,combined = create_dictionaries(model=model,combined=combined)
    return   index_dict, word_vectors,combined

Train the network and save the model; the LSTM is implemented with the Python keras library

def get_data(index_dict,word_vectors,combined,y):

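    n_symbols = len(index_dict) + 1  # number of word indices; words occurring fewer than 10 times share index 0, hence +1
    embedding_weights = np.zeros((n_symbols, vocab_dim))  # the word at index 0 gets an all-zero vector
    for word, index in index_dict.items():  # starting from index 1, map every word to its vector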
        embedding_weights[index, :] = word_vectors[word]
    x_train, x_test, y_train, y_test = train_test_split(combined, y, test_size=0.2)
    print x_train.shape,y_train.shape
    return n_symbols,embedding_weights,x_train,y_train,x_test,y_test


# Define the network architecture
def train_lstm(n_symbols,embedding_weights,x_train,y_train,x_test,y_test):
    print 'Defining a Simple Keras Model...'
    model = Sequential()  # or Graph or whatever
    model.add(Embedding(output_dim=vocab_dim,
                        input_dim=n_symbols,
                        mask_zero=True,
                        weights=[embedding_weights],
                        input_length=input_length))  # Adding Input Length
    model.add(LSTM(output_dim=50, activation='sigmoid', inner_activation='hard_sigmoid'))
    model.add(Dropout(0.5))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))

    print 'Compiling the Model...'
    model.compile(loss='binary_crossentropy',
                  optimizer='adam',metrics=['accuracy'])

    print "Train..."
    model.fit(x_train, y_train, batch_size=batch_size, nb_epoch=n_epoch,verbose=1, validation_data=(x_test, y_test),show_accuracy=True)

    print "Evaluate..."
    score = model.evaluate(x_test, y_test,
                                batch_size=batch_size)

    yaml_string = model.to_yaml()
    with open('lstm_data/lstm.yml', 'w') as outfile:
        outfile.write( yaml.dump(yaml_string, default_flow_style=True) )
    model.save_weights('lstm_data/lstm.h5')
    print 'Test score:', score


# Train the model and save it
def train():
    print 'Loading Data...'
    combined,y=loadfile()
    print len(combined),len(y)
    print 'Tokenising...'
    combined = tokenizer(combined)
    print 'Training a Word2vec model...'
    index_dict, word_vectors,combined=word2vec_train(combined)
    print 'Setting up Arrays for Keras Embedding Layer...'
    n_symbols,embedding_weights,x_train,y_train,x_test,y_test=get_data(index_dict, word_vectors,combined,y)
    print x_train.shape,y_train.shape
    train_lstm(n_symbols,embedding_weights,x_train,y_train,x_test,y_test)

Results

The LSTM network reaches 92% accuracy on the test set, a clear improvement over the SVM classifier.

Code

https://github.com/BUPTLdy/Sentiment-Analysis

References

http://www.15yan.com/story/huxAyyeuYAj/

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Solve Linear Classifier by SGD

2016-06-20T02:00:00.000Z

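Linear classifier

The basic form of a linear classifier is:
$$f(x_i,W,b)=Wx_i+b \tag{1}$$
In the formula above, for image classification, $x_i$ is an image flattened into a column vector of shape [D,1], the matrix W has shape [K,D], and the vector b has shape [K,1]. W is usually called the weights and b the bias.

Load the data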

import numpy as np
from sklearn.datasets import load_digits
from sklearn.cross_validation import train_test_split
import matplotlib.pyplot as plt

# This is a bit of magic to make matplotlib figures appear inline in the
# notebook rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

digits=load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.2)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.1)
print X_train.shape, y_train.shape
print X_val.shape, y_val.shape
print X_test.shape, y_test.shape

(1293, 64) (1293,)
(144, 64) (144,)
(360, 64) (360,)

classes = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
num_classes = len(classes)
samples_per_class = 3
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8').reshape(8,8))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()

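For example, in the handwritten digit dataset we classify images of the ten digits [0-9], each 8×8 pixels, as shown above. The goal is to learn the parameters W and b through training. The input data $(x_i,y_i)$ ($x_i$ holds the pixel values of an image and $y_i$ its class label) is given and fixed, so our task is to adjust W and b so that formula (1) maps each input $x_i$ to the correct $y_i$.

Bias trick: formula (1) has two parameters, W and b. A small trick folds them into a single matrix: append one extra dimension with value 1 to $x_i$, and formula (1) becomes:
$$f(x_i,W)=Wx_i$$
The idea is illustrated in the figure below: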

X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])

print X_train.shape, X_val.shape, X_test.shape

(1293, 65) (144, 65) (360, 65)
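The bias trick can be verified numerically. The sketch below uses random values with the column-vector shapes from formula (1); the names W_ext and x_ext are introduced only for this illustration.

```python
import numpy as np

rng = np.random.RandomState(3)
K, D = 10, 64                      # 10 classes, 64 pixels per image
W = rng.normal(size=(K, D))
b = rng.normal(size=(K, 1))
x = rng.normal(size=(D, 1))

# Fold b into W as an extra column, and append a constant 1 to x
W_ext = np.hstack([W, b])          # shape (K, D+1)
x_ext = np.vstack([x, [[1.0]]])    # shape (D+1, 1)

scores_original = W.dot(x) + b
scores_trick = W_ext.dot(x_ext)
print(np.allclose(scores_original, scores_trick))
```

The notebook code above applies the same idea row-wise: it appends a column of ones to X_train, X_val and X_test, which is why their second dimension grows from 64 to 65.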

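Loss function

As described above, $f(x_i,W)=Wx_i$ scores the input image for every class. Initially, though, the classifier's scores may be far from the true class, so we need a function that measures the gap between the desired scores and the scores the classifier computes. This function is called the loss function.

Suppose we feed in the pixel values $x_i$ of an image whose true class is $y_i$, and the classifier computes the class scores $f(x_i,W)$. Let $s_j=f(x_i,W)_ j$ denote the score for class j, i.e. how strongly the classifier believes $x_i$ belongs to class j. The loss can then be defined as:

$$L_i=\sum _{j\neq y_i}\max(0,s_j-s_{y_i}+\Delta) \tag{2}$$

Say there are three classes with scores [13,-7,11], the first class is the correct one ($y_i=0$), and $\Delta=10$. The formula above gives:
$$L_i=\max(0,-7-13+10)+\max(0,11-13+10)$$

The first max evaluates to 0: the gap between the first class's score 13 and the second class's score -7 is 20, which already exceeds the margin of 10, so no penalty is needed and this term contributes 0. The gap between the first and third classes is 2, less than the margin of 10, so this term contributes 8, and $L_i=8$. As the example shows, the loss function measures how dissatisfied we are with a prediction. As the figure below illustrates, if the true class's score exceeds every wrong class's score by more than the chosen threshold, the loss is 0.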
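The worked example can be reproduced with a few lines of NumPy. hinge_loss_single is a hypothetical helper for a single sample, written only to mirror formula (2).

```python
import numpy as np

def hinge_loss_single(scores, correct_class, delta):
    """Multiclass hinge loss for one sample, following formula (2)."""
    margins = np.maximum(0, scores - scores[correct_class] + delta)
    margins[correct_class] = 0  # the sum runs over j != y_i only
    return margins.sum()

scores = np.array([13.0, -7.0, 11.0])
loss = hinge_loss_single(scores, correct_class=0, delta=10.0)
print(loss)  # 8.0
```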

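This loss is called the hinge loss. Since $s_j=w_j^Tx_i$, where $w_j$ is the $j$-th row of W written as a column vector, formula (2) can be written as:

$$L_i=\sum _ {j\neq y_i}max(0,w_j^Tx_i-w_{y_i}^Tx_i+\Delta) \tag{3}$$

Regularization

The loss above constrains the gap between predicted and true scores. We also need a term that constrains the magnitude of the weight matrix W. The L2 regularizer below penalizes large weights:

$$R(W)=\sum _ k \sum _ lW _ {k,l}^2$$

The total loss over the whole dataset is then:

$$L= \frac {1}{N} \sum _ i \sum _ {j \neq y_i} [max(0, f(x_i;W) _ j -f(x_i;W) _ { y_i} +\Delta)]+\lambda\sum _ k \sum _ lW _ {k,l}^2 $$

Gradient descent

Differentiating formula (3) with respect to $w_{y_i}$ gives: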

$$\nabla _ {W _ {y _ i}}L_i =- (\sum _ {j \neq y_i}1(w_j^Tx_i-w _ {y_i}^Tx_i + \Delta >0))x_i$$

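Here 1 is the indicator function, equal to 1 when the condition in parentheses holds and 0 otherwise. So the gradient with respect to the correct class's weights is $-x_i$ multiplied by the number of wrong classes whose scores violate the margin.

For the other rows, $j\neq y_i$, the gradient is as follows: if the score of that row's class is within the margin of the correct class's score, the gradient for that row is $x_i$, and otherwise it is 0.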

$$\nabla _ {W _ {j}}L_i =1(w_j^Tx_i-w _ {y_i}^Tx_i + \Delta >0)x_i$$

The SVM hinge loss and its gradient are computed as follows:

def svm_loss_vectorized(W, X, y, reg):

  loss = 0.0
  dW = np.zeros(W.shape) # initialize the gradient as zero

  N=X.shape[0]
  D=X.shape[1]

  scores = X.dot(W)
  correct_scores = scores[np.arange(N),y]
  margin = np.maximum(np.zeros(scores.shape),scores+1-correct_scores.reshape(N,-1))
  margin[np.arange(N),y] = 0
  loss = np.sum(margin)
  loss /= N
  loss += 0.5 * reg * np.sum(W * W)

  binary = margin
  binary[margin>0] = 1
  row_sum = np.sum(binary, axis=1)
  binary[np.arange(N), y] = -row_sum[np.arange(N)]
  dW = X.T.dot(binary)
  dW /= N
  dW += reg * W

  return loss, dW

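Softmax classifier

Compared with the SVM classifier, the Softmax classifier adds a step that computes probabilities: the SVM outputs the class with the highest score, while Softmax converts all scores into per-class probabilities:
$$P(y_i|x_i;W)=\frac{e^{f _ {y_i}}}{\sum_je^{f_j}}$$
where $f_j$ is the classifier's score for class $j$.

The Softmax classifier uses the cross-entropy loss shown below, which is simply the negative log of the probability of the correct class.
$$L_i = -\log\left(\frac{e^{f_{y_i}}}{ \sum_j e^{f_j} }\right) \hspace{0.5in} \text{or equivalently} \hspace{0.5in} L_i = -f_{y_i} + \log\sum_j e^{f_j}$$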
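The two forms of the cross-entropy loss can be checked against each other numerically. In this sketch the scores are reused from the hinge loss example, and the maximum score is subtracted before exponentiating; that shift is a standard numerical-stability step added here, not something the formulas above require, and it leaves both results unchanged.

```python
import numpy as np

def softmax_cross_entropy(f, y):
    """Second form of the loss: -f_y + log(sum_j exp(f_j)), computed stably."""
    f = f - np.max(f)  # shifting all scores by a constant does not change the loss
    return -f[y] + np.log(np.sum(np.exp(f)))

f = np.array([13.0, -7.0, 11.0])  # class scores
y = 0                             # index of the correct class

# First form: negative log of the softmax probability of the correct class
p = np.exp(f - f.max()) / np.sum(np.exp(f - f.max()))
loss_from_prob = -np.log(p[y])

loss_from_scores = softmax_cross_entropy(f, y)
print(np.allclose(loss_from_prob, loss_from_scores))
```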

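The figure below shows how the Softmax and SVM classifiers relate and differ:

Gradient of the cross-entropy loss

For $w_{y_i}$:
$$\nabla _ {W _ {y _ i}}L_i =-x_i+p _ {y_i}x_i$$

For $w_j(j \neq y_i)$:
$$\nabla _ {W _ {j}}L_i =p _ {j}x_i$$
where $p_{j}$ is the probability the Softmax classifier assigns to class $j$.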
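A numerical gradient check is one way to confirm these two formulas. The sketch below is written for this illustration (single sample, W of shape (K, D) as in formula (1), arbitrary random values) and compares the analytic gradient with central finite differences.

```python
import numpy as np

def loss_and_grad(W, x, y):
    """Cross-entropy loss for one sample and its gradient with respect to W."""
    f = W.dot(x)
    f = f - f.max()                      # numerical stability
    p = np.exp(f) / np.sum(np.exp(f))
    loss = -np.log(p[y])
    dscores = p.copy()
    dscores[y] -= 1.0                    # p_j for j != y_i, p_{y_i} - 1 for the correct class
    return loss, np.outer(dscores, x)    # row j of the gradient is dscores[j] * x

rng = np.random.RandomState(4)
W = rng.normal(scale=0.1, size=(3, 5))
x = rng.normal(size=5)
y = 1

_, analytic = loss_and_grad(W, x, y)

# Central finite differences, one weight at a time
numeric = np.zeros_like(W)
h = 1e-5
for idx in np.ndindex(*W.shape):
    W_plus, W_minus = W.copy(), W.copy()
    W_plus[idx] += h
    W_minus[idx] -= h
    numeric[idx] = (loss_and_grad(W_plus, x, y)[0]
                    - loss_and_grad(W_minus, x, y)[0]) / (2 * h)

max_diff = np.max(np.abs(analytic - numeric))
print(max_diff < 1e-6)
```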

The Softmax cross-entropy loss and its gradient are computed as follows:

def softmax_loss_vectorized(W, X, y, reg):
  """
  Softmax loss function, vectorized version.

  Inputs and outputs are the same as softmax_loss_naive.
  """
  # Initialize the loss and gradient to zero.
  loss = 0.0
  dW = np.zeros_like(W)

  N, D = X.shape


  scores = X.dot(W) #(N,C)


  p = np.exp(scores.T)/np.sum(np.exp(scores.T),axis=0)

  p=p.T

  loss = -np.sum(np.log(p[np.arange(N), y]))

  p[np.arange(N), y] = p[np.arange(N), y]-1

  dW = X.T.dot(p)


  loss /=N
  dW /=N
  loss += 0.5 * reg * np.sum(W * W)
  dW += reg * W

  return loss, dW

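Stochastic gradient descent

On a large dataset, computing the loss over all the data for a single parameter update is wasteful. A common practice is to compute the gradient on a batch of training data and update with that. This works under the assumption that the training examples are all related, so the gradient of each batch approximates the gradient over the full dataset.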

class LinearClassifier(object):

  def __init__(self):
    self.W = None

  def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
            batch_size=200, verbose=False):

    num_train, dim = X.shape
    num_classes = np.max(y) + 1 # assume y takes values 0...K-1 where K is number of classes
    if self.W is None:
      # lazily initialize W
      self.W = 0.001 * np.random.randn(dim, num_classes)

    # Run stochastic gradient descent to optimize W
    loss_history = []
    for it in xrange(num_iters):
      X_batch = None
      y_batch = None

      index=np.random.choice(num_train,batch_size,replace=False)

      X_batch=X[index]
      y_batch = y[index]

      loss, grad = self.loss(X_batch, y_batch, reg)
      loss_history.append(loss)

      self.W -= learning_rate*grad

      if verbose and it % 100 == 0:
        print 'iteration %d / %d: loss %f' % (it, num_iters, loss)

    return loss_history

  def predict(self, X):
    y_pred = np.zeros(X.shape[0])  # one prediction per sample (row of X)
    scores = X.dot(self.W)

    y_pred = np.argmax(scores,axis=1)
    return y_pred

  def loss(self, X_batch, y_batch, reg):
    pass


class LinearSVM(LinearClassifier):

  def loss(self, X_batch, y_batch, reg):
    return svm_loss_vectorized(self.W, X_batch, y_batch, reg)

class Softmax(LinearClassifier):
  """ A subclass that uses the Softmax + Cross-entropy loss function """

  def loss(self, X_batch, y_batch, reg):
    return softmax_loss_vectorized(self.W, X_batch, y_batch, reg)

Train the SVM:

svm = LinearSVM()
svm.train(X_train, y_train, learning_rate=1e-3, reg=1e0, num_iters=400,verbose=True)
y_train_pred = svm.predict(X_train)
acc_train = np.mean(y_train == y_train_pred)
y_val_pred = svm.predict(X_val)
acc_val = np.mean(y_val == y_val_pred)
y_test_pred = svm.predict(X_test)
acc_test = np.mean(y_test == y_test_pred)
print acc_train, acc_val, acc_test

iteration 0 / 400: loss 8.953550
iteration 100 / 400: loss 0.208299
iteration 200 / 400: loss 0.287735
iteration 300 / 400: loss 0.233046
0.940448569219 0.951388888889 0.952777777778

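An intuitive interpretation of linear classifiers

Each row of W can be seen as an image template. The score for each class is the inner product of the input image with that class's template, so the classifier checks which template the input image matches best, i.e. which class it most likely belongs to. In other words, through training each row of the classifier learns a template for one class of images. The figure below shows the per-digit templates learned by the linear SVM classifier.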


w = svm.W[:-1,:] # strip out the bias
#print w
w = w.reshape(8, 8, 10)
w_min, w_max = np.min(w), np.max(w)
for i in xrange(10):
  plt.subplot(2, 5, i + 1)

  #Rescale the weights to be between 0 and 255
  wimg = 255.0 * (w[:, :, i].squeeze() - w_min) / (w_max - w_min)
  #wimg = w[:, :, i]
  plt.imshow(wimg.astype('uint8'))
  plt.axis('off')
  plt.title(classes[i])

sf = Softmax()
sf.train(X_train, y_train, learning_rate=1e-2, reg=0.5, num_iters=500,verbose=True)
y_train_pred = sf.predict(X_train)
acc_train = np.mean(y_train == y_train_pred)
y_val_pred = sf.predict(X_val)
acc_val = np.mean(y_val == y_val_pred)
y_test_pred = sf.predict(X_test)
acc_test = np.mean(y_test == y_test_pred)
print acc_train, acc_val, acc_test

iteration 0 / 500: loss 2.301230
iteration 100 / 500: loss 0.362278
iteration 200 / 500: loss 0.361298
iteration 300 / 500: loss 0.352944
iteration 400 / 500: loss 0.365702
0.940448569219 0.951388888889 0.95

w = sf.W[:-1,:] # strip out the bias
print w.shape
w = w.reshape(8, 8, 10)
w_min, w_max = np.min(w), np.max(w)
for i in xrange(10):
  plt.subplot(2, 5, i + 1)

  #Rescale the weights to be between 0 and 255
  wimg = 255.0 * (w[:, :, i].squeeze() - w_min) / (w_max - w_min)
  #wimg = w[:, :, i]
  plt.imshow(wimg.astype('uint8'))
  plt.axis('off')
  plt.title(classes[i])

(64, 10)

References

CS231n: Convolutional Neural Networks for Visual Recognition.

Speed-up with Cython and Numpy in Python

2016-06-15T02:00:00.000Z

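Differences between Cython and Python code

The code runs in an IPython Notebook; load the Cython extension in the notebook first.

%load_ext cython

Cython lets you mix C and C++ static types into Python. The cython compiler translates Cython source into C or C++ code, which is then compiled and can run standalone or be used as a module from Python. Cython's strength is in combining Python and C: Cython code that looks like Python runs at speeds close to C.

We use a simple Fibonacci function to compare Python and Cython: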

#python
def fib1(n):
    a,b=0.0,1.0
    for i in range(n):
        a,b=a+b,a
    return a

The %%cython cell magic below marks the cell to be compiled with cython

%%cython

def fib2(int n):
    cdef double a=0.0, b=1.0
    for i in range(n):
        a,b = a+b,a
    return a

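Comparing the two versions: to turn Python's dynamic types into static types in Cython, we declare n as a C int in the function signature and use cdef to declare a and b as C doubles.
We can also implement the Fibonacci function in C and wrap it for Python through Cython. cfib.h contains the C implementation: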

double cfib(int n) {
  int i;
  double a=0.0, b=1.0, tmp;
  for (i=0; i<n; ++i) {
    tmp = a; a = a + b; b = tmp;
  }
  return a;
}

%%cython

cdef extern from "/home/ldy/MEGA/python/cython/cfib.h":
    double cfib(int n)  
def fib3(n):
    """Returns the nth Fibonacci number."""
    return cfib(n)

Compare the running times of the different versions:


%timeit result=fib1(1000)

%timeit result=fib2(1000)

%timeit result=fib3(1000)

10000 loops, best of 3: 73.6 µs per loop
1000000 loops, best of 3: 1.94 µs per loop
1000000 loops, best of 3: 1.92 µs per loop

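Compiling Cython code

Compiling Cython code into a module that Python can import takes two steps: first the cython compiler translates the Cython code into C or C++; then a C or C++ compiler compiles that generated code into a Python extension module.

We compile the fib.pyx Cython file from above with a setup.py script, shown below. The key is the third line: cythonize runs the cython compiler to turn the Cython code into C, and setup builds the generated C code into a Python extension module.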

from distutils.core import setup
from Cython.Build import cythonize
setup(ext_modules=cythonize('fib.pyx'))
#setup(ext_modules=cythonize('*.pyx','fib1.pyx'))  # several Cython files can also be compiled at once

With setup.py written, run the build with:

python setup.py build_ext --inplace

This produces fib.c and fib.so, plus intermediate files stored in the build directory.

import os
os.chdir('/home/ldy/MEGA/python/cython/test')
os.getcwd()
!ls

build  fib.c  fib.pyx  fib.so  setup.py

Call the generated fib.so module from Python:

import fib
fib.fib2(90)

2.880067194370816e+18

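Defining types in Cython

Why is Cython so much faster than Python? There are two main reasons. First, Python is interpreted: before running, the interpreter compiles Python code to bytecode, which the Python virtual machine then executes instruction by instruction; Cython code, in contrast, is compiled ahead of time into machine code that Python can call and that runs directly. Second, Python is dynamically typed, so at run time the interpreter must check types and then extract the underlying data and operations; lower-level languages such as C are statically typed, so the compiler generates machine code that operates on the data directly.

Cython uses cdef to declare static types: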

cdef int i
cdef int j
cdef float f

Several can be declared at once:

cdef:
    int i
    int j
    float f

Cython also lets static and dynamic types coexist and be assigned to each other:

%%cython
cdef int a=1,b=2,c=3
list_of_ints=[a,b,c]
list_of_ints.append(4)
a=list_of_ints[1]
print a,list_of_ints

2 [1, 2, 3, 4]

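Declaring Python types statically: Cython supports declaring some built-in Python types such as list, tuple and dict as static types. Variables declared this way can be used like normal Python objects, but they are restricted to the declared type and cannot be rebound to another.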

%%cython
cdef:
    list names
    dict name_num

name_num={'jerry':1,'Tom':2,'Bell':3}
names=list(name_num.keys())
print names
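other_names=names  # a dynamically typed variable can be initialized from a statically typed Python object
del other_names[0]  # both names refer to the same list, so the first element disappears from both
print names,other_names
other_names=tuple(other_names)  # the difference: names can only ever be a list,
print other_names           # while other_names can refer to an object of any type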

['Bell', 'jerry', 'Tom']
['jerry', 'Tom'] ['jerry', 'Tom']
('jerry', 'Tom')

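Using numpy in Cython

First we build a function and time a pure-Python version as a baseline. The function computes the gradient of an input image (the details do not matter; it only serves as a benchmark). Its input indata is a 1400*1600-pixel image; its output outdata holds the gradient value at each pixel. Here is the pure-Python implementation: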

import numpy as np
indata = np.random.rand(1400,1600)
outdata = np.zeros(shape=indata.shape, dtype='float64')  # eventually holds our output
from numpy.lib import pad
print("shape before", indata.shape)
indata = pad(indata, (1, 1), 'reflect', reflect_type='odd')  # allow edge calcs
print("shape after", indata.shape)

import math
def slope(indata, outdata):
    I = outdata.shape[0]
    J = outdata.shape[1]
    for i in range(I):
        for j in range(J):
            # percent slope using Zevenbergen-Thorne method
            # assume edges added, inarr is offset by one on both axes cmp to outarr
            dzdx = (indata[i+1, j] - indata[i+1, j+2]) / 2  # assume cellsize == one unit, otherwise (2 * cellsize)
            dzdy = (indata[i, j+1] - indata[i+2, j+1]) / 2
            slp = math.sqrt((dzdx * dzdx) + (dzdy * dzdy)) * 100  # percent slope (take math.atan to get angle)
            outdata[i, j] = slp

('shape before', (1400, 1600))
('shape after', (1402, 1602))

Timing it gives 5.31 s per loop

%timeit slope(indata, outdata)

1 loop, best of 3: 5.31 s per loop

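Reset the output:

def reset_outdata():
    # Clear the shared output array in place; a plain assignment here
    # would only create a local variable and leave outdata untouched
    outdata.fill(0)

reset_outdata()

Now we rewrite the gradient function in Cython. slope_cython2 uses typed numpy arrays and replaces the square-root call with k**0.5. Running the cell as %%cython -a additionally shows an annotated view of the C code generated for each line of Cython.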

%%cython
import cython
cimport numpy as np
ctypedef np.float64_t DTYPE_t
@cython.boundscheck(False)
def slope_cython2(np.ndarray[DTYPE_t, ndim=2] indata, np.ndarray[DTYPE_t, ndim=2] outdata):
    cdef int I, J
    cdef int i, j, x
    cdef double k, slp, dzdx, dzdy
    I = outdata.shape[0]
    J = outdata.shape[1]
    for i in range(I):
        for j in range(J):
            dzdx = (indata[i+1, j] - indata[i+1, j+2]) / 2
            dzdy = (indata[i, j+1] - indata[i+2, j+1]) / 2
            k = (dzdx * dzdx) + (dzdy * dzdy)
            slp = k**0.5 * 100
            outdata[i, j] = slp

Measured time: 208 ms, about 25 times faster

%timeit slope_cython2(indata, outdata)

1 loop, best of 3: 208 ms per loop

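Multithreading in Cython

Cython also supports parallel computation, backed by OpenMP, so the Cython code must be compiled with the flags shown on the first line of the code below. For parallel execution, the nogil keyword releases Python's GIL; this is safe when the code touches only C data and no Python objects.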

%%cython --compile-args=-fopenmp --link-args=-fopenmp --force

import cython
from cython.parallel import prange, parallel

@cython.boundscheck(False)
def slope_cython_openmp(double [:, :] indata, double [:, :] outdata):
    cdef int I, J
    cdef int i, j, x
    cdef double k, slp, dzdx, dzdy
    I = outdata.shape[0]
    J = outdata.shape[1]
    with nogil, parallel(num_threads=4):
        for i in prange(I, schedule='dynamic'):
            for j in range(J):
                dzdx = (indata[i+1, j] - indata[i+1, j+2]) / 2
                dzdy = (indata[i, j+1] - indata[i+2, j+1]) / 2
                k = (dzdx * dzdx) + (dzdy * dzdy)
                slp = k**0.5 * 100
                outdata[i, j] = slp

1 2	reset_outdata() %timeit slope_cython_openmp(indata, outdata)

10 loops, best of 3: 78.2 ms per loop

As the timing above shows, the multithreaded version is about 2.7 times faster than the single-threaded Cython version.

Classification in Remote Sensing Optical Images by CNNs

2016-06-12T02:00:00.000Z

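Introduction to CNNs

Deep structured learning (deep learning, or hierarchical learning) emerged as a new research direction in machine learning around 2006. Driven by large gains in chip performance and the explosive growth of data, it has developed rapidly in just a few years and has strongly influenced academic research. Its application areas include computer vision, speech recognition, conversational speech recognition, image feature coding, semantic classification, natural language understanding, handwriting recognition, audio processing, information retrieval, and robotics.

Because deep learning performs well in so many areas, more and more research institutions are turning to it. In recent years the groups active in machine learning have included universities such as Stanford and Berkeley, and companies such as Google, IBM Research, Microsoft Research, Facebook, and Baidu.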

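Neural networks

An artificial neural network is a mathematical or computational model that imitates the structure and function of biological neural networks (the central nervous system of animals, the brain in particular). A simple structure is shown in the figure below: an input layer, a hidden layer, and an output layer, where there may be more than one hidden layer. Each neuron is connected to every node in the previous layer, which is called full connection. Every connection carries a weight, and training the network means finding these weights.

Many machine learning and deep learning tasks come down to classification. When the data are linearly separable, a single sigmoid unit is enough to separate them. In the figure below the two classes are linearly separable, so a network with only an input layer and an output layer can separate them. Yellow connections have negative weights, blue connections have positive weights, and the line thickness shows the absolute value of the weight.

When the original data are not linearly separable, as in the next figure, a nonlinear transformation is needed to make them separable. This is the job of the hidden layer. The network in the figure has 4 hidden nodes and classifies the data correctly.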

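Convolutional neural networks (CNNs)

A one-dimensional CNN is shown below. Compared with an ordinary artificial neural network, each neuron in a convolutional layer is connected to only some of the nodes in the previous layer (local connectivity), and all neurons in the layer use the same weights (weight sharing). These two properties are why the network is called convolutional; see http://colah.github.io/posts/2014-07-Understanding-Convolutions/ for the details. The max layer in the figure is a pooling layer, here max pooling, which takes the larger of the outputs of two neurons. Pooling reduces the feature dimension (compared with using every extracted feature), tends to improve results by making overfitting less likely, and gives the pooled units some translation invariance. Layer B in the figure is the second convolutional layer and layer F is a fully connected layer, that is, the artificial neural network described above.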
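Weight sharing and max pooling in one dimension can be sketched in a few lines of plain Python. The function names, the two-weight kernel, and the sample signal are illustrative assumptions, not taken from the figure.

```python
def conv1d(signal, kernel):
    """Slide one shared kernel over the signal (valid positions only)."""
    k = len(kernel)
    return [sum(signal[i + t] * kernel[t] for t in range(k))
            for i in range(len(signal) - k + 1)]

def max_pool1d(values, size=2):
    """Keep the largest value of each non-overlapping window of `size` units."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

signal = [1, 3, 2, 5, 4, 0, 2]
# every output unit sees only two neighbouring inputs and reuses the same two weights
features = conv1d(signal, kernel=[1, -1])
pooled = max_pool1d(features, size=2)
print(features)
print(pooled)
```

Each convolution output depends on a local window only, and the same kernel is applied at every position, so the layer has two weights no matter how long the signal is. Pooling then halves the number of features.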

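A two-dimensional CNN is shown below. The two-dimensional input can be seen as the pixel values of an image, a convolutional layer as a filter that extracts features from the image, and a max pooling layer as a higher-level abstraction of those features. Fully connected layers (a traditional artificial neural network) follow and perform the classification. In short, when a CNN processes an image, the convolutional layers learn to extract features that are useful for classification, and the fully connected layers classify the extracted features.

Training: as mentioned above, training means finding the weights. Before training starts the weights, and therefore the image filters, are initialized randomly. Suppose we feed in an image and the randomly initialized CNN says there is a 6% chance that it is a tennis court, while the label we supply says it is an airport. Backpropagation then adjusts the filter parameters slightly so that the network is more likely to predict airport the next time it sees the same image. Repeating this over the training data gradually tunes the filters until they extract the features that matter for our classification task.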

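Dataset analysis

The UC Merced Land Use dataset contains 21 land-use classes with 100 images per class; each image is 256×256 pixels.

The dataset is small, with only 100 images per class. It also has a small between-class distance: as the figure below shows, images from different classes can look very similar.

The within-class distance is large: images of the same class can differ considerably, as shown below:

All of these properties make classification harder. The small amount of data matters most: training a network from scratch on this dataset would very likely overfit badly. One remedy is to take a network that has already been trained and fine-tune it on our dataset, which both addresses the small dataset and greatly speeds up training.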

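Fine-tuning the network

Fine-tuning means taking a network that has already been trained, modifying it slightly, and training it again so that it fits our own dataset. Why can a network trained by someone else be reused after small changes? As noted earlier, the convolutional layers of a CNN extract image features, and the lines, colors, and textures of images are broadly alike, so the convolutional layers of a trained CNN can also extract features from images in other datasets. An important condition is that the 'distance' between the dataset used for pre-training and our own training images is small. Our dataset consists of optical remote-sensing images, whose low-level features are very similar to those of ordinary optical images.

The figure below shows some images from the ImageNet dataset, on which the pre-trained network we use was trained.

Because of the pixel-level statistical properties of SAR remote-sensing images, fine-tuning a network trained on optical images is not suitable for SAR image classification. A SAR image is shown below; even visually it differs greatly from an optical image.

We fine-tune CaffeNet, a network shipped with Caffe and pre-trained on ImageNet. Its structure is shown below. The layers before fc6 are the convolutional layers that extract image features; fc6, fc7, and fc8 are fully connected layers (they can be seen as the input, hidden, and output layers of an artificial neural network). CaffeNet classifies images into 1000 classes, so its last layer has 1000 neurons.

Our dataset has 21 classes, and this is where the main change for fine-tuning is made: as shown below, the output layer is changed to 21 neurons.

Using a pre-trained network means using its trained weights. The two networks above differ only in the last layer, so the weights of all other layers have the same dimensions and CaffeNet's trained weights can be copied directly into our network. The weights of the last layer are initialized randomly and given a larger learning rate. The network can then be trained on our dataset.

With the network defined, training can begin. The dataset is split 4:1 into a training set and a test set, and the prediction accuracy on the test set is about 92%.

Another common approach is to drop the CNN's final classification layer and classify the features extracted by the CNN with an SVM, which also works well. Here we take the output of fc7. In the network defined above fc7 has 4096 neurons, so each image yields a 4096-dimensional feature vector. With features of such high dimension, a linear-kernel SVM is sufficient.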

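Results and analysis

Fine-tuning results:
The predictions are visualized below; the left panel shows the probabilities of the five most likely classes and the right panel shows the true class of the image.

Results with CNN features and SVM classification:

Both methods classify the image correctly, but with the SVM as classifier the probability assigned to the correct class is higher.

Per-class accuracy of the fine-tuning method:

The figure above shows that the tenniscourt class has the lowest accuracy. Let us look at which tenniscourt images were misclassified:

The two images above are not really wrong predictions: each contains both a tennis court and the class predicted by the CNN, so each image genuinely belongs to two classes.

t-SNE visualization of the features

The 4096-dimensional features extracted by the seventh layer (fc7) are reduced in dimension and visualized. The figure below shows that the classes with lower accuracy lie close together and are hard to separate.

Per-class accuracy of CNN + SVM:

Visualizing intermediate CNN layers

A neural network need not be treated as a black box; intermediate results and parameters can be inspected. As mentioned above, a convolutional layer acts like a set of image filters. The first convolutional layer of the network above has 96 filters, visualized in the figure below. Readers familiar with image processing will recognize that the first filter extracts edges running diagonally downward and the second extracts edges running diagonally upward. Most of the earlier filters extract edge features and most of the later ones capture color features.

We feed in an image and show the output of the first convolutional layer's filters:

The outputs of the first layer confirm that the first two filters respond to diagonal edges in the two directions.

The output of the fifth convolutional layer is shown below; the outputs of higher layers are more abstract.

Summary and outlook

When a dataset is too small, fine-tuning is a practical option. Future work is to explore how CNNs can be applied to SAR image classification, and to address the labeling problem seen above, where one image contains more than one class.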

Code

land_use_CNN

References

http://vision.ucmerced.edu/datasets/landuse.html
http://ufldl.stanford.edu/wiki/index.php/%E6%B1%A0%E5%8C%96
http://colah.github.io/posts/2014-07-Conv-Nets-Modular/
Tinker With a Neural Network Right Here in Your Browser

Force-Directed Graph Visualization Based in Location

2016-06-11T02:00:00.000Z

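Task description

Graphs are the main way to represent relationships such as social networks and knowledge graphs, and laying out the nodes is a central part of graph visualization. Most existing methods, however, do not treat the geographic location of nodes as a layout constraint. In a POI review application, for example, we would like a "restaurant" node to appear at its actual geographic position; in a trending-events application we would like the "Beijing" node to appear north of (above) the "Shanghai" node. Adding geographic constraints ties the visualization more closely to location and lets it carry geographic metaphors, which increases the information it conveys and better supports visual analysis of geospatial data.
Task 1: survey node-layout methods for graph visualization, in particular force-directed methods and bipartite-graph layout methods, and write a short review;
Task 2: taking as a constraint that one class of nodes in a bipartite graph keeps its absolute geographic position, or that the relative positions among those nodes stay unchanged, improve a force-directed bipartite-graph visualization method and give the model, formulas, and algorithm flow;
Task 3: using the given dataset (two classes of nodes, one of which has geographic coordinates), implement the improved algorithm with a visualization tool of your choice (such as VTK or D3).

The dataset looks as follows:

The file PlaceTolation.txt contains place names with their latitude and longitude: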

地名	纬度,经度
北京	39.90,116.40
北京市	39.90,116.40
北京站	39.90,116.40
北京路	39.90,116.40
天安门	39.90,116.38
崇文	39.88,116.43
崇文区	39.88,116.43
......

The file TitlePlace.txt contains an index, a news headline, and the place-name entities extracted from that news item:

1	落马高官忏悔：从未感觉到还有党组织存在	中国
2	佩帅：442阵型没问题对方进球很无解不怪门将	利物浦,切尔西
3	今日数据趣谈：半场20+命中率8成5小加变大加	北京,德安,奎尔,孟菲斯
4	工业领域控煤计划将出台：2020年力争节煤1.6亿吨	北京,河北,山西
5	公交乘客与司机扭打发生车祸致1人重伤(图)	呼和浩特,呼和浩特市,青城,内蒙古,赛罕区,青洲
6	深圳机场行人围观飞机起降被撞倒已致5死24伤	深圳
7	云南临沧发生3.5级地震震源深度14千米	中国,云南省,临沧市,沧源佤族自治县,云南
......

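The two classes of nodes in the bipartite graph are news headlines and place names. The edges are the many-to-many mapping between headlines and place names, and each place-name node has latitude and longitude attributes.

Data cleaning

The data contain many duplicated places. 北京 and 北京市, for example, mean the same thing, and 天安门 and 崇文区 are both inside Beijing. Judging by latitude and longitude they should be merged into one class; otherwise they would show up on the map as nearly coincident points a very short distance apart. We therefore use the coordinates to replace each place with the name of the province or municipality it belongs to.

To decide which province a place belongs to we need the extent of each province. A JSON file of the map of China found online contains the border of each province as a series of longitude/latitude points. Deciding which province a place belongs to then means checking whether its coordinates fall inside the polygon formed by a province's border points, which is a point-in-polygon problem.

The Path class in Python's matplotlib package provides a function for this: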

import numpy as np
import matplotlib.path as mplPath

# unit square as an example polygon
bbPath = mplPath.Path(np.array([[0,0],[1,0],[1,1],[0,1]]))
bbPath.contains_point((0.5, 0.5))  # True when the point lies inside the polygon
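The province lookup itself can be sketched without matplotlib using the classic ray-casting test. This is an illustrative stand-in, not the code used for the post: the function names are made up, the two rectangular "borders" are hypothetical, and points are assumed to be given as (longitude, latitude).

```python
def point_in_polygon(point, polygon):
    """Ray casting: count the polygon edges crossed by a horizontal ray from `point`."""
    x, y = point
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # the edge straddles the horizontal line through the point
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that line
            x_cross = x1 + (y - y1) * (x2 - x1) / float(y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def province_of(point, borders):
    """Return the first province whose border polygon contains the point, else None."""
    for name, polygon in borders:
        if point_in_polygon(point, polygon):
            return name
    return None

# Hypothetical rectangular "borders" in (longitude, latitude), for illustration only.
borders = [
    ("A", [(100.0, 30.0), (110.0, 30.0), (110.0, 40.0), (100.0, 40.0)]),
    ("B", [(110.0, 30.0), (120.0, 30.0), (120.0, 40.0), (110.0, 40.0)]),
]
print(province_of((116.40, 39.90), borders))
```

With real border data each polygon would hold the points read from the JSON file, and every place name would be replaced by the province returned for its coordinates.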

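Building the force-directed graph

In a force-directed graph every node moves under the action of forces, which makes for a visually striking chart.

A force-directed graph is produced by a graph-drawing algorithm. Nodes are placed in two- or three-dimensional space and joined by lines called links. The links end up with roughly equal length and cross as little as possible. Forces are applied to the nodes and links, computed from their relative positions. The forces determine how the nodes and links move, their energy is reduced step by step, and the layout finally settles into a stable low-energy state. A force-directed graph can represent many-to-many relationships between nodes.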
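The idea can be sketched as a tiny simulation with spring forces along links, pairwise repulsion, and some nodes pinned in place. This is a simplified illustration under assumed force laws and parameter values; it is not D3's implementation.

```python
import math
import random

def force_layout(positions, fixed, edges, steps=200, link_distance=1.0,
                 repulsion=0.1, step_size=0.05, seed=0):
    """Springs along edges, inverse-square repulsion between all pairs, pinned nodes stay put."""
    rng = random.Random(seed)
    # nodes without a starting position get a random one
    pos = [list(p) if p is not None else [rng.uniform(-1, 1), rng.uniform(-1, 1)]
           for p in positions]
    n = len(pos)
    for _ in range(steps):
        force = [[0.0, 0.0] for _ in range(n)]
        # repulsion pushes every pair of nodes apart
        for a in range(n):
            for b in range(a + 1, n):
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-6
                push = repulsion / (dist * dist)
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # springs pull linked nodes toward the target link length
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy) or 1e-6
            pull = dist - link_distance
            force[a][0] += pull * dx / dist
            force[a][1] += pull * dy / dist
            force[b][0] -= pull * dx / dist
            force[b][1] -= pull * dy / dist
        # only free nodes move
        for i in range(n):
            if not fixed[i]:
                pos[i][0] += step_size * force[i][0]
                pos[i][1] += step_size * force[i][1]
    return pos

# two pinned nodes and one free node linked to both
layout = force_layout(positions=[(0.0, 0.0), (4.0, 0.0), None],
                      fixed=[True, True, False],
                      edges=[(0, 2), (1, 2)])
print(layout[2])
```

The free node settles midway between the two pinned nodes, where the two spring forces balance. Pinning a subset of nodes while the rest move freely is the same pattern used later for the place and news nodes.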

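d3.layout.force() implements the force-directed algorithm. Its main methods are:

d3.layout.force - position linked nodes using physical simulation.
force.alpha - get or set the layout's cooling parameter.
force.chargeDistance - get or set the maximum charge distance.
force.charge - get or set the charge strength.
force.drag - bind a behavior that allows nodes to be dragged.
force.friction - get or set the friction coefficient.
force.gravity - get or set the gravity strength.
force.linkDistance - get or set the link distance.
force.linkStrength - get or set the link strength.
force.links - get or set the array of links between nodes.
force.nodes - get or set the array of nodes to lay out.
force.on - listen for updates while the layout positions are computed.
force.resume - reheat the cooling parameter and restart the simulation.
force.size - get or set the layout size.
force.start - start or restart the simulation when the nodes change.
force.stop - stop the simulation immediately.
force.theta - get or set the accuracy of the charge interaction.
force.tick - run the layout simulation one step.

For how to use d3.layout.force(), see the tutorial on building force-directed graphs (力导向图的制作).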

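Implementation

Our task has two classes of nodes: place nodes, whose positions must stay fixed, and news nodes, whose positions are computed by the force-directed algorithm. The nodes are therefore defined as follows.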

var nodes = [
              {name:"青海",x:青海[0],y:青海[1],fixed:true,"group":1},
              {name:"河南",x:河南[0],y:河南[1],fixed:true,"group":1},
              {name:"山东",x:山东[0],y:山东[1],fixed:true,"group":1},
              .
              .
              .

              {name:"从WCBA争冠到无缘新赛季浙江女篮怎么了",fixed:false,"group":2},
              {name:"成都的哥:专车司机玩着跑半个月超过我月收入",fixed:false,"group":2},
              {name:"部分农村教师月薪不到2千暑假当小工补贴家用",fixed:false,"group":2}
                ];

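The first group holds the fixed place nodes and the second group holds the news nodes, which the force-directed algorithm positions. The positions of the place nodes must be supplied, so the longitude and latitude of each place are declared before the nodes are defined: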

var 青海 =[96.5122866869,35.12781926];
var 河南 =[114.130772484,34.00715756];
var 山东 =[118.354817653,36.2612648184];
.
.
.

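Next come the links. If a news item mentions certain places, a link is drawn between each of those places and the news item. The numbers are node indices: 0 is the first node defined above and 185 is the 186th.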

var edges = [
                {source:0,target:185},
                {source:0,target:204},
                {source:0,target:389},
                {source:0,target:430},
                {source:0,target:494},
                {source:1,target:42},
                .
                .
                .
              ]
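Writing such index-based lists by hand is error-prone, so they would normally be generated from the cleaned data. A minimal sketch, assuming the places have already been mapped to provinces; the function name, argument layout, and sample data are hypothetical.

```python
def build_graph(place_coords, title_places):
    """Build D3-style node and link lists: pinned place nodes first, then news nodes."""
    nodes = []
    index_of = {}
    for name, (x, y) in place_coords:
        index_of[name] = len(nodes)
        nodes.append({"name": name, "x": x, "y": y, "fixed": True, "group": 1})
    edges = []
    for title, places in title_places:
        title_index = len(nodes)
        nodes.append({"name": title, "fixed": False, "group": 2})
        for place in sorted(set(places)):      # one link per distinct place
            if place in index_of:              # skip places without coordinates
                edges.append({"source": index_of[place], "target": title_index})
    return nodes, edges

# hypothetical sample data
place_coords = [("Qinghai", (96.51, 35.13)), ("Henan", (114.13, 34.01))]
title_places = [("news 1", ["Qinghai", "Henan", "Qinghai"]),
                ("news 2", ["Henan", "unknown place"])]
nodes, edges = build_graph(place_coords, title_places)
print(edges)
```

The resulting lists can be serialized with json.dumps and pasted or loaded into the page as the nodes and edges arrays.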

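With the data defined, the layout can be set up.

A force-directed layout is defined as follows.

var force = d3.layout.force()
      .nodes(nodes) // node array
      .links(edges) // link array
      .size([width,height]) // extent of the layout
      .linkDistance(150) // target link length
      .charge(-400); // charge strength; a negative value makes nodes repel each other

Then start the simulation:

force.start(); // start the simulation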

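Visualization

Once the forces take effect, coordinates are generated for the news nodes, and the whole visualization can be drawn from them.

Three kinds of graphic elements are drawn:

line, a line segment, for each link.
circle, a circle, for each node.
text, a label describing each node.

The code is as follows: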

// add the links
 var svg_edges = svg.selectAll("line")
     .data(edges)
     .enter()
     .append("line")
     .style("stroke","#ccc")
     .style("stroke-width",1);

 var color = d3.scale.category20();

 // add the nodes
 var svg_nodes = svg.selectAll("circle")
     .data(nodes)
     .enter()
     .append("circle")
     .attr("r",20)
     .style("fill",function(d,i){
         return color(i);
     })
     .call(force.drag);  // make the nodes draggable

 // add the text labels for the nodes
 var svg_texts = svg.selectAll("text")
     .data(nodes)
     .enter()
     .append("text")
     .style("fill", "black")
     .attr("dx", 20)
     .attr("dy", 8)
     .text(function(d){
        return d.name;
     });

After call(force.drag) the nodes can be dragged. force.drag() is a function; passing it to call() hands the current selection to force.drag().

Results

The visualization is shown below. Online demo: http://buptldy.github.io/DEMO/news_map.html

Basic Sorting Algorithms Implemented In Python

2016-05-09T04:00:00.000Z

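Bubble sort

Bubble sort is simple. It works as follows:

Compare adjacent elements. If the first is larger than the second, swap them.
Do the same for every adjacent pair, from the first pair to the last. After this pass the last element is the largest.
Repeat the steps above for all elements except the last one.
Keep repeating on fewer and fewer elements until no pair is left to compare.

def BubbleSort(array):
    for i in range(len(array)):
        # after i passes the last i elements are already in place
        for j in range(len(array)-1-i):
            if array[j]>array[j+1]:
                array[j],array[j+1]=array[j+1],array[j]
    return array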

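Selection sort

Selection sort is a simple, intuitive sorting algorithm. It first finds the smallest (or largest) element of the unsorted sequence and places it at the start of the sorted sequence. It then repeatedly finds the smallest (largest) of the remaining unsorted elements and appends it to the end of the sorted part, until every element is sorted.

def SelectionSort(array):
    for i in range(len(array)):
        min_index=i
        for j in range(i+1,len(array)):
            if array[j]<array[min_index]:
                min_index=j
        array[i],array[min_index]=array[min_index],array[i]
    return array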

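Insertion sort

Insertion sort is a simple, intuitive sorting algorithm. It builds up a sorted sequence: each unsorted element is compared with the sorted part from back to front until its position is found, and it is inserted there. It is usually implemented in place (using only O(1) extra space), so during the backward scan the sorted elements are shifted one position to the right, one after another, to make room for the new element.

def InsertionSort(array):
    for i in range(1,len(array)):
        temp=array[i]
        j=i
        # shift larger elements right; stop at the front of the list
        while j>0 and array[j-1]>temp:
            array[j]=array[j-1]
            j-=1
        array[j]=temp
    return array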

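Merge sort

Merge sort is an efficient sorting algorithm built on the merge operation, with O(n log n) running time. It was first proposed by John von Neumann in 1945. It is a classic application of divide and conquer, and the recursive calls at each level can run in parallel.

For more detail, see the merge sort section of the divide-and-conquer post (分治策略中的归并排序).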

def MergeSort(array):
    n=len(array)
    if n<=1:
        return array
    else:
        n=n//2  # integer midpoint; plain / would give a float on Python 3
        left=MergeSort(array[0:n])
        right=MergeSort(array[n:])
        return Merge(left,right)

def Merge(left,right):
    array=[]
    while len(left)>0 and len(right)>0:
        if left[0]<right[0]:
            array.append(left[0])
            del left[0]
        else:
            array.append(right[0])
            del right[0]
    if len(left)>0:
        array.extend(left)
    if len(right)>0:
        array.extend(right)
    return array

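Quicksort

Quicksort uses a divide-and-conquer strategy to split a sequence (list) into two sub-lists.

The steps are:

Pick an element from the list, called the pivot, and reorder the list so that all elements smaller than the pivot come before it and all elements larger come after it (equal elements can go to either side). When this step ends the pivot is in its final position. This is called the partition operation.
Recursively sort the sub-list of elements smaller than the pivot and the sub-list of elements larger than the pivot.
The base case of the recursion is a list of size zero or one, which is always sorted. The recursion always terminates, because each call puts at least one element into its final position.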

def QuickSort(array):
    if len(array)<=1:
        return array
    pivot=array[0]
    left=[x for x in array[1:]if x<pivot ]
    right=[x for x in array[1:] if x>=pivot]
    return QuickSort(left)+[pivot]+QuickSort(right)
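The version above builds new lists at every level. The partition step described in the text can also be carried out in place. The sketch below uses the Lomuto partition scheme with the last element as pivot, which is one common choice and not necessarily what the author had in mind; the function names are illustrative.

```python
def partition(array, low, high):
    """Lomuto partition: move array[high] (the pivot) to its final index and return it."""
    pivot = array[high]
    boundary = low                      # everything left of `boundary` is < pivot
    for k in range(low, high):
        if array[k] < pivot:
            array[k], array[boundary] = array[boundary], array[k]
            boundary += 1
    array[boundary], array[high] = array[high], array[boundary]
    return boundary

def quick_sort_in_place(array, low=0, high=None):
    """Sort array[low..high] in place and return the same list."""
    if high is None:
        high = len(array) - 1
    if low < high:
        p = partition(array, low, high)
        quick_sort_in_place(array, low, p - 1)    # elements smaller than the pivot
        quick_sort_in_place(array, p + 1, high)   # elements not smaller than the pivot
    return array

data = quick_sort_in_place([3, 6, 1, 8, 2, 9, 2])
print(data)
```

After each call to partition the pivot sits at its final index, which is why the recursion always terminates.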

Heapsort

In a max-heap the largest value is always at the root. The following operations are defined on the heap:

Max-heapify (Max_Heapify): adjust the nodes below a given node so that every child is no larger than its parent
Build max-heap (Build_Max_Heap): reorder all the data into a heap
Heapsort (HeapSort): repeatedly remove the root, which holds the largest item, and restore the heap with max-heapify

For more on heapsort see this post: http://www.cnblogs.com/cj723/archive/2011/04/22/2024269.html

def heap_sort(array):

    def sift_down(start, end):
        """Max-heapify the subtree rooted at start, within array[:end+1]."""
        root = start
        while True:
            child = 2 * root + 1    # left child
            if child > end:         # no children: done
                break
            if child + 1 <= end and array[child] < array[child + 1]:  # right child is larger
                child += 1          # switch from the left child to the right child
            if array[root] < array[child]:  # parent smaller than child: swap
                array[root], array[child] = array[child], array[root]
                root = child        # keep sifting down from the child that changed
            else:
                break

    # build the max-heap, starting from the last non-leaf node
    for start in range((len(array) - 2) // 2, -1, -1):
        sift_down(start, len(array) - 1)

    # sort: move the current maximum to the end, then re-heapify the rest
    for end in range(len(array) - 1, 0, -1):
        array[0], array[end] = array[end], array[0]
        sift_down(0, end - 1)
    return array

Implementing a Singly Linked List in Python

2016-05-09T03:00:00.000Z

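The simplest kind of linked list is the singly linked list. Each node has two fields: a data field, which holds the node's information, and a pointer field, which stores the address of the next node. The last node points to a null value. A singly linked list can be traversed in one direction only.

Implementing the node class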

class Node:
    def __init__(self,initdata):
        self.data = initdata
        self.next = None

    def getData(self):
        return self.data

    def getNext(self):
        return self.next

    def setData(self,newdata):
        self.data = newdata

    def setNext(self,newnext):
        self.next = newnext

Create a node object:

>>> temp = Node(93)
>>> temp.getData()
93

The structure is shown in the figure below:

Implementing the list class

class UnorderedList:

    def __init__(self):
        self.head = None

Create a list object:

>>> mylist = UnorderedList()

Adding a node at the front of the list

def add(self,item):
    temp = Node(item)
    temp.setNext(self.head)
    self.head = temp

>>> mylist.add(31)
>>> mylist.add(77)
>>> mylist.add(17)
>>> mylist.add(93)
>>> mylist.add(26)
>>> mylist.add(54)

The list now looks like this:

Appending a node at the end of the list

def append(self,item):
    temp=Node(item)
    if self.head is None:
        self.head=temp          # empty list: the new node becomes the head
    else:
        current=self.head
        while current.getNext() is not None:
            current=current.getNext()
        current.setNext(temp)

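Computing the length of the list

def size(self):
    count=0
    current=self.head
    while current is not None:   # visit every node, including the last
        count=count+1
        current=current.getNext()
    return count

The counting process is shown in the figure below: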

Searching for a value

def search(self,item):
    current=self.head
    while current is not None:   # also checks the last node and handles an empty list
        if current.getData()==item:
            return True
        current=current.getNext()
    return False

Removing a node

def remove(self,item):
    current=self.head
    pre=None
    while current!=None:
        if current.getData()==item:
            if not pre:
                self.head=current.getNext()
            else:
                pre.setNext(current.getNext())
            break
        else:
            pre=current
            current=current.getNext()

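Reversing the list

def rev(self):
    pre=None
    current=self.head
    while current is not None:
        nxt=current.getNext()    # remember the rest of the list
        current.setNext(pre)     # point this node back at its predecessor
        pre=current
        current=nxt
    self.head=pre                # the old tail is the new head
    return pre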

Swapping nodes in pairs

For example, 1->2->3->4 becomes 2->1->4->3.

def pairswap(self):
    curren=self.head
    # stop when fewer than two nodes remain
    while curren is not None and curren.getNext() is not None:
        temp=curren.getData()
        curren.setData(curren.getNext().getData())
        curren.getNext().setData(temp)
        curren=curren.getNext().getNext()
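Reversal and pairwise swapping are easy to get wrong at the ends of the list, so it helps to try them on a few short lists. The sketch below is self-contained and uses a minimal stand-in node class (ListNode, a hypothetical name) with plain attributes instead of the getter and setter methods above.

```python
class ListNode:
    """Minimal stand-in for the Node class in this post."""
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def from_values(values):
    """Build a linked list from a Python list and return its head."""
    head = None
    for value in reversed(values):
        head = ListNode(value, head)
    return head

def to_values(head):
    """Collect the node values into a Python list."""
    out = []
    while head is not None:
        out.append(head.data)
        head = head.next
    return out

def reverse(head):
    """Re-point every node at its predecessor and return the new head."""
    previous = None
    current = head
    while current is not None:
        following = current.next
        current.next = previous
        previous = current
        current = following
    return previous

def pair_swap(head):
    """Swap the data of each adjacent pair; a trailing single node is left alone."""
    current = head
    while current is not None and current.next is not None:
        current.data, current.next.data = current.next.data, current.data
        current = current.next.next
    return head

print(to_values(pair_swap(from_values([1, 2, 3, 4]))))
print(to_values(reverse(from_values([1, 2, 3, 4]))))
```

Lists of length zero, one, and odd length are the cases worth checking, since they exercise the loop conditions.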

Python Binary Search Tree implementation

2016-05-09T02:00:00.000Z

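A binary search tree (BST), also called an ordered binary tree or sorted binary tree, is either an empty tree or a binary tree with the following properties:

- if the left subtree of a node is not empty, every value in it is smaller than the node's value;
- if the right subtree of a node is not empty, every value in it is larger than the node's value;
- the left and right subtrees of every node are themselves binary search trees;
- no two nodes have equal keys.
A binary search tree is shown below: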
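The four properties can be turned into a small checker that passes the allowed range of values down the tree. This is a sketch with an illustrative TreeNode class and function name, not part of the implementation that follows.

```python
class TreeNode:
    """Minimal node used only for this check."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def is_bst(node, low=None, high=None):
    """True if every value in the subtree lies strictly between `low` and `high`."""
    if node is None:
        return True                      # an empty tree is a BST
    if low is not None and node.data <= low:
        return False
    if high is not None and node.data >= high:
        return False
    # the left subtree must stay below this node, the right subtree above it
    return (is_bst(node.left, low, node.data) and
            is_bst(node.right, node.data, high))

valid = TreeNode(8, TreeNode(3, TreeNode(1), TreeNode(6)), TreeNode(10, None, TreeNode(14)))
# 9 sits in the left subtree of 8, which breaks the first property
invalid = TreeNode(8, TreeNode(3, TreeNode(1), TreeNode(9)), TreeNode(10))
print(is_bst(valid), is_bst(invalid))
```

Comparing each node only with its direct children would accept the second tree, which is why the bounds are carried along the whole path from the root.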

Defining the node class

Each node of the binary tree has three attributes:

left child
right child
node value

A node class in Python is therefore:

class Node:
    def __init__(self, data,left=None,right=None):
        self.left = left
        self.right = right
        self.data = data

Now create a tree whose root is 8:

root=Node(8)

As shown in the figure below:

Inserting nodes

Compare the value to insert with the current node and call the insert method recursively:

class Node:
    ...
    def insert(self, data):
        if self.data is not None:  # the node already holds a value (0 counts as a value)
            if data < self.data:
                if self.left is None:
                    self.left = Node(data)
                else:
                    self.left.insert(data)
            elif data > self.data:
                if self.right is None:
                    self.right = Node(data)
                else:
                    self.right.insert(data)
        else:
            self.data = data

Now insert three nodes:

root.insert(3)
root.insert(10)
root.insert(1)

The tree now looks like this:

Add a few more nodes to fill out the tree:

root.insert(6)
root.insert(4)
root.insert(7)
root.insert(14)
root.insert(13)

Searching the binary search tree

class Node:
    ...
    def lookup(self, data, parent=None):
        if data < self.data:
            if self.left is None:
                return None, None
            return self.left.lookup(data, self)
        elif data > self.data:
            if self.right is None:
                return None, None
            return self.right.lookup(data, self)
        else:
            return self, parent

Check whether node 6 exists and return the node together with its parent:

node, parent = root.lookup(6)

The search proceeds as shown below:

Deleting nodes

Before deleting a node, count its children:

class Node:
    ...
    def children_count(self):
        cnt = 0
        if self.left:
            cnt += 1
        if self.right:
            cnt += 1
        return cnt

There are three cases when deleting a node:

the node has no children
the node has one child
the node has two children

class Node:
    ...
    def delete(self, data):
        node, parent = self.lookup(data)
        if node is not None:
            children_count = node.children_count()
            if children_count == 0:
                # if node has no children, just remove it
                if parent:
                    if parent.left is node:
                        parent.left = None
                    else:
                        parent.right = None
                    del node
                else:
                    self.data = None
            elif children_count == 1:
                # if node has 1 child
                # replace node with its child
                if node.left:
                    n = node.left
                else:
                    n = node.right
                if parent:
                    if parent.left is node:
                        parent.left = n
                    else:
                        parent.right = n
                    del node
                else:
                    self.left = n.left
                    self.right = n.right
                    self.data = n.data
            else:
                # if node has 2 children
                # find its successor
                parent = node
                successor = node.right
                while successor.left:
                    parent = successor
                    successor = successor.left
                # replace node data by its successor data
                node.data = successor.data
                # fix successor's parent's child
                if parent.left == successor:
                    parent.left = successor.right
                else:
                    parent.right = successor.right

Printing the tree

Print the tree in order; for pre-order or post-order only the position of the print statement changes.

class Node:
    ...
    def print_tree(self):
        """
        Print tree content inorder
        """
        if self.left:
            self.left.print_tree()
        print self.data,
        if self.right:
            self.right.print_tree()

Print the tree level by level:

class Node:
    ...
    def print_each_level(self):
      # Start off with root node
      thislevel = [self]

      # While there is another level
      while thislevel:
        nextlevel = list()
        # Print all the nodes in the current level and collect the next level in a list
        for node in thislevel:
          print node.data,
          if node.left: nextlevel.append(node.left)
          if node.right: nextlevel.append(node.right)
        # End the line for this level, then move down one level
        print
        thislevel = nextlevel

Comparing two trees

class Node:
    ...
    def compare_trees(self, node):
        if node is None:
            return False
        if self.data != node.data:
            return False
        res = True
        if self.left is None:
            if node.left:
                return False
        else:
            res = self.left.compare_trees(node.left)
        if res is False:
            return False
        if self.right is None:
            if node.right:
                return False
        else:
            res = self.right.compare_trees(node.right)
        return res

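Rebuilding a binary tree

A tree can be rebuilt from its pre-order and in-order traversals. For the principle, see the post on deriving the post-order from the pre-order and in-order traversals (根据二叉树的前序和中序求后序):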

def rebuilt(preorder,inorder):
    if preorder=='' or inorder=='':
        return None
    root=preorder[0]
    index=inorder.index(root)
    return Node(root,
                rebuilt(preorder[1:1+index],inorder[0:index]),
                rebuilt(preorder[index+1:],inorder[index+1:]))

Rebuilding from the in-order and post-order traversals:

def rebuilt1(inorder,postorder):
    if postorder=='' or inorder=='':
        return None
    root=postorder[-1]
    index=inorder.index(root)
    return Node(root,
                rebuilt1(inorder[0:index],postorder[0:index]),
                rebuilt1(inorder[index+1:],postorder[index:-1]))
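As a worked example, the tree built earlier in this post (8, 3, 10, 1, 6, 4, 7, 14, 13 inserted in that order) has pre-order 8 3 1 6 4 7 10 14 13 and in-order 1 3 4 6 7 8 10 13 14. The self-contained sketch below rebuilds it from lists and reads back the post-order. TreeNode, rebuild, and postorder are illustrative names, and the values are assumed to be distinct.

```python
class TreeNode:
    """Minimal node used only for this example."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def rebuild(preorder, inorder):
    """Rebuild a tree with distinct values from its pre-order and in-order sequences."""
    if not preorder:
        return None
    root = preorder[0]                   # the first pre-order item is the root
    split = inorder.index(root)          # items left of it in-order form the left subtree
    return TreeNode(root,
                    rebuild(preorder[1:1 + split], inorder[:split]),
                    rebuild(preorder[1 + split:], inorder[split + 1:]))

def postorder(node):
    """Return the post-order sequence (left, right, root) as a list."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.data]

tree = rebuild([8, 3, 1, 6, 4, 7, 10, 14, 13], [1, 3, 4, 6, 7, 8, 10, 13, 14])
print(postorder(tree))
```

The position of the root in the in-order sequence gives the size of the left subtree, which tells how to cut the pre-order sequence into its left and right parts.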

References

二叉搜索树 (Binary search tree)
Binary Search Tree library in Python

Learning with Caffe in Python

2016-05-05T02:00:00.000Z

In this example we train a network from Python through the Solver interface.

Environment setup

from pylab import *
%matplotlib inline

caffe_root = '/home/ldy/workspace/caffe/'  # this file should be run from {caffe_root}/examples (otherwise change this line)

import sys
sys.path.insert(0, caffe_root + 'python')
import caffe

Download the training data and import it into lmdb

# run scripts from caffe root
import os
os.chdir(caffe_root)
# Download data
!data/mnist/get_mnist.sh
# Prepare data
!examples/mnist/create_mnist.sh
# back to examples
os.chdir('examples')

Downloading...
Creating lmdb...
I0505 20:49:32.535013 18388 db_lmdb.cpp:35] Opened lmdb examples/mnist/mnist_train_lmdb
I0505 20:49:32.535306 18388 convert_mnist_data.cpp:88] A total of 60000 items.
I0505 20:49:32.535323 18388 convert_mnist_data.cpp:89] Rows: 28 Cols: 28
I0505 20:49:32.547651 18388 db_lmdb.cpp:101] Doubling LMDB map size to 2MB ...
I0505 20:49:32.556696 18388 db_lmdb.cpp:101] Doubling LMDB map size to 4MB ...
I0505 20:49:32.578054 18388 db_lmdb.cpp:101] Doubling LMDB map size to 8MB ...
I0505 20:49:32.627709 18388 db_lmdb.cpp:101] Doubling LMDB map size to 16MB ...
I0505 20:49:32.718138 18388 db_lmdb.cpp:101] Doubling LMDB map size to 32MB ...
I0505 20:49:32.960189 18388 db_lmdb.cpp:101] Doubling LMDB map size to 64MB ...
I0505 20:49:33.271764 18388 convert_mnist_data.cpp:108] Processed 60000 files.
I0505 20:49:33.403015 18390 db_lmdb.cpp:35] Opened lmdb examples/mnist/mnist_test_lmdb
I0505 20:49:33.403692 18390 convert_mnist_data.cpp:88] A total of 10000 items.
I0505 20:49:33.403733 18390 convert_mnist_data.cpp:89] Rows: 28 Cols: 28
I0505 20:49:33.423638 18390 db_lmdb.cpp:101] Doubling LMDB map size to 2MB ...
I0505 20:49:33.439213 18390 db_lmdb.cpp:101] Doubling LMDB map size to 4MB ...
I0505 20:49:33.470553 18390 db_lmdb.cpp:101] Doubling LMDB map size to 8MB ...
I0505 20:49:33.525192 18390 db_lmdb.cpp:101] Doubling LMDB map size to 16MB ...
I0505 20:49:33.546480 18390 convert_mnist_data.cpp:108] Processed 10000 files.
Done.

Building the network

Define the network structure and save it as lenet_auto_train.prototxt (training network) and lenet_auto_test.prototxt (test network).

from caffe import layers as L, params as P

def lenet(lmdb, batch_size):
    # our version of LeNet: a series of linear and simple nonlinear transformations
    n = caffe.NetSpec()

    n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB, source=lmdb,
                             transform_param=dict(scale=1./255), ntop=2)

    n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20, weight_filler=dict(type='xavier'))
    n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
    n.pool2 = L.Pooling(n.conv2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.fc1 =   L.InnerProduct(n.pool2, num_output=500, weight_filler=dict(type='xavier'))
    n.relu1 = L.ReLU(n.fc1, in_place=True)
    n.score = L.InnerProduct(n.relu1, num_output=10, weight_filler=dict(type='xavier'))
    n.loss =  L.SoftmaxWithLoss(n.score, n.label)

    return n.to_proto()

with open('mnist/lenet_auto_train.prototxt', 'w') as f:
    f.write(str(lenet('mnist/mnist_train_lmdb', 64)))

with open('mnist/lenet_auto_test.prototxt', 'w') as f:
    f.write(str(lenet('mnist/mnist_test_lmdb', 100)))

Inspect the training network:

!cat mnist/lenet_auto_train.prototxt

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  transform_param {
    scale: 0.00392156862745
  }
  data_param {
    source: "mnist/mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 20
    kernel_size: 5
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  convolution_param {
    num_output: 50
    kernel_size: 5
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "fc1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "fc1"
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "fc1"
  top: "fc1"
}
layer {
  name: "score"
  type: "InnerProduct"
  bottom: "fc1"
  top: "score"
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "score"
  bottom: "label"
  top: "loss"
}

Inspect the solver parameters; the parameter file is already saved on local disk:

!cat mnist/lenet_auto_solver.prototxt

# The train/test net protocol buffer definition
train_net: "mnist/lenet_auto_train.prototxt"
test_net: "mnist/lenet_auto_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "mnist/lenet"
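With lr_policy "inv", Caffe computes the learning rate at a given iteration as base_lr * (1 + gamma * iter) ^ (-power). A small sketch of how the values in this solver file play out; the function name is illustrative and the defaults are simply the values listed above.

```python
def inv_learning_rate(iteration, base_lr=0.01, gamma=0.0001, power=0.75):
    """Learning rate under Caffe's "inv" policy: base_lr * (1 + gamma * iter) ** (-power)."""
    return base_lr * (1.0 + gamma * iteration) ** (-power)

for iteration in (0, 500, 5000, 10000):
    print("iteration %5d  lr %.6f" % (iteration, inv_learning_rate(iteration)))
```

The rate starts at base_lr and decays smoothly; by max_iter (10000) the factor is 2 ** -0.75, so the rate has dropped to roughly 60% of its initial value.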

Loading and checking the solver

caffe.set_device(0)
caffe.set_mode_gpu()

### load the solver and create train and test nets
solver = None  # ignore this workaround for lmdb data (can't instantiate two solvers on the same data)
solver = caffe.SGDSolver('mnist/lenet_auto_solver.prototxt')

Checking the network parameters

# each output is (batch size, feature dim, spatial dim)
[(k, v.data.shape) for k, v in solver.net.blobs.items()]

[('data', (64, 1, 28, 28)),
 ('label', (64,)),
 ('conv1', (64, 20, 24, 24)),
 ('pool1', (64, 20, 12, 12)),
 ('conv2', (64, 50, 8, 8)),
 ('pool2', (64, 50, 4, 4)),
 ('fc1', (64, 500)),
 ('score', (64, 10)),
 ('loss', ())]

# just print the weight sizes (we'll omit the biases)
[(k, v[0].data.shape) for k, v in solver.net.params.items()]

[('conv1', (20, 1, 5, 5)),
 ('conv2', (50, 20, 5, 5)),
 ('fc1', (500, 800)),
 ('score', (10, 500))]
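The blob and weight shapes above follow from simple size arithmetic, sketched here in plain Python. The helper name is illustrative; it uses floor division, and since every division in this network is exact, the rounding mode does not matter here.

```python
def output_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window (exact divisions assumed)."""
    return (size + 2 * pad - kernel) // stride + 1

size = 28                                          # MNIST images are 28x28
conv1 = output_size(size, kernel=5)                # 5x5 convolution, stride 1
pool1 = output_size(conv1, kernel=2, stride=2)     # 2x2 max pooling, stride 2
conv2 = output_size(pool1, kernel=5)
pool2 = output_size(conv2, kernel=2, stride=2)
fc1_inputs = 50 * pool2 * pool2                    # 50 channels leave conv2/pool2
print(conv1, pool1, conv2, pool2, fc1_inputs)
```

The spatial sizes 24, 12, 8, and 4 match the conv1, pool1, conv2, and pool2 blobs, and 50 * 4 * 4 = 800 is the input dimension of the fc1 weight matrix (500, 800).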

Before starting, check that the training and test networks actually contain our data:

solver.net.forward()  # train net
solver.test_nets[0].forward()  # test net (there can be more than one)

{'loss': array(2.3089799880981445, dtype=float32)}

# we use a little trick to tile the first eight images
imshow(solver.net.blobs['data'].data[:8, 0].transpose(1, 0, 2).reshape(28, 8*28), cmap='gray'); axis('off')
print 'train labels:', solver.net.blobs['label'].data[:8]

train labels: [ 5.  0.  4.  1.  9.  2.  1.  3.]

imshow(solver.test_nets[0].blobs['data'].data[:8, 0].transpose(1, 0, 2).reshape(28, 8*28), cmap='gray'); axis('off')
print 'test labels:', solver.test_nets[0].blobs['label'].data[:8]

test labels: [ 7.  2.  1.  0.  4.  1.  4.  9.]

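Training

First train on a single batch and see what happens:

solver.step(1)

After one step, check whether the filters of the first convolutional layer have been updated. The code below plots the update (diff) of the 20 filters as a 4×5 grid of 5×5 images:

imshow(solver.net.params['conv1'][0].diff[:, 0].reshape(4, 5, 5, 5)
       .transpose(0, 2, 1, 3).reshape(4*5, 5*5), cmap='gray'); axis('off')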

(-0.5, 24.5, 19.5, -0.5)

The plot above shows that the weights have been updated. While iterating we can record some quantities and use them to decide when to stop:

%%time
niter = 200
test_interval = 25
# losses will also be stored in the log
train_loss = zeros(niter)
test_acc = zeros(int(np.ceil(niter / test_interval)))
output = zeros((niter, 8, 10))

# the main solver loop
for it in range(niter):
    solver.step(1)  # SGD by Caffe

    # store the train loss
    train_loss[it] = solver.net.blobs['loss'].data

    # store the output on the first test batch
    # (start the forward pass at conv1 to avoid loading new data)
    solver.test_nets[0].forward(start='conv1')
    output[it] = solver.test_nets[0].blobs['score'].data[:8]

    # run a full test every so often
    # (Caffe can also do this for us and write to a log, but we show here
    #  how to do it directly in Python, where more complicated things are easier.)
    if it % test_interval == 0:
        print 'Iteration', it, 'testing...'
        correct = 0
        for test_it in range(100):
            solver.test_nets[0].forward()
            correct += sum(solver.test_nets[0].blobs['score'].data.argmax(1)
                           == solver.test_nets[0].blobs['label'].data)
        test_acc[it // test_interval] = correct / 1e4

Iteration 0 testing...
Iteration 25 testing...
Iteration 50 testing...
Iteration 75 testing...
Iteration 100 testing...
Iteration 125 testing...
Iteration 150 testing...
Iteration 175 testing...
CPU times: user 1min 15s, sys: 15.3 s, total: 1min 31s
Wall time: 1min 18s

Plot the train loss and test accuracy

_, ax1 = subplots()
ax2 = ax1.twinx()
ax1.plot(arange(niter), train_loss)
ax2.plot(test_interval * arange(len(test_acc)), test_acc, 'r')
ax1.set_xlabel('iteration')
ax1.set_ylabel('train loss')
ax2.set_ylabel('test accuracy')
ax2.set_title('Test Accuracy: {:.2f}'.format(test_acc[-1]))

<matplotlib.text.Text at 0x7feabeae91d0>

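Because we saved the scores for the first test batch at every iteration, we can watch how the predictions change. The plots below show, for each image, the score of every label as training proceeds. (Only one digit is shown here; the others look similar.)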

for i in range(8):
    figure(figsize=(2, 2))
    imshow(solver.test_nets[0].blobs['data'].data[i, 0], cmap='gray')
    figure(figsize=(10, 2))
    imshow(output[:50, i].T, interpolation='nearest', cmap='gray')
    xlabel('iteration')
    ylabel('label')

Trying a different network structure and solver

train_net_path = 'mnist/custom_auto_train.prototxt'
test_net_path = 'mnist/custom_auto_test.prototxt'
solver_config_path = 'mnist/custom_auto_solver.prototxt'

### define net
def custom_net(lmdb, batch_size):
    # define your own net!
    n = caffe.NetSpec()

    # keep this data layer for all networks
    n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB, source=lmdb,
                             transform_param=dict(scale=1./255), ntop=2)

    # EDIT HERE to try different networks
    # this single layer defines a simple linear classifier
    # (in particular this defines a multiway logistic regression)
    n.score =   L.InnerProduct(n.data, num_output=10, weight_filler=dict(type='xavier'))

    # EDIT HERE this is the LeNet variant we have already tried
    # n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20, weight_filler=dict(type='xavier'))
    # n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    # n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
    # n.pool2 = L.Pooling(n.conv2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    # n.fc1 =   L.InnerProduct(n.pool2, num_output=500, weight_filler=dict(type='xavier'))
    # EDIT HERE consider L.ELU or L.Sigmoid for the nonlinearity
    # n.relu1 = L.ReLU(n.fc1, in_place=True)
    # n.score =   L.InnerProduct(n.fc1, num_output=10, weight_filler=dict(type='xavier'))

    # keep this loss layer for all networks
    n.loss =  L.SoftmaxWithLoss(n.score, n.label)

    return n.to_proto()

with open(train_net_path, 'w') as f:
    f.write(str(custom_net('mnist/mnist_train_lmdb', 64)))    
with open(test_net_path, 'w') as f:
    f.write(str(custom_net('mnist/mnist_test_lmdb', 100)))

### define solver
from caffe.proto import caffe_pb2
s = caffe_pb2.SolverParameter()

# Set a seed for reproducible experiments:
# this controls for randomization in training.
s.random_seed = 0xCAFFE

# Specify locations of the train and (maybe) test networks.
s.train_net = train_net_path
s.test_net.append(test_net_path)
s.test_interval = 500  # Test after every 500 training iterations.
s.test_iter.append(100) # Test on 100 batches each time we test.

s.max_iter = 10000     # no. of times to update the net (training iterations)

# EDIT HERE to try different solvers
# solver types include "SGD", "Adam", and "Nesterov" among others.
s.type = "SGD"

# Set the initial learning rate for SGD.
s.base_lr = 0.01  # EDIT HERE to try different learning rates
# Set momentum to accelerate learning by
# taking weighted average of current and previous updates.
s.momentum = 0.9
# Set weight decay to regularize and prevent overfitting
s.weight_decay = 5e-4

# Set `lr_policy` to define how the learning rate changes during training.
# This is the same policy as our default LeNet.
s.lr_policy = 'inv'
s.gamma = 0.0001
s.power = 0.75
# EDIT HERE to try the fixed rate (and compare with adaptive solvers)
# `fixed` is the simplest policy that keeps the learning rate constant.
# s.lr_policy = 'fixed'

# Display the current training loss and accuracy every 1000 iterations.
s.display = 1000

# Snapshots are files used to store networks we've trained.
# We'll snapshot every 5K iterations -- twice during training.
s.snapshot = 5000
s.snapshot_prefix = 'mnist/custom_net'

# Train on the GPU
s.solver_mode = caffe_pb2.SolverParameter.GPU

# Write the solver to a temporary file and return its filename.
with open(solver_config_path, 'w') as f:
    f.write(str(s))

### load the solver and create train and test nets
solver = None  # ignore this workaround for lmdb data (can't instantiate two solvers on the same data)
solver = caffe.get_solver(solver_config_path)

### solve
niter = 250  # EDIT HERE increase to train for longer
test_interval = niter / 10
# losses will also be stored in the log
train_loss = zeros(niter)
test_acc = zeros(int(np.ceil(niter / test_interval)))

# the main solver loop
for it in range(niter):
    solver.step(1)  # SGD by Caffe

    # store the train loss
    train_loss[it] = solver.net.blobs['loss'].data

    # run a full test every so often
    # (Caffe can also do this for us and write to a log, but we show here
    #  how to do it directly in Python, where more complicated things are easier.)
    if it % test_interval == 0:
        print 'Iteration', it, 'testing...'
        correct = 0
        for test_it in range(100):
            solver.test_nets[0].forward()
            correct += sum(solver.test_nets[0].blobs['score'].data.argmax(1)
                           == solver.test_nets[0].blobs['label'].data)
        test_acc[it // test_interval] = correct / 1e4

_, ax1 = subplots()
ax2 = ax1.twinx()
ax1.plot(arange(niter), train_loss)
ax2.plot(test_interval * arange(len(test_acc)), test_acc, 'r')
ax1.set_xlabel('iteration')
ax1.set_ylabel('train loss')
ax2.set_ylabel('test accuracy')
ax2.set_title('Custom Test Accuracy: {:.2f}'.format(test_acc[-1]))
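
The solver above uses the 'inv' learning-rate policy with gamma = 0.0001 and power = 0.75. As a rough sketch of what that schedule does, the function below evaluates the decay formula documented for Caffe's inv policy, base_lr * (1 + gamma * iter) ^ (-power). The function name inv_lr is only illustrative and is not part of Caffe.

```python
def inv_lr(base_lr, gamma, power, iteration):
    """Learning rate under Caffe's 'inv' policy at a given iteration."""
    return base_lr * (1.0 + gamma * iteration) ** (-power)

# Values taken from the solver settings above.
for it in (0, 5000, 10000):
    print(it, inv_lr(0.01, 0.0001, 0.75, it))
```

With these settings the rate decays smoothly and is a little under 60% of its initial value by iteration 10000.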

References

Solving in Python with LeNet

Classification with Caffenet

2016-05-03T02:00:00.000Z

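Here we classify images directly with a pre-trained CaffeNet model in Caffe. There are many tutorials on installing Caffe; the tutorial 千秋轻松装Caffe教程（含CUDA 7.0和CuDNN） covers it in detail. The most tedious part is installing CUDA, for which you can refer to Deepin CUDA Install and Run Keras on GPU. One big pitfall is cuDNN: first check whether your GPU supports it, since cuDNN requires a GPU with compute capability 3.0 or higher. You can look up your GPU's compute capability, and whether it supports CUDA at all, at http://developer.nvidia.com/cuda-gpus. If your GPU supports CUDA but not cuDNN, comment out USE_CUDNN := 1 and CPU_ONLY := 1 in the build configuration file and Caffe will still use CUDA. If your GPU supports both CUDA and cuDNN, make sure your cuDNN version is one that your Caffe release supports; see http://caffe.berkeleyvision.org/installation.html.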

Here we compare how fast the network runs in CPU and GPU mode, and look at how the model extracts features.

Setting up the environment

Import numpy and matplotlib

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# set display defaults
plt.rcParams['figure.figsize'] = (10, 10)        # large images
plt.rcParams['image.interpolation'] = 'nearest'  # don't interpolate: show square pixels
plt.rcParams['image.cmap'] = 'gray'  # use grayscale output rather than a (potentially misleading) color heatmap

Import caffe, making sure the Caffe path is set correctly

import sys
caffe_root='/home/ldy/workspace/caffe/'  # set this to your Caffe installation directory
sys.path.insert(0,caffe_root+'python')
import caffe                            # import caffe

The first run needs network access to download the model

import os
if os.path.isfile(caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'):
    print 'CaffeNet found.'
else:
    print 'Downloading pre-trained CaffeNet model...'
    !/home/ldy/workspace/caffe/scripts/download_model_binary.py /home/ldy/workspace/caffe/models/bvlc_reference_caffenet

CaffeNet found.

Setting up the network and input preprocessing

Set CPU mode and load the network from disk

caffe.set_mode_cpu()

model_def = caffe_root + 'models/bvlc_reference_caffenet/deploy.prototxt'
model_weights = caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'

net = caffe.Net(model_def,      # defines the structure of the model
                model_weights,  # contains the trained weights
                caffe.TEST)     # use test mode (e.g., don't perform dropout)

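Set up input preprocessing

CaffeNet expects input images in BGR channel order, with pixel values in [0, 255] and the ImageNet mean pixel value subtracted, and with the channel dimension first.

matplotlib loads images in RGB order, with pixel values in [0, 1] and the channel dimension last, so we need to convert.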

# load the mean ImageNet image (as distributed with Caffe) for subtraction
mu = np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy')
mu = mu.mean(1).mean(1)  # average over pixels to obtain the mean (BGR) pixel values
print 'mean-subtracted values:', zip('BGR', mu)

# create transformer for the input called 'data'
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})

transformer.set_transpose('data', (2,0,1))  # move image channels to outermost dimension
transformer.set_mean('data', mu)            # subtract the dataset-mean value in each channel
transformer.set_raw_scale('data', 255)      # rescale from [0, 1] to [0, 255]
transformer.set_channel_swap('data', (2,1,0))  # swap channels from RGB to BGR

mean-subtracted values: [('B', 104.0069879317889), ('G', 116.66876761696767), ('R', 122.6789143406786)]
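
To make the conversion concrete, here is a minimal sketch of what the transformer settings above do to a single pixel: swap RGB to BGR, rescale from [0, 1] to [0, 255], then subtract the per-channel mean. The helper preprocess_pixel is hypothetical (it is not part of caffe.io), the mean values are rounded from the output above, and the transpose step is omitted because it only applies to whole images.

```python
def preprocess_pixel(rgb, mean_bgr, raw_scale=255.0):
    """Convert one RGB pixel in [0, 1] to a mean-subtracted BGR pixel in [0, 255]."""
    bgr = rgb[::-1]                                   # channel swap: RGB -> BGR
    scaled = [raw_scale * v for v in bgr]             # rescale to [0, 255]
    return [s - m for s, m in zip(scaled, mean_bgr)]  # subtract per-channel mean

mean_bgr = [104.0, 116.7, 122.7]  # rounded from the mean-subtracted values above
print(preprocess_pixel([1.0, 0.5, 0.0], mean_bgr))
```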

Classification in CPU mode

Set the input size

# set the size of the input (we can skip this if we're happy
#  with the default; we can also change it later, e.g., for different batch sizes)
net.blobs['data'].reshape(50,        # batch size
                          3,         # 3-channel (BGR) images
                          227, 227)  # image size is 227x227

Load an image and preprocess it

image = caffe.io.load_image(caffe_root + 'examples/images/cat.jpg')
transformed_image = transformer.preprocess('data', image)
plt.imshow(image)

<matplotlib.image.AxesImage at 0x7f7ba44f0a50>

Run the classification

# copy the image data into the memory allocated for the net
net.blobs['data'].data[...] = transformed_image

### perform classification
output = net.forward()

output_prob = output['prob'][0]  # the output probability vector for the first image in the batch

print 'predicted class is:', output_prob.argmax()

predicted class is: 281

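From the output above, the predicted class for the input image is index 281, but we don't yet know which label that corresponds to. Next we load the ImageNet labels (this needs network access the first time).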

# load ImageNet labels
labels_file = caffe_root + 'data/ilsvrc12/synset_words.txt'
if not os.path.exists(labels_file):
    !/home/ldy/workspace/caffe/data/ilsvrc12/get_ilsvrc_aux.sh

labels = np.loadtxt(labels_file, str, delimiter='\t')

print 'output label:', labels[output_prob.argmax()]

Downloading...
--2016-05-03 10:54:43--  http://dl.caffe.berkeleyvision.org/caffe_ilsvrc12.tar.gz
Resolving dl.caffe.berkeleyvision.org (dl.caffe.berkeleyvision.org)... 169.229.222.251
Connecting to dl.caffe.berkeleyvision.org (dl.caffe.berkeleyvision.org)|169.229.222.251|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 17858008 (17M) [application/octet-stream]
Saving to: 'caffe_ilsvrc12.tar.gz'

caffe_ilsvrc12.tar. 100%[===================>]  17.03M  2.54MB/s    in 8.9s    

2016-05-03 10:54:53 (1.91 MB/s) - 'caffe_ilsvrc12.tar.gz' saved [17858008/17858008]

Unzipping...
Done.
output label: n02123045 tabby, tabby cat

The top prediction is `tabby cat`. To see the other likely classes:

# sort top five predictions from softmax output
top_inds = output_prob.argsort()[::-1][:5]  # reverse sort and take five largest items

print 'probabilities and labels:'
zip(output_prob[top_inds], labels[top_inds])

probabilities and labels:
[(0.31243625, 'n02123045 tabby, tabby cat'),
 (0.23797135, 'n02123159 tiger cat'),
 (0.12387258, 'n02124075 Egyptian cat'),
 (0.10075716, 'n02119022 red fox, Vulpes vulpes'),
 (0.070957333, 'n02127052 lynx, catamount')]
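
For readers without numpy at hand, the top-five selection can be sketched in plain Python: sort the class indices by probability in descending order and keep the first k. The function top_k and the toy data below are made up for illustration only.

```python
def top_k(probs, labels, k=5):
    """Return the k (probability, label) pairs with the highest probability."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(probs[i], labels[i]) for i in order[:k]]

# Hypothetical toy distribution over four classes.
probs = [0.1, 0.6, 0.05, 0.25]
labels = ['fox', 'tabby', 'lynx', 'tiger cat']
print(top_k(probs, labels, k=2))
```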

Switching to GPU mode

Measure the time taken in CPU mode

%timeit net.forward()

1 loop, best of 3: 8.87 s per loop

Switch to GPU mode and measure the time again

#caffe.set_device(0)  # if we have multiple GPUs, pick the first one
caffe.set_mode_gpu()
net.forward()  # run once before timing to set up memory
%timeit net.forward()

1 loop, best of 3: 2.27 s per loop

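Inspecting intermediate outputs

A neural network is not just a black box; we can inspect some of its intermediate results and parameters.

First look at the shapes of the activations, in the format (batch_size, channel_dim, height, width).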

# for each layer, show the output shape
for layer_name, blob in net.blobs.iteritems():
    print layer_name + '\t' + str(blob.data.shape)

data    (50, 3, 227, 227)
conv1    (50, 96, 55, 55)
pool1    (50, 96, 27, 27)
norm1    (50, 96, 27, 27)
conv2    (50, 256, 27, 27)
pool2    (50, 256, 13, 13)
norm2    (50, 256, 13, 13)
conv3    (50, 384, 13, 13)
conv4    (50, 384, 13, 13)
conv5    (50, 256, 13, 13)
pool5    (50, 256, 6, 6)
fc6    (50, 4096)
fc7    (50, 4096)
fc8    (50, 1000)
prob    (50, 1000)
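
The spatial sizes in this listing can be reproduced with the usual output-size formulas. The sketch below assumes the strides and paddings of the standard CaffeNet definition (conv1 with stride 4, 3x3 pooling with stride 2, padding 2 for conv2 and 1 for conv3 to conv5), which are not shown in this post, and it assumes that Caffe's pooling rounds up while convolution rounds down. The helper names are illustrative.

```python
import math

def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution (rounds down)."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel, stride):
    """Spatial output size of a pooling layer that rounds up."""
    return int(math.ceil((size - kernel) / float(stride))) + 1

conv1 = conv_out(227, kernel=11, stride=4)
pool1 = pool_out(conv1, kernel=3, stride=2)
conv2 = conv_out(pool1, kernel=5, pad=2)
pool2 = pool_out(conv2, kernel=3, stride=2)
conv5 = conv_out(pool2, kernel=3, pad=1)   # conv3 and conv4 behave the same way
pool5 = pool_out(conv5, kernel=3, stride=2)
print(conv1, pool1, conv2, pool2, conv5, pool5)
```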

Next look at the shapes of the parameters. Weights have the format (output_channels, input_channels, filter_height, filter_width) and biases have the format (output_channels,).

for layer_name, param in net.params.iteritems():
    print layer_name + '\t' + str(param[0].data.shape), str(param[1].data.shape)

conv1    (96, 3, 11, 11) (96,)
conv2    (256, 48, 5, 5) (256,)
conv3    (384, 256, 3, 3) (384,)
conv4    (384, 192, 3, 3) (384,)
conv5    (256, 192, 3, 3) (256,)
fc6    (4096, 9216) (4096,)
fc7    (4096, 4096) (4096,)
fc8    (1000, 4096) (1000,)
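
The input-channel counts of conv2, conv4 and conv5 are half of what the preceding layer produces. A likely explanation, assuming the standard CaffeNet definition in which these layers use group = 2, is that each filter only sees half of the input channels. The sketch below works through the shapes under that assumption; conv_weight_shape is a hypothetical helper.

```python
def conv_weight_shape(num_output, in_channels, kernel, group=1):
    """Weight shape (out, in / group, k, k) of a grouped convolution."""
    return (num_output, in_channels // group, kernel, kernel)

conv2 = conv_weight_shape(256, 96, 5, group=2)    # input: 96 channels from norm1
conv3 = conv_weight_shape(384, 256, 3)            # input: 256 channels from norm2
conv4 = conv_weight_shape(384, 384, 3, group=2)   # input: 384 channels from conv3
print(conv2, conv3, conv4)

# fc6 sees the flattened pool5 output: 256 channels of 6 x 6.
fc6_inputs = 256 * 6 * 6
print(fc6_inputs)
```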

Visualizing the outputs

def vis_square(data):
    """Take an array of shape (n, height, width) or (n, height, width, 3)
       and visualize each (height, width) thing in a grid of size approx. sqrt(n) by sqrt(n)"""

    # normalize data for display
    data = (data - data.min()) / (data.max() - data.min())

    # force the number of filters to be square
    n = int(np.ceil(np.sqrt(data.shape[0])))
    padding = (((0, n ** 2 - data.shape[0]),
               (0, 1), (0, 1))                 # add some space between filters
               + ((0, 0),) * (data.ndim - 3))  # don't pad the last dimension (if there is one)
    data = np.pad(data, padding, mode='constant', constant_values=1)  # pad with ones (white)

    # tile the filters into an image
    data = data.reshape((n, n) + data.shape[1:]).transpose((0, 2, 1, 3) + tuple(range(4, data.ndim + 1)))
    data = data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])

    plt.imshow(data); plt.axis('off')

First-layer convolution filters

# the parameters are a list of [weights, biases]
filters = net.params['conv1'][0].data
vis_square(filters.transpose(0, 2, 3, 1))

Output of the first convolution layer

feat = net.blobs['conv1'].data[0, :36]
vis_square(feat)

Output after the fifth-layer pooling

feat = net.blobs['pool5'].data[0]
vis_square(feat)

Output of the first fully connected layer

feat = net.blobs['fc6'].data[0]
plt.subplot(2, 1, 1)
plt.plot(feat.flat)
plt.subplot(2, 1, 2)
_ = plt.hist(feat.flat[feat.flat > 0], bins=100)

Final class probability output

feat = net.blobs['prob'].data[0]
plt.figure(figsize=(15, 3))
plt.plot(feat.flat)

[<matplotlib.lines.Line2D at 0x7f7ba0177d10>]

Classifying your own image

Just set the URL of the image.

# download an image
#my_image_url = "..."  # paste your URL here
# for example:
my_image_url = "https://upload.wikimedia.org/wikipedia/commons/b/be/Orang_Utan%2C_Semenggok_Forest_Reserve%2C_Sarawak%2C_Borneo%2C_Malaysia.JPG"
!wget -O image.jpg $my_image_url

# transform it and copy it into the net
image = caffe.io.load_image('image.jpg')
net.blobs['data'].data[...] = transformer.preprocess('data', image)

# perform classification
net.forward()

# obtain the output probabilities
output_prob = net.blobs['prob'].data[0]

# sort top five predictions from softmax output
top_inds = output_prob.argsort()[::-1][:5]

plt.imshow(image)

print 'probabilities and labels:'
zip(output_prob[top_inds], labels[top_inds])

--2016-05-03 11:23:33--  https://upload.wikimedia.org/wikipedia/commons/b/be/Orang_Utan%2C_Semenggok_Forest_Reserve%2C_Sarawak%2C_Borneo%2C_Malaysia.JPG
Resolving upload.wikimedia.org (upload.wikimedia.org)... 2620:0:863:ed1a::2:b, 2620:0:863:ed1a::2:b, 198.35.26.112, ...
Connecting to upload.wikimedia.org (upload.wikimedia.org)|2620:0:863:ed1a::2:b|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1443340 (1.4M) [image/jpeg]
Saving to: 'image.jpg'

image.jpg           100%[===================>]   1.38M  1.41MB/s    in 1.0s    

2016-05-03 11:23:35 (1.41 MB/s) - 'image.jpg' saved [1443340/1443340]

probabilities and labels:

[(0.9680779, 'n02480495 orangutan, orang, orangutang, Pongo pygmaeus'),
 (0.030589299, 'n02492660 howler monkey, howler'),
 (0.00085892546, 'n02493509 titi, titi monkey'),
 (0.00015429084, 'n02493793 spider monkey, Ateles geoffroyi'),
 (7.2596376e-05, 'n02488291 langur')]

References

Classification: Instant Recognition with Caffe

Vim Cheat Sheet

2016-04-23T03:00:00.000Z

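Basic operations

The cursor can be moved through the text with either the arrow keys or the hjkl keys.
h (left) j (down) k (up) l (right)
To start Vim from the shell prompt, type: vim FILENAME <ENTER>
To exit Vim, type :q! <ENTER> to discard all changes, or :wq <ENTER> to save the changes.
To delete the character under the cursor in Normal mode, press: x
To insert or append text, type:

i type the inserted text <ESC> inserts text before the cursor
A type the appended text <ESC> appends text at the end of the line

Note: pressing <ESC> returns you to Normal mode or cancels an unwanted or partially completed command.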

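Deletion commands

To delete from the cursor up to the next word, type: dw
To delete from the cursor to the end of the line, type: d$
To delete a whole line, type: dd
To repeat a motion, prepend it with a number: 2w
The format of a change command in Normal mode is:
```
operator   [number]   motion
```
where:
operator - what to do, such as d for delete
[number] - an optional count that repeats the motion
motion - moves over the text to operate on, such as w (word) or $ (end of line)
To move to the start of the line, press zero: 0
To undo previous actions, type: u (lowercase u)
To undo all the changes made on a line, type: U (capital U)
To undo the undos and restore the earlier result, type: CTRL-R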

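Put and change commands

To put back text that has just been deleted, press p. This puts the deleted text after the cursor; if a whole line was deleted, it goes on the line below the cursor.
To replace the character under the cursor, type r followed by the character you want there.
The change operator lets you change the text from the cursor to where the motion takes you. For example, ce changes from the cursor to the end of the word, and c$ changes from the cursor to the end of the line.
The format of a change command is:
```
c   [number]   motion
```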

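Location and file status

CTRL-G displays the cursor position and the file status.
G moves the cursor to the last line of the file.
A line number followed by G moves the cursor to that line.
gg moves the cursor to the first line of the file.
Typing / followed by a string searches forward for that string in the current document.
Typing ? followed by a string searches backward for that string.
After a search, press n to repeat it and find the next match in the same direction, or N to search in the opposite direction.
CTRL-O takes you back to older positions, CTRL-I to newer positions.
With the cursor on a (, ), [, ], { or }, pressing % moves it to the matching bracket.
To replace the first old in a line with new, type :s/old/new
To replace every old in a line with new, type :s/old/new/g
To replace every old with new between two line numbers, type :#,#s/old/new/g
To replace every old with new in the whole file, type :%s/old/new/g
To be asked for confirmation on each replacement, add the c flag: :%s/old/new/gc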
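
The difference between :s/old/new and :s/old/new/g is easy to miss, so here is a small analogy in Python rather than Vim itself: re.sub with count=1 behaves like the form without the g flag, and re.sub without a count behaves like the form with it. This is only meant to illustrate the idea; Vim's pattern syntax differs from Python's for anything beyond literal text.

```python
import re

line = "old wine in old bottles"

first_only = re.sub("old", "new", line, count=1)  # like :s/old/new
every_match = re.sub("old", "new", line)          # like :s/old/new/g

print(first_only)
print(every_match)
```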

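Executing external commands from within Vim

:!command executes the external command.

Some practical examples:
:!dir or :!ls - lists the contents of the current directory.
:!del FILENAME or :!rm FILENAME - deletes the file named FILENAME.
:w FILENAME writes the file currently being edited to a file named FILENAME.
v motion :w FILENAME saves the text selected in Visual mode to the file FILENAME.
:r FILENAME reads the file FILENAME from disk and inserts it below the cursor position.
:r !dir reads the output of the dir command and inserts it below the cursor position.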

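Open commands

Type o to open a new line below the cursor and enter Insert mode.
Type O to open a new line above the cursor.
Type a to insert text after the cursor.
Type A to insert text after the end of the line.
The e command moves the cursor to the end of a word.
The y operator copies (yanks) text; p pastes the text copied earlier.
Typing R enters Replace mode until <ESC> is pressed to return to Normal mode.
Typing :set xxx sets the option xxx. Some useful options are:
'ic' 'ignorecase' ignore upper and lower case when searching
'is' 'incsearch' show partial matches for a search phrase
'hls' 'hlsearch' highlight all matching phrases
Option names can be given in their full or abbreviated form.
Prepend no to switch an option off: :set noic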

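Getting help

Type :help or press <F1> or <HELP> to open a help window.
Type :help cmd to find help on the command cmd.
Type CTRL-W CTRL-W to jump between windows.
Type :q to close the help window.
You can create a vimrc startup script to keep your preferred settings.
When typing a : command, press CTRL-D to see possible completions. Press <TAB> to use one completion.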

Deepin CUDA Install and Run Keras on GPU

2016-04-09T03:00:00.000Z

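About Deepin

Deepin is a Linux distribution developed by Wuhan Deepin Technology Co., Ltd. It aims to provide a stable, efficient operating system for everyone, with an emphasis on security, ease of use, and good looks. Its slogan is "spare beginners the pain, save veterans' time".

Installing CUDA

Download

Download the CUDA version that matches your system from https://developer.nvidia.com/cuda-downloads

Install

When running the installer, be sure to add --override; otherwise it fails with the error "Toolkit: Installation Failed. Using unsupported Compiler."

chmod 755 cuda_7.5.18_linux.run
sudo ./cuda_7.5.18_linux.run --override

If your machine already has an NVIDIA driver newer than the one bundled with CUDA, do not choose to install the NVIDIA driver during installation.

The answers given during installation were as follows: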

-------------------------------------------------------------
Do you accept the previously read EULA? (accept/decline/quit): accept
You are attempting to install on an unsupported configuration. Do you wish to continue? ((y)es/(n)o) [ default is no ]: y
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 352.39? ((y)es/(n)o/(q)uit): n
Install the CUDA 7.5 Toolkit? ((y)es/(n)o/(q)uit): y
Enter Toolkit Location [ default is /usr/local/cuda-7.5 ]:
Do you want to install a symbolic link at /usr/local/cuda? ((y)es/(n)o/(q)uit): y
Install the CUDA 7.5 Samples? ((y)es/(n)o/(q)uit): y
Enter CUDA Samples Location [ default is /home/kinghorn ]: /usr/local/cuda-7.5
Installing the CUDA Toolkit in /usr/local/cuda-7.5 ...
Finished copying samples.

===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-7.5
Samples:  Installed in /usr/local/cuda-7.5

Environment setup

Open ~/.bashrc

gedit ~/.bashrc

Add the following two lines:


export PATH=$PATH:/usr/local/cuda/bin

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64

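Forcing CUDA to use gcc 5

CUDA 7.5 refuses by default to compile with gcc versions later than 4.9. Commenting out the line that raises the error forces it to accept gcc 5.

sudo gedit /usr/local/cuda/include/host_config.h

// comment out line 115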
//#error -- unsupported GNU version! gcc versions later than 4.9 are not supported!

Running a built-in CUDA sample

This checks whether the installation succeeded.

Go to the built-in sample

cd /usr/local/cuda/samples/1_Utilities/deviceQuery

Build

make

Run

./deviceQuery

Result:

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GT 520M"
 CUDA Driver Version / Runtime Version          8.0 / 7.5
 CUDA Capability Major/Minor version number:    2.1
 Total amount of global memory:                 1024 MBytes (1073414144 bytes)
 ( 1) Multiprocessors, ( 48) CUDA Cores/MP:     48 CUDA Cores
 GPU Max Clock rate:                            1480 MHz (1.48 GHz)
 Memory Clock rate:                             800 Mhz
 Memory Bus Width:                              64-bit
 L2 Cache Size:                                 65536 bytes
 Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
 Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
 Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
 Total amount of constant memory:               65536 bytes
 Total amount of shared memory per block:       49152 bytes
 Total number of registers available per block: 32768
 Warp size:                                     32
 Maximum number of threads per multiprocessor:  1536
 Maximum number of threads per block:           1024
 Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
 Max dimension size of a grid size    (x,y,z): (65535, 65535, 65535)
 Maximum memory pitch:                          2147483647 bytes
 Texture alignment:                             512 bytes
 Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
 Run time limit on kernels:                     No
 Integrated GPU sharing Host Memory:            No
 Support host page-locked memory mapping:       Yes
 Alignment requirement for Surfaces:            Yes
 Device has ECC support:                        Disabled
 Device supports Unified Addressing (UVA):      Yes
 Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
 Compute Mode:
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = GeForce GT 520M
Result = PASS

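If compilation fails, check that gcc 5 has been forced as described above. If the result is FAIL, no GPU was detected; the fix is to upgrade your NVIDIA driver so that it is at least as new as the version bundled with CUDA.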
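
The deviceQuery output also tells you whether the card can use cuDNN, which according to the CaffeNet post on this blog needs compute capability 3.0 or higher. As a small sketch, the helper below (compute_capability, a made-up name) pulls the capability out of the deviceQuery text with a regular expression and compares it with that threshold.

```python
import re

def compute_capability(device_query_output):
    """Extract (major, minor) compute capability from deviceQuery output, or None."""
    match = re.search(
        r"CUDA Capability Major/Minor version number:\s*(\d+)\.(\d+)",
        device_query_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

# One line copied from the deviceQuery output above.
sample = " CUDA Capability Major/Minor version number:    2.1"
cap = compute_capability(sample)
print(cap, "cuDNN-capable" if cap >= (3, 0) else "CUDA only")
```

For the GeForce GT 520M shown above the capability is 2.1, so CUDA works but cuDNN is not available.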

Running Keras in GPU mode

Method 1

Run your script with the following command line

THEANO_FLAGS=device=gpu,floatX=float32 python my_keras_script.py

Method 2

Edit the $HOME/.theanorc file

Add the following content:

[global]
floatX = float32
device = gpu

[lib]
cnmem = 0.9

[cuda]
root = /usr/local/cuda
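
Since .theanorc uses INI syntax, a quick way to double-check the file before launching Theano is to parse it with the Python 3 standard-library configparser module. This is only a sanity-check sketch, not how Theano itself reads the file; the configuration is inlined as a string here so that the example is self-contained.

```python
import configparser

THEANORC = """
[global]
floatX = float32
device = gpu

[lib]
cnmem = 0.9

[cuda]
root = /usr/local/cuda
"""

config = configparser.ConfigParser()
config.read_string(THEANORC)  # in practice, read the file at ~/.theanorc instead

print(config.get('global', 'device'))
print(config.getfloat('lib', 'cnmem'))
print(config.get('cuda', 'root'))
```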

Method 3

Add the following code at the top of your script:

import theano
theano.config.device = 'gpu'
theano.config.floatX = 'float32'

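Let's run imdb_cnn.py, one of the Keras examples, which does sentiment analysis on movie reviews. The first run needs network access to download the dataset.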

'''This example demonstrates the use of Convolution1D for text classification.
Run on GPU: THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python imdb_cnn.py
Get to 0.835 test accuracy after 2 epochs. 100s/epoch on K520 GPU.
'''

from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.embeddings import Embedding
from keras.layers.convolutional import Convolution1D, MaxPooling1D
from keras.datasets import imdb


# set parameters:
max_features = 5000
maxlen = 100
batch_size = 32
embedding_dims = 100
nb_filter = 250
filter_length = 3
hidden_dims = 250
nb_epoch = 2

print('Loading data...')
(X_train, y_train), (X_test, y_test) = imdb.load_data(nb_words=max_features,
                                                      test_split=0.2)
print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')

print('Pad sequences (samples x time)')
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)

print('Build model...')
model = Sequential()

# we start off with an efficient embedding layer which maps
# our vocab indices into embedding_dims dimensions
model.add(Embedding(max_features, embedding_dims, input_length=maxlen))
model.add(Dropout(0.25))

# we add a Convolution1D, which will learn nb_filter
# word group filters of size filter_length:
model.add(Convolution1D(nb_filter=nb_filter,
                        filter_length=filter_length,
                        border_mode='valid',
                        activation='relu',
                        subsample_length=1))
# we use standard max pooling (halving the output of the previous layer):
model.add(MaxPooling1D(pool_length=2))

# We flatten the output of the conv layer,
# so that we can add a vanilla dense layer:
model.add(Flatten())

# We add a vanilla hidden layer:
model.add(Dense(hidden_dims))
model.add(Dropout(0.25))
model.add(Activation('relu'))

# We project onto a single unit output layer, and squash it with a sigmoid:
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop')
model.fit(X_train, y_train, batch_size=batch_size,
          nb_epoch=nb_epoch, show_accuracy=True,
          validation_data=(X_test, y_test))

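On a K520 GPU this example takes 100 s per epoch. My GPU is a GeForce GT 520M, which takes about 175 s per epoch. That is still far faster than the CPU: on my four-year-old machine the CPU needs close to an hour.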
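
To see how the sequence length changes through the model in imdb_cnn.py, the sketch below works through the arithmetic for the convolution, pooling and flatten steps, using the parameter values from the script. It assumes that 'valid' convolution drops filter_length - 1 positions and that max pooling with pool_length 2 uses a stride of 2; the helper name is illustrative.

```python
def conv1d_valid_len(length, filter_length, stride=1):
    """Output length of a 1D convolution with 'valid' borders."""
    return (length - filter_length) // stride + 1

maxlen, nb_filter, filter_length, pool_length = 100, 250, 3, 2

conv_len = conv1d_valid_len(maxlen, filter_length)  # positions after the convolution
pool_len = conv_len // pool_length                  # positions after max pooling
flat_dim = pool_len * nb_filter                     # inputs to the first Dense layer
print(conv_len, pool_len, flat_dim)
```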

References

NVIDIA CUDA with Ubuntu 16.04 beta on a laptop

Keras FAQ

Keras Introduction

2016-04-07T03:00:00.000Z

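About Keras

Keras is a highly modular neural network library written in Python that runs on top of TensorFlow or Theano. Its main strengths are its wealth of examples and its well-packaged implementations of mainstream models. More complex models can be assembled like building blocks, which makes it well suited to building models quickly.

Installation:

sudo pip install keras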

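Basic modules in Keras

optimizers

Keras includes many optimization methods, such as the commonly used stochastic gradient descent (SGD), as well as Adagrad, Adadelta, RMSprop, and Adam. The code below shows how to use an optimizer.
An optimizer is one of the two arguments required to compile a Keras model (the other is the loss function):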

from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import SGD

model = Sequential()
model.add(Dense(64, init='uniform', input_dim=10))
model.add(Activation('tanh'))
model.add(Activation('softmax'))

sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)

This example instantiates the optimizer before calling compile. We can also select an optimizer with its default parameters by passing its name:

# pass optimizer by name: default parameters will be used
model.compile(loss='mean_squared_error', optimizer='sgd')

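SGD (stochastic gradient descent, usually a good trade-off between cost and results)

keras.optimizers.SGD(lr=0.01, momentum=0., decay=0., nesterov=False)

Parameters:

lr: float >= 0, learning rate
momentum: float >= 0, momentum applied to parameter updates
decay: float >= 0, learning rate decay applied after each update
nesterov: Boolean, whether to use Nesterov momentum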
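
How the four parameters interact is easier to see in code. The sketch below is a plain-Python version of one SGD step for a single scalar parameter, written from my understanding of the update rule used by Keras's SGD at the time (time-based decay of the learning rate, a velocity term, and an optional Nesterov look-ahead). Treat it as an approximation for illustration rather than the library's exact implementation; sgd_step is a made-up name.

```python
def sgd_step(param, grad, velocity, iteration,
             lr=0.01, momentum=0.0, decay=0.0, nesterov=False):
    """One SGD update for a single scalar parameter; returns (param, velocity)."""
    lr_t = lr / (1.0 + decay * iteration)         # time-based learning-rate decay
    velocity = momentum * velocity - lr_t * grad  # update the velocity
    if nesterov:
        param = param + momentum * velocity - lr_t * grad
    else:
        param = param + velocity
    return param, velocity

# Minimize f(x) = x^2 (gradient 2x) starting from x = 1.
x, v = 1.0, 0.0
for it in range(100):
    x, v = sgd_step(x, 2.0 * x, v, it, lr=0.1, momentum=0.9, nesterov=True)
print(x)
```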

objectives

The objectives module. Keras provides the following objective functions: mean_squared_error, mean_absolute_error, squared_hinge, hinge, binary_crossentropy, and categorical_crossentropy.

binary_crossentropy and categorical_crossentropy are what is commonly called log loss.
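
As a reminder of what log loss measures, here is a minimal plain-Python sketch of binary cross-entropy averaged over a batch. It is written from the textbook definition, not copied from Keras; the clipping constant eps is an assumption added to avoid taking log(0).

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean log loss over paired lists of 0/1 labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

print(binary_crossentropy([1, 0], [0.9, 0.1]))  # confident and correct: small loss
print(binary_crossentropy([1, 0], [0.1, 0.9]))  # confident and wrong: large loss
```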

Activations

The activations module. Keras provides linear, sigmoid, hard_sigmoid, tanh, softplus, and relu; softmax also lives in the Activations module. Newer activation functions such as LeakyReLU and PReLU are provided in the keras.layers.advanced_activations module.

initializations

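Weight initialization. Initializing a weight matrix in Keras is simple: when adding a layer, specify the distribution used to initialize it. For example:

# init is the keyword argument; 'uniform' initializes from a uniform distribution
model.add(Dense(64, init='uniform'))

Keras provides uniform, lecun_uniform, normal, orthogonal, zero, glorot_normal, and he_normal.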

regularizers

Deep networks overfit easily; regularization helps prevent overfitting and improves generalization.

Example usage:

from keras.regularizers import l2, activity_l2
model.add(Dense(64, input_dim=64, W_regularizer=l2(0.01), activity_regularizer=activity_l2(0.01)))

constraints

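Besides regularization, Keras also supports constraints, which restrict the values that network parameters may take during optimization, for example capping their norm or disallowing negative values.

Two key arguments:

W_constraint: constrains the main weight matrix
b_constraint: constrains the bias

Example usage:

from keras.constraints import maxnorm
model.add(Dense(64, W_constraint=maxnorm(2)))
# constrain the norm of each unit's incoming weight vector to at most 2

Available constraints

maxnorm(m=2): maximum-norm constraint
nonneg(): non-negativity constraint
unitnorm(): unit-norm constraint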
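
A max-norm constraint limits the length of a weight vector rather than each individual weight. The sketch below shows the idea for one weight vector in plain Python, assuming the L2 norm: if the norm exceeds m, the vector is scaled back so that its norm equals m. The function max_norm is hypothetical and is not the Keras implementation.

```python
import math

def max_norm(weights, m=2.0):
    """Rescale a weight vector so that its L2 norm does not exceed m."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= m:
        return list(weights)
    return [w * m / norm for w in weights]

print(max_norm([3.0, 4.0], m=2.0))  # norm 5 is scaled down to norm 2
print(max_norm([0.3, 0.4], m=2.0))  # already within the limit, unchanged
```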

Example: solving the XOR problem

import numpy as np
from keras.models import Sequential
from keras.layers.core import Activation, Dense
from keras.optimizers import SGD

X = np.zeros((4, 2), dtype='uint8')  # training data
y = np.zeros(4, dtype='uint8')  # training labels

X[0] = [0, 0]
y[0] = 0
X[1] = [0, 1]
y[1] = 1
X[2] = [1, 0]
y[2] = 1
X[3] = [1, 1]
y[3] = 0

model = Sequential()  # instantiate the model
model.add(Dense(2, input_dim=2))  # hidden layer with 2 units; the input has 2 dimensions
model.add(Activation('sigmoid'))  # set the activation function
model.add(Dense(1))
model.add(Activation('sigmoid'))

sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)

history = model.fit(X, y, nb_epoch=10000, batch_size=4, show_accuracy=True, verbose=2)

print model.predict(X)  # predict
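
To see why a 2-2-1 sigmoid network of this shape can represent XOR at all, here is a forward pass in plain Python with hand-picked weights (they are made up for illustration and are not what training would necessarily find). One hidden unit acts roughly like OR, the other like NAND, and the output unit combines them like AND.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def xor_net(x1, x2):
    """Forward pass of a 2-2-1 sigmoid network with hand-picked weights."""
    h1 = sigmoid(20 * x1 + 20 * x2 - 10)    # behaves like OR
    h2 = sigmoid(-20 * x1 - 20 * x2 + 30)   # behaves like NAND
    return sigmoid(20 * h1 + 20 * h2 - 30)  # behaves like AND

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, round(xor_net(a, b)))
```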

References

Keras Documentation

Keras 学习随笔 (Keras study notes)

深度学习框架Keras简介 (An introduction to the deep learning framework Keras)