`python neural_style.py` (run the script)

- `--iterations 5000` (number of iterations)
- `--content example.jpg` (the image to be stylized)
- `--styles style.jpg` (the style image)
- `--output result.jpg` (name of the output image)
- `--checkpoint-iterations 50` (write a checkpoint image every this many iterations)
- `--checkpoint-output temp%s.jpg` (name of the checkpoint images; it must contain %s)

vgg.py is the foundation of stylize; it provides five functions.
My understanding of how the stylization is implemented is as follows:

In my post AI.Application.StylizePictures.I, I mentioned that stylizing a picture means minimizing the sum of the style loss, the content loss, and the total variation denoising term. The style loss is the sum of the L2 losses of the differences between the results computed from the stylized image (a trainable variable) and from the style images, where the computation runs the VGG model from the input up to the layers relu1-1, relu2-1, relu3-1, relu4-1 and relu5-1. In the code this is a loop, `for style_layer in STYLE_LAYERS:`, so that in iteration i we have `layer = style_layer = STYLE_LAYERS[i]`.
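A minimal sketch of that style loss, using Gram matrices of the VGG features as typical neural-style implementations do (`net` and `style_grams` are placeholder names, not the actual variables in neural_style.py):

```python
import tensorflow as tf

STYLE_LAYERS = ('relu1_1', 'relu2_1', 'relu3_1', 'relu4_1', 'relu5_1')

def gram_matrix(features):
    # features: a (1, height, width, channels) feature map from one VGG layer
    channels = int(features.shape[-1])
    flat = tf.reshape(features, (-1, channels))            # (height*width, channels)
    return tf.matmul(flat, flat, transpose_a=True) / tf.cast(tf.size(flat), tf.float32)

def style_loss(net, style_grams):
    # net[layer]: features of the trainable image; style_grams[layer]: Gram matrix of the style image
    loss = 0.0
    for style_layer in STYLE_LAYERS:
        gram = gram_matrix(net[style_layer])
        loss += tf.nn.l2_loss(gram - style_grams[style_layer])
    return loss
```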
Variational autoencoder (VAE); reference: http://blog.csdn.net/jackytintin/article/details/53641885
layer | shape | notes |
---|---|---|
input | (None, n) | input |
encoder | (None, ?) | encoder network; an MLP, CNN, RNN, etc. can be used |
mean | (None, m) | mean, connected to the hidden layer |
log_var | (None, m) | log variance, parallel to the mean layer and also connected directly to the hidden layer |
gaussian_out | (None, m) | generates Gaussian-distributed random numbers |
decoder | (None, ?) | generator |
output | (None, n) | reconstructs the original input |
Layers are connected with `<layer>(<parameter>)(<last_layer>)`, because z_mean and z_log_var are attached in parallel to the same intermediate layer, which cannot be expressed with Sequential(). After building the complete VAE model and adding the loss function, the handwritten-digit dataset is fed in for training (note that only images are fed in, without any labels, since this is unsupervised). After training, the input and z_mean are connected into a new model and the handwritten digits are fed in for visualization; then, taking (latent_dim,) as the input node and the generated image as the output node, the decoder model is built and fed random numbers to generate images.

`vae = Model(x, y)` builds the full VAE; an optimizer is then created and the handwritten-digit dataset is fed in for training. `encoder = Model(x, z_mean)` connects the input to the z_mean output node; handwritten digits are then fed in for prediction and plotting. `generator = Model(decoder_input, _x_decoder_mean)` builds the generator (decoder) model, which is fed random numbers to generate images.
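A minimal sketch of this wiring with the functional API, assuming a flattened 784-pixel input and illustrative layer sizes:

```python
from keras import backend as K
from keras.layers import Input, Dense, Lambda
from keras.models import Model

original_dim, intermediate_dim, latent_dim = 784, 256, 2

x = Input(shape=(original_dim,))
h = Dense(intermediate_dim, activation='relu')(x)
z_mean = Dense(latent_dim)(h)               # parallel branch 1
z_log_var = Dense(latent_dim)(h)            # parallel branch 2, also fed by h

def sampling(args):
    z_mean, z_log_var = args
    eps = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim))
    return z_mean + K.exp(0.5 * z_log_var) * eps

z = Lambda(sampling)([z_mean, z_log_var])   # gaussian_out

decoder_h = Dense(intermediate_dim, activation='relu')
decoder_mean = Dense(original_dim, activation='sigmoid')
y = decoder_mean(decoder_h(z))

vae = Model(x, y)                           # full VAE (the loss is added separately)
encoder = Model(x, z_mean)                  # for visualizing the latent space

decoder_input = Input(shape=(latent_dim,))
_x_decoder_mean = decoder_mean(decoder_h(decoder_input))
generator = Model(decoder_input, _x_decoder_mean)  # feed random z to generate images
```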
This example replaces the fully connected hidden layers of the previous one with convolution and deconvolution layers, again using `<layer>(<parameter>)(<last_layer>)`
to build the complete VAE network as well as the encoder and decoder. The structure of the full VAE model is as follows:
layer | shape |
---|---|
input | (None, 28, 28, 1) |
conv2d | (None, 28, 28, 1) |
conv2d | (None, 14, 14, 64) |
conv2d | (None, 14, 14, 64) |
conv2d | (None, 14, 14, 64) |
flatten | (None, 12544) |
dense_out | (None, 128) |
z_mean and z_log_var below are parallel branches | both connect directly to dense_out |
z_mean | (None, 2) |
z_log_var | (None, 2) |
dense_decoder_in | (None, 128) |
dense | (None, 12544) |
reshape | (None, 14, 14, 64) |
deconv2d | (None, 14, 14, 64) |
deconv2d | (None, 14, 14, 64) |
deconv2d | (None, 29, 29, 64) |
conv2d | (None, 28, 28, 1) |
custom layer | (None, 28, 28, 1) and (None, 28, 28, 1), computes the loss |
After training, the complete structure of the encoder used for feature extraction is as follows:
layer | shape |
---|---|
input | (None, 28, 28, 1) |
conv2d | (None, 28, 28, 1) |
conv2d | (None, 14, 14, 64) |
conv2d | (None, 14, 14, 64) |
conv2d | (None, 14, 14, 64) |
flatten | (None, 12544) |
dense | (None, 128) |
dense | (None, 2) |
The structure of the decoder is as follows:
layer | shape |
---|---|
input | (None, 2) |
dense | (None, 128) |
dense | (None, 12544) |
reshape | (None, 14, 14, 64) |
deconv2d | (None, 14, 14, 64) |
deconv2d | (None, 14, 14, 64) |
deconv2d | (None, 29, 29, 64) |
conv2d | (None, 28, 28, 1) |
The complete VAE model is equivalent to connecting the encoder and decoder together, with the loss function added for training.
That said, I sometimes wonder: when we open our eyes and recognize objects, is that like a machine reading an image dataset and training it into labels? When we close our eyes, are we seeing something like the hidden-layer weight images of a trained convolutional network? And when we dream, is that like passing the trained labels back through a decoder and turning them into pictures again? Thinking about it this way, deep learning suddenly feels worth much deeper discussion: how do we combine it with our brains to build powerful machines? Perhaps one day, if we build a neural network structure identical to the brain and add sensors, we could create a complete copy of a human being. But then, how did nature evolve by itself what humans have only reached after so much research, and at such a tiny power consumption?
Training a memory network on the bAbI dataset
`path = get_file(...)` downloads the dataset and `print(path)` shows where it was stored, so you can delete the file afterwards; it is normally under the home directory (on Windows, under the user directory). The example defines the following helpers:

- `tokenize(sent)`: takes a sentence and stores every word and punctuation symbol as an element of a list.
- `parse_stories(lines, only_supporting)`: built on `tokenize`; for the bAbI dataset it takes the input lines and returns lists of words. When `only_supporting` is True, only the sentences that support the answer are kept.
- `get_stories()`: built on `parse_stories`; given a file name it parses the word lists out of the sentences, for example: `[(['Mary', 'moved', 'to', 'the', 'bathroom', '.', 'John', 'went', 'to', 'the', 'hallway', '.'], ['Where', 'is', 'Mary', '?'], 'bathroom'), ...]`. Each tuple is one complete question, consisting of two lists and a string: the first list is the story, the second is the question, and the string is the answer.
- `vectorize_stories()`: converts these directly into word vectors; each word gets a unique index.

Model structure: the model contains parallel branches. Each layer is prefixed with n-m-l, where n is its depth and m, l, etc. identify the branch it belongs to; here m=1 marks the story input branch and m=2 the question input branch. The parentheses after a layer name show which layer(s) it operates on (dropout and activation layers are omitted).
layer | shape |
---|---|
1-1, story_input | (None, 68) |
1-2, question_input | (None, 4) |
2-1-1, story_encoder_m, embedding(1-1) | (None, 68, 64) |
2-1-2, story_encoder_c, embedding(1-1) | (None, 68, 4) |
2-2, question_encoder, embedding(1-2) | (None, 4, 64) |
3, dot(2-1-1, 2-2) | (None, 68, 4) |
4, add(3, 2-1-2) | (None, 68, 4) |
5, permute(4) | (None, 4, 68) |
6, concatenate(5, 2-2) | (None, 4, 132) |
7, LSTM(6) | (None, 32) |
8, dense(7) | (None, 22) |
The (None, 22) output is then compared with the true answer, also of shape (None, 22), to compute the loss.
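A rough sketch of this wiring with the Keras functional API (the softmax over the match weights and dropout are among the omitted activation/dropout layers):

```python
from keras.layers import Input, Embedding, Dense, LSTM
from keras.layers import add, dot, concatenate, Permute, Activation
from keras.models import Model

story_maxlen, query_maxlen, vocab_size = 68, 4, 22

story_input = Input((story_maxlen,))                                 # 1-1
question_input = Input((query_maxlen,))                              # 1-2

story_encoder_m = Embedding(vocab_size, 64)(story_input)             # 2-1-1: (None, 68, 64)
story_encoder_c = Embedding(vocab_size, query_maxlen)(story_input)   # 2-1-2: (None, 68, 4)
question_encoder = Embedding(vocab_size, 64)(question_input)         # 2-2: (None, 4, 64)

match = dot([story_encoder_m, question_encoder], axes=(2, 2))        # 3: (None, 68, 4)
match = Activation('softmax')(match)
response = add([match, story_encoder_c])                             # 4: (None, 68, 4)
response = Permute((2, 1))(response)                                 # 5: (None, 4, 68)

answer = concatenate([response, question_encoder])                   # 6: (None, 4, 132)
answer = LSTM(32)(answer)                                            # 7: (None, 32)
answer = Dense(vocab_size, activation='softmax')(answer)             # 8: (None, 22)

model = Model([story_input, question_input], answer)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
```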
The data processing is similar to the previous example, only the dataset changes: the maximum story length is 552, the maximum question length is 5, and the answer dimension (i.e., the vocabulary size) is 36. Model structure:
layer | shape |
---|---|
1-1, story_input | (None, 552) |
1-2, question_input | (None, 5) |
2-1, story_encoded, embedding(1-1) | (None, 552, 50) |
2-2, question_encoded, embedding(1-2) | (None, 5, 50) |
3-2, recurrent.LSTM(2-2) | (None, 50) |
4-2, repeat_vector(3-2) | (None, 552, 50) |
5, add(2-1, 4-2) | (None, 552, 50) |
6, recurrent.LSTM(5) | (None, 50) |
7, dense(6) | (None, 36) |
layer | shape |
---|---|
input | (None, None, 40, 40, 1) |
conv2d_LSTM | (None, None, 40, 40, 40) |
batch_normalization | - |
conv2d_LSTM | (None, None, 40, 40, 40) |
batch_normalization | - |
conv2d_LSTM | (None, None, 40, 40, 40) |
batch_normalization | - |
conv2d_LSTM | (None, None, 40, 40, 40) |
batch_normalization | - |
conv3d | (None, None, 40, 40, 1) |
Prediction and visualization
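A minimal sketch of the ConvLSTM2D stack in the table, for 40x40 single-channel frame sequences (the loss and optimizer are assumptions):

```python
from keras.models import Sequential
from keras.layers import ConvLSTM2D, BatchNormalization, Conv3D

model = Sequential()
# Four ConvLSTM2D blocks, each followed by batch normalization (shapes as in the table).
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3), padding='same',
                     return_sequences=True, input_shape=(None, 40, 40, 1)))
model.add(BatchNormalization())
for _ in range(3):
    model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3), padding='same',
                         return_sequences=True))
    model.add(BatchNormalization())
# Collapse the 40 feature channels back into a single-channel frame sequence.
model.add(Conv3D(filters=1, kernel_size=(3, 3, 3), padding='same',
                 activation='sigmoid'))
model.compile(optimizer='adadelta', loss='binary_crossentropy')
```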
Processing the IMDB dataset with an LSTM. IMDB is an online movie-review dataset; the input is data that has already been converted to word vectors, and the output is a positive or negative review, so this is a binary classification problem. Model:
layer | shape |
---|---|
input | (None, 80) |
embedding | (None, None, 128) |
LSTM | (None, 128) |
dense | (None, 1) |
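A compact sketch of this model (the vocabulary size max_features and the dropout settings are assumptions, not given in the table):

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

max_features, maxlen = 20000, 80   # assumed vocabulary size; padded review length from the table

model = Sequential()
model.add(Embedding(max_features, 128, input_length=maxlen))   # (None, 80, 128)
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))       # (None, 128)
model.add(Dense(1, activation='sigmoid'))                      # positive / negative
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```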
Using Conv1D convolution on IMDB. An LSTM can process sequential data, while one-dimensional convolution is analogous to two-dimensional convolution, processing correlated data along the sequence. Model:
layer | shape |
---|---|
input | (None, 100) |
embedding | (None, 100, 128) |
dropout | (None, 100, 128) |
conv1d | (None, 96, 64) |
pooling | (None, 24, 64) |
LSTM | (None, 70) |
dense | (None, 1) |
activation | (None, 1) |
Bidirectional LSTM network. Model:
layer | shape |
---|---|
input | (None, 100) |
embedding | (None, 100, 128) |
bidirectional | (None, 128) |
dropout | (None, 128) |
dense | (None, 1) |
A multi-layer feedback (recurrent) neural network. This example uses Keras functional-style modeling (instead of Model.add(), layers are connected via layer(parameters)(last_layer)). Model:
layer | shape |
---|---|
input-x | (None, 28-row, 28-col, 1) |
LSTM(row) | (None, 128) |
timeDistributed(x) | (None, 28, 128) |
LSTM(col) | (None, 128) |
dense | (None, 10) |
Each (28, 1) row vector is encoded by an LSTM into a (128,) row vector; the resulting (28, 128) image representation is then encoded by an LSTM into a (128,) image vector, and finally a fully connected layer produces the label.
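A sketch of this structure in the functional style; the row LSTM is wrapped in TimeDistributed so that it is applied to each of the 28 rows:

```python
from keras.layers import Input, Dense, TimeDistributed, LSTM
from keras.models import Model

row, col, pixel = 28, 28, 1
row_hidden, col_hidden = 128, 128

x = Input(shape=(row, col, pixel))                       # (None, 28, 28, 1)
encoded_rows = TimeDistributed(LSTM(row_hidden))(x)      # encode each row: (None, 28, 128)
encoded_cols = LSTM(col_hidden)(encoded_rows)            # encode the sequence of rows: (None, 128)
prediction = Dense(10, activation='softmax')(encoded_cols)

model = Model(x, prediction)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
```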
Text generation based on an LSTM. The corpus is Nietzsche's writings; it can be replaced with another corpus for generation (e.g., Chinese text, academic papers, a joke corpus, and so on).
One-hot encoding over an array of length `len(chars)` is used to represent the words; for example, abc becomes [1,1,1,0,0,0,0,…,0].

layer | shape |
---|---|
input | (None, 40, 58) |
LSTM | (None, 128) |
dense | (None, 58) |
activation | - |
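A sketch of the model in the table (the optimizer choice is an assumption); generation then repeatedly samples the next character from the softmax output:

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

maxlen, num_chars = 40, 58   # sequence length and character-set size from the table

model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, num_chars)))   # (None, 128)
model.add(Dense(num_chars))                             # (None, 58)
model.add(Activation('softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# Generation: feed the last 40 one-hot encoded characters, sample the next character
# from the softmax output, append it to the text, and repeat.
```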
Reflections and outlook
Classifying text with fastText
N-grams are collected into a set with `<set>.update(<set>)`; once that is done, we have a vocabulary that also contains multi-word entries, which is very useful for fixed collocations. For example, with ngram_range=1 the ngram_set is ((1),(2),…); with ngram_range=3 it is ((1),(2),(1,2),(1,2,3),…). A dictionary is then built: on top of the existing word indices (the loaded data is already converted to word vectors for ngram_value=1), mappings for ngram_value=2, 3, … up to ngram_range are appended.

layer | shape |
---|---|
input | (None, 400) |
embedding | (None, 400, 50) |
global_average_pooling1D | (None, 50) |
dense | (None, 1) |
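A sketch of the model in the table (max_features, i.e. the n-gram-augmented vocabulary size, is an assumed value):

```python
from keras.models import Sequential
from keras.layers import Embedding, GlobalAveragePooling1D, Dense

max_features, maxlen, embedding_dim = 20000, 400, 50   # max_features is an assumed vocabulary size

model = Sequential()
model.add(Embedding(max_features, embedding_dim, input_length=maxlen))  # (None, 400, 50)
model.add(GlobalAveragePooling1D())                                     # (None, 50)
model.add(Dense(1, activation='sigmoid'))                               # binary classification
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```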
Teaching the machine to learn addition. If you run into errors, check your TensorFlow version; the problems are mostly with tf.concat and tf.unstack.
layer | output_shape |
---|---|
input | (None, 7, 12) |
LSTM | (None, 128) |
repeat_vector | (None, 4, 128) |
LSTM | (None, 4, 128) |
time_distributed | (None, 4, 12) |
activation(softmax) | (None, 4, 12) |
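A sketch of this encoder-decoder model following the table; the 12-symbol alphabet presumably covers the digits, '+' and padding:

```python
from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense, Activation

maxlen, digits, num_chars = 7, 4, 12   # e.g. "123+456" has 7 characters, answers up to 4 digits

model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, num_chars)))   # encode the input string: (None, 128)
model.add(RepeatVector(digits))                         # (None, 4, 128)
model.add(LSTM(128, return_sequences=True))             # decoder: (None, 4, 128)
model.add(TimeDistributed(Dense(num_chars)))            # (None, 4, 12)
model.add(Activation('softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```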
Creating a custom layer
Discriminator model:
layer | output_shape |
---|---|
input | (None, 1, 28, 28) |
conv2d | (None, 32, 14, 14) |
LeakyReLU | - |
dropout | - |
conv2d | (None, 64, 14, 14) |
LeakyReLU | - |
dropout | - |
conv2d | (None, 128, 7, 7) |
LeakyReLU | - |
dropout | - |
conv2d | (None, 256, 7, 7) |
LeakyReLU | - |
dropout | - |
flatten | (None, 12544) |
output 1: fake | output 2: auxiliary |
output1: fake | Dense(1, activation="sigmoid") |
output2: aux | Dense(10, activation="softmax") |
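A sketch of this two-headed discriminator in the functional API (channels-first shapes as in the table; the dropout rate and LeakyReLU slope are assumptions):

```python
from keras.models import Model
from keras.layers import Input, Conv2D, LeakyReLU, Dropout, Flatten, Dense

image = Input(shape=(1, 28, 28))   # channels-first, as in the table

x = image
for filters, strides in [(32, 2), (64, 1), (128, 2), (256, 1)]:
    x = Conv2D(filters, 3, strides=strides, padding='same',
               data_format='channels_first')(x)
    x = LeakyReLU(0.2)(x)
    x = Dropout(0.3)(x)

features = Flatten()(x)                                              # (None, 12544)
fake = Dense(1, activation='sigmoid', name='generation')(features)   # real/fake head
aux = Dense(10, activation='softmax', name='auxiliary')(features)    # class head

discriminator = Model(image, [fake, aux])
discriminator.compile(optimizer='adam',
                      loss=['binary_crossentropy', 'sparse_categorical_crossentropy'])
```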
Generator model. Parameters from the example: latent_size = 100
layer | output_shape |
---|---|
input | (None, 100) |
dense | (None, 1024) |
dense | (None, 128 x 7 x 7=6272) |
reshape | (None, 128, 7, 7) |
upsampling | (None, 128, 14, 14) |
conv2d | (None, 256, 14, 14) |
upsampling | (None, 256, 28, 28) |
conv2d | (None, 128, 28, 28) |
conv2d | (None, 1, 28, 28) |
Training a convolutional neural network on the cifar10 dataset
layer | output_shape |
---|---|
input | (None, 32, 32, 3) |
conv2d | (None, 32, 32, 32) |
activation | - |
conv2d | (None, 30, 30, 32) |
activation | (None, 30, 30, 32) |
maxpooling | (None, 15, 15, 32) |
dropout | - |
conv2d | (None, 15, 15, 64) |
activation | - |
conv2d | (None, 13, 13, 64) |
activation | - |
maxpooling | (None, 6, 6, 64) |
dropout | - |
flatten | (None, 6 x 6 x 64 = 2304) |
dense | (None, 512) |
activation | - |
dropout | - |
dense | (None, 10) |
activation | - |
Optimizer: RMSprop; loss function: categorical_crossentropy
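The table maps almost line for line onto a Sequential model; a sketch (the dropout rates are assumptions):

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense, Activation

model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same', input_shape=(32, 32, 3)))  # (None, 32, 32, 32)
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))                                           # (None, 30, 30, 32)
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))                               # (None, 15, 15, 32)
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same'))                           # (None, 15, 15, 64)
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))                                           # (None, 13, 13, 64)
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))                               # (None, 6, 6, 64)
model.add(Dropout(0.25))

model.add(Flatten())                                                    # (None, 2304)
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
```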
Transferring a convolutional neural network: a paper on transfer learning notes that it is enough to retrain only the final fully connected layers (everything before them being the bottleneck layers) to achieve transfer learning with fast convergence. In this example, a model is first trained on the first five handwritten digits; then all layers before the fully connected layers are frozen and the model is applied to training on the last five digits.
Transfer learning keeps the bottleneck layers because they perform feature extraction, while the fully connected layers combine those features. For some supervised tasks whose features differ substantially, trainable convolutional layers can also be attached after the bottleneck (rather than only trainable fully connected layers, as in this example).
layer | output_shape |
---|---|
input | (None, 28, 28, 1) |
conv2d | (None, 26, 26, 32) |
activation | - |
conv2d | (None, 24, 24, 32) |
activation | - |
maxpooling | (None, 12, 12, 32) |
dropout | - |
flatten | (None, 12 x 12 x 32 = 4608) |
bottleneck layers above | trainable layers (after transfer) below |
dense | (None, 128) |
activation | - |
dropout | - |
dense | (None, 5) |
activation | - |
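The key step is freezing the bottleneck layers before retraining the head on the new classes; a sketch following the table (dropout rates are assumptions):

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense, Activation

# Bottleneck layers (feature extraction), matching the table above.
feature_layers = [
    Conv2D(32, (3, 3), input_shape=(28, 28, 1)), Activation('relu'),
    Conv2D(32, (3, 3)), Activation('relu'),
    MaxPooling2D(pool_size=(2, 2)), Dropout(0.25),
    Flatten(),
]

# Trainable head: combines the extracted features into 5 classes.
classification_layers = [
    Dense(128), Activation('relu'), Dropout(0.5),
    Dense(5), Activation('softmax'),
]

model = Sequential(feature_layers + classification_layers)
# ... first train on digits 0-4 ...

# For transfer: freeze the bottleneck, then recompile and retrain only the head on digits 5-9.
for layer in feature_layers:
    layer.trainable = False
model.compile(optimizer='adadelta', loss='categorical_crossentropy', metrics=['accuracy'])
```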
Implementing the Deep Dream effect with Keras; the model's weight data needs to be downloaded the first time.
This process essentially maps the model's weights (i.e., the features it learned from analyzing images) back onto the image; it is the inverse of training, which is why it produces such strange results.
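Conceptually this is gradient ascent on the input image instead of on the weights; a minimal sketch of that inverse step using the old Keras backend API (the choice of VGG16, the layer name, the step size and the iteration count are all placeholders):

```python
from keras import backend as K
from keras.applications import vgg16
import numpy as np

model = vgg16.VGG16(weights='imagenet', include_top=False)  # weights are downloaded on first use
layer_output = model.get_layer('block4_conv1').output       # placeholder choice of layer

# Maximize the mean activation of that layer with respect to the *input image*.
loss = K.mean(layer_output)
grads = K.gradients(loss, model.input)[0]
grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-7)           # normalize the gradient
step_fn = K.function([model.input], [loss, grads])

img = np.random.uniform(-1, 1, (1, 224, 224, 3)).astype('float32')
for _ in range(20):                  # gradient ascent: push the image, not the weights
    loss_value, grads_value = step_fn([img])
    img += 0.01 * grads_value
```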