Theano (2): Training Word Embeddings with an RNN


1. Project Overview

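Project
Recurrent Neural Networks with Word Embeddings
Tutorial URL: .html
Task
Slot filling (spoken language understanding): assign a label to every word in a sentence, which makes it a classification problem.
Dataset
The dataset is ATIS (Airline Travel Information System), a small DARPA dataset annotated with the Inside Outside Beginning (IOB) representation.
The training set has 4978 sentences and 56590 words; the test set has 893 sentences and 9198 words; the average sentence length is 15. The number of classes (different slots) is 128 including the O label (NULL).
Note: the B- prefix marks the beginning of an entity, the I- prefix marks a word inside an entity, and the O tag marks a word that belongs to no entity.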
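To make the IOB notation concrete, here is a minimal sketch that groups tagged words back into slots. The helper name `extract_slots`, the example sentence, and the slot names (`dept`, `arr`, `date`) are illustrative assumptions, not taken from the ATIS files themselves.

```python
def extract_slots(words, tags):
    """Group IOB-tagged words into (slot_name, phrase) pairs (illustrative helper)."""
    slots = []
    current_name, current_words = None, []
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity; close any open one first.
            if current_name is not None:
                slots.append((current_name, " ".join(current_words)))
            current_name, current_words = tag[2:], [word]
        elif tag.startswith("I-") and current_name == tag[2:]:
            # An I- tag continues the entity that is currently open.
            current_words.append(word)
        else:
            # "O" closes the open entity. Simplification: an I- tag that does
            # not continue the open entity is treated like "O" here.
            if current_name is not None:
                slots.append((current_name, " ".join(current_words)))
            current_name, current_words = None, []
    if current_name is not None:
        slots.append((current_name, " ".join(current_words)))
    return slots


words = ["show", "flights", "from", "boston", "to", "new", "york", "today"]
tags = ["O", "O", "O", "B-dept", "O", "B-arr", "I-arr", "B-date"]
slots = extract_slots(words, tags)
print(slots)
```

Every word still receives exactly one label, which is why the task can be treated as per-word classification.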
Evaluation metrics
Precision, recall, and F1 score.
The tutorial uses the conlleval Perl script (a script that computes these metrics) to evaluate performance.
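As a rough illustration of what these three metrics measure, the sketch below computes them over sets of predicted and gold slots. It is not a reimplementation of conlleval; the function name and the `(slot, start, end)` tuple format are assumptions made for this example.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and F1 over two collections of slots."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)  # slots that match exactly
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical slots, each written as (slot_name, first_word_index, last_word_index).
gold = {("dept", 3, 3), ("arr", 5, 6), ("date", 7, 7)}
pred = {("dept", 3, 3), ("arr", 5, 5)}
p, r, f1 = precision_recall_f1(gold, pred)
print(p, r, f1)
```

Here only one of the two predicted slots matches a gold slot exactly, so precision is 1/2, recall is 1/3, and F1 is their harmonic mean.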

2. RNN Overview

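Word embeddings
The embeddings used here are context window word embeddings: choose a window size, extract for each word in the sentence the indices of that word and of its neighbours on both sides, then map those indices to embeddings, giving a real-valued vector for each word.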
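The window extraction described above might look like the following sketch. The function name `context_window` is an assumption; padding with index -1 follows the code comment later in this post, where -1 selects the extra padding row of the embedding matrix.

```python
def context_window(indices, win):
    """Return, for each word index, the window of `win` indices centred on it.

    Positions that fall outside the sentence are filled with -1 (padding).
    """
    assert win >= 1 and win % 2 == 1, "window size must be odd"
    pad = win // 2
    padded = [-1] * pad + list(indices) + [-1] * pad
    return [padded[i:i + win] for i in range(len(indices))]


windows = context_window([0, 1, 2, 3, 4], 3)
print(windows)
```

Each inner list has `win` indices; looking each index up in the embedding table and concatenating the results gives one input vector of length de * cs per word.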

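Unlike traditional feed-forward neural networks (FNNs), RNNs introduce directed cycles, which lets them handle problems where each input depends on the inputs before it. The tutorial uses an Elman recurrent neural network (E-RNN), which takes the input at the current time step (t) together with the hidden-layer state from the previous time step (t-1) as its input.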
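The recurrence can be sketched in plain NumPy as below. This is an illustration of the forward pass only, not the tutorial's Theano code; the sigmoid hidden layer and softmax output mirror the listing later in this post, while the toy shapes and random values are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def elman_forward(x, Wx, Wh, W, bh, b, h0):
    """Run an Elman RNN over the rows of x and return per-step class probabilities."""
    h_prev = h0
    probs = []
    for x_t in x:
        # The hidden state depends on the current input and the previous hidden state.
        h_t = sigmoid(x_t @ Wx + h_prev @ Wh + bh)
        probs.append(softmax(h_t @ W + b))
        h_prev = h_t
    return np.array(probs)


# Toy sizes (assumed): 4 words, input dimension 6, 5 hidden units, 3 classes.
rng = np.random.RandomState(0)
n_words, d_in, nh, nc = 4, 6, 5, 3
x = rng.uniform(-1.0, 1.0, (n_words, d_in))
Wx = 0.2 * rng.uniform(-1.0, 1.0, (d_in, nh))
Wh = 0.2 * rng.uniform(-1.0, 1.0, (nh, nh))
W = 0.2 * rng.uniform(-1.0, 1.0, (nh, nc))
probs = elman_forward(x, Wx, Wh, W, np.zeros(nh), np.zeros(nc), np.zeros(nh))
labels = probs.argmax(axis=1)  # one predicted class per word
```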

Parameters
The E-RNN learns the following parameters:
the word embeddings (the embedding table)
the initial hidden state (a real-valued vector)
two matrices for the linear projection of the input at time t and of the previous hidden layer state at time t-1
(optional) bias (described as unused here, although the listed code does include the bias terms bh and b)
softmax classification layer on top
The hyperparameters are:
dimension of the word embedding (de, 50)
size of the vocabulary
number of hidden units
number of classes
random seed + way to initialize the model

3. Training

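Updates
Training uses stochastic gradient descent with one sentence per minibatch, so the parameters are updated once per sentence. After each update, the word embeddings are normalized.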
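The normalization step rescales every row of the embedding matrix to unit L2 norm. A minimal NumPy sketch of that operation, with an assumed helper name and toy values, could be:

```python
import numpy as np


def normalize_rows(emb):
    """Scale each row (one word vector) to unit L2 norm.

    Assumes no row is all zeros; a zero row would cause a division by zero.
    """
    norms = np.sqrt((emb ** 2).sum(axis=1, keepdims=True))
    return emb / norms


emb = np.array([[3.0, 4.0], [0.5, 0.0], [1.0, 1.0]])
unit = normalize_rows(emb)
print(unit)
```

Keeping every word vector on the unit sphere stops the embeddings from growing without bound as the per-sentence updates accumulate.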

Stopping criterion
Early stopping: train for a given number of epochs, evaluate the F1 score on the validation set after each epoch, and keep the best model.
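The procedure described above (fixed number of epochs, keep the model with the best validation F1) can be sketched as follows. The function name and the two callables `train_one_epoch` and `validation_f1` are hypothetical placeholders, and the toy model and scores exist only to make the sketch runnable.

```python
import copy


def fit_with_best_checkpoint(model, n_epochs, train_one_epoch, validation_f1):
    """Train for n_epochs and return a copy of the model with the best validation F1."""
    best_f1, best_model, best_epoch = float("-inf"), None, -1
    for epoch in range(n_epochs):
        train_one_epoch(model)
        f1 = validation_f1(model)
        if f1 > best_f1:
            # Snapshot the model so later epochs cannot overwrite the best one.
            best_f1, best_epoch = f1, epoch
            best_model = copy.deepcopy(model)
    return best_model, best_f1, best_epoch


# Toy stand-ins: the "model" only counts epochs and the F1 values are made up.
scores = [0.70, 0.82, 0.79, 0.85, 0.81]
toy_model = {"epochs_trained": 0}


def toy_train(m):
    m["epochs_trained"] += 1


def toy_f1(m):
    return scores[m["epochs_trained"] - 1]


best_model, best_f1, best_epoch = fit_with_best_checkpoint(
    toy_model, len(scores), toy_train, toy_f1)
print(best_epoch, best_f1)
```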

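Hyperparameter selection
The tutorial uses KISS random search (I did not fully follow this; is it a brute-force search strategy?).
The hyperparameters given in the tutorial are: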
learning rate : uniform([0.05,0.01])
window size : random value from {3,…,19}
number of hidden units : random value from {100,200}
embedding dimension : random value from {50,100}
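Random search simply draws each hyperparameter independently from its range and trains one model per draw. A minimal sketch using the ranges listed above is shown here; the dictionary keys are hypothetical names, and reading the learning-rate range as the interval between 0.01 and 0.05 is an assumption.

```python
import random


def sample_hyperparameters(rng):
    """Draw one random configuration from the ranges listed above."""
    return {
        "lr": rng.uniform(0.01, 0.05),          # learning rate
        "win": rng.randint(3, 19),              # window size, both ends inclusive
        "nhidden": rng.choice([100, 200]),      # number of hidden units
        "emb_dimension": rng.choice([50, 100])  # embedding dimension
    }


rng = random.Random(0)  # fixed seed so the draws are reproducible
trials = [sample_hyperparameters(rng) for _ in range(5)]
for config in trials:
    print(config)
```

Each sampled configuration would then be trained and scored on the validation set, keeping the one with the best F1. Unlike an exhaustive grid search, only a handful of random points in the space are tried.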

Main code with comments

#coding=utf-8
import theano
import numpy
import os
from theano import tensor as T
from collections import OrderedDict

class RNNSLU(object):
    ''' elman neural net model '''
    def __init__(self, nh, nc, ne, de, cs):
        '''
        nh :: dimension of the hidden layer
        nc :: number of classes
        ne :: number of word embeddings in the vocabulary
        de :: dimension of the word embeddings
        cs :: word window context size
        '''
        # parameters of the model
        # add one row for PADDING at the end; window positions that fall
        # outside the sentence use index -1, which selects this last row
        self.emb = theano.shared(0.2 * numpy.random.uniform(-1.0, 1.0,
                   (ne+1, de)).astype(theano.config.floatX))
        # maps the concatenated window (cs words, de dimensions each) to the nh hidden units
        self.Wx  = theano.shared(0.2 * numpy.random.uniform(-1.0, 1.0,
                   (de * cs, nh)).astype(theano.config.floatX))
        # hidden-to-hidden recurrent weights, nh x nh
        self.Wh  = theano.shared(0.2 * numpy.random.uniform(-1.0, 1.0,
                   (nh, nh)).astype(theano.config.floatX))
        # output layer: nh hidden units connected to nc output units
        self.W   = theano.shared(0.2 * numpy.random.uniform(-1.0, 1.0,
                   (nh, nc)).astype(theano.config.floatX))
        # one bias per hidden unit, nh in total
        self.bh  = theano.shared(numpy.zeros(nh, dtype=theano.config.floatX))
        # one bias per output unit, i.e. one per class, nc in total
        self.b   = theano.shared(numpy.zeros(nc, dtype=theano.config.floatX))
        # hidden state at time t0, one value per hidden unit
        self.h0  = theano.shared(numpy.zeros(nh, dtype=theano.config.floatX))

        # bundle
        self.params = [self.emb, self.Wx, self.Wh, self.W, self.bh, self.b, self.h0]
        self.names = ['embeddings', 'Wx', 'Wh', 'W', 'bh', 'b', 'h0']
        idxs = T.imatrix() # as many columns as context window size/lines as words in the sentence
        # one row per word; each row concatenates the cs window embeddings (de dimensions each)
        x = self.emb[idxs].reshape((idxs.shape[0], de*cs))
        y = T.iscalar('y') # label

        def recurrence(x_t, h_tm1):
            # hidden layer output at time t = f(x_t * Wx + h_{t-1} * Wh + bh)
            h_t = T.nnet.sigmoid(T.dot(x_t, self.Wx) + T.dot(h_tm1, self.Wh) + self.bh)
            # output layer at time t
            s_t = T.nnet.softmax(T.dot(h_t, self.W) + self.b)
            return [h_t, s_t]

        # scan applies a function along the input sequence and collects the result of every step.
        # Unlike reduce and map, scan can access the outputs of previous steps, which suits RNNs.
        [h, s], _ = theano.scan(fn=recurrence,
                                sequences=x, outputs_info=[self.h0, None],
                                n_steps=x.shape[0])

        p_y_given_x_lastword = s[-1,0,:]
        p_y_given_x_sentence = s[:,0,:]
        y_pred = T.argmax(p_y_given_x_sentence, axis=1)

        # learning rate
        lr = T.scalar('lr')
        # cost and gradients
        nll = -T.log(p_y_given_x_lastword)[y]
        gradients = T.grad(nll, self.params)
        updates = OrderedDict((p, p-lr*g) for p, g in zip(self.params, gradients))

        # theano functions
        self.classify = theano.function(inputs=[idxs], outputs=y_pred)

        # training
        self.train = theano.function(inputs=[idxs, y, lr],
                                     outputs=nll,
                                     updates=updates)

        # normalization: rescale every embedding row to unit L2 norm
        self.normalize = theano.function(inputs=[],
                         updates={self.emb:
                         self.emb/T.sqrt((self.emb**2).sum(axis=1)).dimshuffle(0,'x')})

    # save the parameters
    def save(self, folder):
        for param, name in zip(self.params, self.names):
            numpy.save(os.path.join(folder, name + '.npy'), param.get_value())

Published: 2024-01-31 05:43:28

Article link: https://www.4u4v.net/it/170665101325970.html
