
Notes on Learning LSTM + CTC with Keras


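Overview

These are notes on the points I found worth paying attention to while writing a simple LSTM + CTC implementation in Keras.

The theory behind LSTM and CTC is not repeated here; the two articles below are good references.

人人都能看懂的LSTM (An LSTM Explanation Everyone Can Understand)

一文读懂CRNN+CTC文字识别 (Understanding CRNN+CTC Text Recognition in One Article)

Layer input and output shapes

Even with a rough grasp of the theory, it can be hard to know where to start when writing the code. A clear picture of the input and output shape of every layer in the network makes the implementation much easier.

LSTM layer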

lstm = LSTM(units=40, return_sequences=True)(input)  # input: tensor of shape (batch_size, time_steps, step_length)

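Input shape: (batch_size, time_steps, step_length)

Output shape: (batch_size, time_steps, units)

Here time_steps can be the number of frames of MFCC features extracted from the audio, and step_length is the number of features in one MFCC frame.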
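To make time_steps and step_length concrete, here is a minimal sketch of the framing arithmetic. It assumes 25 ms windows taken every 10 ms and 13 cepstral coefficients per frame (common MFCC defaults), with the coefficients stacked next to two delta blocks. The helper names are hypothetical, and a real feature library may round slightly differently.

```python
import math

def num_frames(num_samples, sample_rate, win_len=0.025, win_step=0.01):
    """Number of analysis frames (time_steps), assuming the last
    partial frame is zero-padded rather than dropped."""
    frame_len = int(round(win_len * sample_rate))
    frame_step = int(round(win_step * sample_rate))
    if num_samples <= frame_len:
        return 1
    return 1 + math.ceil((num_samples - frame_len) / frame_step)

def feature_shape(num_samples, sample_rate, num_cep=13, num_delta_blocks=2):
    """(time_steps, step_length) when MFCCs are stacked with delta blocks."""
    step_length = num_cep * (1 + num_delta_blocks)
    return (num_frames(num_samples, sample_rate), step_length)

# One second of 16 kHz audio
shape = feature_shape(16000, 16000)
print(shape)
```

Under these assumptions a longer recording only changes time_steps; step_length stays fixed by the feature configuration.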

Dense layer

dense = Dense(n_classes, activation='softmax')(lstm)

Input shape: (batch_size, time_steps, units)

Output shape: (batch_size, time_steps, n_classes)

Here n_classes is the number of output symbols (phonemes or characters), for example 26 English letters + 1 space + 1 blank.
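The shape rules for the recurrent and Dense layers can be traced by hand. The sketch below is plain Python rather than Keras, and the function names are hypothetical; it only encodes the rules stated above, plus the assumption that a bidirectional wrapper with merge_mode='concat' doubles the last axis.

```python
def rnn_output_shape(input_shape, units, return_sequences=True, bidirectional=False):
    """Output shape of a recurrent layer for input (batch, time_steps, features)."""
    batch_size, time_steps, _ = input_shape
    # Concatenating forward and backward outputs doubles the width
    width = units * 2 if bidirectional else units
    if return_sequences:
        return (batch_size, time_steps, width)
    return (batch_size, width)

def dense_output_shape(input_shape, n_classes):
    """Dense acts on the last axis only, so the leading axes are preserved."""
    return tuple(input_shape[:-1]) + (n_classes,)

x_shape = (8, 99, 39)                       # (batch_size, time_steps, step_length)
h_shape = rnn_output_shape(x_shape, units=40)
y_shape = dense_output_shape(h_shape, n_classes=28)
print(h_shape, y_shape)
```

Note that return_sequences=True is what keeps the time_steps axis, which CTC needs because it scores a class distribution at every frame.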

CTC loss

Keras ships a CTC loss function in its backend, ctc_batch_cost (K.ctc_batch_cost). It has to be wrapped in a Lambda layer to be used as a layer.

import keras.backend as K

def ctc_lambda_func(args):
    y_pred, labels, input_length, label_length = args
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)

loss = Lambda(ctc_lambda_func, output_shape=(1, ), name='ctc')([dense, label_true, input_length, label_length])

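Here input_length has shape (batch_size, 1); each element is the time_steps of the corresponding training sample.

label_length has shape (batch_size, 1); each element is the label length of the corresponding training sample (max_string_length in this example).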
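To see what a CTC loss computes for one sample, here is a minimal pure-Python sketch of the CTC forward algorithm. It is not the Keras implementation: it works in plain probabilities instead of log space, so it is only suitable for short toy sequences, and the function name is hypothetical. The blank index is passed in explicitly; to my knowledge the TensorFlow backend of Keras treats the last class as the blank.

```python
import math

def ctc_neg_log_likelihood(probs, label, blank):
    """Return -log p(label | probs) using the CTC forward recursion.

    probs: list of time_steps rows, each a list of n_classes probabilities.
    label: list of class indices without blanks.
    blank: index of the blank class.
    """
    # Extended label: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for c in label:
        ext += [c, blank]
    size = len(ext)
    # alpha[s]: total probability of alignments that are at ext[s] at time t
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        new_alpha = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # A blank may be skipped only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha
    p = alpha[-1] + (alpha[-2] if size > 1 else 0.0)
    # No valid alignment, e.g. the label is too long for the input
    return float('inf') if p == 0.0 else -math.log(p)

# Two frames, classes: 0 = 'a', 1 = blank, uniform predictions
uniform = [[0.5, 0.5], [0.5, 0.5]]
loss = ctc_neg_log_likelihood(uniform, label=[0], blank=1)
print(loss)
```

In this toy case the label "a" has three valid alignments ("aa", "a-", "-a"), each with probability 0.25, so the loss is -log(0.75). The infinite loss for an impossible alignment is why time_steps must be at least as long as the label.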

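Building the model

We need to build two models, base_model and model.

base_model outputs dense and is used for prediction after training.

model outputs loss and is used to train the parameters.

The model below uses a GRU, which works much like an LSTM.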

input = Input(shape=(time_steps, step_length))
gru = Bidirectional(GRU(units=40, return_sequences=True), merge_mode='concat')(input)
dense = Dense(n_classes, activation='softmax')(gru)
base_model = Model(inputs=input, outputs=dense)
label_true = Input(shape=[max_label_length])
input_length = Input(shape=[1])
label_length = Input(shape=[1])
loss = Lambda(ctc_lambda_func, output_shape=(1, ), name='ctc')([dense, label_true, input_length, label_length])
model = Model(inputs=[input, label_true, input_length, label_length], outputs=loss)
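The label_true input has a fixed width, max_label_length, so a batch with labels of different lengths needs padding. The sketch below shows one way to build the three auxiliary inputs with plain Python lists. The helper name and the pad value are assumptions, and the lists would still need converting with np.array before being passed to Keras. Unlike the single-sample case in this article, it records each sample's true label length instead of max_label_length.

```python
def build_ctc_batch(label_seqs, time_steps, pad_value=0):
    """Pad labels to a common width and build the two length inputs."""
    max_label_length = max(len(seq) for seq in label_seqs)
    labels = [list(seq) + [pad_value] * (max_label_length - len(seq))
              for seq in label_seqs]
    # One row per sample, one column: shape (batch_size, 1)
    input_length = [[time_steps] for _ in label_seqs]
    label_length = [[len(seq)] for seq in label_seqs]
    return labels, input_length, label_length

labels, input_length, label_length = build_ctc_batch([[8, 9], [3, 1, 20]], time_steps=99)
print(labels)
print(input_length)
print(label_length)
```

Padded positions are ignored by the loss as long as label_length gives the true length, so the pad value itself does not matter.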

Training the model

First, model.compile needs to be given our own CTC loss function.

model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer='adadelta')

A Keras loss function takes the true labels y_true and the model output y_pred. Because the output of model is already the CTC loss, the loss function simply returns y_pred.

fittedModel = model.fit([input, labels, input_length, label_length], np.ones(1), batch_size=1, epochs=100,verbose=2)

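Because the true labels are already fed in as an input and take part in the layer computation, the y passed to model.fit can hold arbitrary values. It only needs one entry per training sample (here 1, the same as batch_size).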
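The placeholder-target trick can be illustrated without Keras. In this sketch, identity_loss stands in for the lambda passed to compile and per_sample_ctc is a made-up list of values that the 'ctc' layer might output; both names and the numbers are purely illustrative.

```python
def identity_loss(y_true, y_pred):
    """Stand-in for `lambda y_true, y_pred: y_pred`; y_true is ignored."""
    return y_pred

def dummy_targets(batch_size):
    """Placeholder y with one entry per sample."""
    return [1.0] * batch_size

per_sample_ctc = [12.5, 9.0, 14.25]          # pretend outputs of the 'ctc' layer
y = dummy_targets(len(per_sample_ctc))

# The result does not depend on the placeholder values
same = identity_loss(y, per_sample_ctc) == identity_loss([0.0] * 3, per_sample_ctc)

# What the optimizer ends up minimizing: the mean per-sample CTC loss
mean_loss = sum(identity_loss(y, per_sample_ctc)) / len(per_sample_ctc)
print(same, mean_loss)
```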

Testing the model

After training model, use base_model for prediction.

y_pred = base_model.predict(input_test)

Use ctc_decode to decode y_pred.

decode = K.get_value(K.ctc_decode(y_pred, input_length=np.ones(y_pred.shape[0]) * y_pred.shape[1], greedy=True)[0][0])

Here decode holds the indices of the predicted classes; map each index back to its actual class.
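Greedy decoding and the index-to-class mapping can be sketched in plain Python. This is an illustration of best-path decoding (argmax per frame, merge repeats, drop blanks), not the Keras implementation; the function names are hypothetical, and the mapping assumes index 0 is the space and indices 1 to 26 are 'a' to 'z'.

```python
def greedy_ctc_decode(probs, blank):
    """Best-path decoding: argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(row)), key=row.__getitem__) for row in probs]
    decoded, prev = [], None
    for idx in best:
        if idx != prev and idx != blank:
            decoded.append(idx)
        prev = idx
    return decoded

def indices_to_text(indices):
    """Index 0 is the space, 1..26 map to 'a'..'z'."""
    return ''.join(' ' if i == 0 else chr(i + ord('a') - 1) for i in indices)

def peaked_row(k, n=4):
    """Toy frame whose most likely class is k."""
    return [0.7 if i == k else 0.1 for i in range(n)]

# Toy setup with 4 classes: 0 = space, 1 = 'a', 2 = 'b', 3 = blank
frames = [peaked_row(k) for k in [1, 1, 3, 1, 2, 2, 3, 0]]
indices = greedy_ctc_decode(frames, blank=3)
text = indices_to_text(indices)
print(indices, repr(text))
```

The blank between the two runs of class 1 is what allows the decoder to emit a doubled letter; without it the repeats would be merged into one.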

A simple implementation

from keras.models import Model
from keras.layers import GRU, Dense, Bidirectional, Input, Lambda
from python_speech_features import mfcc, delta
import keras.backend as K
import numpy as np
import scipy.io.wavfile as wav


def ctc_lambda_func(args):
    y_pred, labels, input_length, label_length = args
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)


def get_audio_feature(audio_path):
    fs, audio = wav.read(audio_path)
    print(fs)
    print(audio.shape)
    # Extract MFCC features
    wav_feature = mfcc(audio, fs, nfft=int(0.025 * fs), winfunc=np.hamming)
    # Delta features computed with window sizes N=1 and N=2
    d_mfcc_feat1 = delta(wav_feature, 1)
    d_mfcc_feat2 = delta(wav_feature, 2)
    feature = np.hstack((wav_feature, d_mfcc_feat1, d_mfcc_feat2))
    return feature


def get_audio_label(filepath):
    SPACE_TOKEN = '<space>'
    SPACE_INDEX = 0
    FIRST_INDEX = ord('a') - 1
    with open(filepath, 'r') as f:
        line = f.readlines()[0].strip()
    # Replace each space with two spaces
    targets = line.replace(' ', '  ')
    # Split on single spaces; a double space yields an empty string ''
    targets = targets.split(' ')
    # Turn each '' into the space token
    targets = np.hstack([SPACE_TOKEN if x == '' else list(x) for x in targets])
    print(targets)
    # Map tokens to integer indices
    targets = np.hstack([SPACE_INDEX if x == SPACE_TOKEN else ord(x) - FIRST_INDEX
                         for x in targets])
    return targets


def decode_ctc(out):
    batch_size, decode_len = out.shape[0], out.shape[1]
    for i in range(batch_size):
        pre = ''.join([' ' if x == 0 else chr(x + ord('a') - 1) for x in out[i]])
        print(pre)


feature = get_audio_feature('001.wav')
feature = feature[np.newaxis, :]
print(feature.shape)
labels = get_audio_label('...')  # transcript file path; the name is missing in the source
labels = labels[np.newaxis, :]
print(labels.shape)
max_label_length = labels.shape[1]
il = np.ones(1) * feature.shape[1]
print(il.shape)
ll = np.ones(1) * max_label_length
print(ll.shape)

time_step, step_length = feature.shape[1], feature.shape[2]
n_classes = 26 + 1 + 1

input = Input(shape=(time_step, step_length))
gru = Bidirectional(GRU(units=40, return_sequences=True), merge_mode='concat')(input)
dense = Dense(n_classes, activation='softmax')(gru)
base_model = Model(inputs=input, outputs=dense)

label_true = Input(shape=[max_label_length])
input_length = Input(shape=[1])
label_length = Input(shape=[1])
loss = Lambda(ctc_lambda_func, output_shape=(1, ), name='ctc')([dense, label_true, input_length, label_length])
model = Model(inputs=[input, label_true, input_length, label_length], outputs=loss)

model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer='adadelta')
# model.summary()

fittedModel = model.fit([feature, labels, il, ll], np.ones(1), batch_size=1, epochs=100, verbose=2)
model.save('lstm_ctc.h5')

base_model.load_weights('lstm_ctc.h5')
y_pred = base_model.predict(feature)
decode = K.ctc_decode(y_pred, input_length=np.ones(y_pred.shape[0]) * y_pred.shape[1], greedy=True)
out = K.get_value(decode[0][0])
decode_ctc(out)
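The label encoding used by get_audio_label can be checked in isolation. The sketch below is a simplified pure-Python equivalent under the same convention (space maps to 0, 'a' to 'z' map to 1 to 26, leaving 27 for the blank when n_classes is 28). The function name and the explicit error for unsupported characters are my own additions.

```python
def encode_label(text):
    """Map a lowercase transcript to class indices: ' ' -> 0, 'a'..'z' -> 1..26."""
    indices = []
    for ch in text:
        if ch == ' ':
            indices.append(0)
        elif 'a' <= ch <= 'z':
            indices.append(ord(ch) - ord('a') + 1)
        else:
            # Anything else has no class in this 28-symbol setup
            raise ValueError('unsupported character: %r' % ch)
    return indices

encoded = encode_label('hi there')
print(encoded)
```

Transcripts containing digits, punctuation, or uppercase letters would need to be normalized first, or n_classes extended to cover them.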

Parts of the code are adapted from existing references.

Published: 2024-02-01 07:37:50

Article link: https://www.4u4v.net/it/170674427234937.html
