Using

Author: chen_h
WeChat & QQ: 862251340
WeChat official account: coderpai

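When we build a multi-class classifier with a neural network, we usually use the softmax function as the final classification function. Softmax assigns a probability to every class, and we take the class with the highest probability as the model's output. This is how a concrete class label is derived from the model. To train the model, we backpropagate the softmax loss. The final prediction can be written as a 0-1 (one-hot) vector.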
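As a minimal sketch of the step described above (plain Python, no TensorFlow; the function names and the example scores are illustrative assumptions), softmax turns raw scores into probabilities, and the most probable class becomes the prediction:

```python
import math


def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return (class index, one-hot vector) for the most probable class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    one_hot = [1 if i == best else 0 for i in range(len(probs))]
    return best, one_hot


scores = [1.0, 3.0, 0.5]  # hypothetical logits for three classes
probs = softmax(scores)
best, one_hot = predict(scores)
print(probs, best, one_hot)
```

The class with the largest score always receives the largest probability, so taking the argmax of the probabilities or of the raw scores gives the same prediction.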

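In this article we will not explain what softmax regression or a CNN is. The focus is on how to build an L2-constrained softmax in TensorFlow, using the MNIST dataset. For the full theoretical analysis, see the original paper.

Before the implementation, let us first clarify a few concepts.

Softmax loss function

The softmax loss function can be defined as follows:

where the parameters are defined as follows: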
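For reference, a common way to write the softmax loss is sketched below. The symbols M, C, W_j, b_j and y_i are notation assumed here, chosen to stay consistent with the f(x) used later in the article.

```latex
L_S = -\frac{1}{M} \sum_{i=1}^{M} \log
      \frac{\exp\left(W_{y_i}^{T} f(x_i) + b_{y_i}\right)}
           {\sum_{j=1}^{C} \exp\left(W_j^{T} f(x_i) + b_j\right)}
```

In this notation, M is the number of samples in a training batch, C is the number of classes, x_i is the i-th input with class label y_i, f(x_i) is the output of the layer just before the last one, and W_j and b_j are the last layer's weights and bias for class j.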

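L2-constrained softmax loss function

The constrained loss is defined almost exactly as before, and the goal is still to minimize it.

However, we need to change how f(x) is used.

Instead of directly multiplying the last layer's weights by the previous layer's output f(x), we first normalize f(x) to unit L2 norm, then scale the normalized vector by a factor α, and finally apply the usual softmax.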
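The three steps (normalize, scale, softmax) can be sketched in plain Python as follows. This is only an illustration under assumptions: the helper names, the tiny two-class weight matrix and the example vector are hypothetical, biases are omitted, and the feature vector is assumed to be non-zero.

```python
import math


def l2_normalize_and_scale(features, alpha):
    """Rescale a non-zero feature vector so that its L2 norm equals alpha."""
    norm = math.sqrt(sum(v * v for v in features))
    return [alpha * v / norm for v in features]


def dense(features, weights):
    """Last layer without bias: one dot product per class."""
    return [sum(w * v for w, v in zip(row, features)) for row in weights]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


features = [3.0, 4.0]                # hypothetical f(x), L2 norm 5
weights = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical last-layer weights, 2 classes

scaled = l2_normalize_and_scale(features, alpha=10.0)
probs_small = softmax(dense(l2_normalize_and_scale(features, 1.0), weights))
probs_large = softmax(dense(scaled, weights))
print(scaled, probs_small, probs_large)
```

A larger α makes the logits larger in magnitude, so the softmax output becomes more peaked. In this sketch the predicted class is the same for both values of α, but the probability assigned to it is higher with α = 10 than with α = 1.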

In other words, the loss function is subject to the following constraint:
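A sketch of the constrained problem, in notation assumed here (M samples per batch, C classes, x_i the i-th input with label y_i, W_j and b_j the last layer's weights and bias for class j), could read:

```latex
\min \; -\frac{1}{M} \sum_{i=1}^{M} \log
      \frac{\exp\left(W_{y_i}^{T} f(x_i) + b_{y_i}\right)}
           {\sum_{j=1}^{C} \exp\left(W_j^{T} f(x_i) + b_j\right)}
\quad \text{subject to} \quad
\lVert f(x_i) \rVert_2 = \alpha, \qquad i = 1, \dots, M
```

The constraint says that every feature vector fed to the last layer has the same L2 norm α, which is exactly what the normalize-then-scale step enforces.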

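Implementation details

So the architecture looks like the figure below (this is also the architecture I want to implement):

C denotes a convolutional layer, P a pooling layer, and FC a fully connected layer. The L2-Norm layer and the Scale layer are the ones we focus on implementing.

Implementation in TensorFlow

To implement this model, we build on this code repository.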

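Before applying dropout, we first normalize the output of layer N-1, multiply the normalized result by the parameter alpha, and then feed it to the softmax computation. The code for this step is:

fc1 = alpha * tf.divide(fc1, tf.norm(fc1, ord='euclidean', axis=1, keepdims=True))

In the full code, alpha = 0 acts as a switch: the normalization and scaling are skipped and the model is an ordinary softmax classifier. Any other value applies the L2 constraint.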
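The switch behaviour can be illustrated on a whole batch with a small plain-Python sketch. It assumes that each sample's feature vector is normalized separately and is non-zero; the function name and the example batch are hypothetical.

```python
import math


def l2_constrain_batch(batch, alpha):
    """Rescale every row of `batch` to L2 norm alpha; alpha == 0 disables the step."""
    if alpha == 0:
        return [list(row) for row in batch]  # plain softmax path: features unchanged
    constrained_rows = []
    for row in batch:
        norm = math.sqrt(sum(v * v for v in row))  # assumes a non-zero row
        constrained_rows.append([alpha * v / norm for v in row])
    return constrained_rows


batch = [[3.0, 4.0], [0.0, 2.0]]  # hypothetical features for two samples
constrained = l2_constrain_batch(batch, alpha=5.0)
unchanged = l2_constrain_batch(batch, alpha=0)
print(constrained, unchanged)
```

After the call with a non-zero alpha, every row has the same norm, regardless of how large or small the original features were.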

The complete code is as follows:

# Actual Code : .ipynb
# Modified By: Manash
from __future__ import division, print_function, absolute_import

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=False)

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np

# Training Parameters
learning_rate = 0.001
num_steps = 100
batch_size = 20

# Network Parameters
num_input = 784  # MNIST data input (img shape: 28*28)
num_classes = 10  # MNIST total classes (0-9 digits)
dropout = 0.75  # Passed as `rate` to tf.layers.dropout, i.e. the probability of dropping a unit


# Create the neural network
def conv_net(x_dict, n_classes, dropout, reuse, is_training, alpha=5):
    # Define a scope for reusing the variables
    with tf.variable_scope('ConvNet', reuse=reuse):
        # TF Estimator input is a dict, in case of multiple inputs
        x = x_dict['images']

        # MNIST data input is a 1-D vector of 784 features (28*28 pixels)
        # Reshape to match picture format [Height x Width x Channel]
        # Tensor input becomes 4-D: [Batch Size, Height, Width, Channel]
        x = tf.reshape(x, shape=[-1, 28, 28, 1])

        # Convolution Layer with 32 filters and a kernel size of 5
        conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
        # Max Pooling (down-sampling) with strides of 2 and kernel size of 2
        conv1 = tf.layers.max_pooling2d(conv1, 2, 2)

        # Convolution Layer with 64 filters and a kernel size of 3
        conv2 = tf.layers.conv2d(conv1, 64, 3, activation=tf.nn.relu)
        # Max Pooling (down-sampling) with strides of 2 and kernel size of 2
        conv2 = tf.layers.max_pooling2d(conv2, 2, 2)

        # Flatten the data to a 1-D vector for the fully connected layer
        fc1 = tf.contrib.layers.flatten(conv2)

        # Fully connected layer (in tf contrib folder for now)
        fc1 = tf.layers.dense(fc1, 1024)

        # If alpha is not zero, L2-normalize each sample's features (axis=1),
        # then scale them up by alpha
        if alpha != 0:
            fc1 = alpha * tf.divide(fc1, tf.norm(fc1, ord='euclidean', axis=1, keepdims=True))

        # Apply Dropout (if is_training is False, dropout is not applied)
        fc1 = tf.layers.dropout(fc1, rate=dropout, training=is_training)

        # Output layer, class prediction
        out = tf.layers.dense(fc1, n_classes)

    return out


# Define the model function (following TF Estimator Template)
def model_fn(features, labels, mode):
    # Set alpha
    alph = 50

    # Build the neural network
    # Because Dropout has different behavior at training and prediction time, we
    # need to create 2 distinct computation graphs that still share the same weights.
    logits_train = conv_net(features, num_classes, dropout, reuse=False, is_training=True, alpha=alph)
    # At test time we don't need to normalize or scale, it's redundant as per paper : .09507
    logits_test = conv_net(features, num_classes, dropout, reuse=True, is_training=False, alpha=0)

    # Predictions
    pred_classes = tf.argmax(logits_test, axis=1)
    pred_probas = tf.nn.softmax(logits_test)

    # If prediction mode, early return
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions=pred_classes)

    # Define loss and optimizer
    loss_op = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits_train, labels=tf.cast(labels, dtype=tf.int32)))
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    train_op = optimizer.minimize(loss_op, global_step=tf.train.get_global_step())

    # Evaluate the accuracy of the model
    acc_op = tf.metrics.accuracy(labels=labels, predictions=pred_classes)

    # TF Estimators require returning an EstimatorSpec, which specifies
    # the different ops for training, evaluating, ...
    estim_specs = tf.estimator.EstimatorSpec(
        mode=mode,
        predictions=pred_classes,
        loss=loss_op,
        train_op=train_op,
        eval_metric_ops={'accuracy': acc_op})

    return estim_specs


# Build the Estimator
model = tf.estimator.Estimator(model_fn)

# Define the input function for training
input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'images': mnist.train.images}, y=mnist.train.labels,
    batch_size=batch_size, num_epochs=None, shuffle=False)
# Train the Model
model.train(input_fn, steps=num_steps)

# Evaluate the Model
# Define the input function for evaluating
input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'images': mnist.test.images}, y=mnist.test.labels,
    batch_size=batch_size, shuffle=False)
# Use the Estimator 'evaluate' method
model.evaluate(input_fn)

# Predict single images
n_images = 4
# Get images from test set
test_images = mnist.test.images[:n_images]
# Prepare the input data
input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'images': test_images}, shuffle=False)
# Use the model to predict the images class
preds = list(model.predict(input_fn))

# Display
for i in range(n_images):
    plt.imshow(np.reshape(test_images[i], [28, 28]), cmap='gray')
    plt.show()
    print("Model prediction:", preds[i])

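Performance evaluation

Does this really improve performance? Yes, and the effect is quite good: it improves performance by about 1%. I did not train for many iterations, mainly because I do not have a powerful computer. If you have doubts about this result, you can try it yourself.

Below is the model performance for different alpha values:

The orange line is the regular softmax, and the blue line is the L2-constrained softmax.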

Published: 2024-01-28 18:50:58

Article link: https://www.4u4v.net/it/17064390619514.html
