[Tensorflow] Saving / loading a Keras subclassed model with tf.train.Checkpoint()
Source: cnblogs (博客园). Original link: https://www.cnblogs.com/zlian2016/p/11096448.html
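In subclassed_model.py, two custom models are defined by subclassing tf.keras.Model. The code uses the TensorFlow 1.x API (tf.enable_eager_execution(), tf.train.AdamOptimizer), which in TensorFlow 2.x is only available under tf.compat.v1.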
import tensorflow as tf
tf.enable_eager_execution()

# parameters
UNITS = 8


class Encoder(tf.keras.Model):
    def __init__(self):
        super(Encoder, self).__init__()
        self.fc1 = tf.keras.layers.Dense(units=UNITS * 2, activation='relu')
        self.fc2 = tf.keras.layers.Dense(units=UNITS, activation='relu')

    def call(self, inputs):
        r = self.fc1(inputs)
        return self.fc2(r)


class Decoder(tf.keras.Model):
    def __init__(self):
        super(Decoder, self).__init__()
        self.fc = tf.keras.layers.Dense(units=1)

    def call(self, inputs):
        return self.fc(inputs)
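To make the tensor shapes concrete, here is a minimal NumPy sketch of the same forward pass. It is not the Keras implementation: the helper dense and the random weights are assumptions for illustration, and only UNITS = 8 and the layer sizes are taken from the code above.

```python
import numpy as np

UNITS = 8  # same constant as in subclassed_model.py


def dense(x, w, b, relu=False):
    """Hypothetical helper: y = x @ w + b, optionally followed by ReLU."""
    y = x @ w + b
    return np.maximum(y, 0.0) if relu else y


rng = np.random.default_rng(0)
# Random weights stand in for trained parameters (assumption, for illustration only).
w1, b1 = rng.normal(size=(1, UNITS * 2)), np.zeros(UNITS * 2)   # Encoder.fc1: 1 -> 16
w2, b2 = rng.normal(size=(UNITS * 2, UNITS)), np.zeros(UNITS)   # Encoder.fc2: 16 -> 8
w3, b3 = rng.normal(size=(UNITS, 1)), np.zeros(1)               # Decoder.fc:  8 -> 1

x = np.linspace(-1, 1, 16).reshape(16, 1)                      # one batch of 16 scalars
code = dense(dense(x, w1, b1, relu=True), w2, b2, relu=True)   # Encoder output
pred = dense(code, w3, b3)                                     # Decoder output

print(x.shape, code.shape, pred.shape)  # (16, 1) (16, 8) (16, 1)
```

The encoder maps each scalar input to an 8-dimensional code, and the decoder maps that code back to a single value, which is why the decoder is later built with an input shape of (BATCH_SIZE, UNITS).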
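In save_subclassed_model.py, 5000 training samples are generated, the Encoder() and Decoder() models are instantiated, tf.train.AdamOptimizer() is used as the optimizer, and mean squared error serves as the loss function. During training, the model is saved every 5 epochs.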
from subclassed_model import *

import numpy as np
import matplotlib.pyplot as plt
import os
import tensorflow as tf
tf.enable_eager_execution()

# create training data
X = np.linspace(-1, 1, 5000)
np.random.shuffle(X)
y = X ** 3 + 1 + np.random.normal(0, 0.05, (5000,))

# plot data
plt.scatter(X, y)
plt.show()

# training dataset
BATCH_SIZE = 16
BUFFER_SIZE = 128
training_dataset = tf.data.Dataset.from_tensor_slices((X, y)).batch(BATCH_SIZE).shuffle(BUFFER_SIZE)

# initialize subclassed models
encoder = Encoder()
decoder = Decoder()
optimizer = tf.train.AdamOptimizer()


# loss function
def loss_function(real, pred):
    return tf.losses.mean_squared_error(labels=real, predictions=pred)


# training
EPOCHS = 15

# checkpoint: track the optimizer and both models under one checkpoint object
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, 'ckpt')
checkpoint = tf.train.Checkpoint(optimizer=optimizer, encoder=encoder, decoder=decoder)
if not os.path.exists(checkpoint_dir):
    os.makedirs(checkpoint_dir)

for epoch in range(EPOCHS):
    epoch_loss = 0
    for (batch, (x, y)) in enumerate(training_dataset):
        x = tf.cast(x, tf.float32)
        y = tf.cast(y, tf.float32)
        x = tf.expand_dims(x, axis=1)  # tf.Tensor([...], shape=(16, 1), dtype=float32)
        y = tf.expand_dims(y, axis=1)  # tf.Tensor([...], shape=(16, 1), dtype=float32)
        with tf.GradientTape() as tape:
            y_ = encoder(x)
            prediction = decoder(y_)
            batch_loss = loss_function(real=y, pred=prediction)
        grads = tape.gradient(batch_loss, encoder.variables + decoder.variables)
        optimizer.apply_gradients(zip(grads, encoder.variables + decoder.variables),
                                  global_step=tf.train.get_or_create_global_step())
        epoch_loss += batch_loss
        if (batch + 1) % 100 == 0:
            print('Epoch {} Batch {} Loss {:.4f}'.format(epoch + 1, batch + 1, batch_loss.numpy()))
    # NOTE: epoch_loss is a sum of per-batch mean losses, so dividing by len(X)
    # (the number of samples, not batches) understates the mean batch loss.
    print('Epoch {} Loss {:.4f}'.format(epoch + 1, epoch_loss / len(X)))
    if (epoch + 1) % 5 == 0:
        checkpoint.save(file_prefix=checkpoint_prefix)
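The number of batches per epoch and the save schedule follow directly from the constants in the script. The sketch below works them out in plain Python. The "-1", "-2", ... suffixes are an assumption about how Checkpoint.save() numbers its outputs, and posixpath is used only so the printed paths look the same on every platform.

```python
import math
import posixpath

N_SAMPLES, BATCH_SIZE, EPOCHS = 5000, 16, 15  # constants from the script

# Dataset.batch() keeps the final partial batch by default, so round up.
batches_per_epoch = math.ceil(N_SAMPLES / BATCH_SIZE)

# The script saves whenever (epoch + 1) % 5 == 0.
save_epochs = [epoch + 1 for epoch in range(EPOCHS) if (epoch + 1) % 5 == 0]

# Assumption: each save() call appends a running counter to the prefix.
prefix = posixpath.join('./training_checkpoints', 'ckpt')
expected_prefixes = ['{}-{}'.format(prefix, i + 1) for i in range(len(save_epochs))]

print(batches_per_epoch)   # 313
print(save_epochs)         # [5, 10, 15]
print(expected_prefixes)
```

With 313 batches per epoch, the condition (batch + 1) % 100 == 0 fires only at batches 100, 200 and 300, and three checkpoints are written over the 15 epochs.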
Run save_subclassed_model.py.
2019-06-27 12:57:14.253635: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-06-27 12:57:15.660142: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:01:00.0
totalMemory: 6.00GiB freeMemory: 4.97GiB
2019-06-27 12:57:15.660397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-06-27 12:57:16.488227: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-27 12:57:16.488385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-06-27 12:57:16.488476: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-06-27 12:57:16.488772: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4722 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0, compute capability: 6.1)
Epoch 1 Batch 100 Loss 0.1120
Epoch 1 Batch 200 Loss 0.0179
Epoch 1 Batch 300 Loss 0.0347
Epoch 1 Loss 0.0111
Epoch 2 Batch 100 Loss 0.0144
Epoch 2 Batch 200 Loss 0.0097
Epoch 2 Batch 300 Loss 0.0141
Epoch 2 Loss 0.0012
Epoch 3 Batch 100 Loss 0.0060
Epoch 3 Batch 200 Loss 0.0037
Epoch 3 Batch 300 Loss 0.0054
Epoch 3 Loss 0.0007
Epoch 4 Batch 100 Loss 0.0088
Epoch 4 Batch 200 Loss 0.0038
Epoch 4 Batch 300 Loss 0.0093
Epoch 4 Loss 0.0004
Epoch 5 Batch 100 Loss 0.0039
Epoch 5 Batch 200 Loss 0.0044
Epoch 5 Batch 300 Loss 0.0031
Epoch 5 Loss 0.0003
Epoch 6 Batch 100 Loss 0.0025
Epoch 6 Batch 200 Loss 0.0038
Epoch 6 Batch 300 Loss 0.0027
Epoch 6 Loss 0.0002
Epoch 7 Batch 100 Loss 0.0026
Epoch 7 Batch 200 Loss 0.0032
Epoch 7 Batch 300 Loss 0.0041
Epoch 7 Loss 0.0002
Epoch 8 Batch 100 Loss 0.0022
Epoch 8 Batch 200 Loss 0.0031
Epoch 8 Batch 300 Loss 0.0026
Epoch 8 Loss 0.0002
Epoch 9 Batch 100 Loss 0.0040
Epoch 9 Batch 200 Loss 0.0014
Epoch 9 Batch 300 Loss 0.0040
Epoch 9 Loss 0.0002
Epoch 10 Batch 100 Loss 0.0023
Epoch 10 Batch 200 Loss 0.0030
Epoch 10 Batch 300 Loss 0.0038
Epoch 10 Loss 0.0002
Epoch 11 Batch 100 Loss 0.0028
Epoch 11 Batch 200 Loss 0.0020
Epoch 11 Batch 300 Loss 0.0025
Epoch 11 Loss 0.0002
Epoch 12 Batch 100 Loss 0.0027
Epoch 12 Batch 200 Loss 0.0045
Epoch 12 Batch 300 Loss 0.0021
Epoch 12 Loss 0.0002
Epoch 13 Batch 100 Loss 0.0016
Epoch 13 Batch 200 Loss 0.0033
Epoch 13 Batch 300 Loss 0.0024
Epoch 13 Loss 0.0002
Epoch 14 Batch 100 Loss 0.0034
Epoch 14 Batch 200 Loss 0.0028
Epoch 14 Batch 300 Loss 0.0033
Epoch 14 Loss 0.0002
Epoch 15 Batch 100 Loss 0.0019
Epoch 15 Batch 200 Loss 0.0030
Epoch 15 Batch 300 Loss 0.0037
Epoch 15 Loss 0.0002

Process finished with exit code 0
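The per-epoch values in this log are much smaller than the per-batch values because the script divides the sum of batch losses by the number of samples (5000) rather than the number of batches. A rough sketch of the conversion back to a mean batch loss, assuming 16 samples per batch as in the script (mean_batch_loss is a hypothetical helper name):

```python
import math

N_SAMPLES, BATCH_SIZE = 5000, 16               # constants from the script
n_batches = math.ceil(N_SAMPLES / BATCH_SIZE)  # batches per epoch


def mean_batch_loss(logged_epoch_loss):
    """Convert the logged value (sum of batch losses / N_SAMPLES) to a per-batch mean."""
    return logged_epoch_loss * N_SAMPLES / n_batches


# Logged epoch losses taken from the output above.
for epoch, logged in [(1, 0.0111), (5, 0.0003), (15, 0.0002)]:
    print('Epoch {}: logged {:.4f}, mean batch loss ~ {:.4f}'.format(
        epoch, logged, mean_batch_loss(logged)))
```

For epoch 15 this gives roughly 0.003, which is in line with the batch losses printed for that epoch. The logged values are already rounded to four decimals, so the converted figures are only approximate.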
Inspect the files in the checkpoint_dir directory.
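One way to inspect the directory from Python is sketched below, using only the standard library. The helper group_checkpoint_files is hypothetical, and the file names in the demo are an assumption about what three calls to checkpoint.save() typically leave behind (a checkpoint state file plus an .index and a .data-* file per save). They are not a listing taken from the original run.

```python
import os
import re
import tempfile


def group_checkpoint_files(checkpoint_dir):
    """Group file names in checkpoint_dir by checkpoint prefix (e.g. 'ckpt-1')."""
    groups = {}
    for name in sorted(os.listdir(checkpoint_dir)):
        match = re.match(r'(.+-\d+)\.(index|data-\d+-of-\d+)$', name)
        key = match.group(1) if match else '(other)'
        groups.setdefault(key, []).append(name)
    return groups


# Demo on a temporary directory filled with empty placeholder files.
# The names are assumed, not copied from the original run.
with tempfile.TemporaryDirectory() as demo_dir:
    names = ['checkpoint'] + [
        'ckpt-{}.{}'.format(i, ext)
        for i in (1, 2, 3)
        for ext in ('index', 'data-00000-of-00001')
    ]
    for name in names:
        open(os.path.join(demo_dir, name), 'w').close()
    groups = group_checkpoint_files(demo_dir)

for key, files in groups.items():
    print(key, files)
```

To inspect the real output, call group_checkpoint_files('./training_checkpoints') after running save_subclassed_model.py.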
In load_subclassed_model.py, 200 test samples are created, the model parameters saved in the latest checkpoint are loaded, and the model is tested.
from subclassed_model import *

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
tf.enable_eager_execution()

# load model
encoder = Encoder()
decoder = Decoder()
optimizer = tf.train.AdamOptimizer()

checkpoint_dir = './training_checkpoints'
checkpoint = tf.train.Checkpoint(optimizer=optimizer, encoder=encoder, decoder=decoder)
# The layer variables do not exist yet at this point, so the saved values are
# restored lazily once build() (or the first call) creates them.
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))

# build model
BATCH_SIZE = 16
encoder.build(input_shape=tf.TensorShape((BATCH_SIZE, 1)))
decoder.build(input_shape=tf.TensorShape((BATCH_SIZE, UNITS)))
encoder.summary()
decoder.summary()

# create validation data
X_test = np.linspace(-1, 1, 200)

# validation dataset
val_dataset = tf.data.Dataset.from_tensor_slices(X_test).batch(1)

# predict and plot
results = []
for (batch, x) in enumerate(val_dataset):
    x = tf.cast(x, tf.float32)
    x = tf.expand_dims(x, axis=1)
    y_ = encoder(x)
    prediction = decoder(y_)
    # print(prediction.numpy()[0][0])
    results.append(prediction.numpy()[0][0])

# plot results
plt.scatter(X_test, results)
plt.show()
Run load_subclassed_model.py.
2019-06-27 13:27:40.712260: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-06-27 13:27:42.105938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:01:00.0
totalMemory: 6.00GiB freeMemory: 4.97GiB
2019-06-27 13:27:42.106200: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-06-27 13:27:42.921364: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-27 13:27:42.921510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-06-27 13:27:42.921594: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-06-27 13:27:42.921777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4722 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0, compute capability: 6.1)
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                multiple                  32
_________________________________________________________________
dense_1 (Dense)              multiple                  136
=================================================================
Total params: 168
Trainable params: 168
Non-trainable params: 0
_________________________________________________________________
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_2 (Dense)              multiple                  9
=================================================================
Total params: 9
Trainable params: 9
Non-trainable params: 0
_________________________________________________________________

Process finished with exit code 0
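The parameter counts in the two summaries can be checked by hand: a fully connected layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases. A short sketch (dense_params is a hypothetical helper, not part of Keras):

```python
UNITS = 8  # same constant as in subclassed_model.py


def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


encoder_layers = [dense_params(1, UNITS * 2), dense_params(UNITS * 2, UNITS)]
decoder_layers = [dense_params(UNITS, 1)]

print(encoder_layers, sum(encoder_layers))  # [32, 136] 168
print(decoder_layers, sum(decoder_layers))  # [9] 9
```

These match the 32, 136 and 9 parameters reported for dense, dense_1 and dense_2.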
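Copyright notice: This is an original article by the blog author. Reposting is welcome; please credit the author and the original source.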