A Practical Guide for Debugging TensorFlow Code
.author[Jongwook Choi]
.small[.white[Feb 17th, 2017]
.green[Initial Version: June 18th, 2016]]
.x-small[https://github.com/wookayin/tensorflow-talk-debugging]
layout: false
Bio: Jongwook Choi (@wookayin)
- An undergraduate student at Seoul National University, Vision and Learning Laboratory
- Looking for a graduate (Ph.D.) program in ML/DL
- A huge fan of TensorFlow and Deep Learning
.dogdrip[TensorFlow rocks!!!]
.right.img-33[![](images/profile-wook.png)]
About
This talk aims to share some practical guides and tips on writing and debugging TensorFlow code.
–
… because you might find that debugging TensorFlow codes is something like …
class: center, middle, no-number, bg-full
background-image: url(images/meme-doesnt-work.jpg)
background-repeat: no-repeat
background-size: contain
Welcome!
.green[Contents]
- Introduction: Why debugging in TensorFlow is difficult
- Basic and advanced methods for debugging TensorFlow code
- General tips and guidelines for easily-debuggable code
- .dogdrip[Benchmarking and profiling TensorFlow code]
.red[A Disclaimer]
- This talk is NOT about how to debug your ML/DL model
  .gray[(e.g. my model is not fitting well)],
  but about how to debug your TF code from a programming perspective
- I assume the audience is somewhat familiar with the basics of TensorFlow and Deep Learning;
  it would be very good if you have experience writing TensorFlow code yourself
- Questions are highly welcomed! Please feel free to interrupt!
template: inverse
Debugging?
Debugging TensorFlow applications is …
–
- Difficult!
–
- Do you agree? .dogdrip[Life is not that easy]
Review: TensorFlow Computation Mechanics
- The core concept of TensorFlow: The Computation Graph
- See Also: TensorFlow Mechanics 101
.center.img-33[![](images/tensors_flowing.gif)]
Review: TensorFlow Computation Mechanics
.gray.right[(from TensorFlow docs)]
TensorFlow programs are usually structured into
- a .green[construction phase], that assembles a graph, and
- an .blue[execution phase] that uses a session to execute ops in the graph.
.center.img-33[![](images/tensors_flowing.gif)]
Review: in pure numpy …
```python
W1, b1, W2, b2, W3, b3 = init_parameters()

def multilayer_perceptron(x, y_truth):
    # (i) feed-forward pass
    assert x.dtype == np.float32 and x.shape == (batch_size, 784)  # numpy!
*   fc1 = fully_connected(x, W1, b1, activation_fn='relu')    # [B, 256]
*   fc2 = fully_connected(fc1, W2, b2, activation_fn='relu')  # [B, 256]
    out = fully_connected(fc2, W3, b3, activation_fn=None)    # [B, 10]

    # (ii) loss and gradient, backpropagation
    loss = softmax_cross_entropy_loss(out, y_truth)  # loss as a scalar
    param_gradients = _compute_gradient(...)  # just an artificial example :)
    return out, loss, param_gradients

def train():
    for epoch in range(10):
        epoch_loss = 0.0
        batch_steps = mnist.train.num_examples / batch_size
        for step in range(batch_steps):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
*           y_pred, loss, gradients = multilayer_perceptron(batch_x, batch_y)
            for v, grad_v in zip(all_params, gradients):
                v = v - learning_rate * grad_v
            epoch_loss += loss / batch_steps
        print "Epoch %02d, Loss = %.6f" % (epoch, epoch_loss)
```
Review: with TensorFlow
```python
def multilayer_perceptron(x):
*   fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu)
*   fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu)
    out = layers.fully_connected(fc2, 10, activation_fn=None)
    return out

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
*pred = multilayer_perceptron(x)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)

def train(session):
    batch_size = 200
    session.run(tf.global_variables_initializer())
    for epoch in range(10):
        epoch_loss = 0.0
        batch_steps = mnist.train.num_examples / batch_size
        for step in range(batch_steps):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
*           _, c = session.run([train_op, loss], {x: batch_x, y: batch_y})
            epoch_loss += c / batch_steps
        print "Epoch %02d, Loss = %.6f" % (epoch, epoch_loss)
```
Review: The Issues
```python
def multilayer_perceptron(x):
*   fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu)
*   fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu)
    out = layers.fully_connected(fc2, 10, activation_fn=None)
    return out

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
*pred = multilayer_perceptron(x)
```
- The actual computation is done inside `session.run()`;
  what we have just done is to build a computation graph
- The model building part (e.g. `multilayer_perceptron()`) is called only once
  before training, so we cannot simply access the intermediates
  - e.g. Inspecting the activations of `fc1`/`fc2` is not trivial!

```python
*_, c = session.run([train_op, loss], {x: batch_x, y: batch_y})
```
Session.run()
The most important method in TensorFlow — where every computation is performed!
`tf.Session.run(fetches, feed_dict)` runs the operations and evaluates the
tensors in `fetches`, substituting the values in `feed_dict` for the
corresponding input (placeholder) values.
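A minimal sketch of the fetch/feed mechanism (assuming an active `session`):

```python
a = tf.placeholder(tf.float32, name='a')
b = a * 2.0

# Fetch `b`; the value for `a` is supplied through feed_dict.
print(session.run(b, feed_dict={a: 3.0}))  # prints 6.0
```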
Why is debugging TensorFlow difficult?
- The concept of the Computation Graph might be unfamiliar to us.
- The "Inversion of Control"
  - The actual computation (feed-forward, training) of the model runs inside
    `Session.run()`, upon the computation graph, but not upon the Python code we wrote
  - What exactly is being done during an execution of a session is hidden under an abstraction barrier
- Therefore, we cannot retrieve intermediate values during the computation,
  unless we explicitly fetch them via `Session.run()`
template: inverse
Debugging Facilities in TensorFlow
Debugging Scenarios
We may wish to …
- inspect intra-layer .blue[activations] (during training)
  - e.g. see the output of conv5 or fc7 in CNNs
- inspect .blue[parameter weights] (during training)
- under some conditions, pause the execution (i.e. a .blue[breakpoint]) and
  evaluate some expressions for debugging
- during training, .red[NaN] occurs in the loss and variables .dogdrip[but I don't know why]
–
Of course, in TensorFlow, we can do all of these very elegantly!
Debugging in TensorFlow: Overview
.blue[Basic ways:]
- Explicitly fetch via `Session.run()`, and print (or do whatever you want)!
- Tensorboard: Histogram and Image Summary
- The `tf.Print()` operation

.blue[Advanced ways:]
- Interpose any python codelet in the computation graph
- A step-by-step debugger
- `tfdbg`: The TensorFlow debugger
(1) Fetch tensors via Session.run()
TensorFlow allows us to run parts of the graph in isolation, i.e.
.green[only the relevant part] of the graph is executed (rather than executing everything)
```python
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
bias = tf.Variable(1.0)

y_pred = x ** 2 + bias     # x -> x^2 + bias
loss = (y - y_pred) ** 2   # l2 loss

# Error: to compute loss, y is required as a dependency
print('Loss(x,y) = %.3f' % session.run(loss, {x: 3.0}))

# OK, prints 1.000 = (3**2 + 1 - 9)**2
print('Loss(x,y) = %.3f' % session.run(loss, {x: 3.0, y: 9.0}))

# OK, prints 10.000; for evaluating y_pred only, an input for y is not required
*print('pred_y(x) = %.3f' % session.run(y_pred, {x: 3.0}))

# OK, prints 1.000; bias evaluates to 1.0
*print('bias      = %.3f' % session.run(bias))
```
Tensor Fetching: Example
We need access to the tensors as Python expressions:
```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu)
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu)
    out = layers.fully_connected(fc2, 10, activation_fn=None)
*   return out, fc1, fc2

net = {}
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
*pred, net['fc1'], net['fc2'] = multilayer_perceptron(x)
```
to fetch and evaluate them:
```python
_, c, fc1, fc2, out = session.run(
*   [train_op, loss, net['fc1'], net['fc2'], pred],
    feed_dict={x: batch_x, y: batch_y})

# and do something ...
if step == 0:  # XXX Debug
    print fc1[0].mean(), fc2[0].mean(), out[0]
```
(1) Fetch tensors via Session.run()
.green[The Good:]
- Simple and easy.
- The most basic method to get debugging information.
- We can fetch any evaluation result as numpy arrays, .green[anywhere]
  .gray[(except inside `Session.run()` or the computation graph)].
–
.red[The Bad:]
- We need to hold a reference to the tensors to inspect,
  which might be burdensome if the model becomes complex and big
  .gray[(or, we can simply pass the tensor name, such as `fc0/Relu:0`; see the sketch below)]
- The feed-forward needs to be done in an atomic way (i.e. a single call of `Session.run()`)
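A sketch of fetching by name: `Session.run()` also accepts tensor names as
strings, so holding a Python reference is not strictly required (the name
`fc1/Relu:0` is an assumption about how the layer was scoped):

```python
# Fetch an intermediate activation by name instead of by reference.
fc1_value = session.run('fc1/Relu:0', feed_dict={x: batch_x})
```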
Tensor Fetching: The Bad (i)
- We need to hold a reference to the tensors to inspect,
  which might be burdensome if the model becomes complex and big
```python
def alexnet(x):
    assert x.get_shape().as_list() == [224, 224, 3]
    conv1 = conv_2d(x, 96, 11, strides=4, activation='relu')
    pool1 = max_pool_2d(conv1, 3, strides=2)
    conv2 = conv_2d(pool1, 256, 5, activation='relu')
    pool2 = max_pool_2d(conv2, 3, strides=2)
    conv3 = conv_2d(pool2, 384, 3, activation='relu')
    conv4 = conv_2d(conv3, 384, 3, activation='relu')
    conv5 = conv_2d(conv4, 256, 3, activation='relu')
    pool5 = max_pool_2d(conv5, 3, strides=2)
    fc6 = fully_connected(pool5, 4096, activation='relu')
    fc7 = fully_connected(fc6, 4096, activation='relu')
    output = fully_connected(fc7, 1000, activation='softmax')
    return conv1, pool1, conv2, pool2, conv3, conv4, conv5, pool5, fc6, fc7

conv1, conv2, conv3, conv4, conv5, fc6, fc7, output = alexnet(images)  # ?!
```
```python
_, loss_, conv1_, conv2_, conv3_, conv4_, conv5_, fc6_, fc7_ = session.run(
    [train_op, loss, conv1, conv2, conv3, conv4, conv5, fc6, fc7],
    feed_dict={...})
```
Tensor Fetching: The Bad (i)
- Suggestion: Using a `dict` or a class instance (e.g. `self.conv5`) is a very good idea
```python
def alexnet(x, net={}):
    assert x.get_shape().as_list() == [224, 224, 3]
    net['conv1'] = conv_2d(x, 96, 11, strides=4, activation='relu')
    net['pool1'] = max_pool_2d(net['conv1'], 3, strides=2)
    net['conv2'] = conv_2d(net['pool1'], 256, 5, activation='relu')
    net['pool2'] = max_pool_2d(net['conv2'], 3, strides=2)
    net['conv3'] = conv_2d(net['pool2'], 384, 3, activation='relu')
    net['conv4'] = conv_2d(net['conv3'], 384, 3, activation='relu')
    net['conv5'] = conv_2d(net['conv4'], 256, 3, activation='relu')
    net['pool5'] = max_pool_2d(net['conv5'], 3, strides=2)
    net['fc6'] = fully_connected(net['pool5'], 4096, activation='relu')
    net['fc7'] = fully_connected(net['fc6'], 4096, activation='relu')
    net['output'] = fully_connected(net['fc7'], 1000, activation='softmax')
    return net['output']

net = {}
output = alexnet(images, net)
# access intermediate layers like net['conv5'], net['fc7'], etc.
```
Tensor Fetching: The Bad (i)
- Suggestion: Using a `dict` or a class instance (e.g. `self.conv5`) is a very good idea
```python
class AlexNetModel():
    # ...
    def build_model(self, x):
        assert x.get_shape().as_list() == [224, 224, 3]
        self.conv1 = conv_2d(x, 96, 11, strides=4, activation='relu')
        self.pool1 = max_pool_2d(self.conv1, 3, strides=2)
        self.conv2 = conv_2d(self.pool1, 256, 5, activation='relu')
        self.pool2 = max_pool_2d(self.conv2, 3, strides=2)
        self.conv3 = conv_2d(self.pool2, 384, 3, activation='relu')
        self.conv4 = conv_2d(self.conv3, 384, 3, activation='relu')
        self.conv5 = conv_2d(self.conv4, 256, 3, activation='relu')
        self.pool5 = max_pool_2d(self.conv5, 3, strides=2)
        self.fc6 = fully_connected(self.pool5, 4096, activation='relu')
        self.fc7 = fully_connected(self.fc6, 4096, activation='relu')
        self.output = fully_connected(self.fc7, 1000, activation='softmax')
        return self.output

model = AlexNetModel()
output = model.build_model(images)
# access intermediate layers like model.conv5, model.fc7, etc.
```
Tensor Fetching: The Bad (ii)
- The feed-forward (sometimes) needs to be done in an atomic way (i.e. a single call of `Session.run()`)
```python
# a single step of training ...
*[loss_value, _] = session.run([loss_op, train_op],
                               feed_dict={images: batch_image})
# After this, the model parameters have been changed due to `train_op`

if np.isnan(loss_value):
    # DEBUG: can we see the intermediate values for the current input?
    [fc7, prob] = session.run([net['fc7'], net['prob']],
                              feed_dict={images: batch_image})
```
- In other words, if any input is fed via `feed_dict`,
  we may have to fetch the non-debugging-related tensors
  and the debugging-related tensors at the same time.
Tensor Fetching: The Bad (ii)
- In fact, we can just perform an additional `session.run()` for debugging purposes,
  if it does not involve any side effects
```python
# for debugging only, get the intermediate layer outputs.
[fc7, prob] = session.run([net['fc7'], net['prob']],
                          feed_dict={images: batch_image})

# Yet another feed-forward: 'fc7' is computed once more ...
[loss_value, _] = session.run([loss_op, train_op],
                              feed_dict={images: batch_image})
```
- A workaround: use `session.partial_run()` .gray[(undocumented, and still experimental)]
```python
h = sess.partial_run_setup([net['fc7'], loss_op, train_op], [images])
[loss_value, _] = sess.partial_run(h, [loss_op, train_op],
                                   feed_dict={images: batch_image})
fc7 = sess.partial_run(h, net['fc7'])
```
(2) Tensorboard
- An off-the-shelf monitoring and debugging tool!
- Check out a must-read tutorial from TensorFlow documentation
- You will need to learn
  - how to use and collect scalar/histogram/image summaries
  - how to use `tf.summary.FileWriter` .dogdrip[(previously it was `SummaryWriter`)]
Tensorboard: A Quick Tutorial
```python
def multilayer_perceptron(x):
    # inside this, variables 'fc1/weights' and 'fc1/bias' are defined
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu,
                                 scope='fc1')
*   tf.summary.histogram('fc1', fc1)
*   tf.summary.histogram('fc1/sparsity', tf.nn.zero_fraction(fc1))
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu,
                                 scope='fc2')
*   tf.summary.histogram('fc2', fc2)
*   tf.summary.histogram('fc2/sparsity', tf.nn.zero_fraction(fc2))
    out = layers.fully_connected(fc2, 10, scope='out')
    return out

# ... (omitted) ...
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=pred, labels=y))
*tf.summary.scalar('loss', loss)

*# histogram summary for all trainable variables (slow?)
*for v in tf.trainable_variables():
*    tf.summary.histogram(v.name, v)
```
Tensorboard: A Quick Tutorial
```python
*global_step = tf.Variable(0, dtype=tf.int32, trainable=False)
train_op = tf.train.AdamOptimizer(learning_rate=0.001)\
           .minimize(loss, global_step=global_step)
```
```python
def train(session):
    batch_size = 200
    session.run(tf.global_variables_initializer())
*   merged_summary_op = tf.summary.merge_all()
*   summary_writer = tf.summary.FileWriter(FLAGS.train_dir, session.graph)
    # ... (omitted) ...
    for step in range(batch_steps):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
*       _, c, summary = session.run(
            [train_op, loss, merged_summary_op],
            feed_dict={x: batch_x, y: batch_y})
*       summary_writer.add_summary(summary,
*                                  global_step.eval(session=session))
```
Tensorboard: A Quick Tutorial (Demo)
Scalar Summary
.center.img-100[![](images/tensorboard-01-loss.png)]
Tensorboard: A Quick Tutorial (Demo)
Histogram Summary (activations and variables)
.center.img-66[![](images/tensorboard-02-histogram.png)]
Tensorboard and Summary: Caveats
- Fetching histogram summaries is extremely slow!
  - GPU utilization can become very low (if the serialized values are huge)
- In non-debugging mode, disable them completely, or fetch summaries only periodically, e.g.

```python
eval_tensors = [self.loss, self.train_op]
if step % 200 == 0:
    eval_tensors += [self.merged_summary_op]

eval_ret = session.run(eval_tensors, feed_dict)
eval_ret = dict(zip(eval_tensors, eval_ret))  # as a dict

current_loss = eval_ret[self.loss]
if self.merged_summary_op in eval_tensors:
    self.summary_writer.add_summary(
        eval_ret[self.merged_summary_op], current_step)
```
- I recommend keeping only simple and essential scalar summaries (e.g. train/validation loss, overall accuracy, etc.), and including debugging summaries only on demand
Tensorboard and Summary: Caveats
- Some other recommendations:
  - Use proper names (prefixed or scoped) for tensors and variables
    (specify `name=...` in tensor/variable declarations)
  - Include both the train loss and the validation loss,
    plus train/validation accuracy (if possible) over steps
.center.img-50[![](images/tensorboard-02-histogram.png)]
(3) tf.Print()
- During run-time evaluation, we can print the value of a tensor
  .green[without] explicitly fetching it and returning it to the code (i.e. via `session.run()`)
```python
tf.Print(input_, data, message=None,
         first_n=None, summarize=None, name=None)
```
- It creates .blue[an identity op] with the side effect of printing `data` when this op is evaluated.
(3) `tf.Print()`: Examples
```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu)
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu)
    out = layers.fully_connected(fc2, 10, activation_fn=None)
*   out = tf.Print(out, [tf.argmax(out, 1)],
*                  'argmax(out) = ', summarize=20, first_n=7)
    return out
```
For the first seven times (i.e. 7 feed-forwards or SGD steps),
it will print the predicted labels for 20 of the `batch_size` examples:
```
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 6 6 4 4 6 4 4 6 6 4 0 6 4 6 4 4 6 0 4...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 6 0 0 3 6 4 3 6 6 3 4 4 4 4 4 3 4 6 7...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [3 4 0 6 6 6 0 7 3 0 6 7 3 6 0 3 4 3 3 6...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 1 0 0 0 3 3 7 0 8 1 2 0 9 9 0 3 4 6 6...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 0 0 9 0 4 9 9 0 8 2 7 3 9 1 8 3 9 7 8...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 0 1 1 9 0 8 3 0 9 9 0 2 6 7 7 3 3 3 9...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [3 6 9 8 3 9 1 0 1 1 9 3 2 3 9 9 3 0 6 6...]
[2016-06-03 00:11:08.661563] Epoch 00, Loss = 0.332199
```
(3) `tf.Print()`: Some drawbacks …
.red[Cons:]
- It is hard to take full control of the print format (e.g. how do we print a 2D tensor in a matrix format?)
- Usually, we may want to print debugging values conditionally
  (i.e. print them only if some condition is met)
  or periodically (i.e. print only once per epoch);
  `tf.Print()` has limitations for achieving this
  - TensorFlow has control flow operations, but they may be an overkill (see the sketch below)
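A hedged sketch of the control-flow route (assuming a scalar `loss` tensor and
an illustrative threshold): `tf.Print` can be wrapped in `tf.cond` so the
message is emitted only when the condition holds.

```python
# Print the loss only when it is suspiciously large.
# Both branches must return tensors of the same shape and dtype.
loss = tf.cond(loss > 10.0,
               lambda: tf.Print(loss, [loss], message='High loss: '),
               lambda: tf.identity(loss))
```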
(3) tf.Assert()
- Asserts that the given condition is true when evaluated (during the computation)
- If the condition evaluates to `False`, the list of tensors in `data` is printed
  and an error is thrown; `summarize` determines how many entries of the tensors to print.
```python
tf.Assert(condition, data, summarize=None, name=None)
```
`tf.Assert`: Examples
Abort the program if …
```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
    # let's ensure that all the outputs in `out` are positive
*   tf.Assert(tf.reduce_all(out > 0), [out], name='assert_out_positive')
    return out
```
–
The assertion will not work! `tf.Assert` is also an op, so it needs to be executed as well.
`tf.Assert`: Examples
We need to ensure that `assert_op` is executed when evaluating `out`:
```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
    # let's ensure that all the outputs in `out` are positive
    assert_op = tf.Assert(tf.reduce_all(out > 0), [out], name='assert_out_positive')
*   with tf.control_dependencies([assert_op]):
*       out = tf.identity(out, name='out')
    return out
```
… somewhat ugly? … or
```python
    # ... same as above ...
*   out = tf.with_dependencies([assert_op], out)
    return out
```
`tf.Assert`: Examples
Another good way: store all the created assertion operations in a collection,
merge them into a single op, and explicitly evaluate them using `Session.run()`:
```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
*   tf.add_to_collection('Asserts',
*       tf.Assert(tf.reduce_all(out > 0), [out], name='assert_out_gt_0')
*   )
    return out

# merge all assertion ops from the collection
*assert_op = tf.group(*tf.get_collection('Asserts'))
```
```python
... = session.run([train_op, assert_op], feed_dict={...})
```
Some built-in useful Assert ops
See Asserts and boolean checks in the docs!
```python
tf.assert_negative(x, data=None, summarize=None, name=None)
tf.assert_positive(x, data=None, summarize=None, name=None)
tf.assert_proper_iterable(values)
tf.assert_non_negative(x, data=None, summarize=None, name=None)
tf.assert_non_positive(x, data=None, summarize=None, name=None)
tf.assert_equal(x, y, data=None, summarize=None, name=None)
tf.assert_integer(x, data=None, summarize=None, name=None)
tf.assert_less(x, y, data=None, summarize=None, name=None)
tf.assert_less_equal(x, y, data=None, summarize=None, name=None)
tf.assert_rank(x, rank, data=None, summarize=None, name=None)
tf.assert_rank_at_least(x, rank, data=None, summarize=None, name=None)
tf.assert_type(tensor, tf_type)
tf.is_non_decreasing(x, name=None)
tf.is_numeric_tensor(tensor)
tf.is_strictly_increasing(x, name=None)
```
They are useful when we need runtime assertions during the computation; a sketch of wiring one into the graph follows below.
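A minimal sketch, assuming a tensor named `prob` (illustrative) and reusing the
`tf.control_dependencies` pattern from above:

```python
# Fail fast at run time if `prob` contains negative entries.
assert_op = tf.assert_non_negative(prob, name='assert_prob_non_negative')
with tf.control_dependencies([assert_op]):
    prob = tf.identity(prob, name='prob')
```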
(4) Step-by-step Debugger
Python already has powerful debugging utilities (`pdb`, `ipdb`, `pudb`),
which are all interactive debuggers (like `gdb` for C/C++):
- set breakpoint
- pause, continue
- inspect stack-trace upon exception
- watch variables and evaluate expressions interactively
Debugger: Usage
Insert `set_trace()` for a breakpoint:
```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
*   import ipdb; ipdb.set_trace()  # XXX BREAKPOINT
    return out
```
.gray[(yeah, it is a breakpoint on model building)]
.img-75.center[![](images/pdb-example-01.png)]
Debugger: Usage
Debug breakpoints can be conditional:
```python
for i in range(batch_steps):
    batch_x, batch_y = mnist.train.next_batch(batch_size)
*   if (np.argmax(batch_y, axis=1)[:7] == [4, 9, 6, 2, 9, 6, 5]).all():
*       import pudb; pudb.set_trace()  # XXX BREAKPOINT
    _, c = session.run([train_op, loss],
                       feed_dict={x: batch_x, y: batch_y})
```
.green[Live Demo and Case Example!!] .small[(`20-mnist-pdb.py`)]
- Let's break on the training loop if some condition is met
- Get the `fc2` tensor, and fetch its evaluation result (given `x`)
  - `Session.run()` can be invoked and executed anywhere, even in the debugger
- .dogdrip[Wow… gonna love it…]
Hint: Some Useful TensorFlow APIs
To get any .green[operations] or .green[tensors] that might not be stored explicitly:
- `tf.get_default_graph()`: Get the current (default) graph
- `G.get_operations()`: List all the TF ops in the graph
- `G.get_operation_by_name(name)`: Retrieve a specific TF op
  .gray[(Q. How do we convert an operation to a tensor?)]
- `G.get_tensor_by_name(name)`: Retrieve a specific tensor
- `tf.get_collection(tf.GraphKeys.…)`: Get a collection of tensors

To get .green[variables]:
- `tf.get_variable_scope()`: Get the current variable scope
- `tf.get_variable()`: Get a variable (see Sharing Variables)
- `tf.trainable_variables()`: List all the (trainable) variables

```python
[v for v in tf.all_variables() if v.name == 'fc2/weights:0'][0]
```
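For instance, a sketch of retrieving and evaluating a tensor by name (the name
`fc2/Relu:0` assumes the layer was scoped as `fc2` with a ReLU output):

```python
G = tf.get_default_graph()
fc2_tensor = G.get_tensor_by_name('fc2/Relu:0')
fc2_value = session.run(fc2_tensor, feed_dict={x: batch_x})
```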
IPython.embed()
.green[ipdb/pudb:]

```python
import pudb; pudb.set_trace()
```

- They are debuggers; we can set breakpoints, see stacktraces, watch expressions, …
  (much more general)

.green[embed:]

```python
from IPython import embed; embed()
```

- Opens an IPython shell in the current context;
  mostly used for watching expressions only
(5) Debugging ‘inside’ the computation graph
Our debugging tools so far can be used for debugging outside `Session.run()`.
Question: How can we run a .red[‘custom’] operation? (e.g. custom layer)
–
- TensorFlow allows us to write a custom operation in C++!
- The 'custom' operation can be designed for logging or debugging purposes (like PrintOp)
- … but this is very burdensome (we need to compile it, define the op interface, and then use it …)
(5) Interpose any python code in the computation graph
We can also embed and interpose a python function in the graph: `tf.py_func()` comes to the rescue!
```python
tf.py_func(func, inp, Tout, stateful=True, name=None)
```
- Wraps a python function and uses it .blue[as a TensorFlow op].
- Given a python function `func`, which .green[takes numpy arrays] as its inputs
  and returns numpy arrays as its outputs, the function is wrapped as an operation.
```python
def my_func(x):
    # x will be a numpy array with the contents of the placeholder below
    return np.sinh(x)

inp = tf.placeholder(tf.float32, [...])
*y = tf.py_func(my_func, [inp], [tf.float32])
```
(5) Interpose any python code in the computation graph
In other words, we are now able to use the following (hacky) .green[tricks]
by intercepting the computation being executed on the graph:
- Print any intermediate values (e.g. layer activations) in a custom form,
  without fetching them
- Use the python debugger (e.g. trace and breakpoint)
- Draw a graph or plot using `matplotlib`, or save images to files

.gray.small[Warning: Some limitations may exist, e.g. thread-safety issues, not being allowed to manipulate the state of the session, etc…]
Case Example (i): Print
An ugly example …
```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')

*   def _debug_print_func(fc1_val, fc2_val):
        print 'FC1 : {}, FC2 : {}'.format(fc1_val.shape, fc2_val.shape)
        print 'min, max of FC2 = {}, {}'.format(fc2_val.min(), fc2_val.max())
        return False

*   debug_print_op = tf.py_func(_debug_print_func, [fc1, fc2], [tf.bool])
    with tf.control_dependencies(debug_print_op):
        out = tf.identity(out, name='out')
    return out
```
Case Example (ii): Breakpoints!
An ugly example to attach breakpoints …
```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')

*   def _debug_func(x_val, fc1_val, fc2_val, out_val):
        if (out_val == 0.0).any():
*           import ipdb; ipdb.set_trace()  # XXX BREAKPOINT
*           from IPython import embed; embed()  # XXX DEBUG
        return False

*   debug_op = tf.py_func(_debug_func, [x, fc1, fc2, out], [tf.bool])
    with tf.control_dependencies(debug_op):
        out = tf.identity(out, name='out')
    return out
```
Another one: The `tdb` library
A third-party TensorFlow debugging tool: .small[https://github.com/ericjang/tdb]
.small[(not actively maintained and looks clunky, but still good for prototyping)]
.img-90.center[![](https://camo.githubusercontent.com/4c671d2b359c9984472f37a73136971fd60e76e4/687474703a2f2f692e696d6775722e636f6d2f6e30506d58516e2e676966)]
(6) `tfdbg`: The official TensorFlow debugger
Recent versions of TensorFlow ship with the official debugger (`tfdbg`).
It is still experimental, but works quite well!
Check out the HOW-TOs and Examples on `tfdbg`!
```python
import tensorflow.python.debug as tf_debug

sess = tf.Session()

# create a debug wrapper session
*sess = tf_debug.LocalCLIDebugWrapperSession(sess)

# Add a tensor filter (similar to a breakpoint)
*sess.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan)

# Each session.run() call will be intercepted by the debugger,
# and we can inspect the values of tensors via the debugger interface
sess.run(loss, feed_dict={x: ...})
```
(6) `tfdbg`: The TensorFlow debugger
.img-100.center[![](images/tfdbg_example1.png)]
(6) `tfdbg`: The TensorFlow debugger
.img-100.center[![](images/tfdbg_example2.png)]
`tfdbg`: Features and Quick References
Conceptually, a wrapper session is employed (currently, a CLI debugger session);
it can intercept a single call of `session.run()`:
- `run` / `r`: Execute the run() call .blue[with debug tensor-watching]
- `run -n` / `r -n`: Execute the run() call .blue[without] debug tensor-watching
- `run -f <filter_name>`: Keep executing run() calls until a dumped tensor passes
  a registered filter (a conditional breakpoint)
  - e.g. `has_inf_or_nan`
.img-80.center[![](images/tfdbg_example_run.png)]
`tfdbg`: Tensor Filters
Registering tensor filters:
```python
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
sess.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan)
```
Tensor filters are just python functions `(datum, tensor) -> bool`:
```python
def has_inf_or_nan(datum, tensor):
    _ = datum  # Datum metadata is unused in this predicate.
    if tensor is None:
        # An uninitialized tensor doesn't have bad numerical values.
        return False
    elif (np.issubdtype(tensor.dtype, np.float) or
          np.issubdtype(tensor.dtype, np.complex) or
          np.issubdtype(tensor.dtype, np.integer)):
        return np.any(np.isnan(tensor)) or np.any(np.isinf(tensor))
    else:
        return False
```
Running tensor filters is, therefore, quite slow.
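Custom filters can be registered the same way. A sketch (the threshold and
filter name here are illustrative, not part of `tfdbg`):

```python
def has_large_values(datum, tensor):
    # Flag any dumped numeric tensor containing entries with magnitude > 1e3.
    _ = datum  # metadata unused
    if tensor is None:
        return False
    return (np.issubdtype(tensor.dtype, np.number)
            and np.any(np.abs(tensor) > 1e3))

sess.add_tensor_filter('has_large_values', has_large_values)
# then, inside the tfdbg CLI:  run -f has_large_values
```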
`tfdbg`: Tensor Fetching
In the tensor dump mode (the run-end UI), the debugger shows the list of tensors
dumped during the `session.run()` call:
.img-100.center[![](images/tfdbg_example_fetch.png)]
`tfdbg`: Tensor Fetching
Commands:
- .blue[`list_tensors` (`lt`)]: Show the list of dumped tensor(s).
- .blue[`print_tensor` (`pt`)]: Print the value of a dumped tensor.
- `node_info` (`ni`): Show information about a node
  - `ni -t`: Show the traceback of tensor creation
- `list_inputs` (`li`): Show the inputs to a node
- `list_outputs` (`lo`): Show the outputs of a node
- `run_info` (`ri`): Show information about the current run
  (e.g. what to fetch, what the feed_dict is)
- .green[`invoke_stepper` (`s`)]: Invoke the stepper!
- `run` (`r`): Move to the next run
`tfdbg`: Tensor Fetching
Example: `print_tensor fc2/Relu:0`
.img-80.center[![](images/tfdbg_example_pt.png)]
- Slicing: `pt fc2/Relu:0[0:10]`
- Dumping: `pt fc2/Relu:0 > /tmp/debug/fc2.txt`
See also: tfdbg CLI Frequently-Used Commands
`tfdbg`: Stepper
Shows the tensor value(s) in a topologically-sorted order for the run.
.img-100.center[![](images/tfdbg_example_stepper.png)]
`tfdbg`: Screencast and Demo!
.small.right[From Google Brain Team]
.small[
See also: Debug TensorFlow Models with tfdbg (@Google Developers Blog)
]
`tfdbg`: Other Remarks
- It is currently under active development (still experimental)
- In the near future, a web-based interactive debugger (integrated with TensorBoard) will be out!
Debugging: Summary
- `Session.run()`: Explicitly fetch, and print
- Tensorboard: Histogram and Image Summary
- The `tf.Print()` and `tf.Assert()` operations
- Use a python debugger (`ipdb`, `pudb`)
- Interpose your debugging python code in the graph
- The TensorFlow debugger: `tfdbg`

.green[There is no silver bullet; one might need to choose the most convenient and suitable debugging tool, depending on the case]
template: inverse
Other General Tips
.gray[(in a Programmer’s Perspective)]
General Tips of Debugging
- Learn to use debugging tools, but do not solely rely on them when a problem occurs.
- Sometimes, just sitting down and carefully reading through your code with ☕ (a careful code review!) would be greatly helpful.
General Tips from Software Engineering
Almost .red[all] rule-of-thumb tips and guidelines for writing good, neat, and defensive code can be applied to TensorFlow code 😃
- Check and sanitize inputs
- Logging
- Assertions
- Proper usage of Exception
- Fail Fast: immediately abort if something is wrong
- The DRY principle: Don’t Repeat Yourself
- Build up well-organized codes, test smaller modules
- etc …
There are some good guides on the web like this
Use asserts as much as you can
- Use assertions anywhere (failing early is always good)
  - e.g. data processing routines (input sanity checks)
- Especially, perform .red[shape checks] for tensors (like 'static' type checking when compiling code)
```python
net['fc7'] = tf.nn.xw_plus_b(net['fc6'], vars['fc7/W'], vars['fc7/b'])
assert net['fc7'].get_shape().as_list() == [None, 4096]
*net['fc7'].get_shape().assert_is_compatible_with([B, 4096])
```
- Sometimes, the `tf.Assert()` operation might be helpful (a run-time check)
  - it should be turned off when we are not debugging; see the sketch below
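A sketch of flag-guarding the run-time asserts (`DEBUG_MODE` is an illustrative
flag of your own, not a TensorFlow feature):

```python
# Build the run-time asserts only in debug mode.
if DEBUG_MODE:
    assert_op = tf.Assert(tf.reduce_all(out > 0), [out])
    with tf.control_dependencies([assert_op]):
        out = tf.identity(out)
```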
Use proper logging
- Being verbose in logging helps a lot (training hyperparameter configurations, train/validation loss, learning rate, elapsed time, etc.)
.img-100.center[![](images/logging-example.png)]
Guard against Numerical Errors
Quite often, `NaN` occurs during training
- We usually deal with `float32`, which is not a precise datatype;
  deep learning models are susceptible to numerical instability
- Some possible reasons:
  - the gradient is too big (clipping may be required)
  - zero or negative values are passed to `sqrt` or `log`
- Check whether values are `NaN` or finite
- Some useful TF APIs exist for this; a sketch follows below
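A sketch of run-time NaN/Inf guards and gradient clipping, assuming `loss` and
an Adam optimizer as in the earlier examples:

```python
# tf.check_numerics raises an InvalidArgumentError at run time
# if its input contains any NaN or Inf value.
loss = tf.check_numerics(loss, message='loss contains NaN or Inf')

# Gradient clipping with tf.clip_by_global_norm
# (assumes every variable receives a gradient; clip_norm is illustrative).
optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
grads, varlist = zip(*optimizer.compute_gradients(loss))
clipped, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)
train_op = optimizer.apply_gradients(zip(clipped, varlist))
```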
Name your tensors properly
- It is recommended to specify names for intermediate tensors and variables when building a model
- Using variable scopes properly is also a very good idea
- When something goes wrong, we can easily figure out where the error comes from
```
ValueError: Cannot feed value of shape (200,)
  for Tensor u'Placeholder_1:0', which has shape '(?, 10)'

ValueError: Tensor conversion requested dtype float32 for Tensor with
* dtype int32: 'Tensor("Variable_1/read:0", shape=(256,), dtype=int32)'
```
A better stacktrace:
```
ValueError: Cannot feed value of shape (200,)
  for Tensor u'placeholder_y:0', which has shape '(?, 10)'

ValueError: Tensor conversion requested dtype float32 for Tensor with
* dtype int32: 'Tensor("fc1/weights/read:0", shape=(256,), dtype=int32)'
```
Name your tensors properly
```python
def multilayer_perceptron(x):
    W_fc1 = tf.Variable(tf.random_normal([784, 256], 0, 1))
    b_fc1 = tf.Variable([0] * 256)  # wrong here!! (an int32 variable)
    fc1 = tf.nn.xw_plus_b(x, W_fc1, b_fc1)
    # ...
```
```python
>>> fc1
<tf.Tensor 'xw_plus_b:0' shape=(?, 256) dtype=float32>
```
Better:
```python
def multilayer_perceptron(x):
    W_fc1 = tf.Variable(tf.random_normal([784, 256], 0, 1), name='fc1/weights')
    b_fc1 = tf.Variable(tf.zeros([256]), name='fc1/bias')
    fc1 = tf.nn.xw_plus_b(x, W_fc1, b_fc1, name='fc1/linear')
    fc1 = tf.nn.relu(fc1, name='fc1/relu')
    # ...
```
```python
>>> fc1
<tf.Tensor 'fc1/relu:0' shape=(?, 256) dtype=float32>
```
Name your tensors properly
The style that I much prefer:
```python
def multilayer_perceptron(x):
*   with tf.variable_scope('fc1'):
        W_fc1 = tf.get_variable('weights', [784, 256])  # fc1/weights
        b_fc1 = tf.get_variable('bias', [256])          # fc1/bias
        fc1 = tf.nn.xw_plus_b(x, W_fc1, b_fc1)          # fc1/xw_plus_b
        fc1 = tf.nn.relu(fc1)                           # fc1/relu
    # ...
```
or use high-level APIs or your custom functions:
```python
import tensorflow.contrib.layers as layers

def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu,
*                                scope='fc1')
    # ...
```
And more style guides …?
.small[https://github.com/wookayin/TensorFlowKR-2017-talk-bestpractice]
all of which will help you write easily-debuggable code!
Other Topics: Performance and Profiling
- Run-time performance is a very important topic!
  .dogdrip[there will be another lecture soon…]
  - Beyond the scope of this talk…
- Make sure that your GPU utilization is always non-zero (and near 100%)
  - Watch and monitor it using `nvidia-smi` or `gpustat`
.img-75.center[![](https://github.com/wookayin/gpustat/raw/master/screenshot.png)]
.img-50.center[![](images/nvidia-smi.png)]
Other Topics: Performance and Profiling
- Some possible factors that might slow down your code:
  - Input batch preparation (i.e. `next_batch()`)
  - Too frequent or heavy summaries
  - An inefficient model (e.g. CPU-bottlenecked operations)
- What we can do:
  - Use `tfdbg`!!!
  - Use `cProfile`, `line_profiler`, or `%prun` in IPython
  - Use `nvprof` for profiling CUDA operations
  - Use CUPTI (CUDA Profiling Tools Interface) tools for TF; a timeline sketch follows below
.img-66.center[![](images/tracing-cupti.png)]
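For TF-level tracing, a sketch of the Chrome-trace timeline approach, using
TensorFlow's `RunOptions`/`RunMetadata` machinery (the output path is arbitrary):

```python
from tensorflow.python.client import timeline

run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

# Trace one training step.
session.run(train_op, feed_dict=feed_dict,
            options=run_options, run_metadata=run_metadata)

# Write a trace file viewable at chrome://tracing
tl = timeline.Timeline(run_metadata.step_stats)
with open('/tmp/timeline.json', 'w') as f:
    f.write(tl.generate_chrome_trace_format())
```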
Concluding Remarks
We have talked about
- How to debug TensorFlow and python applications
- Some tips and guidelines for easy debugging: write nice code that requires less debugging 😃
name: last-page
class: center, middle, no-number
Special Thanks to:
Juyong Kim, Yunseok Jang, Junhyug Noh, Cesc Park, Jongho Park
TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.
.img-66[![](images/tensorflow-logo.png)]
name: last-page
class: center, middle, no-number
Thank You!
@wookayin
.footnote[Slideshow created using remark.]