Text classification using the new Keras vectorization layer
This notebook contains a walkthrough of text classification from scratch, starting from a directory of plain text files (a common scenario in practice). We demonstrate multiclass text classification using a dataset of Stack Overflow questions.
!pip3 install -q tf-nightly
import tensorflow as tf
import numpy as np
from tensorflow.keras import preprocessing
print(tf.__version__)
Multiclass text classification
This notebook shows a classifier that labels Stack Overflow posts with one of the most used languages today, namely Java, JavaScript, Python, or C#. This is an example of multiclass classification.
We will use a public dataset of Stack Overflow questions available in the Google Cloud Marketplace. You can explore the dataset in BigQuery by following the instructions at that link. In this notebook, you will build a model to predict the tags of Stack Overflow questions, using a pre-processed table built from the BigQuery dataset. To keep things simple, our pre-processed table includes only questions with one of 4 possible programming-related tags: Java, JavaScript, Python, or C#.
This notebook uses tf.keras to build and train models in TensorFlow, as well as some TensorFlow experimental features, like the TextVectorization layer for word splitting & indexing.
Download the BigQuery dataset
BigQuery has a public dataset that includes more than 17 million Stack Overflow questions. We are going to download posts labeled with one of the four most used languages today: Java, JavaScript, Python, and C#. To make the problem harder for our model, we have replaced every instance of those language names with the placeholder word blank. Otherwise, it would be very easy for the model to detect that a post is Java-related just by finding the word java in it.
You can access the pre-processed, blank-filled dataset as a tar file here in Google Cloud Storage. Each of the four labels has approximately 10k samples for training/eval and 10k samples for test.
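The substitution was done as part of the dataset preparation; a minimal sketch of how such masking could be implemented (the regular expression and example below are illustrative assumptions, not the actual preprocessing script):
import re
# Mask the target language names with the word "blank" so the model cannot
# rely on seeing the language name itself; pattern and casing are illustrative.
LANGUAGE_PATTERN = re.compile(r'\b(javascript|java|python|csharp)\b', re.IGNORECASE)
def mask_language_names(text):
    return LANGUAGE_PATTERN.sub('blank', text)
print(mask_language_names('How do I read a file in Python?'))
# How do I read a file in blank?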
!gsutil cp gs://tensorflow-blog-rnn/so_posts_4labels_blank_80k.tar.gz .
!tar -xf so_posts_4labels_blank_80k.tar.gz
batch_size = 32
raw_train_ds = tf.keras.preprocessing.text_dataset_from_directory(
    'train', batch_size=batch_size, validation_split=0.2, subset='training', seed=42)
raw_val_ds = tf.keras.preprocessing.text_dataset_from_directory(
    'train', batch_size=batch_size, validation_split=0.2, subset='validation', seed=42)
raw_test_ds = tf.keras.preprocessing.text_dataset_from_directory(
    'test', batch_size=batch_size)
Caching may reduce processing time. Let's verify this by timing two passes over the dataset before caching and two passes after.
import time
# First pass over the uncached dataset.
start = time.time()
for text_batch, label_batch in raw_train_ds:
    pass
end = time.time()
print(end - start)
import time
# Second pass, still uncached: the timing is similar to the first pass.
start = time.time()
for text_batch, label_batch in raw_train_ds:
    pass
end = time.time()
print(end - start)
# Cache the dataset in memory; the cache is filled during the next full pass.
raw_train_ds = raw_train_ds.cache()
import time
# First pass after calling cache(): this pass fills the cache.
start = time.time()
for text_batch, label_batch in raw_train_ds:
    pass
end = time.time()
print(end - start)
import time
# Second pass after caching: data is now read from memory, so this is faster.
start = time.time()
for text_batch, label_batch in raw_train_ds:
    pass
end = time.time()
print(end - start)
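For datasets too large to keep in memory, cache() also accepts a file path and will spill the cache to disk instead of RAM. A minimal sketch, with a hypothetical cache file name:
# Alternative to the in-memory cache() call above: write the cache to a
# local file; 'so_train.cache' is just an example path.
disk_cached_train_ds = raw_train_ds.cache('so_train.cache')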
# Print a few raw examples with their labels.
for text_batch, label_batch in raw_train_ds.take(1):
    for i in range(5):
        print(text_batch.numpy()[i])
        print(label_batch.numpy()[i])
Each label is an integer value between 0 and 3, corresponding to one of our four language labels.
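The integer-to-label mapping follows the alphanumeric order of the class subdirectory names under train/, which is how text_dataset_from_directory assigns label indices. A quick way to inspect the mapping (a small sketch using the extracted directory):
import os
# Label indices are assigned in sorted order of the class subdirectory names.
class_names = sorted(os.listdir('train'))
print(dict(enumerate(class_names)))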
Prepare data for training
Since the data is pre-processed, we do not need to perform any additional steps, such as removing HTML tags, as we did in Part 1 of this notebook.
We can go directly to instantiating our text vectorization layer (an experimental feature). We use this layer to normalize, split, and map strings to integers, so we set output_mode to 'int'. We also use the same model constants as in Part 1, such as an explicit maximum sequence_length.
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization
max_features = 5000
embedding_dim = 128
sequence_length = 500
vectorize_layer = TextVectorization(
    max_tokens=max_features,
    output_mode='int',
    output_sequence_length=sequence_length)
# Make a text-only dataset (no labels) and call adapt
text_ds = raw_train_ds.map(lambda x, y: x)
vectorize_layer.adapt(text_ds)
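To sanity-check the adapted layer, you can inspect the vocabulary it has learned; a small sketch (the exact tokens depend on the training data):
# The adapted vocabulary: index 0 is reserved for padding and index 1 for
# out-of-vocabulary ([UNK]) tokens; the rest are ordered by frequency.
vocab = vectorize_layer.get_vocabulary()
print(len(vocab))
print(vocab[:10])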
def vectorize_text(text, label):
    text = tf.expand_dims(text, -1)
    return vectorize_layer(text), label
# Vectorize the data.
train_ds = raw_train_ds.map(vectorize_text)
val_ds = raw_val_ds.map(vectorize_text)
test_ds = raw_test_ds.map(vectorize_text)
# Do async prefetching / buffering of the data for best performance on GPU.
train_ds = train_ds.cache().prefetch(buffer_size=10)
val_ds = val_ds.cache().prefetch(buffer_size=10)
test_ds = test_ds.cache().prefetch(buffer_size=10)
The vectorization layer transforms each input word of a sentence into a numerical representation: an index into a vocabulary whose size is defined by max_features (5000). Note that the output size is fixed, padded or truncated to sequence_length (500) regardless of how many tokens resulted from the previous step, and this is what will be fed to our model.
Let's take a moment to understand the output of the vectorization layer. The output for each sentence is fixed at 500 integers, as set by sequence_length. Note that many of the trailing values are zero: index 0 is the padding value used to fill sequences shorter than 500 tokens, while words that do not appear in our vocabulary are mapped to the out-of-vocabulary index instead.
for text_batch, label_batch in train_ds.take(1):
    for i in range(5):
        print(text_batch.numpy()[i])
        print(label_batch.numpy()[i])
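To make the padding and out-of-vocabulary behavior concrete, you can map the indices of one vectorized example back to tokens; a small sketch:
# Decode the first vectorized example back into tokens: [UNK] marks words
# outside the 5000-word vocabulary, and '' entries are padding.
vocab = vectorize_layer.get_vocabulary()
for text_batch, _ in train_ds.take(1):
    first_example = text_batch.numpy()[0]
    print(' '.join(vocab[idx] for idx in first_example[:20]))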
Build the model
The input data consists of arrays of integer-encoded tokens with a fixed size. The labels to predict are between 0 and 3, so instead of using a binary classifier, we will use a softmax classifier. We compile the model with the Adam optimizer and a different loss function from Part 1: sparse categorical crossentropy.
One of the parameters of the embedding layer is max_features + 1 rather than max_features; the extra slot accounts for the out-of-vocabulary token that the vectorization layer uses for input words that are not in the vocabulary.
from tensorflow.keras import layers
# An integer input for vocab indices.
inputs = tf.keras.Input(shape=(None,), dtype='int64')
x = layers.Embedding(max_features + 1, embedding_dim)(inputs)
x = layers.Bidirectional(layers.LSTM(128))(x)
predictions = layers.Dense(4, activation='softmax', name='predictions')(x)
model = tf.keras.Model(inputs, predictions)
model.compile(
    loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
epochs = 5
# Fit the model using the train and validation datasets.
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs)
model.summary()
loss, accuracy = model.evaluate(test_ds)
print("Loss: ", loss)
print("Accuracy: ", accuracy)
Learn more
This notebook uses tf.keras, a high-level API to build and train models in TensorFlow. For a more advanced text classification tutorial using tf.keras, see the MLCC Text Classification Guide. In this notebook, we also use some TensorFlow experimental features, like the TextVectorization layer for word splitting & indexing.