TF2 Object Detection

May 10, 2020

ai

Object Detection

There are many good guides out there to help you get started with setting up the TF Object Detection API, but unfortunately, most of them are written for the TF v1 API.

We will take a look at how to use the TF v2 Object Detection API to build a model for a custom dataset on a Google Colab Notebook.

Before we begin the setup, make sure to change the runtime type in Colab to GPU so that we can make use of the free GPU provided.
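You can quickly confirm that a GPU is attached with a short check (TensorFlow comes pre-installed on Colab):

import tensorflow as tf

# Should list at least one GPU device if the runtime type is set correctly
print(tf.config.list_physical_devices('GPU'))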


Installing Dependencies and setting up the workspace

Create a folder for your workspace

(Remember, you will be executing all of these commands in your colab/jupyter notebook)

%mkdir workspace
%cd /content/workspace

We will be cloning the TF repository from GitHub.

!git clone --quiet https://github.com/tensorflow/models.git

And before we install TF Object Detection, we must install Protobuf.

“The Tensorflow Object Detection API uses Protobufs to configure model and training parameters. Before the framework can be used, the Protobuf libraries must be downloaded and compiled”

!apt-get install -qq protobuf-compiler python-pil python-lxml python-tk
!pip install -qq Cython contextlib2 pillow lxml matplotlib
!pip install -qq pycocotools
%cd models/research/
!protoc object_detection/protos/*.proto --python_out=.

Now we install the TF Object Detection API:

%cp object_detection/packages/tf2/setup.py .
!python -m pip install .
!python object_detection/builders/model_builder_tf2_test.py

The output should be as follows:
Install output

Preparing the Dataset

There are two ways to go about this:

1. Use a Public Labelled Dataset
2. Create a Custom Labelled Dataset

You can find public labelled datasets online that are already annotated and saved in the right format, ready to be used for training.

For this tutorial, we will be creating our own dataset from scratch.

First things first, gather the images for the dataset. I will assume this step has already been done.

Now we need to label the images. There are many popular labeling tools; we will be using LabelImg.

To install LabelImg, execute the following code (do it in your local terminal, since Colab does not support GUI applications):

pip install labelImg

Launch LabelImg in the folder where your images are stored.

labelImg imagesdir

Now you can start labeling your images. For more info on how to label the images, follow this link (LabelImg Repository).

LabelImg

Create a label map in a text editor and save it as label_map.pbtxt. For example, with two classes, car and bike:

item {
    id: 1
    name: 'car'
}

item {
    id: 2
    name: 'bike'
}

Now for creating the TFRecord files.

We can do the following:

1. Create TFRecord ourselves
2. Upload the annotations to Roboflow and get the dataset in TFRecord Format.

Creating the TFRecords ourselves is a bit tedious, as the XML produced during annotation can vary between tools, so for the sake of ease I suggest using Roboflow for this task. It also provides optional data augmentation, which will increase the size of the dataset.

For your reference, here is a sample Python script to create the TFRecords manually.

import os
import io
import glob
import xml.etree.ElementTree as ET
from collections import namedtuple

import pandas as pd
import tensorflow.compat.v1 as tf
from PIL import Image
from object_detection.utils import dataset_util, label_map_util

# os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'    # Suppress TensorFlow logging (1)

xml_dir = 'images/test'
image_dir = 'images/test'
output_path = 'annotations/test.tfrecord'
csv_path = None

label_map = label_map_util.load_labelmap('annotations/label_map.pbtxt')
label_map_dict = label_map_util.get_label_map_dict(label_map)


def xml_to_csv(path):
    """Iterates through all .xml files (generated by labelImg) in a given directory and combines
    them in a single Pandas dataframe.
    Parameters:
    ----------
    path : str
        The path containing the .xml files
    Returns
    -------
    Pandas DataFrame
        The produced dataframe
    """
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            value = (root.find('filename').text,
                     int(root.find('size')[0].text),
                     int(root.find('size')[1].text),
                     member[0].text,
                     int(member[4][0].text),
                     int(member[4][1].text),
                     int(member[4][2].text),
                     int(member[4][3].text)
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height',
                   'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df


def class_text_to_int(row_label):
    return label_map_dict[row_label]


def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]


def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        # Bounding-box coordinates are normalized to [0, 1]
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example


def main(_):
    writer = tf.python_io.TFRecordWriter(output_path)
    path = os.path.join(image_dir)
    examples = xml_to_csv(xml_dir)
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())
    writer.close()
    print('Successfully created the TFRecord file: {}'.format(output_path))
    if csv_path is not None:
        examples.to_csv(csv_path, index=None)
        print('Successfully created the CSV file: {}'.format(csv_path))


if __name__ == '__main__':
    tf.app.run()

Use the above code for train and test images to create train.tfrecord and test.tfrecord respectively by changing the following:

xml_dir = 'images/test'
image_dir = 'images/test'
output_path = 'annotations/test.tfrecord'
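For the train split, the same three variables would point at the train images instead:

xml_dir = 'images/train'
image_dir = 'images/train'
output_path = 'annotations/train.tfrecord'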

If you use Roboflow instead, you will be provided the TFRecord files automatically.

Setting up on Colab

Create folders to store all the necessary files we have just created.

The models/my_mobilenet folder is where our training results will be stored.

%mkdir annotations exported-models pre-trained-models models/my_mobilenet

Now upload the newly created TFRecord files along with the images and annotations to Google Colab by clicking upload files.

You could also store your necessary files on Google Drive; after mounting the drive, importing them into Colab is as simple as a !cp command, as shown below.
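For example, a minimal sketch, assuming your files live in a MyDrive/dataset folder (adjust the paths to your own layout):

from google.colab import drive
drive.mount('/content/drive')

# Copy the TFRecords and label map from Drive into the workspace (hypothetical paths)
!cp /content/drive/MyDrive/dataset/train.tfrecord /content/drive/MyDrive/dataset/test.tfrecord annotations/
!cp /content/drive/MyDrive/dataset/label_map.pbtxt annotations/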

Download Pre-Trained Model

There are many models ready to download from the TensorFlow Model Zoo.

Be careful in choosing which model to use, as some are not made for object detection. For this tutorial, we will be using the following model:

SSD MobileNet V2 FPNLite 320x320.

Download it into your Colab Notebook and extract it by executing:

%cd pre-trained-models
!curl "http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz" --output "ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz"

import os
import tarfile

model_name = 'ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8'
model_file = model_name + '.tar.gz'
tar = tarfile.open(model_file)
tar.extractall()
tar.close()
os.remove(model_file)

# Back to the workspace root
%cd /content/workspace

Your directory structure should now look like this:

workspace/
├─ models/
│  ├─ community/
│  ├─ official/
│  ├─ orbit/
│  ├─ research/
│  ├─ my_mobilenet/
│  └─ ...
├─ annotations/
│  ├─ label_map.pbtxt
│  ├─ train.tfrecord
│  └─ test.tfrecord
├─ pre-trained-models/
└─ exported-models/

Editing the Configuration file

In the TF Object Detection API, all the settings and information required for training and evaluating the model live in the pipeline.config file.

Let us take a look at it. The most important settings we will need to change are:

batch_size: 128
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED"
num_steps: 50000
num_classes: 2
fine_tune_checkpoint_type: "classification"
train_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED"
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED"
  }
}
eval_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED"
  }
}

batch_size is the number of training samples processed together in a single step. A suitable number to use is 8; it could be more or less depending on the computing power available.

A good suggestion given on StackOverflow is:

Max batch size = available GPU memory bytes / 4 / (size of tensors + trainable parameters)
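As a rough worked example of that heuristic (the numbers below are illustrative assumptions, not measurements):

# Illustrative only: substitute your own GPU memory and model size
gpu_memory_bytes = 16 * 1024**3      # e.g. a 16 GB GPU
tensors_plus_params = 400_000_000    # rough size of activations + trainable parameters
max_batch_size = gpu_memory_bytes // 4 // tensors_plus_params
print(max_batch_size)  # ~10 with these made-up numbers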

fine_tune_checkpoint is the path to the last trained checkpoint (a checkpoint is how TensorFlow stores the model during training).

If you are starting the training for the first time, set this to the checkpoint of the downloaded pre-trained model.

If you want to continue training from a previously trained checkpoint, set it to the respective checkpoint path. (This will continue training, building upon the already-learned weights instead of starting from scratch.)

# For fresh training
fine_tune_checkpoint: "pre-trained-models/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/checkpoint/ckpt-0"

# For continuing the training
fine_tune_checkpoint: "exported-models/your_latest_batch/checkpoint/ckpt-0"

batch_size: 8 # Increase/decrease this value depending on how fast your train job runs and the available compute resources
num_steps: 25000 # 25000 is a good number of steps to get a good loss

fine_tune_checkpoint_type: "detection" # Set this to detection

train_input_reader {
  label_map_path: "annotations/label_map.pbtxt" # Set to location of label map
  tf_record_input_reader {
    input_path: "annotations/train.tfrecord" # Set to location of train TFRecord file
  }
}

# Similarly do the same for the eval input reader
eval_input_reader {
  label_map_path: "annotations/label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "annotations/test.tfrecord"
  }
}

After editing the config file, we need to add the TensorFlow Object Detection folders to the Python path.

import os
os.environ['PYTHONPATH'] += ':/content/workspace/models/:/content/workspace/models/research/:/content/workspace/models/research/slim/'

Setting up TensorBoard on Colab to monitor the training process

Colab has inbuilt support for TensorBoard, which can now be launched with a simple magic command, as follows:

%load_ext tensorboard
%tensorboard --logdir 'models/my_mobilenet'

Cell Running TensorBoard

This is how the cell will look once you execute the above command. Nothing to worry about: once we start the training job, click refresh on the TensorBoard cell (top right) after a few minutes (the .tfevent files need to be created before the logs can be monitored) and you will see the output in the TensorBoard magic cell.

Running the Training Job

We will copy the TensorFlow training python script to the workspace directory for ease of access.

!cp '/content/workspace/models/research/object_detection/model_main_tf2.py' .

The training job requires command-line arguments, namely:

1. model_dir: the path where the training process will store the checkpoint files.
2. pipeline_config_path: the path where the pipeline.config file is stored.

Execute the following command to start the training job:

# If you are training from scratch
!python model_main_tf2.py --model_dir=models/my_mobilenet --pipeline_config_path=pre-trained-models/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/pipeline.config

# Or if you are continuing from a previous training
!python model_main_tf2.py --model_dir=models/my_mobilenet --pipeline_config_path=exported-models/pipeline.config

If everything goes well, the training output cell should look like this:

Training Output

The output will normally update slowly: the training job logs the loss only every 100 steps by default, so wait a while and you should see the log for step 100. The speed depends on whether a GPU is being used, the available VRAM, and many other factors, so be patient.

Refresh the TensorBoard while the training is running and you will be able to monitor the progress.

Tensorboard Logs

Once the loss reaches a fairly constant value or drops below 0.05 (in my case), you can stop the training cell.

Evaluating the model

Now you can run the evaluation script to find out the mAP (mean average precision) and the loss.

Run the following in a cell:

!python model_main_tf2.py --model_dir=models/my_mobilenet --pipeline_config_path=pre-trained-models/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/pipeline.config --checkpoint_dir=models/my_mobilenet # The folder where the model saved its checkpoints during training

You should get an output that looks like this:

Evaluation Output

The evaluation script has a default timeout of 3600 seconds while waiting for a new checkpoint to be generated, since it was originally intended to run in parallel with the training job; here on Colab, however, we are running it after the training process.
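If you do not want to wait out the full timeout, recent versions of model_main_tf2.py expose an eval_timeout flag (in seconds) that you can lower so the script exits shortly after evaluating the last checkpoint:

!python model_main_tf2.py --model_dir=models/my_mobilenet \
--pipeline_config_path=pre-trained-models/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/pipeline.config \
--checkpoint_dir=models/my_mobilenet \
--eval_timeout=60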

You may go ahead and stop the evaluation cell from running.

Exporting the model

Now that we have our model ready, we need to save it in a format we can use later.

We now have a bunch of checkpoints in the models/my_mobilenet folder. To find the latest checkpoint automatically, here is a neat little Python script that will do the task:

import os
import re
import numpy as np

output_directory = 'exported-models/'

# Goes through the checkpoints in the models/my_mobilenet/ dir and gets the last one.
# You could choose a specific one instead of the last.
lst = os.listdir('models/my_mobilenet/')
lst = [l for l in lst if 'ckpt-' in l and '.index' not in l]
steps = np.array([int(re.findall(r'\d+', l)[0]) for l in lst])
last_model = lst[steps.argmax()]
last_model_path = os.path.join('models/my_mobilenet', last_model)
print(last_model_path)

Now to export the model, we run the export script provided by TF2, as follows:

!python /content/workspace/models/research/object_detection/exporter_main_v2.py --input_type=image_tensor \
--pipeline_config_path=pre-trained-models/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/pipeline.config \
--output_directory=exported-models \
--trained_checkpoint_dir=models/my_mobilenet

The export script will save the model in the exported-models folder with the following structure:

workspace/
├─ exported-models/
│  ├─ checkpoint/
│  ├─ saved_model/
│  └─ pipeline.config

You can now upload this folder to Google Drive or download it to save it for future use.
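For example, assuming you mounted your Drive earlier, a one-liner like this would do (the destination path is a placeholder):

!cp -r exported-models /content/drive/MyDrive/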

Inference on the model

The final step, and the one that fills you with a sense of accomplishment: we will test our model on an input image and watch it predict the type of object and give its bounding box.

The entire process is a little tedious, but here is a script that will let you perform inference directly on Google Colab.

import numpy as np
import tensorflow as tf
from PIL import Image
from google.colab.patches import cv2_imshow
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils

# Load the exported model and the label map (paths from the previous steps)
detect_fn = tf.saved_model.load('exported-models/saved_model')
category_index = label_map_util.create_category_index_from_labelmap(
    'annotations/label_map.pbtxt', use_display_name=True)

def load_image_into_numpy_array(path):
    """Load an image from file into a numpy array.
    Puts image into numpy array to feed into tensorflow graph.
    Note that by convention we put it into a numpy array with shape
    (height, width, channels), where channels=3 for RGB.
    Args:
      path: the file path to the image
    Returns:
      uint8 numpy array with shape (img_height, img_width, 3)
    """
    return np.array(Image.open(path))

image_path = "PATH TO YOUR INFERENCE IMAGE"
print('Running inference for {}... '.format(image_path), end='')

image_np = load_image_into_numpy_array(image_path)

# Things to try:
# Flip horizontally
# image_np = np.fliplr(image_np).copy()

# Convert image to grayscale (uncomment this to see how the model reacts to a grayscale image)
# image_np = np.tile(
#     np.mean(image_np, 2, keepdims=True), (1, 1, 3)).astype(np.uint8)

# The input needs to be a tensor, convert it using `tf.convert_to_tensor`.
input_tensor = tf.convert_to_tensor(image_np)
# The model expects a batch of images, so add an axis with `tf.newaxis`.
input_tensor = input_tensor[tf.newaxis, ...]

detections = detect_fn(input_tensor)

# All outputs are batch tensors.
# Convert to numpy arrays, and take index [0] to remove the batch dimension.
# We're only interested in the first num_detections.
num_detections = int(detections.pop('num_detections'))
detections = {key: value[0, :num_detections].numpy()
              for key, value in detections.items()}
detections['num_detections'] = num_detections

# detection_classes should be ints.
detections['detection_classes'] = detections['detection_classes'].astype(np.int64)

image_np_with_detections = image_np.copy()

viz_utils.visualize_boxes_and_labels_on_image_array(
      image_np_with_detections,
      detections['detection_boxes'],
      detections['detection_classes'],
      detections['detection_scores'],
      category_index,
      use_normalized_coordinates=True,
      max_boxes_to_draw=200,
      min_score_thresh=.4, # Adjust this value to set the minimum probability for boxes to be drawn
      agnostic_mode=False)

cv2_imshow(image_np_with_detections)

The output of the inference should look like this:

Inference Image

You can adapt the above script to take a video as input and perform inference on each frame; a sketch follows below.
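Here is a minimal sketch of that idea, reusing detect_fn and category_index from the inference script above (the video filenames are placeholders):

import cv2
import numpy as np
import tensorflow as tf
from object_detection.utils import visualization_utils as viz_utils

cap = cv2.VideoCapture('input_video.mp4')  # placeholder input path
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter('output_video.mp4',
                      cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))

while True:
    ret, frame = cap.read()
    if not ret:
        break
    # OpenCV reads frames as BGR; the model expects RGB
    image_np = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    input_tensor = tf.convert_to_tensor(image_np)[tf.newaxis, ...]
    detections = detect_fn(input_tensor)
    num = int(detections.pop('num_detections'))
    detections = {k: v[0, :num].numpy() for k, v in detections.items()}
    # Draw the boxes on the frame and write it back out as BGR
    viz_utils.visualize_boxes_and_labels_on_image_array(
        image_np,
        detections['detection_boxes'],
        detections['detection_classes'].astype(np.int64),
        detections['detection_scores'],
        category_index,
        use_normalized_coordinates=True,
        min_score_thresh=.4)
    out.write(cv2.cvtColor(image_np, cv2.COLOR_RGB2BGR))

cap.release()
out.release()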

Conclusion

Congratulations! You have built an object detection model with TensorFlow 2.

That’s it for the tutorial! I hope you face no issues while following along; if you have any questions, please leave a comment and I will respond to your queries.

For more details, refer to the TensorFlow GitHub.

If you like this article or any of my other articles, please consider supporting me on my Patreon!

Thank you for reading!