Introduction

Data labelling is an essential task in machine learning: the quality of the data we feed to a model largely determines how well it performs. Image annotation is the process of labelling the images in a dataset so that a machine learning model can learn from them. The labels mark the features we want the model to recognize, such as the objects in an image and where they are, which makes different types of objects recognizable to the model.

Annotation is usually carried out manually. The classes are defined in advance, and annotators mark the corresponding features on each image. A computer vision model is then trained on these annotations so that it can predict the same predefined features on new, unannotated images.

Why Is Annotation Important?

Computer vision models learn a great deal from annotated datasets: with good annotations they learn to predict accurately and relatively quickly. This makes annotation central to applications such as self-driving cars, number-plate detection, and tumor detection.

Annotated datasets give our models high-quality information, which lets them learn well and predict well on new, unannotated data. Object detection in particular depends on annotated images, so we rely heavily on these datasets to build AI-based models for automation.

Image Annotation for Object Detection

Image annotation means attaching labels from a predetermined set of classes (human, dog, etc.) to an image. This is done so that a model can recognize, count, or segment objects in images. Annotations can take the following forms:

Bounding boxes
Semantic segmentation
3D Cuboids
Polygons
Lines & Splines

Image Annotation Formats

Different computer vision tools and frameworks expect annotated data in their own formats. Some popular annotation formats are described below:

COCO

Microsoft COCO (Common Objects in Context) is a widely used dataset with 2.5 million labeled instances across 80 object categories. COCO defines five annotation types:

object detection
keypoint detection
stuff segmentation
panoptic segmentation
image captioning

The annotations are stored in JSON. The format for object detection is as follows:

annotation{
    "id": int,
    "image_id": int,
    "category_id": int,
    "segmentation": RLE or [polygon],
    "area": float,
    "bbox": [x,y,width,height],
    "iscrowd": 0 or 1,
}

categories[{
    "id": int,
    "name": str,
    "supercategory": str,
}]
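To make the structure concrete, here is a minimal sketch that reads a tiny COCO-style document and resolves each annotation to its image file name, class name, and box corners. The JSON content, the helper names `bbox_to_corners` and `describe`, and all values are made up for illustration; the only thing taken from the format is the field layout, including the `bbox` key that COCO annotation files use for the [x, y, width, height] box.

```python
import json

# A tiny, made-up COCO-style document (values are illustrative only).
coco_json = """
{
  "images": [{"id": 7, "file_name": "01.png", "width": 224, "height": 224}],
  "categories": [{"id": 3, "name": "dog", "supercategory": "animal"}],
  "annotations": [{"id": 1, "image_id": 7, "category_id": 3,
                   "segmentation": [], "area": 1600.0,
                   "bbox": [90, 54, 100, 16], "iscrowd": 0}]
}
"""


def bbox_to_corners(bbox):
    """Convert a COCO [x, y, width, height] box to (xmin, ymin, xmax, ymax)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def describe(coco):
    """Return one (file_name, class_name, corners) tuple per annotation."""
    # Ids are arbitrary integers, so look images and categories up by id.
    images = {im["id"]: im for im in coco["images"]}
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    return [
        (images[a["image_id"]]["file_name"],
         names[a["category_id"]],
         bbox_to_corners(a["bbox"]))
        for a in coco["annotations"]
    ]


coco = json.loads(coco_json)
records = describe(coco)
print(records)
```

Looking images and categories up by id rather than by list position matters because COCO ids are not guaranteed to be contiguous or to start at zero.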

YOLO

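YOLO (You Only Look Once) is a fast and accurate object detection algorithm. In its annotation format, each image file has a .txt file with the same name in the same directory, and that file holds the annotations for the image. Each line gives the object class followed by the box coordinates, width, and height. In the standard Darknet convention, x and y are the box center and all four values are expressed as fractions of the image width and height; the example below and the CSV converters in this article copy the values through as they are, without normalizing.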

<object-class> <x> <y> <width> <height>

Each object is annotated on a new line. Two objects would be written in the .txt file as follows:

0 67 33 23 14
1 54 19 86 78
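For readers who need the normalized form, the following is a minimal sketch of converting between a YOLO line and pixel corner coordinates. It assumes the common Darknet convention (center x, center y, width, height, each divided by the image size); the function names and the sample line are hypothetical and not part of any YOLO tool.

```python
def yolo_to_pixels(line, img_w, img_h):
    """Turn '<class> <xc> <yc> <w> <h>' (normalized) into (class, pixel corners)."""
    cls, xc, yc, w, h = line.split()
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    return int(cls), (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)


def pixels_to_yolo(cls, box, img_w, img_h):
    """Turn pixel corners (xmin, ymin, xmax, ymax) into a normalized YOLO line."""
    xmin, ymin, xmax, ymax = box
    xc = (xmin + xmax) / 2 / img_w
    yc = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return "{} {:.6f} {:.6f} {:.6f} {:.6f}".format(cls, xc, yc, w, h)


# Round trip on a made-up 200x100 image.
cls, box = yolo_to_pixels("0 0.5 0.5 0.25 0.5", 200, 100)
line = pixels_to_yolo(cls, box, 200, 100)
print(cls, box, line)
```

Because the values are relative to the image size, the same label file stays valid when the image is resized, which is one reason the normalized form is commonly used.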

Pascal VOC

Pascal VOC provides standardized image datasets for object detection. Annotations are stored in XML files, one per image. Given below is an example of a Pascal VOC annotation file for object detection:

<annotation> 
  <folder>Train</folder> 
  <filename>01.png</filename>      
  <path>/path/Train/01.png</path> 
  <source>  
    <database>Unknown</database> 
  </source>
  <size>  
    <width>224</width>  
    <height>224</height>  
    <depth>3</depth>   
  </size> 
  <segmented>0</segmented> 
  <object>  
    <name>36</name>  
    <pose>Frontal</pose>  
    <truncated>0</truncated>  
    <difficult>0</difficult>  
    <occluded>0</occluded>  
    <bndbox>   
      <xmin>90</xmin>   
      <xmax>190</xmax>   
      <ymin>54</ymin>   
      <ymax>70</ymax>  
    </bndbox> 
  </object>
</annotation>
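A file like this can be read with Python's standard `xml.etree.ElementTree` module. The sketch below parses a trimmed copy of the example above from a string; the helper name `read_voc_objects` and the dictionary layout it returns are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the annotation shown above.
voc_xml = """<annotation>
  <filename>01.png</filename>
  <size><width>224</width><height>224</height><depth>3</depth></size>
  <object>
    <name>36</name>
    <bndbox><xmin>90</xmin><xmax>190</xmax><ymin>54</ymin><ymax>70</ymax></bndbox>
  </object>
</annotation>"""


def read_voc_objects(xml_text):
    """Return one dict per <object>, with the box as integer pixel corners."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        objects.append({
            "filename": filename,
            "name": obj.findtext("name"),
            # Look tags up by name; their order inside <bndbox> can vary.
            "xmin": int(box.findtext("xmin")),
            "ymin": int(box.findtext("ymin")),
            "xmax": int(box.findtext("xmax")),
            "ymax": int(box.findtext("ymax")),
        })
    return objects


objects = read_voc_objects(voc_xml)
print(objects)
```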

TFRecord

A TFRecord (TensorFlow Record) file stores data as a sequence of binary strings. TensorFlow provides two components for specifying the structure of the data: tf.train.Example and tf.train.SequenceExample. Each sample has to be stored in one of these structures, serialized, and then written to disk with tf.io.TFRecordWriter (tf.python_io.TFRecordWriter in TensorFlow 1.x).

A TFRecord file is read as follows:

Use tf.data.TFRecordDataset to read the TFRecord file (tf.TFRecordReader in TensorFlow 1.x).
Define the features expected in each record with tf.io.FixedLenFeature and tf.io.VarLenFeature.
Parse one tf.train.Example (one record) at a time with tf.io.parse_single_example.

Annotation Converters (COCO to CSV, YOLO to COCO, etc.)

We often need to convert annotated data from one format to another so that the same dataset can be used with different tools and frameworks. With annotation converter functions, we can easily perform conversions such as COCO to CSV or YOLO to COCO.

In the rest of the article, we build functions for these format conversions, which you can use directly on your own dataset.

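Features are conveniently represented as rows and columns, so we begin with conversions from the different formats (COCO, YOLO, etc.) to CSV. This gives a clear view of the classes, bounding boxes, and image sizes. In the CSV files used throughout this article, each row holds the class, the top-left x and y of the box, the box width and height, the image file name, and the image width and height, in that order, with no header row.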
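As a quick illustration of that intermediate CSV, the sketch below parses two made-up rows and recovers the box corners. The column order (label, x, y, box width, box height, file name, image width, image height) is read off the converter functions in this article; the `Row` tuple and the helper names are hypothetical.

```python
import csv
import io
from collections import namedtuple

# Column order assumed from the converters in this article.
Row = namedtuple("Row", "label x y box_w box_h filename img_w img_h")


def parse_rows(csv_text):
    """Parse header-less CSV text into Row tuples with integer geometry."""
    rows = []
    for rec in csv.reader(io.StringIO(csv_text)):
        if not rec:
            continue  # skip blank lines
        label, x, y, box_w, box_h, filename, img_w, img_h = rec
        rows.append(Row(label, int(x), int(y), int(box_w), int(box_h),
                        filename, int(img_w), int(img_h)))
    return rows


def to_corners(row):
    """Return (xmin, ymin, xmax, ymax) for a row that stores width and height."""
    return (row.x, row.y, row.x + row.box_w, row.y + row.box_h)


sample = "dog,90,54,100,16,01.png,224,224\nman,10,20,30,40,01.png,224,224\n"
rows = parse_rows(sample)
for row in rows:
    print(row.label, to_corners(row))
```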

COCO to CSV format

import csv
import json
import os

def coco_to_csv(filename):
    # Example input: COCO2017/annotations/instances_val2017.json
    with open(filename, 'r') as f:
        coco = json.load(f)
    out_file = os.path.splitext(filename)[0] + '.csv'

    # COCO ids are arbitrary integers, not list positions, so index by id.
    images = {im['id']: im for im in coco['images']}
    classes = {cat['id']: cat['name'] for cat in coco['categories']}

    with open(out_file, 'w', newline='') as out:
        writer = csv.writer(out)
        # Row layout: label, x, y, box width, box height,
        # file name, image width, image height
        for ann in coco['annotations']:
            im = images[ann['image_id']]
            # A COCO bbox is already [x, y, width, height].
            x, y, w, h = ann['bbox']
            writer.writerow([classes[ann['category_id']], x, y, w, h,
                             im['file_name'], im['width'], im['height']])

YOLO to CSV format

import glob
import os
import pandas as pd

def yolo_to_csv(yolo_dir, destination_dir):
    # One class name per line; the line number is the class index.
    with open(os.path.join(yolo_dir, 'classes.names'), 'rt') as f:
        classes = [line.strip() for line in f if line.strip()]

    # YOLO files do not store the image size or extension, so they are
    # assumed here; change them to match your dataset.
    width = 1024
    height = 1024
    image_ext = '.png'

    rows = []
    for txt_path in sorted(glob.glob(os.path.join(yolo_dir, '*.txt'))):
        stem = os.path.splitext(os.path.basename(txt_path))[0]
        with open(txt_path, 'rt') as fd:
            for line in fd:
                parts = line.split()
                if len(parts) != 5:
                    continue  # skip blank or malformed lines
                # Box values are copied through unchanged.
                rows.append([classes[int(parts[0])], *parts[1:5],
                             stem + image_ext, width, height])

    df = pd.DataFrame(rows)
    # No header row, to match the layout the CSV readers below expect.
    df.to_csv(os.path.join(destination_dir, 'saved.csv'), index=False, header=False)

Pascal VOC to CSV format

import glob
import os
import pandas as pd
import xml.etree.ElementTree as ET

def xml_to_csv(path, destination_dir):
    xml_list = []
    for xml_file in glob.glob(os.path.join(path, '*.xml')):
        root = ET.parse(xml_file).getroot()
        size = root.find('size')
        for member in root.findall('object'):
            bbx = member.find('bndbox')
            xmin = int(bbx.find('xmin').text)
            ymin = int(bbx.find('ymin').text)
            # Store the box width and height rather than the max corner.
            box_w = int(bbx.find('xmax').text) - xmin
            box_h = int(bbx.find('ymax').text) - ymin
            xml_list.append((
                member.find('name').text,
                xmin, ymin, box_w, box_h,
                root.find('filename').text,
                int(size.find('width').text),
                int(size.find('height').text),
            ))

    xml_df = pd.DataFrame(xml_list)
    xml_df.to_csv(os.path.join(destination_dir, 'saved.csv'), index=False, header=False)

TFRecord to CSV format

import csv
import tensorflow as tf

record_files = ['newtrain.record']

def read_tfrecord(serialized_example):
    # The image bytes are not needed for the CSV, so 'image/encoded' is not parsed.
    feature_description = {
            'image/height': tf.io.FixedLenFeature((), tf.int64),
            'image/width': tf.io.FixedLenFeature((), tf.int64),
            'image/object/bbox/xmin': tf.io.VarLenFeature(tf.float32),
            'image/object/bbox/xmax': tf.io.VarLenFeature(tf.float32),
            'image/object/bbox/ymin': tf.io.VarLenFeature(tf.float32),
            'image/object/bbox/ymax': tf.io.VarLenFeature(tf.float32),
            'image/object/class/text': tf.io.VarLenFeature(tf.string),
            'image/filename': tf.io.FixedLenFeature((), tf.string)
    }
    return tf.io.parse_single_example(serialized_example, feature_description)

parsed_dataset = tf.data.TFRecordDataset(record_files).map(read_tfrecord)

with open('file.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    for sample in parsed_dataset:
        width = int(sample['image/width'].numpy())
        height = int(sample['image/height'].numpy())
        filename = sample['image/filename'].numpy().decode('utf-8')
        # VarLenFeature yields sparse tensors; .values holds one entry per box.
        xmins = sample['image/object/bbox/xmin'].values.numpy()
        ymins = sample['image/object/bbox/ymin'].values.numpy()
        xmaxs = sample['image/object/bbox/xmax'].values.numpy()
        ymaxs = sample['image/object/bbox/ymax'].values.numpy()
        labels = sample['image/object/class/text'].values.numpy()
        for i in range(len(xmins)):
            # Stored coordinates are fractions of the image size.
            x1 = round(float(xmins[i]) * width)
            y1 = round(float(ymins[i]) * height)
            x2 = round(float(xmaxs[i]) * width)
            y2 = round(float(ymaxs[i]) * height)
            # Write the box as x, y, width, height.
            writer.writerow([labels[i].decode('utf-8'), x1, y1, x2 - x1, y2 - y1,
                             filename, width, height])

Next, we look at functions that convert CSV to the COCO, YOLO, Pascal VOC, and TFRecord formats.

CSV to COCO format

import json
import os
import pandas as pd

def csv_to_coco(file_dir, destination_dir):
    save_json_path = os.path.join(destination_dir, 'traincoco.json')
    # The third and fourth box columns hold the box width and height.
    clmns = ['class', 'xmin', 'ymin', 'box_w', 'box_h', 'filename', 'width', 'height']
    data = pd.read_csv(file_dir, names=clmns, header=None)
    # Integer ids derived from the sorted unique file names and class names.
    data['fileid'] = data['filename'].astype('category').cat.codes
    data['categoryid'] = data['class'].astype('category').cat.codes
    data['annid'] = data.index

    def image(row):
        return {
            "height": int(row.height),
            "width": int(row.width),
            "id": int(row.fileid),
            "file_name": row.filename,
        }

    def category(row):
        # 'class' is a Python keyword, so itertuples() exposes it by position:
        # row[0] is the index and row[1] is the class name.
        return {"supercategory": "None", "id": int(row.categoryid), "name": row[1]}

    def annotation(row):
        return {
            "segmentation": [],
            "iscrowd": 0,
            "area": row.box_w * row.box_h,
            "image_id": int(row.fileid),
            # COCO boxes are [x, y, width, height].
            "bbox": [row.xmin, row.ymin, row.box_w, row.box_h],
            "category_id": int(row.categoryid),
            "id": int(row.annid),
        }

    annotations = [annotation(row) for row in data.itertuples()]
    imagedf = data.drop_duplicates(subset=['fileid']).sort_values(by='fileid')
    images = [image(row) for row in imagedf.itertuples()]
    catdf = data.drop_duplicates(subset=['categoryid']).sort_values(by='categoryid')
    categories = [category(row) for row in catdf.itertuples()]

    data_coco = {"images": images, "categories": categories, "annotations": annotations}
    with open(save_json_path, "w") as f:
        json.dump(data_coco, f, indent=4)

CSV to YOLO format

import csv
import os

def csv_to_yolo(csv_file, destination_folder):
    data_dir = os.path.join(destination_folder, 'data')
    os.makedirs(data_dir, exist_ok=True)

    with open(csv_file, newline='') as f:
        rows = [row for row in csv.reader(f) if row]

    # Class ids follow the sorted order of the unique class names.
    classes_names = sorted({row[0] for row in rows})
    classes = {name: idx for idx, name in enumerate(classes_names)}
    with open(os.path.join(data_dir, 'classes.names'), 'w') as f:
        for name in classes_names:
            f.write(name + '\n')

    # Group the rows by image so that each image gets one .txt file.
    rows_by_file = {}
    for row in rows:
        rows_by_file.setdefault(row[5], []).append(row)

    for name, file_rows in rows_by_file.items():
        txt_path = os.path.join(data_dir, os.path.splitext(name)[0] + '.txt')
        with open(txt_path, 'w') as f:
            for row in file_rows:
                # Box values are copied through unchanged.
                f.write('{} {} {} {} {}\n'.format(classes[row[0]], *row[1:5]))

CSV to Pascal VOC format

from collections import defaultdict
import os
import csv
from xml.etree.ElementTree import  Element, SubElement, ElementTree
 
def csv_to_voc_pascal(file_dir, save_root2):
    # file_dir is the path to the input CSV file.
    save_root2 = os.path.join(save_root2, "result_xmls")
    os.makedirs(save_root2, exist_ok=True)

    def write_xml(folder, filename, bbox_list):
        root = Element('annotation')
        SubElement(root, 'folder').text = folder
        SubElement(root, 'filename').text = filename
        SubElement(root, 'path').text = './images/' + filename
        source = SubElement(root, 'source')
        SubElement(source, 'database').text = 'Unknown'

        # Details from first entry
        e_class_name, e_xmin, e_ymin, e_xmax, e_ymax, e_filename, e_width, e_height = bbox_list[0]
        size = SubElement(root, 'size')
        SubElement(size, 'width').text = e_width
        SubElement(size, 'height').text = e_height
        SubElement(size, 'depth').text = '3'
        SubElement(root, 'segmented').text = '0'

        for entry in bbox_list:
            e_class_name, e_xmin, e_ymin, e_xmax, e_ymax, e_filename, e_width, e_height = entry
            obj = SubElement(root, 'object')
            SubElement(obj, 'name').text = e_class_name
            SubElement(obj, 'pose').text = 'Unspecified'
            SubElement(obj, 'truncated').text = '0'
            SubElement(obj, 'difficult').text = '0'

            bbox = SubElement(obj, 'bndbox')
            SubElement(bbox, 'xmin').text = e_xmin
            SubElement(bbox, 'ymin').text = e_ymin
            SubElement(bbox, 'xmax').text = e_xmax
            SubElement(bbox, 'ymax').text = e_ymax

        #indent(root)
        tree = ElementTree(root)
        xml_filename = os.path.join('.', folder, os.path.splitext(filename)[0] + '.xml')
        tree.write(xml_filename)
    entries_by_filename = defaultdict(list)

    with open(file_dir, 'r', encoding='utf-8') as f_input_csv:
        # The CSV has no header row, so every row is an annotation.
        for row in csv.reader(f_input_csv):
            class_name, xmin, ymin, box_w, box_h, filename, width, height = row
            # Convert the box width and height to the max corner that VOC expects.
            row[3] = str(int(xmin) + int(box_w))
            row[4] = str(int(ymin) + int(box_h))
            entries_by_filename[filename].append(row)
    for filename, entries in entries_by_filename.items():
        write_xml(save_root2, filename, entries)

CSV to TFRecord format

from __future__ import division
from __future__ import print_function
from __future__ import absolute_import

import os
import io
import pandas as pd
import tensorflow as tf
from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict

flags = tf.compat.v1.app.flags
flags.DEFINE_string('csv_input', '', 'Path to the CSV input')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
flags.DEFINE_string('image_dir', '', 'Path to images')
FLAGS = flags.FLAGS

columns = ['class','xmin','ymin','xmax','ymax','filename','width','height']
data = pd.read_csv(FLAGS.csv_input,names = columns)
df_reorder = data[['filename','width','height','class','xmin','ymin','xmax','ymax']] # rearrange column here
df_reorder.to_csv('newcsv.csv', index=False)

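# TODO: replace this mapping with the classes in your own dataset.
def class_text_to_int(row_label):
    label_map = {'Man': 1, 'Dog': 2, 'Monitor': 3, 'Machine': 4, 'Girl': 5}
    if row_label not in label_map:
        # Fail early instead of passing None into the TFRecord features.
        raise ValueError('Unknown class label: {}'.format(row_label))
    return label_map[row_label]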

def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]

def create_tf_example(group, path):
    i =0
    with tf.io.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        xmins.append(row['xmin'] / width)
        xmaxs.append((row['xmax']+row['xmin']) / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append((row['ymax']+row['ymin'])/ height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example

def main(_):
    path = os.path.join(FLAGS.image_dir)
    examples = pd.read_csv('newcsv.csv')
    grouped = split(examples, 'filename')
    # The context manager closes the writer even if an example fails.
    with tf.io.TFRecordWriter(FLAGS.output_path) as writer:
        for group in grouped:
            tf_example = create_tf_example(group, path)
            writer.write(tf_example.SerializeToString())
    output_path = os.path.join(os.getcwd(), FLAGS.output_path)
    print('Successfully created the TFRecords: {}'.format(output_path))

if __name__ == '__main__':
    # Parses the command-line flags and then calls main().
    tf.compat.v1.app.run()

Any conversion can be achieved by going through CSV: first convert the source format to CSV, then convert the CSV to the desired format. For instance, to convert COCO to Pascal VOC, we first convert COCO to CSV and then convert the CSV to Pascal VOC.
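The two-step idea can be sketched in memory, without touching the file system. The example below turns a small, made-up COCO dictionary into intermediate rows and then into one Pascal VOC XML string per image. The function names `coco_to_rows` and `rows_to_voc` are hypothetical, and the XML is reduced to the fields needed to show the box conversion.

```python
import xml.etree.ElementTree as ET


def coco_to_rows(coco):
    """Step 1: COCO dict -> rows of (label, x, y, w, h, file, img_w, img_h)."""
    images = {im["id"]: im for im in coco["images"]}
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    rows = []
    for ann in coco["annotations"]:
        im = images[ann["image_id"]]
        x, y, w, h = ann["bbox"]
        rows.append((names[ann["category_id"]], x, y, w, h,
                     im["file_name"], im["width"], im["height"]))
    return rows


def rows_to_voc(rows):
    """Step 2: rows -> {file_name: Pascal VOC style XML string}."""
    roots = {}
    for label, x, y, w, h, fname, img_w, img_h in rows:
        if fname not in roots:
            root = ET.Element("annotation")
            ET.SubElement(root, "filename").text = fname
            size = ET.SubElement(root, "size")
            ET.SubElement(size, "width").text = str(img_w)
            ET.SubElement(size, "height").text = str(img_h)
            roots[fname] = root
        obj = ET.SubElement(roots[fname], "object")
        ET.SubElement(obj, "name").text = label
        box = ET.SubElement(obj, "bndbox")
        # VOC stores corners, so add the width and height back on.
        ET.SubElement(box, "xmin").text = str(x)
        ET.SubElement(box, "ymin").text = str(y)
        ET.SubElement(box, "xmax").text = str(x + w)
        ET.SubElement(box, "ymax").text = str(y + h)
    return {f: ET.tostring(r, encoding="unicode") for f, r in roots.items()}


# Made-up input for illustration.
coco = {
    "images": [{"id": 7, "file_name": "01.png", "width": 224, "height": 224}],
    "categories": [{"id": 3, "name": "dog"}],
    "annotations": [{"id": 1, "image_id": 7, "category_id": 3,
                     "bbox": [90, 54, 100, 16]}],
}
voc = rows_to_voc(coco_to_rows(coco))
print(voc["01.png"])
```

Routing every conversion through one intermediate layout means each new format needs only two converters (to and from CSV) instead of one per pair of formats.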

In the last part of the article, some direct conversion functions are provided:

XML to JSON format

import os
import json
import xml.etree.ElementTree as ET
import glob

START_BOUNDING_BOX_ID = 1
PRE_DEFINE_CATEGORIES = None

def get(root, name):
    vars = root.findall(name)
    return vars

def get_and_check(root, name, length):
    vars = root.findall(name)
    if len(vars) == 0:
        raise ValueError("Can not find %s in %s." % (name, root.tag))
    if length > 0 and len(vars) != length:
        raise ValueError(
            "The size of %s is supposed to be %d, but is %d."
            % (name, length, len(vars))
        )
    if length == 1:
        vars = vars[0]
    return vars

def get_filename(filename):
    filename = filename.replace("\\", "/")
    filename = os.path.splitext(os.path.basename(filename))[0]
    return str(filename)
    
def get_categories(xml_files):
    """Generate category name to id mapping from a list of xml files.
        Arguments:
        xml_files {list} -- A list of xml file paths.
    Returns:
        dict -- category name to id mapping.
    """
    classes_names = []
    for xml_file in xml_files:
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall("object"):
            # Look the tag up by name instead of assuming it is the first child.
            classes_names.append(member.find("name").text)
    classes_names = list(set(classes_names))
    classes_names.sort()
    return {name: i for i, name in enumerate(classes_names)}

def convert(xml_files, json_file):
    json_dict = {"images": [], "type": "instances", "annotations": [], "categories": []}
    if PRE_DEFINE_CATEGORIES is not None:
        categories = PRE_DEFINE_CATEGORIES
    else:
        categories = get_categories(xml_files)
    bnd_id = START_BOUNDING_BOX_ID
    for xml_file in xml_files:
        tree = ET.parse(xml_file)
        root = tree.getroot()
        path = get(root, "path")
        if len(path) == 1:
            filename = os.path.basename(path[0].text)
        elif len(path) == 0:
            filename = get_and_check(root, "filename", 1).text
        else:
            raise ValueError("%d paths found in %s" % (len(path), xml_file))
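        # The image id is the file name without its extension. COCO tools
        # usually expect integer ids, so use numeric file names and cast if needed.
        image_id = get_filename(filename)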
        size = get_and_check(root, "size", 1)
        width = int(get_and_check(size, "width", 1).text)
        height = int(get_and_check(size, "height", 1).text)
        image = {
            "file_name": filename,
            "height": height,
            "width": width,
            "id": image_id,
        }
        json_dict["images"].append(image)

        for obj in get(root, "object"):
            category = get_and_check(obj, "name", 1).text
            if category not in categories:
                new_id = len(categories)
                categories[category] = new_id
            category_id = categories[category]
            bndbox = get_and_check(obj, "bndbox", 1)
            xmin = int(get_and_check(bndbox, "xmin", 1).text) - 1
            ymin = int(get_and_check(bndbox, "ymin", 1).text) - 1
            xmax = int(get_and_check(bndbox, "xmax", 1).text)
            ymax = int(get_and_check(bndbox, "ymax", 1).text)
            assert xmax > xmin
            assert ymax > ymin
            o_width = abs(xmax - xmin)
            o_height = abs(ymax - ymin)
            ann = {
                "area": o_width * o_height,
                "iscrowd": 0,
                "image_id": image_id,
                "bbox": [xmin, ymin, o_width, o_height],
                "category_id": category_id,
                "id": bnd_id,
                "ignore": 0,
                "segmentation": [],
            }
            json_dict["annotations"].append(ann)
            bnd_id = bnd_id + 1

    for cate, cid in categories.items():
        cat = {"supercategory": "none", "id": cid, "name": cate}
        json_dict["categories"].append(cat)

    # Create the output directory only when json_file includes one.
    out_dir = os.path.dirname(json_file)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)
    with open(json_file, "w") as json_fp:
        json.dump(json_dict, json_fp)

JSON to YOLO format

import json
import os

# Class names in the order of their YOLO class indices.
classes = ["Man", "Monitor", "Dog"]

def convert(size, box):
    # COCO box [x_min, y_min, width, height] in pixels ->
    # YOLO box (x_center, y_center, width, height) as fractions of the image size.
    img_w, img_h = size
    x, y, w, h = box
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)

def convert_annotation(json_dir, destination_dir):
    with open(json_dir, 'r') as f:
        data = json.load(f)
    category_names = {cat['id']: cat['name'] for cat in data['categories']}
    for item in data['images']:
        image_id = item['id']
        stem = os.path.splitext(item['file_name'])[0]
        size = (item['width'], item['height'])
        # One .txt file per image, with one line per annotation.
        with open(os.path.join(destination_dir, stem + '.txt'), 'w') as outfile:
            for ann in data['annotations']:
                if ann['image_id'] != image_id:
                    continue
                class_id = classes.index(category_names[ann['category_id']])
                bb = convert(size, ann['bbox'])
                outfile.write(str(class_id) + " " + " ".join(str(a) for a in bb) + '\n')
