Introduction
Data labelling is an essential task in machine learning: the quality of the data we feed into a model largely determines how well it performs. Image annotation is the process of labelling the images of a dataset so that a machine learning model can be trained on them. The labels mark the features we need the model to recognize. In image annotation, each object of interest is outlined and tagged with a class, which makes different types of objects recognizable to the model.
Annotation work is usually carried out manually. The classes are defined in advance, and the annotator marks the relevant features in each image. A computer vision model is then trained on these annotations, after which it can predict the same predetermined features on new images that have not been annotated.
Why Is Annotation Important?
Computer vision models learn a great deal from annotated datasets: with good annotations they learn to predict accurately and relatively quickly. Annotated data therefore underpins tasks such as self-driving cars, number-plate detection, tumor detection and many other applications.
Annotated datasets give our models high-quality information, which enables them to learn well and to predict well on new, unannotated data. With annotated images, object detection becomes much easier to perform. We therefore rely heavily on these datasets to build AI-based models for automation.
Image Annotation for Object Detection
Image annotation refers to attaching labels (predetermined classes such as human, dog, etc.) to an image. This is done to recognize, count, or segment the boundaries of objects in images. Annotations can take the following forms:
- Bounding boxes
- Semantic segmentation
- 3D Cuboids
- Polygons
- Lines & Splines
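To make the difference between two of these forms concrete, the sketch below derives the axis-aligned bounding box that encloses a polygon annotation. The function name polygon_to_bbox and the sample points are illustrative assumptions, not part of any annotation tool.

```python
def polygon_to_bbox(points):
    """Return (x, y, width, height) of the smallest axis-aligned box around the points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    return (x_min, y_min, max(xs) - x_min, max(ys) - y_min)

# A four-point polygon given as (x, y) pixel coordinates (made-up values).
polygon = [(10, 20), (40, 15), (55, 60), (12, 48)]
print(polygon_to_bbox(polygon))
```

A polygon keeps the outline of the object, while the box keeps only its extent, which is why box annotations are cheaper to produce but less precise.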
Image Annotation Formats
Different computer vision tools and tasks expect annotated data in their own formats. Some popular annotation formats are given below:
COCO
Microsoft COCO is a widely used dataset with 2.5 million labeled instances across 80 object categories. COCO has five annotation types:
- object detection
- keypoint detection
- stuff segmentation
- panoptic segmentation
- image captioning
The annotations are stored in JSON. The format for object detection is as follows:
annotation{
    "id": int,
    "image_id": int,
    "category_id": int,
    "segmentation": RLE or [polygon],
    "area": float,
    "bbox": [x,y,width,height],
    "iscrowd": 0 or 1
}
categories[{
    "id": int,
    "name": str,
    "supercategory": str
}]
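As a minimal sketch of how these fields fit together, the snippet below builds a tiny COCO-style dictionary by hand and groups its boxes by image. The sample values and the helper name boxes_by_image are made up for illustration.

```python
import json
from collections import defaultdict

# A tiny hand-written COCO-style document (illustrative values only).
coco = {
    "images": [{"id": 7, "file_name": "01.png", "width": 224, "height": 224}],
    "categories": [{"id": 18, "name": "dog", "supercategory": "animal"}],
    "annotations": [
        {"id": 1, "image_id": 7, "category_id": 18, "segmentation": [],
         "area": 1600.0, "bbox": [90, 54, 100, 16], "iscrowd": 0},
    ],
}

def boxes_by_image(doc):
    """Map each file name to a list of (class name, [x, y, width, height])."""
    names = {c["id"]: c["name"] for c in doc["categories"]}
    files = {im["id"]: im["file_name"] for im in doc["images"]}
    grouped = defaultdict(list)
    for ann in doc["annotations"]:
        grouped[files[ann["image_id"]]].append((names[ann["category_id"]], ann["bbox"]))
    return dict(grouped)

# Round-trip through JSON to show that the structure is plain JSON.
doc = json.loads(json.dumps(coco))
print(boxes_by_image(doc))
```

Images, categories and annotations are linked only through their ids, so looking names up by id (rather than by list position) is the safer habit.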
YOLO
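YOLO (You Only Look Once) is a very fast and accurate object detection algorithm. In its annotation format, a .txt file with the same name is created for each image file in the same directory. Each .txt file contains the annotations for the corresponding image: the object class followed by the object's coordinates, width and height. In the standard Darknet convention, x and y are the center of the box and all four values are normalized by the image dimensions to the range 0 to 1; the conversion functions later in this article copy the coordinates through unchanged.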
<object-class> <x> <y> <width> <height>
Each object is annotated on a new line. For two objects, the .txt file looks like this:
0 67 33 23 14
1 54 19 86 78
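The example values above are not normalized. As a hedged sketch of the normalized Darknet convention, the function below converts a pixel box given as a top-left corner plus width and height into YOLO's center-based, normalized form. The function name to_yolo and the sample image size are assumptions for illustration.

```python
def to_yolo(x, y, w, h, img_w, img_h):
    """Convert a top-left pixel box to (x_center, y_center, width, height), all in [0, 1]."""
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)

# A 100x16 box whose top-left corner is at (90, 54) in a 200x100 image.
box = to_yolo(90, 54, 100, 16, img_w=200, img_h=100)

# One annotation line for class 0.
print("0 " + " ".join("{:.4f}".format(v) for v in box))
```

Normalizing makes the annotation independent of the image resolution, so the same label file stays valid if the image is resized.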
Pascal VOC
Pascal VOC provides standardized image datasets for object detection. Annotations are stored in XML files. Given below is an example of a Pascal VOC annotation file for object detection:
<annotation>
    <folder>Train</folder>
    <filename>01.png</filename>
    <path>/path/Train/01.png</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>224</width>
        <height>224</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>36</name>
        <pose>Frontal</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <occluded>0</occluded>
        <bndbox>
            <xmin>90</xmin>
            <xmax>190</xmax>
            <ymin>54</ymin>
            <ymax>70</ymax>
        </bndbox>
    </object>
</annotation>
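A file like this can be read with Python's standard xml.etree.ElementTree module. The sketch below parses a shortened copy of the example; the helper name read_voc is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# A shortened version of the annotation shown above.
VOC_XML = """<annotation>
  <filename>01.png</filename>
  <size><width>224</width><height>224</height><depth>3</depth></size>
  <object>
    <name>36</name>
    <bndbox><xmin>90</xmin><ymin>54</ymin><xmax>190</xmax><ymax>70</ymax></bndbox>
  </object>
</annotation>"""

def read_voc(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...]) from a VOC annotation string."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        bb = obj.find("bndbox")
        # Look the corners up by tag name, so their order in the file does not matter.
        coords = [int(bb.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.find("name").text, *coords))
    return root.find("filename").text, boxes

print(read_voc(VOC_XML))
```

Unlike COCO, Pascal VOC stores the two corners of the box (xmin, ymin, xmax, ymax) rather than a corner plus width and height.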
TFRecord
A TFRecord (TensorFlow Record) file stores data as a sequence of binary strings. TensorFlow provides two components for specifying the structure of the data: tf.train.Example and tf.train.SequenceExample. Each sample has to be stored in one of these structures and then serialized and written to disk with tf.io.TFRecordWriter (tf.python_io.TFRecordWriter in TensorFlow 1.x).
A TFRecord file is read as follows:
- Use tf.data.TFRecordDataset to read the TFRecord file (tf.TFRecordReader in TensorFlow 1.x).
- Define the features expected in each record using tf.io.FixedLenFeature and tf.io.VarLenFeature.
- Parse one tf.train.Example (one record) at a time using tf.io.parse_single_example.
Annotation Converters (COCO to CSV, YOLO to COCO, etc.)
We often need to convert annotated data from one format to another so that the same dataset can be used with different tools. With annotation converter functions, we can easily perform conversions such as COCO to CSV, YOLO to COCO, etc.
In the rest of the article, we create functions for these format conversions, which you can use directly on your own dataset.
Features are conveniently represented as rows and columns, so we begin with conversions from the different formats (COCO, YOLO, etc.) to CSV. This also gives a good overview of the features, classes, bounding boxes, etc.
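The conversion functions in this article appear to share one headerless CSV row layout: class, x, y, box width, box height, file name, image width, image height. The sketch below writes and reads one such row with Python's csv module; the column names in COLUMNS are labels chosen here for illustration, not names the article's functions use.

```python
import csv
import io

# Assumed column order of the intermediate CSV (labels are illustrative).
COLUMNS = ["label", "x", "y", "box_width", "box_height",
           "filename", "img_width", "img_height"]

def write_rows(rows):
    """Serialize rows (lists in COLUMNS order) to headerless CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()

def read_rows(text):
    """Parse headerless CSV text into dicts keyed by COLUMNS."""
    return [dict(zip(COLUMNS, r)) for r in csv.reader(io.StringIO(text)) if r]

text = write_rows([["dog", 90, 54, 100, 16, "01.png", 224, 224]])
print(text, end="")
print(read_rows(text)[0]["box_width"])
```

Note that every value comes back from the CSV as a string, so numeric columns have to be converted again before doing arithmetic on them.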
COCO to CSV format
import json

def coco_to_csv(filename):
    # e.g. COCO2017/annotations/instances_val2017.json
    with open(filename, 'r') as f:
        s = json.load(f)

    # COCO ids are not guaranteed to be contiguous, so look everything up by id
    images = {im['id']: im for im in s['images']}
    classes = {cl['id']: cl['name'] for cl in s['categories']}

    out_file = filename[:-5] + '.csv'
    with open(out_file, 'w') as out:
        # columns: label, x, y, box width, box height, file name, image width, image height
        for ann in s['annotations']:
            im = images[ann['image_id']]
            # a COCO bbox is already [x, y, width, height]
            x1, y1, w, h = ann['bbox']
            out.write('{},{},{},{},{},{},{},{}\n'.format(
                classes[ann['category_id']], x1, y1, w, h,
                im['file_name'], im['width'], im['height']))
YOLO to CSV format
import os
import glob
import pandas as pd

def yolo_to_csv(yolo_dir, destination_dir):
    # class names, one per line, in the order of their YOLO class ids
    with open(os.path.join(yolo_dir, 'classes.names'), 'rt') as f:
        classes = [l.strip() for l in f]

    # YOLO files do not store the image size; adjust these to your dataset
    width = 1024
    height = 1024

    final_df = []
    for item in glob.glob(os.path.join(yolo_dir, '*.txt')):
        image_name = os.path.splitext(os.path.basename(item))[0] + '.png'
        with open(item, 'rt') as fd:
            for line in fd:
                splited = line.split()
                if len(splited) < 5:
                    # skip blank or incomplete lines
                    continue
                try:
                    label = classes[int(splited[0])]
                except (IndexError, ValueError):
                    # skip lines whose class id is not a known class
                    continue
                final_df.append([label] + splited[1:5] + [image_name, width, height])

    df = pd.DataFrame(final_df)
    # no header row, because the CSV readers in this article expect none
    df.to_csv(os.path.join(destination_dir, 'saved.csv'), index=False, header=False)
Pascal VOC to CSV format
import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET

def xml_to_csv(path, destination_dir):
    xml_list = []
    for xml_file in glob.glob(os.path.join(path, '*.xml')):
        root = ET.parse(xml_file).getroot()
        size = root.find('size')
        for member in root.findall('object'):
            bbx = member.find('bndbox')
            xmin = int(bbx.find('xmin').text)
            ymin = int(bbx.find('ymin').text)
            # the CSV stores the box width and height rather than xmax and ymax
            box_width = int(bbx.find('xmax').text) - xmin
            box_height = int(bbx.find('ymax').text) - ymin
            label = member.find('name').text
            xml_list.append((label, xmin, ymin, box_width, box_height,
                             root.find('filename').text,
                             int(size.find('width').text),
                             int(size.find('height').text)))
    xml_df = pd.DataFrame(xml_list)
    xml_df.to_csv(os.path.join(destination_dir, 'saved.csv'), index=None, header=False)
TFRecord to CSV format
import tensorflow as tf
from PIL import Image

filenames = ['newtrain.record']

def read_tfrecord(serialized_example):
    # features expected in each record
    feature_description = {
        'image/height': tf.io.FixedLenFeature((), tf.int64),
        'image/width': tf.io.FixedLenFeature((), tf.int64),
        'image/encoded': tf.io.FixedLenFeature((), tf.string),
        'image/object/bbox/xmin': tf.io.VarLenFeature(tf.float32),
        'image/object/bbox/xmax': tf.io.VarLenFeature(tf.float32),
        'image/object/bbox/ymin': tf.io.VarLenFeature(tf.float32),
        'image/object/bbox/ymax': tf.io.VarLenFeature(tf.float32),
        'image/object/class/text': tf.io.VarLenFeature(tf.string),
        'image/filename': tf.io.FixedLenFeature((), tf.string)
    }
    parsed_features = tf.io.parse_single_example(serialized_example, feature_description)
    parsed_features['image/encoded'] = tf.io.decode_jpeg(
        parsed_features['image/encoded'], channels=3)
    return parsed_features

data = tf.data.TFRecordDataset(filenames)
parsed_dataset = data.shuffle(128).map(read_tfrecord).batch(1)
print(parsed_dataset)

coord = []
image_names = []
labels = []
dim = []
for sample in parsed_dataset.take(10000):
    width = int(sample['image/width'].numpy()[0])
    height = int(sample['image/height'].numpy()[0])
    xmins = sample['image/object/bbox/xmin'].values.numpy()
    ymins = sample['image/object/bbox/ymin'].values.numpy()
    xmaxs = sample['image/object/bbox/xmax'].values.numpy()
    ymaxs = sample['image/object/bbox/ymax'].values.numpy()
    name = sample['image/filename'].numpy()[0].decode('utf-8')
    for i in range(len(xmins)):
        # box coordinates are stored normalized, so scale them back to pixels
        coord.append([round(float(xmins[i]) * width),
                      round(float(ymins[i]) * height),
                      round(float(xmaxs[i]) * width),
                      round(float(ymaxs[i]) * height)])
        image_names.append(name)
        dim.append([width, height])
    for text in sample['image/object/class/text'].values.numpy():
        labels.append(text.decode('utf-8'))
    # save the decoded image of the current record (overwritten on every iteration)
    img = Image.fromarray(sample['image/encoded'].numpy()[0])
    img.save('new.jpg')

with open('file.csv', 'w') as out:
    for i in range(len(coord)):
        x1 = coord[i][0]
        y1 = coord[i][1]
        # the CSV stores the box width and height
        x2 = coord[i][2] - x1
        y2 = coord[i][3] - y1
        out.write('{},{},{},{},{},{},{},{}\n'.format(
            labels[i], x1, y1, x2, y2, image_names[i], dim[i][0], dim[i][1]))
Now, we will look at functions for converting from CSV to the COCO, YOLO, Pascal VOC and TFRecord formats.
CSV to COCO format
import json
import pandas as pd

def csv_to_coco(file_dir, destination_dir):
    save_json_path = destination_dir + '/traincoco.json'
    # the 'xmax' and 'ymax' columns hold the box width and height
    clmns = ['class', 'xmin', 'ymin', 'xmax', 'ymax', 'filename', 'width', 'height']
    data = pd.read_csv(file_dir, names=clmns, header=None)

    images = []
    categories = []
    annotations = []

    # integer ids for images, categories and annotations
    data['fileid'] = data['filename'].astype('category').cat.codes
    data['categoryid'] = pd.Categorical(data['class'], ordered=True).codes
    data['annid'] = data.index

    def image(row):
        return {"height": int(row.height), "width": int(row.width),
                "id": int(row.fileid), "file_name": row.filename}

    def category(row):
        # 'class' is a Python keyword, so itertuples() exposes that column by position
        return {"supercategory": 'None', "id": int(row.categoryid), "name": row[1]}

    def annotation(row):
        return {
            "segmentation": [],
            "iscrowd": 0,
            "area": row.xmax * row.ymax,
            "image_id": int(row.fileid),
            # COCO expects [x, y, width, height]
            "bbox": [row.xmin, row.ymin, row.xmax, row.ymax],
            "category_id": int(row.categoryid),
            "id": int(row.annid),
        }

    for row in data.itertuples():
        annotations.append(annotation(row))

    imagedf = data.drop_duplicates(subset=['fileid']).sort_values(by='fileid')
    for row in imagedf.itertuples():
        images.append(image(row))

    catdf = data.drop_duplicates(subset=['categoryid']).sort_values(by='categoryid')
    for row in catdf.itertuples():
        categories.append(category(row))

    data_coco = {"images": images, "categories": categories, "annotations": annotations}
    with open(save_json_path, "w") as f:
        json.dump(data_coco, f, indent=4)
CSV to YOLO format
import csv
import os
import numpy as np

def csv_to_yolo(csv_file, destination_folder):
    data_dir = os.path.join(destination_folder, 'data')
    os.makedirs(data_dir, exist_ok=True)

    # read the CSV once, ignoring empty lines
    with open(csv_file) as f:
        rows = [l for l in csv.reader(f) if l]

    file_names = [l[5] for l in rows]
    classes_names = np.unique([l[0] for l in rows])
    # map each class name to its index in the sorted list of names
    classes = {k: v for v, k in enumerate(classes_names)}

    with open(os.path.join(data_dir, 'classes.names'), 'w') as f:
        for i in classes_names:
            f.write(str(i) + '\n')

    for name in np.unique(file_names):
        # one .txt file per image, one line per object
        txt_path = os.path.join(data_dir, os.path.splitext(name)[0] + '.txt')
        with open(txt_path, 'w') as file:
            for l in rows:
                if l[5] == name:
                    # coordinates are copied through as-is, not normalized
                    file.write('{} {} {} {} {}\n'.format(
                        classes[l[0]], l[1], l[2], l[3], l[4]))
CSV to Pascal VOC format
from collections import defaultdict
import os
import csv
from xml.etree.ElementTree import Element, SubElement, ElementTree

def csv_to_voc_pascal(file_dir, save_root2):
    save_root2 = os.path.join(save_root2, 'result_xmls')
    os.makedirs(save_root2, exist_ok=True)

    def write_xml(folder, filename, bbox_list):
        root = Element('annotation')
        SubElement(root, 'folder').text = folder
        SubElement(root, 'filename').text = filename
        SubElement(root, 'path').text = './images/' + filename
        source = SubElement(root, 'source')
        SubElement(source, 'database').text = 'Unknown'

        # the image size is taken from the first entry
        e_width, e_height = bbox_list[0][6], bbox_list[0][7]
        size = SubElement(root, 'size')
        SubElement(size, 'width').text = e_width
        SubElement(size, 'height').text = e_height
        SubElement(size, 'depth').text = '3'
        SubElement(root, 'segmented').text = '0'

        for entry in bbox_list:
            e_class_name, e_xmin, e_ymin, e_xmax, e_ymax = entry[:5]
            obj = SubElement(root, 'object')
            SubElement(obj, 'name').text = e_class_name
            SubElement(obj, 'pose').text = 'Unspecified'
            SubElement(obj, 'truncated').text = '0'
            SubElement(obj, 'difficult').text = '0'
            bbox = SubElement(obj, 'bndbox')
            SubElement(bbox, 'xmin').text = e_xmin
            SubElement(bbox, 'ymin').text = e_ymin
            SubElement(bbox, 'xmax').text = e_xmax
            SubElement(bbox, 'ymax').text = e_ymax

        tree = ElementTree(root)
        xml_filename = os.path.join(folder, os.path.splitext(filename)[0] + '.xml')
        tree.write(xml_filename)

    entries_by_filename = defaultdict(list)
    with open(file_dir, 'r', encoding='utf-8') as f_input_csv:
        # the CSV has no header row; every line is an annotation
        for row in csv.reader(f_input_csv):
            # columns 3 and 4 hold the box width and height; convert them to xmax and ymax
            row[3] = str(int(row[1]) + int(row[3]))
            row[4] = str(int(row[2]) + int(row[4]))
            entries_by_filename[row[5]].append(row)

    # one XML file per image
    for filename, entries in entries_by_filename.items():
        write_xml(save_root2, filename, entries)
CSV to TFRecord format
import os
import io
import pandas as pd
import tensorflow as tf
from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple

flags = tf.compat.v1.app.flags
flags.DEFINE_string('csv_input', '', 'Path to the CSV input')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
flags.DEFINE_string('image_dir', '', 'Path to images')
FLAGS = flags.FLAGS

# TO-DO: replace with the label map of your own dataset
def class_text_to_int(row_label):
    if row_label == 'Man':
        return 1
    elif row_label == 'Dog':
        return 2
    elif row_label == 'Monitor':
        return 3
    elif row_label == 'Machine':
        return 4
    elif row_label == 'Girl':
        return 5
    else:
        return None

def split(df, group):
    # group the rows by file name so that each image becomes one record
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]

def create_tf_example(group, path):
    with tf.io.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        # 'xmax' and 'ymax' hold the box width and height; boxes are stored normalized
        xmins.append(row['xmin'] / width)
        xmaxs.append((row['xmax'] + row['xmin']) / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append((row['ymax'] + row['ymin']) / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example

def main(_):
    columns = ['class', 'xmin', 'ymin', 'xmax', 'ymax', 'filename', 'width', 'height']
    examples = pd.read_csv(FLAGS.csv_input, names=columns)
    writer = tf.io.TFRecordWriter(FLAGS.output_path)
    path = os.path.join(FLAGS.image_dir)
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())
    writer.close()
    output_path = os.path.join(os.getcwd(), FLAGS.output_path)
    print('Successfully created the TFRecords: {}'.format(output_path))

if __name__ == '__main__':
    tf.compat.v1.app.run()
Any conversion can be achieved by going through CSV: convert the source format to CSV, then convert the CSV to the desired format. For instance, to convert COCO to Pascal VOC, we first convert COCO to CSV and then CSV to Pascal VOC.
In the last part of the article, some direct conversion functions are provided:
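The two-step route can be sketched as a small helper that chains a to-CSV function with a from-CSV function. The registry dictionaries and the stand-in lambdas below are hypothetical placeholders; in practice they would hold the conversion functions defined in this article, whose argument lists differ and would need small wrappers.

```python
def convert_via_csv(src_format, dst_format, source, to_csv, from_csv):
    """Convert source to dst_format by going through the CSV representation."""
    if src_format not in to_csv:
        raise ValueError("no to-CSV converter for " + src_format)
    if dst_format not in from_csv:
        raise ValueError("no from-CSV converter for " + dst_format)
    csv_data = to_csv[src_format](source)
    return from_csv[dst_format](csv_data)

# Stand-in converters that only tag their input, to show the call order.
to_csv = {"coco": lambda src: "csv(" + src + ")"}
from_csv = {"voc": lambda csv_data: "voc(" + csv_data + ")"}

print(convert_via_csv("coco", "voc", "instances.json", to_csv, from_csv))
```

With N formats, this design needs only 2N converters instead of one for every pair of formats, at the cost of an extra intermediate step.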
XML to JSON format
import os
import json
import xml.etree.ElementTree as ET
import glob

START_BOUNDING_BOX_ID = 1
# set to a {name: id} dict to fix the category ids in advance
PRE_DEFINE_CATEGORIES = None

def get(root, name):
    return root.findall(name)

def get_and_check(root, name, length):
    found = root.findall(name)
    if len(found) == 0:
        raise ValueError("Can not find %s in %s." % (name, root.tag))
    if length > 0 and len(found) != length:
        raise ValueError(
            "The size of %s is supposed to be %d, but is %d." % (name, length, len(found))
        )
    if length == 1:
        found = found[0]
    return found

def get_filename(filename):
    # file name without directory and extension
    filename = filename.replace("\\", "/")
    return os.path.splitext(os.path.basename(filename))[0]

def get_categories(xml_files):
    """Generate category name to id mapping from a list of xml files.

    Arguments:
        xml_files {list} -- A list of xml file paths.

    Returns:
        dict -- category name to id mapping.
    """
    classes_names = []
    for xml_file in xml_files:
        root = ET.parse(xml_file).getroot()
        for member in root.findall("object"):
            classes_names.append(member.find("name").text)
    classes_names = sorted(set(classes_names))
    return {name: i for i, name in enumerate(classes_names)}

def convert(xml_files, json_file):
    # usage: convert(glob.glob("<xml folder>/*.xml"), "<output file>.json")
    json_dict = {"images": [], "type": "instances", "annotations": [], "categories": []}
    if PRE_DEFINE_CATEGORIES is not None:
        categories = PRE_DEFINE_CATEGORIES
    else:
        categories = get_categories(xml_files)
    bnd_id = START_BOUNDING_BOX_ID
    for xml_file in xml_files:
        root = ET.parse(xml_file).getroot()
        path = get(root, "path")
        if len(path) == 1:
            filename = os.path.basename(path[0].text)
        elif len(path) == 0:
            filename = get_and_check(root, "filename", 1).text
        else:
            raise ValueError("%d paths found in %s" % (len(path), xml_file))
        # the file name without its extension is used as the image id
        image_id = get_filename(filename)
        size = get_and_check(root, "size", 1)
        width = int(get_and_check(size, "width", 1).text)
        height = int(get_and_check(size, "height", 1).text)
        image = {
            "file_name": filename,
            "height": height,
            "width": width,
            "id": image_id,
        }
        json_dict["images"].append(image)
        for obj in get(root, "object"):
            category = get_and_check(obj, "name", 1).text
            if category not in categories:
                new_id = len(categories)
                categories[category] = new_id
            category_id = categories[category]
            bndbox = get_and_check(obj, "bndbox", 1)
            xmin = int(get_and_check(bndbox, "xmin", 1).text) - 1
            ymin = int(get_and_check(bndbox, "ymin", 1).text) - 1
            xmax = int(get_and_check(bndbox, "xmax", 1).text)
            ymax = int(get_and_check(bndbox, "ymax", 1).text)
            assert xmax > xmin
            assert ymax > ymin
            o_width = abs(xmax - xmin)
            o_height = abs(ymax - ymin)
            ann = {
                "area": o_width * o_height,
                "iscrowd": 0,
                "image_id": image_id,
                "bbox": [xmin, ymin, o_width, o_height],
                "category_id": category_id,
                "id": bnd_id,
                "ignore": 0,
                "segmentation": [],
            }
            json_dict["annotations"].append(ann)
            bnd_id = bnd_id + 1

    for cate, cid in categories.items():
        cat = {"supercategory": "none", "id": cid, "name": cate}
        json_dict["categories"].append(cat)

    # create the output folder if the path names one
    out_dir = os.path.dirname(json_file)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)
    with open(json_file, "w") as json_fp:
        json_fp.write(json.dumps(json_dict))
JSON to YOLO format
import json
import os

# class names in the order of their YOLO class ids; adjust to your dataset
classes = ["Man", "Monitor", "Dog"]

def convert(size, box):
    # passes the COCO box [x, y, width, height] through unchanged; size is unused,
    # so normalize here if your tool expects the standard YOLO convention
    x = box[0]
    y = box[1]
    w = box[2]
    h = box[3]
    return (x, y, w, h)

def convert_annotation(json_dir, destination_dir):
    with open(json_dir, 'r') as f:
        data = json.load(f)
    # category id -> category name
    names = {c['id']: c['name'] for c in data['categories']}
    for item in data['images']:
        image_id = item['id']
        file_name = item['file_name']
        width = item['width']
        height = item['height']
        # all annotations that belong to this image
        value = [a for a in data['annotations'] if a['image_id'] == image_id]
        txt_path = os.path.join(destination_dir, os.path.splitext(file_name)[0] + '.txt')
        with open(txt_path, 'a+') as outfile:
            for item2 in value:
                class_id = classes.index(names[item2['category_id']])
                bb = convert((width, height), item2['bbox'])
                outfile.write(str(class_id) + " " + " ".join([str(a) for a in bb]) + '\n')