223 participants

#### Topic provider

Pervasive Artificial Intelligence Research (PAIR) Labs, National Chiao Tung University (NCTU), Taiwan

The Pervasive AI Research (PAIR) Labs, a group of national research labs funded by the Ministry of Science and Technology, Taiwan, is commissioned to achieve academic excellence, nurture local AI talent, build international links, and develop pragmatic approaches in applied AI technologies for innovating and optimizing services, products, workflows, and supply chains. PAIR comprises 18 distinguished research institutes in Taiwan that conduct research in various applied AI areas.

Website: https://pairlabs.ai/

Intelligent Vision System (IVS) Lab, National Yang Ming Chiao Tung University (NYCU), Taiwan

The Intelligent Vision System (IVS) Lab at National Yang Ming Chiao Tung University is directed by Professor Jiun-In Guo. We tackle practical open problems in autonomous driving research, focusing on intelligent vision processing systems, applications, and SoCs that exploit deep learning technology.

Website: http://ivs.ee.nctu.edu.tw/ivs/

AI System (AIS) Lab, National Cheng Kung University (NCKU), Taiwan

The AI System (AIS) Lab at National Cheng Kung University is directed by Professor Chia-Chi Tsai. We are dedicated to building systems with AI technology. Our research includes AI accelerator development, AI architecture improvement, and AI-based solutions to multimedia problems.

MediaTek

MediaTek Inc. is a Taiwanese fabless semiconductor company that provides chips for wireless communications, high-definition television, handheld mobile devices such as smartphones and tablet computers, navigation systems, consumer multimedia products, digital subscriber line services, and optical disc drives. MediaTek is known for advances in multimedia and AI and for expertise in delivering the most power possible when and where it is needed. MediaTek’s chipsets are optimized to run cool and power-efficiently to extend battery life, balancing high performance, power efficiency, and connectivity.

Website: https://www.mediatek.com/

Wistron-NCTU Embedded Artificial Intelligence Research Center

Sponsored by Wistron and founded in September 2020, the Wistron-NCTU Embedded Artificial Intelligence Research Center (E-AI RDC) is a young and enthusiastic research center led by Prof. Jiun-In Guo (Institute of Electronics, National Chiao Tung University). It aims to develop key technologies for embedded AI applications, ranging from AI data acquisition and labeling, through AI model development and optimization, to AI computing platform development, with the help of an easy-to-use AI toolchain (called ezAIT). The target applications cover AIoT, ADAS/ADS, smart transportation, smart manufacturing, smart medical imaging, and emerging communication systems. In addition to developing these technologies, E-AI RDC will collaborate with international and industrial partners to cultivate talent in the embedded AI field and further enhance the competitiveness of Taiwan’s industry.

###### 2023/03/17 Private testing data has been released!

Dear competitors: the private testing data for the qualification round has been released. Please submit your results for qualification. Because the new private testing dataset differs from the previous dataset, the leaderboard has been reset. The final ranking will be based on the score on the private leaderboard. Thank you!

#### Introduction

Object detection has been studied extensively in computer vision and has made tremendous progress in recent years. Image segmentation goes a step further by trying to find the exact boundaries of the objects in an image, and semantic segmentation pursues more than an object's location, going down to pixel-level information. However, because most deep learning-based algorithms are computationally heavy, these models are hard to run on embedded systems, which have limited computing capabilities. In addition, the existing open datasets for traffic scenes in ADAS applications usually cover the main lane, adjacent lanes, and different lane markings (i.e. double line, single line, and dashed line) in Western countries. These scenes differ from those in Asian countries such as Taiwan, where many motorcycle riders speed along city roads, so semantic segmentation models trained only on the existing open datasets need extra techniques to segment complex scenes in Asian countries. Moreover, most complicated applications involve both object detection and segmentation, and running the two tasks as separate models on a resource-limited platform is difficult.

In this competition, we encourage participants to design a single lightweight deep learning model that supports multiple tasks, including semantic segmentation and object detection, and that can be applied to Taiwan’s traffic scenes, where many fast-moving motorcycles share city roads with vehicles and pedestrians. The developed models should not only fit embedded systems but also achieve high accuracy.

This competition includes two stages: qualification and final competition.

• Qualification competition: all participants submit their answers online and a score is calculated. The top 15 teams qualify for the final round of the competition.
• Final competition: the final score is evaluated on the new MediaTek Dimensity Series platform.

The goal is to design a single lightweight deep learning model that supports multiple tasks, including semantic segmentation and object detection, and that suits constrained embedded system design for traffic scenes in Asian countries like Taiwan. We focus on segmentation and object detection accuracy, power consumption, real-time performance optimization, and deployment on MediaTek’s Dimensity Series platform.

MediaTek’s Dimensity Series platform embeds heterogeneous computing units, such as CPUs, GPUs, and APUs (AI processing units), into its system-on-chip products, giving developers high performance and power efficiency for building AI features and applications. Developers can target specific processing units within the system-on-chip, or they can let the MediaTek NeuroPilot SDK handle the processing allocation for them.

Given the test image dataset, participants are asked to perform two tasks, object detection and semantic segmentation, at the same time with a single model. For the semantic segmentation task, the model should assign each pixel in each image to one of the following six classes: {background, main_lane, alter_lane, double_line, dashed_line, single_line}. For the object detection task, the same model should detect objects belonging to the following four classes, {pedestrian, vehicle, scooter, bicycle}, in each image and report the class, bounding box, and confidence of each detection.

Reference

[1] F. Yu et al., “BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[2] Google, “Measuring device power: Android Open Source Project,” Android Open Source Project. [Online]. Available: https://source.android.com/devices/tech/power/device?hl=en#power-consumption. [Accessed: 11-Nov-2021].
[3] M. Cordts et al., “The Cityscapes Dataset for Semantic Urban Scene Understanding,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[4] COCO API: https://github.com/cocodataset/cocoapi
[5] Average Precision (AP): https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Average_precision
[6] Intersection over union (IoU): https://en.wikipedia.org/wiki/Jaccard_index

#### Prize

Based on each team's points in the final evaluation, the three highest-scoring teams receive the regular awards.

1. Champion: USD 1500
2. 1st Runner-up: USD 1000
3. 3rd-place: USD 700

Special Award

1. Best INT8 model development Award: USD 500
• Best overall score in the final competition using INT8 model development

All award winners must agree to submit a contest paper and attend the IEEE ICME2023 Grand Challenge PAIR Competition Special Session to present their work. If the paper is not submitted, or the submitted paper is shorter than 3 pages, the award will be cancelled.

#### Activity time

| Date | Event |
|---|---|
| 2/03/2023 | Qualification Competition Start Date |
| 2/03/2023 | Date to Release Public Testing Data |
| 3/17/2023 | Date to Release Private Testing Data for Qualification |
| 3/24/2023 19:59:59 UTC+8 | Qualification Competition End Date |
| 3/25/2023 20:00 UTC+8 | Finalist Announcement |
| 3/26/2023 | Final Competition Start Date |
| 4/03/2023 | Date to Release Private Testing Data for Final |
| 4/10/2023 19:59:59 UTC+8 | Final Competition End Date |
| 4/19/2023 20:00 UTC+8 | Award Announcement |
| 5/07/2023 | |

#### Evaluation Criteria

Qualification Competition

The grading criteria are divided into two parts: one for the object detection score and the other for the semantic segmentation score. Each accounts for 50% of the points in the qualification competition. The evaluation metrics for the two parts are listed below.

The evaluation metric for object detection is based on the MS COCO object detection rules.

• The mean Average Precision (mAP) is used to evaluate the result.
• Intersection over union (IoU) threshold is set at 0.5.

The resulting average precision (AP) of each class will be calculated and the mean AP (mAP) over all classes is evaluated as the key metric.
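As a rough illustration of the detection metric, the sketch below computes AP for one class at an IoU threshold of 0.5 and averages over classes. It is a simplified stand-in, not the official scorer: it assumes boxes in `(x1, y1, x2, y2)` format, greedy matching in descending confidence order, and the area under the precision envelope, whereas the COCO API [4] uses its own matching and interpolation details. All function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)
Detection = Tuple[str, float, Box]       # (image_id, confidence, box)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: List[Detection], gts: Dict[str, List[Box]],
                      iou_thr: float = 0.5) -> float:
    """AP for one class: each ground-truth box can be matched at most once."""
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in gts.items()}
    tp_flags = []
    for img, _, box in sorted(dets, key=lambda d: -d[1]):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts.get(img, [])):
            if matched[img][j]:
                continue
            iou = box_iou(box, gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        is_tp = best_iou >= iou_thr
        if is_tp:
            matched[img][best_j] = True
        tp_flags.append(is_tp)
    # Running precision and recall after each detection.
    precisions, recalls, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += int(flag)
        precisions.append(tp / k)
        recalls.append(tp / n_gt)
    # Make precision monotonically non-increasing, then integrate over recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: Dict[str, float]) -> float:
    """mAP: unweighted mean of the per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap) if per_class_ap else 0.0
```

For official numbers, the COCO API remains the reference; this sketch is only meant to make the roles of the confidence ranking and the 0.5 IoU threshold concrete.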

The evaluation metric for semantic segmentation is based on the Cityscapes [3] Pixel-Level Semantic Labeling Task.

• The standard Jaccard Index (commonly known as the PASCAL VOC intersection-over-union metric), IoU = TP / (TP + FP + FN), is used to evaluate the semantic segmentation results.
• TP, FP, and FN are the numbers of true positive, false positive, and false negative pixels, respectively.
• Pixels labeled as void do not contribute to the score.

The IoU compares the prediction region with the ground truth region for a class and quantifies this based on the area of overlap between both regions. The IoU is calculated for each semantic class in an image and the mean of all class IoU scores makes up the mean Intersection over Union (mIoU) score.
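The following minimal sketch shows how the per-class IoU and the mIoU could be computed from flattened label maps. The class index order, the void label value `255`, and the choice to skip classes that never appear are assumptions made for illustration; the official scoring script may differ.

```python
from typing import List, Optional, Sequence

# Class order is an assumption; the text only names the six classes.
SEG_CLASSES = ["background", "main_lane", "alter_lane",
               "double_line", "dashed_line", "single_line"]
VOID_ID = 255  # assumed label value for void pixels


def per_class_iou(pred: Sequence[int], gt: Sequence[int],
                  num_classes: int, void_id: int = VOID_ID) -> List[Optional[float]]:
    """IoU = TP / (TP + FP + FN) per class over flattened label maps."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, g in zip(pred, gt):
        if g == void_id:
            continue  # void pixels do not contribute to the score
        if p == g:
            tp[g] += 1
        else:
            fn[g] += 1
            if 0 <= p < num_classes:
                fp[p] += 1
    ious: List[Optional[float]] = []
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        ious.append(tp[c] / denom if denom else None)  # None: class absent
    return ious


def mean_iou(ious: Sequence[Optional[float]]) -> float:
    """mIoU: mean over the classes that have a defined IoU."""
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid) if valid else 0.0
```

A call such as `mean_iou(per_class_iou(pred, gt, len(SEG_CLASSES)))` would give the score for one pair of label maps.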

The total score for the qualification competition is composed as follows.

• Accuracy (mIoU): 50%
  • The team with the highest accuracy gets the full score (50%) and the team with the lowest gets zero. The remaining teams get scores directly proportional to the mIoU difference.
• Accuracy (mAP): 50%
  • The team with the highest accuracy gets the full score (50%) and the team with the lowest gets zero. The remaining teams get scores directly proportional to the mAP difference.
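One way to read this scoring rule is as min-max scaling of each metric across teams. The sketch below follows that reading; the function names and the handling of an all-teams tie (everyone gets the full score) are assumptions, not rules stated by the organizers.

```python
from typing import Dict


def scaled_scores(values: Dict[str, float], weight: float) -> Dict[str, float]:
    """Best team gets `weight`, worst gets 0, the rest scale linearly."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    scores = {}
    for team, value in values.items():
        # Assumption: if every team ties, all of them receive the full weight.
        frac = 1.0 if span == 0 else (value - lo) / span
        scores[team] = weight * frac
    return scores


def qualification_scores(miou: Dict[str, float],
                         map50: Dict[str, float]) -> Dict[str, float]:
    """Total qualification score: 50 points for mIoU plus 50 points for mAP."""
    seg = scaled_scores(miou, 50.0)
    det = scaled_scores(map50, 50.0)
    return {team: seg[team] + det[team] for team in miou}
```

For example, with mIoU values of 0.8, 0.6, and 0.7 for three teams, the third team sits halfway between the best and the worst and would receive about 25 of the 50 mIoU points.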

Final Competition

• Mandatory Criteria
  • The accuracy of the final submission must not be 5% or more below that of the model the team submitted in the qualification.
  • The combined preprocessing and postprocessing time of the final submission must not be 50% or more slower than the inference time of the main model (evaluated on the host machine).
• [Host] Accuracy (mIoU): 25%
  • The team with the highest accuracy gets the full score (25%) and the team with the lowest gets zero. The remaining teams get scores directly proportional to the mIoU difference.
• [Host] Accuracy (mAP): 25%
  • The team with the highest accuracy gets the full score (25%) and the team with the lowest gets zero. The remaining teams get scores directly proportional to the mAP difference.
• [Host] Model computational complexity (GOPs/frame): 12.5%
  • The team with the smallest GOP number per frame gets the full score (12.5%) and the team with the largest gets zero. The remaining teams get scores directly proportional to the GOP value difference.
• [Host] Model size (number of parameters * bit width used in storing the parameters): 12.5%
  • The team with the smallest model gets the full score (12.5%) and the team with the largest gets zero. The remaining teams get scores directly proportional to the model size difference.
• [Device] Power consumption (average current measured on MediaTek Dimensity Series): 12.5%
  • Measured by the Android battery fuel gauge on MediaTek’s Dimensity Series platform [2]. The “BATTERY_PROPERTY_CURRENT_AVERAGE” mode is used in the evaluation.
  • The team whose single model (w/o preprocessing and postprocessing) completes the semantic segmentation task with the lowest power consumption gets the full score (12.5%) and the team with the highest gets zero. The remaining teams get scores directly proportional to the average current difference.
• [Device] Speed on MediaTek Dimensity 9200 Series Platform: 12.5%
  • The team whose single model (w/o preprocessing and postprocessing) completes the semantic segmentation task in the shortest time gets the full score (12.5%) and the team that takes the longest time gets zero. The remaining teams get scores directly proportional to the execution time difference.
  • The evaluation covers the overall process, from reading the private testing dataset for the final to completing the submission.csv file, including parsing the image list, loading images, and any other overhead incurred in running semantic segmentation over the testing dataset.
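The weighted criteria above can be combined as in the following sketch, which assumes the same min-max scaling across teams for every criterion and flips the direction for the lower-is-better metrics. The metric key names, the tie handling, and the omission of the mandatory criteria checks are illustrative assumptions; only the weights and the model size formula come from the list above.

```python
from typing import Dict, Tuple

# (weight in points, higher_is_better); weights follow the criteria list.
CRITERIA: Dict[str, Tuple[float, bool]] = {
    "miou": (25.0, True),
    "map": (25.0, True),
    "gops_per_frame": (12.5, False),
    "model_size_bits": (12.5, False),
    "avg_current": (12.5, False),
    "run_time": (12.5, False),
}


def model_size_bits(num_params: int, bit_width: int) -> int:
    """Model size = number of parameters * bit width used to store them."""
    return num_params * bit_width


def final_scores(teams: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Sum the min-max scaled, weighted score of every criterion per team."""
    totals = {team: 0.0 for team in teams}
    for name, (weight, higher_is_better) in CRITERIA.items():
        values = [metrics[name] for metrics in teams.values()]
        lo, hi = min(values), max(values)
        span = hi - lo
        for team, metrics in teams.items():
            if span == 0:
                frac = 1.0  # assumption: a tie gives every team full marks
            elif higher_is_better:
                frac = (metrics[name] - lo) / span
            else:
                frac = (hi - metrics[name]) / span
            totals[team] += weight * frac
    return totals
```

Under this reading, an INT8 model with one million parameters counts as 8,000,000 bits, a quarter of the size of the same architecture stored in 32-bit floats, which is why quantization helps the model size criterion directly.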

Final Competition

The finalists have to hand in a package that includes a SavedModel (which should be compatible w/ freeze_graph.py@tensorflow_v2.0.0~v2.8.0) and an inference script. We will deploy the TensorFlow model to MediaTek’s Dimensity Series platform and grade the final score by running the model.

A technical report is required that describes the model structure, complexity, execution efficiency, etc.

Submission File

Upload a zip file named submission.zip containing the following files:

1. TensorFlow Inference Package
   • An official Docker image will be released by the organizer.
   • TensorFlow versions are restricted to the following:
     • TensorFlow v2.0.0~v2.8.0
     • Other versions or other frameworks are not allowed.
   • The following files must exist in the submitted inference package:
     • TensorFlow SavedModel (refer to TF saved_model.md@tensorflow_v2.0.0~v2.8.0 for more detail)
     • run_model.py [image_list.txt] [path of output results]
       It runs your model to detect objects in the test images listed in image_list.txt and creates a submission folder in [path of output results]. The submission folder format is identical to that of the qualification competition.
     • Source code of your model
       The directory structure of the source code shall be illustrated in README.txt.
2. techreport.pdf
   • The technical report describes the model structure, complexity, execution efficiency, implementation techniques, experiment results, etc.
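As a starting point, the command-line shape of run_model.py could look like the sketch below. It only handles the two positional arguments, reads the image list, and creates the output folder. The folder name `submission`, the helper names, and the `placeholder_infer` stub are assumptions; the actual model call and the result file format (identical to the qualification format) are deliberately left out.

```python
import argparse
import os
from typing import Callable, List, Optional


def read_image_list(list_path: str) -> List[str]:
    """Return the non-empty lines of image_list.txt as image paths."""
    with open(list_path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def placeholder_infer(image_path: str, submission_dir: str) -> None:
    """Hypothetical stand-in for the real model call.

    A real entry would load the SavedModel once (for example with
    tf.saved_model.load), run it on the image, and write the results
    into submission_dir in the qualification format.
    """
    return None


def run(list_path: str, output_path: str,
        infer: Callable[[str, str], None]) -> str:
    """Run `infer` on every listed image and return the submission folder."""
    submission_dir = os.path.join(output_path, "submission")  # assumed name
    os.makedirs(submission_dir, exist_ok=True)
    for image_path in read_image_list(list_path):
        infer(image_path, submission_dir)
    return submission_dir


def main(argv: Optional[List[str]] = None) -> None:
    parser = argparse.ArgumentParser(description="Run the model on a list of images.")
    parser.add_argument("image_list", help="path to image_list.txt")
    parser.add_argument("output_path", help="path of output results")
    args = parser.parse_args(argv)
    run(args.image_list, args.output_path, placeholder_infer)
```

In an actual run_model.py, `main()` would be called under the usual `if __name__ == "__main__":` guard, so that `python run_model.py image_list.txt ./results` matches the invocation described above.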

#### Coordinator Contacts

Po-Chi Hu, tkuo@cs.nctu.edu.tw
Jenq-Neng Hwang, hwang@uw.edu
Jiun-In Guo, jiguo@nycu.edu.tw
Marvin Chen, marvin.chen@mediatek.com
Hsien-Kai Kuo, hsienkai.kuo@mediatek.com
Chia-Chi Tsai, cctsai@gs.ncku.edu.tw

#### Rules

• Team mergers are not allowed in this competition.
• Each team can consist of a maximum of 6 team members.
• The task is open to the public. Individuals, institutions of higher education, research
• A leaderboard will be set up and made publicly available.
• Multiple submissions are allowed before the deadline, and the last one will be used for the final qualification consideration.
• The upload date/time will be used as the tiebreaker.
• Privately sharing code or data outside of teams is not permitted. It is okay to share code if it is made available to all participants on the forums.
• Personnel of the IVSLAB team are not allowed to participate in the task.
• A common honor code should be observed. Any violation will result in disqualification.