Low-power Deep Learning Object Detection and Semantic Segmentation Multitask Model Compression Competition for Traffic Scene in Asian Countries
Object detection has been extensively studied in computer vision and has made tremendous progress in recent years. Image segmentation takes this further by trying to find the exact boundaries of the objects in an image: semantic segmentation pursues more than just the location of an object, going down to pixel-level information. However, due to the heavy computation required by most deep learning-based algorithms, it is hard to run these models on embedded systems, which have limited computing capability. In addition, the existing open datasets for traffic scenes used in ADAS applications usually cover the main lane, adjacent lanes, and different lane marks (i.e., double line, single line, and dashed line) in Western countries. These scenes differ from those in Asian countries like Taiwan, where many motorcycle riders speed along city roads, so semantic segmentation models trained only on the existing open datasets will require extra techniques to segment such complex scenes. Moreover, most complicated applications involve both object detection and segmentation, and accomplishing these two tasks with separate models is difficult on resource-limited platforms.

In this competition, we encourage participants to design a lightweight single deep learning model that supports multi-task functions, including semantic segmentation and object detection, and that can be applied to Taiwan's traffic scenes, with many fast-moving motorcycles on city roads alongside vehicles and pedestrians. The developed models should not only fit embedded systems but also achieve high accuracy.

This competition includes two stages: a qualification round and a final round.

Qualification competition: all participants submit their answers online, and a score is calculated.
The top 15 teams qualify to enter the final round of the competition.

Final competition: the final score is evaluated on MediaTek's new Dimensity Series platform.

The goal is to design a lightweight single deep learning model that supports multi-task functions, including semantic segmentation and object detection, and that is suitable for constrained embedded system design dealing with traffic scenes in Asian countries like Taiwan. We focus on segmentation and object detection accuracy, power consumption, real-time performance optimization, and deployment on MediaTek's Dimensity Series platform.

With MediaTek's Dimensity Series platform and its heterogeneous computing capabilities, such as CPUs, GPUs, and APUs (AI processing units) embedded in its system-on-chip products, developers are provided the high performance and power efficiency needed for building AI features and applications. Developers can target these specific processing units within the system-on-chip, or they can let the MediaTek NeuroPilot SDK intelligently handle the processing allocation for them.

Given the test image dataset, participants are asked to perform two tasks with a single model at the same time: object detection and semantic segmentation. For the semantic segmentation task, the model should assign each pixel in an image to one of the following six classes: {background, main_lane, alter_lane, double_line, dashed_line, single_line}. For the object detection task, the same model should detect objects belonging to the following four classes: {pedestrian, vehicle, scooter, bicycle}, reporting the class, bounding box, and confidence for each detection.
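Tasks of this kind are commonly scored with the IoU (Jaccard index): per pixel class for segmentation, and per bounding box for detection (as in COCO-style average precision). The sketch below is illustrative only; the class lists are taken from the task description above, while the helper names and any scoring details beyond plain IoU are assumptions, not the official evaluation protocol.

```python
import numpy as np

# Class lists from the task description.
SEG_CLASSES = ["background", "main_lane", "alter_lane",
               "double_line", "dashed_line", "single_line"]
DET_CLASSES = ["pedestrian", "vehicle", "scooter", "bicycle"]

def per_class_iou(pred, gt, num_classes=len(SEG_CLASSES)):
    """Per-class IoU between two H x W maps of class indices.

    Returns one IoU per class; NaN for classes absent from both maps.
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        ious.append(inter / union if union > 0 else float("nan"))
    return ious

def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```

For example, two unit-offset 2x2 boxes, `box_iou((0, 0, 2, 2), (1, 1, 3, 3))`, overlap in a 1x1 square and give an IoU of 1/7; a model's mean IoU over the six segmentation classes can be taken as `np.nanmean(per_class_iou(pred, gt))`.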
Competition period: 2023-02-02T16:00:00+00:00 ~ 2023-03-24T11:59:59+00:00