Technology & Lifestyle

Ended

Low-power Efficient and Accurate Facial-Landmark Detection for Embedded Systems (Final Competition)

In the realm of computer vision, facial-landmark detection has witnessed remarkable progress, gaining increasing significance in applications such as augmented reality, facial recognition, and emotion analysis. While object detection identifies objects within images and semantic segmentation outlines object boundaries down to the pixel level, the purpose of facial-landmark detection is to accurately pinpoint and track critical facial features.

Nevertheless, the intricacies of facial features, particularly in dynamic settings, combined with the substantial computational demands of deep learning-based algorithms, present formidable challenges when deploying these models on embedded systems with limited computational capabilities. Additionally, the diversity of facial features across ethnicities and expressions makes it difficult to construct a universally robust model. For example, the nuances of facial features and expressions within Asian populations, such as those in Taiwan, may not be comprehensively represented in existing open datasets, which predominantly focus on Western demographics.

In this competition, we invite participants to engineer a lightweight yet potent single deep learning model tailored for facial-landmark detection. The model should accurately locate key facial landmarks under a spectrum of conditions, encompassing diverse expressions, orientations, and lighting environments. The objective is to craft a model that is not only suitable for deployment on embedded systems but also maintains high accuracy and real-time performance.

This competition includes two stages: qualification and final competition.
Qualification competition: participants initially submit their models online for evaluation. The top 15 teams, judged on accuracy, advance to the final round.
Final competition: the ultimate assessment occurs on the MediaTek Dimensity Series platform, centering on the model's performance in real-world scenarios.

The challenge emphasizes the development of a single model adept at pinpointing a range of facial landmarks with high precision, including the detection of subtle variations in critical facial regions such as the eyes, nose, mouth, and jawline. Alongside accuracy, the spotlight is on low power consumption, streamlined processing, and real-time performance, particularly on MediaTek's Dimensity Series platform. The MediaTek platform, with heterogeneous computing capabilities spanning CPUs, GPUs, and AI Processing Units (APUs), offers high performance and energy efficiency, making it an ideal foundation for AI-driven facial-landmark detection applications. Participants may manually target these processing units or leverage MediaTek's NeuroPilot SDK for intelligent processing allocation. Participants are expected to showcase their model's ability to detect multiple facial landmarks concurrently, demonstrating precision and efficiency in a resource-constrained environment.

Given the test image dataset, participants are required to use a single model to perform facial-landmark detection. The model must identify and locate 51 specific facial landmarks in each image. The landmarks correspond to salient facial features that are critical for applications such as identity verification, emotion recognition, and augmented reality.

The model's output should include:
- a set of coordinates for each of the 51 landmarks on the face;
- a confidence score for the detection of each landmark, indicating the model's certainty.

The landmarks to be detected cover areas such as the eye contours, eyebrows, nose, and mouth. Participants must ensure that their model is robust and can handle variations in facial expressions, orientations, and lighting conditions. Precise detection of these facial points is crucial for real-world applications.

Participants will submit their results as a TXT file for each test image, where each row corresponds to a landmark and includes the landmark's ID, the x and y coordinates, and the confidence score. The TXT file should be named according to the convention image_name_landmarks.txt. Accuracy will be assessed based on the mean error across all landmarks and images, normalized by the inter-ocular distance to account for different face sizes and positions within the images.

References
[1] "i·bug - resources - 300 Faces In-the-Wild Challenge (300-W), ICCV 2013." [Online]. Available: https://ibug.doc.ic.ac.uk/resources/300-W/. [Accessed: 04-Dec-2023].
[2] Google, "Measuring device power: Android Open Source Project," Android Open Source Project. [Online]. Available: https://source.android.com/devices/tech/power/device?hl=en#power-consumption. [Accessed: 11-Nov-2021].
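A minimal sketch of the accuracy measure described above, the mean landmark error normalized by the inter-ocular distance, is shown below. The eye-corner indices, array shapes, and random data are illustrative assumptions, not the official evaluation code.

import numpy as np

def normalized_mean_error(pred, gt, left_eye_idx, right_eye_idx):
    """pred, gt: (51, 2) arrays of (x, y) landmark coordinates for one image."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Per-landmark Euclidean error.
    errors = np.linalg.norm(pred - gt, axis=1)
    # Inter-ocular distance between the two outer eye corners of the ground truth.
    iod = np.linalg.norm(gt[left_eye_idx] - gt[right_eye_idx])
    return errors.mean() / iod

# Example with random data; indices 19 and 28 are placeholders for the eye corners.
rng = np.random.default_rng(0)
gt = rng.uniform(0, 256, size=(51, 2))
pred = gt + rng.normal(0, 2, size=(51, 2))
print(f"NME: {normalized_mean_error(pred, gt, 19, 28):.4f}")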

2024-03-17T16:00:00+00:00 ~ 2024-04-03T08:00:00+00:00
Ended

Low-power Efficient and Accurate Facial-Landmark Detection for Embedded Systems

In the realm of computer vision, facial-landmark detection has witnessed remarkable progress, gaining increasing significance in applications such as augmented reality, facial recognition, and emotion analysis. While object detection identifies objects within images and semantic segmentation outlines object boundaries down to the pixel level, the purpose of facial-landmark detection is to accurately pinpoint and track critical facial features.

Nevertheless, the intricacies of facial features, particularly in dynamic settings, combined with the substantial computational demands of deep learning-based algorithms, present formidable challenges when deploying these models on embedded systems with limited computational capabilities. Additionally, the diversity of facial features across ethnicities and expressions makes it difficult to construct a universally robust model. For example, the nuances of facial features and expressions within Asian populations, such as those in Taiwan, may not be comprehensively represented in existing open datasets, which predominantly focus on Western demographics.

In this competition, we invite participants to engineer a lightweight yet potent single deep learning model tailored for facial-landmark detection. The model should accurately locate key facial landmarks under a spectrum of conditions, encompassing diverse expressions, orientations, and lighting environments. The objective is to craft a model that is not only suitable for deployment on embedded systems but also maintains high accuracy and real-time performance.

This competition includes two stages: qualification and final competition.
Qualification competition: participants initially submit their models online for evaluation. The top 15 teams, judged on accuracy, advance to the final round.
Final competition: the ultimate assessment occurs on the MediaTek Dimensity Series platform, centering on the model's performance in real-world scenarios.

The challenge emphasizes the development of a single model adept at pinpointing a range of facial landmarks with high precision, including the detection of subtle variations in critical facial regions such as the eyes, nose, mouth, and jawline. Alongside accuracy, the spotlight is on low power consumption, streamlined processing, and real-time performance, particularly on MediaTek's Dimensity Series platform. The MediaTek platform, with heterogeneous computing capabilities spanning CPUs, GPUs, and AI Processing Units (APUs), offers high performance and energy efficiency, making it an ideal foundation for AI-driven facial-landmark detection applications. Participants may manually target these processing units or leverage MediaTek's NeuroPilot SDK for intelligent processing allocation. Participants are expected to showcase their model's ability to detect multiple facial landmarks concurrently, demonstrating precision and efficiency in a resource-constrained environment.

Given the test image dataset, participants are required to use a single model to perform facial-landmark detection. The model must identify and locate 51 specific facial landmarks in each image. The landmarks correspond to salient facial features that are critical for applications such as identity verification, emotion recognition, and augmented reality.

The model's output should include:
- a set of coordinates for each of the 51 landmarks on the face;
- a confidence score for the detection of each landmark, indicating the model's certainty.

The landmarks to be detected cover areas such as the eye contours, eyebrows, nose, and mouth. Participants must ensure that their model is robust and can handle variations in facial expressions, orientations, and lighting conditions. Precise detection of these facial points is crucial for real-world applications.

Participants will submit their results as a TXT file for each test image, where each row corresponds to a landmark and includes the landmark's ID, the x and y coordinates, and the confidence score. The TXT file should be named according to the convention image_name_landmarks.txt. Accuracy will be assessed based on the mean error across all landmarks and images, normalized by the inter-ocular distance to account for different face sizes and positions within the images.

References
[1] "i·bug - resources - 300 Faces In-the-Wild Challenge (300-W), ICCV 2013." [Online]. Available: https://ibug.doc.ic.ac.uk/resources/300-W/. [Accessed: 04-Dec-2023].
[2] Google, "Measuring device power: Android Open Source Project," Android Open Source Project. [Online]. Available: https://source.android.com/devices/tech/power/device?hl=en#power-consumption. [Accessed: 11-Nov-2021].
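A minimal sketch of writing the per-image submission file described above (one row per landmark with ID, x, y, and confidence) is shown below. The column order, delimiter, and output directory are assumptions for illustration; follow the official submission guide if it differs.

from pathlib import Path

def write_landmark_txt(image_name, landmarks, confidences, out_dir="submission"):
    """landmarks: iterable of (x, y); confidences: iterable of floats, both of length 51."""
    out_path = Path(out_dir) / f"{image_name}_landmarks.txt"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w") as f:
        for idx, ((x, y), conf) in enumerate(zip(landmarks, confidences)):
            # One row per landmark: id, x, y, confidence.
            f.write(f"{idx} {x:.2f} {y:.2f} {conf:.4f}\n")
    return out_path

# Example usage with dummy predictions for an image named "test_0001".
dummy_landmarks = [(100.0 + i, 120.0 + i) for i in range(51)]
dummy_conf = [0.9] * 51
print(write_landmark_txt("test_0001", dummy_landmarks, dummy_conf))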

2024-02-02T16:00:00+00:00 ~ 2024-03-17T11:59:59+00:00
Ended

Challenge on Visual Attention Estimation in HMD 2023 (Final Competition)

To improve the experience of XR applications, techniques for visual attention estimation have been developed to predict human intention so that the HMD can pre-render visual content and reduce rendering latency. However, most deep learning-based algorithms require heavy computation to achieve satisfactory accuracy. This is especially challenging for embedded systems with finite resources such as computing power and memory bandwidth (e.g., a standalone HMD). In addition, this research field relies on richer data to advance the state of the art, while the number and diversity of existing datasets are still lacking. For this challenge, we collected a set of 360° MR/VR videos along with the corresponding user head-pose and eye-gaze signals. The goal of this competition is to encourage contestants to design lightweight visual attention estimation models that can be deployed on an embedded device with constrained resources. The developed models need to achieve high fidelity while also performing well on the device.

This competition is divided into two stages: qualification and final competition.
Qualification competition stage: all participants submit their answers online. A score is calculated based on the ranking of five evaluation metrics. The top 15 teams qualify for the final round of the competition.
Final competition stage: the final score will be evaluated on MediaTek's platform.

Given the test dataset containing 360° videos, participants are asked to estimate a saliency map for each video; more precisely, each pixel has a predicted value in the range [0, 1]. Note that the goal of this challenge is to design a lightweight deep learning model suitable for constrained embedded systems. Therefore, we focus on prediction correctness, model size, computational complexity, performance optimization, and deployment on MediaTek's Dimensity platform. With MediaTek's platform and its heterogeneous computing capabilities, such as CPUs, GPUs, and APUs (AI processing units) embedded into its system-on-chip products, developers get the high performance and power efficiency needed to build AI features and applications. Developers can target these specific processing units within the system-on-chip, or they can let the MediaTek NeuroPilot SDK intelligently handle processing allocation for them.

Please note that TensorFlow Lite is used in the final competition stage of this challenge.
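Since TensorFlow Lite is used in the final stage, a minimal sketch of producing a per-pixel saliency map in [0, 1] with a TFLite interpreter is shown below. The model file name, input size, and output layout are assumptions for illustration only, not the official deployment flow.

import numpy as np
import tensorflow as tf

def predict_saliency(frame, model_path="saliency_model.tflite"):
    """frame: HxWx3 uint8 image (e.g., one frame of a 360-degree video)."""
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    # Resize/normalize to the model's expected input shape (assumed NHWC float32).
    _, h, w, _ = inp["shape"]
    x = tf.image.resize(frame[None].astype(np.float32) / 255.0, (h, w)).numpy()
    interpreter.set_tensor(inp["index"], x)
    interpreter.invoke()

    saliency = interpreter.get_tensor(out["index"])[0, ..., 0]
    # Rescale so every pixel lies in [0, 1], as required by the task.
    smin, smax = saliency.min(), saliency.max()
    return (saliency - smin) / (smax - smin + 1e-8)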

2023-10-02T16:00:00+00:00 ~ 2023-11-06T15:59:59+00:00
Ended

PAIR-LITEON Competition: Embedded AI Object Detection Model Design Contest on Fish-eye Around-view Cameras (Final Competition)

Object detection in computer vision has been extensively studied and has made tremendous progress in recent years using deep learning methods. However, due to the heavy computation required by most deep learning-based algorithms, it is hard to run these models on embedded systems, which have limited computing capabilities. In addition, the existing open datasets for object detection in ADAS applications with 3-D AVM (around-view monitoring) scenes usually cover pedestrians, vehicles, cyclists, and motorcycle riders in Western countries, which differ considerably from crowded Asian countries like Taiwan, where many motorcycle riders speed along city roads; as a result, object detection models trained only on the existing open datasets cannot be directly applied to detecting moving objects in Asian countries like Taiwan.

In this competition, we encourage participants to design object detection models that can be applied to Asian traffic, with many fast-moving motorcycles sharing city roads with vehicles and pedestrians. The developed models must not only fit embedded systems but also achieve high accuracy.

This competition is divided into two stages: qualification and final competition.
Qualification competition: all participants submit their answers online and a score is calculated. We will announce the threshold at the beginning of the preliminary round; participants whose accuracy exceeds the threshold during the qualification round qualify for the final round.
Final competition: the final score will be evaluated on the MemryX CIM computing platform [4].

The goal is to design a lightweight deep learning model suitable for constrained embedded systems to deal with traffic in Asian countries like Taiwan. We focus on detection accuracy, model size, computational complexity, performance optimization, and deployment on the MemryX CIM SDK platform. MemryX [4] uses a proprietary, configurable native dataflow architecture along with at-memory computing that sets the bar for edge AI processing. The system architecture fundamentally eliminates the data-movement bottleneck while supporting future generations (new hardware, new processes/chemistries, and new AI models), all with the same software.

Given the test image dataset, participants are asked to detect objects belonging to the following four classes {pedestrian, vehicle, scooter, bicycle} in each image, reporting the class, bounding box, and confidence for each detection.

References
[1] COCO API: https://github.com/cocodataset/cocoapi
[2] Average Precision (AP): https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Average_precision
[3] Intersection over union (IoU): https://en.wikipedia.org/wiki/Jaccard_index
[4] MemryX SDK: https://memryx.com/technology/
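A small sketch of the intersection-over-union (IoU) measure referenced in [3], which underlies AP-based detection scoring, is shown below. Boxes are assumed to be (x_min, y_min, x_max, y_max) pixel coordinates; this is an illustration, not the official evaluation code.

def iou(box_a, box_b):
    """Compute IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # about 0.143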

2023-08-04T16:00:00+00:00 ~ 2023-09-06T15:59:59+00:00
Ended

Challenge on Visual Attention Estimation in HMD 2023

To improve the experience of XR applications, techniques for visual attention estimation have been developed to predict human intention so that the HMD can pre-render visual content and reduce rendering latency. However, most deep learning-based algorithms require heavy computation to achieve satisfactory accuracy. This is especially challenging for embedded systems with finite resources such as computing power and memory bandwidth (e.g., a standalone HMD). In addition, this research field relies on richer data to advance the state of the art, while the number and diversity of existing datasets are still lacking. For this challenge, we collected a set of 360° MR/VR videos along with the corresponding user head-pose and eye-gaze signals. The goal of this competition is to encourage contestants to design lightweight visual attention estimation models that can be deployed on an embedded device with constrained resources. The developed models need to achieve high fidelity while also performing well on the device.

This competition is divided into two stages: qualification and final competition.
Qualification competition stage: all participants submit their answers online. A score is calculated based on the ranking of five evaluation metrics. The top 15 teams qualify for the final round of the competition.
Final competition stage: the final score will be evaluated on MediaTek's platform.

Given the test dataset containing 360° videos, participants are asked to estimate a saliency map for each video; more precisely, each pixel has a predicted value in the range [0, 1]. Note that the goal of this challenge is to design a lightweight deep learning model suitable for constrained embedded systems. Therefore, we focus on prediction correctness, model size, computational complexity, performance optimization, and deployment on MediaTek's Dimensity platform. With MediaTek's platform and its heterogeneous computing capabilities, such as CPUs, GPUs, and APUs (AI processing units) embedded into its system-on-chip products, developers get the high performance and power efficiency needed to build AI features and applications. Developers can target these specific processing units within the system-on-chip, or they can let the MediaTek NeuroPilot SDK intelligently handle processing allocation for them.

Please note that TensorFlow Lite is used in the final competition stage of this challenge.
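The qualification score is based on the ranking of five evaluation metrics. As one illustrative example only (not necessarily one of the official five), a sketch of the linear correlation coefficient (CC) commonly used to compare a predicted saliency map against ground truth is shown below.

import numpy as np

def saliency_cc(pred, gt):
    """pred, gt: 2-D arrays of the same shape with values in [0, 1]."""
    # Standardize both maps, then take the mean of their product (Pearson correlation).
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((p * g).mean())

rng = np.random.default_rng(1)
gt = rng.random((64, 128))
print(saliency_cc(gt + 0.05 * rng.random((64, 128)), gt))  # close to 1.0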

2023-07-02T16:00:00+00:00 ~ 2023-10-02T15:59:59+00:00
Ended

PAIR-LITEON Competition: Embedded AI Object Detection Model Design Contest on Fish-eye Around-view Cameras

Object detection in computer vision has been extensively studied and has made tremendous progress in recent years using deep learning methods. However, due to the heavy computation required by most deep learning-based algorithms, it is hard to run these models on embedded systems, which have limited computing capabilities. In addition, the existing open datasets for object detection in ADAS applications with 3-D AVM (around-view monitoring) scenes usually cover pedestrians, vehicles, cyclists, and motorcycle riders in Western countries, which differ considerably from crowded Asian countries like Taiwan, where many motorcycle riders speed along city roads; as a result, object detection models trained only on the existing open datasets cannot be directly applied to detecting moving objects in Asian countries like Taiwan.

In this competition, we encourage participants to design object detection models that can be applied to Asian traffic, with many fast-moving motorcycles sharing city roads with vehicles and pedestrians. The developed models must not only fit embedded systems but also achieve high accuracy.

This competition is divided into two stages: qualification and final competition.
Qualification competition: all participants submit their answers online and a score is calculated. We will announce the threshold at the beginning of the preliminary round; participants whose accuracy exceeds the threshold during the qualification round qualify for the final round.
Final competition: the final score will be evaluated on the MemryX CIM computing platform [4].

The goal is to design a lightweight deep learning model suitable for constrained embedded systems to deal with traffic in Asian countries like Taiwan. We focus on detection accuracy, model size, computational complexity, performance optimization, and deployment on the MemryX CIM SDK platform. MemryX [4] uses a proprietary, configurable native dataflow architecture along with at-memory computing that sets the bar for edge AI processing. The system architecture fundamentally eliminates the data-movement bottleneck while supporting future generations (new hardware, new processes/chemistries, and new AI models), all with the same software.

Given the test image dataset, participants are asked to detect objects belonging to the following four classes {pedestrian, vehicle, scooter, bicycle} in each image, reporting the class, bounding box, and confidence for each detection.

References
[1] COCO API: https://github.com/cocodataset/cocoapi
[2] Average Precision (AP): https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Average_precision
[3] Intersection over union (IoU): https://en.wikipedia.org/wiki/Jaccard_index
[4] MemryX SDK: https://memryx.com/technology/
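A compact sketch of average precision (AP), the measure referenced in [2], for a single class is shown below. It assumes detections have already been matched to ground truth with an IoU threshold elsewhere, and it uses a simple step-wise area under the precision-recall curve; this is an illustration rather than the official COCO-style evaluation.

import numpy as np

def average_precision(confidences, is_tp, num_gt):
    """confidences: detection scores; is_tp: 1/0 per detection; num_gt: ground-truth count."""
    order = np.argsort(confidences)[::-1]          # rank detections by confidence
    tp = np.asarray(is_tp, dtype=float)[order]
    fp = 1.0 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / max(num_gt, 1)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-8)
    # Step-wise area under the precision-recall curve.
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recall, precision):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1], num_gt=4))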

2023-06-14T16:00:00+00:00 ~ 2023-08-04T15:59:59+00:00
Ended

Low-power Deep Learning Object Detection and Semantic Segmentation Multitask Model Compression Competition for Traffic Scene in Asian Countries

Object detection in computer vision has been extensively studied and has made tremendous progress in recent years. Image segmentation takes this a step further by trying to find the exact boundary of the objects in the image; semantic segmentation pursues more than just the location of an object, going down to pixel-level information. However, due to the heavy computation required by most deep learning-based algorithms, it is hard to run these models on embedded systems, which have limited computing capabilities. In addition, the existing open datasets for traffic scenes in ADAS applications usually cover the main lane, adjacent lanes, and different lane marks (i.e., double lines, single lines, and dashed lines) in Western countries, which differ considerably from Asian countries like Taiwan, where many motorcycle riders speed along city roads; as a result, semantic segmentation models trained only on the existing open datasets require extra techniques to segment the complex scenes found in Asian countries. Moreover, most complicated applications must handle both object detection and segmentation, and running these two tasks as separate models is difficult on a resource-limited platform.

In this competition, we encourage participants to design a lightweight single deep learning model that supports multiple tasks, including semantic segmentation and object detection, and that can be applied to Taiwan's traffic scenes, with many fast-moving motorcycles sharing city roads with vehicles and pedestrians. The developed models must not only fit embedded systems but also achieve high accuracy.

This competition includes two stages: qualification and final competition.
Qualification competition: all participants submit their answers online and a score is calculated. The top 15 teams qualify for the final round of the competition.
Final competition: the final score will be evaluated on the new MediaTek platform (Dimensity Series).

The goal is to design a lightweight single deep learning model supporting multiple tasks, including semantic segmentation and object detection, that is suitable for constrained embedded systems and can deal with traffic scenes in Asian countries like Taiwan. We focus on segmentation/object-detection accuracy, power consumption, real-time performance optimization, and deployment on MediaTek's Dimensity Series platform. With MediaTek's Dimensity Series platform and its heterogeneous computing capabilities, such as CPUs, GPUs, and APUs (AI processing units) embedded into its system-on-chip products, developers get the high performance and power efficiency needed to build AI features and applications. Developers can target these specific processing units within the system-on-chip, or they can let the MediaTek NeuroPilot SDK intelligently handle processing allocation for them.

Given the test image dataset, participants are asked to perform two tasks with a single model at the same time: object detection and semantic segmentation. For the semantic segmentation task, the model should segment each pixel into one of the following six classes {background, main_lane, alter_lane, double_line, dashed_line, single_line} in each image. For the object detection task, the same model should detect objects belonging to the following four classes {pedestrian, vehicle, scooter, bicycle} in each image, reporting the class, bounding box, and confidence for each detection.

References
[1] F. Yu et al., "BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[2] Google, "Measuring device power: Android Open Source Project," Android Open Source Project. [Online]. Available: https://source.android.com/devices/tech/power/device?hl=en#power-consumption. [Accessed: 11-Nov-2021].
[3] M. Cordts et al., "The Cityscapes Dataset for Semantic Urban Scene Understanding," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[4] COCO API: https://github.com/cocodataset/cocoapi
[5] Average Precision (AP): https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Average_precision
[6] Intersection over union (IoU): https://en.wikipedia.org/wiki/Jaccard_index
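A brief sketch of per-class IoU and mean IoU for the segmentation half of the task (IoU is referenced in [6]) is shown below. Mapping class ids 0-5 to {background, main_lane, alter_lane, double_line, dashed_line, single_line} is an assumption for illustration, not the official scorer.

import numpy as np

def mean_iou(pred_mask, gt_mask, num_classes=6):
    """pred_mask, gt_mask: 2-D integer arrays of class ids with the same shape."""
    ious = []
    for c in range(num_classes):
        pred_c, gt_c = pred_mask == c, gt_mask == c
        union = np.logical_or(pred_c, gt_c).sum()
        if union == 0:
            continue  # class absent from both prediction and ground truth
        inter = np.logical_and(pred_c, gt_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0

# Example with a synthetic ground truth and a noisy prediction.
rng = np.random.default_rng(2)
gt = rng.integers(0, 6, size=(64, 64))
pred = np.where(rng.random((64, 64)) < 0.8, gt, rng.integers(0, 6, size=(64, 64)))
print(f"mIoU: {mean_iou(pred, gt):.3f}")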

2023-02-02T16:00:00+00:00 ~ 2023-03-24T11:59:59+00:00
Ended

Teaching Computers to Watch Badminton: Taiwan's First Competition Combining AI and Sports

Recent statistics put the number of badminton players worldwide at roughly 2.2 billion, with more than 3 million in Taiwan, making badminton the country's second most popular individual sport; with Taiwanese players performing strongly in international competition in recent years, public attention keeps growing. For badminton technical and tactical analysis, our team has proposed a shot-by-shot match recording format and developed a computer-vision-assisted tool for rapid shot labeling, launching research on badminton big data. Although many computer-assisted techniques are already in use, manual shot labeling still costs considerable labor and time, and recognizing technical attributes in particular still requires annotators with badminton expertise. Through this competition we hope to bring together experts in machine learning, image processing, and sports science to develop automatic shot-labeling models with high recognition rates, making large-scale badminton scouting feasible and broadening the research and application of badminton technical and tactical analysis.

For questions, please contact: evawang.cs11@nycu.edu.tw
Competition forum: 2023 AI CUP:教電腦、看羽球、AI CUP 實戰人工智慧
Note: this topic is not open for registration directly on the AIdea platform; to participate, please register through the AI CUP registration system. Participants using the AI CUP registration system for the first time can consult the linked guide to the registration workflow. After registering, please fill out the pre-competition questionnaire.

Eligibility: students enrolled at any level in the Republic of China at registration time (e.g., junior and senior high school students, undergraduates, graduate students) as well as members of the general public may form teams. Teams are divided into a student division and an open division. A student-division team must consist entirely of students; a team with even one non-student member is placed in the open division. Only student-division teams are ranked for the overall competition awards.

2023-03-01T00:00:00+00:00 ~ 2023-05-17T09:00:00+00:00
Ended

Low-power Deep Learning Semantic Segmentation Model Compression Competition for Traffic Scene in Asian Countries

Object detection in computer vision has been extensively studied and has made tremendous progress in recent years. Image segmentation takes this a step further by trying to find the exact boundary of the objects in the image; semantic segmentation pursues more than just the location of an object, going down to pixel-level information. However, due to the heavy computation required by most deep learning-based algorithms, it is hard to run these models on embedded systems, which have limited computing capabilities. In addition, the existing open datasets for traffic scenes in ADAS applications usually cover the main lane, adjacent lanes, and different lane marks (i.e., double lines, single lines, and dashed lines) in Western countries, which differ considerably from Asian countries like Taiwan, where many motorcycle riders speed along city roads; as a result, semantic segmentation models trained only on the existing open datasets require extra techniques to segment the complex scenes found in Asian countries.

In this competition, we encourage participants to design semantic segmentation models that can be applied to Taiwan's traffic scenes, with many fast-moving motorcycles sharing city roads with vehicles and pedestrians. The developed models must not only fit embedded systems but also achieve high accuracy.

This competition includes two stages: qualification and final competition.
Qualification competition: all participants submit their answers online and a score is calculated. The top 15 teams qualify for the final round of the competition.
Final competition: the final score will be evaluated on the new MediaTek platform (Dimensity Series).

The goal is to design a lightweight deep learning semantic segmentation model suitable for constrained embedded systems to deal with traffic scenes in Asian countries like Taiwan. We focus on segmentation accuracy, power consumption, real-time performance optimization, and deployment on MediaTek's Dimensity Series platform. With MediaTek's Dimensity Series platform and its heterogeneous computing capabilities, such as CPUs, GPUs, and APUs (AI processing units) embedded into its system-on-chip products, developers get the high performance and power efficiency needed to build AI features and applications. Developers can target these specific processing units within the system-on-chip, or they can let the MediaTek NeuroPilot SDK intelligently handle processing allocation for them.

Given the test image dataset, participants are asked to segment each pixel into one of the following six classes {background, main_lane, alter_lane, double_line, dashed_line, single_line} in each image.

References
[1] F. Yu et al., "BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[2] Google, "Measuring device power: Android Open Source Project," Android Open Source Project. [Online]. Available: https://source.android.com/devices/tech/power/device?hl=en#power-consumption. [Accessed: 11-Nov-2021].
[3] M. Cordts et al., "The Cityscapes Dataset for Semantic Urban Scene Understanding," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
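A minimal sketch of turning a model's per-pixel class scores into per-pixel labels over the six classes is shown below. The class-id ordering and the grayscale-PNG output (via Pillow) are assumptions for illustration only.

import numpy as np
from PIL import Image

CLASSES = ["background", "main_lane", "alter_lane", "double_line", "dashed_line", "single_line"]

def logits_to_label_png(logits, out_path="pred_mask.png"):
    """logits: HxWx6 array of class scores; writes and returns an HxW label image."""
    labels = np.argmax(logits, axis=-1).astype(np.uint8)   # per-pixel class id 0..5
    Image.fromarray(labels, mode="L").save(out_path)
    return labels

# Example usage with random scores; prints the pixel count per class.
rng = np.random.default_rng(3)
labels = logits_to_label_png(rng.random((360, 640, len(CLASSES))))
print({CLASSES[c]: int((labels == c).sum()) for c in range(len(CLASSES))})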

2022-01-09T16:00:00+00:00 ~ 2022-02-21T15:59:59+00:00
Ended

Visual Attention Estimation in HMD

To improve the experience of XR applications, techniques for visual attention estimation have been developed to predict human intention so that the HMD can pre-render visual content and reduce rendering latency. However, most deep learning-based algorithms require heavy computation to achieve satisfactory accuracy. This is especially challenging for embedded systems with finite resources such as computing power and memory bandwidth (e.g., a standalone HMD). In addition, this research field relies on richer data to advance the state of the art, while the number and diversity of existing datasets are still lacking. For this challenge, we collected a set of 360° MR/VR videos along with the corresponding user head-pose and eye-gaze signals. The goal of this competition is to encourage contestants to design lightweight visual attention estimation models that can be deployed on an embedded device with constrained resources. The developed models need to achieve high fidelity while also performing well on the device.

This competition is divided into two stages: qualification and final competition.
Qualification competition: all participants submit their answers online. A score is calculated based on the ranking of five evaluation metrics. The top 15 teams qualify for the final round of the competition.
Final competition: the final score will be evaluated on MediaTek's platform.

Given the test dataset containing 360° videos, participants are asked to estimate a saliency map for each video; more precisely, each pixel has a predicted value in the range [0, 1]. Note that the goal of this challenge is to design a lightweight deep learning model suitable for constrained embedded systems. Therefore, we focus on prediction correctness, model size, computational complexity, performance optimization, and deployment on MediaTek's Dimensity 1000+ platform. With MediaTek's platform and its heterogeneous computing capabilities, such as CPUs, GPUs, and APUs (AI processing units) embedded into its system-on-chip products, developers get the high performance and power efficiency needed to build AI features and applications. Developers can target these specific processing units within the system-on-chip, or they can let the MediaTek NeuroPilot SDK intelligently handle processing allocation for them.

Please note that TensorFlow Lite is used in this challenge.
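Since TensorFlow Lite is used in this challenge, a minimal sketch of exporting a trained Keras saliency model to a .tflite file with default post-training optimization is shown below. The tiny stand-in model, input size, and file name are assumptions; substitute your own network.

import tensorflow as tf

# Stand-in model: a couple of conv layers ending in a per-pixel sigmoid in [0, 1].
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, padding="same", activation="relu",
                           input_shape=(128, 256, 3)),
    tf.keras.layers.Conv2D(1, 1, activation="sigmoid"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # enable default post-training optimization
tflite_bytes = converter.convert()

with open("saliency_model.tflite", "wb") as f:
    f.write(tflite_bytes)
print(f"wrote {len(tflite_bytes)} bytes")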

2021-08-01T16:00:00+00:00 ~ 2021-10-07T15:59:59+00:00
Ended

Embedded Deep Learning Object Detection Model Compression Competition for Traffic in Asian Countries

Object detection in computer vision has been extensively studied and has made tremendous progress in recent years using deep learning methods. However, due to the heavy computation required by most deep learning-based algorithms, it is hard to run these models on embedded systems, which have limited computing capabilities. In addition, the existing open datasets for object detection in ADAS applications usually cover pedestrians, vehicles, cyclists, and motorcycle riders in Western countries, which differ considerably from crowded Asian countries like Taiwan, where many motorcycle riders speed along city roads; as a result, object detection models trained on the existing open datasets cannot be directly applied to detecting moving objects in Asian countries like Taiwan.

In this competition, we encourage participants to design object detection models that can be applied to Taiwan's traffic, with many fast-moving motorcycles sharing city roads with vehicles and pedestrians. The developed models must not only fit embedded systems but also achieve high accuracy.

This competition is divided into two stages: qualification and final competition.
Qualification competition: all participants submit their answers online and a score is calculated. The top 15 teams qualify for the final round of the competition.
Final competition: the final score will be evaluated on the MediaTek Dimensity 1000 Series platform.

The goal is to design a lightweight deep learning model suitable for constrained embedded systems to deal with traffic in Asian countries like Taiwan. We focus on detection accuracy, model size, computational complexity, performance optimization, and deployment on MediaTek's Dimensity 1000 platform. With MediaTek's Dimensity 1000 platform and its heterogeneous computing capabilities, such as CPUs, GPUs, and APUs (AI processing units) embedded into its system-on-chip products, developers get the high performance and power efficiency needed to build AI features and applications. Developers can target these specific processing units within the system-on-chip, or they can let the MediaTek NeuroPilot SDK intelligently handle processing allocation for them.

Given the test image dataset, participants are asked to detect objects belonging to the following four classes {pedestrian, vehicle, scooter, bicycle} in each image, reporting the class, bounding box, and confidence for each detection.
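A short sketch of greedy non-maximum suppression (NMS), a typical post-processing step when producing the required (class, bounding box, confidence) outputs from a lightweight detector, is shown below. The IoU threshold of 0.5 is an illustrative choice, not a rule of the competition.

def nms(boxes, scores, iou_thresh=0.5):
    """boxes: list of (x1, y1, x2, y2); scores: matching confidences. Returns kept indices."""
    def iou(a, b):
        # Overlap of two axis-aligned boxes.
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        union = area(a) + area(b) - inter
        return inter / union if union > 0 else 0.0

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)            # keep the highest-scoring remaining box
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

print(nms([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)], [0.9, 0.8, 0.7]))  # [0, 2]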

2021-01-03T16:00:00+00:00 ~ 2021-03-08T12:00:00+00:00
Ended

Chord Recognition Competition

As times have changed, music listening has shifted from CDs to a wide range of online music platforms, such as Spotify and Line Music abroad and KKBOX and Friday Music in Taiwan. According to IFPI's 2016 report, digital music revenue has officially surpassed physical music revenue, and physical revenue keeps declining year after year, showing that the trend favors digital music. The growth of digital music has driven many related AI applications, including cover-song identification, query by humming, and music classification. Line Music, KKBOX, and Spotify have all set up machine learning or artificial intelligence departments dedicated to analyzing users' musical preferences from song content and listening habits, and to providing such AI services so that users can conveniently hear the music they like, creating room for added value. As noted above, online music platforms are actively expanding AI music analysis and applications and forming their own machine learning or AI departments; Line Music is also officially entering Taiwan this year, showing that the digital music market, driven by technology, the internet, and social platforms, is rapidly overtaking physical music revenue. Machine learning research for such services falls into two parts: recommendation based on user behavior, and recommendation based on the songs themselves. On the song side, research into fundamental musical attributes such as melody, chords, song structure, genre, and beat is essential, because these basic elements determine how a song should be classified and recommended; people familiar with these basic music-analysis elements and machine learning methods are therefore urgently needed in today's music industry.

Competition forum: AI CUP - 和弦辨識競賽

Eligibility: students (including graduate students) at universities and colleges nationwide; industry participants may also take part but are not ranked for awards.

Competition format and evaluation: this chord recognition competition uses the corpus provided by the organizers, annotated by music-domain experts hired by the organizers, and ranks each team's system performance by WCSR. Details are as follows. The organizers annotate 500 songs in total: 300 form the test set used for final scoring, and the remaining 200 are released during the competition as the training set. The data contain the original songs' chords, the corresponding YouTube links, and the corresponding time spans (start time, end time, and chord, in the format [seconds, seconds, chord name]). The main stages are:
Stage 1: the organizers release 200 songs with expert-annotated chords; during this period participants can train and test using cross-validation.
Stage 2: the online system opens, and the organizers release the 300-song test set information with the corresponding YouTube links; participants generate the corresponding chord recognition results and upload them to the system in the required format. During this stage, evaluation results are shown for only 150 of the songs.
Stage 3: when the competition deadline passes, the system scores the remaining 150 test songs using each team's last upload, and this result determines the final ranking.
Stage 4: participants must upload an explanatory report within the specified time to show that no cheating or plagiarism occurred; the top nine teams must provide their code for verification by the organizers. The committee will conduct a strict document review at this stage.
Stage 5: the competition results are announced.
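A compact sketch of WCSR (weighted chord symbol recall), the stated evaluation measure, is shown below: the fraction of annotated time during which the predicted chord matches the reference, weighted by duration. Chord-name normalization (enharmonics, inversions) is ignored here for simplicity; this is an illustration only, not the official scorer.

def wcsr(reference, prediction):
    """reference, prediction: lists of (start_sec, end_sec, chord_label), non-overlapping segments."""
    correct = 0.0
    total = 0.0
    for r_start, r_end, r_chord in reference:
        total += r_end - r_start
        for p_start, p_end, p_chord in prediction:
            # Time during which the prediction overlaps this reference segment and matches its chord.
            overlap = min(r_end, p_end) - max(r_start, p_start)
            if overlap > 0 and p_chord == r_chord:
                correct += overlap
    return correct / total if total > 0 else 0.0

ref = [(0.0, 2.0, "C:maj"), (2.0, 4.0, "G:maj")]
pred = [(0.0, 1.5, "C:maj"), (1.5, 4.0, "G:maj")]
print(f"WCSR: {wcsr(ref, pred):.3f}")  # 0.875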

2020-09-13T16:00:00+00:00 ~ 2021-01-09T15:59:59+00:00
Ended

Singing Transcription Competition

As times have changed, music listening has shifted from CDs to a wide range of online music platforms, such as Spotify and Line Music abroad and KKBOX and Friday Music in Taiwan. According to IFPI's 2016 report, digital music revenue has officially surpassed physical music revenue, and physical revenue keeps declining year after year, showing that the trend favors digital music. The growth of digital music has driven many related AI applications, including cover-song identification, query by humming, and music classification. Line Music, KKBOX, and Spotify have all set up machine learning or artificial intelligence departments dedicated to analyzing users' musical preferences from song content and listening habits, and to providing such AI services so that users can conveniently hear the music they like, creating room for added value. As noted above, online music platforms are actively expanding AI music analysis and applications and forming their own machine learning or AI departments; Line Music is also officially entering Taiwan this year, showing that the digital music market, driven by technology, the internet, and social platforms, is rapidly overtaking physical music revenue. Machine learning research for such services falls into two parts: recommendation based on user behavior, and recommendation based on the songs themselves. On the song side, research into fundamental musical attributes such as melody, chords, song structure, genre, and beat is essential, because these basic elements determine how a song should be classified and recommended; people familiar with these basic music-analysis elements and machine learning methods are therefore urgently needed in today's music industry.

Competition forum: AI CUP - 歌聲轉譜競賽

Eligibility: students (including graduate students) at universities and colleges nationwide; industry participants may also take part but are not ranked for awards.

Prizes: award recipients must be students enrolled at a university or college in the Republic of China at registration time, and supporting documentation is required at award time. The top nine teams in the singing transcription competition receive the following prizes:
1st place: NT$100,000
2nd place: NT$50,000
3rd place: NT$35,000
Honorable mention: NT$15,000
Five merit awards: NT$10,000 each
After review by the judging committee, the top three teams will receive a certificate from the Ministry of Education. Teams ranked in the top 25% that exceed the baseline will, after committee review, receive a certificate from the Ministry of Education's AI competition project office. Award quotas may be adjusted according to the number and quality of entries; if entries do not reach the required standard, the final judging committee may leave awards vacant or grant fewer than planned.

Competition format and evaluation: this singing transcription competition uses the corpus provided by the organizers, annotated by music-domain experts hired by the organizers, and ranks each team's system performance by F1-measure. Details are as follows. The organizers annotate 2,000 songs in total: 1,500 form the test set used for final scoring, and the remaining 500 are released during the competition as the training set. The data contain the original songs' pitches (unit: semitone), the corresponding YouTube links, and the corresponding notes (onset time, offset time, and pitch, in the format [milliseconds, milliseconds, semitone]). The main stages are:
Stage 1: the organizers release 500 songs with expert transcriptions; during this period participants can train and test using cross-validation.
Stage 2: the online system opens, and the organizers release the 1,500-song test set information with the corresponding YouTube links; participants generate the corresponding notes and upload them to the system in the required format. During this stage, evaluation results are shown for only 750 of the songs.
Stage 3: when the competition deadline passes, the system scores the remaining 750 test songs using each team's last upload, and this result determines the final ranking.
Stage 4: participants must upload an explanatory report within two weeks of the competition deadline to show that no cheating or plagiarism occurred; the top nine teams must provide their code for verification by the organizers. The committee will conduct a strict document review at this stage.
Stage 5: the competition results are announced.
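A minimal sketch of a note-level F1-measure in the spirit of the stated metric is shown below: a predicted note counts as a hit if its onset is within a tolerance of a reference onset and the pitch matches. The 50 ms tolerance and one-to-one matching rule are illustrative assumptions, not the official definition.

def note_f1(reference, prediction, onset_tol_ms=50):
    """reference, prediction: lists of (onset_ms, offset_ms, semitone)."""
    matched = set()
    tp = 0
    for p_on, _p_off, p_pitch in prediction:
        for i, (r_on, _r_off, r_pitch) in enumerate(reference):
            # Match each reference note at most once, on onset proximity and exact pitch.
            if i not in matched and abs(p_on - r_on) <= onset_tol_ms and p_pitch == r_pitch:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(prediction) if prediction else 0.0
    recall = tp / len(reference) if reference else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

ref = [(0, 480, 60), (500, 980, 62), (1000, 1480, 64)]
pred = [(10, 470, 60), (505, 990, 62), (1100, 1500, 64)]
print(f"F1: {note_f1(ref, pred):.3f}")  # 2 of 3 notes matched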

2020-03-04T16:00:00+00:00 ~ 2020-07-10T15:59:59+00:00
Ended

Embedded Deep Learning Object Detection Model Compression Competition for Traffic in Asian Countries

Object detection in computer vision has been extensively studied and has made tremendous progress in recent years using deep learning methods. However, due to the heavy computation required by most deep learning-based algorithms, it is hard to run these models on embedded systems, which have limited computing capabilities. In addition, the existing open datasets for object detection in ADAS applications usually cover pedestrians, vehicles, cyclists, and motorcycle riders in Western countries, which differ considerably from crowded Asian countries, where many motorcycle riders speed along city roads; as a result, object detection models trained on the existing open datasets cannot be directly applied to detecting moving objects in Asian countries.

In this competition, we encourage participants to design object detection models that can be applied to the competition's traffic scenes, with many fast-moving motorcycles sharing city roads with vehicles and pedestrians. The developed models must not only fit embedded systems but also achieve high accuracy.

This competition is divided into two stages: qualification and final competition.
Qualification competition: all participants submit their answers online and a score is calculated. The top 15 teams qualify for the final round of the competition.
Final competition: the final score will be validated and evaluated on an NVIDIA Jetson TX2 by the organizing team.

The goal is to design a lightweight deep learning model suitable for constrained embedded systems to deal with traffic in Asian countries. We focus on detection accuracy, model size, computational complexity, and performance optimization on the NVIDIA Jetson TX2 based on a predefined metric. Given the test image dataset, participants are asked to detect objects belonging to the following four classes {pedestrian, vehicle, scooter, bicycle} in each image, reporting the class and bounding box for each detection.

Prize Information
Based on each team's points in the final evaluation, the three highest-scoring teams receive cash awards:
Champion: USD 1,500
1st Runner-up: USD 1,000
2nd Runner-up: USD 750
Special Awards
Best accuracy award, for the highest mAP in the final competition: USD 200
Best bicycle detection award, for the highest bicycle AP in the final competition: USD 200
Best scooter detection award, for the highest scooter AP in the final competition: USD 200
All award winners must agree to submit a contest paper, allow their final code to be open-sourced, and attend the ICME2020 Grand Challenge PAIR Competition Special Session to present their work.

2019-11-30T16:00:00+00:00 ~ 2020-01-30T15:59:59+00:00
Ended

News Stance Retrieval Application Prize Competition

Registration for the News Stance Retrieval Application Prize Competition is now open, with a top prize of NT$100,000! Students from business, management, and humanities departments are welcome to team up and take on the challenge, combining cross-disciplinary expertise and creativity to unlock unexpected impact from the underlying technology. News on controversial issues has always been a focus of readers' attention and discussion, for example US beef imports, abolition of the death penalty, and marriage equality. Whether the public issue concerns politics, the economy, education, gender, energy, or the environment, news media often need to report different stances. Being able to quickly retrieve news with a particular stance on a controversial issue from a large collection of news documents would not only help people understand how different stances perceive and value these issues, but would also be a valuable reference for decision-making. In recent years, many retrieval applications related to opinion and stance have emerged, such as election polling through social listening on social media, or gauging the market reputation of a product or brand through sentiment analysis of online forums, which shows the importance of this problem. This news stance retrieval competition integrates two competition types, information retrieval and opinion mining, into two tracks: the News Stance Retrieval Technical Prize Competition and the News Stance Retrieval Application Prize Competition. In the application track, any legally licensed data may be used, but the technology and data developed for the technical track must be included. Teams must submit a proposal and implement a prototype application system. The aim is to train people in different fields to apply AI technologies, and to use the perspectives of different disciplines to spark value and potential in technical and experiential innovation.

Eligibility: students at universities and colleges nationwide.

Prizes: award-eligible teams must consist entirely of students, and supporting documentation is required at award time. The top three teams in the application track receive:
Grand prize: NT$100,000
Best creativity award: NT$60,000
Most promising award: NT$40,000
After review by the judging committee, the winning teams will receive a certificate from the Ministry of Education. Award quotas may be adjusted according to the number and quality of entries; if entries do not reach the required standard, the final judging committee may leave awards vacant or grant fewer than planned.

Competition format and evaluation:
Stage 1: in the application track, teams must propose an application built on the technology from the News Stance Retrieval Technical Prize Competition. The proposal may use any legally licensed data, but the system demonstration must include the technical track's data. If more than ten teams enter, the top ten proposals from Stage 1 advance to the final.
Stage 2: before the final-round deadline, shortlisted teams must submit a results report for the final review. At the final, teams present their proposal on site and provide a system prototype for on-site demonstration and testing; the form of the developed application system is unrestricted. The winners will be announced on the competition webpage in November and notified by email.

2019-03-21T16:00:00+00:00 ~ 2019-11-20T15:59:59+00:00
Ended

News Stance Retrieval Technical Prize Competition

News on controversial issues has always been a focus of readers' attention and discussion, for example US beef imports, abolition of the death penalty, and marriage equality. Whether the public issue concerns politics, the economy, education, gender, energy, or the environment, news media often need to report different stances. Being able to quickly retrieve news with a particular stance on a controversial issue from a large collection of news documents would not only help people understand how different stances perceive and value these issues, but would also be a valuable reference for decision-making. Teams in this competition must develop a search engine that finds news that is both relevant to a controversial issue and consistent with a specified stance. The competition website provides news from major domestic media as competition data via hyperlinks; it also provides query topics that express a stance on a controversial issue (e.g., "oppose tuition increases") and partial relevance annotations (e.g., relevant vs. not relevant), helping teams apply information retrieval and machine learning techniques to train their retrieval models. The developed search engine is expected to effectively find news relevant to a query such as "oppose tuition increases" and rank it from most to least relevant.

Prizes: award-eligible teams must consist entirely of students, and supporting documentation is required at award time. The top thirteen teams in the technical track receive:
1st place: NT$100,000
2nd place: NT$60,000
3rd place: NT$40,000
Ten merit awards: NT$10,000 each
After review by the judging committee, the top thirteen teams will receive a certificate from the Ministry of Education. Teams ranked in the top 25% that exceed the baseline will, after committee review, receive a certificate from the Ministry of Education's AI competition project office. Award quotas may be adjusted according to the number and quality of entries; if entries do not reach the required standard, the final judging committee may leave awards vacant or grant fewer than planned.

Eligibility: students at universities and colleges nationwide; industry participants may also take part but are not ranked for awards.

Competition format and evaluation: in the technical track, teams build retrieval systems from the corpus provided by the organizers, and system performance and rankings are evaluated on the test query topics specified by the organizers. The competition has two stages, and the organizers provide the following data in each stage:
Stage 1: a partial news corpus (NC-1) and its test query topics (QS-1).
Stage 2: the full news corpus (NC-2) and its test query topics (QS-2).
The full corpus contains the partial corpus (NC-1 ⊂ NC-2), and the Stage 2 test queries contain the Stage 1 test queries (QS-1 ⊂ QS-2). Stage 1 additionally provides training relevance annotations (TD) as a reference for model training. Details:
1. Stage 1: teams must retrieve articles relevant to the test query topics (QS-1) from the partial corpus (NC-1), returning the top 300 news articles for each query, and upload their results to the online ranking system to tune model performance, with at most 10 uploads per day. The system scores submissions with MAP@300 (the computation of MAP@300 is explained later). In this stage the organizers also provide the training annotations (TD): each annotation consists of a training query topic (QS-t), an article from the partial corpus (NC-1), and that article's relevance to the query, graded on four levels: not relevant (0), partially relevant (1), relevant (2), and highly relevant (3). TD is not a complete annotation of NC-1, meaning some articles in NC-1 may be unannotated; the test queries (QS-1) include 5 of the training queries (QS-t).
2. Stage 2: teams must retrieve articles relevant to the test query topics (QS-2) from the full corpus (NC-2), returning the top 300 news articles per query. Please note: the results uploaded in Stage 2 determine the final ranking of this competition, and the Stage 1 online ranking system is no longer available in this stage. The organizers will release the full corpus (NC-2) in advance; teams must upload their retrieval results for NC-2 before the deadline on the day the test queries (QS-2) are released, with at most 7 uploads, and the last uploaded answer is the one scored.
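A small sketch of how a MAP@300 leaderboard score can be computed is shown below: for each query, average precision is taken over the top 300 returned documents, then averaged over all queries. Binarizing the graded relevance as "relevant if level >= 1" and the exact normalization are assumptions for illustration; the official scoring rules may differ.

def average_precision_at_k(ranked_doc_ids, relevant_ids, k=300):
    """ranked_doc_ids: ranked retrieval list for one query; relevant_ids: set of relevant doc ids."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids[:k], start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank   # precision at this relevant hit
    return precision_sum / min(len(relevant_ids), k) if relevant_ids else 0.0

def map_at_k(run, qrels, k=300):
    """run: {query_id: ranked list of doc ids}; qrels: {query_id: set of relevant doc ids}."""
    scores = [average_precision_at_k(run[q], qrels.get(q, set()), k) for q in run]
    return sum(scores) / len(scores) if scores else 0.0

run = {"q1": ["d3", "d1", "d7", "d2"], "q2": ["d5", "d9"]}
qrels = {"q1": {"d1", "d2"}, "q2": {"d9"}}
print(f"MAP@300: {map_at_k(run, qrels):.3f}")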

2019-03-21T16:00:00+00:00 ~ 2019-09-02T15:59:59+00:00