Pervasive Artificial Intelligence Research (PAIR) Labs, National Yang Ming Chiao Tung University (NYCU), Taiwan
The Pervasive AI Research (PAIR) Labs, a group of national research labs funded by the National Science and Technology Council, Taiwan, is commissioned to achieve academic excellence, nurture local AI talent, build international links, and develop pragmatic approaches in applied AI technologies for service, product, workflow, and supply chain innovation and optimization. PAIR is now a top-tier research lab located at the Guangfu campus of National Yang Ming Chiao Tung University and comprises 10 distinguished research teams conducting research in various applied AI areas.
Intelligent Vision System (IVS) Lab, National Yang Ming Chiao Tung University (NYCU), Taiwan
The Intelligent Vision System (IVS) Lab at National Yang Ming Chiao Tung University is directed by Professor Jiun-In Guo. We tackle practical open problems in autonomous driving research, focusing on intelligent vision processing systems, applications, and SoCs that exploit deep learning technology.
AI System (AIS) Lab, National Cheng Kung University (NCKU), Taiwan
The AI System (AIS) Lab at National Cheng Kung University is directed by Professor ChiaChi Tsai. We are dedicated to building systems with AI technology. Our research includes AI accelerator development, AI architecture improvement, and AI-based solutions to multimedia problems.
MediaTek Inc. is a Taiwanese fabless semiconductor company that provides chips for wireless communications, high-definition television, handheld mobile devices such as smartphones and tablet computers, navigation systems, consumer multimedia products, and digital subscriber line services, as well as optical disc drives. MediaTek is known for advances in multimedia and AI, and for its expertise in delivering the most power possible when and where it is needed. MediaTek's chipsets are optimized to run cool and power-efficiently to extend battery life, balancing high performance, power efficiency, and connectivity.
In the realm of computer vision, the field of facial-landmark detection has witnessed remarkable progress, gaining increasing significance in diverse applications like augmented reality, facial recognition, and emotion analysis. While object detection identifies objects within images and semantic segmentation meticulously outlines object boundaries down to the pixel level, facial-landmark detection's purpose is to accurately pinpoint and track critical facial features.
Nevertheless, the intricacies of facial features, particularly in dynamic settings, combined with the substantial computational demands of deep learning-based algorithms, present formidable challenges when deploying these models on embedded systems with limited computational capabilities. Additionally, the diversity in facial features across various ethnicities and expressions poses difficulties in constructing a universally robust model. For example, the nuances in facial features and expressions within Asian populations, such as those in Taiwan, might not be comprehensively represented in existing open datasets, which predominantly focus on Western demographics.
In this competition, we extend an invitation to participants to engineer a lightweight yet potent single deep learning model tailored for excellence in facial-landmark detection tasks. This model should demonstrate the capacity to accurately locate key facial landmarks under a spectrum of conditions, encompassing diverse expressions, orientations, and lighting environments. The objective is to craft a model not only suitable for deployment on embedded systems but also one that maintains high accuracy and real-time performance.
This competition includes two stages: qualification and final competition.
The challenge centers on developing a single model that pinpoints a range of facial landmarks with high precision. This includes detecting subtle variations in critical facial regions such as the eyes, nose, mouth, and jawline. Alongside accuracy, the emphasis is on low power consumption, streamlined processing, and real-time performance, particularly on MediaTek's Dimensity Series platform.
The MediaTek platform, boasting heterogeneous computing capabilities, inclusive of CPUs, GPUs, and AI Processing Units, offers elevated performance and energy efficiency, making it an ideal foundation for constructing AI-driven facial-landmark detection applications. Participants have the option to manually target these processing units or leverage MediaTek's NeuroPilot SDK for intelligent processing allocation.
Participants are expected to showcase their model's prowess in the concurrent detection of multiple facial landmarks, thereby exemplifying precision and efficiency in a resource-constrained environment.
Given the test image dataset, participants are required to utilize a single model to perform the task of facial-landmark detection. The model must identify and locate 51 specific facial landmarks in each image. The landmarks correspond to salient features on the face, which are critical for various applications such as identity verification, emotion recognition, and augmented reality. The model's output should include:
A set of coordinates for each of the 51 landmarks on the face.
A confidence score for the detection of each landmark, indicating the model's certainty.
The landmarks to be detected will cover areas such as the eye contours, eyebrows, nose, and mouth. Participants must ensure that their model is robust and can handle variations in facial expressions, orientations, and lighting conditions. The precise detection of these facial points is crucial for the success of the model in real-world applications.
Participants will submit their results as a TXT file for each test image, where each row corresponds to a landmark and includes the landmark's ID, the x and y coordinates, and the confidence score. The TXT file should be named according to the convention image_name_landmarks.txt. Accuracy will be assessed based on the mean error across all landmarks and images, normalized by the inter-ocular distance to account for different face sizes and positions within the images.
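As a rough illustration of the submission format and the normalized error described above, the following Python sketch writes one row per landmark and computes the mean point-to-point error divided by the inter-ocular distance. Several details are assumptions not stated in this text: the space delimiter, the 0-based landmark IDs, the numeric precision, and the eye-corner indices (19 and 28, taken from a 51-point layout obtained by dropping the 17 jawline points of the 68-point 300-W annotation). The function names are hypothetical.

```python
import math
from pathlib import Path

NUM_LANDMARKS = 51
# Assumption: 0-based indices of the outer eye corners in a 51-point layout
# (68-point 300-W annotation with the 17 jawline points removed).
LEFT_EYE_OUTER, RIGHT_EYE_OUTER = 19, 28


def format_landmark_rows(landmarks):
    """Return one text row per landmark: ID, x, y, confidence.

    The space delimiter and 0-based IDs are assumptions for illustration.
    """
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    rows = [f"{i} {x:.2f} {y:.2f} {c:.4f}" for i, (x, y, c) in enumerate(landmarks)]
    return "\n".join(rows) + "\n"


def write_landmarks(image_name, landmarks, out_dir="."):
    """Write the rows to <image_name>_landmarks.txt and return the path."""
    path = Path(out_dir) / f"{image_name}_landmarks.txt"
    path.write_text(format_landmark_rows(landmarks))
    return path


def normalized_mean_error(pred, truth):
    """Mean Euclidean error over all landmarks, divided by the
    inter-ocular distance measured on the ground-truth points."""
    iod = math.dist(truth[LEFT_EYE_OUTER], truth[RIGHT_EYE_OUTER])
    if iod == 0:
        raise ValueError("inter-ocular distance is zero")
    errors = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(errors) / len(errors) / iod
```

Here `pred` and `truth` are lists of `(x, y)` pairs for a single image; averaging the per-image values over the test set would give a dataset-level figure.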
 “i·bug - resources - 300 Faces In-the-Wild Challenge (300-W), ICCV 2013.” Accessed: Dec. 04, 2023. [Online]. Available: https://ibug.doc.ic.ac.uk/resources/300-W/
 Google, “Measuring device power : Android Open Source Project,” Android Open Source Project. [Online]. Available: https://source.android.com/devices/tech/power/device?hl=en#power-consumption. [Accessed: 11-Nov-2021].
Based on each team's points in the final evaluation, the three highest-scoring teams will be selected for regular awards.
All award winners must agree to submit a contest paper and attend the IEEE ICME2024 Grand Challenge PAIR Competition Special Session to present their work. If the paper is not submitted, or the submitted paper is shorter than 3 pages, the award will be cancelled.
Deadline for Submission (UTC+8):
|Qualification Competition Start Date
|Date to Release Private Testing Data for Qualification
|3/17/2024 12:00 PM UTC
|Qualification Competition End Date
|3/18/2024 12:00 AM UTC
|Final Competition Start Date
|Date to Release Private Testing Data for Final
|4/01/2024 12:00 PM UTC
|Final Competition End Date
|4/03/2024 12:00 PM UTC
|Invited Paper Submission Deadline
|Camera ready form deadline
The grading criteria are based on the direct measurement of the accuracy of facial-landmark detection. The evaluation metric and scoring system will be as follows:
Facial-Landmark Detection Accuracy:
The evaluation will be based on the mean squared error (MSE) between the predicted and actual landmark coordinates for all the landmarks detected in each image.
Each landmark detection will be scored based on the precision of its location. The closer the landmark's predicted coordinates are to the actual coordinates, the higher the score awarded for that landmark.
For each landmark in an image, a set number of points are awarded based on the deviation from the actual landmark position. For example:
And so on, with the points decreasing as the error margin increases.
The total points from all landmarks detected in an image will be summed up to give a final image score.
The final score for each team in the qualification competition will be the average of their image scores across the entire test dataset.
Teams will be ranked based on their total average score, with the highest-scoring team placed at the top. This direct correlation between detection precision and score is designed to incentivize the most accurate facial-landmark detection possible.
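The scoring flow above (per-landmark points, summed per image, averaged per team, then ranked) might be sketched in Python as follows. This is only an illustration: the point tiers in `EXAMPLE_TIERS` are hypothetical placeholders, since the actual thresholds are not given in this text, and the MSE here is assumed to be the squared Euclidean distance averaged per landmark. All function names are hypothetical.

```python
import math
from statistics import mean

# Hypothetical placeholder tiers as (max_error_in_pixels, points).
# The real thresholds and point values are NOT specified in this text.
EXAMPLE_TIERS = [(2.0, 3), (4.0, 2), (8.0, 1)]


def image_mse(pred, truth):
    """Mean squared error for one image (assumed: squared Euclidean
    distance between predicted and actual points, averaged per landmark)."""
    squared = [(px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(pred, truth)]
    return sum(squared) / len(squared)


def landmark_points(error, tiers=EXAMPLE_TIERS):
    """Points for one landmark: fewer points as the error grows."""
    for max_error, points in tiers:
        if error <= max_error:
            return points
    return 0


def image_score(pred, truth, tiers=EXAMPLE_TIERS):
    """Sum of the points earned by every landmark in one image."""
    return sum(landmark_points(math.dist(p, t), tiers)
               for p, t in zip(pred, truth))


def team_score(image_scores):
    """Average of a team's image scores over the test dataset."""
    return mean(image_scores)


def rank_teams(scores_by_team):
    """Team names ordered from highest to lowest average score."""
    return sorted(scores_by_team, key=scores_by_team.get, reverse=True)
```

Passing the tiers as a parameter keeps the aggregation logic independent of whatever thresholds the organizers actually use.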
In the final competition, the evaluation will cover the entire process, from reading the private testing dataset to completing the submission.csv file, including parsing the image list, loading images, and any other overhead incurred in running facial-landmark detection over the testing dataset.
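To make the scope of this end-to-end measurement concrete, here is a minimal Python sketch that times everything from parsing the image list to finishing the output file. It rests on assumptions: the image list is taken to be a text file with one path per line, the CSV column layout (image path, landmark ID, x, y, confidence) is invented for illustration, and `detect` is a hypothetical callable standing in for the participant's model.

```python
import csv
import time
from pathlib import Path


def run_pipeline(image_list_path, detect, out_csv="submission.csv"):
    """Run detection over every listed image and return elapsed seconds.

    The timer wraps list parsing, image loading, inference, and writing
    the CSV, so all overhead is included in the measurement.
    `detect` is a hypothetical callable: raw image bytes -> iterable of
    (x, y, confidence) tuples.
    """
    start = time.perf_counter()

    # Parse the image list (assumed format: one image path per line).
    lines = Path(image_list_path).read_text().splitlines()
    image_paths = [line.strip() for line in lines if line.strip()]

    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        for image_path in image_paths:
            image_bytes = Path(image_path).read_bytes()  # image loading
            for idx, (x, y, conf) in enumerate(detect(image_bytes)):
                # Assumed column layout, not the official one.
                writer.writerow([image_path, idx, x, y, conf])

    return time.perf_counter() - start
```

Using `time.perf_counter()` gives a monotonic wall-clock reading, which suits a measurement that is meant to include I/O as well as inference.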