Low-power Efficient and Accurate Facial-Landmark Detection for Embedded Systems (Final Competition)
In the realm of computer vision, the field of facial-landmark detection has witnessed remarkable progress, gaining increasing significance in diverse applications like augmented reality, facial recognition, and emotion analysis. While object detection identifies objects within images and semantic segmentation meticulously outlines object boundaries down to the pixel level, facial-landmark detection's purpose is to accurately pinpoint and track critical facial features.Nevertheless, the intricacies of facial features, particularly in dynamic settings, combined with the substantial computational demands of deep learning-based algorithms, present formidable challenges when deploying these models on embedded systems with limited computational capabilities. Additionally, the diversity in facial features across various ethnicities and expressions poses difficulties in constructing a universally robust model. For example, the nuances in facial features and expressions within Asian populations, such as those in Taiwan, might not be comprehensively represented in existing open datasets, which predominantly focus on Western demographics.In this competition, we extend an invitation to participants to engineer a lightweight yet potent single deep learning model tailored for excellence in facial-landmark detection tasks. This model should demonstrate the capacity to accurately locate key facial landmarks under a spectrum of conditions, encompassing diverse expressions, orientations, and lighting environments. The objective is to craft a model not only suitable for deployment on embedded systems but also one that maintains high accuracy and real-time performance.This competition includes two stages: qualification and final competition.Qualification competition: Participants initially submit their models online for evaluation. The top 15 teams, judged based on accuracy, will advance to the final round.Final competition: The ultimate assessment will occur on the innovative MediaTek platform, the Dimensity Series, centering on the model's performance within real-world scenarios.The challenge underscores the development of a solitary model adept at pinpointing a range of facial landmarks with remarkable precision. This encompasses the detection of subtle variations in critical facial aspects like the eyes, nose, mouth, and jawline. Alongside accuracy, the spotlight is on low power consumption, streamlined processing, and real-time performance, particularly on MediaTek's Dimensity Series platform.The MediaTek platform, boasting heterogeneous computing capabilities, inclusive of CPUs, GPUs, and AI Processing Units, offers elevated performance and energy efficiency, making it an ideal foundation for constructing AI-driven facial-landmark detection applications. Participants have the option to manually target these processing units or leverage MediaTek's NeuroPilot SDK for intelligent processing allocation.Participants are expected to showcase their model's prowess in the concurrent detection of multiple facial landmarks, thereby exemplifying precision and efficiency in a resource-constrained environment.Given the test image dataset, participants are required to utilize a single model to perform the task of facial-landmark detection. The model must identify and locate 51 specific facial landmarks in each image. The landmarks correspond to salient features on the face, which are critical for various applications such as identity verification, emotion recognition, and augmented reality. The model's output should include:A set of coordinates for each of the 51 landmarks on the face.A confidence score for the detection of each landmark, indicating the model's certainty.The landmarks to be detected will cover areas such as the eye contours, eyebrows, nose, and mouth. Participants must ensure that their model is robust and can handle variations in facial expressions, orientations, and lighting conditions. The precise detection of these facial points is crucial for the success of the model in real-world applications.Participants will submit their results as a TXT file for each test image, where each row corresponds to a landmark and includes the landmark's ID, the x and y coordinates, and the confidence score. The TXT file should be named according to the convention image_name_landmarks.txt. Accuracy will be assessed based on the mean error across all landmarks and images, normalized by the inter-ocular distance to account for different face sizes and positions within the images.Reference[1] “i·bug - resources - 300 Faces In-the-Wild Challenge (300-W), ICCV 2013.” Accessed: Dec. 04, 2023. [Online]. Available: https://ibug.doc.ic.ac.uk/resources/300-W/[2] Google, “Measuring device power : Android Open Source Project,” Android Open Source Project. [Online]. Available: https://source.android.com/devices/tech/power/device?hl=en#power-consumption. [Accessed: 11-Nov-2021].
2024-03-17T16:00:00+00:00 ~ 2024-04-03T08:00:00+00:00