Topic provider

Pervasive Artificial Intelligence Research (PAIR) Labs, National Yang Ming Chiao Tung University (NYCU), Taiwan

The Pervasive AI Research (PAIR) Labs, a group of national research labs funded by the National Science and Technology Council, Taiwan, is commissioned to achieve academic excellence, nurture local AI talent, build international linkages, and develop pragmatic approaches in applied AI technologies for service, product, workflow, and supply-chain innovation and optimization. PAIR is now a top-tier research lab located at the Guangfu campus of National Yang Ming Chiao Tung University and comprises 10 distinguished research teams conducting research in various applied AI areas.

Website: https://pairlabs.ai/

Intelligent Vision System (IVS) Lab, National Yang Ming Chiao Tung University (NYCU), Taiwan

The Intelligent Vision System (IVS) Lab at National Yang Ming Chiao Tung University is directed by Professor Jiun-In Guo. The lab tackles practical open problems in autonomous driving research, focusing on intelligent vision processing systems, applications, and SoCs that exploit deep learning technology.


Website: http://ivs.ee.nctu.edu.tw/ivs/

AI System (AIS) Lab, National Cheng Kung University (NCKU), Taiwan

The AI System (AIS) Lab at National Cheng Kung University is directed by Professor Chia-Chi Tsai. The lab is dedicated to building systems with AI technology. Its research includes AI accelerator development, AI architecture improvement, and AI-based solutions to multimedia problems.


MediaTek

MediaTek Inc. is a Taiwanese fabless semiconductor company that provides chips for wireless communications, high-definition television, handheld mobile devices such as smartphones and tablets, navigation systems, consumer multimedia products, digital subscriber line services, and optical disc drives. MediaTek is known for advances in multimedia and AI and for delivering the performance needed, when and where it is needed. MediaTek’s chipsets are optimized to run cool and power-efficiently to extend battery life, striking a balance between high performance, power efficiency, and connectivity.


Website: https://www.mediatek.com/

Introduction

In the realm of computer vision, the field of facial-landmark detection has witnessed remarkable progress, gaining increasing significance in diverse applications like augmented reality, facial recognition, and emotion analysis. While object detection identifies objects within images and semantic segmentation meticulously outlines object boundaries down to the pixel level, facial-landmark detection's purpose is to accurately pinpoint and track critical facial features.

Nevertheless, the intricacies of facial features, particularly in dynamic settings, combined with the substantial computational demands of deep learning-based algorithms, present formidable challenges when deploying these models on embedded systems with limited computational capabilities. Additionally, the diversity in facial features across various ethnicities and expressions poses difficulties in constructing a universally robust model. For example, the nuances in facial features and expressions within Asian populations, such as those in Taiwan, might not be comprehensively represented in existing open datasets, which predominantly focus on Western demographics.

In this competition, we extend an invitation to participants to engineer a lightweight yet potent single deep learning model tailored for excellence in facial-landmark detection tasks. This model should demonstrate the capacity to accurately locate key facial landmarks under a spectrum of conditions, encompassing diverse expressions, orientations, and lighting environments. The objective is to craft a model not only suitable for deployment on embedded systems but also one that maintains high accuracy and real-time performance.

This competition includes two stages: qualification and final competition.

  • Qualification competition: Participants initially submit their models online for evaluation. The top 15 teams, judged based on accuracy, will advance to the final round.
  • Final competition: The ultimate assessment will occur on the innovative MediaTek platform, the Dimensity Series, centering on the model's performance within real-world scenarios.

The challenge emphasizes the development of a single model adept at pinpointing a range of facial landmarks with high precision. This encompasses the detection of subtle variations in critical facial features such as the eyes, nose, mouth, and jawline. Alongside accuracy, the focus is on low power consumption, streamlined processing, and real-time performance, particularly on MediaTek's Dimensity Series platform.

The MediaTek platform, boasting heterogeneous computing capabilities, inclusive of CPUs, GPUs, and AI Processing Units, offers elevated performance and energy efficiency, making it an ideal foundation for constructing AI-driven facial-landmark detection applications. Participants have the option to manually target these processing units or leverage MediaTek's NeuroPilot SDK for intelligent processing allocation.

Participants are expected to showcase their model's prowess in the concurrent detection of multiple facial landmarks, thereby exemplifying precision and efficiency in a resource-constrained environment.

Given the test image dataset, participants are required to utilize a single model to perform the task of facial-landmark detection. The model must identify and locate 51 specific facial landmarks in each image. The landmarks correspond to salient features on the face, which are critical for various applications such as identity verification, emotion recognition, and augmented reality. The model's output should include:

  • A set of coordinates for each of the 51 landmarks on the face.
  • A confidence score for the detection of each landmark, indicating the model's certainty.

The landmarks to be detected will cover areas such as the eye contours, eyebrows, nose, and mouth. Participants must ensure that their model is robust and can handle variations in facial expressions, orientations, and lighting conditions. The precise detection of these facial points is crucial for the success of the model in real-world applications.

Participants will submit their results as a TXT file for each test image, where each row corresponds to a landmark and includes the landmark's ID, the x and y coordinates, and the confidence score. The TXT file should be named according to the convention image_name_landmarks.txt. Accuracy will be assessed based on the mean error across all landmarks and images, normalized by the inter-ocular distance to account for different face sizes and positions within the images.
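As a concrete illustration, the minimal Python sketch below writes one landmarks TXT file in the row format described above and computes the inter-ocular-normalized mean error used for accuracy assessment. The exact column separator, the landmark ID numbering, and which two points define the inter-ocular distance are assumptions; the official submission template and metric definition take precedence where they differ.

```python
import numpy as np

def write_landmarks_txt(image_name, landmarks, confidences, out_dir="."):
    """Write one submission TXT file for a single image.

    Each row: landmark_id x y confidence. The whitespace separator and
    1-based ID numbering are assumptions, not the official specification.
    """
    path = f"{out_dir}/{image_name}_landmarks.txt"
    with open(path, "w") as f:
        for i, ((x, y), c) in enumerate(zip(landmarks, confidences), start=1):
            f.write(f"{i} {x:.2f} {y:.2f} {c:.4f}\n")
    return path

def normalized_mean_error(pred, gt, left_eye_idx, right_eye_idx):
    """Mean landmark error normalized by the inter-ocular distance.

    pred, gt: arrays of shape (51, 2). left_eye_idx / right_eye_idx are the
    indices of the two reference points defining the inter-ocular distance
    (which points the organizers use is an assumption here).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    inter_ocular = np.linalg.norm(gt[left_eye_idx] - gt[right_eye_idx])
    per_point = np.linalg.norm(pred - gt, axis=1)
    return per_point.mean() / inter_ocular
```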

Reference

[1] “i·bug - resources - 300 Faces In-the-Wild Challenge (300-W), ICCV 2013.” Accessed: Dec. 4, 2023. [Online]. Available: https://ibug.doc.ic.ac.uk/resources/300-W/
[2] Google, “Measuring device power,” Android Open Source Project. Accessed: Nov. 11, 2021. [Online]. Available: https://source.android.com/devices/tech/power/device?hl=en#power-consumption

Prize

Based on each team's points in the final evaluation, the three highest-scoring teams will receive the regular awards.

  1. Champion:         USD 3,000
  2. 1st Runner-up:    USD 2,000
  3. 2nd Runner-up:    USD 1,400

Special Award

  1. Best INT8 Model Development Award:   USD 600
    Best overall score in the final competition using INT8 model development.
  2. Best Detection Model Award:   USD 600
    Best overall detection model.

All award winners must agree to submit a contest paper and attend the IEEE ICME 2024 Grand Challenge PAIR Competition Special Session to present their work. If the paper is not submitted, or the submitted paper is shorter than 3 pages, the award will be cancelled.

Activity time

Deadline for Submission (UTC+8):

Date                      Event
2/03/2024                 Qualification Competition Start Date
3/05/2024                 Date to Release Private Testing Data for Qualification
3/17/2024 12:00 PM UTC    Qualification Competition End Date
3/18/2024 12:00 AM UTC    Finalist Announcement
3/18/2024                 Final Competition Start Date
3/25/2024                 Date to Release Private Example Testing Data for Final
4/01/2024 12:00 PM UTC    Final Competition End Date
4/03/2024 12:00 PM UTC    Award Announcement
4/12/2024                 Invited Paper Submission Deadline
5/07/2024                 Camera-Ready Submission Deadline

Evaluation Criteria

Qualification Competition

The grading criteria are based on the direct measurement of the accuracy of facial-landmark detection. The evaluation metric and scoring system will be as follows:

Facial-Landmark Detection Accuracy:

The evaluation will be based on the mean squared error (MSE) between the predicted and actual landmark coordinates for all the landmarks detected in each image.

Each landmark detection will be scored based on the precision of its location. The closer the landmark's predicted coordinates are to the actual position, the higher the score awarded for that landmark.

Scoring System:

For each landmark in an image, a set number of points are awarded based on the deviation from the actual landmark position. For example:

  • 0 pixels off (exact detection): Full points for that landmark.
  • 1-2 pixels off: Slightly fewer points, e.g., 95% of the full points.
  • 3-4 pixels off: Even fewer points, e.g., 90% of the full points.

And so on, with the points decreasing as the error margin increases.

The total points from all landmarks detected in an image will be summed up to give a final image score.

The final score for each team in the qualification competition will be the average of their image scores across the entire test dataset.

Teams will be ranked based on their total average score, with the highest-scoring team placed at the top. This direct correlation between detection precision and score is designed to incentivize the most accurate facial-landmark detection possible.
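A minimal Python sketch of this tiered scoring is given below. The point schedule (thresholds, percentages, and the full-points value) is a placeholder matching the example tiers above, not the official values.

```python
import numpy as np

# Placeholder schedule mirroring the tiers above: exact detection earns full
# points, 1-2 pixels off earns 95%, 3-4 pixels off earns 90%, and so on.
# FULL_POINTS and the tier boundaries are assumptions.
FULL_POINTS = 10.0
TIERS = [(0.5, 1.00), (2.5, 0.95), (4.5, 0.90), (6.5, 0.85), (8.5, 0.80)]

def landmark_points(pixel_error):
    """Map one landmark's pixel deviation to points via the tiered schedule."""
    for max_error, fraction in TIERS:
        if pixel_error <= max_error:
            return FULL_POINTS * fraction
    return 0.0  # beyond the last tier, no points are awarded

def image_score(pred, gt):
    """Sum the points of all landmarks detected in one image."""
    errors = np.linalg.norm(np.asarray(pred, float) - np.asarray(gt, float), axis=1)
    return sum(landmark_points(e) for e in errors)

def team_score(predictions, ground_truths):
    """Qualification score: average of per-image scores over the test dataset."""
    return float(np.mean([image_score(p, g)
                          for p, g in zip(predictions, ground_truths)]))
```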


Final Competition

  • Mandatory Criteria
    • The accuracy of the final submission cannot be 5% or more lower than that of the team's qualification model.
    • The combined preprocessing and postprocessing time of the final submission cannot be 50% or more slower than the inference time of the main model (evaluated on the host machine).
    • The accuracy and speed of participants' models must surpass those of a provided sample model for scores to be awarded.
  • [Host] Accuracy (ME)
    The precision of facial landmark detection.
  • [Host] Model Computational Complexity (GOPs/frame)
    Efficiency of the model in operations per frame.
  • [Host] Model size (number of parameters * bit width used in storing the parameters)
    The total number of parameters multiplied by the bit width used for storing these parameters. 
  • [Device] Power consumption (average current computation on MediaTek Dimensity Series)
    Measured by the Android battery fuel gauge on MediaTek’s Dimensity Series platform [2]. The “BATTERY_PROPERTY_CURRENT_AVERAGE” mode is used in the evaluation.
  • [Device] Speed on MediaTek Dimensity 9300 Series Platform
    The speed of the model in performing the detection task, excluding preprocessing and postprocessing.
  • [Score] The score is calculated using a formula that takes into account the model's accuracy, inference time, computational complexity, power usage, and model size.
    The overall score is derived from the product of several factors, each normalized by its maximum value. Accuracy, inference time, complexity, power, and model size are combined in one formula without specific percentage weights assigned to each; improvements in any one area increase the score, while inefficiencies decrease it. The overall performance is thus reflected in a single score that balances accuracy and operational efficiency (an illustrative sketch follows the evaluation-procedure note below).

The evaluation covers the overall process, from reading the private testing dataset in the final to completing the submission.csv file, including parsing the image list, loading images, and any other overhead required to conduct facial-landmark detection over the testing dataset.
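The sketch below shows how the host-side model-size metric can be computed from the definition above, together with a purely illustrative multiplicative score in the spirit of the description. The maxima, the direction of each normalized factor, and the example numbers are assumptions; the official scoring formula is not reproduced here.

```python
def model_size_bits(num_params, bit_width):
    """Host-side model-size metric: number of parameters * storage bit width."""
    return num_params * bit_width

# Worked example: a hypothetical 2.1M-parameter model stored with INT8 weights.
size_bits = model_size_bits(2_100_000, 8)   # 16,800,000 bits
size_mib = size_bits / 8 / (1024 * 1024)    # ~2.0 MiB

# Purely illustrative multiplicative score: each factor is normalized by a
# maximum value, accuracy raises the score, and cost-type metrics (time,
# complexity, power, size) lower it. This is NOT the official formula.
def illustrative_score(accuracy, time_s, gops, power_ma, size, maxima):
    return (accuracy / maxima["accuracy"]) \
         * (maxima["time_s"] / time_s) \
         * (maxima["gops"] / gops) \
         * (maxima["power_ma"] / power_ma) \
         * (maxima["size"] / size)
```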

Coordinator Contacts

Po-Chi Hu, pochihu@nycu.edu.tw
Jenq-Neng Hwang, hwang@uw.edu
Jiun-In Guo, jiguo@nycu.edu.tw
Marvin Chen, marvin.chen@mediatek.com
Hsien-Kai Kuo, hsienkai.kuo@mediatek.com
Chia-Chi Tsai, cctsai@gs.ncku.edu.tw

Rules

  • Team mergers are not allowed in this competition.
  • Each team can consist of a maximum of 6 team members.
  • The task is open to the public. Individuals, institutions of higher education, research institutes, enterprises, or other organizations can all sign up for the task.
  • A leaderboard will be set up and made publicly available.
  • Multiple submissions are allowed before the deadline; only the last submission will be considered for final qualification.
  • The upload date/time will be used as the tiebreaker.
  • Privately sharing code or data outside of teams is not permitted. It is okay to share code if made available to all participants on the forums.
  • Personnel of IVSLAB team and AISLAB team are not allowed to participate in the task.
  • A common honor code should be observed. Any violation will result in disqualification.