66 participants

Topic provider

Multimedia Information System Lab, National Tsing Hua University (NTHU), Taiwan

The Multimedia Information System Laboratory (MISLab) was founded in August 2012. Led by Professor Min-Chun Hu, our team aims to design original methodologies and develop practical multimedia systems that meet the diverse demands of users. Our research topics include digital signal processing, digital content analysis/editing/presentation, machine learning and artificial intelligence, computer vision and pattern recognition, human-computer interaction, computer graphics, virtual reality and augmented reality.

Website: http://mislab.cs.nthu.edu.tw/


Multimedia Interaction and Intelligent System Lab, National Taiwan University of Science and Technology (NTUST), Taiwan

The Multimedia Interaction and Intelligent System Lab (MiiSLab) was founded in February 2023 at the Graduate Institute of A.I. Cross-disciplinary Tech, Industry-Academia Innovation College, National Taiwan University of Science and Technology. Led by Professor Tse-Yu Pan, our team aims to apply AI technology to develop practical systems in cross-disciplinary fields, including sports technology, smart manufacturing, interactive artworks, and video analysis.

Website: https://sites.google.com/view/mislab-ntust/home


MediaTek Inc. 

MediaTek Incorporated (TWSE: 2454) is a global fabless semiconductor company that enables nearly 2 billion connected devices a year. We are a market leader in developing innovative systems-on-chip (SoC) for mobile device, home entertainment, connectivity and IoT products. Our dedication to innovation has positioned us as a driving market force in several key technology areas, including highly power-efficient mobile technologies, automotive solutions and a broad range of advanced multimedia products such as smartphones, tablets, digital televisions, 5G, Voice Assistant Devices (VAD) and wearables. MediaTek empowers and inspires people to expand their horizons and achieve their goals through smart technology, more easily and efficiently than ever before. We work with the brands you love to make great technology accessible to everyone, and it drives everything we do. Visit www.mediatek.com for more information.

Industrial Technology Research Institute

ITRI is a world-leading applied technology research institute with more than 6,000 outstanding employees. Its mission is to drive industrial development, create economic value, and enhance social well-being through technology R&D. Founded in 1973, it pioneered IC development and began nurturing new tech ventures and delivering its R&D results to industry. ITRI has set up and incubated companies such as TSMC, UMC, Taiwan Mask Corp., Epistar Corp., Mirle Automation Corp., and Taiwan Biomaterial Co.

Introduction

To improve the experience of XR applications, visual attention estimation techniques have been developed to predict human intention so that the HMD can pre-render visual content and reduce rendering latency. However, most deep learning-based algorithms require heavy computation to achieve satisfactory accuracy. This is especially challenging for embedded systems with limited resources such as computing power and memory bandwidth (e.g., standalone HMDs). In addition, progress in this research field relies on richer data, while the number and diversity of existing datasets are still lacking. For this challenge, we collected a set of 360° MR/VR videos along with users' head pose and eye gaze signals. The goal of this competition is to encourage contestants to design lightweight visual attention estimation models that can be deployed on a resource-constrained embedded device. The developed models need not only to achieve high fidelity but also to perform well on the device.

 

This competition is divided into two stages: qualification and final competition.

  • Qualification competition stage: all participants submit their answers online. A score is calculated based on the ranking in five evaluation metrics. The top 15 teams will qualify for the final round of the competition.
  • Final competition stage: the final score will be evaluated on MediaTek's platform.

 

Given the test dataset containing 360° videos, participants are asked to estimate a saliency map for each video. More precisely, each pixel is assigned a predicted value in the range [0,1]. Note that the goal of this challenge is to design a lightweight deep learning model suitable for constrained embedded systems. Therefore, we focus on prediction correctness, model size, computational complexity, performance optimization and deployment on MediaTek’s Dimensity platform.
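The challenge text does not prescribe how raw model outputs should be mapped into [0,1]. As one possible approach, the sketch below applies per-frame min-max scaling with NumPy. The function name `to_unit_range` is hypothetical, and other mappings (such as a sigmoid) would satisfy the range requirement equally well.

```python
import numpy as np

def to_unit_range(raw_map):
    """Min-max scale one saliency frame so every pixel lies in [0, 1].

    Assumption: scaling is done independently per frame; the challenge
    description only states the required value range.
    """
    m = np.asarray(raw_map, dtype=np.float64)
    lo, hi = m.min(), m.max()
    if hi == lo:
        # A constant map carries no ranking information; return all zeros.
        return np.zeros_like(m)
    return (m - lo) / (hi - lo)
```

For example, `to_unit_range([[2, 4], [6, 10]])` maps the smallest value to 0 and the largest to 1, with the others spaced linearly in between.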

 

With MediaTek’s platform and its heterogeneous computing capabilities, such as the CPUs, GPUs and APUs (AI processing units) embedded into its system-on-chip products, developers get the high performance and power efficiency needed to build AI features and applications. Developers can target specific processing units within the system-on-chip, or they can let the MediaTek NeuroPilot SDK intelligently handle the processing allocation for them. Please note that we use TensorFlow Lite in the final competition stage of this challenge.

 

 

Prize

Based on each team's points in the final evaluation, the three highest-scoring teams receive the regular awards.

  1. Champion: USD 1,500
  2. 1st Runner-up: USD 1,000
  3. 3rd Place: USD 750

Please be aware that all award recipients are cordially invited to attend the award ceremony at ACM Multimedia Asia 2023.

Activity time

The time is based on UTC+8.

| Time | Event |
| --- | --- |
| 07/03/2023 | Qualification Competition Start Date; Date to Release Public Testing Data; Date to Release Private Testing Data for Qualification |
| 10/02/2023 12:00 PM UTC | Qualification Competition Stage End Date |
| 10/03/2023 12:00 AM UTC | Finalist Announcement |
| 10/03/2023 | Final Competition Stage Start Date |
| 11/06/2023 12:00 PM UTC | Final Competition Stage End Date |

 

Evaluation Criteria

The evaluation metrics are based on those of the Salient360! challenge [1].

 

For evaluating the accuracy of predicted saliency maps, we consider the following metrics [2, 3]:

  • Area Under Curve (AUC) proposed by Judd et al. (AUC-Judd) [4]
  • Normalized Scanpath Saliency (NSS)
  • Linear Correlation Coefficient (CC)
  • Similarity of Histogram Intersection (SIM)
  • Kullback-Leibler Divergence (KLD)
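To make the five metrics above concrete, here is a minimal NumPy sketch that follows the commonly used definitions discussed in [2]. It is not the official evaluation code of this challenge: the function names are hypothetical, no 360°-specific handling (such as latitude weighting of equirectangular frames) is applied, and the inputs are assumed to be a predicted saliency map, a ground-truth saliency map, and a binary fixation map of the same shape with at least one fixated and one non-fixated pixel.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)

def _as_distribution(sal_map):
    """Scale a non-negative map so that it sums to 1."""
    m = np.asarray(sal_map, dtype=np.float64)
    return m / (m.sum() + EPS)

def auc_judd(pred, fixations):
    """Area under the ROC curve, thresholding at each fixated pixel's value."""
    p = np.asarray(pred, dtype=np.float64).ravel()
    f = np.asarray(fixations, dtype=bool).ravel()
    fixated, others = p[f], p[~f]
    thresholds = np.sort(fixated)[::-1]
    # True/false positive rates at each threshold, padded with (0,0) and (1,1).
    tp = np.array([0.0] + [np.mean(fixated >= t) for t in thresholds] + [1.0])
    fp = np.array([0.0] + [np.mean(others >= t) for t in thresholds] + [1.0])
    # Trapezoidal integration of TP over FP.
    return float(np.sum(np.diff(fp) * (tp[1:] + tp[:-1]) / 2.0))

def nss(pred, fixations):
    """Mean of the z-scored prediction at fixated pixels."""
    p = np.asarray(pred, dtype=np.float64)
    z = (p - p.mean()) / (p.std() + EPS)
    return float(z[np.asarray(fixations, dtype=bool)].mean())

def cc(pred, gt):
    """Pearson correlation between predicted and ground-truth maps."""
    p = np.asarray(pred, dtype=np.float64).ravel()
    g = np.asarray(gt, dtype=np.float64).ravel()
    return float(np.corrcoef(p, g)[0, 1])

def sim(pred, gt):
    """Histogram intersection of the two maps viewed as distributions."""
    p = np.asarray(pred, dtype=np.float64)
    g = np.asarray(gt, dtype=np.float64)
    return float(np.minimum(_as_distribution(p - p.min()),
                            _as_distribution(g - g.min())).sum())

def kld(pred, gt):
    """KL divergence of the prediction from the ground truth (lower is better)."""
    p, g = _as_distribution(pred), _as_distribution(gt)
    return float(np.sum(g * np.log(EPS + g / (p + EPS))))
```

Note that AUC-Judd, NSS, CC and SIM are higher-is-better, while KLD is lower-is-better, which matters when the metrics are ranked.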

 

  • Qualification Competition
    • We use five metrics to evaluate the prediction results. For each metric, the team with the best prediction gets full points (20%) and the team with the worst gets zero. The remaining teams get points in direct proportion to their ranking. We rank the five metrics individually, and the score calculated from the five ranking lists determines each team's position on the leaderboard.

      $$ Score=\sum_{i} (n-R_{i})\cdot \frac{20}{n-1} $$
      $$ R_{i}: \text {ranking in each metric} $$
      $$ n: \text {number of teams} $$

    • In addition, during the qualification competition period, each team has to submit a team composition document that includes the team name, leader, team members, affiliation, contact information, etc.
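The qualification scoring formula can be sketched in a few lines of Python. This is only an illustration of the rank-based rule stated above, with rank 1 as the best team in each metric. The function names and the three-team metric values are made up for the example, and ties within a metric are broken by list order, which is an assumption since the rules do not describe per-metric tie handling.

```python
def rank_points(values, higher_is_better=True, full_points=20.0):
    """Points per team for one metric: (n - R) * full_points / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two teams are needed to rank")
    # Indices of teams from best to worst (ties keep list order: an assumption).
    order = sorted(range(n), key=lambda i: values[i], reverse=higher_is_better)
    points = [0.0] * n
    for rank, team in enumerate(order, start=1):  # rank 1 = best
        points[team] = (n - rank) * full_points / (n - 1)
    return points

def qualification_scores(metrics, higher_is_better):
    """Sum the per-metric points over all metrics for every team."""
    n = len(next(iter(metrics.values())))
    totals = [0.0] * n
    for name, values in metrics.items():
        pts = rank_points(values, higher_is_better[name])
        totals = [t + p for t, p in zip(totals, pts)]
    return totals

# Hypothetical results for three teams (A, B, C); not real challenge data.
example_metrics = {
    "AUC-Judd": [0.80, 0.75, 0.70],
    "NSS": [1.5, 1.7, 1.2],
    "CC": [0.6, 0.5, 0.4],
    "SIM": [0.50, 0.55, 0.45],
    "KLD": [0.9, 1.1, 1.3],
}
# KLD is a divergence, so a lower value ranks better.
example_direction = {"AUC-Judd": True, "NSS": True, "CC": True,
                     "SIM": True, "KLD": False}
example_totals = qualification_scores(example_metrics, example_direction)
```

With three teams, first place in a metric is worth 20 points, second place 10 and third place 0, so a team that ranks first in every metric reaches the maximum of 100.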

 

  • Final Competition

    • Mandatory Criteria
      The combined preprocessing and postprocessing time of the final submission must not be 50% or more slower than the inference time of the main model (evaluated on the host machine).
    • [Host] Estimation accuracy – 40% (8% for each accuracy metric)
      We use five metrics to evaluate the prediction results. For each metric, the team with the best prediction gets full points (8%) and the team with the worst gets zero. The remaining teams get points in direct proportion to their ranking. We rank the five metrics individually, and the points from the five ranking lists are then summed.
      $$ Score=\sum_{i} (n-R_{i})\cdot \frac{8}{n-1} $$
      $$ R_{i}: \text {ranking in each metric} $$
      $$ n: \text {number of teams} $$

    • [Host] Model size (number of parameters * bit width used in storing the parameters) – 15%
      The team with the smallest model gets full points (15%) and the team with the largest gets zero. The remaining teams get points in direct proportion to their model size ranking.
    • [Host] Model Computational Complexity (GOPs/frame) – 15%
      The team with the smallest GOP number per frame gets full points (15%) and the team with the largest gets zero. The remaining teams get points in direct proportion to their GOP ranking.
    • [Device] Speed on MediaTek Dimensity Series Platform – 30%
      The team whose single model (without preprocessing and postprocessing) completes the detection task in the shortest time gets full points (30%) and the team that takes the longest gets zero. The remaining teams get points in direct proportion to their execution time ranking.

      The evaluation procedure covers the overall process, from reading the private testing dataset to completing the output file.
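As a rough illustration of how the final-stage criteria combine, the sketch below computes the model size as parameter count times bit width and then adds up rank-based points using the stated weights (8% per accuracy metric, 15% model size, 15% complexity, 30% device speed). All function names and numbers are hypothetical, ties are broken by list order as an assumption, and the mandatory timing criterion is not modeled.

```python
def model_size_bits(num_parameters, bit_width):
    """Model size as defined above: parameter count times storage bit width."""
    return num_parameters * bit_width

def rank_points(values, full_points, higher_is_better=False):
    """Best team gets full_points, worst gets 0, linear in rank in between."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two teams are needed to rank")
    order = sorted(range(n), key=lambda i: values[i], reverse=higher_is_better)
    points = [0.0] * n
    for rank, team in enumerate(order, start=1):  # rank 1 = best
        points[team] = (n - rank) * full_points / (n - 1)
    return points

def final_scores(accuracy, higher_is_better, size_bits, gops_per_frame, device_time):
    """Total points per team out of 100 under the stated weights."""
    parts = [rank_points(v, 8.0, higher_is_better[k]) for k, v in accuracy.items()]
    parts.append(rank_points(size_bits, 15.0))       # smaller model is better
    parts.append(rank_points(gops_per_frame, 15.0))  # fewer GOPs is better
    parts.append(rank_points(device_time, 30.0))     # shorter time is better
    totals = [0.0] * len(size_bits)
    for pts in parts:
        totals = [t + p for t, p in zip(totals, pts)]
    return totals

# Hypothetical two-team example: team A is more accurate and faster on device,
# team B has more parameters but stores them in 8 bits and needs fewer GOPs.
sizes = [model_size_bits(1_000_000, 32), model_size_bits(2_000_000, 8)]
accuracy = {"AUC-Judd": [0.80, 0.75], "NSS": [1.5, 1.2], "CC": [0.6, 0.5],
            "SIM": [0.50, 0.45], "KLD": [0.9, 1.1]}
direction = {"AUC-Judd": True, "NSS": True, "CC": True, "SIM": True, "KLD": False}
totals = final_scores(accuracy, direction, sizes, [2.0, 1.0], [10.0, 20.0])
```

In this made-up case the 8-bit model counts as smaller (16,000,000 bits versus 32,000,000 bits) even though it has twice as many parameters, which shows why quantization can matter for the model size criterion.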

 

Reference

[1] Salient360! Challenge: https://salient360.ls2n.fr/

[2] Bylinskii, Zoya, et al. "What do different evaluation metrics tell us about saliency models?" IEEE Transactions on Pattern Analysis and Machine Intelligence 41.3 (2018): 740-757.

[3] Gutiérrez, Jesús, et al. "Toolbox and dataset for the development of saliency and scanpath models for omnidirectional/360 still images." Signal Processing: Image Communication 69 (2018): 35-42.

[4] Judd, Tilke, et al. "Learning to predict where humans look." 2009 IEEE 12th international conference on computer vision. IEEE, 2009.

 

Committee

Min-Chun Hu, National Tsing Hua University

Tse-Yu Pan, National Taiwan University of Science and Technology

Herman Prawiro, National Tsing Hua University

CM Cheng, MediaTek

Hsien-kai Kuo, MediaTek

 

Contacts

Email: vae.challenge@gmail.com

Rules

  • Team mergers are not allowed in this competition.
  • Each team can consist of a maximum of 6 team members.
  • The task is open to the public. Individuals, institutions of higher education, research institutes, enterprises, or other organizations can all sign up for the task.
  • A leaderboard will be set and made public.
  • Each team may submit at most three times per day before the deadline; the last submission will be used for the final qualification consideration.
  • The upload date/time will be used as the tiebreaker.
  • Privately sharing code or data outside of teams is not permitted. It’s okay to share code if made available to all participants on the forums.
  • Personnel of the MISLab team are not allowed to participate in the task.
  • A common honor code should be observed. Any team that violates it will be disqualified.