
#### Topic provider

Multimedia Information System Lab, National Tsing Hua University (NTHU), Taiwan
The Multimedia Information System Laboratory (MISLab) was founded in August 2012. Led by Professor Min-Chun Hu, our team aims to design original methodologies and develop practical multimedia systems to meet the various demands of users. Our research topics include digital signal processing, digital content analysis/editing/presentation, machine learning and artificial intelligence, computer vision and pattern recognition, human-computer interaction, computer graphics, virtual reality, and augmented reality.
Website: http://mislab.cs.nthu.edu.tw/

MediaTek Inc.
MediaTek Incorporated (TWSE: 2454) is a global fabless semiconductor company that enables nearly 2 billion connected devices a year. We are a market leader in developing innovative systems-on-chip (SoC) for mobile device, home entertainment, connectivity and IoT products. Our dedication to innovation has positioned us as a driving market force in several key technology areas, including highly power-efficient mobile technologies, automotive solutions and a broad range of advanced multimedia products such as smartphones, tablets, digital televisions, 5G, Voice Assistant Devices (VAD) and wearables. MediaTek empowers and inspires people to expand their horizons and achieve their goals through smart technology, more easily and efficiently than ever before. We work with the brands you love to make great technology accessible to everyone, and it drives everything we do. Visit www.mediatek.com for more information.

#### Introduction

To improve the experience of XR applications, visual attention estimation techniques have been developed to predict human intention so that the HMD can pre-render visual content and reduce rendering latency. However, most deep learning-based algorithms incur heavy computation to achieve satisfactory accuracy. This is especially challenging for embedded systems with limited resources such as computing power and memory bandwidth (e.g., a standalone HMD). In addition, this research field relies on rich data to make cutting-edge progress, yet existing datasets are still limited in number and diversity. For this challenge, we collected a set of 360° MR/VR videos along with users' head pose and eye gaze signals. The goal of this competition is to encourage contestants to design lightweight visual attention estimation models that can be deployed on an embedded device with constrained resources. The developed models must not only achieve high fidelity but also perform well on the device.

This competition is divided into two stages: qualification and final competition.

• Qualification competition: all participants submit their answers online. A score is calculated based on the rankings across five evaluation metrics. The top 15 teams qualify for the final round of the competition.
• Final competition: submissions are evaluated on MediaTek's platform to determine the final score.

Given the test dataset containing 360° videos, participants are asked to estimate a saliency map for each video; that is, each pixel is assigned a predicted value in the range [0, 1]. Note that the goal of this challenge is to design a lightweight deep learning model suitable for constrained embedded systems. Therefore, we evaluate prediction correctness, model size, computational complexity, performance optimization, and deployment on MediaTek's Dimensity 1000+ platform.
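Since every pixel of a submitted saliency map must lie in [0, 1], a raw network output typically needs to be rescaled before it is written out. A minimal sketch (the function name and the min-max scheme are our own illustration, not a required API):

```python
import numpy as np

def normalize_saliency(raw_map):
    """Min-max normalize a raw saliency prediction into the required [0, 1] range."""
    raw_map = raw_map.astype(np.float64)
    lo, hi = raw_map.min(), raw_map.max()
    if hi - lo < 1e-12:              # constant map: avoid division by zero
        return np.zeros_like(raw_map)
    return (raw_map - lo) / (hi - lo)
```

Any monotone rescaling works for rank-based metrics such as AUC, but distribution-based metrics (SIM, KLD) are sensitive to how the map is normalized, so teams may prefer a sum-to-one normalization for those.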

MediaTek's platform offers heterogeneous computing capabilities such as CPUs, GPUs, and APUs (AI processing units) embedded in its system-on-chip products, giving developers the high performance and power efficiency needed to build AI features and applications. Developers can target these specific processing units within the system-on-chip, or let the MediaTek NeuroPilot SDK intelligently handle processing allocation for them. Please note that we use TensorFlow Lite in this challenge.

#### Prize

Based on each team's points in the final evaluation, the three highest-scoring teams receive the regular awards.

1. Champion: USD 1,500 + TWCC services equivalent to a maximum of NTD 150,000 based on TWCC pricing
2. 1st Runner-up: USD 1,000 + TWCC services equivalent to a maximum of NTD 100,000 based on TWCC pricing
3. 2nd Runner-up: USD 750 + TWCC services equivalent to a maximum of NTD 50,000 based on TWCC pricing

Special Awards

Best accuracy award for each track (award for the highest accuracy in the final competition)

USD 600

Please note that all challenge participants entering the final competition are expected to submit a 2-page contest paper describing their work and to attend IEEE AIVR 2021 (http://www.ieee-aivr.org/) to present it in the Challenge session. One conference registration covers the publication and conference fees of all co-authors. Papers will be included in the IEEE AIVR 2021 proceedings, which are published in IEEE Xplore. We still hope to run at least part of the conference on location in Taiwan, but remote presentation will be possible for participants who cannot or do not want to travel due to COVID.

TWCC Award Provisions and Eligibility

#### Activity time

The time is based on UTC+8.

| Time | Event |
| --- | --- |
| 08/02/2021 | Qualification Competition Start Date / Release of Testing Data |
| 08/16/2021 | Release of Private Testing Data (without ground truth) for Qualification |
| 09/30/2021 11:59 PM | Qualification Competition End Date |
| 10/01/2021 12:00 PM | Finalist Announcement |
| 10/01/2021 | Final Competition Start Date |
| 10/11/2021 11:59 PM | Final Competition End Date |
| 10/20/2021 12:00 PM | Award Announcement |

#### Evaluation Criteria

The evaluation metrics are based on those of the Salient360! challenge [1]:

For evaluating the accuracy of predicted saliency maps, we consider the following metrics [2, 3]:

• Area Under Curve (AUC) proposed by Judd et al. (AUC-Judd) [4]
• Normalized Scanpath Saliency (NSS)
• Linear Correlation Coefficient (CC)
• Similarity of Histogram Intersection (SIM)
• Kullback-Leibler Divergence (KLD)
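For intuition, the distribution- and value-based metrics can be sketched in a few lines of NumPy. These are simplified versions for illustration only; the official Salient360! toolbox [3] applies additional details for 360° content (e.g., equirectangular sampling weights) that this sketch omits:

```python
import numpy as np

EPS = 1e-12  # small constant to avoid division by zero / log(0)

def _norm_dist(m):
    """Normalize a saliency map so it sums to 1 (treat it as a distribution)."""
    m = m.astype(np.float64)
    return m / (m.sum() + EPS)

def cc(pred, gt):
    """Linear Correlation Coefficient between two saliency maps."""
    p = (pred - pred.mean()) / (pred.std() + EPS)
    g = (gt - gt.mean()) / (gt.std() + EPS)
    return float((p * g).mean())

def sim(pred, gt):
    """Histogram intersection similarity of the two normalized maps."""
    return float(np.minimum(_norm_dist(pred), _norm_dist(gt)).sum())

def kld(pred, gt):
    """Kullback-Leibler divergence of the prediction from the ground truth."""
    p, g = _norm_dist(pred), _norm_dist(gt)
    return float((g * np.log(g / (p + EPS) + EPS)).sum())

def nss(pred, fixation_mask):
    """Normalized Scanpath Saliency: mean z-scored prediction at fixation points."""
    z = (pred - pred.mean()) / (pred.std() + EPS)
    return float(z[fixation_mask.astype(bool)].mean())
```

AUC-Judd is omitted here because it requires sweeping a threshold over fixation points; see [2, 4] for its definition.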

• Qualification Competition
• We use five metrics to evaluate the prediction results. For each metric, the team with the best prediction gets full points (20%) and the team with the worst gets zero; the remaining teams receive points in linear proportion to their rank. We rank the five metrics individually, and the score computed from the five ranking lists determines the position on the leaderboard.

$$Score=\sum_{i} (n-R_{i})\cdot \frac{20}{n-1}$$
$$R_{i}: \text {ranking in each metric}$$
$$n: \text {number of teams}$$
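As a worked example of the formula above, a team's qualification score can be computed from its rank in each metric (the function name is illustrative, not part of the official tooling):

```python
def qualification_score(rankings, n_teams, full_points=20.0):
    """Score = sum over metrics of (n - R_i) * full_points / (n - 1).

    rankings: this team's rank R_i in each of the five metrics (1 = best).
    """
    return sum((n_teams - r) * full_points / (n_teams - 1) for r in rankings)
```

For example, with n = 15 teams, ranking first in all five metrics yields 5 × 20 = 100 points, while ranking last in all five yields 0.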

• In addition, during the qualification competition period, each team must submit a team composition document including the team name, leader, team members, affiliations, and contact information.

• Final Competition

[Dimensity-1000+ on the Market]

♦ vivo iQOO Z1
♦ Realme X7 Pro
♦ Oppo Reno5 Pro 5G
♦ Xiaomi Redmi K30 Ultra (We use Xiaomi Redmi K30 Ultra in this competition evaluation.)

[EcoSystem] NeuroPilot EcoSystem at GitHub

♦ Website: https://github.com/MediaTek-NeuroPilot

• Mandatory Criteria
The total preprocessing and postprocessing time of the final submission must be less than 50% of the main model's inference time (exactly 50% also fails), evaluated on the host machine.
• [Host] Estimation accuracy – 40% (8% for each accuracy metric)
We use five metrics to evaluate the prediction results. For each metric, the team with the best prediction gets full points (8%) and the team with the worst gets zero; the remaining teams receive points in linear proportion to their rank. We rank the five metrics individually, and the score computed from the five ranking lists determines the position on the leaderboard.
$$Score=\sum_{i} (n-R_{i})\cdot \frac{8}{n-1}$$
$$R_{i}: \text {ranking in each metric}$$
$$n: \text {number of teams}$$

• [Host] Model size (number of parameters × bit width used to store the parameters) – 15%
The team with the smallest model gets the full score (15%) and the team with the largest gets zero; the remaining teams receive points in linear proportion to their model size ranking.
• [Host] Model computational complexity (GOPs/frame) – 15%
The team with the smallest GOP count per frame gets the full score (15%) and the team with the largest gets zero; the remaining teams receive points in linear proportion to their GOPs ranking.
• [Device] Speed on the MediaTek Dimensity 1000+ platform – 30%
The team whose single model (without preprocessing and postprocessing) completes the prediction task in the shortest time gets full points (30%) and the team that takes the longest gets zero; the remaining teams receive points in linear proportion to their execution time ranking.
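One reading of the mandatory timing criterion above can be expressed as a simple check (the exact measurement procedure on the host machine is defined by the organizers, not by this sketch):

```python
def passes_timing_rule(pre_s, post_s, inference_s):
    """Mandatory criterion: preprocessing + postprocessing must take strictly
    less than 50% of the main model's inference time (50% or more fails)."""
    return (pre_s + post_s) < 0.5 * inference_s
```

For instance, with 1.0 s of inference, up to (but not including) 0.5 s of combined pre- and postprocessing is allowed.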

The evaluation covers the overall process, from reading the private testing dataset to producing the output file.
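Putting the four weighted criteria together, the final score follows the same rank-based scheme throughout, only with different full-point values per criterion. A sketch under the assumption that every criterion uses the same linear rank interpolation (function names are illustrative):

```python
def rank_points(rank, n_teams, full):
    """Points for one criterion: best rank gets `full`, worst gets 0, linear in between."""
    return (n_teams - rank) * full / (n_teams - 1)

def final_score(metric_ranks, size_rank, gops_rank, speed_rank, n_teams):
    """Combine the weighted criteria: accuracy 40% (8% per metric),
    model size 15%, computational complexity 15%, on-device speed 30%."""
    accuracy = sum(rank_points(r, n_teams, 8.0) for r in metric_ranks)
    return (accuracy
            + rank_points(size_rank, n_teams, 15.0)
            + rank_points(gops_rank, n_teams, 15.0)
            + rank_points(speed_rank, n_teams, 30.0))
```

Ranking first in every criterion with 15 teams yields 5 × 8 + 15 + 15 + 30 = 100 points.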

References

[1] Salient360! Challenge: https://salient360.ls2n.fr/

[2] Bylinskii, Zoya, et al. "What do different evaluation metrics tell us about saliency models?" IEEE Transactions on Pattern Analysis and Machine Intelligence 41.3 (2018): 740-757.

[3] Gutiérrez, Jesús, et al. "Toolbox and dataset for the development of saliency and scanpath models for omnidirectional/360 still images." Signal Processing: Image Communication 69 (2018): 35-42.

[4] Judd, Tilke, et al. "Learning to predict where humans look." 2009 IEEE 12th international conference on computer vision. IEEE, 2009.

#### Committee

Min-Chun Hu, National Tsing Hua University

Wan-Lun Tsai, National Tsing Hua University

Tse-Yu Pan, National Tsing Hua University

Herman Prawiro, National Tsing Hua University

CM Cheng, MediaTek

Hsien-kai Kuo, MediaTek

Min-Hung Chen, MediaTek

#### Contacts

Email: vae.challenge@gmail.com

#### Rules

• Team mergers are not allowed in this competition.
• Each team can consist of a maximum of 6 team members.
• The task is open to the public. Individuals, institutions of higher education, research institutes, enterprises, or other organizations can all sign up for the task.