Multimedia Information System Lab, National Tsing Hua University (NTHU), Taiwan
The Multimedia Information System Laboratory (MISLab) was founded in August 2012. Led by Professor Min-Chun Hu, our team aims to design original methodologies and develop practical multimedia systems to meet the diverse demands of users. Our research topics include digital signal processing, digital content analysis/editing/presentation, machine learning and artificial intelligence, computer vision and pattern recognition, human-computer interaction, computer graphics, and virtual/augmented reality.
Website: http://mislab.cs.nthu.edu.tw/
Multimedia Interaction and Intelligent System Lab, National Taiwan University of Science and Technology (NTUST), Taiwan
The Multimedia Interaction and Intelligent System Lab (MiiSLab) was founded in February 2023 at the Graduate Institute of A.I. Cross-disciplinary Tech, Industry-Academia Innovation College, National Taiwan University of Science and Technology. Led by Professor Tse-Yu Pan, our team aims to apply AI technology to develop practical systems in cross-disciplinary fields, including sports technology, smart manufacturing, interactive artworks, and video analysis.
Website: https://sites.google.com/view/mislab-ntust/home
MediaTek Inc.
MediaTek Incorporated (TWSE: 2454) is a global fabless semiconductor company that enables nearly 2 billion connected devices a year. We are a market leader in developing innovative systems-on-chip (SoC) for mobile devices, home entertainment, connectivity, and IoT products. Our dedication to innovation has positioned us as a driving market force in several key technology areas, including highly power-efficient mobile technologies, automotive solutions, and a broad range of advanced multimedia products such as smartphones, tablets, digital televisions, 5G, Voice Assistant Devices (VAD), and wearables. MediaTek empowers and inspires people to expand their horizons and achieve their goals through smart technology, more easily and efficiently than ever before. We work with the brands you love to make great technology accessible to everyone, and it drives everything we do. Visit www.mediatek.com for more information.
Industrial Technology Research Institute
ITRI is a world-leading applied technology research institute with more than 6,000 outstanding employees. Its mission is to drive industrial development, create economic value, and enhance social well-being through technology R&D. Founded in 1973, it pioneered IC development and has since nurtured new tech ventures and delivered its R&D results to industry. ITRI has set up and incubated companies such as TSMC, UMC, Taiwan Mask Corp., Epistar Corp., Mirle Automation Corp., and Taiwan Biomaterial Co.
To improve the experience of XR applications, visual attention estimation techniques have been developed to predict human intention so that the head-mounted display (HMD) can pre-render visual content to reduce rendering latency. However, most deep learning-based algorithms require heavy computation to achieve satisfactory accuracy. This is especially challenging for embedded systems with limited resources such as computing power and memory bandwidth (e.g., standalone HMDs). In addition, this research field relies on rich data to advance, yet existing datasets are still limited in number and diversity. For this challenge, we collected a set of 360° MR/VR videos along with user head pose and eye gaze signals. The goal of this competition is to encourage contestants to design lightweight visual attention estimation models that can be deployed on an embedded device with constrained resources. The developed models need to not only achieve high fidelity but also show good performance on the device.
This competition is divided into two stages: qualification and final competition.
Given the test dataset containing 360° videos, participants are asked to estimate a saliency map for each video. To be more precise, each pixel has a predicted value in the range [0, 1]. Note that the goal of this challenge is to design a lightweight deep learning model suitable for constrained embedded systems. Therefore, we focus on prediction correctness, model size, computational complexity, performance optimization, and deployment on MediaTek's Dimensity platform.
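As a concrete illustration of the [0, 1] output requirement, here is a minimal sketch of min-max normalizing a raw network output into a valid saliency map. The function name and array shapes are ours; the challenge does not prescribe a specific normalization.

```python
import numpy as np

def normalize_saliency(raw_map: np.ndarray) -> np.ndarray:
    """Min-max normalize a raw saliency map so each pixel lies in [0, 1]."""
    lo, hi = float(raw_map.min()), float(raw_map.max())
    if hi - lo < 1e-8:  # flat map: avoid division by zero
        return np.zeros_like(raw_map, dtype=np.float32)
    return ((raw_map - lo) / (hi - lo)).astype(np.float32)

# Example: a hypothetical raw network output for one video frame.
raw = np.random.rand(240, 480).astype(np.float32) * 7.3
pred = normalize_saliency(raw)
assert pred.min() >= 0.0 and pred.max() <= 1.0
```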
With MediaTek’s platform and its heterogeneous computing capabilities such as CPUs, GPUs, and APUs (AI processing units) embedded in its system-on-chip products, developers are provided with the high performance and power efficiency needed to build AI features and applications. Developers can target these specific processing units within the system-on-chip, or they can let the MediaTek NeuroPilot SDK intelligently handle the processing allocation for them. Please note that we use TensorFlow Lite in the final competition stage of this challenge.
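Since the final stage uses TensorFlow Lite, a typical starting point is to convert a trained Keras model with post-training quantization. The sketch below uses the standard TensorFlow Lite converter API; the model path is hypothetical, and the official deployment flow on the Dimensity platform may involve additional steps.

```python
import tensorflow as tf

# "saliency_model.h5" is a hypothetical path to a trained Keras model.
model = tf.keras.models.load_model("saliency_model.h5")

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Post-training quantization reduces model size and speeds up inference
# on embedded targets; float16 is a common low-risk starting point.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()

with open("saliency_model.tflite", "wb") as f:
    f.write(tflite_model)
```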
Based on each team's points in the final evaluation, the top three teams will receive the regular awards.
Please be aware that all award recipients are cordially invited to attend the awarding ceremony at ACM Multimedia Asia 2023.
All times are based on UTC+8 unless otherwise noted.
Time | Event |
---|---|
07/03/2023 | Qualification Competition Start Date; Release of Public Testing Data; Release of Private Testing Data for Qualification |
10/02/2023 12:00 PM UTC | Qualification Competition Stage End Date |
10/03/2023 12:00 AM UTC | Finalist Announcement |
10/03/2023 | Final Competition Stage Start Date |
11/06/2023 12:00 PM UTC | Final Competition Stage End Date |
The evaluation metrics are based on the Salient360! [1] evaluation metrics. For evaluating the accuracy of the predicted saliency maps, we consider five metrics [2, 3]. For each metric, the team with the best prediction gets full points (20%) and the team with the worst gets zero; the remaining teams get points in proportion to their ranking. We rank the five metrics individually, and the score calculated from the five ranking lists determines the position on the leaderboard:
$$ Score = \sum_{i} (n - R_{i}) \cdot \frac{20}{n-1} $$

where $R_{i}$ is the team's ranking for metric $i$ and $n$ is the number of teams.
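To make the scoring concrete, here is a small Python sketch of the formula above (the function name is ours). With the best rank $R_i = 1$ on a metric, a team earns the full 20 points for that metric; with the worst rank $R_i = n$, it earns zero.

```python
def challenge_score(ranks, n_teams):
    """Score = sum over metrics of (n - R_i) * 20 / (n - 1)."""
    return sum((n_teams - r) * 20.0 / (n_teams - 1) for r in ranks)

# With 6 teams: ranking 1st on all five metrics gives the maximum 100 points,
# ranking last on all five gives 0.
print(challenge_score([1, 1, 1, 1, 1], 6))  # 100.0
print(challenge_score([6, 6, 6, 6, 6], 6))  # 0.0
```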
Reference
[1] Salient360! Challenge: https://salient360.ls2n.fr/
[2] Bylinskii, Zoya, et al. "What do different evaluation metrics tell us about saliency models?" IEEE Transactions on Pattern Analysis and Machine Intelligence 41.3 (2018): 740-757.
[3] Gutiérrez, Jesús, et al. "Toolbox and dataset for the development of saliency and scanpath models for omnidirectional/360 still images." Signal Processing: Image Communication 69 (2018): 35-42.
[4] Judd, Tilke, et al. "Learning to predict where humans look." 2009 IEEE 12th International Conference on Computer Vision. IEEE, 2009.
Min-Chun Hu, National Tsing Hua University
Tse-Yu Pan, National Taiwan University of Science and Technology
Herman Prawiro, National Tsing Hua University
CM Cheng, MediaTek
Hsien-kai Kuo, MediaTek
Email: vae.challenge@gmail.com