Title: A Multi-task Learning Framework for Head Pose Estimation and Actor-Action Semantic Video Segmentation
Speaker: Dr. Yan Yan
Time: 10:00 AM, Monday, June 10, 2019
Venue: Room D318, Science and Engineering Building, Qingyuan Campus, Anhui University
Host: School of Computer Science and Technology
All faculty members and students are welcome to attend!
Office of Science and Technology
June 6, 2019
Abstract: Multi-task learning, an important branch of machine learning, has developed rapidly over the past decade. Multi-task learning methods aim to simultaneously learn classification or regression models for a set of related tasks, which typically yields better models than a learner that ignores task relationships. In this talk, we will investigate a multi-task learning framework for head pose estimation and actor-action segmentation.

(1) Head pose estimation from low-resolution surveillance data has gained in importance. However, monocular and multi-view head pose estimation approaches still perform poorly under target motion, as facial appearance is distorted by changes in camera perspective and scale when a person moves around. We propose FEGA-MTL, a novel multi-task learning framework for classifying the head pose of a person who moves freely in an environment monitored by multiple large-field-of-view surveillance cameras. After partitioning the monitored scene into a dense, uniform spatial grid, FEGA-MTL simultaneously clusters grid partitions into regions with similar facial appearance while learning region-specific head pose classifiers.

(2) Fine-grained activity understanding in videos has attracted considerable recent attention, with a shift from action classification to detailed actor-and-action understanding that serves the perceptual needs of cutting-edge autonomous systems. However, current methods for detailed understanding of actors and actions have significant limitations: they require large amounts of finely labeled data, and they fail to capture the internal relationships among actors and actions. To address these issues, we propose a novel, robust multi-task ranking model for weakly supervised actor-action segmentation, where only video-level tags are given for the training samples. Our model shares useful information among different actors and actions while learning a ranking matrix that selects representative supervoxels for actors and actions, respectively.
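For readers unfamiliar with the multi-task learning setup the abstract refers to, the sketch below illustrates only the generic idea of sharing a representation across related tasks while keeping task-specific predictors and optimizing their losses jointly. The network sizes, the two heads (a pose classifier and an actor-action labeler), and the loss weighting are illustrative assumptions, not the FEGA-MTL or ranking-model formulations presented in the talk.

```python
# Minimal multi-task learning sketch (assumed toy setup, not the talk's models):
# a shared backbone feeds two task-specific heads, and both losses are
# optimized jointly so the related tasks share features.
import torch
import torch.nn as nn

class SharedMultiTaskNet(nn.Module):
    def __init__(self, in_dim=128, hidden=64, n_pose_classes=8, n_seg_classes=5):
        super().__init__()
        # Representation shared by both tasks.
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        # Task-specific heads.
        self.pose_head = nn.Linear(hidden, n_pose_classes)   # head-pose class
        self.seg_head = nn.Linear(hidden, n_seg_classes)     # actor-action label

    def forward(self, x):
        h = self.backbone(x)
        return self.pose_head(h), self.seg_head(h)

if __name__ == "__main__":
    model = SharedMultiTaskNet()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    ce = nn.CrossEntropyLoss()

    # Toy batch: 32 feature vectors with labels for both tasks.
    x = torch.randn(32, 128)
    pose_y = torch.randint(0, 8, (32,))
    seg_y = torch.randint(0, 5, (32,))

    pose_logits, seg_logits = model(x)
    # Joint objective: a weighted sum of the per-task losses.
    loss = ce(pose_logits, pose_y) + 0.5 * ce(seg_logits, seg_y)
    loss.backward()
    opt.step()
    print(float(loss))
```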

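The second part of the abstract mentions learning a ranking matrix to pick representative supervoxels from videos that carry only video-level tags. The toy sketch below shows just the scoring-and-selection step under assumed feature dimensions and a random ranking matrix; in the actual model the ranking matrix is learned jointly with the information sharing across actors and actions described in the talk.

```python
# Hedged sketch of supervoxel selection under weak (video-level) supervision:
# a ranking matrix scores every supervoxel for each actor-action class, and
# the top-ranked supervoxels of the tagged classes are kept as representative.
# Shapes, the random ranking matrix, and top-k selection are assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_supervoxels, feat_dim, n_classes = 200, 64, 7   # toy sizes
X = rng.normal(size=(n_supervoxels, feat_dim))    # supervoxel features
W = rng.normal(size=(feat_dim, n_classes))        # ranking matrix (random here)

video_tags = [1, 4]   # video-level labels: only these classes appear in the video
top_k = 10            # representative supervoxels to keep per tagged class

scores = X @ W        # (n_supervoxels, n_classes) ranking scores

selected = {}
for c in video_tags:
    # Highest-scoring supervoxels are treated as representative of class c.
    selected[c] = np.argsort(scores[:, c])[::-1][:top_k]

for c, idx in selected.items():
    print(f"class {c}: supervoxels {idx.tolist()}")
```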



