Tutorials



Video Summarization and Re-use Technologies and Tools

Contacts:

  • Vasileios Mezaris, CERTH-ITI, Greece
  • Lyndon Nixon, MODUL Technology, Austria

Abstract:

This tutorial will deliver a broad overview of the main technologies that enable the automatic generation of video summaries for re-use in different distribution channels, and the optimisation of audience reach and engagement based on those summaries; it will also provide an in-depth analysis of selected state-of-the-art (SoA) methods and tools on these topics. It will comprise two main modules. The first module, on video summary generation, will provide an overview of deep-learning-based video summarization techniques, and then discuss in depth a few selected SoA techniques that are based on Generative Adversarial Networks. Special emphasis will be placed on unsupervised learning techniques, whose advantages will also be elaborated upon. An overview of video summarization datasets, evaluation protocols and related considerations and limitations will also be presented. The second module, on video summary (re-)use and recommendation, will discuss the use of Web and social media analysis to detect topics in online content and trends in online discussion. It will subsequently examine the application of predictive analytics to suggest future trending topics, in order to guide video summary publication strategies. Besides the underlying technologies, a few complete tools will be demonstrated, to link the research aspects of video summarization, trend detection and predictive analytics with practitioners’ expectations and needs for video summarization and (re-)publication online. The tutorial’s target audience includes researchers in video summarization, deep learning and, more generally, deep-learning-based multimedia understanding; researchers in web and social media data analysis, topic and trend detection, and predictive analytics; and practitioners in video content creation and (re-)use, including YouTube/Instagram prosumers, TV and film producers, and representatives of broadcasters and online media platforms.
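
To make the frame-scoring idea behind learning-based summarization concrete, the following is a minimal, hypothetical sketch (not the tutorial's actual GAN-based methods): a small learned model assigns an importance score to each frame and a summary is built greedily under a length budget. The feature dimension, scorer architecture and 15% budget are illustrative assumptions.

```python
# Minimal sketch (not the tutorial's actual method): score video frames with a
# small learned importance model and build a summary under a length budget.
# Frame features, the scorer architecture and the 15% budget are illustrative
# assumptions, not taken from the tutorial.
import torch
import torch.nn as nn

class FrameScorer(nn.Module):
    """Toy importance scorer: per-frame score in [0, 1] from CNN features."""
    def __init__(self, feat_dim=1024, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Sequential(nn.Linear(2 * hidden, 1), nn.Sigmoid())

    def forward(self, feats):              # feats: (1, T, feat_dim)
        h, _ = self.rnn(feats)
        return self.head(h).squeeze(-1)    # (1, T) importance scores

def select_summary(scores, budget_ratio=0.15):
    """Greedy keyframe selection: keep the highest-scoring frames within the budget."""
    t = scores.shape[-1]
    k = max(1, int(budget_ratio * t))
    return torch.topk(scores, k, dim=-1).indices.sort().values

if __name__ == "__main__":
    feats = torch.randn(1, 120, 1024)      # stand-in for per-frame CNN features
    scorer = FrameScorer()
    with torch.no_grad():
        scores = scorer(feats)
    print(select_summary(scores))          # indices of frames kept in the summary
```

In the unsupervised GAN-based methods covered by the tutorial, such a scorer is typically trained without ground-truth summaries, e.g. by encouraging the selected frames to allow reconstruction of the full video; the sketch above only illustrates the scoring-and-selection skeleton.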

Speaker bios:

Vasileios Mezaris is a Research Director (Senior Researcher Grade A) with the Information Technologies Institute / Centre for Research and Technology Hellas, Thessaloniki, Greece. His research interests include multimedia understanding and artificial intelligence; in particular, image and video analysis and annotation, machine learning and deep learning for multimedia understanding and big data analytics, multimedia indexing and retrieval, and applications of multimedia understanding and artificial intelligence in specific domains (including TV broadcasting and news, education and culture, and medical / ecological / business data analysis). Dr. Mezaris has co-authored more than 40 papers in refereed journals, 20 book chapters, 150 papers in international conferences, and 3 patents. He has edited two books and several proceedings volumes; he serves as Associate Editor for the IEEE Signal Processing Letters (2016-present) and the IEEE Transactions on Multimedia (2012-2015, and 2018-present); and serves regularly as a guest editor for international journals, as an organizer or reviewer for conferences/workshops, and as a reviewer of research projects and project proposals for national and international funding agencies. He has participated in many research projects, and has served as the Coordinator of the EC H2020 projects InVID and MOVING. He is a Senior Member of the IEEE.

Lyndon J B Nixon is the CTO of MODUL Technology GmbH. He also holds the position of Assistant Professor in the New Media Technology group at MODUL University. He has been researching in the semantic multimedia domain since 2001. His PhD (2007) was on the automatic generation of multimedia presentations using semantics. He has been active in many European and Austrian projects, including in the role of Scientific Coordinator (LinkedTV) and Project Coordinator (ReTV, SOFI, MediaMixer, SmartReality, ConnectME). He is a proponent of “Linked Media” – ensuring rich semantic annotations of multimedia assets so that systems can derive associations between them for search, browsing, navigation or recommendation – and has co-organized a series of Linked Media workshops (WWW2013, ESWC2014, WWW2015, ESWC2016). These are among the more than 40 events he has co-chaired, complemented by 27 talks, 8 book chapters, 6 journal articles and 88 refereed publications. Currently he focuses his research on content analysis of images and video in social networks, semantic annotation and linking of media fragments, and combining annotations and data analytics in prediction and recommendation for TV programming.

Schedule:

Monday, July 6 – London (BST time zone)

Start–End      Talk
10:00–10:20    PART 1: Automatic Video Summarization
               Video Summarization Problem Definition and Literature Overview
10:20–10:25    Short Break and Questions
10:25–10:45    In-Depth Discussion on a Few Unsupervised GAN-based Methods
10:45–10:50    Short Break and Questions
10:50–11:10    Datasets, Evaluation Protocols and Results, and Future Directions
11:10–11:30    Break and Questions
11:30–11:50    PART 2: Video Summaries (Re-)use and Recommendation
               Optimal Usage of Summarization in Media Collections for Digital Marketing
11:50–11:55    Short Break and Questions
11:55–12:15    Finding the Best Topic for Selecting a Media Asset for Summarization
12:15–12:20    Short Break and Questions
12:20–12:40    Recommending and Scheduling Media Summaries for Publication
12:40–13:00    Final Questions Session



Deep Bayesian Modeling and Learning

Contacts:

  • Jen-Tzung Chien, National Chiao Tung University, Taiwan

Abstract:

This tutorial addresses the advances in deep Bayesian learning for spatial and temporal data, which are ubiquitous in speech, music, text, image, video, web, communication and networking applications. Multimedia contents are analyzed and represented to fulfill a variety of tasks, including classification, synthesis, generation, segmentation, dialogue, search, recommendation, summarization, question answering, captioning, mining, translation and adaptation, to name a few. Traditionally, “deep learning” is taken to be a learning process where the inference or optimization is based on a real-valued deterministic model. The “latent semantic structure” in words, sentences, images, actions, documents or videos learned from data may not be well expressed or correctly optimized in mathematical logic or computer programs. The “distribution function” in discrete or continuous latent variable models for spatial and temporal sequences may not be properly decomposed or estimated. This tutorial addresses the fundamentals of statistical models and neural networks, and focuses on a series of advanced Bayesian models and deep models including Bayesian nonparametrics, recurrent neural network, sequence-to-sequence model, variational auto-encoder (VAE), generative adversarial network, attention mechanism, memory-augmented neural network, skip neural network, temporal difference VAE, stochastic neural network, stochastic temporal convolutional network, predictive state neural network, and policy neural network. Enhancing the prior/posterior representation is also addressed. We present how these models are connected and why they work for a variety of applications on symbolic and complex patterns in sequence data. Variational inference and sampling methods are formulated to tackle the optimization of complicated models. The embeddings, clustering or co-clustering of words, sentences or objects are merged with linguistic and semantic constraints. A series of case studies is presented to tackle different issues in deep Bayesian modeling and learning. Finally, we will point out a number of directions and outlooks for future studies.
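
Since the variational auto-encoder and variational inference are central to the tutorial, the following is a minimal, generic VAE sketch showing the reparameterization trick and the (negative) ELBO objective; the layer sizes and the Bernoulli-style reconstruction loss are illustrative assumptions, not the tutorial's specific models.

```python
# Minimal, generic VAE sketch illustrating the reparameterization trick and the
# ELBO (reconstruction + KL) objective; layer sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=16, hidden=256):
        super().__init__()
        self.enc = nn.Linear(x_dim, hidden)
        self.mu = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, x_dim))

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization
        return self.dec(z), mu, logvar

def neg_elbo(x, x_hat, mu, logvar):
    recon = F.binary_cross_entropy_with_logits(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL(q(z|x) || N(0, I))
    return recon + kl                                             # negative ELBO

if __name__ == "__main__":
    x = torch.rand(8, 784)                      # stand-in for (binarized) image data
    model = VAE()
    x_hat, mu, logvar = model(x)
    print(neg_elbo(x, x_hat, mu, logvar))
```

The recurrent and temporal-difference VAE variants discussed in the tutorial extend exactly this encoder/decoder-with-KL structure to sequences.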

Speaker bio:

Jen-Tzung Chien is the Chair Professor at the National Chiao Tung University, Taiwan. He held a Visiting Professor position at the IBM T. J. Watson Research Center, Yorktown Heights, NY, in 2010. His research interests include machine learning, deep learning, computer vision and natural language processing. Dr. Chien served as an associate editor of the IEEE Signal Processing Letters in 2008-2011, as general co-chair of the IEEE International Workshop on Machine Learning for Signal Processing in 2017, and as a tutorial speaker at ICASSP (2012, 2015, 2017), INTERSPEECH (2013, 2016), COLING (2018), and AAAI, ACL, KDD and IJCAI (2019). He received the Best Paper Award of the IEEE Automatic Speech Recognition and Understanding Workshop in 2011 and the AAPM Farrington Daniels Award in 2018. He has published extensively, including the books “Bayesian Speech and Language Processing” (Cambridge University Press, 2015) and “Source Separation and Machine Learning” (Academic Press, 2018). He is currently serving as an elected member of the IEEE Machine Learning for Signal Processing Technical Committee.

Schedule:

Monday, July 6 – London (BST time zone)

Start–End      Talk
10:00–10:43    PART 1
               1. Introduction
               1.1 Motivation and Background
               1.2 Probabilistic Model
               1.3 Neural Network
               2. Bayesian Learning
               2.1 Inference and Optimization
               2.2 Variational Bayesian Inference
10:43–10:45    QA Break
10:45–11:23    PART 2
               2.3 Markov Chain Monte Carlo Inference
               3. Deep Sequential Learning
               3.1 Deep Unfolded Topic Model
               3.2 Gated Recurrent Neural Network
               3.3 Bayesian Recurrent Neural Network
               3.4 Memory-Augmented Neural Network
11:23–11:25    QA Break
11:25–12:14    PART 3
               3.5 Sequence-to-Sequence Learning
               3.6 Convolutional Neural Network
               3.7 Dilated Neural Network
               3.8 Attention Network using Transformer
               4. Deep Bayesian Learning
               4.1 Variational Auto-Encoder
               4.2 Variational Recurrent Auto-Encoder
12:14–12:15    QA Break
12:15–12:58    PART 4
               4.3 Hierarchical Variational Auto-Encoder
               4.4 Stochastic Recurrent Neural Network
               4.5 Regularized Recurrent Neural Network
               4.6 Skip Recurrent Neural Network
               4.7 Markov Recurrent Neural Network
               4.8 Temporal Difference Variational Auto-Encoder
               4.9 Further Challenges and Advances
               5. Summary and Future Trends
12:58–13:00    QA Break



Immersive Imaging Technologies: from Capture to Display

Contacts:

  • Dr. Martin Alain – Trinity College Dublin, Ireland
  • Dr. Cagri Ozcinar – Trinity College Dublin, Ireland
  • Dr. Emin Zerman – Trinity College Dublin, Ireland

Abstract:

The advances in imaging technologies in the last decade have brought a number of alternatives to the way we acquire and display visual information. These new imaging technologies are immersive, as they provide the viewer with more information which either surrounds the viewer or helps the viewer become immersed in an augmented representation. These immersive imaging technologies include light fields, omnidirectional images and videos, and volumetric (also known as free-viewpoint) videos. These different modalities cover the full spectrum of immersive imaging, from 3 degrees of freedom (DoF) to 6DoF, and can be used for virtual reality (VR) as well as augmented reality (AR). Applications of immersive imaging notably include education, cultural heritage, tele-immersion, remote collaboration, and communication. In this tutorial, we cover all stages of the immersive imaging pipeline, from content capture to display. The main concepts of immersive imaging will first be introduced, and creative experiments based on immersive imaging will be presented as a specific illustration of these technologies. Next, content acquisition based on single or multiple camera systems is presented, along with the corresponding data formats. Content coding is then discussed, notably ongoing standardisation efforts, followed by adaptive streaming strategies. Immersive imaging displays are then presented, as they play a crucial role in the user’s sense of immersion. Image rendering algorithms related to such displays are also explained. Finally, perception and quality evaluation of immersive imaging is presented.
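
As a small illustration of the 3DoF geometry underlying omnidirectional (360°) video rendering, the sketch below maps equirectangular pixel coordinates to viewing directions on the unit sphere. The longitude/latitude axis conventions are illustrative assumptions; real renderers and data formats may differ.

```python
# Minimal sketch of the basic 3DoF geometry behind omnidirectional video:
# mapping equirectangular (ERP) pixel coordinates to viewing directions on the
# unit sphere. The axis conventions chosen here are illustrative assumptions.
import numpy as np

def equirect_to_directions(width, height):
    """Return a (height, width, 3) array of unit view vectors for an ERP image."""
    u = (np.arange(width) + 0.5) / width           # horizontal position in [0, 1]
    v = (np.arange(height) + 0.5) / height         # vertical position in [0, 1]
    lon = (u - 0.5) * 2.0 * np.pi                  # longitude in [-pi, pi]
    lat = (0.5 - v) * np.pi                        # latitude in [-pi/2, pi/2]
    lon, lat = np.meshgrid(lon, lat)
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)

if __name__ == "__main__":
    dirs = equirect_to_directions(8, 4)
    print(dirs.shape, np.allclose(np.linalg.norm(dirs, axis=-1), 1.0))
```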

Keywords: immersive imaging, emerging media, light fields, omnidirectional videos, volumetric videos, 3DoF, 6DoF

URL to Tutorial Website:

https://v-sense.scss.tcd.ie/lectures/tutorial-on-immersive-imaging-technologies/

Speaker bios:

Dr. Martin Alain received the Master’s degree in electrical engineering from the Bordeaux Graduate School of Engineering (ENSEIRB-MATMECA), Bordeaux, France in 2012 and the PhD degree in signal processing and telecommunications from University of Rennes 1, Rennes, France in 2016. As a PhD student working in Technicolor and INRIA in Rennes, France, he explored novel image and video compression algorithms.
Since September 2016, he has been a postdoctoral researcher in the V-SENSE project at the School of Computer Science and Statistics in Trinity College Dublin, Ireland. His research interests lie at the intersection of signal and image processing, computer vision, and computer graphics. His current work involves light field imaging, with a focus on denoising, super-resolution, compression, scene reconstruction, and rendering.
Martin is a reviewer for the Irish Machine Vision and Image Processing conference, IEEE International Conference on Image Processing, IEEE Transactions on Image Processing, IEEE Transactions on Circuits and Systems I, and IEEE Transactions on Circuits and Systems for Video Technology. He was a special session chair at EUSIPCO 2018 in Rome, ICIP 2019 in Taipei, and ICME 2020 in London.

Dr. Cagri Ozcinar has been a research fellow within the V-SENSE project at Trinity College Dublin, Ireland, since July 2016. Before he joined the V-SENSE team, he was a post-doctoral fellow in the Multimedia group at Institut Mines-Telecom Telecom ParisTech, Paris, France.
Cagri received the M.Sc. (Hons.) and the Ph.D. degrees in electronic engineering from the University of Surrey, UK, in 2010 and 2015, respectively. His current research interests include visual attention (saliency), coding, streaming, and computer vision for immersive audio-visual technologies.
Cagri has been serving as a reviewer for a number of journal and conference proceedings, such as IEEE TIP, IEEE TCSVT, IEEE TMM, IEEE Journal of STSP, CVPR, IEEE ICASSP, IEEE ICIP, IEEE QoMEX, IEEE MMSP, EUSIPCO, and BMVC. Cagri has been involved in organizing workshops, challenges, and special sessions. He was a special session chair on recent advances in immersive imaging technologies at the EUSIPCO 2018, ICIP 2019, and ICME 2020.

Dr. Emin Zerman has been a postdoctoral research fellow in the V-SENSE project at the School of Computer Science and Statistics, Trinity College Dublin, Ireland, since February 2018. He received his Ph.D. degree (2018) in Signals and Images from Télécom ParisTech, France, and his M.Sc. degree (2013) and B.Sc. degree (2011) in Electrical and Electronics Engineering from the Middle East Technical University, Turkey. His research interests include image and video processing, immersive multimedia applications, human visual perception, high dynamic range imaging, and multimedia quality assessment. Emin is a member of the Institute of Electrical and Electronics Engineers (IEEE) and the IEEE Signal Processing Society. He has been acting as a reviewer for several conferences and peer-reviewed journals, including Signal Processing: Image Communication, IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), IEEE Transactions on Image Processing (TIP), MDPI Journal of Imaging, MDPI Applied Sciences, IEEE International Workshop on Multimedia Signal Processing (MMSP), European Signal Processing Conference (EUSIPCO), and IEEE International Conference on Image Processing (ICIP). He is one of the special session organisers at ICME 2020 in London.

Schedule:

Monday, July 6 – London (BST time zone)

Start–End      Talk
14:00–14:20    Part 1: Immersive Imaging Technologies
14:20–15:00    Part 2: Acquisition and Data Format
15:00–15:10    Q&A / Break
15:10–15:50    Part 3: Content Delivery
15:50–16:10    Part 4: Rendering and Display Technologies
16:10–16:20    Q&A / Break
16:20–16:55    Part 5: Perception & Quality Evaluation
16:55–17:00    Q&A


Versatile Video Coding – Algorithms and Specification

Contacts:

  • Mathias Wien, RWTH Aachen University, Germany
  • Benjamin Bross, Fraunhofer Heinrich Hertz Institute (HHI), Germany

Abstract:

The tutorial provides an overview of the latest emerging video coding standard VVC (Versatile Video Coding), to be jointly published by ITU-T and ISO/IEC. It has been developed by the Joint Video Experts Team (JVET), consisting of ITU-T Study Group 16 Question 6 (known as VCEG) and ISO/IEC JTC 1/SC 29/WG 11 (known as MPEG). VVC has been designed to achieve significantly improved compression capability compared to previous standards such as HEVC, and at the same time to be highly versatile for effective use in a broadened range of applications. Key application areas for VVC include ultra-high-definition video (e.g. 4K or 8K resolution), video with a high dynamic range and wide colour gamut (e.g., with transfer characteristics specified in Rec. ITU-R BT.2100), and video for immersive media applications such as 360° omnidirectional video, in addition to the applications that have commonly been addressed by prior video coding standards. Important design criteria for VVC have been low computational complexity on the decoder side and friendliness for parallelization on various algorithmic levels. VVC is planned to be finalized by July 2020 and is expected to enter the market very soon.
The tutorial details the video layer coding tools specified in VVC and develops the concepts behind the selected design choices. While many tools or variants thereof have been available before, the VVC design includes many improvements over previous standards which result in compression gains and implementation friendliness. Furthermore, new tools such as the Adaptive Loop Filter and Matrix-based Intra Prediction have been adopted, which contribute significantly to the overall performance. The high-level syntax of VVC has been re-designed compared to previous standards such as HEVC, in order to enable dynamic sub-picture access as well as major scalability features already in version 1 of the specification.
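
As background for how an encoder chooses among the coding tools that the tutorial discusses, the sketch below shows the Lagrangian rate-distortion cost J = D + λ·R commonly used for encoder-side mode decision. This is an encoder concept, not part of the normative VVC specification, and the candidate modes and numbers are made up for illustration.

```python
# Illustrative sketch only: Lagrangian rate-distortion cost J = D + lambda * R
# used in encoder mode decision (an encoder-side concept, not normative VVC).
# Candidate modes, distortions and rates below are invented example values.
def rd_cost(distortion, rate_bits, lmbda):
    """Lagrangian cost combining distortion (e.g. SSE) and rate in bits."""
    return distortion + lmbda * rate_bits

def pick_best_mode(candidates, lmbda=30.0):
    """candidates: list of (mode_name, distortion, rate_bits); return the cheapest."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lmbda))

if __name__ == "__main__":
    modes = [("intra_planar", 5200.0, 96), ("intra_dc", 6100.0, 80),
             ("matrix_based_intra", 4300.0, 140)]
    print(pick_best_mode(modes))   # mode with the lowest Lagrangian cost
```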

Speaker bio:

Mathias received the Diploma and Dr.-Ing. degrees from Rheinisch-Westfälische Technische Hochschule Aachen (RWTH Aachen University), Aachen, Germany, in 1997 and 2004, respectively. In 2018, he completed his habilitation, establishing him as an independent scientist in the field of visual media communication. He was with the Institut für Nachrichtentechnik, RWTH Aachen University (head: Prof. Jens-Rainer Ohm) as a researcher from 1997-2006, and as senior researcher and head of administration from 2006-2018. Since July 2018, he has been with the Lehrstuhl für Bildverarbeitung, RWTH Aachen University (head: Prof. Dorit Merhof) as senior researcher, leader of the Visual Media Communication group, and head of administration. His research interests include image and video processing; immersive, space-frequency adaptive and scalable video compression; and robust video transmission. Mathias has been an active contributor to H.264/AVC, HEVC, and VVC. He has participated in and contributed to ITU-T VCEG, ISO/IEC MPEG, the Joint Video Team (JVT), the Joint Collaborative Team on Video Coding (JCT-VC), and the Joint Video Experts Team (JVET) of VCEG and ISO/IEC MPEG. He has served as a co-editor of the scalability amendment to H.264/AVC (SVC). In the aforementioned standardization bodies, he has co-chaired and coordinated several ad hoc groups as well as tool and core experiments. Mathias has published more than 60 scientific articles and conference papers in the area of video coding and has co-authored several patents in this area. Mathias is a member of the IEEE Signal Processing Society and the IEEE Circuits and Systems Society. He is a member of the IEEE CASS TC VSPC. He was Technical Program Co-Chair of PCS 2019 and has co-organized and co-chaired special sessions at IEEE VCIP and PCS. He was the Corresponding Guest Editor of an IEEE JETCAS Special Issue on Immersive Video Coding and Transmission. He co-organized and co-chaired the Grand Challenge on Video Compression Technology at IEEE ICIP 2017. He serves as associate editor for IEEE Transactions on Circuits and Systems for Video Technology, and Signal Processing: Image Communication. Mathias has further authored and co-authored more than 200 standardization documents. He has published the Springer textbook “High Efficiency Video Coding: Coding Tools and Specification”, which fully covers Version 1 of HEVC.

Benjamin Bross received the Dipl.-Ing. degree in electrical engineering from RWTH Aachen University, Aachen, Germany, in 2008. In 2009, he joined the Fraunhofer Institute for Telecommunications – Heinrich Hertz Institute, Berlin, Germany, where he is currently heading the Video Coding Systems group at the Video Coding & Analytics Department, and in 2011 he became a part-time lecturer at the HTW University of Applied Sciences Berlin. Since 2010, Benjamin has been very actively involved in the ITU-T VCEG | ISO/IEC MPEG video coding standardization processes as a technical contributor, coordinator of core experiments and chief editor of the High Efficiency Video Coding (HEVC) standard [ITU-T H.265 | ISO/IEC 23008-2] and the emerging Versatile Video Coding (VVC) standard. In addition to his involvement in standardization, Benjamin is coordinating standard-compliant software implementation activities. This includes the development of an HEVC encoder that is currently deployed in broadcast for HD and UHD TV channels. Besides giving talks about recent video coding technologies, Benjamin Bross is an author or co-author of several fundamental HEVC and VVC-related publications, and an author of two book chapters on HEVC and Inter-Picture Prediction Techniques in HEVC. He received the IEEE Best Paper Award at the 2013 IEEE International Conference on Consumer Electronics – Berlin, the SMPTE Journal Certificate of Merit in 2014 and an Emmy Award at the 69th Engineering Emmy Awards in 2017 as part of the Joint Collaborative Team on Video Coding for its development of HEVC.

Schedule:

Friday, July 10 – London (BST time zone)

Start–End      Talk
10:00–10:30    Video: Introduction / Specification / Standardization / Video Coding Systems
10:30–10:35    Break / Interactive Questions
10:35–11:15    Video: Coding Structures and High-Level Syntax / Coding Tools I
11:15–11:25    Break / Interactive Questions
11:25–12:05    Video Coding Tools II
12:05–12:10    Break / Interactive Questions
12:10–12:50    Video Coding Tools III / Performance and Versatility
12:50–13:00    Discussion / Interactive Questions



Device Fingerprinting and its Applications in Multimedia Forensics and Security

Contacts:

  • Chang-Tsun Li, Deakin University, Australia

Abstract:

Similar to people identification through human fingerprint analysis, multimedia forensics and security assurance through device fingerprint analysis have attracted much attention amongst scientists, practitioners and law enforcement agencies around the world in the past decade. Device information, such as device models and serial numbers, stored in the EXIF metadata is useful for identifying the devices responsible for the creation of the images and videos in question. However, being stored separately from the content, the EXIF metadata can be removed and manipulated with ease. Device fingerprints deposited in the content by the devices themselves provide a more reliable alternative to aid forensic investigations and multimedia assurance. Various hardware and software components of imaging devices leave model-specific or device-specific artifacts in the content during the digital image acquisition process. These artifacts, if properly extracted, can be used as device fingerprints to identify the source devices. This tutorial will start with an introduction to various types of device fingerprints. The presentation will then focus on sensor pattern noise, which is currently the only form of device fingerprint that can differentiate individual devices of the same model. We will also discuss the real-world applications of sensor pattern noise to source device verification, common source inference, source device identification, content authentication (including fake news detection) and source-oriented image clustering. Some real-world use cases in the law enforcement community will also be presented. Finally, we will discuss the limitations of existing device fingerprints and point out a few lines of future investigation, including the use of deep learning to infer device fingerprints.
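
To make the sensor-pattern-noise idea concrete, the following is a minimal sketch of source verification: extract a noise residual (image minus a denoised version) and correlate it with a camera's reference fingerprint. A Gaussian filter stands in for the wavelet-based denoisers typically used in practice, and the decision threshold and synthetic data are illustrative assumptions.

```python
# Minimal sketch of sensor-pattern-noise style source verification: extract a
# noise residual (image minus a denoised version) and correlate it with a
# camera's reference fingerprint. A Gaussian filter stands in for the wavelet
# denoisers used in practice; the decision threshold is an illustrative value.
import numpy as np
from scipy.ndimage import gaussian_filter

def noise_residual(image, sigma=1.5):
    """Approximate the sensor noise left in a grayscale, float-valued image."""
    return image - gaussian_filter(image, sigma)

def normalized_correlation(a, b):
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def same_source(image, reference_fingerprint, threshold=0.01):
    """Crude verification: does the image's residual match the camera fingerprint?"""
    return normalized_correlation(noise_residual(image), reference_fingerprint) > threshold

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fingerprint = rng.normal(0, 0.02, (128, 128))          # stand-in camera PRNU pattern
    scene = gaussian_filter(rng.uniform(0, 1, (128, 128)), 3)
    image = scene + fingerprint                             # image "taken" by the camera
    print(same_source(image, fingerprint))                  # expected: True
```

The tutorial's module on sensor pattern noise extraction and enhancement covers the more robust estimators and detectors used in real forensic pipelines.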

Speaker bio:

Chang-Tsun Li received the BSc degree in electrical engineering from National Defence University (NDU), Taiwan, in 1987, the MSc degree in computer science from the U.S. Naval Postgraduate School, USA, in 1992, and the PhD degree in computer science from the University of Warwick, UK, in 1998. He was an associate professor of the Department of Electrical Engineering at NDU during 1998-2002 and a visiting professor of the Department of Computer Science at the U.S. Naval Postgraduate School in the second half of 2001. He was a professor of the Department of Computer Science at the University of Warwick, UK, until Dec 2016. He was a professor of the School of Computing and Mathematics, and Director of the Data Science Research Unit, Charles Sturt University, Australia, from January 2017 to February 2019. He is currently Professor of Cyber Security in the School of Information Technology at Deakin University, Australia, and Research Director of Deakin’s Centre for Cyber Security Research and Innovation. His research interests include multimedia forensics and security, biometrics, data mining, machine learning, data analytics, computer vision, image processing, pattern recognition, bioinformatics, and content-based image retrieval. The outcomes of his multimedia forensics research have been translated into award-winning commercial products protected by a series of international patents and have been used by a number of law enforcement agencies, national security institutions, courts of law, banks and companies around the world. He is currently Associate Editor of IEEE Access and the EURASIP Journal on Image and Video Processing (JIVP), and Associate Editor of IET Biometrics. He has published over 200 papers in prestigious international journals and conference proceedings, including a winner of the 2018 IEEE AVSS Best Paper Award. He has contributed actively to the organisation of many international conferences and workshops and has also served as a member of the international program committees for numerous international conferences. He is also actively disseminating his research outcomes through keynote speeches, tutorials and talks at various international events.

Schedule:

Friday, July 10 – London (BST time zone)

Start–End      Talk
11:00–11:25    1. Device Fingerprints
               1.1 Lens Aberrations
               1.2 Colour Filter Array and Colour Interpolation Artefacts
               1.3 Camera Response Function
               1.4 Quantisation Table of JPEG Compression
               1.5 Sensor Pattern Noise
11:25–11:55    2. Sensor Pattern Noise Extraction and Enhancement
               2.1 Sensor Pattern Noise Extraction
               2.2 Sensor Pattern Noise Enhancement
11:55–12:05    Break
12:05–12:35    3. SPN in Multimedia Forensic Applications
               3.1 Source Device Verification
               3.2 Common Source Inference
               3.3 Source Device Identification
               3.4 Content Authentication (including fake news detection)
               3.5 Source-Oriented Image Clustering
12:35–13:00    4. Conclusions and Future Works
               4.1 Conclusions
               4.2 Issues Surrounding Existing Device Fingerprints
               4.3 Future Works (including the use of deep learning)



Point Cloud Coding: the Status Quo

Contacts:

  • João Ascenso, Instituto Superior Técnico, Lisbon, Portugal
  • Fernando Pereira, Instituto Superior Técnico, Lisbon, Portugal

Abstract:

Recently, 3D visual representation models such as light fields and point clouds have become popular due to their capability to represent the real world in a more complete, realistic and immersive way, paving the road for new and more advanced visual experiences. The point cloud (PC) representation model is able to efficiently represent the surface of objects/scenes by means of a set of 3D points and associated attributes, and is increasingly being used in applications ranging from autonomous cars to augmented reality. Emerging imaging sensors have made it easier to perform richer and denser PC acquisitions, notably with millions of points, making it impractical to store and transmit these very large amounts of data in raw form. This bottleneck has raised the need for efficient PC coding solutions that can offer immersive visual experiences and good quality of experience.
This tutorial will survey the most relevant PC basics as well as the main PC coding solutions available today. Regarding the content of this tutorial, it is important to highlight: 1) a new classification taxonomy for PC coding solutions to more easily identify and abstract their differences, commonalities and relationships; 2) representative static and dynamic PC coding solutions available in the literature, such as octree-, transform- and graph-based PC coding, among others; 3) the MPEG PC coding standards which have been recently developed, notably Video-based Point Cloud Coding (V-PCC), for dynamic content, and Geometry-based Point Cloud Coding (G-PCC), for static and dynamically acquired content; 4) rate-distortion (RD) performance evaluation including the G-PCC and V-PCC standards and other relevant PC coding solutions, using suitable objective quality metrics. The tutorial will end with some discussion on the strengths and weaknesses of the current PC coding solutions as well as on future trends and directions.
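
As a small illustration of the geometry representation behind octree-based PC coding (one of the approaches surveyed, and the basis of G-PCC geometry coding), the sketch below quantizes 3D points into the occupied voxels of a cube of side 2^depth; it is the occupancy of such voxels that an octree coder then entropy-codes. The depth and bounding-box handling are illustrative simplifications.

```python
# Minimal sketch of octree-style geometry quantization: map 3D points into the
# occupied voxels of a cube of side 2**depth, the representation whose occupancy
# an octree-based coder then entropy-codes. Depth and bounding box handling are
# illustrative simplifications, not the G-PCC specification.
import numpy as np

def voxelize(points, depth=6):
    """Quantize Nx3 points to unique occupied voxel indices at the given octree depth."""
    mins = points.min(axis=0)
    span = (points.max(axis=0) - mins).max() + 1e-9
    grid = (points - mins) / span * (2 ** depth - 1)         # normalize into the cube
    occupied = np.unique(np.floor(grid).astype(np.int64), axis=0)
    return occupied                                           # one row per occupied voxel

if __name__ == "__main__":
    pts = np.random.default_rng(1).uniform(0, 1, (10000, 3))  # stand-in point cloud
    vox = voxelize(pts, depth=5)
    print(len(pts), "points ->", len(vox), "occupied voxels")
```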

Speaker bios:

João Ascenso is a professor in the Department of Electrical and Computer Engineering of Instituto Superior Técnico and is with the Multimedia Signal Processing Group of Instituto de Telecomunicações, Lisbon, Portugal. João Ascenso received the E.E., M.Sc. and Ph.D. degrees in Electrical and Computer Engineering from Instituto Superior Técnico in 1999, 2003 and 2010, respectively. In the past, he was an adjunct professor at Instituto Superior de Engenharia de Lisboa and Instituto Politécnico de Setúbal. He coordinates several national and international research projects in the areas of coding, analysis and description of video. The last project grants received were in the field of point cloud coding and quality assessment. He is also very active in the ISO/IEC MPEG and JPEG standardization activities and currently chairs the JPEG-AI ad-hoc group, which targets the evaluation and development of learning-based image compression solutions. He has published more than 100 papers in international conferences and journals and has more than 3200 citations over 35 papers (h-index of 25). He is an associate editor of IEEE Transactions on Multimedia and IEEE Transactions on Image Processing, and was an associate editor of the IEEE Signal Processing Letters. He is an elected member of the IEEE Multimedia Signal Processing Technical Committee. He acts as a member of the Organizing Committees of well-known IEEE international conferences, such as MMSP 2020, ICME 2020, ISM 2018, and QoMEX 2016, among others. He has also served as a technical program committee member and area chair for several widely known conferences in the multimedia signal processing field, such as ICIP, MMSP and ICME, and has given invited talks and tutorials at conferences and workshops. He has received two Best Paper Awards, at the 31st Picture Coding Symposium 2015 in Cairns, Australia, and at the IEEE International Conference on Multimedia and Expo 2019. He has also won the ‘Excellent Professor’ award from the Electrical and Computer Engineering Department of Instituto Superior Técnico several times. His current research interests include visual coding, quality assessment, light fields, point clouds and holography processing, indexing and searching of audio-visual content, and visual sensor networks.

Fernando Pereira is currently with the Department of Electrical and Computer Engineering of Instituto Superior Técnico and with Instituto de Telecomunicações, Lisbon, Portugal. He is responsible for the participation of IST in many national and international research projects. He often acts as a project evaluator and auditor for various organizations. He is Area Editor of the Signal Processing: Image Communication journal and Associate Editor of the EURASIP Journal on Image and Video Processing, and is or has been a member of the Editorial Board of the Signal Processing Magazine and Associate Editor of the IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Image Processing, IEEE Transactions on Multimedia, and IEEE Signal Processing Magazine. In 2013-2015, he was the Editor-in-Chief of the IEEE Journal of Selected Topics in Signal Processing. He is or has been a member of the IEEE Signal Processing Society Technical Committees on Image, Video and Multidimensional Signal Processing, and Multimedia Signal Processing, and of the IEEE Circuits and Systems Society Technical Committees on Visual Signal Processing and Communications, and Multimedia Systems and Applications. He was an IEEE Distinguished Lecturer in 2005 and was elected an IEEE Fellow in 2008 for “contributions to object-based digital video representation technologies and standards”. He was elected to serve on the Signal Processing Society Board of Governors in the capacity of Member-at-Large for a 2012 and a 2014-2016 term. Since January 2018, he has been the SPS Vice-President for Conferences. Since 2013, he has also been a EURASIP Fellow for “contributions to digital video representation technologies and standards”. He was elected to serve on the European Signal Processing Society Board of Directors for a 2015-2018 term. Since 2015, he has also been an IET Fellow. He is or has been a member of the Scientific and Program Committees of many international conferences and workshops. He was the General Chair of the Picture Coding Symposium (PCS) in 2007, the Technical Program Co-Chair of the Int. Conference on Image Processing (ICIP) in 2010 and 2016, the Technical Program Chair of the International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS) in 2008 and 2012, and the General Chair of the International Conference on Quality of Multimedia Experience (QoMEX) in 2016. He has been participating in the MPEG standardization activities, notably as the head of the Portuguese delegation, chairman of the MPEG Requirements Group, and chairman of many Ad Hoc Groups related to the MPEG-4 and MPEG-7 standards. Since February 2016, he has been the JPEG Requirements Chair. He has been one of the key designers of the JPEG Pleno project, which targets defining standard representations for several types of plenoptic imaging, notably light fields, point clouds and holograms. He has been developing research on point cloud clustering, coding and quality assessment, and publishing in these areas. He has contributed more than 250 papers in international journals, conferences and workshops, and made several tens of invited talks at conferences and workshops. His areas of interest are video analysis, coding, description and adaptation, and advanced multimedia services.

Schedule:

Friday, July 10 – London (BST time zone)

Start–End      Talk
14:00–14:40    PART 1
               1. 3D Visual Representation and Coding
               2. Plenoptic Function based Imaging
14:40–14:45    Questions
14:45–15:25    PART 2
               3. Point Cloud Coding: Basic Approaches
15:25–15:30    Questions
15:30–16:10    PART 3
               4. Point Cloud Coding Standardization
16:10–16:15    Questions
16:15–16:55    PART 4
               5. Point Cloud Coding Standardization (cont.)
               6. Point Cloud Coding Assessment
               7. Summary and Trends
16:55–17:00    Questions