Edificio ex Ca’ Noa - Aula 1
13 Oct, 2023
14:30 (duration: 2 hours)
DCT-based image and video coding has been adopted by international standards (ISO JPEG, ITU H.261/264/265/266, ISO MPEG-2/4/H, and many others) for nearly 30 years. Although researchers are still trying to improve its efficiency by fine-tuning its components and parameters, its basic structure has not changed in the past two decades. The recent arrival of deep learning has spurred a new wave of development in end-to-end learned image and video compression. This fast-growing research area has produced more than 100 publications, and state-of-the-art end-to-end learned image codecs now show compression performance comparable to H.266/VVC intra coding in terms of PSNR-RGB and substantially better results in MS-SSIM. End-to-end learned video coding is also catching up quickly: some preliminary studies report PSNR-RGB results comparable to H.265/HEVC, or even H.266/VVC, under the low-delay setting. These results have led to intensive standardization activities (e.g., JPEG AI) and various challenges (e.g., CLIC at CVPR and the Grand Challenge on Neural Network-based Video Coding at ISCAS). In this talk, I shall (1) overview the progress of this area, with a particular focus on the recent standardization activities in JPEG AI, (2) review some notable end-to-end learned image/video compression systems, and (3) address recent efforts to create hardware-friendly, low-complexity models. The talk will conclude with potential research opportunities and an outlook for learned compression systems.