3D Video @ CGL - Compression

Recording & Compression

Image-space Compression

Since 3D video fragments are generated from 2D video pixels, the compression of a stream of 3D video fragments can be based on images. The input camera images serve as the building blocks of a data structure in which each 3D video fragment is identified by the camera identifier and the image-space position of the pixel it was generated from. Thus, every 3D video fragment can be uniquely referenced, and its attributes can be stored and compressed separately.
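The addressing scheme described above can be sketched as follows. This is a minimal illustration, not the project's actual data structure: the names `FragmentKey` and `split_attributes` and the attribute names `depth` and `luma` are assumptions made for the example.

```python
from typing import Dict, NamedTuple


class FragmentKey(NamedTuple):
    """Hypothetical unique reference to a 3D video fragment."""
    camera_id: int  # camera whose image the source pixel belongs to
    x: int          # pixel column in that camera image
    y: int          # pixel row in that camera image


def split_attributes(
    fragments: Dict[FragmentKey, Dict[str, float]]
) -> Dict[str, Dict[FragmentKey, float]]:
    """Regroup per-fragment records into one plane per attribute,
    so that each attribute can be stored and compressed on its own."""
    planes: Dict[str, Dict[FragmentKey, float]] = {}
    for key, attrs in fragments.items():
        for name, value in attrs.items():
            planes.setdefault(name, {})[key] = value
    return planes


# Two fragments from the same pixel position in different cameras
# remain distinguishable because the camera identifier is part of the key.
fragments = {
    FragmentKey(0, 10, 20): {"depth": 1.5, "luma": 128.0},
    FragmentKey(1, 10, 20): {"depth": 1.7, "luma": 90.0},
}
planes = split_attributes(fragments)
```

Keeping one plane per attribute is what allows a different image or video codec to be chosen for each attribute.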

Considering each camera image separately, we are only interested in the foreground pixels that contribute to the point cloud describing the 3D object. We therefore use the segmentation mask of each camera image as the reference for all subsequent coding schemes. The 3D video fragment attributes are encoded with standard or novel image and video coding schemes. The segmentation masks must be coded losslessly; the attributes may be compressed lossily, and we favor progressive coding schemes.
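One way to read the role of the segmentation mask is sketched below: only foreground values are passed on to the attribute coder, and the decoder uses the same mask to put them back in place. The function names and the scan-line ordering are assumptions of this sketch, and the actual attribute coder is left out.

```python
def pack_foreground(mask, image):
    """Collect the attribute values of foreground pixels in scan-line order."""
    return [
        image[y][x]
        for y, row in enumerate(mask)
        for x, is_foreground in enumerate(row)
        if is_foreground
    ]


def unpack_foreground(mask, values, background=0):
    """Rebuild an attribute image from packed values using the same mask."""
    it = iter(values)
    return [
        [next(it) if is_foreground else background for is_foreground in row]
        for row in mask
    ]


mask = [[0, 1],
        [1, 0]]
image = [[5, 6],
         [7, 8]]
packed = pack_foreground(mask, image)       # only the two foreground values
restored = unpack_foreground(mask, packed)  # background pixels are filled with 0
```

The sketch also shows why the mask has to be lossless: a single flipped mask bit would shift every subsequent value to the wrong pixel on decoding.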

The resulting 3D video stream consists of key frames and delta frames, where each delta frame relies on a prediction based on the closest key frame. The encoding of the key frames and the prediction algorithm are chosen such that the desired navigability features of the encoded sequence can be fulfilled.
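As a small illustration of the key frame lookup, the sketch below picks the temporally nearest key frame for a given frame index. It assumes that "closest" means nearest in time and that ties go to the earlier key frame; both are assumptions of this example, and `closest_key_frame` is a hypothetical helper.

```python
from bisect import bisect_left


def closest_key_frame(key_times, t):
    """Return the entry of the sorted, non-empty list key_times nearest to t.

    Ties are resolved in favor of the earlier key frame.
    """
    i = bisect_left(key_times, t)
    # Only the neighbors on either side of the insertion point can be nearest.
    candidates = key_times[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda k: (abs(k - t), k))


key_times = [0, 10, 20]
nearest = closest_key_frame(key_times, 16)  # key frame at index 20
```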

Average coding

For true random access, which is required for unconstrained navigation in space and time, we suggest a simple but efficient coding scheme. For each foreground pixel of a time window, an average value of each attribute is computed. This information becomes the key frame, which, within the respective time window, is independent of the recording time. A delta frame is simply the difference between the original frame and the respective key frame. It is true that every window of N frames sharing the same key frame requires N+1 image frames to be encoded. However, N can be chosen large, so the additional cost of coding the key frame is distributed over a large number of delta frames.
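The scheme can be sketched for a single attribute as follows. This is a minimal illustration under stated assumptions: the average of a pixel is taken over those frames of the window in which the pixel is foreground, values are kept as floats, and no quantization or entropy coding is applied. The function names are hypothetical.

```python
def average_key_frame(frames, masks):
    """Per-pixel average over the frames in which the pixel is foreground."""
    height, width = len(frames[0]), len(frames[0][0])
    key = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            samples = [f[y][x] for f, m in zip(frames, masks) if m[y][x]]
            if samples:
                key[y][x] = sum(samples) / len(samples)
    return key


def delta_frame(frame, mask, key):
    """Difference between a frame and the key frame on foreground pixels."""
    return [
        [frame[y][x] - key[y][x] if mask[y][x] else 0.0
         for x in range(len(frame[y]))]
        for y in range(len(frame))
    ]


def reconstruct(key, delta, mask):
    """Decode one frame from the key frame and its own delta frame only."""
    return [
        [key[y][x] + delta[y][x] if mask[y][x] else 0.0
         for x in range(len(key[y]))]
        for y in range(len(key))
    ]


# A window of N = 2 frames of size 1x2; the second pixel is background in frame 1.
frames = [[[10, 20]], [[30, 0]]]
masks = [[[1, 1]], [[1, 0]]]
key = average_key_frame(frames, masks)
deltas = [delta_frame(f, m, key) for f, m in zip(frames, masks)]
decoded = reconstruct(key, deltas[1], masks[1])
```

Each frame is decoded from the key frame and its own delta frame alone, never from a neighboring frame, which is what gives random access in time. In terms of frame count, the overhead of the key frame is 1/N; for an illustrative window of N = 50 frames, that is 51 coded frames for 50 original ones.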

Average coding illustrated with a texture example: a) Key frame of 12kB. b) Masked key frame (PSNR=29.3dB). c) Key frame plus 100B of delta frame (PSNR=31.4dB). d) Key frame plus 300B of delta frame (PSNR=33.7dB). e) Key frame plus 4.3kB of delta frame (PSNR=42.8dB). f) Original frame.
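For reference, PSNR figures such as those in the caption are conventionally computed from the mean squared error between the original and the decoded image. The sketch below uses the standard definition with a peak value of 255 for 8-bit samples; whether the figures above were computed over the whole image or over foreground pixels only is not stated, so the whole-image variant here is an assumption.

```python
import math


def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_errors = [
        (a - b) ** 2
        for row_a, row_b in zip(original, decoded)
        for a, b in zip(row_a, row_b)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


value = psnr([[0, 0]], [[0, 255]])  # MSE is 255^2 / 2, so PSNR is 10*log10(2) dB
```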

Last Update: 05.01.2004