Differential Streaming
Our concept of 3D video fragments exploits the spatio-temporal
interframe coherence of multiple input streams by using a differential
update scheme for dynamic point samples. The basic primitives of this
scheme are the 3D video fragments, point samples with different
attributes like, e.g., a position, a surface normal vector, and a color.
The update scheme is expressed in terms of 3D fragment operators.
We distinguish between three different types
of operators:
- Insert adds new 3D video fragments into the representation after
they have become visible in one of the input cameras.
- Delete removes fragments from the representation once they vanish
from the view of an input camera.
- Update corrects appearance and geometry attributes of fragments
that are already part of the representation but whose attributes have
changed with respect to prior frames.
The time sequence of these operators creates
a differential fragment operator stream that updates a 3D video data
structure on a remote site. An Insert
operator results from the reprojection of a pixel with color attributes from
image space back into three-dimensional object space. Any real-time 3D
reconstruction method that extracts depth and normals from images can be
employed for this purpose. Note that the point primitives feature a
one-to-one mapping between depth and color/texture samples. The 3D
operators and their associated data can be summarized as follows:

    Insert(p, c, P, n)    Delete(p)    Update(p, c, P, n)

where p denotes the coordinates of a pixel,
c its color, P the corresponding 3D
coordinates of the point sample, and n its surface normal.
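To make the operator semantics concrete, the following sketch replays a differential operator stream against a per-pixel fragment store. The Fragment structure, the pinhole back-projection, and all names and parameter values are illustrative assumptions for this sketch, not the actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    color: tuple     # c: color sampled at pixel p
    position: tuple  # P: 3D coordinates of the point sample
    normal: tuple    # n: surface normal at P

def back_project(p, depth, fx, fy, cx, cy):
    """Reproject pixel p = (u, v) with a known depth from image space into
    3D object space; a simple pinhole camera model is assumed here."""
    u, v = p
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)

def apply_op(store, op):
    """Apply one differential fragment operator to a remote 3D video store.
    The store is keyed by pixel coordinates, reflecting the one-to-one
    mapping between depth and color/texture samples."""
    kind, p, *attrs = op
    if kind == "insert":    # Insert(p, c, P, n)
        c, P, n = attrs
        store[p] = Fragment(c, P, n)
    elif kind == "delete":  # Delete(p)
        store.pop(p, None)
    elif kind == "update":  # Update(p, c, P, n)
        c, P, n = attrs
        store[p] = Fragment(c, P, n)
    return store

# Replaying a short differential operator stream at an empty remote site:
store = {}
P = back_project((320, 240), 1.5, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
stream = [
    ("insert", (320, 240), (255, 0, 0), P, (0.0, 0.0, 1.0)),
    ("update", (320, 240), (250, 5, 5), P, (0.0, 0.0, 1.0)),
    ("delete", (320, 240)),
]
for op in stream:
    apply_op(store, op)
```

After the delete, the store is empty again; only the time-ordered sequence of operators, never the full point set, crosses the network in this scheme.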
Operation Modes
The popular MPEG video compression standard
defines three different types of video frames: I-frames are coded without
any reference to past frames and are thus self-contained; P-frames use
motion-compensated prediction from the past I- or P-frame; B-frames use
bidirectional prediction from the most recent and the closest future I- or
P-frame. Note that the predicted P- and B-frames achieve the highest
compression rates, but a sequence encoded exclusively with prediction frames
can only be rendered correctly if the receiver receives the complete data
stream. Problems may occur in many situations, e.g., when parts of the data stream
are lost during network transmission or when the sender and the receiver are
started asynchronously. Moreover, prediction errors accumulate over time. In
the following, we use the nomenclature of 2D video coding to describe
the three different modes of our 3D video pipeline.
- I-mode (Full-Reconstruction): The 3D object is completely
reconstructed in each frame; no information from
previous frames is used. Hence, there is no differential streaming, and the complete 3D data
needs to be recomputed and transmitted. The resulting
stream is composed of I-frames only and is highly redundant.
- P-mode (Reliable-Prediction): This setup keeps track of
all the previous frames and only computes and transmits
changes in geometry and color. In this case, the reconstruction
and rendering nodes need to share a fully consistent data
representation, and hence reliable data transmission is
required (hard synchronization). In MPEG terms,
the data stream consists exclusively of P-frames.
- R-mode (Redundant-Prediction): This setup, too, exploits
differential information over several input frames, but it also
reconstructs parts of the 3D representation at regular intervals,
such that the complete 3D data is recomputed
over time. In this case, no reliable transmission is
required, and errors due to an inconsistent shared data representation
between the reconstruction and rendering nodes
occur only temporarily (soft synchronization).
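The R-mode trade-off can be illustrated with a toy scheduling policy: most fragments are updated differentially, while a staggered subset is fully re-inserted each frame. The refresh interval and the modulo-based staggering below are assumptions made for this sketch, not part of the described pipeline.

```python
# Hypothetical sketch of an R-mode refresh policy: differential updates are
# used most of the time, but pixels are re-inserted from scratch in a
# staggered fashion, so the complete 3D data is recomputed every
# `interval` frames (soft synchronization).

def needs_full_reconstruction(pixel_index, frame_number, interval=8):
    """Return True if this pixel should be fully re-inserted (I-style
    refresh) in this frame rather than differentially updated (P-style)."""
    return pixel_index % interval == frame_number % interval

# Over any 8 consecutive frames, each of 8 pixel groups is refreshed exactly
# once, so a lost or corrupted update is repaired after at most 8 frames:
refreshed = [[i for i in range(8) if needs_full_reconstruction(i, f)]
             for f in range(8)]
```

With interval = 1 this degenerates to I-mode (everything recomputed every frame); letting the interval grow without bound approaches P-mode, where consistency then depends entirely on reliable transmission.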