Differential Streaming
Our concept of 3D video fragments exploits the spatio-temporal
interframe coherence of multiple input streams by using a differential
update scheme for dynamic point samples. The basic primitives of this
scheme are the 3D video fragments, point samples with different
attributes like, e.g., a position, a surface normal vector, and a color.
The update scheme is expressed in terms of 3D fragment operators.
We distinguish between three different types
of operators:
- Insert adds new 3D video fragments into the representation after
they have become visible in one of the input cameras.
- Delete removes fragments from the representation once they vanish
from the view of an input camera.
- Update corrects appearance and geometry attributes of fragments
that are already part of the representation but whose attributes have
changed with respect to prior frames.
The time sequence of these operators creates
a differential fragment operator stream that updates a 3D video data
structure on a remote site. An Insert
operator results from the reprojection of a pixel with color attributes from
image space back into three-dimensional object space. Any real-time 3D
reconstruction method that extracts depth and normals from images can be
employed for this purpose. Note that the point primitives feature a
one-to-one mapping between depth and color/texture samples. The 3D
operators and their associated data can be summarized as follows:

    Insert(p, c, P, n)    Delete(p)    Update(p, c, P, n)

where p denotes the coordinates of a pixel,
c its color, P the corresponding 3D
coordinates of the point sample, and n its surface normal.
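To make the operator semantics concrete, the following sketch replays a differential operator stream against a per-pixel fragment store. The Fragment structure, the pinhole back-projection, and all names and parameter values are illustrative assumptions for this sketch, not the actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    color: tuple     # c: color sampled at pixel p
    position: tuple  # P: 3D coordinates of the point sample
    normal: tuple    # n: surface normal at P

def back_project(p, depth, fx, fy, cx, cy):
    """Reproject pixel p = (u, v) with a known depth from image space into
    3D object space; a simple pinhole camera model is assumed here."""
    u, v = p
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)

def apply_op(store, op):
    """Apply one differential fragment operator to a remote 3D video store.
    The store is keyed by pixel coordinates, reflecting the one-to-one
    mapping between depth and color/texture samples."""
    kind, p, *attrs = op
    if kind == "insert":    # Insert(p, c, P, n)
        c, P, n = attrs
        store[p] = Fragment(c, P, n)
    elif kind == "delete":  # Delete(p)
        store.pop(p, None)
    elif kind == "update":  # Update(p, c, P, n)
        c, P, n = attrs
        store[p] = Fragment(c, P, n)
    return store

# Replaying a short differential operator stream at an empty remote site:
store = {}
P = back_project((320, 240), 1.5, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
stream = [
    ("insert", (320, 240), (255, 0, 0), P, (0.0, 0.0, 1.0)),
    ("update", (320, 240), (250, 5, 5), P, (0.0, 0.0, 1.0)),
    ("delete", (320, 240)),
]
for op in stream:
    apply_op(store, op)
```

After the delete, the store is empty again; only the time-ordered sequence of operators, never the full point set, crosses the network in this scheme.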
Operation Modes
The popular MPEG video compression standard
defines three different types of video frames: I-frames are coded without
any reference to past frames and are thus self-contained; P-frames use
motion-compensated prediction from the past I- or P-frame; B-frames use
bidirectional prediction from the most recent and the closest future I- or
P-frame. Note that the predicted P- and B-frames achieve the highest
compression rates, but a sequence encoded exclusively with prediction frames
can only be rendered correctly if the receiver receives the complete data
stream. Problems may occur in many situations, e.g., when parts of the data stream
are lost during network transmission or when the sender and the receiver are
started asynchronously. Moreover, prediction errors accumulate over time. In
the following, we use the nomenclature of 2D video coding to describe
the three different modes of our 3D video pipeline.
- I-mode (Full-Reconstruction): The 3D object is completely
reconstructed in each frame; no information from
previous frames is used. Hence, there is no differential streaming, and the complete 3D data
needs to be recomputed and transmitted. The resulting
stream is composed of I-frames only and is highly redundant.
- P-mode (Reliable-Prediction): This setup keeps track of
all the previous frames and only computes and transmits
changes in geometry and color. In this case, the reconstruction
and rendering nodes need to share a fully consistent data
representation, and hence reliable data transmission is
required (hard synchronization). In MPEG terms,
the data stream consists exclusively of P-frames.
- R-mode (Redundant-Prediction): This setup, too, exploits
differential information over several input frames, but it also
reconstructs parts of the 3D representation at regular intervals,
such that the complete 3D data is recomputed
over time. In this case, no reliable transmission is
required, and errors due to an inconsistent shared data representation
between the reconstruction and rendering nodes
occur only temporarily (soft synchronization).
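The R-mode trade-off can be illustrated with a toy scheduling policy: most fragments are updated differentially, while a staggered subset is fully re-inserted each frame. The refresh interval and the modulo-based staggering below are assumptions made for this sketch, not part of the described pipeline.

```python
# Hypothetical sketch of an R-mode refresh policy: differential updates are
# used most of the time, but pixels are re-inserted from scratch in a
# staggered fashion, so the complete 3D data is recomputed every
# `interval` frames (soft synchronization).

def needs_full_reconstruction(pixel_index, frame_number, interval=8):
    """Return True if this pixel should be fully re-inserted (I-style
    refresh) in this frame rather than differentially updated (P-style)."""
    return pixel_index % interval == frame_number % interval

# Over any 8 consecutive frames, each of 8 pixel groups is refreshed exactly
# once, so a lost or corrupted update is repaired after at most 8 frames:
refreshed = [[i for i in range(8) if needs_full_reconstruction(i, f)]
             for f in range(8)]
```

With interval = 1 this degenerates to I-mode (everything recomputed every frame); letting the interval grow without bound approaches P-mode, where consistency then depends entirely on reliable transmission.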