Motivation


Introduction

In recent years, there has been increasing interest in generating free-viewpoint video sequences from multiple camera views. Apart from purely image-based approaches, free-viewpoint video can be computed by extracting geometry and texture information from a set of views converging on the same object. Today, the robust generation and transmission of free-viewpoint video remains a challenging problem.

Our research on free-viewpoint or 3D video systems is motivated by our interest in novel immersive projection and acquisition environments for tele-presence. At ETH, we developed the blue-c (http://blue-c.ethz.ch/), two networked virtual reality portals, each consisting of a CAVE-like environment augmented by an array of cameras and an active lighting system. The blue-c thus combines the simultaneous acquisition of multiple video streams with advanced 3D projection technology. Both portals can be used for 3D video acquisition, and in full operation they enable networked collaborative applications enhanced by 3D video-conferencing features.

Even though most efforts in rendering and compression of 3D data focus on meshes, we opted for point samples as the basic primitive of our 3D video representation. A general approach to dynamic objects must allow for topology changes, and a topology change on a 3D mesh is a costly operation that is hard to achieve in real time. Furthermore, point samples can be considered a straightforward generalization of 2D video pixels into 3D space.

Our 3D video system is composed of 16 camera nodes, which acquire images and perform 2D image processing. The resulting information is streamed to a reconstruction node, which computes the actual 3D representation of the observed object. Camera and reconstruction nodes are at the same physical location and are connected by a local area network. The 3D video data is then streamed to a rendering node, which, in a real-world tele-presence application, runs at a remote location.
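To make this data flow concrete, the following C++ sketch models the three pipeline stages as plain data types: camera frames after 2D processing, point samples as the 3D primitive, and a reconstruction stub that fuses the camera frames of one time step into a point-sampled 3D frame. All type and function names (Pixel2D, CameraFrame, PointSample, reconstruct) are illustrative assumptions, not the actual blue-c interfaces, and the geometric reconstruction itself is omitted.

// Minimal sketch of the camera -> reconstruction -> rendering data flow.
// All names are hypothetical placeholders, not blue-c code.
#include <array>
#include <cstdint>
#include <vector>

struct Pixel2D {                         // output of a camera node's 2D processing
    std::uint16_t x, y;                  // image coordinates
    std::array<std::uint8_t, 3> rgb;     // color sample
};

struct CameraFrame {                     // one processed frame from one camera node
    int cameraId = 0;                    // which of the 16 camera nodes
    std::vector<Pixel2D> foreground;     // e.g. pixels kept after silhouette extraction
};

struct PointSample {                     // 3D generalization of a video pixel
    float x, y, z;                       // position in world space
    float nx, ny, nz;                    // surface normal
    std::array<std::uint8_t, 3> rgb;     // appearance
};

struct VideoFrame3D {                    // one time step of the 3D video stream
    double timestamp = 0.0;
    std::vector<PointSample> samples;
};

// Reconstruction node: fuse the camera frames of one time step into a
// point-sampled 3D frame. The actual geometric reconstruction from the
// calibrated views is omitted; this stub only illustrates the data flow.
VideoFrame3D reconstruct(const std::vector<CameraFrame>& frames, double t) {
    VideoFrame3D out;
    out.timestamp = t;
    for (const auto& f : frames)
        for (const auto& p : f.foreground)
            out.samples.push_back(
                PointSample{0.f, 0.f, 0.f,   // placeholder position
                            0.f, 0.f, 1.f,   // placeholder normal
                            p.rgb});         // appearance carried over from 2D
    return out;
}

int main() {
    std::vector<CameraFrame> frames(16);              // one frame per camera node
    frames[0].foreground.push_back({10, 20, {255, 0, 0}});
    VideoFrame3D f = reconstruct(frames, 0.0);
    return f.samples.size() == 1 ? 0 : 1;
}

Note how a PointSample carries exactly the attributes of a video pixel plus a 3D position and normal; this is what makes the resulting per-frame stream a direct generalization of 2D video.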

For free-viewpoint video, a scene is typically captured by N cameras. Different camera arrangements are possible, e.g. parallel, convergent, or divergent views, and in general any combination of these multiple-view settings can be used. From these views a 3D video object is created which comprises shape and appearance. The shape can be described by, e.g., polygon meshes, point samples, implicit surfaces, depth images, or layered depth images. The appearance data is mapped onto the shape and allows the 3D video object to be seamlessly blended into new 2D or 3D video content. Appearance is typically described by a series of video streams comprising textures and other surface properties. The 3D video object can be composited into existing content, or it can be interactively viewed from different directions or under different illumination. 3D video objects can be applied in broadcast, storage (e.g., DVD), and interactive online applications.
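As an illustration of viewing such a 3D video object from an arbitrary direction, the following sketch projects a world-space point into a user-chosen virtual pinhole camera, the core operation behind any point-based free-viewpoint renderer. The Camera type and its parameters are simplified assumptions made for this example, not the camera model of any particular system.

// Minimal sketch of novel-view projection for a point-sampled object.
// The camera model is a simplified pinhole assumption.
#include <cstdio>

struct Vec3 { double x, y, z; };

// Virtual camera: position, orthonormal axes (right, up, forward),
// and a focal length expressed in pixel units.
struct Camera {
    Vec3 pos, right, up, fwd;
    double f;
};

static double dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Project a world-space point into the camera's image plane.
// Returns false if the point lies behind the viewpoint.
bool project(const Camera& c, Vec3 p, double& u, double& v) {
    Vec3 d{p.x - c.pos.x, p.y - c.pos.y, p.z - c.pos.z};
    double depth = dot(d, c.fwd);
    if (depth <= 0.0) return false;       // behind the camera
    u = c.f * dot(d, c.right) / depth;    // perspective division
    v = c.f * dot(d, c.up) / depth;
    return true;
}

int main() {
    Camera cam{{0, 0, -2}, {1, 0, 0}, {0, 1, 0}, {0, 0, 1}, 500.0};
    double u, v;
    if (project(cam, {0.1, 0.2, 0.0}, u, v))
        std::printf("point sample lands at pixel offset (%.1f, %.1f)\n", u, v);
}

Moving the virtual camera and re-projecting every point sample of the current frame is all that is needed to generate a novel view; appearance blending and hole filling are the harder problems a real renderer must solve on top of this.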
