Virtual Camera Control System for Cinematographic 3D Video Rendering


Hansung Kim1, Ryuuki Sakamoto1, Itaru Kitahara1,2, Tomoji Toriyama1, and Kiyoshi Kogure1

1 Knowledge Science Lab, ATR, Kyoto, Japan  {hskim, skmt, kogure}@atr.jp
2 Dept. of Intelligent Interaction Technologies, Univ. of Tsukuba, Japan  [email protected]

ABSTRACT

We propose a virtual camera control system that creates attractive videos from 3D models generated with a virtualized reality system. The proposed camera control system helps the user to generate final videos from the 3D model by referring to the grammar of film language. Many kinds of camera shots and principal camera actions are stored in the system as expertise. Therefore, even non-experts can easily convert the 3D model into attractive movies that look as if they were edited by expert film producers, with the help of the system's expertise. The user can also update the system by creating a new set of camera shots and storing it in the shots' knowledge database.

Index Terms— Camera control, Cinematographic 3D video, Virtualized reality

1. INTRODUCTION

There have already been some studies on using video cameras to regenerate video captured at arbitrary viewpoints in a 3D space using the technique of Virtualized Reality [1][2]. The technique reconstructs 3D models of the space by merging multiple videos using computer vision techniques, and generates 3D free-viewpoint videos by applying CG technology to the reconstructed 3D model. We have developed a free-viewpoint rendering system using multiple cameras, as shown in Fig. 1 [3]. The system reconstructs 3D models from captured video streams using a shape-from-silhouette method and generates realistic free-view video of those objects from a virtual camera.

Although we can generate free-viewpoint video from the Virtualized Reality system, one important problem remains: how can we produce attractive videos from the generated 3D models? In video productions for film and television, producers attract audiences by changing camera positions and camera actions in response to each captured situation (hereafter, we refer to these two attributes of a camera as "shots"). At each scene, there are several choices of shots.
Interestingly, different shot choices produce different impressions and effects, even if the captured scene is the same. The "grammar of film language" formalizes these differences and describes the rules of filmmaking that produce easily understandable and attractive footage for audiences [4][5]. Matsushita et al. applied this grammar to the 3D CG (computer graphics) world and verified its effectiveness by rendering video productions [6]. However, the grammar has not yet been applied to real events in the real 3D world.

In this paper, we propose a cinematographic virtual camera control system that helps the user to generate final videos from the 3D model by referring to the grammar of film language. Many kinds of camera shots and principal camera actions are stored in the system as expertise. Therefore, even non-experts can easily convert the 3D video into attractive movies with the help of the system's expertise. Fig. 2 shows a flow diagram for creating a cinematographic video.

2. CINEMATOGRAPHIC CAMERA CONTROL

The grammar of film language is based on a constrained condition for switching sequential camera shots. Since each single shot is nothing more than a video fragment, many shots must be combined to generate an entire video. We call such a sequential combination of shots a "scene." In this section, we describe camera shot information and the constrained condition for combining sequential shots to construct a scene.

2.1. Camera Shot

Camera shot information generally comprises two types of information for camera control: initial camera parameters and camera actions. The initial camera parameters are set to appropriately capture the target objects in the initial state. Specifically, they describe the relative angle between a target object and a capturing camera, in addition to the size and position of the object in the captured image, as shown in Fig. 3. Labeling these values with aliases (e.g., BIRDS EYE, SUPER LOW) makes it easier to add new shot information.

The camera action describes variations in camera position and the zoom parameter. In the proposed system, these values can be set in two ways: one is a time series of relative differences from the initial state, and the other is calculated by interpolating between the initial state and an exit state, which are given as input information. The camera actions are preconfigured and stored in the database. As Table 1 shows, the proposed system provides fourteen sets of camera actions.
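To make this description concrete, the following is a minimal Python sketch of one way a camera shot could be represented: initial parameters given as a vertical-angle alias (Fig. 3), a horizontal angle around the target, and a framing alias that fixes the camera-to-target distance, plus a camera action realized by interpolating between the initial state and an exit state. The field names, alias-to-angle mapping, and distances are illustrative assumptions, not the system's actual data structures.

```python
# Minimal sketch of a camera shot: initial parameters plus an interpolated action.
# All labels, angles, and distances below are illustrative assumptions.
import math
from dataclasses import dataclass

# Assumed elevation (degrees) for each vertical-angle alias from Fig. 3.
VERTICAL_ANGLE = {"BIRDS_EYE": 80, "SUPER_HIGH": 60, "HIGH": 30,
                  "REGULAR": 0, "LOW": -15, "SUPER_LOW": -30}

# Assumed camera-to-target distance (meters) for each framing alias.
FRAMING_DISTANCE = {"CLOSE": 1.0, "BUST": 1.5, "MEDIUM": 2.5, "LONG": 5.0}

@dataclass
class CameraState:
    position: tuple          # (x, y, z) in world coordinates
    target: tuple            # look-at point (the annotated object position)
    zoom: float              # focal-length scale factor

def initial_state(target, theta_deg, vertical, framing, zoom=1.0):
    """Place the camera on a circle around the target at angle theta,
    elevated according to the vertical-angle alias, at a framing distance."""
    d = FRAMING_DISTANCE[framing]
    elev = math.radians(VERTICAL_ANGLE[vertical])
    th = math.radians(theta_deg)
    x = target[0] + d * math.cos(elev) * math.cos(th)
    y = target[1] + d * math.cos(elev) * math.sin(th)
    z = target[2] + d * math.sin(elev)
    return CameraState((x, y, z), target, zoom)

def interpolate(a: CameraState, b: CameraState, t: float) -> CameraState:
    """Linearly interpolate two camera states (t in [0, 1])."""
    lerp = lambda p, q: tuple(pi + t * (qi - pi) for pi, qi in zip(p, q))
    return CameraState(lerp(a.position, b.position),
                       lerp(a.target, b.target),
                       a.zoom + t * (b.zoom - a.zoom))

# Example: a crane-like move from a high, distant view down to a regular view.
target = (0.0, 0.0, 1.0)
start = initial_state(target, theta_deg=45, vertical="BIRDS_EYE", framing="LONG")
end = initial_state(target, theta_deg=45, vertical="REGULAR", framing="MEDIUM")
frames = [interpolate(start, end, i / 29) for i in range(30)]  # 30 frames = 1 s at 30 fps
```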

Figure 1. Free-viewpoint rendering system

Figure 2. Flow diagram for making a cinematographic video

Table 1. Taxonomy of camera actions
  Fixed:                      FixShot, BustShot, MediumShot, LongShot
  Moving independently:       CraneUpShot, CraneDownShot, RaisUpShot, SpinAroundShot, TimeSliceShot
  Moving tied to the target:  PanShot, DollyShot
  Zoom:                       ZoomOutShot, ZoomInShot, WhipZoomShot
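The actions in Table 1 can also be expressed in the other style mentioned in Section 2.1, as a time series of relative differences from the initial state. The sketch below generates an orbiting move in that style, loosely in the spirit of a SpinAroundShot; the sweep angle, frame count, and function name are assumptions made for illustration, not the parameters stored in the system's database.

```python
# Illustrative sketch: a camera action expressed as time-series relative
# differences from the initial state (an orbit around the target, in the
# spirit of a SpinAroundShot). All parameters are assumed values.
import math

def orbit_deltas(radius, frames=60, degrees=90.0):
    """Yield per-frame (dx, dy, dz) offsets relative to the initial camera
    position, sweeping `degrees` around the target over `frames` frames."""
    for i in range(1, frames + 1):
        a = math.radians(degrees * i / frames)
        # Offset from the starting point on the orbit circle.
        yield (radius * (math.cos(a) - 1.0), radius * math.sin(a), 0.0)

start_position = (5.0, 0.0, 1.5)      # initial camera position (meters)
positions = [tuple(p + d for p, d in zip(start_position, delta))
             for delta in orbit_deltas(radius=5.0)]
```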

Figure 3. Initial camera parameters (vertical-angle aliases: BIRDS EYE, SUPER HIGH, HIGH, REGULAR, POV (point of view), LOW, SUPER LOW; horizontal angle θ around the target, marked in 45° steps from 0° to 315°, with labeled start and end points)

2.2. Constraints on Switching Shots

The constrained condition for switching camera shots determines the constraints on continuity between the current shot and the next shot. Referring to the grammar of film language, we provide the following two constraints on camera-shot switching to produce easily understandable and attractive videos:

◆ Do not set the next camera shot so that it strides across the imaginary line; this confuses the audience.
◆ Do not choose a following camera shot that is similar to the current shot, because the similarity reduces the effectiveness of the switch.

2.3. Applying Camera Shots

The system generates a scene with a declared set of camera shots that satisfy the constraints on switching shots. If a suitable set of camera shots is found in the preserved film-knowledge database of shots, the user declares the retrieved scene valid. If the user cannot find a suitable set of shots, however, the user can create a new set of camera shots and store it in the shots' knowledge database.
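As a rough illustration of how these constraints might be checked when assembling a scene, the sketch below represents each shot by its Fig. 3-style parameters, treats the imaginary line as an angle on the ground plane estimated from the annotations, and reduces "similarity" to a comparison of angle, vertical alias, and framing. These reductions, thresholds, and names are illustrative assumptions rather than the system's actual rules.

```python
# Illustrative check of the two shot-switching constraints from Section 2.2:
# (1) the next shot must not cross the imaginary line, and
# (2) the next shot must not be too similar to the current one.
# The shot representation and all thresholds are assumptions.
from dataclasses import dataclass

@dataclass
class Shot:
    name: str
    theta_deg: float      # horizontal angle around the target (Fig. 3)
    vertical: str         # vertical-angle alias, e.g. "REGULAR", "BIRDS_EYE"
    framing: str          # framing alias, e.g. "CLOSE", "LONG"

def side_of_line(theta_deg, line_deg):
    """Which side of the imaginary line (given as a ground-plane angle)
    the camera occupies: +1 or -1."""
    d = (theta_deg - line_deg) % 360.0
    return 1 if d < 180.0 else -1

def crosses_imaginary_line(cur: Shot, nxt: Shot, line_deg: float) -> bool:
    return side_of_line(cur.theta_deg, line_deg) != side_of_line(nxt.theta_deg, line_deg)

def too_similar(cur: Shot, nxt: Shot, min_angle=30.0) -> bool:
    angle_diff = abs((cur.theta_deg - nxt.theta_deg + 180.0) % 360.0 - 180.0)
    return (angle_diff < min_angle and cur.vertical == nxt.vertical
            and cur.framing == nxt.framing)

def valid_scene(shots, line_deg):
    """A sequence of shots forms a valid scene if every consecutive pair
    satisfies both switching constraints."""
    return all(not crosses_imaginary_line(a, b, line_deg) and not too_similar(a, b)
               for a, b in zip(shots, shots[1:]))

# Example scene: valid only if no cut crosses the line or repeats a setup.
scene = [Shot("LongShot", 30, "HIGH", "LONG"),
         Shot("BustShot", 60, "REGULAR", "BUST"),
         Shot("CraneDownShot", 120, "BIRDS_EYE", "LONG")]
print(valid_scene(scene, line_deg=0.0))
```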

Figure 4. Block diagram of the pilot system (multiple-video capturing, 3D modeling, the annotating module with its input I/F, the cinematographic camera-controlling module, and the free-view video generating module, which renders the final cinematographic video)

The declared or created set of shots is given a nickname for easy future access (e.g., "Dramatic" or "Suspense"). To apply the shots to a scene, the system must know the positions of the target objects/actors and the capturing time code. We use annotation information to provide these positions and time codes to the system. Annotations are assumed to be made not only by humans but also from various sensor inputs such as IR sensors and pressure sensors. The imaginary line is also estimated from the annotation information.

3. PILOT SYSTEM

In this section, we introduce our pilot system for creating cinematographic 3D video. As Fig. 4 shows, the system consists of a free-view generating module, an annotating module, and a cinematographic camera-controlling module.

3.1. Free-view Generating Module

We have implemented a distributed system using two PCs and eight synchronized IEEE-1394 cameras that provide 1024×768 color video streams at 30 frames/sec. The cameras are oriented toward the center of the space so that they capture almost the same area. An intensity-based background subtraction method is used to segment the foreground regions [7]. The segmentation masks and texture information are sent over a 1-Gbps (gigabits per second) network to the modeling PC. The modeling PC reconstructs the 3D shape of the target object as a voxel volume with the shape-from-silhouette method and then synthesizes a 3D video from a virtual camera. The 3D space is modeled at a resolution of 300×300×200 voxels on a 1 cm × 1 cm × 1 cm grid, and a microfacet billboarding technique [8] is employed for rendering to generate high-quality free-view video.
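The core of this reconstruction step can be illustrated with a toy visual-hull computation: a voxel is kept only if it projects inside the foreground silhouette of every camera. Real projection uses each camera's calibrated intrinsics and extrinsics; the 3×4 projection matrices, grid extent, and nearest-pixel silhouette test below are simplified assumptions, not the pilot system's implementation.

```python
# Toy shape-from-silhouette (visual hull) sketch: carve a voxel grid using
# binary foreground masks from calibrated cameras. Sizes, matrices, and the
# in-silhouette test are simplified assumptions for illustration.
import numpy as np

def carve(voxel_centers, masks, projections):
    """voxel_centers: (N, 3) world points; masks: list of (H, W) boolean
    foreground silhouettes; projections: list of (3, 4) camera matrices.
    Returns a boolean array marking voxels kept by every camera."""
    homogeneous = np.hstack([voxel_centers, np.ones((len(voxel_centers), 1))])
    keep = np.ones(len(voxel_centers), dtype=bool)
    for mask, P in zip(masks, projections):
        uvw = homogeneous @ P.T                  # project to the image plane
        u = (uvw[:, 0] / uvw[:, 2]).round().astype(int)
        v = (uvw[:, 1] / uvw[:, 2]).round().astype(int)
        h, w = mask.shape
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        in_silhouette = np.zeros(len(voxel_centers), dtype=bool)
        in_silhouette[inside] = mask[v[inside], u[inside]]
        keep &= in_silhouette                    # a voxel survives only if every view agrees
    return keep

# A coarse 1 cm grid over a small working volume (the pilot system uses
# 300x300x200 voxels); masks and projections would come from the cameras.
xs, ys, zs = np.meshgrid(np.arange(0, 0.3, 0.01), np.arange(0, 0.3, 0.01),
                         np.arange(0, 0.2, 0.01), indexing="ij")
centers = np.stack([xs, ys, zs], axis=-1).reshape(-1, 3)
```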

Figure 5. Annotating tool (spatial annotations of the face and the whole body, with the 3D position of each spatial annotation)

3.2. Annotating Module

When a camera shot determines the capturing parameters of a virtual camera, annotation information is necessary to indicate the target objects. Annotation information consists of both spatial and temporal information: a spatial annotation describes the 3D positions of the target objects and their 3D regions, while a temporal annotation is described by its time code.

Fig. 5 shows a screenshot of the interface application we developed for inputting spatial and temporal annotations simultaneously. A spatial annotation is defined by dragging the mouse over an area, while a temporal annotation is defined by clicking on a point on a timescale bar. These annotations are recorded with index information and an extra "user's area" described in a free format. The annotations are assumed to be input manually, although it is not practical to input all temporal annotations this way because a captured video sequence contains far too many frames for humans to process. We solve this labor-intensive problem by interpolating between two temporal annotations that have the same index information.

3.3. Camera-Controlling Module

Finally, the system generates footage by piecing together all of the generated videos. Fig. 6(a) shows example footage of a cinematographic video with camera controls, using two annotations of 3D regions and five annotations of time codes. Varied shots with different angles and framings are set for the 3D video to capture a man shadowboxing dynamically. Fig. 6(b) shows other footage to which the same shots are applied, but with a region annotation added to the man's foot. Although these pieces of footage were made from videos of the same scene, the impressions they give are rather different. In "Dramatic," the camera action CraneDownShot starts from "Long" framing and a "BIRDS EYE" angle, and the camera position then moves gradually closer to the ground. In "Suspense," on the other hand, the DollyShot maintains a fixed distance to the annotated target, the foot, using "Close" framing and the "REGULAR" angle. Fig. 7 shows the flow of the camera work in the "Dramatic" footage. These pieces of footage indicate that the user can aim the camera to generate a dramatic video as shown in Fig. 6(a) and a suspenseful one as in Fig. 6(b). Clearly, then, our system can produce a video in response to the user's request. Video clips showing the results can be downloaded from the following addresses:

http://coolhs99.cafe24.com/Eng/Dramatic.wmv
http://coolhs99.cafe24.com/Eng/Suspense.wmv
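The interpolation of temporal annotations described in Section 3.2 could look like the following minimal sketch: two annotations that share the same index but have different frame numbers define the target's position at every intermediate frame by linear interpolation. The record fields and the assumption of linear motion between the two annotated frames are illustrative, not the system's actual representation.

```python
# Illustrative interpolation of temporal annotations (Section 3.2): two
# annotations with the same index but different frame numbers yield the
# target position at every frame in between. Field names are assumptions.
from dataclasses import dataclass

@dataclass
class Annotation:
    index: str        # which object/region this annotation refers to
    frame: int        # temporal annotation (frame number at 30 fps)
    position: tuple   # spatial annotation: (x, y, z) of the target

def interpolate_annotations(a: Annotation, b: Annotation):
    """Fill in one annotation per frame between a and b (same index)."""
    assert a.index == b.index and a.frame < b.frame
    span = b.frame - a.frame
    for f in range(a.frame, b.frame + 1):
        t = (f - a.frame) / span
        pos = tuple(p + t * (q - p) for p, q in zip(a.position, b.position))
        yield Annotation(a.index, f, pos)

# Example: the target's face annotated at frame 0 and frame 150 (5 seconds).
start = Annotation("face", 0, (0.0, 0.0, 1.6))
end = Annotation("face", 150, (1.0, 0.5, 1.6))
dense = list(interpolate_annotations(start, end))   # 151 per-frame annotations
```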

Figure 6. Outcome of the shadowboxing scene: (a) Dramatic; (b) Suspense

Figure 7. Overview of the camera operations in the "Suspense" footage

4. CONCLUSION

The goal of this study is to develop a virtual camera control system for creating attractive videos from 3D models. The proposed system helps users apply expert knowledge to generate desirable and interesting film footage by using a sequence of shots taken with a virtual camera. As future work, we are going to devise a method that uses sensors to automatically determine annotation information.

ACKNOWLEDGEMENT

This research was supported by the National Institute of Information and Communications Technology.

REFERENCES

[1] P. Rander, P.J. Narayanan, and T. Kanade, "Virtualized reality: constructing time-varying virtual worlds from real world events," Proc. Visualization, pp. 277-283, 1997.

[2] T. Kanade and P.J. Narayanan, "Historical Perspectives on 4D Virtualized Reality," Proc. CVPR, p. 165, 2006.
[3] H. Kim, I. Kitahara, R. Sakamoto, and K. Kogure, "An Immersive Free-Viewpoint Video System Using Multiple Outer/Inner Cameras," Proc. 3DPVT, 2006.
[4] D. Arijon, Grammar of the Film Language, Silman-James Press, 1991.
[5] S.D. Katz, Cinematic Motion: Film Directing: A Workshop for Staging Scenes, Michael Wiese Film Productions, 2004.
[6] A. Inoue, H. Shigeno, K. Okada, and Y. Matsushita, "Introducing Grammar of the Film Language into Automatic Shooting for Face-to-face Meetings," Proc. SAINT, pp. 277-280, 2004.
[7] H. Kim, I. Kitahara, K. Kogure, T. Toriyama, and K. Sohn, "Robust Foreground Segmentation from Color Video Sequences Using Background Subtraction with Multiple Thresholds," Proc. KJPR, pp. 188-193, 2006.
[8] S. Yamazaki, R. Sagawa, H. Kawasaki, K. Ikeuchi, and M. Sakauchi, "Microfacet billboarding," Proc. Eurographics Workshop on Rendering, pp. 175-186, 2002.
