Patent: Image processing device, image processing method, program, and image transmission system
Publication Number: 20210152848
Publication Date: 2021-05-20
Applicant: Sony
Abstract
The present disclosure relates to an image processing device, an image processing method, a program, and an image transmission system that can achieve higher compression efficiency. For an overlapping region in which an image captured by a reference camera, which serves as a reference among N cameras, and an image captured by a non-reference camera other than the reference camera overlap each other, a compression rate higher than the compression rate for a non-overlapping region is set. The image is compressed at each of the compression rates. The present technology is applicable to, for example, an image transmission system configured to transmit an image to be displayed on a display capable of expressing a three-dimensional space.
Claims
1.
An image processing device comprising: a setting unit configured to set a compression rate for an overlapping region in which, of a plurality of images obtained by capturing a subject by a plurality of imaging devices from a plurality of viewpoints, the image captured by a reference imaging device, which serves as a reference, and the image captured by a non-reference imaging device other than the reference imaging device overlap each other, higher than a compression rate for a non-overlapping region; and a compression unit configured to compress the image at each of the compression rates.
2.
The image processing device according to claim 1, further comprising: a detection unit configured to detect the overlapping region on a basis of information indicating a three-dimensional shape of the subject.
3.
The image processing device according to claim 1, further comprising: an acquisition unit configured to acquire the image to supply the image to the compression unit.
4.
The image processing device according to claim 1, wherein the setting unit sets the compression rate for the overlapping region using angle information indicating an angle between a light beam vector extending from the reference imaging device to a predetermined point on a surface of the subject and a normal vector at the predetermined point.
5.
The image processing device according to claim 1, further comprising: a 3D shape calculation unit configured to calculate information indicating a three-dimensional shape of the subject from the plurality of the images obtained by imaging the subject by the plurality of the imaging devices from the plurality of viewpoints.
6.
The image processing device according to claim 1, further comprising: a depth image acquisition unit configured to acquire a depth image having a depth to the subject; and a point cloud calculation unit configured to calculate, as information indicating a three-dimensional shape of the subject, point cloud information regarding the subject on a basis of the depth image.
7.
The image processing device according to claim 1, further comprising: a reference imaging device decision unit configured to decide the reference imaging device from the plurality of the imaging devices.
8.
The image processing device according to claim 7, wherein the reference imaging device decision unit decides the reference imaging device on a basis of distances from the plurality of the imaging devices to the subject.
9.
The image processing device according to claim 7, wherein the reference imaging device decision unit decides the reference imaging device on a basis of information indicating a virtual viewpoint that is used in generating a virtual viewpoint video of the subject from an arbitrary viewpoint.
10.
The image processing device according to claim 1, wherein the reference imaging device includes two or more imaging devices of the plurality of the imaging devices.
11.
The image processing device according to claim 1, wherein the setting unit sets the compression rate on a basis of information indicating a virtual viewpoint that is used in generating a virtual viewpoint video of the subject from an arbitrary viewpoint.
12.
An image processing method comprising: by an image processing device which compresses an image, setting a compression rate for an overlapping region in which, of a plurality of the images obtained by capturing a subject by a plurality of imaging devices from a plurality of viewpoints, the image captured by a reference imaging device, which serves as a reference, and the image captured by a non-reference imaging device other than the reference imaging device overlap each other, higher than a compression rate for a non-overlapping region; and compressing the image at each of the compression rates.
13.
A program causing a computer of an image processing device which compresses an image to execute image processing comprising: setting a compression rate for an overlapping region in which, of a plurality of the images obtained by capturing a subject by a plurality of imaging devices from a plurality of viewpoints, the image captured by a reference imaging device, which serves as a reference, and the image captured by a non-reference imaging device other than the reference imaging device overlap each other, higher than a compression rate for a non-overlapping region; and compressing the image at each of the compression rates.
14.
An image processing device comprising: a determination unit configured to determine, for each of a plurality of images obtained by capturing a subject from a plurality of viewpoints, whether a predetermined position of the subject from an arbitrary viewpoint on a virtual viewpoint video is a visible region or an invisible region in each of a plurality of the imaging devices on a basis of information indicating a three-dimensional shape of the subject; a decision unit configured to perform a weighted average using weight information based on a compression rate used in compressing a position corresponding to the predetermined position determined as the visible region on each of the plurality of the images, and color information indicating a color at the position corresponding to the predetermined position on each image, to thereby decide a color at the predetermined position of the virtual viewpoint video; and a generation unit configured to generate the virtual viewpoint video on a basis of the color decided by the decision unit.
15.
An image processing method comprising: by an image processing device which generates an image, determining, for each of a plurality of the images obtained by capturing a subject from a plurality of viewpoints, whether a predetermined position of the subject from an arbitrary viewpoint on a virtual viewpoint video is a visible region or an invisible region in each of a plurality of the imaging devices on a basis of information indicating a three-dimensional shape of the subject; performing a weighted average using weight information based on a compression rate used in compressing a position corresponding to the predetermined position determined as the visible region on each of the plurality of the images, and color information indicating a color at the position corresponding to the predetermined position on each image, to thereby decide a color at the predetermined position of the virtual viewpoint video; and generating the virtual viewpoint video on a basis of the color decided.
16.
A program causing a computer of an image processing device which generates an image to execute image processing comprising: determining, for each of a plurality of the images obtained by capturing a subject from a plurality of viewpoints, whether a predetermined position of the subject from an arbitrary viewpoint on a virtual viewpoint video is a visible region or an invisible region in each of a plurality of the imaging devices on a basis of information indicating a three-dimensional shape of the subject; performing a weighted average using weight information based on a compression rate used in compressing a position corresponding to the predetermined position determined as the visible region on each of the plurality of the images, and color information indicating a color at the position corresponding to the predetermined position on each image, to thereby decide a color at the predetermined position of the virtual viewpoint video; and generating the virtual viewpoint video on a basis of the color decided.
17.
An image transmission system comprising: a first image processing device including a setting unit configured to set a compression rate for an overlapping region in which, of a plurality of images obtained by capturing a subject by a plurality of imaging devices from a plurality of viewpoints, the image captured by a reference imaging device, which serves as a reference, and the image captured by a non-reference imaging device other than the reference imaging device overlap each other, higher than a compression rate for a non-overlapping region, and a compression unit configured to compress the image at each of the compression rates; and a second image processing device including a determination unit configured to determine, for each of the plurality of images transmitted from the first image processing device, whether a predetermined position of the subject from an arbitrary viewpoint on a virtual viewpoint video is a visible region or an invisible region in each of the plurality of the imaging devices on a basis of information indicating a three-dimensional shape of the subject, a decision unit configured to perform a weighted average using weight information based on a compression rate used in compressing a position corresponding to the predetermined position determined as the visible region on each of the plurality of the images, and color information indicating a color at the position corresponding to the predetermined position on each image, to thereby decide a color at the predetermined position of the virtual viewpoint video, and a generation unit configured to generate the virtual viewpoint video on a basis of the color decided by the decision unit.
Description
TECHNICAL FIELD
[0001] The present disclosure relates to an image processing device, an image processing method, a program, and an image transmission system, and in particular, to an image processing device, an image processing method, a program, and an image transmission system that can achieve higher compression efficiency.
BACKGROUND ART
[0002] In recent years, technologies related to AR (Augmented Reality), VR (Virtual Reality), and MR (Mixed Reality) and technologies related to stereoscopic displays configured to three-dimensionally display videos have been developed. Such technological development has led to displays capable of presenting, to viewers, stereoscopic effects, a sense of reality, and the like that related-art displays configured to perform two-dimensional display have not been able to express.
[0003] For example, as means for displaying states of the real world on a display capable of expressing three-dimensional spaces, there is a method that utilizes a multiview video obtained by synchronously capturing a scene with a plurality of cameras arranged around the scene to be captured. Meanwhile, in a case where a multiview video is used, the video data amount increases significantly, and an effective compression technology is therefore demanded.
[0004] Thus, as a method of compressing multiview videos, H.264/MVC (Multiview Video Coding) standardizes a compression rate enhancement method that exploits the characteristic that the videos at the respective viewpoints are similar to each other. Since this method relies on the videos captured by the cameras being similar to each other, it is expected to be highly effective in a case where the baselines between cameras are short, but to provide low compression efficiency in a case where the cameras are used in a large space and the baselines between them are long.
[0005] In view of this, as disclosed in PTL 1, there has been proposed an image processing system configured to separate the foreground and background of a video and compress the foreground and the background at different compression rates, to thereby reduce the data amount of the entire system. This image processing system is highly effective in a case where a large scene such as a stadium is to be captured and the background region is overwhelmingly larger than the foreground region including persons, for example.
CITATION LIST
Patent Literature
[0006] [PTL 1]
[0007] Japanese Patent Laid-Open No. 2017-211828
SUMMARY
Technical Problem
[0008] Incidentally, it is expected that the image processing system proposed in PTL 1 described above provides low compression efficiency in, for example, a scene in which the subject corresponding to the foreground region dominates the picture frame of a captured image.
[0009] The present disclosure has been made in view of such circumstances and makes it possible to achieve higher compression efficiency.
Solution to Problem
[0010] According to a first aspect of the present disclosure, there is provided an image processing device including a setting unit configured to set a compression rate for an overlapping region in which, of a plurality of images obtained by capturing a subject by a plurality of imaging devices from a plurality of viewpoints, the image captured by a reference imaging device, which serves as a reference, and the image captured by a non-reference imaging device other than the reference imaging device overlap each other, higher than a compression rate for a non-overlapping region, and a compression unit configured to compress the image at each of the compression rates.
[0011] According to the first aspect of the present disclosure, there is provided an image processing method including, by an image processing device which compresses an image, setting a compression rate for an overlapping region in which, of a plurality of the images obtained by capturing a subject by a plurality of imaging devices from a plurality of viewpoints, the image captured by a reference imaging device, which serves as a reference, and the image captured by a non-reference imaging device other than the reference imaging device overlap each other, higher than a compression rate for a non-overlapping region, and compressing the image at each of the compression rates.
[0012] According to the first aspect of the present disclosure, there is provided a program causing a computer of an image processing device which compresses an image to execute image processing including setting a compression rate for an overlapping region in which, of a plurality of the images obtained by capturing a subject by a plurality of imaging devices from a plurality of viewpoints, the image captured by a reference imaging device, which serves as a reference, and the image captured by a non-reference imaging device other than the reference imaging device overlap each other, higher than a compression rate for a non-overlapping region, and compressing the image at each of the compression rates.
[0013] In the first aspect of the present disclosure, a compression rate for an overlapping region in which, of a plurality of images obtained by capturing a subject by a plurality of imaging devices from a plurality of viewpoints, the image captured by a reference imaging device, which serves as a reference, and the image captured by a non-reference imaging device other than the reference imaging device overlap each other, is set higher than a compression rate for a non-overlapping region. The image is compressed at each of the compression rates.
[0014] According to a second aspect of the present disclosure, there is provided an image processing device including a determination unit configured to determine, for each of a plurality of images obtained by capturing a subject from a plurality of viewpoints, whether a predetermined position of the subject from an arbitrary viewpoint on a virtual viewpoint video is a visible region or an invisible region in each of a plurality of the imaging devices on the basis of information indicating a three-dimensional shape of the subject, a decision unit configured to perform a weighted average using weight information based on a compression rate used in compressing a position corresponding to the predetermined position determined as the visible region on each of the plurality of the images, and color information indicating a color at the position corresponding to the predetermined position on each image, to thereby decide a color at the predetermined position of the virtual viewpoint video, and a generation unit configured to generate the virtual viewpoint video on the basis of the color decided by the decision unit.
[0015] According to the second aspect of the present disclosure, there is provided an image processing method including, by an image processing device which generates an image, determining, for each of a plurality of the images obtained by capturing a subject from a plurality of viewpoints, whether a predetermined position of the subject from an arbitrary viewpoint on a virtual viewpoint video is a visible region or an invisible region in each of a plurality of the imaging devices on the basis of information indicating a three-dimensional shape of the subject, performing a weighted average using weight information based on a compression rate used in compressing a position corresponding to the predetermined position determined as the visible region on each of the plurality of the images, and color information indicating a color at the position corresponding to the predetermined position on each image, to thereby decide a color at the predetermined position of the virtual viewpoint video, and generating the virtual viewpoint video on the basis of the color decided.
[0016] According to the second aspect of the present disclosure, there is provided a program causing a computer of an image processing device which generates an image to execute image processing including determining, for each of a plurality of the images obtained by capturing a subject from a plurality of viewpoints, whether a predetermined position of the subject from an arbitrary viewpoint on a virtual viewpoint video is a visible region or an invisible region in each of a plurality of the imaging devices on the basis of information indicating a three-dimensional shape of the subject, performing a weighted average using weight information based on a compression rate used in compressing a position corresponding to the predetermined position determined as the visible region on each of the plurality of the images, and color information indicating a color at the position corresponding to the predetermined position on each image, to thereby decide a color at the predetermined position of the virtual viewpoint video, and generating the virtual viewpoint video on the basis of the color decided.
[0017] In the second aspect of the present disclosure, for each of a plurality of images obtained by capturing a subject from a plurality of viewpoints, whether a predetermined position of the subject from an arbitrary viewpoint on a virtual viewpoint video is a visible region or an invisible region in each of a plurality of imaging devices is determined on the basis of information indicating a three-dimensional shape of the subject. A weighted average is performed using weight information based on a compression rate used in compressing a position corresponding to the predetermined position determined as the visible region on each of the plurality of images, and color information indicating a color at the position corresponding to the predetermined position on each image, to thereby decide a color at the predetermined position of the virtual viewpoint video. The virtual viewpoint video is generated on the basis of the color decided.
[0018] According to a third aspect of the present disclosure, there is provided an image transmission system including: a first image processing device including a setting unit configured to set a compression rate for an overlapping region in which, of a plurality of images obtained by capturing a subject by a plurality of imaging devices from a plurality of viewpoints, the image captured by a reference imaging device, which serves as a reference, and the image captured by a non-reference imaging device other than the reference imaging device overlap each other, higher than a compression rate for a non-overlapping region, and a compression unit configured to compress the image at each of the compression rates; and a second image processing device including a determination unit configured to determine, for each of the plurality of images transmitted from the first image processing device, whether a predetermined position of the subject from an arbitrary viewpoint on a virtual viewpoint video is a visible region or an invisible region in each of the plurality of the imaging devices on the basis of information indicating a three-dimensional shape of the subject, a decision unit configured to perform a weighted average using weight information based on a compression rate used in compressing a position corresponding to the predetermined position determined as the visible region on each of the plurality of the images, and color information indicating a color at the position corresponding to the predetermined position on each image, to thereby decide a color at the predetermined position of the virtual viewpoint video, and a generation unit configured to generate the virtual viewpoint video on the basis of the color decided by the decision unit.
[0019] In the third aspect of the present disclosure, in the first image processing device, a compression rate for an overlapping region in which, of a plurality of images obtained by capturing a subject by a plurality of imaging devices from a plurality of viewpoints, the image captured by a reference imaging device, which serves as a reference, and the image captured by a non-reference imaging device other than the reference imaging device overlap each other, is set higher than a compression rate for a non-overlapping region. The image is compressed at each of the compression rates. Moreover, in the second image processing device, for each of the plurality of images transmitted from the first image processing device, whether a predetermined position of the subject from an arbitrary viewpoint on a virtual viewpoint video is a visible region or an invisible region in each of the plurality of the imaging devices is determined on the basis of information indicating a three-dimensional shape of the subject. A weighted average is performed using weight information based on a compression rate used in compressing a position corresponding to the predetermined position determined as the visible region on each of the plurality of images, and color information indicating a color at the position corresponding to the predetermined position on each image, to thereby decide a color at the predetermined position of the virtual viewpoint video. The virtual viewpoint video is generated on the basis of the color decided.
Advantageous Effect of Invention
[0020] According to the first to third aspects of the present disclosure, it is possible to achieve higher compression efficiency.
[0021] Note that the effects described here are not necessarily limiting, and the effects may be any of those described in the present disclosure.
BRIEF DESCRIPTION OF DRAWINGS
[0022] FIG. 1 is a block diagram illustrating a configuration example of a first embodiment of an image transmission system to which the present technology is applied.
[0023] FIG. 2 is a diagram illustrating a deployment example of a plurality of cameras.
[0024] FIG. 3 is a block diagram illustrating a configuration example of a video compression unit.
[0025] FIG. 4 is a block diagram illustrating a configuration example of a virtual viewpoint video generation unit.
[0026] FIG. 5 is a diagram illustrating an example of overlapping regions and non-overlapping regions.
[0027] FIG. 6 is a diagram illustrating an overlap determination method.
[0028] FIG. 7 is a flowchart illustrating compressed video generation processing.
[0029] FIG. 8 is a flowchart illustrating virtual viewpoint video generation processing.
[0030] FIG. 9 is a flowchart illustrating color information and weight information acquisition processing.
[0031] FIG. 10 is a block diagram illustrating a configuration example of a second embodiment of the image transmission system.
[0032] FIG. 11 is a block diagram illustrating a configuration example of a third embodiment of the image transmission system.
[0033] FIG. 12 is a block diagram illustrating a configuration example of a fourth embodiment of the image transmission system.
[0034] FIG. 13 is a block diagram illustrating a configuration example of a fifth embodiment of the image transmission system.
[0035] FIG. 14 is a diagram illustrating a deployment example in which a plurality of cameras is arranged to surround a subject.
[0036] FIG. 15 is a diagram illustrating overlapping regions when two reference cameras are used.
[0037] FIG. 16 is a block diagram illustrating a configuration example of one embodiment of a computer to which the present technology is applied.
DESCRIPTION OF EMBODIMENTS
[0038] Now, specific embodiments to which the present technology is applied are described in detail with reference to the drawings.
[0039]
[0040] FIG. 1 is a block diagram illustrating a configuration example of a first embodiment of an image transmission system to which the present technology is applied.
[0041] As illustrated in FIG. 1, an image transmission system 11 includes a multiview video transmission unit 12 configured to transmit a multiview video obtained by capturing a subject from multiple viewpoints, and an arbitrary viewpoint video generation unit 13 configured to generate a virtual viewpoint video that is a video of a subject virtually seen from an arbitrary viewpoint to present the virtual viewpoint video to a viewer. Further, in the image transmission system 11, N cameras 14-1 to 14-N are connected to the multiview video transmission unit 12. For example, as illustrated in FIG. 2, a plurality of cameras 14 (five cameras 14-1 to 14-5 in the example of FIG. 2) is arranged at a plurality of positions around a subject.
[0042] For example, in the image transmission system 11, compressed video data that is a compressed multiview video including N images obtained by capturing a subject by the N cameras 14-1 to 14-N from N viewpoints, and 3D shape data regarding the subject are transmitted from the multiview video transmission unit 12 to the arbitrary viewpoint video generation unit 13. Then, in the image transmission system 11, a high-quality virtual viewpoint video is generated from the compressed video data and the 3D shape data by the arbitrary viewpoint video generation unit 13 to be displayed on a display device (not illustrated) such as a head mounted display, for example.
[0043] The multiview video transmission unit 12 includes N image acquisition units 21-1 to 21-N, a reference camera decision unit 22, a 3D shape calculation unit 23, N video compression units 24-1 to 24-N, a video data transmission unit 25, and a 3D shape data transmission unit 26.
[0044] The image acquisition units 21-1 to 21-N acquire images obtained by capturing a subject by the corresponding cameras 14-1 to 14-N from the N viewpoints. Then, the image acquisition units 21-1 to 21-N supply the acquired images to the 3D shape calculation unit 23 and the corresponding video compression units 24-1 to 24-N.
[0045] The reference camera decision unit 22 decides any one of the N cameras 14-1 to 14-N as a reference camera 14a serving as a reference in determining overlapping regions in which an image captured by that camera and images captured by the other cameras overlap each other (see the reference camera 14a illustrated in FIG. 5 described later). Then, the reference camera decision unit 22 supplies, to the video compression units 24-1 to 24-N, reference camera information specifying which of the cameras 14-1 to 14-N is the reference camera 14a. Note that the cameras 14-1 to 14-N other than the reference camera 14a are hereinafter referred to as non-reference cameras 14b as appropriate (see the non-reference camera 14b illustrated in FIG. 5 described later).
[0046] The 3D shape calculation unit 23 performs calculation based on images at the N viewpoints supplied from the image acquisition units 21-1 to 21-N to acquire a 3D shape expressing a subject as a three-dimensional shape and supplies the 3D shape to the video compression units 24-1 to 24-N and the 3D shape data transmission unit 26.
[0047] For example, the 3D shape calculation unit 23 acquires the 3D shape of the subject by Visual Hull, which projects the silhouette of the subject at each viewpoint into a 3D space and takes the intersection region of the silhouettes as the 3D shape, by multi-view stereo, which utilizes the consistency of texture information between viewpoints, or the like. Note that, to achieve the processing of Visual Hull, multi-view stereo, or the like, the 3D shape calculation unit 23 needs the internal parameters and external parameters of each of the cameras 14-1 to 14-N. Such information is known through calibration performed in advance. For example, camera-specific values such as focal lengths, image center coordinates, and aspect ratios are used as the internal parameters, and vectors indicating the orientation and position of a camera in the world coordinate system are used as the external parameters.
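As background for how these calibration parameters enter the processing, the following is a minimal sketch, not taken from the patent, of projecting a world-space point into a camera image with a pinhole model; the intrinsic matrix K encodes the internal parameters, (R, t) the external parameters, and all numerical values are hypothetical.

```python
import numpy as np

def project_point(X_world, K, R, t):
    """Project a 3D world point into a camera image.

    K    : 3x3 intrinsic matrix (focal lengths, image center).
    R, t : extrinsics mapping world coordinates to camera coordinates.
    Returns (u, v) pixel coordinates and the depth along the optical axis.
    """
    X_cam = R @ X_world + t          # world -> camera coordinates
    depth = X_cam[2]                 # distance along the optical axis
    uvw = K @ X_cam                  # camera -> homogeneous pixel coordinates
    return uvw[0] / uvw[2], uvw[1] / uvw[2], depth

# Hypothetical intrinsics: focal length 1000 px, image center (960, 540).
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                        # camera aligned with world axes
t = np.array([0.0, 0.0, 2.0])        # subject 2 m in front of the camera
u, v, d = project_point(np.array([0.1, 0.0, 0.0]), K, R, t)
```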
[0048] The video compression units 24-1 to 24-N receive images captured by the corresponding cameras 14-1 to 14-N from the image acquisition units 21-1 to 21-N. Further, the video compression units 24-1 to 24-N receive reference camera information from the reference camera decision unit 22, and the 3D shape of a subject from the 3D shape calculation unit 23. Then, the video compression units 24-1 to 24-N compress, on the basis of the reference camera information and the 3D shape of the subject, the images captured by the corresponding cameras 14-1 to 14-N, and supply compressed videos acquired as a result of the compression to the video data transmission unit 25.
[0049] Here, as illustrated in FIG. 3, the video compression units 24 each include an overlapping region detection unit 41, a compression rate setting unit 42, and a compression processing unit 43.
[0050] First, the overlapping region detection unit 41 detects, on the basis of the 3D shape of the subject, overlapping regions between the image captured by the reference camera 14a and the image captured by the non-reference camera 14b. Then, in compressing the image captured by the non-reference camera 14b, the compression rate setting unit 42 sets, for the overlapping regions, a compression rate higher than the compression rate for the non-overlapping regions. For example, it is expected that, when the cameras 14-1 to 14-5 are arranged as illustrated in FIG. 2, the images captured by the respective cameras 14-1 to 14-5 include a large number of overlapping regions in which the images overlap each other with respect to the subject. In such a circumstance, setting the compression rate for the overlapping regions higher than the compression rate for the non-overlapping regions when compressing an image captured by the non-reference camera 14b enhances the compression efficiency of the entire image transmission system 11.
[0051] When the compression rate setting unit 42 sets compression rates for overlapping regions and non-overlapping regions in this way, the compression processing unit 43 performs the compression processing of compressing an image at each of the compression rates, to thereby acquire a compressed video. Here, the compression processing unit 43 provides the compressed video with compression information indicating the compression rates for the overlapping regions and the non-overlapping regions. Note that the compressed video generation processing that the video compression unit 24 performs to generate compressed videos is described later with reference to the flowchart of FIG. 7.
[0052] Note that it is assumed that, as the compression technology that is used by the video compression units 24-1 to 24-N, a general video compression codec such as H.264/AVC (Advanced Video Coding) or H.265/HEVC (High Efficiency Video Coding) is utilized, but the compression technology is not limited thereto.
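The patent does not prescribe how the region-dependent compression rates are realized inside the codec. One common mechanism in H.264/HEVC-style encoders is per-block quantization control, and the following hypothetical sketch illustrates that idea by deriving a per-macroblock quantization parameter (QP) map from the overlap mask; the function name, block size, and QP values are illustrative assumptions, with a higher QP producing coarser quantization and hence a higher compression rate.

```python
import numpy as np

def build_qp_map(overlap_mask, qp_base=26, qp_overlap_offset=10, block=16):
    """Derive a per-block QP map from an overlap mask (illustrative values).

    overlap_mask : HxW boolean array, True where the pixel belongs to an
                   overlapping region with the reference camera's view.
    Blocks dominated by the overlapping region get qp_base + offset,
    i.e., they are compressed at a higher rate than non-overlapping blocks.
    """
    h, w = overlap_mask.shape
    rows, cols = h // block, w // block
    qp = np.full((rows, cols), qp_base, dtype=np.int32)
    for r in range(rows):
        for c in range(cols):
            blk = overlap_mask[r*block:(r+1)*block, c*block:(c+1)*block]
            if blk.mean() > 0.5:     # block is mostly overlapping region
                qp[r, c] = qp_base + qp_overlap_offset
    return qp
```

A map like this, together with the mask itself, could also serve as the "compression information" that the compression processing unit 43 attaches to each compressed video, since it records which rate was used where.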
[0053] The video data transmission unit 25 combines N compressed videos supplied from the video compression units 24-1 to 24-N to convert the N compressed videos to compressed video data to be transmitted, and transmits the compressed video data to the arbitrary viewpoint video generation unit 13.
[0054] The 3D shape data transmission unit 26 converts a 3D shape supplied from the 3D shape calculation unit 23 to 3D shape data to be transmitted, and transmits the 3D shape data to the arbitrary viewpoint video generation unit 13.
[0055] The arbitrary viewpoint video generation unit 13 includes a video data reception unit 31, a 3D shape data reception unit 32, a virtual viewpoint information acquisition unit 33, N video decompression units 34-1 to 34-N, and a virtual viewpoint video generation unit 35.
[0056] The video data reception unit 31 receives compressed video data transmitted from the video data transmission unit 25, divides the compressed video data into N compressed videos, and supplies the N compressed videos to the video decompression units 34-1 to 34-N.
[0057] The 3D shape data reception unit 32 receives 3D shape data transmitted from the 3D shape data transmission unit 26, and supplies the 3D shape of a subject based on the 3D shape data to the virtual viewpoint video generation unit 35.
[0058] The virtual viewpoint information acquisition unit 33 acquires virtual viewpoint information indicating the viewpoint from which the viewer virtually sees the subject in the virtual viewpoint video, depending on the motion or operation of the viewer, for example, the posture of the head mounted display, and supplies the virtual viewpoint information to the virtual viewpoint video generation unit 35.
[0059] The video decompression units 34-1 to 34-N receive, from the video data reception unit 31, the compressed videos obtained by compressing the images captured by the corresponding cameras 14-1 to 14-N from the N viewpoints. Then, the video decompression units 34-1 to 34-N decompress the corresponding compressed videos in accordance with the video compression codec utilized by the video compression units 24-1 to 24-N, to thereby acquire N images, and supply the N images to the virtual viewpoint video generation unit 35. Further, the video decompression units 34-1 to 34-N acquire the respective pieces of compression information given to the corresponding compressed videos and supply the pieces of compression information to the virtual viewpoint video generation unit 35.
[0060] Here, the compressed videos are individually subjected to the compression processing in the video compression units 24-1 to 24-N, and the video decompression units 34-1 to 34-N can individually decompress the compressed videos without data communication therebetween. That is, the video decompression units 34-1 to 34-N can perform the decompression processing in parallel, with the result that the processing time of the entire image transmission system 11 can be shortened.
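As a minimal illustration of this independence (not part of the patent), the sketch below decodes the N streams concurrently; zlib stands in for the H.264/HEVC decoder that would actually be used.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def decode_stream(compressed: bytes) -> bytes:
    # Stand-in for decoding one view's bitstream with a real video codec.
    return zlib.decompress(compressed)

def decompress_all(compressed_videos):
    # The views were compressed independently, so the N decodes share no
    # state and can run in parallel, shortening end-to-end latency.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(decode_stream, compressed_videos))

frames = decompress_all([zlib.compress(b"view-%d" % n) for n in range(5)])
```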
[0061] The virtual viewpoint video generation unit 35 generates, on the basis of the 3D shape of a subject supplied from the 3D shape data reception unit 32 and virtual viewpoint information supplied from the virtual viewpoint information acquisition unit 33, virtual viewpoint videos by referring to respective pieces of compression information corresponding to N images.
[0062] Here, as illustrated in FIG. 4, the virtual viewpoint video generation unit 35 includes a visible region determination unit 51, a color decision unit 52, and a generation processing unit 53.
[0063] For example, the visible region determination unit 51 determines, for each of N images, whether a predetermined position on a virtual viewpoint video is a visible region or an invisible region in each of the cameras 14-1 to 14-N on the basis of the 3D shape of a subject. Further, the color decision unit 52 acquires, from compression information, a compression rate used in compressing a position corresponding to the predetermined position determined as the visible region on each of the N images, to thereby acquire weight information based on each compression rate. In addition, the color decision unit 52 acquires color information indicating a color at the position corresponding to the predetermined position determined as the visible region on each image.
[0064] Moreover, the generation processing unit 53 performs a weighted average using weight information and color information regarding each of N images to decide a color at a predetermined position of a virtual viewpoint video, to thereby generate the virtual viewpoint video. Note that the virtual viewpoint video generation processing that the virtual viewpoint video generation unit 35 performs to generate virtual viewpoint videos is described later with reference to the flowcharts of FIG. 8 and FIG. 9.
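As an illustration of the weighted average described above (a minimal sketch, not the patent's implementation), the following computes the color of one virtual-viewpoint pixel. The weight formula is an assumption: the patent fixes only that the weights derive from the compression rates, so here each visible camera's weight is taken as the reciprocal of the compression rate used at the corresponding position, letting lightly compressed, less degraded views contribute more.

```python
import numpy as np

def decide_color(colors, compression_rates, visible):
    """Weighted-average color for one virtual-viewpoint pixel.

    colors            : (N, 3) RGB samples, one per camera.
    compression_rates : (N,) compression rate used at each sampled position.
    visible           : (N,) bool, True where the position was a visible region.
    Assumed weight formula: weight = 1 / rate; invisible views contribute 0.
    """
    colors = np.asarray(colors, dtype=np.float64)
    rates = np.asarray(compression_rates, dtype=np.float64)
    w = np.where(visible, 1.0 / rates, 0.0)
    return (w[:, None] * colors).sum(axis=0) / w.sum()

# Three cameras see the point; the second was compressed most heavily,
# so its (more degraded) color sample carries the least weight.
c = decide_color([[200, 10, 10], [180, 20, 20], [210, 5, 5]],
                 [2.0, 8.0, 2.0], [True, True, True])
```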
[0065] The image transmission system 11 is configured as described above, and the multiview video transmission unit 12 sets, for overlapping regions, a compression rate higher than a compression rate for non-overlapping regions, so that the compression efficiency of compressed video data can be enhanced. Further, the arbitrary viewpoint video generation unit 13 generates a virtual viewpoint video by performing a weighted average using weight information and color information regarding each of N images, so that the quality can be enhanced.
[0066]
[0067] With reference to FIG. 5 and FIG. 6, a method of detecting overlapping regions is described.
[0068] FIG. 5 schematically illustrates a range captured by the reference camera 14a and a range captured by the non-reference camera 14b.
[0069] As illustrated in FIG. 5, in a case where a subject and another object behind the subject (referred to as a "background object") are arranged, a region of the subject observed by both the reference camera 14a and the non-reference camera 14b (region d) is an overlapping region. Further, a region of the background object observed by both the reference camera 14a and the non-reference camera 14b (region b) is also an overlapping region.
[0070] Meanwhile, of the region observed by the non-reference camera 14b, a region of the background object that cannot be observed by the reference camera 14a because it is hidden behind the subject (region c) is a non-overlapping region. Further, of the region observed by the non-reference camera 14b, a side surface region of the background object that does not face the reference camera 14a (region a) is also a non-overlapping region.
[0071] Then, in a case where the corresponding cameras 14-1 to 14-N are each the non-reference camera 14b, as described above, the video compression units 24-1 to 24-N set, for the overlapping regions, a compression rate higher than the compression rate for the non-overlapping regions, and perform the compression processing of compressing the images. Here, in detecting the overlapping regions, the video compression units 24-1 to 24-N determine whether or not the images overlap each other for each of the pixels constituting the images, for example.
[0072] With reference to FIG. 6, a method of determining the overlap for each of the pixels constituting images is described.
[0073] First, the video compression unit 24 corresponding to the non-reference camera 14b renders the 3D shapes of the subject and background object using the internal parameters and external parameters of the reference camera 14a. With this, the video compression unit 24 obtains, for each pixel of an image including the subject and background object observed from the reference camera 14a, a depth value indicating the distance from the reference camera 14a to the surface of the subject or the surface of the background object, to thereby acquire a depth buffer covering all the surfaces of the subject and the background object at the viewpoint of the reference camera 14a.
[0074] Next, the video compression unit 24 renders, by referring to the depth buffer of the reference camera 14a, the 3D shapes of the subject and the background object using the internal parameters and the external parameters of the non-reference camera 14b. Then, the video compression unit 24 sequentially sets the pixels constituting an image captured by the non-reference camera 14b as a pixel of interest that is a target of an overlapping region determination, and acquires a 3D position indicating the three-dimensional position of the pixel of interest in question.
[0075] In addition, the video compression unit 24 performs Model View conversion and Projection conversion on the 3D position of the pixel of interest using the internal parameters and the external parameters of the reference camera 14a, to thereby convert the 3D position of the pixel of interest to a depth value indicating a depth from the reference camera 14a to the 3D position of the pixel of interest. Further, the video compression unit 24 projects the 3D position of the pixel of interest to the reference camera 14a to identify a pixel on a light beam extending from the reference camera 14a to the 3D position of the pixel of interest, to thereby acquire, from the depth buffer of the reference camera 14a, a depth value at the pixel position of the pixel in question.
[0076] Then, the video compression unit 24 compares the depth value of the pixel of interest to the depth value of the pixel position, and sets, in a case where the depth value of the pixel of interest is larger, a non-overlapping mark to the pixel of interest in question. Meanwhile, the video compression unit 24 sets, in a case where the depth value of the pixel of interest is smaller (or is the same), an overlapping mark to the pixel of interest in question.
[0077] For example, in the case of the 3D position of a pixel of interest as illustrated in FIG. 6, since the depth value of the pixel of interest is larger than the depth value of the pixel position, the video compression unit 24 sets a non-overlapping mark to the pixel of interest in question. That is, the pixel of interest illustrated in FIG. 6 is a non-overlapping region as illustrated in FIG. 5.
[0078] Such a determination is made on all the pixels constituting the image captured by the non-reference camera 14b, so that the video compression unit 24 can detect, as overlapping regions, regions including pixels having overlapping marks set thereto.
[0079] Note that, since an actually acquired depth buffer of the reference camera 14a has numerical calculation errors, the video compression unit 24 preferably makes a determination with some latitude when determining the overlap of a pixel of interest. Further, by the overlap determination method described here, the video compression units 24-1 to 24-N can detect overlapping regions on the basis of corresponding images, the 3D shapes of the subject and the background object, and the internal parameters and external parameters of the cameras 14-1 to 14-N. That is, the video compression units 24-1 to 24-N can each compress a corresponding image (for example, an image captured by the camera 14-1 for the video compression unit 24-1) without using images other than the image in question, and therefore efficiently perform the compression processing.
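The per-pixel determination described above (walked through again as Steps S11 to S19 of FIG. 7 below) can be sketched as follows. This is a hypothetical Python rendering, assuming the world-space 3D position of every pixel of the non-reference image has already been obtained by rendering the 3D shape, and that the reference camera's depth buffer, intrinsic matrix K_ref, and extrinsics (R_ref, t_ref) are available; eps provides the latitude for numerical calculation error recommended above.

```python
import numpy as np

def mark_overlap(points_3d, depth_ref, K_ref, R_ref, t_ref, eps=1e-3):
    """Classify each non-reference-camera pixel as overlapping or not.

    points_3d : (H, W, 3) world-space 3D position of every pixel of the
                non-reference image (from rendering the 3D shape).
    depth_ref : reference camera's depth buffer, indexed [v, u].
    Returns an (H, W) bool mask, True = overlapping region.
    """
    h, w, _ = points_3d.shape
    overlap = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            X_cam = R_ref @ points_3d[y, x] + t_ref   # Model View conversion
            z = X_cam[2]                              # depth from reference camera
            if z <= 0:
                continue                              # behind the reference camera
            uvw = K_ref @ X_cam                       # Projection conversion
            u, v = int(uvw[0] / z), int(uvw[1] / z)
            if 0 <= v < depth_ref.shape[0] and 0 <= u < depth_ref.shape[1]:
                # The point also reaches the reference camera's surface:
                # overlapping if its depth does not exceed the stored depth,
                # with eps absorbing numerical calculation errors.
                overlap[y, x] = z <= depth_ref[v, u] + eps
    return overlap
```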
[0080]
[0081] FIG. 7 is a flowchart illustrating the compressed video generation processing that the video compression units 24-1 to 24-N each perform to generate a compressed video.
[0082] Here, as the compressed video generation processing that the video compression units 24-1 to 24-N each perform, the compressed video generation processing performed by an n-th video compression unit 24-n of the N video compression units 24-1 to 24-N is described. Moreover, the video compression unit 24-n receives an image captured by an n-th camera 14-n from an n-th image acquisition unit 21-n. Further, the camera 14-n is a non-reference camera 14b that is not used as the reference camera 14a.
[0083] For example, the processing starts when an image captured by the camera 14-n is supplied to the video compression unit 24-n and a 3D shape acquired with the use of the image in question is supplied from the 3D shape calculation unit 23 to the video compression unit 24-n. In Step S11, the video compression unit 24-n renders, using the internal parameters and external parameters of the reference camera 14a, the 3D shape supplied from the 3D shape calculation unit 23, and acquires the depth buffer of the reference camera 14a.
[0084] In Step S12, the video compression unit 24-n renders, using the internal parameters and external parameters of the camera 14-n, which is the non-reference camera 14b, the 3D shape supplied from the 3D shape calculation unit 23.
[0085] In Step S13, the video compression unit 24-n sets a pixel of interest from the pixels of the image captured by the camera 14-n. For example, the video compression unit 24-n can set the pixel of interest in accordance with a raster order.
[0086] In Step S14, the video compression unit 24-n acquires the 3D position of the pixel of interest in the world coordinate system on the basis of depth information obtained by the rendering in Step S12 and the internal parameters and the external parameters of the camera 14-n, which is the non-reference camera 14b.
[0087] In Step S15, the video compression unit 24-n performs Model View conversion and Projection conversion on the 3D position of the pixel of interest acquired in Step S14, using the internal parameters and the external parameters of the reference camera 14a. With this, the video compression unit 24-n acquires a depth value from the reference camera 14a to the 3D position of the pixel of interest.
[0088] In Step S16, the video compression unit 24-n projects the 3D position of the pixel of interest to the reference camera 14a, and acquires, from the depth buffer of the reference camera 14a acquired in Step S11, the depth value of a pixel position on a light beam extending from the reference camera 14a to the 3D position of the pixel of interest.
[0089] In Step S17, the video compression unit 24-n compares the depth value of the pixel of interest acquired in Step S15 to the depth value of the pixel position acquired in Step S16.
[0090] In Step S18, the video compression unit 24-n determines, on the basis of the result of the comparison in Step S17, whether or not the depth value of the 3D position of the pixel of interest is larger than the depth value corresponding to the position of the pixel of interest.
[0091] In a case where the video compression unit 24-n determines in Step S18 that the depth value of the 3D position of the pixel of interest is not larger (is equal to or smaller) than the depth value corresponding to the position of the pixel of interest, the processing proceeds to Step S19. In Step S19, the video compression unit 24-n sets an overlapping mark to the pixel of interest.
……
……
……