Sony Patent | Image processing apparatus and image processing method

Publication Number: 2021/0233303

Publication Date: 2021-07-29

Applicant: Sony

Assignee: Sony Corporation

Abstract

There is provided an image processing apparatus and an image processing method for enabling generation of a high-quality 3D image while suppressing a data amount. A generation device includes a generation unit that generates 3D shape data representing a 3D shape of an object, mapping data that is two-dimensionally mapped texture information of the object, and area image data of a specific area of one or more captured images obtained by capturing the object from one or more viewpoint positions. The present technology can be applied to, for example, an image processing system that displays a viewing viewpoint image of a 3D model viewed from a predetermined viewing position, or the like.

Claims

  1. An image processing apparatus comprising: a generation unit configured to generate 3D shape data representing a 3D shape of an object, mapping data that is two-dimensionally mapped texture information of the object, and area image data of a specific area of the object captured in one or more captured images obtained by capturing the object from one or more viewpoint positions, the area image data being image data in a format different from the mapping data.

  2. The image processing apparatus according to claim 1, wherein the mapping data is data by one of UV mapping, cube mapping, parallel projection mapping, or cylindrical coordinate projection mapping.

  3. The image processing apparatus according to claim 1, wherein the generation unit detects the specific area by recognition processing, and generates the area image data of the detected specific area.

  4. The image processing apparatus according to claim 1, further comprising: a viewpoint image generation unit configured to synthesize and generate a viewpoint image viewed from a same viewpoint as the viewpoint position from the 3D shape data and the mapping data; and a control unit configured to control the generation of the area image data on a basis of a difference between the viewpoint image and the captured image.

  5. The image processing apparatus according to claim 4, further comprising: an encoding unit configured to encode the difference.

  6. The image processing apparatus according to claim 1, wherein the generation unit generates a viewpoint synthesis image obtained by synthesizing a plurality of the captured images, and generates an image of the specific area from the viewpoint synthesis image.

  7. The image processing apparatus according to claim 6, wherein the viewpoint synthesis image is an image having higher resolution than the captured images.

  8. The image processing apparatus according to claim 1, further comprising: a transmission unit configured to transmit the 3D shape data, the mapping data, and the area image data to an external information processing apparatus, wherein the external information processing apparatus generates a viewing viewpoint synthesis image of a 3D model of the object viewed from a predetermined viewing position on a basis of the 3D shape data, the mapping data, and the area image data.

  9. The image processing apparatus according to claim 1, further comprising: an encoding unit configured to encode the 3D shape data, the mapping data, and the area image data.

  10. An image processing method comprising: by an image processing apparatus, generating 3D shape data representing a 3D shape of an object, mapping data that is two-dimensionally mapped texture information of the object, and area image data of a specific area of the object of one or more captured images obtained by capturing the object from one or more viewpoint positions, the area image data being image data in a format different from the mapping data.

  11. An image processing apparatus comprising: a synthesis unit configured to synthesize 3D shape data representing a 3D shape of an object, mapping data that is two-dimensionally mapped texture information of the object, and area image data of a specific area of the object of one or more captured images obtained by capturing the object from one or more viewpoint positions, the area image data being image data in a format different from the mapping data, to generate a viewing viewpoint synthesis image of a 3D model of the object viewed from a predetermined viewing position.

  12. The image processing apparatus according to claim 11, wherein the synthesis unit synthesizes a first viewing viewpoint image of a first 3D model of the object viewed from the predetermined viewing position, the first 3D model being generated from the 3D shape data and the mapping data, and a second viewing viewpoint image of a second 3D model of the object viewed from the predetermined viewing position, the second 3D model being generated from the 3D shape data and the area image data, to generate the viewing viewpoint synthesis image.

  13. The image processing apparatus according to claim 11, wherein the synthesis unit generates a first 3D model of the object from the 3D shape data and the mapping data and generates a second 3D model of the object from the 3D shape data and the area image data, and generates the viewing viewpoint synthesis image of a 3D model viewed from the predetermined viewing position, the 3D model being obtained after the first 3D model and the second 3D model are synthesized.

  14. The image processing apparatus according to claim 11, wherein the synthesis unit synthesizes a viewing viewpoint auxiliary synthesis image obtained by synthesizing a plurality of specific area images that is images of a plurality of the specific areas by weighted addition and a viewing viewpoint basic image based on the mapping data to generate the viewing viewpoint synthesis image.

  15. The image processing apparatus according to claim 11, wherein the synthesis unit synthesizes the specific area image having highest reliability in a plurality of specific area images that is images of a plurality of the specific areas with a viewing viewpoint basic image based on the mapping data to generate the viewing viewpoint synthesis image.

  16. The image processing apparatus according to claim 11, further comprising: a viewpoint image generation unit configured to generate a viewpoint image from a same viewpoint as the viewpoint position from the 3D shape data and the mapping data; and a decoding unit configured to decode the area image data obtained by encoding a difference between the viewpoint image of the specific area and the captured image, using the viewpoint image.

  17. The image processing apparatus according to claim 16, further comprising: a first viewing viewpoint image generation unit configured to generate a viewing viewpoint basic image of a 3D model of the object viewed from the predetermined viewing position, the 3D model being generated from the 3D shape data and the mapping data; and a second viewing viewpoint image generation unit configured to generate a viewing viewpoint auxiliary image, using the difference obtained by decoding the area image data and the viewpoint image, wherein the synthesis unit synthesizes the viewing viewpoint basic image and the viewing viewpoint auxiliary image to generate the viewing viewpoint synthesis image.

  18. The image processing apparatus according to claim 11, further comprising: a reception unit configured to receive the 3D shape data, the mapping data, and the area image data.

  19. The image processing apparatus according to claim 11, further comprising: a decoding unit configured to decode the encoded 3D shape data, the encoded mapping data, and the encoded area image data.

  20. An image processing method comprising: by an image processing apparatus, synthesizing 3D shape data representing a 3D shape of an object, mapping data that is two-dimensionally mapped texture information of the object, and area image data of a specific area of the object of one or more captured images obtained by capturing the object from one or more viewpoint positions, the area image data being image data in a format different from the mapping data, to generate a viewing viewpoint synthesis image of a 3D model of the object viewed from a predetermined viewing position.

Description

TECHNICAL FIELD

[0001] The present technology relates to an image processing apparatus and an image processing method, and particularly relates to an image processing apparatus and an image processing method for enabling generation of a high-quality 3D image while suppressing a data amount.

BACKGROUND ART

[0002] Various technologies have been proposed for generating and transmitting a 3D model. For example, a method of generating a 3D model shape of an object and a color of each point on a surface of the 3D model shape from a plurality of texture images and depth images obtained by capturing the object from a plurality of viewpoints has been proposed (for example, see Non-Patent Document 1).

CITATION LIST

Non-Patent Document

[0003] Non-Patent Document 1: "High-Quality Streamable Free-Viewpoint Video", Alvaro Collet, Ming Chuang, Pat Sweeney, Don Gillett, Dennis Evseev, David Calabrese, Hugues Hoppe, Adam Kirk, Steve Sullivan, ACM Trans. Graphics (SIGGRAPH), 34(4), 2015, Internet <http://hhoppe.com/proj/fvv/>

SUMMARY OF THE INVENTION

Problems to be Solved by the Invention

[0004] However, according to the technology disclosed in Non-Patent Document 1, the rendering result largely depends on the accuracy of the 3D model of the object, and the image tends to be distorted particularly in a case where the number of viewpoints to be captured is small, for example. Meanwhile, when the number of viewpoints to be captured is increased, the information amount increases and the redundancy becomes large.

[0005] The present technology has been made in view of such a situation and enables generation of a high-quality 3D image while suppressing a data amount.

Solutions to Problems

[0006] An image processing apparatus according to the first aspect of the present technology includes a generation unit configured to generate 3D shape data representing a 3D shape of an object, mapping data that is two-dimensionally mapped texture information of the object, and area image data of a specific area of one or more captured images obtained by capturing the object from one or more viewpoint positions.

[0007] An image processing method according to the first aspect of the present technology includes, by an image processing apparatus, generating 3D shape data representing a 3D shape of an object, mapping data that is two-dimensionally mapped texture information of the object, and area image data of a specific area of one or more captured images obtained by capturing the object from one or more viewpoint positions.

[0008] In the first aspect of the present technology, 3D shape data representing a 3D shape of an object, mapping data that is two-dimensionally mapped texture information of the object, and area image data of a specific area of one or more captured images obtained by capturing the object from one or more viewpoint positions are generated.

[0009] An image processing apparatus according to the second aspect of the present technology includes a synthesis unit configured to synthesize 3D shape data representing a 3D shape of an object, mapping data that is two-dimensionally mapped texture information of the object, and area image data of a specific area of one or more captured images obtained by capturing the object from one or more viewpoint positions to generate a viewing viewpoint synthesis image that is an image of a 3D model of the object viewed from a predetermined viewing position.

[0010] An image processing method according to the second aspect of the present technology includes, by an image processing apparatus, synthesizing 3D shape data representing a 3D shape of an object, mapping data that is two-dimensionally mapped texture information of the object, and area image data of a specific area of one or more captured images obtained by capturing the object from one or more viewpoint positions to generate a viewing viewpoint synthesis image that is an image of a 3D model of the object viewed from a predetermined viewing position.

[0011] In the second aspect of the present technology, 3D shape data representing a 3D shape of an object, mapping data that is two-dimensionally mapped texture information of the object, and area image data of a specific area of one or more captured images obtained by capturing the object from one or more viewpoint positions are synthesized to generate a viewing viewpoint synthesis image that is an image of a 3D model of the object viewed from a predetermined viewing position.

[0012] Note that the image processing apparatuses according to the first and second aspects of the present technology can be implemented by causing a computer to execute a program.

[0013] Furthermore, to implement the image processing apparatuses according to the first and second aspects of the present technology, the program executed by the computer can be provided by being transmitted via a transmission medium or by being recorded on a recording medium.

[0014] The image processing apparatus may be an independent apparatus or may be internal blocks configuring one apparatus.

Effects of the Invention

[0015] According to the first and second aspects of the present technology, it is possible to generate a high-quality 3D image while suppressing a data amount.

[0016] Note that the effects described here are not necessarily limited, and any of the effects described in the present disclosure may be exhibited.

BRIEF DESCRIPTION OF DRAWINGS

[0017] FIG. 1 is a block diagram illustrating a configuration example of an image processing system to which the present technology is applied.

[0018] FIG. 2 is a diagram illustrating an arrangement example of imaging devices.

[0019] FIG. 3 is a diagram for describing 3D model data.

[0020] FIG. 4 is a block diagram illustrating a configuration example of a first embodiment of a generation device.

[0021] FIG. 5 is a block diagram illustrating a configuration example of the first embodiment of a reproduction device.

[0022] FIG. 6 is a flowchart for describing 3D model data generation processing according to the first embodiment.

[0023] FIG. 7 is a flowchart for describing 3D model image generation processing according to the first embodiment.

[0024] FIG. 8 is a block diagram illustrating a configuration example of a second embodiment of a generation device.

[0025] FIG. 9 is a block diagram illustrating a configuration example of the second embodiment of a reproduction device.

[0026] FIG. 10 is a flowchart for describing 3D model data generation processing according to the second embodiment.

[0027] FIG. 11 is a flowchart for describing 3D model image generation processing according to the second embodiment.

[0028] FIG. 12 is a block diagram illustrating a configuration example of a third embodiment of a generation device.

[0029] FIG. 13 is a flowchart for describing 3D model data generation processing according to the third embodiment.

[0030] FIG. 14 is a block diagram illustrating a configuration example of an embodiment of a computer to which the present technology is applied.

MODE FOR CARRYING OUT THE INVENTION

[0031] Hereinafter, modes for implementing the present technology (hereinafter referred to as embodiments) will be described. Note that the description will be given in the following order.

[0032] 1. Image Processing System

[0033] 2. First Embodiment

[0034] 3. Flowchart of First Embodiment

[0035] 4. Second Embodiment

[0036] 5. Flowchart of Second Embodiment

[0037] 6. Third Embodiment

[0038] 7. Flowchart of Third Embodiment

[0039] 8. Configuration Example of Computer

  1. Image Processing System

[0040] FIG. 1 illustrates a configuration example of an image processing system to which the present technology is applied.

[0041] An image processing system 1 in FIG. 1 includes a distribution side in which image data of a 3D model is generated from a plurality of captured images obtained from a plurality of imaging devices 21 and is distributed, and a reproduction side in which the image data of the 3D model transmitted from the distribution side is received, reproduced, and displayed.

[0042] For example, as illustrated in FIG. 2, imaging devices 21-1 to 21-N (N>1) are arranged at different positions in an outer periphery of an object, capture the object, and supply resultant moving images to a generation device 22. FIG. 2 illustrates an example in which eight imaging devices 21-1 to 21-8 are arranged. Each of the imaging devices 21-1 to 21-8 captures an image of the object from a direction different from the other imaging devices 21. It is assumed that the position of each imaging device 21 on a world coordinate system is known.

[0043] In the present embodiment, the moving image generated by each imaging device 21 is assumed to be a captured image (RGB image) including RGB wavelengths, but the moving image may be a multispectral image including an infrared (IR) image.

[0044] Furthermore, each imaging device 21 may perform imaging a plurality of times while changing imaging conditions such as an exposure condition, a light source position, or a light source color, and may supply a resultant captured image to the generation device 22.

[0045] Moreover, each imaging device 21 may include a distance measuring sensor. In addition to the RGB captured image that is the texture information of the object, each imaging device 21 may measure the distance to the object, generate a depth image in which the distance to the object in a depth direction is stored as a depth value in association with each pixel of the captured image, and supply the depth image to the generation device 22. Furthermore, the distance measuring sensor may be provided independently of each imaging device 21.

[0046] As a method for the distance measuring sensor to measure the distance to the object, there are various methods such as a time of flight (TOF) method, a structured light method, a stereo matching method, and a structure from motion (SfM) method, and the method is not particularly limited. A combination of a plurality of methods may also be used. For example, the TOF method is a method of irradiating a target space with near-infrared light, receiving reflected light from an object existing in the target space, and obtaining the distance to the object in the target space on the basis of the time from when the near-infrared light is radiated to when the reflected light is received. Furthermore, the structured light method is a method of projecting a predetermined projection pattern of near-infrared light on an object existing in a target space, and detecting the shape (depth) of the object existing in the target space on the basis of a deformation state of the projection pattern. The stereo matching method is a method of obtaining the distance to an object on the basis of the parallax between two captured images of the object captured from positions different from each other. Furthermore, the SfM method is a method of calculating a relationship between images, such as the correspondence of characteristic points, using a plurality of captured images captured at angles different from each other, and optimizing the relationship to perform depth detection.
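As an illustration of the stereo matching relation mentioned above, depth can be recovered from the disparity between two rectified views as Z = f·B/d. The following sketch is not part of the patent disclosure; the function name and parameters are illustrative, and a rectified camera pair with known focal length and baseline is assumed:

```python
def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth of a point from the disparity between two rectified views.

    Z = f * B / d: the larger the disparity, the closer the point.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# A point seen 50 px apart by two cameras 10 cm apart, with a 1000 px
# focal length, lies 2 m away.
print(stereo_depth(1000.0, 0.10, 50.0))  # → 2.0
```

In practice, disparity is estimated per pixel by matching texture between the two images, which is why the method depends on texture consistency between viewpoints.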

[0047] Moreover, each imaging device 21 may generate information regarding the reflectance (albedo) of the object, information regarding environmental light or shading, additional information such as bump mapping, transmission mapping, normal mapping, and environmental mapping, and the like, and supply the generated information to the generation device 22.

[0048] Each imaging device 21 may be configured to arbitrarily combine the above-described image and additional information and supply the combined information to the generation device 22.

[0049] The generation device 22 generates, from the plurality of captured images respectively supplied from the imaging devices 21-1 to 21-N, 3D shape data representing the 3D shape of the object, mapping data that is the two-dimensionally mapped texture information of the object, and area image data that is image data of a specific area of the captured images, and supplies the generated data to a distribution server 23. Hereinafter, the 3D shape data, the mapping data, and the area image data are collectively referred to as 3D model data.

[0050] FIG. 3 is a diagram for describing the 3D model data generated by the generation device 22 and transmitted by the distribution server 23.

[0051] For example, captured images P1 to P8 are respectively obtained by the imaging devices 21-1 to 21-8. The generation device 22 generates the 3D model of the object from the captured images P1 to P8. The 3D model is configured by the 3D shape data representing the 3D shape (geometry information) of the object and the mapping data that is two-dimensionally mapped texture information of the object. The 3D shape data is, for example, data represented by a polygon mesh, and the mapping data is, for example, data represented by a UV map. Moreover, the generation device 22 extracts one or more specific areas SP desired to have high image quality from the captured images P1 to P8, and generates the area image data. In the example in FIG. 3, three specific areas SP1 to SP3 including a face area of a person who is the object are extracted from the captured images P1 to P8.
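The three components of the 3D model data described above can be pictured as a simple container: a mesh, its two-dimensionally mapped texture, and zero or more specific-area crops. This sketch is illustrative only; the field names are not taken from the patent:

```python
from dataclasses import dataclass, field

@dataclass
class AreaImage:
    """Auxiliary texture: a crop of one captured image plus its source camera."""
    camera_id: int   # which imaging device 21-n produced the crop
    bbox: tuple      # (x, y, width, height) of the specific area SP
    pixels: list     # cropped RGB rows (placeholder for real image data)

@dataclass
class ModelData:
    """3D shape data + mapping data + area image data = "3D model data"."""
    vertices: list   # polygon-mesh vertices (x, y, z)
    faces: list      # vertex-index triples
    uv_map: list     # per-vertex (u, v) coordinates into the texture atlas
    texture: list    # two-dimensionally mapped texture information
    area_images: list = field(default_factory=list)  # specific-area crops

model = ModelData(
    vertices=[(0, 0, 0), (1, 0, 0), (0, 1, 0)],
    faces=[(0, 1, 2)],
    uv_map=[(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    texture=[[255, 255, 255]],
)
model.area_images.append(AreaImage(camera_id=1, bbox=(10, 20, 64, 64), pixels=[]))
print(len(model.area_images))  # → 1
```

The key point of the format is that the area images are carried separately from, and in a different format than, the mapping data.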

[0052] Note that the generation device 22 can acquire the captured images temporarily stored in a predetermined storage unit such as a data server instead of directly acquiring the captured images from the imaging devices 21-1 to 21-N, and generate the 3D model data.

[0053] Returning to FIG. 1, the distribution server 23 stores the 3D model data supplied from the generation device 22 and transmits the 3D model data to a reproduction device 25 via a network 24 in response to a request from the reproduction device 25.

[0054] The distribution server 23 includes a transmission/reception unit 41 and a storage 42.

[0055] The transmission/reception unit 41 acquires the 3D model data supplied from the generation device 22 and stores the acquired 3D model data in the storage 42. Furthermore, the transmission/reception unit 41 transmits the 3D model data to the reproduction device 25 via the network 24 in response to a request from the reproduction device 25.

[0056] Note that the transmission/reception unit 41 can acquire the 3D model data from the storage 42 and transmit the 3D model data to the reproduction device 25, or can directly transmit (distribute in real time) the 3D model data supplied from the generation device 22 to the reproduction device 25 without storing it in the storage 42.

[0057] The network 24 is configured by, for example, the Internet, a telephone line network, a satellite communication network, various local area networks (LANs) including Ethernet (registered trademark), a wide area network (WAN), or a dedicated line network such as an internet protocol-virtual private network (IP-VPN).

[0058] The reproduction device 25 generates (reproduces) the 3D model of the object on the basis of the 3D model data transmitted from the distribution server 23 via the network 24. More specifically, the reproduction device 25 attaches the texture information of the mapping data to the 3D shape represented by the 3D shape data as basic texture and further attaches an area image of the specific area SP represented by the area image data to the 3D shape as auxiliary texture to generate the 3D model of the object. Then, the reproduction device 25 generates (reproduces) a 3D model image of the 3D model of the object viewed from a viewing position of a viewer, the viewing position being supplied from a viewing position detection device 27, and supplies the 3D model image to a display device 26.
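The attach-basic-then-attach-auxiliary step described above can be pictured, after both textures have been rendered into the viewing viewpoint, as a per-pixel overlay in which the auxiliary texture takes precedence wherever a specific-area image is available. This is a minimal sketch under that assumption, not the patent's exact blending rule:

```python
def synthesize(basic: list, auxiliary: list) -> list:
    """Overlay the auxiliary (specific-area) texture on the basic texture.

    `basic` and `auxiliary` are equally sized 2D grids of pixel values;
    `None` in `auxiliary` means "no high-quality data here, keep basic".
    """
    return [
        [aux if aux is not None else base for base, aux in zip(brow, arow)]
        for brow, arow in zip(basic, auxiliary)
    ]

basic = [[10, 10], [10, 10]]            # free-viewpoint rendering of basic texture
auxiliary = [[None, 99], [None, None]]  # one high-quality pixel from an area image
print(synthesize(basic, auxiliary))     # → [[10, 99], [10, 10]]
```

Later embodiments refine this idea with weighted addition or reliability-based selection among multiple specific-area images (see claims 14 and 15).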

[0059] The display device 26 displays the 3D model image supplied from the reproduction device 25. The viewer views the 3D model image displayed on the display device 26. The viewing position detection device 27 detects the viewing position of the viewer and supplies the detected position to the reproduction device 25.

[0060] The display device 26 and the viewing position detection device 27 may be configured as an integrated device. For example, the display device 26 and the viewing position detection device 27 are configured by a head-mounted display, which detects the position to which the viewer has moved, the movement of the head, and the like to detect the viewing position of the viewer. The viewing position includes a sight direction of the viewer with respect to the 3D model generated by the reproduction device 25.

[0061] As an example of configuring the display device 26 and the viewing position detection device 27 as separate devices, the viewing position detection device 27 is configured by, for example, a controller for operating the viewing position, and the viewing position according to the viewer's operation of the controller is supplied to the reproduction device 25. The reproduction device 25 displays the 3D model image corresponding to the specified viewing position on the display device 26.

[0062] The display device 26 or the viewing position detection device 27 can supply information regarding display functions of the display device 26, such as an image size and an angle of view of the image displayed by the display device 26, and the like to the reproduction device 25 as necessary.

[0063] The image processing system 1 configured as described above uses the viewpoint-independent basic texture as a free viewpoint image of the entire object with a suppressed data amount, and uses the area image transmitted as the auxiliary texture for the specific area SP that attracts the viewer's attention. Thereby, high image quality can be implemented while suppressing the data amount to be transmitted.

[0064] Hereinafter, detailed configurations of the generation device 22 and the reproduction device 25 will be described.

  2. First Embodiment

[0066] FIG. 4 is a block diagram illustrating a configuration example of a first embodiment of the generation device 22.

[0067] The generation device 22 includes an image acquisition unit 61, a 3D shape calculation unit 62, a basic texture generation unit 63, an auxiliary texture generation unit 64, a shape encoding unit 65, a basic texture encoding unit 66, an auxiliary texture encoding unit 67, and a transmission unit 68. The 3D shape calculation unit 62, the basic texture generation unit 63, and the auxiliary texture generation unit 64 may be configured as one generation unit 71, and the shape encoding unit 65, the basic texture encoding unit 66, and the auxiliary texture encoding unit 67 may be configured as one encoding unit 72.

[0068] The image acquisition unit 61 acquires a plurality of captured images supplied from the plurality of imaging devices 21 and supplies the captured images to the 3D shape calculation unit 62, the basic texture generation unit 63, and the auxiliary texture generation unit 64.

[0069] The 3D shape calculation unit 62 generates 3D shape data representing the 3D shape of the object on the basis of the plurality of captured images supplied from the image acquisition unit 61. For example, the 3D shape calculation unit 62 acquires the 3D shape of the object and generates the 3D shape data by Visual Hull, in which silhouettes of the object at the respective viewpoints are projected on a 3D space and an intersection area of the silhouettes is obtained as the 3D shape, by Multi view stereo, which uses the consistency of texture information between viewpoints, or the like.
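The Visual Hull idea can be sketched as voxel carving: a voxel survives only if it projects inside the object silhouette in every view. For brevity the toy projections below are orthographic along the coordinate axes, a simplification of the real per-camera projections that use each device's camera parameters:

```python
def visual_hull(silhouettes: dict, size: int) -> set:
    """Carve a size**3 voxel grid against axis-aligned orthographic silhouettes.

    `silhouettes` maps an axis name ("x", "y", "z") to the set of 2D cells
    covered by the object in the view looking along that axis; a voxel is
    kept only if every view sees it as "object".
    """
    kept = set()
    for x in range(size):
        for y in range(size):
            for z in range(size):
                if ((y, z) in silhouettes["x"] and
                        (x, z) in silhouettes["y"] and
                        (x, y) in silhouettes["z"]):
                    kept.add((x, y, z))
    return kept

# On a 2x2x2 grid where each view sees only the cell (0, 0), the
# intersection of the three silhouette cones is the single voxel (0, 0, 0).
hull = visual_hull({"x": {(0, 0)}, "y": {(0, 0)}, "z": {(0, 0)}}, size=2)
print(hull)  # → {(0, 0, 0)}
```

Because the hull is only the intersection of silhouette cones, concavities are not recovered, which is one reason the rendering quality depends on the number of viewpoints.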

[0070] Note that, to implement processing such as Visual Hull or Multi view stereo, the 3D shape calculation unit 62 requires the camera parameters (internal parameters and external parameters) of the plurality of imaging devices 21. These pieces of information are input to the generation device 22 in advance and are known. The internal parameters are, for example, the focal length of the imaging device 21, the image center coordinates, and the aspect ratio, and the external parameters are, for example, vectors indicating the direction and position of each imaging device 21 in the world coordinate system.

[0071] The 3D shape calculation unit 62 can generate the 3D shape data in an arbitrary format such as a point cloud format representing the three-dimensional position of the object as a set of points, a 3D mesh format representing the 3D shape as connections between vertices called a polygon mesh, or a voxel format representing the 3D shape as a set of cubes called voxels. The 3D shape calculation unit 62 supplies the generated 3D shape data to the basic texture generation unit 63 and the shape encoding unit 65.

[0072] The basic texture generation unit 63 generates a texture image not depending on a sight direction on the basis of the plurality of captured images supplied from the image acquisition unit 61 and the 3D shape data supplied from the 3D shape calculation unit 62. More specifically, the basic texture generation unit 63 generates mapping data that is two-dimensionally mapped texture information of the object. For example, the basic texture generation unit 63 generates mapping data in which the texture information is mapped by an arbitrary mapping method such as UV mapping in which the texture information is associated with polygon mesh, cube mapping in which the texture information is attached to a cube, cylindrical coordinate projection mapping in which the texture information is attached to a cylinder, or parallel projection mapping in which the texture information is attached to a surface of an object in a parallel projection manner. The basic texture generation unit 63 supplies the generated mapping data to the basic texture encoding unit 66.
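A UV map associates each mesh vertex with a coordinate in the mapping data, so rendering reads texture back out of the 2D atlas. A minimal nearest-neighbor lookup might look as follows; the orientation convention and names are illustrative, not from the patent:

```python
def sample_uv(texture: list, u: float, v: float) -> int:
    """Nearest-neighbor lookup of a (u, v) coordinate in a 2D texture atlas.

    (u, v) lies in [0, 1] x [0, 1] and addresses the two-dimensionally
    mapped texture; v = 0 is the top row here (a convention choice).
    """
    h, w = len(texture), len(texture[0])
    x = min(int(u * w), w - 1)   # clamp so u = 1.0 stays inside the atlas
    y = min(int(v * h), h - 1)
    return texture[y][x]

atlas = [[1, 2],
         [3, 4]]                 # a 2x2 piece of mapping data
print(sample_uv(atlas, 0.9, 0.9))  # → 4
```

Real renderers interpolate between texels and handle atlas seams, but the principle of indexing 2D mapping data by per-vertex (u, v) coordinates is the same.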

[0073] The auxiliary texture generation unit 64 selects and cuts out (extracts) one or more specific areas SP from at least one of the plurality of captured images supplied from the image acquisition unit 61, thereby generating an area image of a specific area SP as auxiliary texture. The auxiliary texture generation unit 64 supplies the area image of the specific area SP and the camera parameters of the imaging device 21 that has captured the area image to the auxiliary texture encoding unit 67. Alternatively, the auxiliary texture generation unit 64 may supply, as the area image, data obtained by converting the area image into mapping data by UV mapping or the like to the auxiliary texture encoding unit 67, instead of the area image itself cut from the captured image. In this case, no camera parameters are required.
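Cutting out a specific area SP amounts to a bounding-box crop kept together with the source camera's parameters so that the reproduction side can re-project it onto the 3D shape. A sketch of the bundling, with illustrative names and a toy camera-parameter dictionary:

```python
def cut_area(image: list, bbox: tuple) -> list:
    """Crop an (x, y, w, h) rectangle out of an image stored as rows of pixels."""
    x, y, w, h = bbox
    return [row[x:x + w] for row in image[y:y + h]]

def make_area_image_data(image, bbox, camera_params):
    """Bundle the crop with the parameters of the camera that captured it."""
    return {"pixels": cut_area(image, bbox), "bbox": bbox, "camera": camera_params}

image = [[c + 10 * r for c in range(4)] for r in range(4)]  # 4x4 test image
area = make_area_image_data(image, bbox=(1, 2, 2, 2),
                            camera_params={"position": (0, 0, 5)})
print(area["pixels"])  # → [[21, 22], [31, 32]]
```

If the crop is instead converted into mapping data before transmission, the camera entry can be dropped, matching the alternative described above.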

[0074] The shape of the selected specific area SP can be an arbitrary shape such as a rectangle, a circle, or a polygon. Furthermore, the shape may be determined by a free curve. Furthermore, the number of specific areas SP selected for one captured image may be one (single) or plural. Furthermore, the size of the selected specific area SP may be a fixed size set in advance or may be a size adaptively changed according to, for example, the object size of the object of interest, such as a face area.

[0075] Furthermore, the auxiliary texture generation unit 64 may select the specific area SP by a manual operation for each captured image, such as a user specifying the specific area SP using a mouse, or may automatically select the specific area SP (without the user’s operation). An example of the method of automatically selecting the specific area SP includes a method of detecting a face area of a person as an object or a specific object such as a person or a vehicle by recognition processing.

[0076] In a case where not only the RGB captured image but also a plurality of types of texture images, such as bump mapping data that expresses texture (pores and wrinkles) in a human skin area, is supplied from the imaging device 21 as additional information of the object, the auxiliary texture generation unit 64 selects the specific area SP for each of the plurality of texture images and supplies an area image of the selected specific area SP to the auxiliary texture encoding unit 67. By transmitting the plurality of types of texture images regarding the specific areas SP, improvement of the texture when image data is reproduced and displayed by the reproduction device 25 can be expected, for example. Furthermore, in a case where a plurality of types of texture images with different exposure conditions is received from the imaging device 21 as the texture information of the object, a wide dynamic range image with an increased dynamic range can be generated on the reproduction device 25 side, and improvement of the image quality when image data is reproduced and displayed by the reproduction device 25 can be expected.

[0077] The operation of specifying the specific area SP by the user and the recognition processing may be performed for each of the plurality of captured images captured at different capturing positions. However, the auxiliary texture generation unit 64 may select the specific areas SP of the plurality of captured images by reflecting the specific area SP, which has been selected by the manual operation or the recognition processing in one of the plurality of captured images, in areas of the captured images captured at the other capturing positions. In a case of reflecting an area selected in one captured image (first captured image) in another captured image (second captured image), the same position in the world coordinate system may be selected, or the same object at different coordinate positions may be selected.

[0078] Furthermore, the specific area SP may be continuously selected for captured images continuous in the time direction, and can be tracked or changed in size with respect to a predetermined object.

[0079] In a case where the position or size of the specific area SP is changed depending on a captured image, the auxiliary texture generation unit 64 can transmit information regarding the position or size of the specific area SP, for example, coordinates of an upper left end portion of the specific area SP, the width and height of the specific area SP, and the like as meta information.
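The meta information regarding the position and size of the specific area SP might be modeled as follows; the field names and the change-only transmission helper are assumptions for illustration, not the patent's wire format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AreaMeta:
    """Meta information for one specific area SP per frame."""
    left: int    # x coordinate of the upper-left end portion
    top: int     # y coordinate of the upper-left end portion
    width: int
    height: int

def meta_if_changed(previous, current):
    """Return the meta information only when the position or size of
    the specific area SP has changed from the previous frame; None
    means nothing needs to be transmitted for this frame."""
    return current if current != previous else None
```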

[0080] Furthermore, for example, in a case where the exposure conditions are different among the imaging devices 21 or in a case where the exposure conditions are changed in the time direction in the same imaging device 21, the auxiliary texture generation unit 64 can transmit information for adjusting brightness among the plurality of captured images, such as an exposure time and a gain value, as meta information.

[0081] The shape encoding unit 65 encodes the 3D shape data supplied from the 3D shape calculation unit 62 by a predetermined encoding method and supplies resultant encoded 3D shape data to the transmission unit 68. The encoding method is not particularly limited, and an arbitrary method can be adopted. For example, an encoding compression method called “Draco” developed by Google can be adopted (https://mag.osdn.jp/17/01/16/144500).

[0082] Furthermore, the shape encoding unit 65 may encode and transmit information necessary for calculating the 3D shape instead of encoding and transmitting the 3D shape data itself. For example, the shape encoding unit 65 may encode and transmit the silhouette images and camera parameters as information necessary for calculating the 3D shape by Visual Hull, or may encode and transmit the depth images, camera parameters, and the like instead of transmitting the 3D shape data in the point cloud format.
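As a sketch of reconstruction from silhouette images and camera parameters, Visual Hull can be approximated by carving a voxel grid: a voxel survives only if it projects inside every silhouette. The data layout and matrix conventions below are assumptions for illustration:

```python
import numpy as np

def visual_hull(silhouettes, projections, grid):
    """Carve a set of candidate voxels with silhouette images.

    silhouettes: list of H x W boolean masks, one per viewpoint.
    projections: list of 3x4 camera projection matrices.
    grid: N x 3 array of candidate voxel centers in world coordinates.
    Returns the voxels that lie inside every silhouette.
    """
    keep = np.ones(len(grid), dtype=bool)
    hom = np.hstack([grid, np.ones((len(grid), 1))])  # homogeneous coords
    for sil, P in zip(silhouettes, projections):
        h, w = sil.shape
        proj = hom @ P.T                     # project to image plane
        uv = proj[:, :2] / proj[:, 2:3]      # perspective divide
        u = np.round(uv[:, 0]).astype(int)
        v = np.round(uv[:, 1]).astype(int)
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(len(grid), dtype=bool)
        hit[inside] = sil[v[inside], u[inside]]
        keep &= hit                          # carve away missed voxels
    return grid[keep]
```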

[0083] The basic texture encoding unit 66 encodes the mapping data supplied from the basic texture generation unit 63 by a predetermined encoding method and supplies resultant encoded mapping data to the transmission unit 68. The encoding method is not particularly limited, and an arbitrary method can be adopted. For example, a high efficiency video coding (HEVC) method or the like can be adopted for the mapping data by UV mapping. Furthermore, in the case where the 3D shape data is in the point cloud format, RGB information may be added to the position information of each point.

[0084] The auxiliary texture encoding unit 67 encodes the area image of the specific area SP supplied from the auxiliary texture generation unit 64 by a predetermined encoding method and supplies resultant encoded area image data to the transmission unit 68. The encoding method is not particularly limited, and an arbitrary method such as the MPEG2 method or the high efficiency video coding (HEVC) method can be adopted, for example. The camera parameters of the imaging device 21 that has captured the area image are stored as metadata in the encoded area image data, for example. The camera parameters may be transmitted for each frame or may be transmitted only at the time of change after being transmitted in the first frame of a moving image.

[0085] In a case where the specific area SP selected from the captured image is a fixed area in the time direction, compression efficiency can be improved by performing predictive encoding, which is adopted in encoding by the MPEG2 method or the H.264/AVC method, for a plurality of area images adjacent in the time direction, for example.

[0086] The transmission unit 68 transmits the encoded 3D shape data, the encoded mapping data, and the encoded area image data supplied from the shape encoding unit 65, the basic texture encoding unit 66, and the auxiliary texture encoding unit 67 to the distribution server 23.

[0087]

[0088] FIG. 5 is a block diagram illustrating a configuration example of the first embodiment of the reproduction device 25.

[0089] The reproduction device 25 includes a reception unit 81, a shape decoding unit 82, a basic texture decoding unit 83, an auxiliary texture decoding unit 84, a viewing viewpoint image generation unit 85, a viewing viewpoint image generation unit 86, a viewing viewpoint image synthesis unit 87, and an output unit 88.

[0090] The shape decoding unit 82, the basic texture decoding unit 83, and the auxiliary texture decoding unit 84 may be configured as one decoding unit 91, and the viewing viewpoint image generation unit 85, the viewing viewpoint image generation unit 86, and the viewing viewpoint image synthesis unit 87 may be configured as one synthesis unit 92. The decoding unit 91 decodes the encoded 3D shape data, the encoded mapping data, and the encoded area image data. The synthesis unit 92 synthesizes the 3D shape data, the mapping data, and the area image data to generate an image viewed from a predetermined viewing position (viewing viewpoint synthesis image).

[0091] The reception unit 81 requests the distribution server 23 to transmit the 3D model data at predetermined timing, and receives the 3D model data, more specifically, the encoded 3D shape data, the encoded mapping data, and the encoded area image data transmitted from the distribution server 23 in response to the request. The reception unit 81 supplies the encoded 3D shape data to the shape decoding unit 82, supplies the encoded mapping data to the basic texture decoding unit 83, and supplies the encoded area image data to the auxiliary texture decoding unit 84.

[0092] The shape decoding unit 82 decodes the encoded 3D shape data supplied from the reception unit 81 by a method corresponding to the encoding method of the generation device 22. The shape decoding unit 82 supplies the 3D shape data obtained by decoding to the viewing viewpoint image generation unit 85 and the viewing viewpoint image generation unit 86.

[0093] The basic texture decoding unit 83 decodes the encoded mapping data supplied from the reception unit 81 by a method corresponding to the encoding method of the generation device 22. The basic texture decoding unit 83 supplies the mapping data obtained by decoding to the viewing viewpoint image generation unit 85.

[0094] The auxiliary texture decoding unit 84 decodes the encoded area image data supplied from the reception unit 81 by a method corresponding to the encoding method of the generation device 22. The auxiliary texture decoding unit 84 supplies one or more area images obtained by decoding to the viewing viewpoint image generation unit 86.

[0095] A viewing position of a viewer is supplied from the viewing position detection device 27 (FIG. 1) to the viewing viewpoint image generation unit 85 and the viewing viewpoint image generation unit 86.

[0096] The viewing viewpoint image generation unit 85 attaches the texture image of the mapping data supplied from the basic texture decoding unit 83 to a surface of the 3D shape of the 3D shape data supplied from the shape decoding unit 82 to generate a 3D model of the object. Then, the viewing viewpoint image generation unit 85 generates (renders) a viewing viewpoint image (first viewing viewpoint image) that is a 2D image of the generated 3D model of the object viewed from the viewing position supplied from the viewing position detection device 27 (FIG. 1). The viewing viewpoint image generation unit 85 supplies the generated viewing viewpoint image to the viewing viewpoint image synthesis unit 87.

[0097] In the case where the mapping method for the mapping data is the UV mapping, each position of the 3D shape of the object corresponds to the texture image. Thus, the texture image of the mapping data can be attached to the surface of the 3D shape. In the case where the mapping method is the parallel projection mapping, the cube mapping, or the like, an attaching position of the texture image is geometrically determined according to the 3D shape of the object and the projection method.

[0098] The viewing viewpoint image generation unit 86 attaches one or more area images supplied from the auxiliary texture decoding unit 84 to the surface of the 3D shape corresponding to the 3D shape data supplied from the shape decoding unit 82 to generate a 3D model of the object. In a case where the area image and the camera parameters are included in the area image data, the viewing viewpoint image generation unit 86 geometrically determines an attaching position of the area image from the area image and the camera parameters. In the case where the area image data is configured by the mapping data of the UV mapping or the like, the texture image of the mapping data can be attached to the surface of the 3D shape according to the mapping method, similarly to the basic texture.

[0099] The viewing viewpoint image generation unit 86 generates (renders) a viewing viewpoint image (second viewing viewpoint image) that is a 2D image of the generated 3D model of the object viewed from the viewing position supplied from the viewing position detection device 27 (FIG. 1). Since the area image data is data of an image of only a specific area of the object, there is an area (pixels) to which no texture is attached in the viewing viewpoint image generated by the viewing viewpoint image generation unit 86. The viewing viewpoint image generation unit 86 supplies the generated viewing viewpoint image to the viewing viewpoint image synthesis unit 87.

[0100] Hereinafter, the viewing viewpoint image based on the basic texture generated by the viewing viewpoint image generation unit 85 will be referred to as a viewing viewpoint basic image, and the viewing viewpoint image based on the auxiliary texture generated by the viewing viewpoint image generation unit 86 will be referred to as a viewing viewpoint auxiliary image, to make distinction.

[0101] In a case where two or more area images are included in the area image data, the viewing viewpoint image generation unit 86 generates the viewing viewpoint auxiliary image for each area image. At that time, the viewing viewpoint image generation unit 86 generates and adds reliability in units of pixels of the viewing viewpoint auxiliary image, the reliability being required for the viewing viewpoint image synthesis unit 87 to synthesize a plurality of viewing viewpoint auxiliary images.

[0102] The reliability can be generated as follows, for example.

[0103] First, the reliability of a pixel to which no texture is attached in the viewing viewpoint auxiliary image is set to 0 and is set as an invalid area. Thereby, it is possible to distinguish between an area to which the area image (texture) is attached and an area to which no texture is attached in the viewing viewpoint auxiliary image.

[0104] In each pixel to which the area image is attached in the viewing viewpoint auxiliary image, the viewing viewpoint image generation unit 86 can set larger reliability of the viewing viewpoint auxiliary image for a pixel closer to the imaging device 21 that has captured the area image, for example. Since the image becomes coarser as the distance from the imaging device 21 to the object increases, a pixel of the viewing viewpoint auxiliary image cut from a captured image captured at a position close to the object can thereby be selected.

[0105] Alternatively, for example, the viewing viewpoint image generation unit 86 can set smaller reliability of the viewing viewpoint auxiliary image for a pixel whose angle, made by the capturing direction of the imaging device 21 that has captured the area image and a normal of the shape of the object at the pixel, is closer to 90 degrees. Since an area image obliquely facing the imaging device 21 is stretched when attached, a pixel of the viewing viewpoint auxiliary image facing the front as much as possible can thereby be selected.

[0106] Alternatively, for example, the viewing viewpoint image generation unit 86 can set larger reliability of the viewing viewpoint auxiliary image for a pixel closer to the center of the captured image captured by the imaging device 21. Since the image of an outer peripheral portion (a position with high image height) in the capture range of the imaging device 21 is blurred by distortion correction, a pixel of the viewing viewpoint auxiliary image located as close to the center of the image as possible can thereby be selected.
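The three per-pixel reliability cues above might be combined as follows; the multiplicative weighting is an assumption, since the patent does not specify how the cues are merged:

```python
import numpy as np

def pixel_reliability(distance, angle_rad, image_height_ratio, valid_mask):
    """Per-pixel reliability for a viewing viewpoint auxiliary image.

    distance:           distance from the imaging device to the object.
    angle_rad:          angle between the capturing direction and the
                        surface normal (0 = head-on, pi/2 = grazing).
    image_height_ratio: 0 at the image center, 1 at the periphery.
    valid_mask:         1 where texture is attached, 0 elsewhere
                        (the invalid area keeps reliability 0).
    """
    closeness = 1.0 / (1.0 + distance)             # decays with distance
    facing = np.clip(np.cos(angle_rad), 0.0, 1.0)  # 0 at 90 degrees
    centred = 1.0 - np.clip(image_height_ratio, 0.0, 1.0)
    return closeness * facing * centred * valid_mask
```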

[0107] The above is a method of setting the reliability for each pixel of the viewing viewpoint auxiliary image. However, the reliability may be set for each viewing viewpoint auxiliary image.

[0108] For example, by comparing SN ratios of the area images, the viewing viewpoint image generation unit 86 can set large reliability for the viewing viewpoint auxiliary image with little noise, or can set large reliability for the viewing viewpoint auxiliary image cut from a captured image with high resolution. By the setting, a viewing viewpoint auxiliary image with little noise or with high resolution can be selected.

[0109] Note that, in a case where not only the viewing position but also information regarding the display functions of the display device 26 is supplied from the viewing position detection device 27 (FIG. 1) to the viewing viewpoint image generation unit 85 or the viewing viewpoint image generation unit 86, the viewing viewpoint image generation unit 85 or the viewing viewpoint image generation unit 86 can generate the viewing viewpoint image on the basis of the information.

[0110] The viewing viewpoint image synthesis unit 87 synthesizes the viewing viewpoint basic image based on the basic texture supplied from the viewing viewpoint image generation unit 85 and the viewing viewpoint auxiliary image based on the auxiliary texture supplied from the viewing viewpoint image generation unit 86 to generate a resultant viewing viewpoint synthesis image.

[0111] For a pixel having no viewing viewpoint auxiliary image based on the auxiliary texture, the viewing viewpoint basic image based on the basic texture is adopted as it is as the viewing viewpoint synthesis image in the generation of the viewing viewpoint synthesis image. For a pixel in which the viewing viewpoint basic image and one viewing viewpoint auxiliary image are present, the viewing viewpoint auxiliary image is adopted as the viewing viewpoint synthesis image. For a pixel in which the viewing viewpoint basic image and two or more viewing viewpoint auxiliary images are present, the viewing viewpoint auxiliary image with the highest reliability is adopted as the viewing viewpoint synthesis image. Since a step may be caused at a boundary between the pixel in which the viewing viewpoint auxiliary image is adopted and the pixel in which the viewing viewpoint basic image is adopted in the viewing viewpoint synthesis image, the viewing viewpoint image synthesis unit 87 performs alpha blend processing to smooth the viewing viewpoint basic image and the viewing viewpoint auxiliary image near a boundary of an invalid area where the reliability is 0.
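The per-pixel selection rule above can be sketched with array operations; the alpha blending near the invalid-area boundary is omitted for brevity, and the function shape is an assumption:

```python
import numpy as np

def synthesize(basic, aux_images, aux_reliabilities):
    """Per-pixel synthesis of the viewing viewpoint synthesis image.

    basic:             H x W x 3 viewing viewpoint basic image.
    aux_images:        list of H x W x 3 viewing viewpoint auxiliary images.
    aux_reliabilities: list of H x W reliabilities (0 = invalid area).
    Where at least one auxiliary pixel is valid, the most reliable one
    is adopted; elsewhere the basic image is adopted as it is.
    """
    rel = np.stack(aux_reliabilities)        # K x H x W
    aux = np.stack(aux_images)               # K x H x W x 3
    best = rel.argmax(axis=0)                # index of most reliable image
    h_idx, w_idx = np.indices(best.shape)
    chosen = aux[best, h_idx, w_idx]         # H x W x 3 winning pixels
    any_valid = rel.max(axis=0) > 0          # where any auxiliary exists
    return np.where(any_valid[..., None], chosen, basic)
```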

[0112] The viewing viewpoint image synthesis unit 87 supplies the generated viewing viewpoint synthesis image to the output unit 88 as a 3D model image. The output unit 88 converts the viewing viewpoint synthesis image as a 3D model image into a signal format corresponding to the input format of the display device 26, and outputs the signal.

  1. Flowchart of First Embodiment

[0113] Next, 3D model data generation processing by the generation device 22 according to the first embodiment will be described with reference to the flowchart in FIG. 6.

[0114] First, in step S1, the image acquisition unit 61 acquires a plurality of captured images supplied from the plurality of imaging devices 21 and supplies the captured images to the 3D shape calculation unit 62, the basic texture generation unit 63, and the auxiliary texture generation unit 64.

[0115] In step S2, the 3D shape calculation unit 62 generates 3D shape data representing a 3D shape of an object on the basis of the plurality of captured images supplied from the image acquisition unit 61. The 3D shape calculation unit 62 supplies the generated 3D shape data to the basic texture generation unit 63 and the shape encoding unit 65.

[0116] In step S3, the basic texture generation unit 63 generates mapping data that is two-dimensionally mapped texture information of the object on the basis of the plurality of captured images supplied from the image acquisition unit 61 and the 3D shape data supplied from the 3D shape calculation unit 62. The basic texture generation unit 63 supplies the generated mapping data to the basic texture encoding unit 66.

[0117] In step S4, the auxiliary texture generation unit 64 selects and cuts a specific area SP from at least one of the plurality of captured images, thereby generating an area image of the specific area SP as auxiliary texture. The auxiliary texture generation unit 64 supplies the area image of the specific area SP and the camera parameters of the imaging device 21 that has captured the area image as area image data to the auxiliary texture encoding unit 67. The camera parameters may be transmitted for each frame on a constant basis or may be transmitted only at the time of change after being transmitted in the first frame of a moving image.

[0118] The processing in steps S2 and S3 and the processing in S4 can be executed in any order or can be executed in parallel.

[0119] In step S5, the shape encoding unit 65 encodes the 3D shape data supplied from the 3D shape calculation unit 62 by a predetermined encoding method to generate encoded 3D shape data and supplies the encoded 3D shape data to the transmission unit 68.

[0120] In step S6, the basic texture encoding unit 66 encodes the mapping data supplied from the basic texture generation unit 63 by a predetermined encoding method to generate encoded mapping data and supplies the encoded mapping data to the transmission unit 68.

[0121] In step S7, the auxiliary texture encoding unit 67 encodes the area image supplied from the auxiliary texture generation unit 64 by a predetermined encoding method to generate encoded area image data and supplies the encoded area image data to the transmission unit 68. In the encoding, predictive encoding, which is adopted in encoding by the MPEG2 method or the H.264/AVC method, is performed for a plurality of area images adjacent in the time direction. The camera parameters of the imaging device 21 that has captured the area image are stored as metadata in the encoded area image data, for example.

[0122] The processing in steps S5 to S7 can be executed in any order or can be executed in parallel.

[0123] In step S8, the transmission unit 68 transmits the encoded 3D shape data, the encoded mapping data, and the encoded area image data to the distribution server 23.

[0124] The above processing in steps S1 to S8 is repeatedly executed while the captured images are supplied from the plurality of imaging devices 21. Then, in a case where supply of the captured images is completed, the 3D model data generation processing is terminated.

[0125] Next, 3D model image generation processing by the reproduction device 25 according to the first embodiment will be described with reference to the flowchart in FIG. 7.

[0126] First, in step S21, the reception unit 81 requests the distribution server 23 to transmit the 3D model data and receives the 3D model data, more specifically, the encoded 3D shape data, the encoded mapping data, and the encoded area image data transmitted from the distribution server 23 in response to the request. The reception unit 81 supplies the encoded 3D shape data to the shape decoding unit 82, supplies the encoded mapping data to the basic texture decoding unit 83, and supplies the encoded area image data to the auxiliary texture decoding unit 84.

[0127] In step S22, the shape decoding unit 82 decodes the encoded 3D shape data supplied from the reception unit 81 by a method corresponding to the encoding method of the generation device 22. The 3D shape data obtained by decoding is supplied to the viewing viewpoint image generation unit 85 and the viewing viewpoint image generation unit 86.

[0128] In step S23, the basic texture decoding unit 83 decodes the encoded mapping data supplied from the reception unit 81 by a method corresponding to the encoding method of the generation device 22. The basic texture decoding unit 83 supplies the mapping data obtained by decoding to the viewing viewpoint image generation unit 85.

……
……
……
