Sony Patent | Image processing apparatus and method



Publication Number: 20210297696

Publication Date: 2021-09-23

Applicant: Sony

Assignee: Sony Corporation

Abstract

The present disclosure relates to an image processing apparatus and a method that make it easier to reproduce a two-dimensional image. A video frame is generated that includes a patch obtained by projecting, onto a two-dimensional plane, a point cloud that represents an object having a three-dimensional shape as a group of points, and a two-dimensional image different from the patch, and the generated video frame is encoded to generate a bitstream. The present disclosure can be applied to, for example, an image processing apparatus, an electronic device, an image processing method, a program, or the like.

Claims

  1. An image processing apparatus comprising: a generation unit that generates a video frame that includes a patch obtained by projecting, onto a two-dimensional plane, a point cloud that represents an object having a three-dimensional shape as a group of points, and a two-dimensional image different from the patch; and a coding unit that encodes the video frame generated by the generation unit to generate a bitstream.

  2. The image processing apparatus according to claim 1, wherein the two-dimensional image is a rendered image obtained by rendering the object.

  3. The image processing apparatus according to claim 2, wherein the rendered image is an image obtained by rendering just like imaging the object from a recommended camera position and direction.

  4. The image processing apparatus according to claim 3, wherein the generation unit generates a moving image constituted by the video frame including a plurality of the rendered images, which are moving images, and the coding unit encodes the moving image generated by the generation unit to generate the bitstream.

  5. The image processing apparatus according to claim 4, wherein the plurality of the rendered images, which are moving images, are rendered images obtained by rendering the object with the same camera work as each other.

  6. The image processing apparatus according to claim 2, further comprising a rendering unit that renders the object to generate a rendered image, wherein the generation unit generates the video frame that includes the patch and the rendered image generated by the rendering unit.

  7. The image processing apparatus according to claim 1, wherein the generation unit generates a color video frame that includes a patch obtained by projecting attribute information of the point cloud onto a two-dimensional plane and a two-dimensional image different from the patch.

  8. The image processing apparatus according to claim 1, wherein the coding unit encodes the video frame in a multi-layered structure, and the generation unit generates a moving image that includes the two-dimensional image in the video frame of some of layers in the multi-layered structure.

  9. The image processing apparatus according to claim 1, wherein the coding unit encodes the video frame in a multi-layered structure, and the generation unit generates a moving image that includes the two-dimensional image in the video frame of all layers in the multi-layered structure.

  10. The image processing apparatus according to claim 1, wherein the coding unit generates the bitstream that further includes information regarding the two-dimensional image.

  11. The image processing apparatus according to claim 10, wherein the information regarding the two-dimensional image includes two-dimensional image presence/absence identification information that indicates whether or not the bitstream includes data of the two-dimensional image.

  12. The image processing apparatus according to claim 10, wherein the information regarding the two-dimensional image includes two-dimensional image spatial position management information for managing a position in a spatial direction of the two-dimensional image.

  13. The image processing apparatus according to claim 10, wherein the information regarding the two-dimensional image includes two-dimensional image temporal position management information for managing a position in a time direction of the two-dimensional image.

  14. The image processing apparatus according to claim 10, wherein the information regarding the two-dimensional image includes two-dimensional image reproduction assisting information for assisting reproduction of the two-dimensional image.

  15. The image processing apparatus according to claim 1, wherein the coding unit encodes the two-dimensional image independently of the patch.

  16. The image processing apparatus according to claim 15, wherein the coding unit encodes the two-dimensional image by using a coding parameter for the two-dimensional image.

  17. An image processing method comprising: generating a video frame that includes a patch obtained by projecting, onto a two-dimensional plane, a point cloud that represents an object having a three-dimensional shape as a group of points, and a two-dimensional image different from the patch; and encoding the generated video frame to generate a bitstream.

  18. An image processing apparatus comprising: an extraction unit that extracts, from a bitstream that includes coded data of a video frame that includes a patch obtained by projecting, onto a two-dimensional plane, a point cloud that represents an object having a three-dimensional shape as a group of points, and a two-dimensional image different from the patch, the coded data; and a two-dimensional decoding unit that decodes the coded data extracted from the bitstream by the extraction unit to restore the two-dimensional image.

  19. The image processing apparatus according to claim 18, further comprising a three-dimensional decoding unit that decodes the bitstream to reconstruct the point cloud.

  20. An image processing method comprising: extracting, from a bitstream that includes coded data of a video frame that includes a patch obtained by projecting, onto a two-dimensional plane, a point cloud that represents an object having a three-dimensional shape as a group of points, and a two-dimensional image different from the patch, the coded data; and decoding the coded data extracted from the bitstream to restore the two-dimensional image.

Description

TECHNICAL FIELD

[0001] The present disclosure relates to an image processing apparatus and a method, and more particularly to an image processing apparatus and a method that allow for easier reproduction of a two-dimensional image.

BACKGROUND ART

[0002] As an encoding method for 3D data representing an object having a three-dimensional shape such as a point cloud, there has conventionally been encoding using voxels such as Octree (see, for example, Non-Patent Document 1).

[0003] In recent years, as another encoding method, for example, an approach in which each of position information and color information of a point cloud is projected onto a two-dimensional plane for each subregion and encoded by an encoding method for two-dimensional images (hereinafter also referred to as a video-based approach) has been proposed (see, for example, Non-Patent Document 2 to Non-Patent Document 4).

[0004] The 3D data encoded as described above is, for example, transmitted as a bitstream and decoded. Then, the object having a three-dimensional shape is reproduced as a two-dimensional image, just like an image captured with a camera at an arbitrary position and orientation.

CITATION LIST

Non-Patent Document

[0005] Non-Patent Document 1: R. Mekuria, K. Blom, P. Cesar, "Design, Implementation and Evaluation of a Point Cloud Codec for Tele-Immersive Video", IEEE Transactions on Circuits and Systems for Video Technology (paper submitted February)

[0006] Non-Patent Document 2: Tim Golla and Reinhard Klein, "Real-time Point Cloud Compression", IEEE, 2015

[0007] Non-Patent Document 3: K. Mammou, "Video-based and Hierarchical Approaches Point Cloud Compression", MPEG m41649, October 2017

[0008] Non-Patent Document 4: K. Mammou, "PCC Test Model Category 2 v0", N17248 MPEG output document, October 2017

SUMMARY OF THE INVENTION

Problems to be Solved by the Invention

[0009] However, in a case of such a reproduction method, it is necessary to render the object having a three-dimensional shape indicated by the decoded and reconstructed 3D data, just like capturing an image with a camera at an arbitrary position and orientation, and there has been a possibility that a processing load increases.

[0010] The present disclosure has been made in view of such circumstances, and is intended to allow for easier reproduction of a two-dimensional image.

Solutions to Problems

[0011] An image processing apparatus according to one aspect of the present technology includes a generation unit that generates a video frame that includes a patch obtained by projecting, onto a two-dimensional plane, a point cloud that represents an object having a three-dimensional shape as a group of points, and a two-dimensional image different from the patch, and a coding unit that encodes the video frame generated by the generation unit to generate a bitstream.

[0012] An image processing method according to the one aspect of the present technology includes generating a video frame that includes a patch obtained by projecting, onto a two-dimensional plane, a point cloud that represents an object having a three-dimensional shape as a group of points, and a two-dimensional image different from the patch, and encoding the generated video frame to generate a bitstream.

[0013] An image processing apparatus according to another aspect of the present technology includes an extraction unit that extracts, from a bitstream that includes coded data of a video frame that includes a patch obtained by projecting, onto a two-dimensional plane, a point cloud that represents an object having a three-dimensional shape as a group of points, and a two-dimensional image different from the patch, the coded data, and a two-dimensional decoding unit that decodes the coded data extracted from the bitstream by the extraction unit to restore the two-dimensional image.

[0014] An image processing method according to the other aspect of the present technology includes extracting, from a bitstream that includes coded data of a video frame that includes a patch obtained by projecting, onto a two-dimensional plane, a point cloud that represents an object having a three-dimensional shape as a group of points, and a two-dimensional image different from the patch, the coded data, and decoding the coded data extracted from the bitstream to restore the two-dimensional image.

[0015] In the image processing apparatus and the method according to the one aspect of the present technology, a video frame that includes a patch obtained by projecting, onto a two-dimensional plane, a point cloud that represents an object having a three-dimensional shape as a group of points, and a two-dimensional image different from the patch is generated, and the generated video frame is encoded and a bitstream is generated.

[0016] In the image processing apparatus and method according to the other aspect of the present technology, from a bitstream that includes coded data of a video frame that includes a patch obtained by projecting, onto a two-dimensional plane, a point cloud that represents an object having a three-dimensional shape as a group of points, and a two-dimensional image different from the patch, the coded data is extracted, and the coded data extracted from the bitstream is decoded and the two-dimensional image is restored.

Effects of the Invention

[0017] According to the present disclosure, images can be processed. In particular, a two-dimensional image can be reproduced more easily.

BRIEF DESCRIPTION OF DRAWINGS

[0018] FIG. 1 illustrates addition of 2D data to a bitstream.

[0019] FIG. 2 illustrates an outline of a system.

[0020] FIG. 3 illustrates an example of a camera parameter.

[0021] FIG. 4 illustrates an example of syntax.

[0022] FIG. 5 illustrates an example of syntax.

[0023] FIG. 6 illustrates an example of syntax.

[0024] FIG. 7 illustrates an example of adding 2D data.

[0025] FIG. 8 is a block diagram illustrating an example of a main configuration of a coding device.

[0026] FIG. 9 is a block diagram illustrating an example of a main configuration of a 2D data generation unit.

[0027] FIG. 10 is a flowchart illustrating an example of a flow of coding processing.

[0028] FIG. 11 is a flowchart illustrating an example of a flow of 2D data generation processing.

[0029] FIG. 12 is a block diagram illustrating an example of a main configuration of a decoding device.

[0030] FIG. 13 is a flowchart illustrating an example of a flow of decoding processing.

[0031] FIG. 14 is a block diagram illustrating an example of a main configuration of a decoding device.

[0032] FIG. 15 is a block diagram illustrating an example of a main configuration of a computer.

MODE FOR CARRYING OUT THE INVENTION

[0033] Modes for carrying out the present disclosure (hereinafter referred to as “embodiments”) will be described below. Note that the description will be made in the order below.

[0034] 1. Addition of 2D data

[0035] 2. First embodiment (coding device)

[0036] 3. Second embodiment (decoding device)

[0037] 4. Third embodiment (decoding device)

[0038] 5. Note

  1. Addition of 2D Data


[0040] The scope disclosed in the present technology includes not only the contents described in the embodiments but also the contents described in the following non-patent documents known at the time of filing.

[0041] Non-Patent Document 1: (described above)

[0042] Non-Patent Document 2: (described above)

[0043] Non-Patent Document 3: (described above)

[0044] Non-Patent Document 4: (described above)

[0045] Non-Patent Document 5: TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU (International Telecommunication Union), “Advanced video coding for generic audiovisual services”, H.264, April 2017

[0046] Non-Patent Document 6: TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU (International Telecommunication Union), “High efficiency video coding”, H.265, December 2016

[0047] Non-Patent Document 7: Jianle Chen, Elena Alshina, Gary J. Sullivan, Jens-Rainer Ohm, Jill Boyce, "Algorithm Description of Joint Exploration Test Model 4", JVET-G1001 v1, Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 7th Meeting: Torino, IT, 13-21 Jul. 2017

[0048] That is, the contents described in the non-patent documents described above are also the basis for determining support requirements. For example, even in a case where the quad-tree block structure described in Non-Patent Document 6 and the quad tree plus binary tree (QTBT) block structure described in Non-Patent Document 7 are not directly described in the embodiments, they are included in the scope of the disclosure of the present technology and meet the support requirements of the claims. Furthermore, for example, technical terms such as parsing, syntax, and semantics are also included in the scope of the disclosure of the present technology and meet the support requirements of the claims even in a case where they are not directly described in the embodiments.


[0050] There has conventionally been 3D data such as a point cloud representing an object having a three-dimensional shape on the basis of position information, attribute information, and the like of a group of points, and a mesh that is constituted by vertices, edges, and faces and defines an object having a three-dimensional shape using a polygonal representation.

[0051] For example, in the case of a point cloud, a three-dimensional structure (object having a three-dimensional shape) is represented as a set of a large number of points (group of points). That is, point cloud data is constituted by position information and attribute information (e.g., color) of each point in this group of points. Consequently, the data has a relatively simple structure, and any three-dimensional structure can be represented with sufficient accuracy with use of a sufficiently large number of points.
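The structure described above (each point carrying position information and attribute information) can be sketched in a few lines. The class and field names below are illustrative assumptions, not part of any codec API:

```python
# Minimal sketch of the point cloud structure described above: each point
# carries position information (geometry, also referred to as depth) and
# attribute information (texture, e.g., color). Names are illustrative.

class Point:
    def __init__(self, x, y, z, r, g, b):
        self.position = (x, y, z)    # position information (geometry)
        self.attribute = (r, g, b)   # attribute information (color)

class PointCloud:
    """An object having a three-dimensional shape, represented as a group of points."""
    def __init__(self, points):
        self.points = list(points)

cloud = PointCloud([Point(0, 0, 0, 255, 0, 0), Point(1, 0, 2, 0, 255, 0)])
```

Because the data is just a flat set of such points, representing a shape more accurately is simply a matter of using more points, which is the "relatively simple structure" the paragraph refers to.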


[0053] A video-based approach has been proposed, in which a two-dimensional image is formed by projecting each of position information and color information of such a point cloud onto a two-dimensional plane for each subregion, and the two-dimensional image is encoded by an encoding method for two-dimensional images.

[0054] In this video-based approach, an input point cloud is divided into a plurality of segmentations (also referred to as regions), and each region is projected onto a two-dimensional plane. Note that data for each position of the point cloud (i.e., data for each point) is constituted by position information (geometry, also referred to as depth) and attribute information (texture) as described above, and the position information and the attribute information are each projected onto a two-dimensional plane.

[0055] Then, each segmentation (also referred to as a patch) projected onto the two-dimensional plane is arranged to form a two-dimensional image, and is encoded by an encoding method for two-dimensional plane images such as Advanced Video Coding (AVC) or High Efficiency Video Coding (HEVC).
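The two steps above (projecting a region into a patch, then arranging patches into one frame) can be sketched under heavily simplified assumptions: orthographic projection onto the XY plane and naive left-to-right packing, neither of which is how a real video-based codec does it. The function names are illustrative:

```python
# Toy sketch of the video-based approach: project each region onto a 2D
# plane to obtain a patch (pixel -> depth), then pack patches side by
# side into one frame. Real packing and projection are far more elaborate.

def project_region(points):
    """Project a region onto the XY plane: pixel (x, y) stores depth z."""
    patch = {}
    for x, y, z in points:
        # keep the nearest point per pixel
        if (x, y) not in patch or z < patch[(x, y)]:
            patch[(x, y)] = z
    return patch

def pack_patches(patches):
    """Place patches left to right in a single frame (dict of pixels)."""
    frame, x_offset = {}, 0
    for patch in patches:
        for (x, y), z in patch.items():
            frame[(x + x_offset, y)] = z
        x_offset += 1 + max(x for x, _ in patch)
    return frame

region_a = [(0, 0, 5), (1, 0, 3), (1, 0, 7)]  # two points share pixel (1, 0)
region_b = [(0, 1, 2)]
frame = pack_patches([project_region(region_a), project_region(region_b)])
```

The resulting frame is an ordinary two-dimensional image of depth values, which is exactly why it can then be handed to a conventional video encoder such as AVC or HEVC.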


[0057] The 3D data encoded as described above is, for example, transmitted as a bitstream to a transmission destination, where the 3D data is decoded and then reproduced. For example, in a case of a device having a 2D display that displays a two-dimensional image, an object having a three-dimensional shape indicated by decoded and reconstructed 3D data is rendered just like capturing an image with a camera at an arbitrary position and orientation, and displayed on the 2D display as a two-dimensional image (also referred to as a rendered image).

[0058] Note that a two-dimensional image (rendered image) obtained by rendering an object as described above is different from a two-dimensional image (two-dimensional image in which patches are arranged) at the time of encoding. A two-dimensional image in which patches are arranged is a format for transmitting 3D data, and is not an image intended for display. That is, even if this two-dimensional image in which the patches are arranged is displayed, the displayed image cannot be understood by a user who views it (the image does not serve as content). On the other hand, a rendered image is an image that represents an object having a three-dimensional shape in two dimensions. Consequently, the image is displayed as an image that can be understood by a user who views it (the image serves as content).

[0059] However, in the case of this reproduction method, it is necessary to render an object having a three-dimensional shape. This rendering imposes a heavy load, and there has been a possibility that the processing time increases. For example, in a case of checking the contents of the data in a bitstream, it has been necessary to decode the bitstream, reconstruct the 3D data, further render the object indicated by the 3D data, and reproduce the object as a two-dimensional image. Therefore, there has been a possibility that the time required to check the contents of the bitstream increases. Furthermore, for example, in a case where a recommended camera work (the position, direction, or the like of a camera used for rendering) is specified on an encoding side, and a rendered image obtained by rendering the object with the recommended camera work is displayed on a decoding side, it is necessary to render the object on the decoding side, and there has been a possibility that the time required to display the rendered image increases.
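To make the cost of this rendering step concrete, here is a toy pinhole-projection sketch; the function name, the fixed viewing direction along the -z axis, and the parameters are assumptions for illustration only (a full renderer would also apply a camera rotation, shading, and rasterization, which is where the load comes from):

```python
# Toy sketch of rendering: project each point of the object through a
# pinhole camera at camera_pos, looking down the -z axis. Every frame,
# every point of the cloud must be processed, which is why rendering on
# the decoding side is costly.

def render(points, camera_pos, focal=1.0):
    cx, cy, cz = camera_pos
    image = []
    for x, y, z in points:
        depth = cz - z            # distance along the viewing axis
        if depth > 0:             # only points in front of the camera
            u = focal * (x - cx) / depth
            v = focal * (y - cy) / depth
            image.append((u, v))
    return image

img = render([(0, 0, 0), (1, 1, 0), (0, 0, 5)], camera_pos=(0, 0, 4))
```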

[0060] Furthermore, since the load of rendering is heavy, there has been a possibility that only higher-performance devices can be equipped with a bitstream decoding/reproduction function. In other words, there has been a possibility that the number of devices that can be equipped with the bitstream decoding/reproduction function decreases.


[0062] Thus, as shown in the first row from the top of Table 10 in FIG. 1, in a video-based approach for 3D data on the encoding side, 2D data, which is different from the 3D data, is added to a bitstream. That is, data that can be displayed without the need for rendering is included in the bitstream of the 3D data.

[0063] Thus, on the decoding side, a two-dimensional image can be displayed (2D data included in a bitstream can be reproduced) without rendering of an object having a three-dimensional shape.

[0064] For example, in a case of displaying 3D data as three-dimensional content on a 3D display 35 on the decoding side in FIG. 2, a 3D data decoder 32 decodes a bitstream of the 3D data and reconstructs the 3D data (e.g., a point cloud). Then, the 3D display 35 displays the 3D data.

[0065] On the other hand, for example, in a case of displaying 3D data as a two-dimensional image on a 2D display 36 on the decoding side in FIG. 2, the 3D data decoder 32 decodes a bitstream of the 3D data and reconstructs the 3D data. Then, a renderer 34 renders the 3D data to generate a rendered image (two-dimensional image), and the 2D display 36 displays the rendered image. That is, in this case, rendering processing is required, and there has been a possibility that the load increases.

[0066] On the other hand, in a case where 2D data (e.g., a rendered image) has been added to a bitstream, a demultiplexer 31 extracts coded data of the 2D data from the bitstream, a 2D video decoder 33 decodes the coded data to generate a two-dimensional image, and the 2D display 36 can thus display the two-dimensional image. That is, the rendering processing on the decoding side can be skipped (omitted).
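The two decoding paths just described can be sketched as follows. The function and key names are assumptions for illustration (the actual bitstream syntax is described later); the point is only that the renderer is never invoked when the bitstream carries 2D data:

```python
# Sketch of the two reproduction paths for a 2D display: if coded 2D data
# is present in the bitstream, demultiplex and decode it directly; only
# otherwise decode the 3D data and render it to a two-dimensional image.

def reproduce_2d(bitstream, decode_3d, render, decode_2d):
    if bitstream.get("2d_coded_data") is not None:
        # path with added 2D data: 2D decode only, rendering is skipped
        return decode_2d(bitstream["2d_coded_data"])
    # conventional path: reconstruct the point cloud, then render it
    point_cloud = decode_3d(bitstream["3d_coded_data"])
    return render(point_cloud)

calls = []
image = reproduce_2d(
    {"3d_coded_data": b"3d", "2d_coded_data": b"2d"},
    decode_3d=lambda data: calls.append("decode_3d"),
    render=lambda cloud: calls.append("render"),
    decode_2d=lambda data: "two-dimensional image",
)
```

In the sketch, `calls` stays empty for a bitstream containing 2D data, mirroring how the renderer 34 in FIG. 2 is bypassed.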

[0067] That is, a two-dimensional image can be displayed more easily. Consequently, for example, a two-dimensional image indicating contents of a bitstream can be included in the bitstream so that the two-dimensional image can be displayed without rendering of an object having a three-dimensional shape on the decoding side. Consequently, the contents of the bitstream can be checked more quickly. Furthermore, for example, a rendered image obtained with a recommended camera work can be added as 2D data to a bitstream, so that the rendered image can be displayed without rendering of an object having a three-dimensional shape on the decoding side. Consequently, the recommended camera work can be checked more quickly.

[0068] Furthermore, a two-dimensional image can be displayed without rendering of an object having a three-dimensional shape, which requires a heavy processing load, and this allows even lower-performance devices to reproduce 2D data included in a bitstream. Consequently, it is possible to suppress an increase in the number of devices that cannot be equipped with the bitstream decoding/reproduction function (that is, to increase the number of devices that can be equipped with it).


[0070] Note that the contents of the 2D data added to the bitstream of the 3D data are arbitrary, as long as they are different from the patches of the 3D data. For example, as shown in the second row from the top of Table 10 in FIG. 1, the contents may be a rendered image of the object having a three-dimensional shape indicated by the 3D data.

[0071] For example, the contents may be a rendered image obtained by rendering the 3D data just like imaging the object having a three-dimensional shape indicated by the 3D data with a predetermined camera work (position, direction, or the like of the rendering camera). For example, when a 3D data encoder 21 encodes 3D data (point cloud) on the encoding side in FIG. 2, the 3D data encoder 21 may also encode a rendered image of the 3D data to be encoded to generate coded data, and the coded data of the rendered image may be added to a bitstream that includes coded data of the 3D data. That is, the rendered image may be added to the bitstream of the 3D data.

[0072] This allows the rendered image to be displayed on the decoding side without rendering (that is, more easily). For example, on the decoding side in FIG. 2, the demultiplexer 31 extracts coded data of a rendered image from a bitstream, and the 2D video decoder decodes the coded data, and thus the rendered image can be obtained. That is, the rendering processing can be skipped (omitted).

[0073] Furthermore, for example, as shown in the third row from the top of Table 10 in FIG. 1, this rendered image may be an image obtained by rendering just like imaging an object having a three-dimensional shape indicated by 3D data from a recommended camera position and direction. That is, this rendered image may be an image obtained by rendering with a recommended camera work.

[0074] For example, when the 3D data encoder 21 encodes 3D data (point cloud) on the encoding side in FIG. 2, the 3D data encoder 21 may also encode a rendered image obtained by rendering an object in the 3D data to be encoded with a recommended camera work to generate coded data, and the coded data of the rendered image may be added to a bitstream that includes coded data of the 3D data. That is, the rendered image obtained by rendering with the recommended camera work may be added to the bitstream of the 3D data.

[0075] This allows a rendered image obtained with a recommended camera work specified on the encoding side to be displayed on the decoding side without rendering (that is, more easily). For example, on the decoding side in FIG. 2, the demultiplexer 31 extracts coded data of a rendered image from a bitstream, and the 2D video decoder decodes the coded data, and thus the rendered image obtained with a recommended camera work specified on the encoding side can be obtained. That is, the rendering processing can be skipped (omitted).

[0076] Note that, for example, as shown in the fourth row from the top of Table 10 in FIG. 1, this rendered image may be generated on the encoding side. For example, on the encoding side in FIG. 2, a renderer 22 may render an object in 3D data to be encoded to generate a rendered image, and the 3D data encoder 21 may encode and add the rendered image to a bitstream of the 3D data.

[0077] This allows a rendered image generated on the encoding side to be displayed on the decoding side without rendering (that is, more easily). For example, on the decoding side in FIG. 2, the demultiplexer 31 extracts coded data of a rendered image from a bitstream, and the 2D video decoder decodes the coded data, and thus the rendered image generated by the renderer 22 can be obtained. That is, the rendering processing on the decoding side can be skipped (omitted).

[0078] Note that the 2D data is not limited to the example described above. This 2D data need not be a rendered image. For example, the 2D data may be an image including information (characters, numbers, symbols, figures, patterns, or the like) regarding the contents of the 3D data included in the bitstream. Such 2D data may be added to the bitstream so that the information regarding the contents of the 3D data can be more easily displayed on the decoding side. That is, a user on the decoding side can grasp the contents of the bitstream more quickly. Furthermore, the user can grasp the contents of the bitstream on a wider variety of devices.

[0079] Furthermore, the 2D data may be an image with contents independent of the 3D data included in the bitstream (an irrelevant image). For example, the 2D data may be a rendered image of an object different from the object indicated by the 3D data included in the bitstream, or may be an image including information (characters, numbers, symbols, figures, patterns, or the like) unrelated to the contents of the 3D data included in the bitstream. Such 2D data may be added to the bitstream so that a wider variety of information can be more easily displayed on the decoding side. That is, a user on the decoding side can obtain a wider variety of information more quickly. Furthermore, the user can obtain a wider variety of information on a wider variety of devices.

[0080] Furthermore, each of the 2D data and the 3D data may be a moving image or a still image. Moreover, the length of reproduction time of the 2D data and that of the 3D data may be the same as each other or may be different from each other. Such 2D data may be added to the bitstream so that the 2D data can be more easily displayed on the decoding side, regardless of whether the 2D data is a moving image or a still image. That is, a user on the decoding side can start viewing the 2D data more quickly, regardless of whether the 2D data is a moving image or a still image. Furthermore, the user can view the 2D data on a wider variety of devices, regardless of whether the 2D data is a moving image or a still image.

[0081] Furthermore, a plurality of pieces of 2D data may be added to the bitstream of the 3D data. Moreover, the lengths of reproduction time of the plurality of pieces of 2D data may be the same as each other or may be different from each other. Furthermore, the plurality of pieces of 2D data may be added to the bitstream in a state in which each of them is reproduced in sequence.

[0082] For example, a plurality of pieces of 2D data may be added to a bitstream in a state in which each of them is reproduced in sequence, so that the plurality of pieces of 2D data can be more easily reproduced in sequence on the decoding side. That is, a user on the decoding side can start viewing the plurality of pieces of 2D data more quickly. Furthermore, the user can view the plurality of pieces of 2D data on a wider variety of devices.

[0083] For example, as shown in the fifth row from the top of Table 10 in FIG. 1, the same moving image may be added to the bitstream a plurality of times as the plurality of pieces of 2D data. Thus, the moving image can be reproduced a plurality of times more easily on the decoding side. That is, a user on the decoding side can start viewing the moving image reproduced the plurality of times more quickly. Furthermore, the user can view the moving image reproduced the plurality of times on a wider variety of devices.

[0084] Furthermore, for example, as shown in the fifth row from the top of Table 10 in FIG. 1, as the plurality of pieces of 2D data, for example, moving images with contents that are different from each other may be added to the bitstream in a state in which each of them is reproduced in sequence. For example, as the moving images with contents that are different from each other, a plurality of rendered images (moving images) obtained by rendering with camera works (camera position, direction, or the like) that are different from each other may be added to the bitstream. Thus, the rendered images from a plurality of viewpoints (by a plurality of camera works) can be displayed (the rendered images from the corresponding viewpoints (by the corresponding camera works) can be displayed in sequence) more easily on the decoding side. That is, a user on the decoding side can start viewing the rendered images from the plurality of viewpoints more quickly. Furthermore, the user can view the rendered images from the plurality of viewpoints on a wider variety of devices.


[0086] Note that 2D data may be added to any location in a bitstream. For example, the 2D data may be added to a video frame. As described above, a point cloud (3D data) is constituted by position information and attribute information of a group of points. Furthermore, in a case of the video-based approach, position information and attribute information of a point cloud are projected onto a two-dimensional plane for each segmentation and packed in a video frame as patches. The 2D data described above may be added to such a video frame.

[0087] Adding the 2D data to the video frame as described above makes it possible to encode the 2D data together with the 3D data. For example, in a case of FIG. 2, the 3D data encoder 21 encodes a packed video frame by an encoding method for two-dimensional plane images such as AVC or HEVC, thereby encoding 3D data and 2D data. That is, 2D data can be encoded more easily.

[0088] Furthermore, 2D data can be decoded more easily on the decoding side. For example, in the case of FIG. 2, the 2D video decoder 33 can generate 2D data by decoding coded data by a decoding method for two-dimensional plane images such as AVC or HEVC.

[0089] Note that, for example, as shown in the sixth row from the top of Table 10 in FIG. 1, 2D data may be added to a color video frame in which attribute information patches of a point cloud are packed. As illustrated in FIG. 3, a bitstream 40 of 3D data includes a stream header 41, a group of frames (GOF) stream 42-1, a GOF stream 42-2, … , a GOF stream 42-n-1, and a GOF stream 42-n (n is an optional natural number).

[0090] The stream header 41 is header information of the bitstream 40, where various types of information regarding the bitstream 40 are stored.

[0091] Each of the GOF stream 42-1 to the GOF stream 42-n is a random access unit formed by packing frames that are correlated in the time direction. That is, each is a bitstream for a predetermined length of time. In a case where it is not necessary to distinguish the GOF stream 42-1 to the GOF stream 42-n from each other in the description, they are referred to as GOF streams 42.

[0092] A GOF stream 42 includes a GOF header 51, a GOF geometry video stream 52, GOF auxiliary info & occupancy maps 53, and a GOF texture video stream 54.
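The container layout described in [0089] to [0092] can be summarized with a minimal structural sketch (field names mirror FIG. 3; the byte contents are placeholders, not real coded syntax):

```python
from dataclasses import dataclass, field
from typing import List

# Minimal structural sketch of the bitstream of FIG. 3; the bytes
# fields are placeholders standing in for actual coded data.

@dataclass
class GOFStream:
    gof_header: bytes              # parameters for this group of frames
    geometry_video_stream: bytes   # coded geometry video frames
    aux_info_occupancy_maps: bytes # coded auxiliary info and occupancy map
    texture_video_stream: bytes    # coded color video frames (may carry 2D data)

@dataclass
class Bitstream3D:
    stream_header: bytes                              # header of the whole bitstream
    gof_streams: List[GOFStream] = field(default_factory=list)
```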

[0093] The GOF header 51 includes parameters 61 for the corresponding GOF stream 42. The parameters 61 include parameters such as information regarding a frame width (frameWidth), information regarding a frame height (frameHeight), and information regarding a resolution of an occupancy map (occupancyResolution), for example.

[0094] The GOF geometry video stream 52 is coded data (bitstream) obtained by encoding, by an encoding method for two-dimensional plane images such as AVC or HEVC, a geometry video frame 62 in which position information patches of a point cloud are packed, for example.

[0095] The GOF auxiliary info & occupancy maps 53 are coded data (bitstream) obtained by encoding auxiliary information and an occupancy map 64 by a predetermined encoding method. The occupancy map 64 is map information that indicates whether or not position information and attribute information are present at each position on a two-dimensional plane.

[0096] The GOF texture video stream 54 is coded data (bitstream) obtained by encoding a color video frame 65 by an encoding method for two-dimensional plane images such as AVC or HEVC, for example. This color video frame 65 may have 2D data 72 added.

[0097] With such a configuration, 2D data can be encoded together with 3D data. For example, in the case of FIG. 2, the 3D data encoder 21 encodes a packed color video frame by an encoding method for two-dimensional plane images such as AVC or HEVC, thereby encoding not only attribute information of a point cloud but also 2D data. That is, 2D data can be encoded more easily.

[0098] Furthermore, 2D data can be decoded more easily on the decoding side. For example, in the case of FIG. 2, the demultiplexer 31 extracts coded data of a color video frame (the GOF texture video stream 54 in the case of the example in FIG. 3) from a bitstream, and the 2D video decoder 33 decodes the extracted coded data (the GOF texture video stream 54) by a decoding method for two-dimensional plane images such as AVC or HEVC, and thus 2D data (the 2D data 72 in the case of the example in FIG. 3) can be generated.

[0099] Note that, in this case, the 2D data 72 is information different from a point cloud, and the 2D data 72 is not reflected in the occupancy map 64. Consequently, for example, in a case where the 3D data decoder 32 (FIG. 2) decodes the bitstream 40 of the 3D data, the 2D data 72 is ignored. That is, the 3D data decoder 32 can decode the bitstream 40 in a similar manner to a case of decoding a bitstream of 3D data to which 2D data is not added. That is, 3D data can be easily decoded.
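A small sketch of why this works: a hypothetical reconstruction step that lifts only the samples whose occupancy-map entry is 1 will never touch the region holding the 2D data 72, so the 2D data is transparent to 3D reconstruction (illustrative helper, not the codec's actual routine):

```python
# Sketch: the 3D decoder reconstructs points only from frame positions
# the occupancy map marks as occupied; 2D data sits in unoccupied
# positions and is therefore ignored by 3D reconstruction.

def occupied_samples(frame, occupancy):
    """Yield (row, col, value) for positions marked occupied."""
    for r, (frow, orow) in enumerate(zip(frame, occupancy)):
        for c, (v, occ) in enumerate(zip(frow, orow)):
            if occ:
                yield r, c, v
```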

[0100] Furthermore, 2D data may be added to all color video frames, or may be added to some of the video frames. For example, as shown in the seventh row from the top of Table 10 in FIG. 1, in a case where color video frames are hierarchically encoded in the time direction, 2D data may be added to video frames of all layers to be encoded, or the 2D data may be added to video frames of some of the layers to be encoded.

[0101] In a case of the video-based approach, a predetermined number of two-dimensional images may be generated from one point cloud frame so that a patch depth can be represented. In other words, a plurality of patches can be generated in a depth direction for one point cloud frame. In that case, a packed video frame can be hierarchically encoded in the time direction, and a layer to be encoded can be assigned to each position in the patch depth direction (each patch can be arranged in a video frame of a layer corresponding to the depth direction).

[0102] In a case of such a layered structure, for example, when 2D data is added to all color video frames of 3D data in order from the first frame, there is a possibility that the 2D data may not be added to color video frames that come later in the encoding/decoding order. That is, when 2D data is extracted from all color video frames of 3D data and displayed, there is a possibility that video frames that come later in the encoding/decoding order may become a noise image (an image that is not 2D data).

[0103] Thus, 2D data may be added only to color video frames of some of the layers to be encoded in the layered structure described above, and the color video frames of all the layers to be encoded may be encoded in accordance with the layered structure, and then the 2D data may be reproduced by decoding only coded data of the color video frames of the layers to be encoded to which the 2D data has been added. For example, 2D data may be added to a color video frame of one layer to be encoded, and the color video frames of all the layers to be encoded may be encoded in accordance with the layered structure, and then the 2D data may be reproduced by decoding only coded data of the color video frame of the layer to be encoded to which the 2D data has been added. Thus, the 2D data can be extracted from all the decoded color video frames, and this prevents the noise image described above from being displayed.
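The layer-selective playback described above can be sketched as follows, assuming each coded frame is tagged with an illustrative layer id (the tagging scheme here is a stand-in for the actual layered-coding signaling):

```python
# Sketch: when 2D data is added only to one encoded layer, a player
# can reproduce the 2D data by decoding just the coded frames whose
# layer id matches the layer carrying the 2D data.

def frames_for_2d_playback(coded_frames, target_layer):
    """Keep only coded frames whose layer id matches the 2D-data layer."""
    return [frame for (frame, layer) in coded_frames if layer == target_layer]
```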

[0104] Furthermore, the 2D data may be added to the color video frames of all the layers to be encoded in the layered structure described above. For example, the 2D data may be added to all the color video frames in the layered structure to be encoded, in order from the first frame. In that case, the same 2D data may be added repeatedly. For example, when an image of the last frame of a two-dimensional moving image has been added to a certain color video frame, the images of the 2D data may be added again to the subsequent color video frames from the first frame. Thus, the 2D data can be extracted from all the decoded color video frames, and this prevents the noise image described above from being displayed.
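The repeated insertion described in this paragraph amounts to a cyclic mapping from color video frame indices to 2D frames, as in this sketch:

```python
# Sketch of the "repeat" insertion in [0104]: if the 2D moving image
# has fewer frames than the sequence of color video frames, its frames
# are reused cyclically so every color video frame carries valid 2D data.

def assign_2d_frames(num_video_frames, frames_2d):
    """Map each color video frame index to a 2D frame, wrapping around."""
    return [frames_2d[i % len(frames_2d)] for i in range(num_video_frames)]
```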

[0105] For example, in a case where 2D data is used as rendered images of 3D data, each of rendered images obtained by rendering by a predetermined camera work may be added to color video frames of all layers in order from the first frame, and after the last rendered image has been added, the rendered images once added may be added to the remaining color video frames in order from the first rendered image. Thus, one rendered image, which is a moving image, can be repeatedly displayed on the decoding side.

[0106] Furthermore, new 2D data may be added. For example, when an image of the last frame of a two-dimensional moving image has been added to a certain color video frame, images of new 2D data may be added to the subsequent color video frames from the first frame. Thus, the 2D data can be extracted from all the decoded color video frames, and this prevents the noise image described above from being displayed.

[0107] For example, in a case where 2D data is used as rendered images of 3D data, each of rendered images obtained by rendering by a predetermined camera work may be added to color video frames of all layers in order from the first frame, and after the last rendered image has been added, each of rendered images obtained by rendering by a new camera work may be added to the remaining color video frames in order. Thus, a plurality of rendered images, which are moving images, can be displayed in sequence on the decoding side.
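The sequential insertion of rendered images from successive camera works can be sketched as a concatenation of per-camera-work frame lists (illustrative only; the actual insertion order is governed by the encoding as described above):

```python
# Sketch of the sequential insertion in [0107]: rendered images from a
# first camera work fill the color video frames in order; once they are
# exhausted, rendered images from a new camera work continue the sequence.

def assign_sequential_views(num_video_frames, view_sequences):
    """Flatten per-camera-work frame lists and fit them to the frame count."""
    flat = [frame for sequence in view_sequences for frame in sequence]
    return flat[:num_video_frames]
```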

[0108] Note that, in a similar manner, in a case where color video frames are hierarchically encoded in a scalable manner, 2D data may be added to video frames of all layers, or the 2D data may be added to video frames of some of the layers.

[0109] Note that the 2D data may be added to video frames other than color video frames. For example, the 2D data may be added to geometry video frames.

[0110]

[0111] Information regarding 2D data to be added to a bitstream of 3D data as described above may be further included in the bitstream. This information regarding the 2D data may be any information.

[0112] Furthermore, the information regarding the 2D data may be added to any position in the bitstream. For example, the information regarding the 2D data may be added to a header of the bitstream as metadata. For example, as illustrated in FIG. 3, the information regarding the 2D data may be added as 2D control syntax 71 to the stream header 41 of the bitstream 40.

[0113] For example, as shown in the eighth row from the top of Table 10 in FIG. 1, information regarding a two-dimensional image may include two-dimensional image presence/absence identification information that indicates whether or not a bitstream includes two-dimensional image data.

[0114] FIG. 4 illustrates an example of the 2D control syntax 71. As illustrated in FIG. 4, thumbnail_available_flag may be transmitted as the 2D control syntax 71. The thumbnail_available_flag is flag information (i.e., two-dimensional image presence/absence identification information) that indicates whether or not there is 2D data in the bitstream (whether or not 2D data has been added). If this flag information is true (e.g., “1”), it indicates that there is 2D data in the bitstream. Furthermore, if this flag information is false (e.g., “0”), it indicates that there is no 2D data in the bitstream.

[0115] Furthermore, for example, as shown in the 11th row from the top of Table 10 in FIG. 1, the information regarding the two-dimensional image may include two-dimensional image reproduction assisting information for assisting reproduction of the two-dimensional image. For example, if the thumbnail_available_flag is true (if (thumbnail_available_flag) {), num_rendering_view, InsertionMethod, SeparationID, and IndependentDecodeflag may be transmitted as the 2D control syntax 71. These syntaxes are two-dimensional image reproduction assisting information for assisting reproduction of a two-dimensional image.

[0116] The num_rendering_view is information indicating the number of rendered viewpoints (the number of camera works). The InsertionMethod is information indicating whether the 2D data has been added with layers divided by LayerID or TemporalID, or has been added to all the layers by repeating or the like. Note that, in a case where the 2D data has been added with the layers divided by LayerID or TemporalID, the operation of the AVC or HEVC decoder needs to be changed. That is, the operation of the decoder can be changed on the basis of this information. The SeparationID is information indicating a boundary of LayerID or TemporalID. This information may be passed to the AVC or HEVC decoder so that only a specific layer is displayed.

[0117] The IndependentDecodeflag is flag information that indicates whether or not a 2D data portion can be independently decoded by a tile or the like. If this flag information is true (e.g., “1”), it indicates that the 2D data can be independently decoded. Furthermore, if this flag information is false (e.g., “0”), it indicates that the 2D data cannot be independently decoded.

[0118] Furthermore, if the IndependentDecodeflag is true (if (IndependentDecodeflag) {), MCTS_ID may be transmitted as the 2D control syntax 71. The MCTS_ID is information for identifying the tile to be specified when decoding a specific tile portion, as defined separately in the motion-constrained tile sets supplemental enhancement information (MCTS SEI).
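Putting the fields of [0114] to [0118] together, a reader for the 2D control syntax of FIG. 4 could be sketched as follows. Bit widths are not given in this excerpt, so each field is read as one whole value from a flat list, standing in for an entropy-decoded syntax reader:

```python
# Sketch of the conditional structure of the 2D control syntax 71
# (FIG. 4). Field names follow the description; reading one value per
# field from a list is an illustrative stand-in for real bit parsing.

def parse_2d_control_syntax(values):
    """Parse the conditional 2D control syntax from a flat value list."""
    it = iter(values)
    syntax = {"thumbnail_available_flag": next(it)}
    if syntax["thumbnail_available_flag"]:        # 2D data is present
        syntax["num_rendering_view"] = next(it)
        syntax["InsertionMethod"] = next(it)
        syntax["SeparationID"] = next(it)
        syntax["IndependentDecodeflag"] = next(it)
        if syntax["IndependentDecodeflag"]:       # tile-independent decoding
            syntax["MCTS_ID"] = next(it)
    return syntax
```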

[0119] As a matter of course, the syntaxes illustrated in FIG. 4 are examples, and the 2D control syntax 71 may include any syntax.

[0120] Furthermore, for example, as shown in the ninth row from the top of Table 10 in FIG. 1, the information regarding the two-dimensional image may include two-dimensional image spatial position management information for managing a position in a spatial direction to which the two-dimensional image has been added. For example, as illustrated in FIG. 5, def_disp_win_left_offset, def_disp_win_right_offset, def_disp_win_top_offset, and def_disp_win_bottom_offset may be transmitted. These syntaxes are two-dimensional image spatial position management information for managing the position in the spatial direction to which the two-dimensional image has been added.
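As an illustration (the exact semantics of the offsets are defined by the video coding standard, not reproduced in this excerpt), the four offsets can be read as shrinking the full decoded frame down to the rectangle that holds the two-dimensional image:

```python
# Sketch of how the def_disp_win_* offsets of [0120] could locate the
# 2D image inside a decoded frame: each offset trims one side of the
# full frame, leaving the region that holds the two-dimensional image.

def crop_display_window(frame, left, right, top, bottom):
    """Return the sub-rectangle of `frame` selected by the four offsets."""
    height, width = len(frame), len(frame[0])
    return [row[left:width - right] for row in frame[top:height - bottom]]
```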

……
……
……
