Sony Patent | Information processing apparatus, information processing method, and program

Patent: Information processing apparatus, information processing method, and program

Publication Number: 20210235056

Publication Date: 20210729

Applicant: Sony

Assignee: Sony Corporation

Abstract

The present disclosure relates to an information processing apparatus and an information processing method that enable processing to be performed simply, and a program. By converting a point cloud representing a three-dimensional structure into two dimensions, a geometry image, a texture image, and three-dimensional information metadata required for constructing the geometry image and the texture image in three dimensions are obtained. Then, one PC sample forming a Point Cloud displayed at a specific time is generated by storing the geometry image, the texture image, and the three-dimensional information metadata in accordance with the playback order required when reproducing and playing back the geometry image and the texture image in three dimensions on the basis of the three-dimensional information metadata. The present technology can be applied, for example, to a data generation device that generates data for distribution of a Point Cloud.

Claims

  1. An information processing apparatus wherein, in a PC Sample that is a unit forming a Point Cloud displayed at a specific time, the PC Sample includes a Sub Sample of a Geometry image, a Sub Sample of a Texture image, and three-dimensional information metadata that are obtained by converting 3D data corresponding to the Point Cloud into two dimensions, the three-dimensional information metadata includes a PC header that is header information of the PC Sample, the PC header includes information regarding a number of Layers that is a number of Sub Samples of the Geometry image or a number of Sub Samples of the Texture image in the PC Sample, and the information processing apparatus comprises a file generation unit configured to generate a file by storing the Sub Sample of the Geometry image, the Sub Sample of the Texture image, and the three-dimensional information metadata.

  2. The information processing apparatus according to claim 1, wherein the file includes ISO base media file format (ISOBMFF), and the information regarding the number of Layers is stored in a moov Box that is a metadata region of the file.

  3. The information processing apparatus according to claim 2, wherein the information regarding the number of Layers is stored in a Sample Entry of a Track corresponding to the Sub Sample of the Geometry image and the Sub Sample of the Texture image that are stored in the file.

  4. The information processing apparatus according to claim 1, wherein the PC header further includes: Type information indicating whether the Sub Sample is a Sub Sample of a Geometry image or a Sub Sample of a Texture image; and a Layer identifier indicating a Layer to which the Sub Sample corresponds.

  5. The information processing apparatus according to claim 4, wherein when the Type information is 0, it is indicated that the Sub Sample is a Sub Sample of a Geometry image, and when the Type information is 1, it is indicated that the Sub Sample is a Sub Sample of a Texture image.

  6. The information processing apparatus according to claim 5, wherein the Type information is signaled as SubSampleInformation.

  7. The information processing apparatus according to claim 5, wherein the Type information is signaled in SubSampleEntryBox.

  8. An information processing apparatus wherein, in multiple Samples included in a Point Cloud displayed at a specific time, the multiple Samples include a Sample of a Geometry image, a Sample of a Texture image, and three-dimensional information metadata that are obtained by converting 3D data corresponding to the Point Cloud into two dimensions, the three-dimensional information metadata includes a PC header that is header information of each of the Samples, the PC header includes information regarding a number of Layers that is a number of Samples of the Geometry image or a number of Samples of the Texture image, included in a Point Cloud displayed at a specific time, and the information processing apparatus comprises a file generation unit configured to generate a file by storing the Sample of the Geometry image, the Sample of the Texture image, and the three-dimensional information metadata.

  9. The information processing apparatus according to claim 8, wherein the file includes ISO base media file format (ISOBMFF), and the information regarding the number of Layers is stored in a moov Box that is a metadata region of the file.

  10. The information processing apparatus according to claim 9, wherein the information regarding the number of Layers is stored in a Sample Entry of a Track corresponding to the Sample of the Geometry image and the Sample of the Texture image that are stored in the file.

  11. The information processing apparatus according to claim 9, wherein the PC header further includes: Type information indicating whether each of the Samples is a Sample of a Geometry image or a Sample of a Texture image; and a Layer identifier indicating a Layer to which each of the Samples corresponds.

  12. The information processing apparatus according to claim 11, wherein the Layer identifier is signaled by a Sample Entry.

  13. The information processing apparatus according to claim 11, wherein when the Type information is 0, it is indicated that each of the Samples is a Sample of a Geometry image, and when the Type information is 1, it is indicated that each of the Samples is a Sample of a Texture image.

  14. The information processing apparatus according to claim 13, wherein the Type information is signaled as SubSampleInformation.

  15. The information processing apparatus according to claim 13, wherein the Type information is signaled in SubSampleEntryBox.

  16. The information processing apparatus according to claim 8, wherein the three-dimensional information metadata includes information indicating a relationship between the multiple Samples.

  17. The information processing apparatus according to claim 8, wherein the geometry image, the texture image, and the three-dimensional information metadata are signaled by the ISOBMFF.

  18. The information processing apparatus according to claim 8, wherein the geometry image, the texture image, and the three-dimensional information metadata are signaled by a media presentation description (MPD) of dynamic adaptive streaming over HTTP (DASH).

  19. An information processing method wherein, in an information processing apparatus, in a PC Sample that is a unit forming a Point Cloud displayed at a specific time, the PC Sample includes a Sub Sample of a Geometry image, a Sub Sample of a Texture image, and three-dimensional information metadata that are obtained by converting 3D data corresponding to the Point Cloud into two dimensions, the three-dimensional information metadata includes a PC header that is header information of the PC Sample, the PC header includes information regarding a number of Layers that is a number of Sub Samples of the Geometry image or a number of Sub Samples of the Texture image in the PC Sample, and the information processing method comprises generating a file by storing the Sub Sample of the Geometry image, the Sub Sample of the Texture image, and the three-dimensional information metadata.

  20. A program for causing a computer of an information processing apparatus to execute information processing, wherein, in a PC Sample that is a unit forming a Point Cloud displayed at a specific time, the PC Sample includes a Sub Sample of a Geometry image, a Sub Sample of a Texture image, and three-dimensional information metadata that are obtained by converting 3D data corresponding to the Point Cloud into two dimensions, the three-dimensional information metadata includes a PC header that is header information of the PC Sample, the PC header includes information regarding a number of Layers that is a number of Sub Samples of the Geometry image or a number of Sub Samples of the Texture image in the PC Sample, and the information processing comprises generating a file by storing the Sub Sample of the Geometry image, the Sub Sample of the Texture image, and the three-dimensional information metadata.

  21. An information processing method wherein, in an information processing apparatus, in multiple Samples included in a Point Cloud displayed at a specific time, the multiple Samples include a Sample of a Geometry image, a Sample of a Texture image, and three-dimensional information metadata that are obtained by converting 3D data corresponding to the Point Cloud into two dimensions, the three-dimensional information metadata includes a PC header that is header information of each of the Samples, the PC header includes information regarding a number of Layers that is a number of Samples of the Geometry image or a number of Samples of the Texture image, included in a Point Cloud displayed at a specific time, and the information processing method comprises generating a file by storing the Sample of the Geometry image, the Sample of the Texture image, and the three-dimensional information metadata.

  22. A program for causing a computer of an information processing apparatus to execute information processing, wherein, in multiple Samples included in a Point Cloud displayed at a specific time, the multiple Samples include a Sample of a Geometry image, a Sample of a Texture image, and three-dimensional information metadata that are obtained by converting 3D data corresponding to the Point Cloud into two dimensions, the three-dimensional information metadata includes a PC header that is header information of each of the Samples, the PC header includes information regarding a number of Layers that is a number of Samples of the Geometry image or a number of Samples of the Texture image, included in a Point Cloud displayed at a specific time, and the information processing comprises generating a file by storing the Sample of the Geometry image, the Sample of the Texture image, and the three-dimensional information metadata.

Description

TECHNICAL FIELD

[0001] The present disclosure relates to an information processing apparatus, an information processing method, and a program. In particular, the present disclosure relates to an information processing apparatus and an information processing method that enable processing to be performed simply, and a program.

BACKGROUND ART

[0002] Conventionally, as disclosed in Non Patent Document 1, there is defined a compression method for a Point Cloud, which is a set of points that simultaneously have position information and attribute information (especially color information) in a three-dimensional space.

[0003] Furthermore, Non Patent Document 2 discloses, as one of compression methods of the Point Cloud, a method of dividing the Point Cloud into multiple regions (hereinafter, referred to as segmentation), performing plane projection for each region to generate a texture image and a geometry image, and then encoding these with a moving image codec. Here, the geometry image is an image including depth information of a point group included in the Point Cloud.

[0004] Such a compression method of a Point Cloud is called point cloud coding test model category 2 (PCC TMC2). Furthermore, a structure of a Point Cloud stream (hereinafter, also referred to as a PC stream) generated by PCC TMC2 is disclosed in Non Patent Document 3.

[0005] Then, a use case in which such a PC stream is distributed over an IP network is expected. Therefore, in order to suppress the impact on existing distribution platforms and aim for early service realization, as disclosed in Non Patent Document 4, examination of a distribution technology using the existing framework of ISO base media file format / dynamic adaptive streaming over HTTP (ISOBMFF/DASH) has been started in the Moving Picture Experts Group (MPEG).

[0006] Furthermore, Patent Document 1 discloses an image processing device that packs and records three-dimensional image data by a side-by-side method, a top-and-bottom method, or the like.

CITATION LIST

Patent Document

[0007] Patent Document 1: Japanese Patent Application Laid-Open No. 2011-142586

Non Patent Document

[0008] Non Patent Document 1: MPEG-I Part 5 Point Cloud Compression (ISO/IEC 23090-5)

[0009] Non Patent Document 2: w17348, PCC Test Model Category 2 v1, January 2018, Gwangju, Korea

[0010] Non Patent Document 3: w17374, First Working Draft for PCC Category 2, January 2018, Gwangju, Korea

[0011] Non Patent Document 4: w17675, First idea on Systems technologies for Point Cloud Coding, April 2018, San Diego, US

SUMMARY OF THE INVENTION

Problems to be Solved by the Invention

[0012] Meanwhile, in the structure of a PC stream as disclosed in Non Patent Document 3 described above, a large amount of random access occurs during normal playback, and processing on the client side that plays back the Point Cloud becomes complicated, which raises concerns of delayed processing and increased processing costs.

[0013] The present disclosure has been made in view of such a situation, and is intended to enable processing to be performed simply.

Solutions to Problems

[0014] In an information processing apparatus according to one aspect of the present disclosure, by converting 3D data representing a three-dimensional structure into two dimensions, there are obtained at least a first image and a second image, and three-dimensional information metadata required for constructing the first image and the second image in three dimensions. The information processing apparatus includes a file generation unit configured to generate a file of one unit forming the 3D data displayed at a specific time, by storing the first image, the second image, and the three-dimensional information metadata in accordance with a playback order required at a time of reproducing and playing back the first image and the second image in three dimensions on the basis of the three-dimensional information metadata.

[0015] In an information processing method or a program of one aspect of the present disclosure, by converting 3D data representing a three-dimensional structure into two dimensions, there are obtained at least a first image and a second image, and three-dimensional information metadata required for constructing the first image and the second image in three dimensions. The information processing method or the program includes: generating a file of one unit forming the 3D data displayed at a specific time, by storing the first image, the second image, and the three-dimensional information metadata in accordance with a playback order required at a time of reproducing and playing back the first image and the second image in three dimensions on the basis of the three-dimensional information metadata.

[0016] In one aspect of the present disclosure, by converting 3D data representing a three-dimensional structure into two dimensions, there are obtained at least a first image and a second image, and three-dimensional information metadata required for constructing the first image and the second image in three dimensions. Further, a file of one unit forming the 3D data displayed at a specific time is generated by storing the first image, the second image, and the three-dimensional information metadata in accordance with a playback order required at a time of reproducing and playing back the first image and the second image in three dimensions on the basis of the three-dimensional information metadata.

Effects of the Invention

[0017] According to one aspect of the present disclosure, processing can be performed simply.

[0018] Note that the effects described herein are not necessarily limiting, and the effect may be any of the effects described in the present disclosure.

BRIEF DESCRIPTION OF DRAWINGS

[0019] FIG. 1 is a view for explaining a compression method of a Point Cloud.

[0020] FIG. 2 is a view showing a structure of a PC stream.

[0021] FIG. 3 is a view showing a structure of a GOF.

[0022] FIG. 4 is a view showing a structure of a geometry video stream and a texture video stream.

[0023] FIG. 5 is a view showing a structure of a PC sample to be newly defined.

[0024] FIG. 6 is a view showing an example of a PC sample.

[0025] FIG. 7 is a view showing an example of information stored in a PC header.

[0026] FIG. 8 is a view showing an example of information stored in a header.

[0027] FIG. 9 is a view showing an example of information stored in a GOF header.

[0028] FIG. 10 is a view showing an example of storing a PC stream in one track of ISOBMFF.

[0029] FIG. 11 is a view showing a definition of codec_specific_parameters.

[0030] FIG. 12 is a view showing a structure of PCSampleEntry.

[0031] FIG. 13 is a view showing an example of PCHeaderBox.

[0032] FIG. 14 is a view showing an example of SubSampleEntryBox.

[0033] FIG. 15 is a view showing an example of LayerCompositionBox.

[0034] FIG. 16 is a view showing an example of a PC sample with num_layer_composed=2 and num_of_component=2.

[0035] FIG. 17 is a block diagram showing a first configuration example of an information processing apparatus that executes a PC file generation process.

[0036] FIG. 18 is a block diagram showing a first configuration example of an information processing apparatus that executes a Point Cloud playback process.

[0037] FIG. 19 is a flowchart for explaining a first processing example of the PC file generation process.

[0038] FIG. 20 is a flowchart for explaining a first processing example of the Point Cloud playback process.

[0039] FIG. 21 is a view showing an example of an ISOBMFF structure in a first storage method.

[0040] FIG. 22 is a view for explaining association from an enhancement track to a base track by a track reference.

[0041] FIG. 23 is a view showing packing variations.

[0042] FIG. 24 is a view showing an example of elementary stream signaling.

[0043] FIG. 25 is a view showing an example of SEI to be newly defined.

[0044] FIG. 26 is a view for explaining definitions of packing arrangement and frame0_is_geometry.

[0045] FIG. 27 is a view showing a definition of frame0.

[0046] FIG. 28 is a view showing an example of storing a PC stream in one track of ISOBMFF.

[0047] FIG. 29 is a view showing an example of PCPackingInfoBox.

[0048] FIG. 30 is a block diagram showing a second configuration example of an information processing apparatus that executes a packed PC file generation process.

[0049] FIG. 31 is a block diagram showing a second configuration example of an information processing apparatus that executes the Point Cloud playback process.

[0050] FIG. 32 is a flowchart for explaining a second processing example of the packed PC file generation process.

[0051] FIG. 33 is a flowchart for explaining a second processing example of the Point Cloud playback process.

[0052] FIG. 34 is a view showing a modified example of PCPackingInfoBox.

[0053] FIG. 35 is a view showing an example of an ISOBMFF structure in a second storage method.

[0054] FIG. 36 is a view showing an example of elementary stream signaling.

[0055] FIG. 37 is a view showing an example of ISOBMFF signaling.

[0056] FIG. 38 is a view showing a definition of GOFEntry( ).

[0057] FIG. 39 is a view showing a definition of PCFrameEntry( ).

[0058] FIG. 40 is a view showing a definition of ComponentGroupEntry( ).

[0059] FIG. 41 is a view for explaining identification of an interleave cycle of a sample.

[0060] FIG. 42 is a view showing an example of an ISOBMFF structure in a third storage method.

[0061] FIG. 43 is a view for explaining an example of signaling without using SEI.

[0062] FIG. 44 is a view showing an example of ISOBMFF signaling.

[0063] FIG. 45 is a view showing an example of syntax of PCMultiStreamBox.

[0064] FIG. 46 is a view showing a modified example of PCMultiStreamBox.

[0065] FIG. 47 is a view showing an example of syntax of PCTextureTrackGroup.

[0066] FIG. 48 is a view showing a modified example of PCMultiStreamBox.

[0067] FIG. 49 is a view showing an example of signaling a geometry track and a texture track associated by a track group.

[0068] FIG. 50 is a view showing an example of newly defined PCStreamGroupBox.

[0069] FIG. 51 is a view showing an example of an ISOBMFF structure in a fourth storage method.

[0070] FIG. 52 is a view for explaining DASH signaling.

[0071] FIG. 53 is a view for explaining DASH signaling.

[0072] FIG. 54 is a view showing an outline of SubSampleInformationBox.

[0073] FIG. 55 is a view showing an outline of a Sample Group.

[0074] FIG. 56 is a block diagram showing a configuration example of a data generation device.

[0075] FIG. 57 is a block diagram showing a configuration example of a data playback device.

[0076] FIG. 58 is a block diagram showing a configuration example of an embodiment of a computer to which the present technology is applied.

MODE FOR CARRYING OUT THE INVENTION

[0077] Hereinafter, a specific embodiment to which the present technology is applied will be described in detail with reference to the drawings.

[0078]

[0079] Before describing an encoding method to which the present technology is applied, a conventional encoding method will be described with reference to FIGS. 1 to 4.

[0080] FIG. 1 is a view for briefly explaining a Point Cloud compression method disclosed in Non Patent Document 2 described above.

[0081] As shown in FIG. 1, first, a Point Cloud representing a three-dimensional structure is inputted, and the Point Cloud is segmented. In the example shown in FIG. 1, a Point Cloud representing a three-dimensional structure combining a hemispherical shape and a conical shape is inputted, and is segmented into three regions: the hemispherical shape as one region, and the conical shape divided into two regions. Next, plane projection is performed for every region, to generate a texture image including color information representing the appearance of the surface of each region, and a geometry image including position information representing the depth to the surface of each region. Then, the texture image and the geometry image are encoded by a moving image codec such as, for example, advanced video coding (AVC) or high efficiency video coding (HEVC).

[0082] FIG. 2 is a view showing a structure of a PC stream.

[0083] As shown in FIG. 2, the PC stream is a single stream in which the constituent elements of multiple Point Cloud frames displayed continuously within a specific time width are grouped together in units called a group of frames (GOF). Here, a Point Cloud frame (hereinafter, also referred to as a PC frame) is a Point Cloud displayed at the same time.

[0084] Then, the GOF includes a texture video stream in which the texture images are encoded, a geometry video stream in which the geometry images are encoded, and three-dimensional information metadata (auxiliary patch info and an occupancy map) to be used for 2D3D conversion. Here, the geometry video stream and the texture video stream each contain (frame rate of the PC stream) × (GOF time width) × N (number of layers) frames. That is, each of the geometry video stream and the texture video stream has N frames for one frame of the PC stream. This is for representing a layer structure in a thickness direction of the Point Cloud. In other words, each PC frame includes N frames.
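
For example, under assumed values of a 30 fps PC stream, a GOF time width of 1 second, and N = 2 layers, each of the geometry video stream and the texture video stream would contain 30 × 1 × 2 = 60 frames per GOF, that is, two video frames for every PC frame.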

[0085] For example, a client decodes the geometry video stream and the texture video stream in the PC stream, and uses the occupancy map to generate a geometry patch and a texture patch. Thereafter, the client uses the auxiliary patch info to generate a Point Cloud with no color from the geometry patch first, and then colors the Point Cloud with the texture patch.

[0086] FIG. 3 shows a GOF structure in a conventional PC stream, and shows an example in which the number of layers is 2.

[0087] As shown in FIG. 3, in the conventional structure, the PC frames displayed within a specific time width are not arranged continuously in the stream. That is, the conventional PC stream has a structure in which the elements required for playback of PC frame #1 and the elements required for playback of PC frame #2 are arranged alternately before and after each other. Therefore, for example, after accessing geometry frame #1 layer0, geometry frame #1 layer1, auxiliary patch info & occupancy map #1, texture frame #1 layer0, and texture frame #1 layer1, which are the elements required for playback of PC frame #1, accessing the elements required for playback of PC frame #2 must start from geometry frame #2 layer0, which is arranged at a position before texture frame #1 layer1. Therefore, as described above, a large amount of random access occurs during normal playback.

[0088] In this regard, the PC stream structure shown in FIG. 3 is different from the stream configuration of existing video codecs, and is not suitable for storage in ISOBMFF or distribution over IP (Internet Protocol).

[0089] For example, Non Patent Document 4 described above discloses a methodology in which a geometry video stream and a texture video stream are treated as independent streams and stored in individual tracks. However, even with this methodology, a large amount of random access will occur during normal playback.

[0090] Furthermore, as shown in FIG. 4, the methodology disclosed in Non Patent Document 4 is useful for a use case of, when acquiring a geometry video stream and a texture video stream, selecting the bitrate of each stream in accordance with the network bandwidth and reconstructing a Point Cloud of optimal quality. This is because it eliminates the burden of arranging a large number of single streams (combinations of geometry video streams and texture video streams having different bitrates) on a server.

[0091] Therefore, an embodiment described below makes it possible to perform processing simply in a client in realizing such a use case, by newly defining an association method of a geometry video stream and a texture video stream included in a PC stream.

[0092]

[0093] A Point Cloud sample (hereinafter, referred to as a PC sample) to be newly defined for ISOBMFF storage will be described with reference to FIG. 5.

[0094] FIG. 5 is a view showing a structure of a PC sample to be newly defined.

[0095] First, a unit that forms a Point Cloud displayed at the same time is called one PC frame, and it is defined that one PC frame includes one PC sample. Then, in accordance with a playback order required at a time of reproducing and playing back a geometry image and a texture image in three dimensions on the basis of three-dimensional information metadata, the geometry image, the texture image, and the three-dimensional information metadata are stored in one track of ISOBMFF. By defining in this way, a client can form the Point Cloud displayed at the same time by decoding one PC sample. Therefore, the client can avoid an occurrence of a large amount of random access during normal playback as described above, and can perform processing simply. Therefore, for example, delayed processing and increased processing costs can be avoided.

[0096] The PC stream defined in this way includes a series of multiple PC samples.

[0097] As shown in FIG. 5, one PC sample includes multiple components and three-dimensional information metadata. Here, the component includes a texture image and a geometry image. Furthermore, the three-dimensional information metadata includes auxiliary patch info and an occupancy map, and a PC header.

[0098] Furthermore, one component includes multiple subsamples. That is, the geometry component includes N geometry subsamples (one per layer), and the texture component includes N texture subsamples (one per layer).

[0099] Then, the geometry subsamples and the texture subsamples are individually encoded by a moving image codec. Here, a subsample is a unit forming one picture of the geometry video stream or the texture video stream.

[0100] Furthermore, the PC header has information indicating a layer structure of the component. Furthermore, the occupancy map has patch position information in a picture of the component, and the auxiliary patch info has 2D3D conversion information for attaching a patch to a 3D object.

[0101] Note that GOF information is contained in the PC header, but a GOF header may be individually signaled in the PC sample. Furthermore, the three-dimensional information metadata such as the PC header, the auxiliary patch info, and the occupancy map may be used as a subsample.
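
To make the structure described above concrete, the following is a minimal Python sketch of a PC sample modeled on FIG. 5; the class and field names are our own illustration and are not defined by the present disclosure.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SubSample:
    """One encoded picture of a component (cf. paragraph [0099])."""
    component_type: int  # 0 = geometry, 1 = texture (per FIG. 8)
    layer_id: int        # layer this picture belongs to
    nal_units: bytes     # AVC/HEVC bitstream of one picture

@dataclass
class PCSample:
    """One PC frame: everything needed to display the Point Cloud at one time."""
    pc_header: bytes             # layer structure of the components
    auxiliary_patch_info: bytes  # 2D3D conversion info for attaching patches
    occupancy_map: bytes         # patch position information in each picture
    subsamples: List[SubSample] = field(default_factory=list)

    def subsamples_of(self, component_type: int) -> List[SubSample]:
        """All layers of one component, in layer order."""
        return sorted(
            (s for s in self.subsamples if s.component_type == component_type),
            key=lambda s: s.layer_id,
        )
```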

[0102]

[0103] With reference to FIGS. 6 to 19, a first storage method, which is a method of storing a PC stream in one track of ISOBMFF, will be described.

[0104] With reference to FIGS. 6 to 9, elementary stream signaling will be described.

[0105] FIG. 6 shows an example of a PC sample (that is, a sample of ISOBMFF) in a case where the number of layers is 2 and a moving image codec used for encoding is AVC or HEVC. Here, an access unit delimiter (AUD) shown in FIG. 6 is a network abstraction layer (NAL) unit signaled in an access unit boundary of AVC or HEVC.

[0106] The PC sample shown in FIG. 6 has a configuration in which a geometry subsample of layer0, a geometry subsample of layer1, a texture subsample of layer0, and a texture subsample of layer1 are continuously arranged after the PC header.

[0107] FIG. 7 shows an example of information stored in the PC header of FIG. 6.

[0108] As shown in FIG. 7, the PC header stores, for example, information (size_of_pc_frame) indicating a size of the PC sample, and information (number_of_layer) indicating the number of layers of the texture image or the geometry image included in one PC frame. Furthermore, the PC header stores information (frameWidth) indicating a width of the geometry image or the texture image, information (frameHeight) indicating a height of the geometry image or the texture image, and information (occupancyResolution) indicating a resolution of the occupancy map. Moreover, the PC header stores information (radiusToSmoothing) indicating a radius to detect a neighboring point for smoothing, information (neighborCountSmoothing) indicating a maximum number of neighboring points used for smoothing, information (radius2BoundaryDetection) indicating a radius to detect a boundary point, and information (thresholdSmoothing) indicating a smoothing threshold value.
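
As an illustration, the PC header fields listed above could be serialized as follows. FIG. 7 names the fields but their bit widths are not given here, so the fixed 32-bit big-endian layout (and the example values) below are purely an assumption.

```python
import struct

def pack_pc_header(size_of_pc_frame, number_of_layer, frame_width, frame_height,
                   occupancy_resolution, radius_to_smoothing,
                   neighbor_count_smoothing, radius2_boundary_detection,
                   threshold_smoothing):
    # Nine fields of FIG. 7, each packed as an assumed 32-bit unsigned integer.
    return struct.pack(
        ">9I",
        size_of_pc_frame, number_of_layer, frame_width, frame_height,
        occupancy_resolution, radius_to_smoothing, neighbor_count_smoothing,
        radius2_boundary_detection, threshold_smoothing,
    )

# Example values chosen arbitrarily for illustration.
header = pack_pc_header(
    size_of_pc_frame=123456, number_of_layer=2,
    frame_width=1280, frame_height=1280, occupancy_resolution=4,
    radius_to_smoothing=64, neighbor_count_smoothing=32,
    radius2_boundary_detection=16, threshold_smoothing=1)
```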

[0109] FIG. 8 shows an example of information stored in a header of FIG. 6.

[0110] As shown in FIG. 8, the header stores information (size_of_sample) indicating a size of a subsample of the texture image or the geometry image, information (type) indicating a type of the subsample, and a layer identifier (layer_id) of the subsample. For example, in a case where the information indicating the type of subsample is 0, it is indicated that the subsample is the geometry image. In a case where the information indicating the type of the subsample is 1, it is indicated that the subsample is the texture image.

[0111] Furthermore, an occupancy map and auxiliary patch info are contained as supplemental enhancement information (SEI) in the geometry subsample.

[0112] According to this signaling, the client can identify the boundary of a PC sample from the PC header, and the boundary of the subsample of each component from the header. Therefore, the client can extract a geometry video stream and a texture video stream from the PC sample in accordance with these boundaries, and input each to its own decoder to perform decode processing.
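
The following sketch illustrates this extraction, assuming a hypothetical 6-byte subsample header (4-byte size_of_sample, 1-byte type, 1-byte layer_id); the patent gives the fields of FIG. 8 but not their widths.

```python
import struct

GEOMETRY, TEXTURE = 0, 1  # 'type' values per paragraph [0110]

def split_pc_sample(payload: bytes):
    """Split the body of one PC sample (after the PC header) into
    per-component elementary-stream chunks using the FIG. 8 headers."""
    streams = {GEOMETRY: bytearray(), TEXTURE: bytearray()}
    pos = 0
    while pos < len(payload):
        size_of_sample, typ, layer_id = struct.unpack_from(">IBB", payload, pos)
        pos += 6
        streams[typ] += payload[pos:pos + size_of_sample]  # one encoded picture
        pos += size_of_sample
    # streams[GEOMETRY] and streams[TEXTURE] can be fed to two decoders.
    return streams
```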

[0113] Note that, as long as the occupancy map and the auxiliary patch info are the same between layers, the same information is signaled in the subsample of each layer. However, the information may be different for every layer. Furthermore, the SEI may also be signaled in the texture subsample.

[0114] Furthermore, a known GOF header may be signaled in a GOF head of the PC stream.

[0115] FIG. 9 shows an example of information stored in the GOF header.

[0116] As shown in FIG. 9, the GOF header stores, for example, information (groupOfFramesSize) indicating the number of frames in Group of Frames. In addition, the GOF header stores the same information as the PC header shown in FIG. 7.

[0117] Note that the occupancy map may be encoded by a moving image codec, and used as an occupancy subsample as one type of the component.

[0118] With reference to FIGS. 10 to 16, ISOBMFF signaling will be described.

[0119] FIG. 10 shows an example of storing a PC stream in one track of ISOBMFF.

[0120] For example, moov of the ISOBMFF stores PCSampleEntry (see FIG. 12) to be newly defined, and the PCSampleEntry includes PCHeaderBox shown in FIG. 13, SubSampleEntryBox shown in FIG. 14, and LayerCompositionBox shown in FIG. 15.

[0121] Furthermore, subs (SubSampleInformationBox) is stored in moof of the ISOBMFF, and the SubSampleInformationBox can be used to signal a boundary between the geometry subsample and the texture subsample in the PC sample, as shown in FIG. 10. Note that SubSampleInformation will be described with reference to an outline of the SubSampleInformationBox shown in FIG. 54 described later.

[0122] Then, as a sample stored in mdat of the ISOBMFF, a PC sample as shown in FIG. 6 is stored.

[0123] Here, in the SubSampleInformationBox, codec_specific_parameters, which is information of a subsample determined for every encoding codec, is defined.

[0124] That is, as shown in FIG. 11, it is indicated that the subsample is a geometry subsample when a value of codec_specific_parameters is 0, and the subsample is a texture subsample when a value of codec_specific_parameters is 1.
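
As a sketch, a client could derive the subsample boundaries from parsed SubSampleInformationBox entries as follows, without touching the mdat payload; the entry values shown are hypothetical.

```python
# Hypothetical parsed 'subs' entries for one PC sample:
# (subsample_size, codec_specific_parameters) per FIG. 11.
SUBS_ENTRIES = [(51200, 0), (50176, 0), (98304, 1), (97280, 1)]

def subsample_boundaries(entries, sample_offset=0):
    """Map each subsample to its byte range and component using only
    system-layer signaling (0 = geometry, 1 = texture)."""
    ranges, pos = [], sample_offset
    for size, params in entries:
        component = "geometry" if params == 0 else "texture"
        ranges.append((pos, pos + size, component))
        pos += size
    return ranges

print(subsample_boundaries(SUBS_ENTRIES))
```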

[0125] Note that the codec_specific_parameters may contain layer identifier information. Furthermore, subsample information may be provided in a unit of a continuous geometry subsample group and texture subsample group. Moreover, in a case of signaling three-dimensional information metadata such as the occupancy map or the auxiliary patch info as the subsample as one type of the component, rather than as SEI, a value indicating each subsample may be added to the subsample information.

[0126] FIG. 12 shows a structure of PCSampleEntry to be newly defined in the present disclosure.

[0127] In a configuration shown in FIG. 12, a sample entry of a track of the ISOBMFF that stores a PC stream is, for example, pcbs.

[0128] As shown in FIG. 13, the same information as the PC header shown in FIG. 7 is described in the PCHeaderBox.

[0129] For example, in a case where PC header information changes in a PC stream, the information of PCHeaderBox may be signaled as a sample group. Furthermore, information of the GOF header may be signaled in a sample entry, a sample group, and the like, individually from PCHeaderBox. Note that the Sample Group will be described with reference to an outline of the Sample Group shown in FIG. 55 described later.

[0130] Furthermore, at this time, the PC header and the header in the PC sample need not be signaled.

[0131] Then, according to this signaling, the client can identify the boundary of a PC sample and the boundary of a subsample of each component without parsing the content of the subsample. That is, with simplified processing that refers only to the signaling of the system layer, the client can extract a geometry video stream and a texture video stream from the PC sample, and input each to its own decoder to perform decode processing.

[0132] As shown in FIG. 14, the SubSampleEntryBox signals a codec of a texture video stream and a geometry video stream including continuous subsamples, and decoder configuration information.

[0133] num_of_component shown in FIG. 14 indicates a total number of components stored in the track, and a type field indicates a type of the component to which the sub sample entry corresponds. For example, in a case where the type field is 0, it is indicated that a component type to which a sub sample entry corresponds is a geometry image. In a case where the type field is 1, it is indicated that a component type to which the sub sample entry corresponds is a texture image. SampleEntry( ) changes depending on an encoding codec of the component, and becomes HEVCSampleEntry in a case of being encoded with HEVC, for example.

[0134] Furthermore, each sub sample entry is associated with its corresponding subsamples by matching the component type signaled in the codec_specific_parameters of the sub sample information against the type of the sub sample entry. Note that the encoding codec may differ for every type of the component.

[0135] According to this signaling, the client can identify the codec of the geometry video stream and the texture video stream, and correctly execute decode processing.
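
For illustration, the selection of a decoder configuration per component type might look as follows; the parsed representation of SubSampleEntryBox is our own assumption.

```python
# Hypothetical parsed form of SubSampleEntryBox (FIG. 14): one entry per
# component type, each carrying a codec four-character code and its decoder
# configuration record.
SUB_SAMPLE_ENTRIES = [
    {"type": 0, "codec": "hvc1", "config": b"<hvcC bytes>"},  # geometry
    {"type": 1, "codec": "hvc1", "config": b"<hvcC bytes>"},  # texture
]

def decoder_setup_for(component_type: int):
    """Select the codec and decoder configuration whose 'type' matches the
    component type signaled in codec_specific_parameters (cf. [0134])."""
    for entry in SUB_SAMPLE_ENTRIES:
        if entry["type"] == component_type:
            return entry["codec"], entry["config"]
    raise KeyError(f"no sub sample entry for component type {component_type}")
```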

[0136] As shown in FIG. 15, LayerCompositionBox signals the number of layers having the same composition time, and the number of components included in a PC sample.

[0137] For example, num_layer_composed indicates the number of decoded pictures included in one PC frame in an output order. Furthermore, num_of_component indicates the number of components included in one PC frame.

[0138] Note that an example of signaling in the sample entry has been described above; however, without being limited to this location, the signaling may be made in SubSampleEntryBox, for example. According to this signaling, among the subsamples of the individual layers of the individual components to which different composition times are assigned, the subsamples that are included in the Point Cloud displayed at the same time and that should be rendered at the same time can be explicitly indicated. Therefore, the client can properly form and render the Point Cloud.

[0139] FIG. 16 shows an example of a PC sample including a total of four subsamples, in which the number of decoded pictures included in one PC frame in the output order is 2 (num_layer_composed=2), and the number of components included in one PC frame is 2 (num_of_component=2), namely a geometry image and a texture image. The information of LayerCompositionBox shown in FIG. 15 indicates that such a PC sample includes four subsamples.
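
As a sketch, a client could use these two values to group decoded pictures into per-PC-frame render sets, assuming the pictures arrive in the interleaved order of the PC sample:

```python
def group_for_rendering(decoded_pictures, num_layer_composed, num_of_component):
    """Group a flat, decode-ordered list of pictures into render sets of
    num_layer_composed * num_of_component pictures each (cf. FIG. 16)."""
    n = num_layer_composed * num_of_component  # 2 * 2 = 4 in the FIG. 16 example
    return [decoded_pictures[i:i + n] for i in range(0, len(decoded_pictures), n)]
```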

[0140] Note that a configuration in which the number of components is 1 and the number of layers is 2 is similar in idea to a conventional technique, for example, temporal interleaving in stereo video. However, the present technology differs in that it can support configurations with multiple components and multiple layers, which the conventional technology cannot.

[0141]

[0142] FIG. 17 is a block diagram showing a first configuration example of an information processing apparatus that, on the server side providing a content, executes a PC file generation process of generating a PC stream from a Point Cloud content and generating a PC file in which the PC stream is stored in ISOBMFF.

[0143] As shown in FIG. 17, an information processing apparatus 11 includes a 3D2D conversion unit 21, an encoding unit 22, a PC stream generation unit 23, and a file generation unit 24.

[0144] The 3D2D conversion unit 21 converts an inputted Point Cloud into two dimensions to generate a geometry image, a texture image, and three-dimensional information metadata (auxiliary patch info and an occupancy map), and supplies them to the encoding unit 22.

[0145] The encoding unit 22 encodes the geometry image and the three-dimensional information metadata into a geometry video stream containing the three-dimensional information metadata as SEI, and encodes the texture image into a texture video stream. For example, the encoding unit 22 performs the encoding by non-layered coding such as AVC and HEVC, or by layered coding such as SVC and SHEVC. At this time, the encoding unit 22 can perform the encoding of the geometry image and the encoding of the texture image in parallel with the two encoders 25-1 and 25-2.

[0146] The PC stream generation unit 23 interleaves the geometry video stream and the texture video stream encoded by the encoding unit 22 in a unit forming a PC frame, and generates a PC sample as shown in FIG. 5. Then, the PC stream generation unit 23 generates a PC stream including multiple PC samples, and supplies to the file generation unit 24.

[0147] The file generation unit 24 generates a PC file by storing a geometry image, a texture image, and three-dimensional information metadata in one track of ISOBMFF, in accordance with a playback order required at a time of reproducing and playing back the geometry image and the texture image in three dimensions on the basis of the three-dimensional information metadata.

[0148] The information processing apparatus 11 configured in this way can generate a PC stream from a Point Cloud content, and output a PC file in which the PC stream is stored in one track of ISOBMFF.

[0149] FIG. 18 is a block diagram showing a first configuration example of an information processing apparatus that executes a Point Cloud playback process of generating a display image from a PC file and playing back a Point Cloud, on a client side that plays back a content.

[0150] As shown in FIG. 18, an information processing apparatus 12 includes an extraction unit 31, a decoding unit 32, a 2D3D conversion unit 33, and a display processing unit 34.

[0151] The extraction unit 31 extracts a geometry video stream and a texture video stream corresponding to a playback time from a PC file on the basis of information signaled by a Box of ISOBMFF, and supplies them to the decoding unit 32.

[0152] The decoding unit 32 decodes the geometry video stream and the texture video stream supplied from the extraction unit 31, and supplies the results to the 2D3D conversion unit 33. At this time, the decoding unit 32 can perform the decoding of the geometry image and the decoding of the texture image in parallel with the two decoders 35-1 and 35-2.

[0153] The 2D3D conversion unit 33 constructs a Point Cloud on the basis of SEI information contained in the geometry video stream.

[0154] The display processing unit 34 renders the Point Cloud constructed by the 2D3D conversion unit 33 in accordance with a display device of the client, generates a display image, and causes the display device (not shown) to display.

[0155] The information processing apparatus 12 configured in this way can play back the Point Cloud from the PC file, and control to display the display image obtained by rendering the Point Cloud.

[0156]

[0157] FIG. 19 is a flowchart for explaining a PC file generation process in which the information processing apparatus 11 in FIG. 17 generates a PC file from a Point Cloud.

[0158] For example, when a Point Cloud is inputted to the information processing apparatus 11, processing is started. Then, in step S11, the 3D2D conversion unit 21 generates a geometry image, a texture image, and three-dimensional information metadata from the inputted Point Cloud.

[0159] In step S12, the encoding unit 22 encodes a geometry video stream containing the three-dimensional information metadata as SEI from the geometry image and the three-dimensional information metadata generated by the 3D2D conversion unit 21 in step S11, and encodes a texture video stream from the texture image.

[0160] In step S13, the PC stream generation unit 23 interleaves the encoded streams in units (PC samples) each forming a PC frame, to generate a PC stream.

[0161] In step S14, the file generation unit 24 stores the PC stream in ISOBMFF in which a Box containing the three-dimensional information metadata is signaled, to generate a PC file. At this time, each sample of the ISOBMFF is a PC sample.
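
The following is a minimal sketch of this flow (steps S11 to S14); the helper callables stand in for the 3D2D conversion unit 21, the encoding unit 22, and the file generation unit 24 of FIG. 17, and their signatures are our assumption.

```python
def build_pc_sample(pc_header: bytes, geometry_es: bytes, texture_es: bytes) -> bytes:
    # Order within the sample follows FIG. 6: PC header first, then the
    # geometry subsamples of every layer, then the texture subsamples.
    return pc_header + geometry_es + texture_es

def generate_pc_file(point_cloud_frames, convert_3d_to_2d, encode_video, write_isobmff):
    """Sketch of the flow of FIG. 19; the three callables are stand-ins."""
    pc_samples = []
    for frame in point_cloud_frames:
        # S11: convert the Point Cloud into a geometry image, a texture image,
        # and three-dimensional information metadata (including the PC header).
        geometry_img, texture_img, metadata, pc_header = convert_3d_to_2d(frame)
        # S12: encode the geometry video stream (metadata carried as SEI)
        # and the texture video stream.
        geometry_es = encode_video(geometry_img, sei=metadata)
        texture_es = encode_video(texture_img, sei=None)
        # S13: interleave into one PC sample per PC frame.
        pc_samples.append(build_pc_sample(pc_header, geometry_es, texture_es))
    # S14: store the PC stream in one ISOBMFF track; each sample is a PC sample.
    return write_isobmff(track_samples=pc_samples)
```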

[0162] FIG. 20 is a flowchart for explaining a Point Cloud playback process in which the information processing apparatus 12 of FIG. 18 generates a display image from a PC file and plays back.

[0163] For example, when supply to the information processing apparatus 12 starts from a head of a PC file, processing is started. Then, in step S21, the extraction unit 31 extracts a geometry video stream and a texture video stream corresponding to a playback time from a PC file on the basis of information signaled by a Box of ISOBMFF.

[0164] In step S22, the decoding unit 32 individually decodes each of the geometry video stream and the texture video stream. At this time, the geometry video stream and the texture video stream are individually decoded with two decoder instances.

[0165] In step S23, the 2D3D conversion unit 33 constructs a Point Cloud on the basis of SEI information contained in the geometry video stream.

[0166] In step S24, the display processing unit 34 renders the Point Cloud in accordance with a display device of the client, to cause a display image to be displayed.

[0167] In step S25, the extraction unit 31 determines whether or not the end of the PC stream has been reached. If it is not the end of the PC stream, the process returns to step S21; if it is the end, the process ends.
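
A corresponding minimal sketch of the playback loop (steps S21 to S25) is shown below; the callables stand in for the extraction unit 31, decoding unit 32, 2D3D conversion unit 33, and display processing unit 34 of FIG. 18, and their signatures are our assumption.

```python
def play_pc_file(pc_file, extract, decode, reconstruct_3d, render):
    """Sketch of the flow of FIG. 20; the callables are stand-ins."""
    # S21/S25: iterate over (geometry, texture) stream pairs per playback
    # time until the end of the PC stream.
    for geometry_es, texture_es in extract(pc_file):
        geometry_pics = decode(geometry_es)  # S22: two decoder instances,
        texture_pics = decode(texture_es)    #      which may run in parallel
        point_cloud = reconstruct_3d(geometry_pics, texture_pics)  # S23: uses SEI
        render(point_cloud)                  # S24: render for the display device
```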

[0168] By the PC file generation process and the Point Cloud playback process as described above, processing on the client side can be performed simply.

[0169] Here, FIG. 21 shows an example of an ISOBMFF structure when a PC stream is stored in the ISOBMFF structure by the first storage method.

[0170] Note that, as a modified example, at a time of storing the PC stream in the ISOBMFF structure by the first storage method, a boundary of a Group of Frames may be signaled with use of a sample group, similarly to ISOBMFF signaling in a third storage method described later. Furthermore, in encoding of an elementary stream, layered coding such as scalable video coding (SVC) or scalable high efficiency video coding (SHEVC) may be used rather than non-layered coding such as AVC and HEVC.

[0171] For example, layer 0 of the geometry video stream and the texture video stream is encoded as a base layer, and layer 1 of the geometry video stream and the texture video stream is encoded as an enhancement layer.

[0172] That is, as shown in FIG. 22, in a case where one layer is stored in one track of ISOBMFF, an enhancement track stores the geometry video stream and the texture video stream of the enhancement layer, and a base track stores the geometry video stream and the texture video stream of the base layer. Then, association is made from the enhancement track to the base track by a track reference (reference type sbas). At this time, a sample entry of the base track is PCSampleEntry (pcbs), and a sample entry of the enhancement track is PCEnhancementSampleEntry (penh). For example, enhancement tracks exist in accordance with the number of layers.

[0173] Here, for example, in order to indicate whether the layer stored in each track is layer 0 or layer 1, a layer identifier (for example, layer_id in FIG. 8) signaled by the header in the PC sample may be signaled by the sample entry of each track. Alternatively, the header itself in the PC sample may be stored in the sample entry of each track.

[0174]

[0175] With reference to FIGS. 23 to 35, a description is given to a second storage method, which is a method of packing geometry/texture video streams into one stream and storing in one track of ISOBMFF.

[0176] In the first storage method described above, it is possible to realize decode processing by one decoder by interleaving a texture image and a geometry image before encoding and encoding them as one video stream. However, in such encoding, since the correlation between the texture image and the geometry image is small, the compression efficiency deteriorates.

[0177] Furthermore, in the first storage method, it is also possible to decode the generated single stream with one decoder. However, since the decoder must be initialized at each boundary between an encoded texture image and an encoded geometry image, overhead increases.

[0178] That is, in the first storage method, it has been necessary to individually encode a texture video stream and a geometry video stream in the PC file generation process, and perform individual decoding with two decoder instances for the geometry video stream and the texture video stream in the Point Cloud playback process. Therefore, when the first storage method is applied to decoding with one decoder instance, compression efficiency will deteriorate and overhead will increase.

[0179] Therefore, in the second storage method, multiple images are packed into one image by the method described in Patent Document 1 or the like, and encoded to form a PC stream. For example, it is possible to pack images of multiple components into one image, to pack images of multiple layers into one image, and to pack images of multiple layers of multiple components into one image. This configuration enables decode processing with one decoder instance on the client, while avoiding the deterioration of compression efficiency described above. Note that, as the packing method, methods other than side by side and top & bottom may also be adopted.

[0180] With reference to FIG. 23, packing variations will be described.

[0181] A of FIG. 23 shows an example of packing the same layer of different components into one image by the side by side method. That is, the texture image of layer 0 and the geometry image of layer 0 are packed side by side into one image, and the texture image of layer 1 and the geometry image of layer 1 are packed side by side into one image.

[0182] B of FIG. 23 shows an example in which images of different layers of the same component are packed into one image by the side by side method. That is, the texture image of layer 0 and the texture image of layer 1 are packed side by side into one image, and the geometry image of layer 0 and the geometry image of layer 1 are packed side by side into one image.
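
For illustration, the side by side and top & bottom packings of these variations can be expressed with numpy as follows; the 1280×1280 picture size is arbitrary and assumed only for the example.

```python
import numpy as np

def pack_side_by_side(frame0: np.ndarray, frame1: np.ndarray) -> np.ndarray:
    # Two equally sized pictures placed in the left and right halves.
    return np.concatenate([frame0, frame1], axis=1)

def pack_top_and_bottom(frame0: np.ndarray, frame1: np.ndarray) -> np.ndarray:
    # Two equally sized pictures placed in the upper and lower halves.
    return np.concatenate([frame0, frame1], axis=0)

# Dummy pictures standing in for decoded component images.
texture_l0 = np.zeros((1280, 1280, 3), dtype=np.uint8)
texture_l1 = np.zeros((1280, 1280, 3), dtype=np.uint8)
geometry_l0 = np.zeros((1280, 1280, 3), dtype=np.uint8)

# Variation A of FIG. 23: same layer, different components in one picture.
packed_a = pack_side_by_side(texture_l0, geometry_l0)    # shape (1280, 2560, 3)

# Variation B of FIG. 23: same component, different layers in one picture.
packed_b = pack_side_by_side(texture_l0, texture_l1)     # shape (1280, 2560, 3)
packed_tb = pack_top_and_bottom(texture_l0, texture_l1)  # shape (2560, 1280, 3)
```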

……
……
……
