Sony Patent | Image processing device and method

Patent: Image processing device and method

Publication Number: 20220044448

Publication Date: 20220210

Applicant: Sony

Assignee: Sony Corporation

Abstract

There is provided an image processing device and a method that are configured to be capable of reducing the increase of a load on encoding/decoding of attribute information of a point cloud. The attribute information of the point cloud is encoded using contexts corresponding to weight values obtained by an orthogonal transformation made on location information of the point cloud, the orthogonal transformation taking into consideration a three-dimensional structure. Further, encoded data associated with the attribute information of the point cloud is decoded using contexts corresponding to weight values obtained by an orthogonal transformation made on the location information of the point cloud, the orthogonal transformation taking into consideration a three-dimensional structure. The present disclosure can be applied to, for example, image processing devices, electronic equipment, image processing methods, programs, and the like.

Claims

  1. An image processing device comprising: an encoding section that encodes attribute information of a point cloud by using contexts corresponding to weight values obtained by an orthogonal transformation made on location information of the point cloud, the orthogonal transformation taking into consideration a three-dimensional structure.

  2. The image processing device according to claim 1, further comprising: a context selection section that selects the contexts corresponding to the weight values, wherein the encoding section encodes the attribute information by using the contexts having been selected by the context selection section.

  3. The image processing device according to claim 2, wherein the context selection section selects the contexts according to the weight values by using a preliminarily determined number of the contexts and preliminarily determined threshold values associated with the weight values.

  4. The image processing device according to claim 2, wherein the context selection section selects the contexts according to the weight values by using a set number of the contexts and set threshold values associated with the weight values.

  5. The image processing device according to claim 2, further comprising: a weight value deriving section that derives the weight values, wherein the context selection section selects the contexts corresponding to the weight values having been derived by the weight value deriving section.

  6. The image processing device according to claim 5, wherein the weight value deriving section performs RAHT (Region Adaptive Hierarchical Transform) as the orthogonal transformation on the location information, to derive the weight values.

  7. The image processing device according to claim 5, wherein the weight value deriving section derives the weight values on a basis of an octree regarding the location information.

  8. The image processing device according to claim 5, further comprising: a RAHT (Region Adaptive Hierarchical Transform) processing section that performs RAHT on the attribute information by using the weight values having been derived by the weight value deriving section, wherein the encoding section encodes transformed coefficients associated with the attribute information and generated by the RAHT processing section.

  9. The image processing device according to claim 1, further comprising: a bitstream generation section that generates a bitstream including encoded data associated with the location information and encoded data associated with the attribute information and generated by the encoding section.

  10. An image processing method comprising: encoding attribute information of a point cloud by using contexts corresponding to weight values obtained by an orthogonal transformation made on location information of the point cloud, the orthogonal transformation taking into consideration a three-dimensional structure.

  11. An image processing device comprising: a decoding section that decodes encoded data associated with attribute information of a point cloud, by using contexts corresponding to weight values obtained by an orthogonal transformation made on location information of the point cloud, the orthogonal transformation taking into consideration a three-dimensional structure.

  12. The image processing device according to claim 11, further comprising: a context selection section that selects the contexts corresponding to the weight values, wherein the decoding section decodes the encoded data associated with the attribute information, by using the contexts having been selected by the context selection section.

  13. The image processing device according to claim 12, wherein the context selection section selects the contexts according to the weight values by using a preliminarily determined number of the contexts and preliminarily determined threshold values associated with the weight values.

  14. The image processing device according to claim 12, wherein the context selection section selects the contexts according to the weight values by using the number of the contexts and threshold values associated with the weight values that are supplied from an encoding side.

  15. The image processing device according to claim 12, further comprising: a weight value deriving section that derives the weight values, wherein the context selection section selects the contexts corresponding to the weight values having been derived by the weight value deriving section.

  16. The image processing device according to claim 15, wherein the weight value deriving section performs RAHT (Region Adaptive Hierarchical Transform) as the orthogonal transformation on the location information, to derive the weight values.

  17. The image processing device according to claim 15, wherein the weight value deriving section derives the weight values on a basis of an octree regarding the location information.

  18. The image processing device according to claim 15, further comprising: an inverse RAHT (Region Adaptive Hierarchical Transform) processing section that performs inverse RAHT on the attribute information having been generated by the decoding section, by using the weight values having been derived by the weight value deriving section.

  19. The image processing device according to claim 11, further comprising: a point cloud data generation section that generates point cloud data including the location information and the attribute information having been generated by the decoding section.

  20. An image processing method comprising: decoding encoded data associated with attribute information of a point cloud, by using contexts corresponding to weight values obtained by an orthogonal transformation made on location information of the point cloud, the orthogonal transformation taking into consideration a three-dimensional structure.

Description

TECHNICAL FIELD

[0001] The present disclosure relates to an image processing device and a method, and in particular, relates to an image processing device and a method that are configured to be capable of reducing the increase of a load on encoding/decoding of attribute information of a point cloud.

BACKGROUND ART

[0002] Heretofore, for example, an encoding method using an octree has been known as an encoding method for 3D data representing a three-dimensional structure, such as a point cloud (see, for example, NPL 1).

[0003] Recently, a method has been proposed for encoding attribute information by using RAHT (Region Adaptive Hierarchical Transform) (see, for example, NPL 2).

CITATION LIST

Non Patent Literature

NPL 1

[0004] R. Mekuria, Student Member, IEEE, K. Blom, P. Cesar, Member, IEEE, “Design, Implementation and Evaluation of a Point Cloud Codec for Tele-Immersive Video,” tcsvt_paper_submitted_february.pdf

NPL 2

[0005] Ohji Nakagami, Phil Chou, Maja Krivokuca, Khaled Mammou, Robert Cohen, Vladyslav Zakharchenko, Gaelle Martin-Cocher, “Second Working Draft for PCC Categories 1, 3,” ISO/IEC JTC1/SC29/WG11, MPEG 2018/N17533, April 2018, San Diego, US

SUMMARY

Technical Problem

[0006] In the case of the above methods, however, the load on encoding/decoding of the attribute information of the point cloud is likely to increase because of the additional processing required to rearrange coefficients into the descending order of the weight values obtained by the RAHT.

[0007] The present disclosure has been made in view of the above situation and is aimed to make it possible to reduce the increase of the load on the encoding/decoding of the attribute information of the point cloud.

Solution to Problem

[0008] An image processing device according to an aspect of the present technology is an image processing device including an encoding section that encodes attribute information of a point cloud by using contexts corresponding to weight values obtained by an orthogonal transformation made on location information of the point cloud, the orthogonal transformation taking into consideration a three-dimensional structure.

[0009] An image processing method according to an aspect of the present technology is an image processing method including encoding attribute information of a point cloud by using contexts corresponding to weight values obtained by an orthogonal transformation made on location information of the point cloud, the orthogonal transformation taking into consideration a three-dimensional structure.

[0010] An image processing device according to another aspect of the present technology is an information processing device including a decoding section that decodes encoded data associated with attribute information of a point cloud, by using contexts corresponding to weight values obtained by an orthogonal transformation made on location information of the point cloud, the orthogonal transformation taking into consideration a three-dimensional structure.

[0011] An image processing method according to another aspect of the present technology is an image processing method including decoding encoded data associated with attribute information of a point cloud, by using contexts corresponding to weight values obtained by an orthogonal transformation made on location information of the point cloud, the orthogonal transformation taking into consideration a three-dimensional structure.

[0012] In the image processing device and method according to an aspect of the present technology, attribute information of a point cloud is encoded using contexts corresponding to weight values obtained by an orthogonal transformation made on location information of the point cloud, the orthogonal transformation taking into consideration a three-dimensional structure.

[0013] In the image processing device and method according to another aspect of the present technology, encoded data associated with attribute information of a point cloud is decoded using contexts corresponding to weight values obtained by an orthogonal transformation made on location information of the point cloud, the orthogonal transformation taking into consideration a three-dimensional structure.

BRIEF DESCRIPTION OF DRAWINGS

[0014] FIG. 1 is a diagram that describes the outline of RAHT.

[0015] FIG. 2 is a block diagram illustrating a main configuration example of an encoding device.

[0016] FIG. 3 is a block diagram illustrating a main configuration example of a decoding device.

[0017] FIG. 4 is a diagram illustrating an example of a condition of the distribution of coefficient values corresponding to individual weight values.

[0018] FIG. 5 is a block diagram illustrating a main configuration example of an encoding device.

[0019] FIG. 6 is a flowchart illustrating an example of the flow of encoding processing.

[0020] FIG. 7 is a block diagram illustrating a main configuration example of a decoding device.

[0021] FIG. 8 is a flowchart illustrating an example of the flow of decoding processing.

[0022] FIG. 9 is a block diagram illustrating a main configuration example of an encoding device.

[0023] FIG. 10 is a flowchart illustrating an example of the flow of encoding processing.

[0024] FIG. 11 is a block diagram illustrating a main configuration example of a decoding device.

[0025] FIG. 12 is a flowchart illustrating an example of the flow of decoding processing.

[0026] FIG. 13 is a block diagram illustrating a main configuration example of a computer.

DESCRIPTION OF EMBODIMENTS

[0027] Hereinafter, modes for practicing the present disclosure (hereinafter referred to as embodiments) will be described. Here, the description will be made in the following order.

[0028] 1. Encoding of attribute information

[0029] 2. First embodiment (encoding device)

[0030] 3. Second embodiment (decoding device)

[0031] 4. Third embodiment (encoding device)

[0032] 5. Fourth embodiment (decoding device)

[0033] 6. Appended notes

  1. Encoding of Attribute Information

Documents, Etc., That Support Technical Contents and Technical Terms

[0034] The scope of the disclosure of the present technology includes not only contents described in the embodiments, but also contents described in the following pieces of non-patent literature that are already publicly known at the time of filing of the present disclosure.

[0035] NPL 1: (mentioned above)

[0036] NPL 2: (mentioned above)

[0037] NPL 3: TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU (International Telecommunication Union), “Advanced video coding for generic audiovisual services,” H.264, 04/2017

[0038] NPL 4: TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU (International Telecommunication Union), “High efficiency video coding,” H.265, 12/2016

[0039] NPL 5: Jianle Chen, Elena Alshina, Gary J. Sullivan, Jens-Rainer Ohm, Jill Boyce, “Algorithm Description of Joint Exploration Test Model 4,” JVET-G1001_v1, Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11 7th Meeting: Torino, IT, 13-21 July 2017

[0040] That is, contents described in the above pieces of non-patent literature also serve as grounds at the time of the determination of support requirements. For example, Quad-Tree Block Structure described in NPL 4 and QTBT (Quad Tree Plus Binary Tree) Block Structure described in NPL 5 are deemed to fall within the scope of the disclosure of the present technology and satisfy support requirements for the scope of the appended claims even when not directly described in the embodiments. Further, technical terms such as parsing, syntax, and semantics are similarly deemed to fall within the scope of the disclosure of the present technology and satisfy the support requirements for the scope of the appended claims even when not directly described in the embodiments.

Point Cloud

[0041] In the past, there existed various kinds of 3D data which include a point cloud that represents a three-dimensional structure by using location information, attribute information, and the like of a point cluster, a mesh that includes apexes, edges, and faces and defines a three-dimensional shape by using a polygonal representation, and the like.

[0042] For example, in the case of the point cloud, a tridimensional structure (a three-dimensionally-shaped object) is represented as a set of a large number of points (a point cluster). That is, pieces of data constituting the point cloud (which are also referred to as point cloud data) include location information and attribute information (for example, color and the like) associated with each of points constituting the point cluster. Thus, the point cloud has a relatively simple structure and is capable of sufficiently accurately representing any tridimensional structure by using a sufficiently large number of points.

Quantization of Location Information by Using Voxels

[0043] For such point cloud data, since its data amount is relatively large, an encoding method using voxels has been invented to compress an amount of data targeted for encoding and the like. The voxels are three-dimensional regions for use in quantizing location information targeted for encoding.

[0044] That is, a three-dimensional region containing a point cloud is separated into small three-dimensional regions called voxels and is configured to indicate, for each of the voxels, whether or not each voxel contains one or more points. In such a manner, the locations of the individual points are quantized for each of the voxels. Thus, converting the point cloud data into data of such voxels (also referred to as voxel data) makes it possible to reduce the increase of the amount of information (typically, to decrease the amount of information).
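The voxel quantization described above can be sketched as follows. This is an illustrative Python sketch only, not part of the patent disclosure; the function name `voxelize`, the voxel size, and the sample coordinates are all assumptions chosen for the example:

```python
import numpy as np

def voxelize(points, voxel_size):
    """Quantize point locations into integer voxel grid coordinates.

    Several points may fall into the same voxel; keeping each occupied
    voxel only once is what reduces the amount of information.
    """
    indices = np.floor(points / voxel_size).astype(np.int64)
    # Deduplicate: each occupied voxel is represented exactly once.
    occupied = np.unique(indices, axis=0)
    return occupied

# Hypothetical point cluster: the first two points share a voxel.
pts = np.array([[0.1, 0.2, 0.3],
                [0.15, 0.22, 0.31],
                [1.4, 0.9, 0.0]])
print(len(voxelize(pts, voxel_size=0.5)))  # 2 occupied voxels
```

Three input points collapse to two occupied voxels, illustrating how the voxel representation typically decreases the amount of information.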

Octree

[0045] Moreover, a method has been invented for establishing an octree by using such voxel data. An octree is a structure resulting from tree-structuring of the voxel data. The value of each of bits of a lowest-layer node of the octree indicates the presence/absence of a point of a corresponding voxel. For example, a value “1” indicates a voxel containing one or more points, and a value “0” indicates a voxel containing no point. In the octree, one node corresponds to eight voxels. That is, each of nodes constituting the octree includes data composed of eight bits, and the eight bits each indicate the presence/absence of a point of a corresponding one of the eight voxels.

[0046] Further, an upper-layer node of the octree indicates the presence/absence of one or more points with respect to one region obtained by integrating eight voxels corresponding to each of lower-layer nodes belonging to the upper-layer node. That is, the upper-layer node is generated by integrating information regarding the voxels of the lower-layer nodes. Here, for a node having the value “0,” that is, for a node in the case where all of corresponding eight voxels contain no point, the node is deleted.

[0047] In such a manner, a tree structure (octree) including nodes each not having the value “0” is established. That is, the octree is capable of indicating, for each of resolution levels, the presence/absence of one or more points with respect to voxels. Thus, encoding voxel data having been transformed into such an octree structure makes it possible to further easily restore voxel data having further various resolution levels at the time of decoding the voxel data. That is, the scalability of the voxels can be further easily achieved.

[0048] Further, the above-described method of omitting the node having the value “0” makes it possible to cause a voxel including regions in which no point exists to be a voxel of a low-resolution level, thus enabling further reduction of the increase of the amount of information (typically, further decrease of the amount of information).
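The octree construction described above can be sketched as follows. This is an illustrative recursion under assumed conventions (the child-index bits encode the x/y/z halves of the region); empty child nodes are simply never created, which mirrors the deletion of value-“0” nodes:

```python
def build_octree(voxels, depth):
    """Recursively build an octree from occupied voxel coordinates.

    voxels: set of (x, y, z) integer coordinates at full resolution,
            with coordinates in the range [0, 2**depth).
    Returns a nested dict {child_index: subtree}; leaves are True.
    Sub-regions containing no point produce no node at all.
    """
    if depth == 0:
        return True
    children = {}
    half = 1 << (depth - 1)
    for i in range(8):
        # Bit 0 of i selects the x half, bit 1 the y half, bit 2 the z half.
        ox = (i & 1) * half
        oy = ((i >> 1) & 1) * half
        oz = ((i >> 2) & 1) * half
        sub = {(x - ox, y - oy, z - oz) for (x, y, z) in voxels
               if ox <= x < ox + half and oy <= y < oy + half and oz <= z < oz + half}
        if sub:
            children[i] = build_octree(sub, depth - 1)
    return children

# Two occupied voxels at opposite corners of a 2x2x2 grid.
print(build_octree({(0, 0, 0), (1, 1, 1)}, depth=1))  # {0: True, 7: True}
```

Each node thus records only its occupied children, so a lower-resolution version of the point cloud can be restored by stopping the traversal at any intermediate depth.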

RAHT

[0049] Recently, as described in, for example, NPL 2, a method has been proposed for encoding attribute information of a point cloud by using RAHT (Region Adaptive Hierarchical Transform). The attribute information includes, for example, color information, reflectance information, normal-line information, and the like.

[0050] The RAHT is one of orthogonal transformations taking into consideration a three-dimensional structure, and is a Haar transformation using weighting (weight values) according to a point-to-point location relationship (for example, whether or not one or more points exist in an adjacent voxel) in a voxelized space.

[0051] For example, as illustrated in FIG. 1, processing proceeds such that, in the case where one or more points exist in a region adjacent to a region targeted for the Haar transformation, the weight value of the adjacent region is added to the weight value of the region targeted for the Haar transformation, whereas, in the case where no point exists in the adjacent region, the weight value of the region targeted for the Haar transformation is inherited as it is. That is, the denser the points of a portion are, the larger the weight value of the portion is. Thus, the density of the points can be determined from a weight value.

[0052] For example, performing quantization so as to allow points included in dense portions to remain on the basis of the weight values makes it possible to increase the efficiency of the encoding simultaneously with reducing the degradation of the quality of the point cloud.
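The weight propagation described above can be sketched for one merge level. This is a minimal illustration under assumptions: the dictionary representation and the function name `merge_weights` are illustrative choices, not the RAHT implementation itself:

```python
def merge_weights(level):
    """One RAHT merge level along one axis.

    `level` maps a region index to its weight (the number of points
    accumulated so far). Adjacent regions (2k, 2k+1) are merged into
    parent region k: if both are occupied their weights add, and if
    only one is occupied its weight is inherited as it is.
    """
    merged = {}
    for idx, w in level.items():
        parent = idx // 2
        merged[parent] = merged.get(parent, 0) + w
    return merged

# Three occupied unit regions with weight 1 each; regions 0 and 1
# are adjacent and merge, while region 4 has no occupied neighbor.
print(merge_weights({0: 1, 1: 1, 4: 1}))  # {0: 2, 2: 1}
```

After the merge, the parent covering the two adjacent occupied regions carries weight 2, while the isolated region's weight of 1 is passed up unchanged; denser portions therefore accumulate larger weight values, as the text states.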

Encoding Device

[0053] An encoding device 10 illustrated in FIG. 2 is an example of a device that encodes a point cloud by making such a coefficient rearrangement. For example, in the case of the encoding device 10, a geometry encoding section 11 generates geometry encoded data by encoding location information of input point cloud data.

[0054] A geometry coefficient rearrangement section 12 rearranges the coefficients of the geometry encoded data into Morton code order. A RAHT processing section 13 performs RAHT on the coefficients of the geometry encoded data in the Morton code order. Through this processing, weight values are derived.

[0055] A RAHT processing section 21 of an attribute encoding section 15 performs RAHT on attribute information by using the weight values that the RAHT processing section 13 has derived on the basis of the location information. A quantization section 22 quantizes transformed coefficients of the attribute information that are obtained by the above RAHT.

[0056] Further, a geometry coefficient rearrangement section 14 rearranges the coefficients into the descending order of the weight values having been derived by the RAHT processing section 13. An attribute coefficient rearrangement section 23 rearranges the quantized coefficients having been obtained by the quantization section 22 into the same order as that of the coefficients having been rearranged by the geometry coefficient rearrangement section 14. That is, a lossless encoding section 24 encodes the individual coefficients of the attribute information in the descending order of the weight values.
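The rearrangement into the descending order of the weight values, whose processing cost motivates the present technology, can be sketched as follows. This is illustrative only; `sort_by_weight` is an assumed helper name, and the full-scale rearrangement over a large point cloud is what makes this step expensive:

```python
import numpy as np

def sort_by_weight(coeffs, weights):
    """Rearrange attribute coefficients into the descending order of
    the weight values (the step the proposed method makes unnecessary)."""
    order = np.argsort(-np.asarray(weights), kind="stable")
    return [coeffs[i] for i in order]

print(sort_by_weight(['a', 'b', 'c'], weights=[1, 3, 2]))  # ['b', 'c', 'a']
```

Both the encoder and the decoder must perform this same sort over every coefficient of the point cloud, which is why its cost grows remarkable for large data amounts.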

[0057] Further, a bitstream generation section 16 generates and outputs a bitstream including geometry encoded data that is encoded data associated with the location information and that has been generated by the geometry encoding section 11 and attribute encoded data that is encoded data associated with the attribute information and that has been generated by the lossless encoding section 24.

Decoding Device

[0058] A decoding device 50 illustrated in FIG. 3 is an example of a device that decodes encoded data of a point cloud by making such a coefficient rearrangement. For example, in the case of the decoding device 50, a geometry decoding section 51 decodes geometry encoded data included in an input bitstream. A geometry coefficient rearrangement section 52 rearranges coefficient data (geometry coefficients) having been decoded and obtained, into Morton code order.

[0059] A RAHT processing section 53 performs RAHT on the geometry coefficients that are arranged in the Morton code order, to derive weight values. A geometry coefficient rearrangement section 54 rearranges the coefficients into the descending order of the weight values having been derived by the RAHT processing section 53.

[0060] A lossless decoding section 61 of an attribute decoding section 55 decodes attribute encoded data included in the input bitstream. An inverse attribute coefficient rearrangement section 62 rearranges attribute coefficients having been arranged in descending order of the weight values into Morton code order, on the basis of the descending order of the weight values, which is indicated by the geometry coefficient rearrangement section 54. Further, an inverse quantization section 63 performs inverse quantization on the attribute coefficients arranged in the Morton code order.

[0061] An inverse RAHT processing section 64 performs inverse RAHT, which is processing inverse to the RAHT, on the inverse-quantized attribute coefficients by using the weight values having been derived by the RAHT processing section 53, to generate attribute information (attribute data).

[0062] A point cloud data generation section 56 generates point cloud data by synthesizing location information (geometry data) having been generated by the geometry decoding section 51 and the attribute information (attribute data) having been generated by the inverse RAHT processing section 64, and outputs the generated point cloud data.

Rearrangement of Coefficients

[0063] As described above, in the above encoding and decoding, the coefficient data associated with the attribute information is rearranged into the descending order of the weight values.

[0064] FIG. 4 illustrates a graph representing relations between weight values and coefficient variations. As indicated in FIG. 4, the larger the weight value is, the smaller the coefficient variation is. Thus, performing encoding/decoding from a larger weight value with higher priority makes it possible to increase the efficiency of the encoding.

[0065] This rearrangement, however, requires a significantly large amount of processing. In particular, in the case of such a point cloud having a significantly large amount of data, the increase of the load on the rearrangement processing becomes further remarkable. Such a method as described above has thus been likely to cause the increase of the load on the encoding/decoding of the attribute information of the point cloud.

Selection of Contexts

[0066] For this reason, implemented is a configuration in which, instead of the rearrangement of the coefficients, contexts are selected according to the above-described weight values.

[0067] For example, in the case of the encoding, implemented is a configuration in which attribute information of a point cloud is encoded using contexts corresponding to weight values obtained by an orthogonal transformation made on location information of the point cloud, the orthogonal transformation taking into consideration a three-dimensional structure.

[0068] For example, an image processing device includes an encoding section that encodes attribute information of a point cloud using contexts corresponding to weight values obtained by an orthogonal transformation made on location information of the point cloud, the orthogonal transformation taking into consideration a three-dimensional structure.

[0069] Employing such a configuration enables reduction of the increase of the load on the encoding of the attribute information of the point cloud.

[0070] Further, for example, in the case of the decoding, implemented is a configuration in which encoded data associated with attribute information of a point cloud is decoded by using contexts corresponding to weight values obtained by an orthogonal transformation made on location information of the point cloud, the orthogonal transformation taking into consideration a three-dimensional structure.

[0071] For example, an image processing device includes a decoding section that decodes encoded data associated with attribute information of a point cloud, by using contexts corresponding to weight values obtained by an orthogonal transformation made on location information of the point cloud, the orthogonal transformation taking into consideration a three-dimensional structure.

[0072] Employing such a configuration enables reduction of the increase of the load on the decoding of the attribute information of the point cloud.

[0073] For example, in the graph of FIG. 4, there is a significantly large difference between the variations of coefficients in areas where weight values are small (for example, in the vicinity enclosed by a frame 71) and those in areas where weight values are large (for example, in the vicinity enclosed by a frame 72). That is, the variation degrees of the coefficients depend on the weight values. In other words, the variation degrees of the coefficients are, to some extent, determined according to the weight values. For this reason, implemented is a configuration in which plural contexts each associated with mutually different variation degrees of the coefficients are prepared, and a context to be applied is selected from among the contexts on the basis of the weight values. Implementing such a configuration makes it possible to, on the basis of a weight value, select a context that is further suitable for coefficient variations at the weight value. That is, performing the encoding/decoding using a context according to a weight value (coefficient variations) enables reduction of the decrease of the efficiency of the encoding.

[0074] Further, in the case of this method, the rearrangement of the coefficients is unnecessary, and thus, reduction of the increase of the load can be achieved. That is, the decrease of the efficiency of the encoding can be reduced simultaneously with reducing the increase of the load thereon.

  2. First Embodiment

Encoding Device

[0075] FIG. 5 is a block diagram illustrating a configuration example of an encoding device that is an embodiment of an image processing device to which the present technology is applied. An encoding device 100 illustrated in FIG. 5 is a device that encodes location information and attribute information of a point cloud.

[0076] Note that FIG. 5 illustrates only main elements of processing sections, data flows, and the like, and the elements illustrated in FIG. 5 are not necessarily all elements. That is, in the encoding device 100, one or more processing sections that are not illustrated as blocks in FIG. 5 may exist, and one or more processes and one or more data flows that are not illustrated as arrows in FIG. 5 may exist.

[0077] As illustrated in FIG. 5, the encoding device 100 includes a geometry encoding section 111, a geometry coefficient rearrangement section 112, a RAHT processing section 113, a context selection section 114, an attribute encoding section 115, and a bitstream generation section 116.

[0078] The geometry encoding section 111 performs processing regarding encoding of location information. For example, the geometry encoding section 111 obtains and encodes location information of point cloud data having been input to the encoding device 100. The geometry encoding section 111 supplies geometry encoded data obtained by the encoding to the bitstream generation section 116. Further, the geometry encoding section 111 also supplies geometry coefficients, which are the location information, to the geometry coefficient rearrangement section 112.

[0079] The geometry coefficient rearrangement section 112 performs processing regarding rearrangement of coefficient data. For example, the geometry coefficient rearrangement section 112 obtains the geometry coefficients supplied from the geometry encoding section 111. The geometry coefficient rearrangement section 112 rearranges the geometry coefficients into Morton code order, and supplies the rearranged geometry coefficients to the RAHT processing section 113.

[0080] The RAHT processing section 113 performs processing regarding the RAHT. For example, the RAHT processing section 113 performs the RAHT on the geometry coefficients that are supplied in the Morton code order from the geometry coefficient rearrangement section 112, to derive weight values regarding the location information. The RAHT processing section 113 supplies the derived weight values to the context selection section 114 and (a RAHT processing section 121 of) the attribute encoding section 115.

[0081] The context selection section 114 performs processing regarding the selection of contexts. For example, the context selection section 114 obtains the weight values from the RAHT processing section 113. The context selection section 114 selects contexts on the basis of the weight values.

[0082] For example, the context selection section 114 preliminarily stores therein a plurality of candidates for the contexts. A mutually different weight-value region is assigned to each of the candidates, and one of the candidates is selected according to the magnitude of a weight value. For example, this selection is made in such a way that, in the case where the weight value is smaller than a threshold value A, a context A assigned to this region is selected; in the case where the weight value is equal to or larger than A but smaller than a threshold value B, a context B assigned to this region is selected; … ; and in the case where the weight value is equal to or larger than a threshold value Y, a context Z assigned to this region is selected. Note that each of the candidates is set to a value suited to the coefficient variation degrees corresponding to its assigned value region.
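The threshold scheme just described can be sketched as a lookup over sorted region boundaries. The threshold values and context names below are hypothetical placeholders for the A, B, …, Y and context A … context Z of the text:

```python
import bisect

# Hypothetical region boundaries A < B < ... and one context per region.
THRESHOLDS = [4, 16, 64]                           # placeholder values
CONTEXTS = ["ctx_A", "ctx_B", "ctx_C", "ctx_D"]    # one more context than thresholds

def select_context(weight):
    """Pick the context whose weight-value region contains `weight`.

    bisect_right gives region 0 for weight < THRESHOLDS[0], region 1 for
    THRESHOLDS[0] <= weight < THRESHOLDS[1], and so on, matching the
    "equal to or larger than A but smaller than B" rule.
    """
    return CONTEXTS[bisect.bisect_right(THRESHOLDS, weight)]
```
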

[0083] That is, the context selection section 114 selects a context on the basis of a weight value, and this selection makes it possible to select the context that is better suited to the coefficient variation degree corresponding to that weight value.

[0084] Here, the number of the candidates and the magnitudes of the threshold values can be determined freely. These values may be, for example, preliminarily determined fixed values, or may be settable by a user or the like (or may be variable). In the case of being variable, the number of the candidates and the magnitudes of the threshold values may be configured to be transmitted to a decoding side in such a way as to be included in a bitstream as its header information or the like. Implementing such a configuration enables a decoder to more easily select contexts similar to those for the encoding device 100 by using the header information or the like.
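In the variable case, the header fields could be serialized as, for example, a context count followed by the threshold values. The field widths and layout below are illustrative assumptions, not the actual bitstream syntax of this disclosure:

```python
import struct

def pack_context_header(thresholds):
    """Serialize the context count and threshold values as header fields.

    Illustrative layout: one byte for the number of contexts (one more
    than the number of thresholds), then each threshold as a little-endian
    32-bit unsigned integer.
    """
    return struct.pack(f"<B{len(thresholds)}I", len(thresholds) + 1, *thresholds)

def unpack_context_header(data):
    """Recover (num_contexts, thresholds) so the decoder can select the
    same contexts as the encoder."""
    num_contexts = data[0]
    n = num_contexts - 1
    thresholds = list(struct.unpack(f"<{n}I", data[1:1 + 4 * n]))
    return num_contexts, thresholds
```
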

[0085] The context selection section 114 supplies the selected contexts to (a lossless encoding section 123 of) the attribute encoding section 115.

[0086] The attribute encoding section 115 performs processing regarding encoding of attribute information. For example, the attribute encoding section 115 obtains attribute information of the point cloud data that is input to the encoding device 100, and encodes the attribute information to generate attribute encoded data. The attribute encoding section 115 supplies the generated attribute encoded data to the bitstream generation section 116.

[0087] The bitstream generation section 116 generates a bitstream including the geometry encoded data supplied from the geometry encoding section 111 and the attribute encoded data supplied from the attribute encoding section 115, and outputs the bitstream to the outside of the encoding device 100.

[0088] Further, the attribute encoding section 115 includes the RAHT processing section 121, a quantization section 122, and the lossless encoding section 123.

[0089] The RAHT processing section 121 performs processing regarding the RAHT. For example, the RAHT processing section 121 obtains attribute information of the point cloud data that is input to the encoding device 100. Further, the RAHT processing section 121 obtains the weight values supplied from the RAHT processing section 113. The RAHT processing section 121 performs the RAHT on the attribute information by using the weight values. The RAHT processing section 121 supplies transformed coefficients having been obtained by the RAHT to the quantization section 122.

[0090] The quantization section 122 performs processing regarding quantization. For example, the quantization section 122 obtains the transformed coefficients supplied from the RAHT processing section 121. Further, the quantization section 122 quantizes the obtained transformed coefficients. The quantization section 122 supplies quantized coefficients having been obtained by the quantization to the lossless encoding section 123.
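A minimal sketch of this step, assuming plain uniform scalar quantization (the actual codec may additionally scale coefficients, e.g. by their weights; the step size here is an illustrative parameter):

```python
def quantize(coeffs, step):
    """Uniform scalar quantization of the RAHT-transformed coefficients."""
    return [round(c / step) for c in coeffs]

def dequantize(levels, step):
    """Inverse operation used on the decoding side."""
    return [q * step for q in levels]
```
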

[0091] The lossless encoding section 123 performs processing regarding lossless encoding. For example, the lossless encoding section 123 obtains the quantized coefficients supplied from the quantization section 122. Further, the lossless encoding section 123 obtains the contexts having been selected by the context selection section 114. The lossless encoding section 123 encodes the quantized coefficients by using the contexts. That is, the lossless encoding section 123 encodes the attribute information of the point cloud by using the contexts corresponding to the weight values obtained by the orthogonal transformation made on the location information of the point cloud, the orthogonal transformation taking into consideration a three-dimensional structure. The lossless encoding section 123 supplies the bitstream generation section 116 with encoded data (attribute encoded data) that is associated with the attribute information and that has been generated in such a manner as described above.

[0092] Here, these processing sections (ranging from the geometry encoding section 111 to the bitstream generation section 116 and from the RAHT processing section 121 to the lossless encoding section 123) each have an optional configuration. For example, each of these processing sections may include a logic circuit that implements a corresponding one of the above-described kinds of processing. Further, each of the processing sections may include components such as a CPU, a ROM, and a RAM, and implement a corresponding one of the above-described kinds of processing by executing a program with the components. Naturally, each of the processing sections may have both of the above-described configurations, implementing a portion of the corresponding processing with the logic circuit and the other portion by executing the program. The configurations of the individual processing sections may be mutually independent; for example, one or more of the processing sections may each implement the corresponding processing with a logic circuit, another one or more may each do so by executing a program, and still another one or more may each do so by means of both a logic circuit and the execution of a program.

[0093] Implementing such a configuration as described above makes it possible for the encoding device 100 to bring about effects such as those described in <1. Encoding of attribute information>. For example, the encoding device 100 is capable of reducing the decrease of the efficiency of the encoding simultaneously with reducing the increase of the load thereon.

Flow of Encoding Processing

[0094] Next, an example of the flow of encoding processing performed by the encoding device 100 will be described with reference to the flowchart of FIG. 6.

[0095] Upon start of the encoding processing, the geometry encoding section 111 encodes geometry data (location information) in step S101.

[0096] In step S102, the geometry coefficient rearrangement section 112 rearranges geometry coefficients into the Morton code order.

[0097] In step S103, the RAHT processing section 113 performs the RAHT on the geometry data to derive weight values.

[0098] In step S104, the context selection section 114 selects contexts on the basis of the weight values having been derived in step S103.

[0099] In step S105, the RAHT processing section 121 performs the RAHT on attribute data (attribute information) by using the weight values associated with the geometry and derived in step S103.

[0100] In step S106, the quantization section 122 performs quantization on transformed coefficients having been obtained in step S105.

[0101] In step S107, the lossless encoding section 123 performs lossless encoding on quantized coefficients (attribute data) having been obtained in step S106, by using the contexts having been selected in step S104.
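Steps S104 and S107 can be sketched together in miniature. The thresholds and the frequency-counting "entropy coder" below are illustrative stand-ins (a real implementation drives a context-adaptive arithmetic coder); the sketch only shows how each quantized coefficient is coded under the context selected from its weight:

```python
import bisect
from collections import Counter, defaultdict

THRESHOLDS = [4, 16, 64]   # hypothetical weight-region boundaries

def encode_attribute_coeffs(weights, quantized):
    """Pair each quantized coefficient with a context chosen from its weight.

    `models` accumulates per-context symbol frequencies, standing in for
    the adaptive probability models of the lossless (entropy) coder.
    """
    models = defaultdict(Counter)
    coded = []
    for w, q in zip(weights, quantized):
        ctx = bisect.bisect_right(THRESHOLDS, w)  # S104: context selection
        models[ctx][q] += 1                       # model adaptation
        coded.append((ctx, q))                    # S107: symbol coded under ctx
    return coded, models

coded, models = encode_attribute_coeffs([1, 8, 8, 200], [3, 0, 0, -1])
```

Because coefficients with similar weights tend to have similar statistics, keeping a separate adaptive model per weight region is what recovers the coding-efficiency benefit described in the disclosure.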

