Patent: Image processing apparatus and method
Publication Number: 20210168394
Publication Date: 2021-06-03
Applicant: Sony
Abstract
The present disclosure relates to an image processing apparatus and a method that allow for easier and more appropriate rendering. Coded data is generated by encoding a two-dimensional plane image in which position information and attribute information for a point cloud that represents an object having a three-dimensional shape as a group of points are projected onto a two-dimensional plane, and a bitstream that includes the generated coded data and metadata to be used to render the point cloud is generated. The present disclosure can be applied to, for example, an image processing apparatus, an electronic device, an image processing method, a program, or the like.
Claims
1.
An image processing apparatus comprising: a coding unit that generates coded data by encoding a two-dimensional plane image in which position information and attribute information for a point cloud that represents an object having a three-dimensional shape as a group of points are projected onto a two-dimensional plane; and a generation unit that generates a bitstream that includes the coded data generated by the coding unit and metadata to be used to render the point cloud.
2.
The image processing apparatus according to claim 1, wherein the metadata includes an index that identifies a camera parameter with a preset condition for a camera for rendering.
3.
The image processing apparatus according to claim 2, wherein the camera parameter includes a parameter that indicates a position of the camera.
4.
The image processing apparatus according to claim 2, wherein the camera parameter includes a parameter that indicates a direction of the camera.
5.
The image processing apparatus according to claim 2, wherein the camera parameter includes a parameter that indicates an upward direction of the camera.
6.
The image processing apparatus according to claim 2, wherein the camera parameter includes a parameter that indicates a projection method of the camera.
7.
The image processing apparatus according to claim 2, wherein the camera parameter includes a parameter that indicates an angle of view of the camera.
8.
The image processing apparatus according to claim 1, wherein the metadata includes an index that identifies a purpose of a camera for rendering.
9.
The image processing apparatus according to claim 8, wherein the purpose includes a quality check at a time of encoding.
10.
The image processing apparatus according to claim 8, wherein the purpose includes a recommended orientation of the camera.
11.
The image processing apparatus according to claim 8, wherein the purpose includes a recommended movement trajectory of the camera.
12.
The image processing apparatus according to claim 1, wherein the metadata includes a conversion rate between a real scale and a scale in an image obtained by rendering the point cloud.
13.
The image processing apparatus according to claim 1, wherein the metadata includes a camera parameter set with use of a bounding box as a reference.
14.
The image processing apparatus according to claim 1, wherein the metadata includes a camera movement trajectory parameter that indicates a movement trajectory of a camera for rendering.
15.
The image processing apparatus according to claim 1, wherein the metadata includes an object movement trajectory parameter that indicates a movement trajectory of the object.
16.
An image processing method comprising: generating coded data by encoding a two-dimensional plane image in which position information and attribute information for a point cloud that represents an object having a three-dimensional shape as a group of points are projected onto a two-dimensional plane; and generating a bitstream that includes the generated coded data and metadata to be used to render the point cloud.
17.
An image processing apparatus comprising: a decoding unit that decodes a bitstream that includes coded data obtained by encoding a two-dimensional plane image in which position information and attribute information for a point cloud that represents an object having a three-dimensional shape as a group of points are projected onto a two-dimensional plane and metadata to be used to render the point cloud, reconstructs the point cloud, and extracts the metadata; and a rendering unit that renders the point cloud reconstructed by the decoding unit by using the metadata extracted by the decoding unit.
18.
The image processing apparatus according to claim 17, further comprising a control unit that controls a camera parameter to be used to render the point cloud on a basis of the metadata, wherein the rendering unit renders the point cloud by using the camera parameter controlled by the control unit.
19.
The image processing apparatus according to claim 17, further comprising: a control unit that controls a camera parameter to be used to render the point cloud on a basis of an external input; and a monitoring unit that monitors, on a basis of the metadata, whether the camera parameter is within a range for which a quality check has been performed, wherein in a case where the monitoring unit determines that the camera parameter is within the range for which the quality check has been performed, the rendering unit renders the point cloud by using the camera parameter controlled by the control unit.
20.
An image processing method comprising: decoding a bitstream that includes coded data obtained by encoding a two-dimensional plane image in which position information and attribute information for a point cloud that represents an object having a three-dimensional shape as a group of points are projected onto a two-dimensional plane and metadata to be used to render the point cloud, reconstructing the point cloud, and extracting the metadata; and rendering the reconstructed point cloud by using the extracted metadata.
Description
TECHNICAL FIELD
[0001] The present disclosure relates to an image processing apparatus and a method, and more particularly to an image processing apparatus and a method that allow for easier and more appropriate rendering.
BACKGROUND ART
[0002] As an encoding method for 3D data representing a three-dimensional structure such as a point cloud, there has conventionally been encoding using voxels such as Octree (see, for example, Non-Patent Document 1).
[0003] In recent years, as another encoding method, for example, an approach in which each of position information and color information of a point cloud is projected onto a two-dimensional plane for each subregion and encoded by an encoding method for two-dimensional images (hereinafter also referred to as a video-based approach) has been proposed (see, for example, Non-Patent Document 2 to Non-Patent Document 4).
[0004] The 3D data encoded as described above is transmitted as a bitstream and decoded. Then, the three-dimensional structure is rendered as if it has been imaged by a camera at an optional position and orientation, and is converted into a two-dimensional image, and the two-dimensional image is displayed or stored.
CITATION LIST
Non-Patent Document
[0005] Non-Patent Document 1: R. Mekuria, K. Blom, and P. Cesar, “Design, Implementation and Evaluation of a Point Cloud Codec for Tele-Immersive Video”, tcsvt_paper_submitted_february.pdf
[0006] Non-Patent Document 2: Tim Golla and Reinhard Klein, “Real-time Point Cloud Compression”, IEEE, 2015
[0007] Non-Patent Document 3: K. Mammou, “Video-based and Hierarchical Approaches Point Cloud Compression”, MPEG m41649, October 2017
[0008] Non-Patent Document 4: K. Mammou, “PCC Test Model Category 2 v0”, N17248 MPEG output document, October 2017
SUMMARY OF THE INVENTION
Problems to be Solved by the Invention
[0009] However, in the case of this method, it has not been possible to grasp an appropriate value to be set as a camera parameter at the time of rendering decoded 3D data, and it has been difficult to perform appropriate rendering.
[0010] The present disclosure has been made in view of such circumstances, and is intended to allow for easier and more appropriate rendering.
Solutions to Problems
[0011] An image processing apparatus according to one aspect of the present technology includes a coding unit that generates coded data by encoding a two-dimensional plane image in which position information and attribute information for a point cloud that represents an object having a three-dimensional shape as a group of points are projected onto a two-dimensional plane, and a generation unit that generates a bitstream that includes the coded data generated by the coding unit and metadata to be used to render the point cloud.
[0012] An image processing method according to the one aspect of the present technology includes generating coded data by encoding a two-dimensional plane image in which position information and attribute information for a point cloud that represents an object having a three-dimensional shape as a group of points are projected onto a two-dimensional plane, and generating a bitstream that includes the generated coded data and metadata to be used to render the point cloud.
[0013] An image processing apparatus according to another aspect of the present technology includes a decoding unit that decodes a bitstream that includes coded data obtained by encoding a two-dimensional plane image in which position information and attribute information for a point cloud that represents an object having a three-dimensional shape as a group of points are projected onto a two-dimensional plane and metadata to be used to render the point cloud, reconstructs the point cloud, and extracts the metadata, and a rendering unit that renders the point cloud reconstructed by the decoding unit by using the metadata extracted by the decoding unit.
[0014] An image processing method according to the other aspect of the present technology includes decoding a bitstream that includes coded data obtained by encoding a two-dimensional plane image in which position information and attribute information for a point cloud that represents an object having a three-dimensional shape as a group of points are projected onto a two-dimensional plane and metadata to be used to render the point cloud, reconstructing the point cloud, and extracting the metadata, and rendering the reconstructed point cloud by using the extracted metadata.
[0015] In the image processing apparatus and method according to the one aspect of the present technology, coded data is generated by encoding a two-dimensional plane image in which position information and attribute information for a point cloud that represents an object having a three-dimensional shape as a group of points are projected onto a two-dimensional plane, and a bitstream that includes the generated coded data and metadata to be used to render the point cloud is generated.
[0016] In the image processing apparatus and method according to the other aspect of the present technology, a bitstream that includes coded data obtained by encoding a two-dimensional plane image in which position information and attribute information for a point cloud that represents an object having a three-dimensional shape as a group of points are projected onto a two-dimensional plane and metadata to be used to render the point cloud is decoded, the point cloud is reconstructed, and the metadata is extracted, and then the extracted metadata is used to render the reconstructed point cloud.
Effects of the Invention
[0017] According to the present disclosure, images can be processed. In particular, rendering can be performed more easily and more appropriately.
BRIEF DESCRIPTION OF DRAWINGS
[0018] FIG. 1 illustrates an example of rendering 3D data.
[0019] FIG. 2 illustrates an example of metadata to which the present technology is applied.
[0020] FIG. 3 illustrates an example of a camera parameter.
[0021] FIG. 4 illustrates an example of a camera parameter index.
[0022] FIG. 5 illustrates an example of Social Zone.
[0023] FIG. 6 illustrates an example of Friendship Zone.
[0024] FIG. 7 illustrates an example of Intimate Zone.
[0025] FIG. 8 illustrates an example of rendering.
[0026] FIG. 9 illustrates an example of a camera parameter category index.
[0027] FIG. 10 illustrates an example of a bounding box.
[0028] FIG. 11 illustrates an example of a bounding box.
[0029] FIG. 12 illustrates an example of a metadata update timing.
[0030] FIG. 13 is a block diagram illustrating an example of a main configuration of a coding device.
[0031] FIG. 14 is a flowchart illustrating an example of a flow of coding processing.
[0032] FIG. 15 is a flowchart illustrating an example of a flow of point cloud coding processing.
[0033] FIG. 16 is a block diagram illustrating an example of a main configuration of a reproduction device.
[0034] FIG. 17 is a block diagram illustrating an example of a main configuration of a decoding unit.
[0035] FIG. 18 is a flowchart illustrating an example of a flow of reproduction processing.
[0036] FIG. 19 is a flowchart illustrating an example of a flow of point cloud decoding processing.
[0037] FIG. 20 is a block diagram illustrating an example of a main configuration of a reproduction device.
[0038] FIG. 21 is a flowchart illustrating an example of a flow of reproduction processing.
[0039] FIG. 22 is a block diagram illustrating an example of a main configuration of a computer.
MODE FOR CARRYING OUT THE INVENTION
[0040] Modes for carrying out the present disclosure (hereinafter referred to as “embodiments”) will be described below. Note that the description will be made in the order below.
[0041] 1. Signals of rendering camera parameters
[0042] 2. First embodiment (coding device)
[0043] 3. Second embodiment (reproduction device)
[0044] 4. Third embodiment (reproduction device)
[0045] 5. Note
Signals of Rendering Camera Parameters
[0046]
[0047] The scope disclosed in the present technology includes not only the contents described in the embodiments but also the contents described in the following non-patent documents known at the time of filing.
[0048] Non-Patent Document 1: (described above)
[0049] Non-Patent Document 2: (described above)
[0050] Non-Patent Document 3: (described above)
[0051] Non-Patent Document 4: (described above)
[0052] Non-Patent Document 5: TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU (International Telecommunication Union), “Advanced video coding for generic audiovisual services”, H.264, April 2017
[0053] Non-Patent Document 6: TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU (International Telecommunication Union), “High efficiency video coding”, H.265, December 2016
[0054] Non-Patent Document 7: Jianle Chen, Elena Alshina, Gary J. Sullivan, Jens-Rainer Ohm, and Jill Boyce, “Algorithm Description of Joint Exploration Test Model 4”, JVET-G1001_v1, Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 7th Meeting: Torino, IT, 13-21 Jul. 2017
[0055] That is, the contents described in the non-patent documents described above are also the basis for determining support requirements. For example, even in a case where the quad-tree block structure described in Non-Patent Document 6 and the quad tree plus binary tree (QTBT) block structure described in Non-Patent Document 7 are not directly described in the embodiments, they are included in the scope of the disclosure of the present technology and meet the support requirements of the claims. Furthermore, for example, technical terms such as parsing, syntax, and semantics are also included in the scope of the disclosure of the present technology and meet the support requirements of the claims even in a case where they are not directly described in the embodiments.
[0056]
[0057] There has conventionally been 3D data such as a point cloud representing a three-dimensional structure based on position information, attribute information, and the like of a group of points, and a mesh that is constituted by vertices, edges, and faces and defines a three-dimensional shape using a polygonal representation.
[0058] For example, in the case of a point cloud, a three-dimensional structure (object having a three-dimensional shape) is represented as a set of a large number of points (group of points). That is, point cloud data is constituted by position information and attribute information (e.g., color) of each point in this group of points. Consequently, the data has a relatively simple structure, and any three-dimensional structure can be represented with sufficient accuracy with use of a sufficiently large number of points.
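A point cloud with this structure can be sketched, for illustration only, as a flat list of points, each holding position information and a color attribute; the type names below are hypothetical and are not part of the disclosed syntax.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Point:
    # Position information (geometry) of a single point.
    x: float
    y: float
    z: float
    # Attribute information, here an RGB color.
    color: Tuple[int, int, int]

# A point cloud is simply a group of such points; a sufficiently large
# number of points can represent any three-dimensional structure.
PointCloud = List[Point]
```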
[0059]
[0060] A video-based approach has been proposed, in which a two-dimensional image is formed by projecting each of position information and color information of such a point cloud onto a two-dimensional plane for each subregion, and the two-dimensional image is encoded by an encoding method for two-dimensional images.
[0061] In this video-based approach, an input point cloud is divided into a plurality of segmentations (also referred to as regions), and each region is projected onto a two-dimensional plane. Note that data for each position of the point cloud (i.e., data for each point) is constituted by position information (geometry, also referred to as depth) and attribute information (texture) as described above, and each of them is projected onto a two-dimensional plane for each region.
[0062] Then, each segmentation (also referred to as a patch) projected onto the two-dimensional plane is arranged to form a two-dimensional image, and is encoded by an encoding method for two-dimensional plane images such as Advanced Video Coding (AVC) or High Efficiency Video Coding (HEVC).
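The following is a deliberately simplified sketch of that idea, assuming a single projection plane rather than the per-region patches described above; it only shows how position information becomes a depth (geometry) image and attribute information becomes a texture (attribute) image, which could then be fed to a 2D encoder such as AVC or HEVC.

```python
import numpy as np

def project_points_single_plane(points, colors, size=256):
    """Project points onto one plane, producing a per-pixel depth (geometry)
    image and a per-pixel color (texture) image. The actual video-based
    approach does this per patch/region before packing and encoding."""
    geometry = np.full((size, size), np.inf)              # depth image
    texture = np.zeros((size, size, 3), dtype=np.uint8)   # attribute image
    for (x, y, z), c in zip(points, colors):
        u, v = int(x), int(y)   # assume coordinates already lie on the image grid
        if 0 <= u < size and 0 <= v < size and z < geometry[v, u]:
            geometry[v, u] = z   # keep the nearest point per pixel
            texture[v, u] = c    # corresponding attribute value
    return geometry, texture
```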
[0063]
[0064] The 3D data encoded as described above is transmitted as a bitstream and decoded. Then, the three-dimensional structure is rendered as if it has been imaged by a camera at an optional position and orientation, and is converted into a two-dimensional image, and the two-dimensional image is displayed or stored. Note that the two-dimensional image obtained by rendering 3D data is an image different from a two-dimensional image (two-dimensional image in which patches are arranged) at the time of encoding.
[0065] The subjective image quality of a two-dimensional image obtained by rendering 3D data is affected by the distance between the object for which the 3D data is rendered and the camera for rendering, the projection method, and the like. The same applies to the appearance of compression distortion caused by the encoder.
[0066] However, it has been difficult for a stream user to perform appropriate rendering without knowing conditions under which a stream creator has performed a quality check at the time of encoding (the distance between a 3D object and the camera, the projection method, and the like).
[0067] For example, a point cloud represents an object having a three-dimensional shape as a group of points, and a density of the group of points greatly affects appearance of the object. For example, in a situation in which the group of points is so dense that each point cannot be identified, the group of points is visible as an object (three-dimensional shape), but in a situation in which the group of points is so sparse that each point can be identified, there is a possibility that the group of points becomes less visible as an object (three-dimensional shape).
[0068] Then, the subjective density (appearance) of the group of points changes in accordance with a distance between the group of points (three-dimensional shape) and a viewpoint. For example, the farther away the viewpoint is from the group of points, the denser the group of points looks, and the closer the viewpoint is to the group of points, the sparser the group of points looks.
[0069] Furthermore, in general, as a 3D data rendering method, there is a method of performing rendering just like capturing an image with a camera (i.e., generating an image like a captured image). In a case of such a method, the camera (also referred to as the camera for rendering) can be at an optional position and orientation as in an example illustrated in FIG. 1. Each of an image 11 to an image 15 illustrated in FIG. 1 is an example of a two-dimensional image obtained by rendering just like imaging an object 10, which is a point cloud, with a camera. As described above, the position and orientation of the camera for rendering can be freely set.
[0070] That is, the subjective image quality of the two-dimensional image obtained by rendering the point cloud may be affected by the position of the camera for rendering (particularly the distance between the camera and the object (group of points)). In other words, the appropriate position (or range) of the rendering camera depends on the density of the group of points in the point cloud. The same applies to other types of 3D data, and the appropriate position (or range) of the rendering camera depends on the structure of the 3D data.
[0071] However, in a case of a conventional method, such information is not provided, and it has been difficult for a stream user who performs rendering to grasp such an appropriate position of the camera.
[0072] For example, in FIG. 1, in a case of the image 12, the distance from the camera to the object 10 is shorter than that in a case of the image 11, and the subjective image quality of the object 10 may be reduced. However, in the conventional method, it has been difficult for a stream user to grasp whether the camera position for the image 11 is appropriate or the camera position for the image 12 is appropriate without checking rendering results.
[0073] For example, in general, when a stream creator creates a stream, the stream creator performs a quality check on a two-dimensional image obtained by rendering 3D data. In that case, an appropriate camera position as described above is assumed in accordance with the structure of the 3D data (e.g., in accordance with the density of the group of points in the point cloud), and a quality check is performed on a rendering result at that position. However, in the case of the conventional method, such information is not provided to a stream user, and it has been difficult to grasp the camera position assumed by the stream creator.
[0074] Furthermore, for example, it has been difficult to provide the stream user with a camera position, a camera movement trajectory, and the like recommended by the stream creator.
[0075] Consequently, it has not been possible for the stream user to grasp an appropriate value to be set as a camera parameter at the time of rendering decoded 3D data, and it has been difficult to perform appropriate rendering.
[0076]
[0077] Thus, information regarding a camera for rendering 3D data (e.g., a point cloud that represents an object having a three-dimensional shape as a group of points) is provided to a decoding side in association with the 3D data. For example, the information regarding the camera may be included in a bitstream of 3D data as metadata to be used to render 3D data (e.g., a point cloud) and transmitted to the decoding side.
[0078] Thus, the information regarding the camera can be acquired on the decoding side. Then, using the information regarding the camera allows for easier and more appropriate rendering.
[0079]
[0080]
[0081] The information regarding the camera may be any information as long as it relates to a camera for rendering 3D data. For example, various types of information as shown in Table 21 in FIG. 2 may be included.
[0082] For example, as shown in the first row (excluding an item name row) from the top of Table 21, the information regarding the camera may include a camera parameter index, which is an index indicating a defined camera parameter (a camera parameter with a preset condition for the camera for rendering).
[0083] Camera parameters are parameters related to the camera for rendering 3D data. Specifically, the camera parameters may include any parameters. For example, as in syntax 31 illustrated in A of FIG. 3, the camera parameters may include x, y, and z coordinates (camera_pos_x, camera_pos_y, camera_pos_z) that indicate the position of the camera, that is, a camera position coordinate (camera_pos) 32 in B of FIG. 3.
[0084] Furthermore, the camera parameters may include x, y, and z coordinates (center_pos_x, center_pos_y, center_pos_z) that indicate a position of a camera gaze point, that is, a camera gaze point coordinate (center_pos) 33 in B of FIG. 3. Note that, instead of the camera gaze point, a vector 34 from the camera position coordinate (camera_pos) 32 to the camera gaze point coordinate (center_pos) 33 illustrated in B of FIG. 3 may be included in the camera parameters. These parameters indicate a direction (orientation) of the camera.
[0085] Furthermore, the camera parameters may include a vector indicating an upward direction of the camera (camera_up_x, camera_up_y, camera_up_z), that is, a vector (camera_up) 35 indicating the upward direction of the camera in B of FIG. 3.
[0086] Note that the camera gaze point coordinate 33 (vector 34) and the vector 35 indicating the upward direction of the camera are also parameters indicating a posture of the camera.
[0087] Furthermore, the camera parameters may include a parameter indicating the projection method of the camera, that is, a rendering method. For example, as a parameter indicating the projection method of the camera, a parameter (PerspectiveProjection) indicating whether or not the projection method is a perspective projection may be included. Furthermore, for example, as a parameter indicating the projection method of the camera, a parameter indicating whether or not the projection method is a parallel projection may be included. Moreover, as a parameter indicating the projection method of the camera, a parameter indicating whether the projection method is a perspective projection or a parallel projection may be included.
[0088] Furthermore, the camera parameters may include a parameter (field of view (FOV)) indicating an angle of view of the camera.
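Gathered together, these camera parameters can be sketched as a single record. The field names follow the syntax names quoted above (camera_pos, center_pos, camera_up, and so on), but the container itself is only an illustrative assumption, not the disclosed syntax.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class RenderingCameraParams:
    camera_pos: Vec3           # camera position (camera_pos_x/y/z)
    center_pos: Vec3           # camera gaze point (center_pos_x/y/z); together
                               # with camera_pos this defines the camera direction
    camera_up: Vec3            # upward direction of the camera (camera_up_x/y/z)
    perspective_projection: bool = True   # projection method of the camera
    fov_deg: float = 60.0                 # angle of view (FOV), in degrees
```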
[0089] The camera parameter index may indicate any camera parameter as long as it indicates a defined camera parameter, and may be an index of any value.
[0090] For example, as shown in Table 41 in FIG. 4, the camera parameter index may indicate a defined imaging situation (camera position, orientation, posture, or the like). That is, a camera parameter that realizes a situation corresponding to a value of a camera parameter index may be specified by specifying the camera parameter index.
[0091] In the case of the example of Table 41 in FIG. 4, index “0” indicates a situation referred to as “Social Zone”, index “1” indicates a situation referred to as “Friendship Zone”, and index “2” indicates a situation referred to as “Intimate Zone”.
[0092] “Social Zone” indicates, as shown in the table, a situation in which the camera is located at a distance of 2 m from a 3D object and 1.4 m from the ground, and faces 10 degrees downward from a horizontal direction. That is, when this situation is specified, the camera position coordinate (camera_pos) 32 is set to a position 2000 mm away from the object 10 and 1400 mm from the ground as illustrated in FIG. 5. Furthermore, the vector 34 is set to a direction 10 degrees downward from the horizontal direction (10° face down).
[0093] “Friendship Zone” indicates, as shown in the table, a situation in which the camera is located at a distance of 1 m from the 3D object and 1.4 m from the ground, and faces 10 degrees downward from the horizontal direction. That is, when this situation is specified, the camera position coordinate (camera_pos) 32 is set to a position 1000 mm away from the object 10 and 1400 mm from the ground as illustrated in FIG. 6. Furthermore, the vector 34 is set to a direction 10 degrees downward from the horizontal direction (10° face down).
[0094] “Intimate Zone” indicates, as shown in the table, a situation in which the camera is located at a distance of 0.5 m from the 3D object and 1.4 m from the ground, and faces in the horizontal direction. That is, when this situation is specified, the camera position coordinate (camera_pos) 32 is set to a position 500 mm away from the object 10 and 1400 mm from the ground as illustrated in FIG. 7. Furthermore, the vector 34 is set in the horizontal direction (0°).
[0095] A correspondence relationship between such a situation (corresponding camera parameter) and a camera parameter index is specified in advance by, for example, a standard, and the relationship is grasped in advance on an encoding side and the decoding side. Consequently, on both the encoding side and the decoding side, it is possible to easily specify a camera parameter that realizes a situation as described above simply by specifying a camera parameter index.
[0096] Note that any camera parameter may be specified by a camera parameter index, and such a camera parameter is not limited to the examples described above. Furthermore, any situation may be specified by a camera parameter index, and such a situation is not limited to the examples described above. Moreover, the number of camera parameter indexes specified in advance is optional. The number is not limited to the example described above, and may be two or less, or may be four or more. Furthermore, the camera parameter indexes may be set to any values, and the values are not limited to the example described above (0 to 2).
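For illustration, the correspondence in Table 41 could be held as a simple lookup table on both sides. The concrete values below are those quoted for FIGS. 5 to 7, while the dictionary layout and function name are only assumptions for this sketch.

```python
# Camera parameter index -> (name, distance from object [mm],
# camera height from the ground [mm], downward tilt [deg]), per Table 41.
CAMERA_PARAM_PRESETS = {
    0: ("Social Zone",     2000, 1400, 10.0),
    1: ("Friendship Zone", 1000, 1400, 10.0),
    2: ("Intimate Zone",    500, 1400,  0.0),
}

def resolve_camera_parameter_index(index):
    """Turn a camera parameter index into concrete camera parameters.
    In practice the mapping is fixed in advance (e.g., by a standard), so
    the encoding side and the decoding side resolve the same index to the
    same situation."""
    name, distance_mm, height_mm, tilt_down_deg = CAMERA_PARAM_PRESETS[index]
    return {"situation": name, "distance_mm": distance_mm,
            "height_mm": height_mm, "tilt_down_deg": tilt_down_deg}
```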
[0097] A camera parameter specified by such a camera parameter index may be set at the time of rendering so that rendering can be performed in a situation specified by the camera parameter index.
[0098] For example, in a case where the camera parameter is set on the basis of camera parameter index “0”, rendering can be performed in “Social Zone”, and an image 51 as illustrated in FIG. 8 is obtained. In this case, the camera position is relatively far from the object 10, and the image 51 shows the whole of the object 10 (whole body).
[0099] For example, in a case where the camera parameter is set on the basis of camera parameter index “2”, rendering can be performed in “Intimate Zone”, and an image 52 as illustrated in FIG. 8 is obtained. In this case, the camera position is relatively close to the object 10, and the image 52 shows only a part (upper body) of the object 10.
[0100] As described above, an image of a situation specified by the index is obtained.
[0101] For example, on the encoding side, a sequence creator (stream creator) selects a situation in which an image of a sufficient quality (subjective image quality) can be obtained by performing a quality check at the time of encoding, and sets a camera parameter index indicating the situation. The camera parameter index is included in a bitstream as information regarding the camera, and transmitted to the decoding side. Thus, on the decoding side, a sequence user (stream user) can use the camera parameter index to easily perform rendering in the situation in which an image of sufficient quality (subjective image quality) can be obtained.
[0102] For example, the sequence creator can use this camera parameter index to notify, more easily, the decoding side of a recommended situation or a situation where an acceptable quality is obtained. In other words, the sequence user can more easily grasp those situations specified by the sequence creator.
[0103] That is, transmitting this camera parameter index from the encoding side to the decoding side allows the sequence creator to specify an appropriate situation (camera position, orientation, or the like), and the sequence user to more easily grasp the appropriate situation (situation in which quality is guaranteed). Consequently, rendering can be performed more easily and more appropriately.
[0104] Note that a camera parameter index can be used to specify a plurality of camera parameters, and it is therefore possible to suppress a reduction in coding efficiency as compared with a case of transmitting information in which each camera parameter is individually specified. Furthermore, the sequence creator is only required to perform a quality check in the situation specified by this camera parameter index, which allows the quality check to be performed more easily; for example, it is not necessary to consider the value to which each camera parameter is to be set. Furthermore, during a quality check, a situation specified by the camera parameter index can be applied so that the situation can be made common regardless of the sequence. That is, quality evaluation can be performed on a plurality of sequences under the same conditions.
[0105] Note that the number of camera parameter indexes transmitted from the encoding side to the decoding side is optional, and may be one, or may be two or more.
[0106]
[0107] Furthermore, information regarding the camera may include a camera parameter category index, which is an index that identifies a purpose of the camera for rendering, as shown in the second row (excluding the item name row) from the top of Table 21 in FIG. 2, for example. That is, the camera parameter category index has a value that specifies the purpose of the camera situation realized by a set camera parameter.
[0108] The purpose of the camera specified by this camera parameter category index is optional. That is, the camera parameter category index may specify any purpose of the camera. FIG. 9 illustrates an example of the camera parameter category index.
[0109] In the case of the example of Table 61 in FIG. 9, index “0” indicates that the purpose of the camera is a quality check at the time of encoding. That is, index “0” indicates that the camera situation has been used for a quality check at the time of encoding. In other words, the camera situation is a situation in which a quality check has been performed (a situation in which the quality is guaranteed).
[0110] Furthermore, index “1” indicates that the purpose of the camera is a recommended angle. That is, index “1” indicates that the camera situation is a situation (i.e., an angle) recommended by a sequence creator (encoding side). For example, such a value is set for the first frame (1st frame).
[0111] Moreover, index “2” indicates that the purpose of the camera is a recommended camera path (recommended movement trajectory of the camera). That is, index “2” indicates that the set camera movement is a movement trajectory of the camera recommended by the sequence creator (encoding side).
[0112] A correspondence relationship between such a purpose of the camera and a camera parameter category index is specified in advance by, for example, a standard, and the relationship is grasped in advance on the encoding side and the decoding side. Consequently, on the encoding side, it is possible to easily specify the purpose of the camera as described above simply by specifying a camera parameter category index. Furthermore, on the decoding side, the purpose of the camera as described above can be easily grasped on the basis of the camera parameter category index.
[0113] Note that any purpose of the camera may be specified by a camera parameter category index, and such a camera purpose is not limited to the examples described above. Furthermore, the number of camera parameter category indexes specified in advance is optional. The number is not limited to the example described above, and may be two or less, or may be four or more. Moreover, the camera parameter category indexes may be set to any values, and the values are not limited to the example described above (0 to 2).
[0114] For example, at the time of rendering, the set purpose of the camera can be easily grasped on the basis of such a camera parameter category index. Consequently, whether or not to apply the camera parameter to rendering can be determined more easily and more appropriately.
[0115] For example, in a case where rendering is performed with a quality-guaranteed camera, a camera parameter specified by camera parameter category index “0” may be applied. Furthermore, for example, in a case where rendering is performed at an angle recommended by the sequence creator, a camera parameter specified by camera parameter category index “1” may be applied. Moreover, for example, in a case where rendering is performed with a movement trajectory of the camera recommended by the sequence creator, a camera parameter specified by camera parameter category index “2” may be applied.
[0116] This allows a sequence user (stream user) on the decoding side to easily check the purpose of the camera before using the camera parameter. Consequently, rendering can be performed more easily and more appropriately.
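For example, a renderer could filter the transmitted camera metadata by category before applying it. The entry layout assumed below (a list of dictionaries with "category" and "params" keys) is hypothetical and only illustrates the selection step.

```python
# Category index values per Table 61.
QUALITY_CHECK, RECOMMENDED_ANGLE, RECOMMENDED_PATH = 0, 1, 2

def cameras_for_purpose(camera_metadata, purpose):
    """Select only the camera parameter sets whose camera parameter
    category index matches the purpose the renderer is interested in."""
    return [entry["params"] for entry in camera_metadata
            if entry["category"] == purpose]

# e.g., render with a quality-guaranteed camera:
# params_list = cameras_for_purpose(metadata, QUALITY_CHECK)
```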
[0117] Note that, on the encoding side, the sequence creator (stream creator) can easily notify the decoding side of the purpose of the camera by using this camera parameter category index.
[0118] Note that the number of camera parameter category indexes transmitted from the encoding side to the decoding side is optional, and may be one, or may be two or more.
[0119]
[0120] Furthermore, the information regarding the camera may include a conversion rate (frame to world scale) between a scale in a two-dimensional image obtained by rendering 3D data (e.g., a point cloud) and a real scale, as shown in the third row (excluding the item name row) from the top of Table 21 in FIG. 2, for example.
[0121] In general, the scale used in 3D data or in a rendered image may differ from the scale in the real world. Consequently, a conversion rate between those scales can be set and used so that a camera parameter can be set in, for example, the real scale.
[0122] For example, on the encoding side, a camera parameter is set in the real scale, the conversion rate described above is set, and they are transmitted as information regarding the camera. Thus, on the decoding side, the conversion rate can be used so that the camera parameter set in the real scale can be converted more easily into a camera parameter in a scale in an image after rendering. Consequently, on the decoding side, the camera parameter set in the real scale can be applied more easily. Furthermore, on the encoding side, it is not necessary to consider the scale in the image after rendering, and the camera parameter can be set more easily.
[0123] Note that the number of conversion rates transmitted from the encoding side to the decoding side is optional, and may be one, or may be two or more. For example, a plurality of rates that differs from each other may be transmitted.
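A minimal sketch of applying such a conversion rate follows, assuming (purely for illustration) that the rate expresses how many real-world metres correspond to one unit in the rendered image; whether the rate is applied by multiplication or division depends on how it is defined.

```python
def real_to_frame_scale(pos_m, frame_to_world_scale):
    """Convert a camera position given in the real scale (metres) into the
    scale used in the rendered image. Division by the rate is an assumption
    for this sketch."""
    return tuple(c / frame_to_world_scale for c in pos_m)

# Example: a camera 2 m from the object and 1.4 m from the ground, with a
# hypothetical conversion rate of 0.001 (one image unit corresponds to 1 mm).
camera_pos = real_to_frame_scale((0.0, 1.4, 2.0), 0.001)   # -> (0.0, 1400.0, 2000.0)
```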
[0124]
[0125] Furthermore, the information regarding the camera may include a camera parameter that is set with use of a bounding box as a reference, as shown in the fourth row (excluding the item name row) from the top of Table 21 in FIG. 2, for example.
[0126] A bounding box is information for normalizing position information of a coding target, and is a region formed so as to surround an object in 3D data, which is the coding target. This bounding box may have any shape. For example, the bounding box may have a rectangular parallelepiped shape.
[0127] In a case of setting camera parameters indicating the position, direction, and the like of the camera, it is necessary to set a reference position for the camera parameters. For example, it is conceivable to use an object to be a subject of the camera as a reference. However, objects move in some cases, and such an object can be difficult to use as a reference. Thus, a bounding box is used as a reference for the camera parameters.
[0128] Incidentally, a bounding box can be set in a flexible manner. For example, in a case of a moving object, a bounding box may be set so as to surround the object in each frame (for each predetermined time), or may be set so as to surround the object at all times.
[0129] Thus, for example, as illustrated in FIG. 10, camera parameters (camera position coordinate 32, vector 34, and the like) may be set with use of, as a reference, a bounding box 71 set so as to surround the position of the moving object 10 in the first frame.
[0130] Furthermore, for example, as illustrated in FIG. 11, camera parameters (camera position coordinate 32, vector 34, and the like) may be set with use of, as a reference, a bounding box 72 set so as to surround all the positions of the object 10 (a moving range of the object 10) during the entire sequence or a predetermined period. In the case of the example in FIG. 11, the object 10 moves from the position of an object 10-1 to the position of an object 10-2 as indicated by a dotted arrow 81, and further moves from the position of the object 10-2 to the position of an object 10-3 as indicated by a dotted arrow 82. The bounding box 72 is set so as to surround the object at all of these positions.
[0131] Using a bounding box as a reference as described above allows camera parameters to be more easily set for the moving object 10 regardless of its movement.
[0132] Note that a reference position for camera parameters may be any position with respect to a bounding box. For example, a predetermined position in the bounding box (e.g., the center) may be used as the reference position for the camera parameters, a predetermined position on a boundary between the inside and outside of the bounding box may be used as the reference position for the camera parameters, or a predetermined position outside the bounding box may be used as the reference position for the camera parameters.
[0133] For example, the position of the object 10 at a predetermined time in the bounding box may be used as the reference position for the camera parameters. Furthermore, for example, the center of gravity of the positions of the moving object 10 in the bounding box at all times may be used as the reference position for the camera parameters.
[0134] Note that the specific content of the information regarding a bounding box is optional, and any information regarding the bounding box may be included. For example, information such as the position, size, shape, and target time range of the bounding box may be included.
[0135] Such information regarding a bounding box may be transmitted from the encoding side to the decoding side so that camera parameters set with use of the bounding box as a reference can be more easily interpreted on the decoding side, with interpretation similar to that on the encoding side.
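As one possible interpretation, a camera position transmitted as an offset from the center of the bounding box could be resolved as follows. The choice of the center as the reference position and the names used are assumptions; the text allows other reference positions as well.

```python
def bounding_box_center(bbox_min, bbox_max):
    """Center of an axis-aligned bounding box given by two corner points."""
    return tuple((lo + hi) / 2.0 for lo, hi in zip(bbox_min, bbox_max))

def camera_pos_from_bounding_box(bbox_min, bbox_max, offset):
    """Resolve a camera position expressed relative to the bounding box
    (here, as an offset from its center) into an absolute position."""
    center = bounding_box_center(bbox_min, bbox_max)
    return tuple(c + o for c, o in zip(center, offset))

# Example: a box surrounding the whole moving range of the object, with the
# camera placed 2000 mm in front of the box center.
cam = camera_pos_from_bounding_box((0, 0, 0), (1000, 1800, 1000), (0, 0, 2000))
```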
[0136]
[0137] Furthermore, the information regarding the camera may include a camera parameter, as shown in the fifth row (excluding the item name row) from the top of Table 21 in FIG. 2, for example. As described above, camera parameters are parameters related to the camera for rendering 3D data, and specifically may include any parameters. For example, the camera parameters may include the camera position coordinate (camera_pos) 32, the camera gaze point coordinate (center_pos) 33, the vector 34, the vector 35 indicating the upward direction of the camera, the projection method of the camera, and a parameter indicating the angle of view of the camera, or may include any other parameters.
[0138] That is, the camera parameters may be set directly without indexes. Furthermore, the camera parameters may be used in combination with the camera parameter indexes described above, for example, to update the values of some of the camera parameters set in accordance with the camera parameter indexes. Moreover, other camera parameters that are not set with use of the camera parameter indexes may be additionally set. Note that the number of camera parameters that can be set is optional, and may be one, or may be two or more.
[0139] The camera parameters can be set directly as described above, and this improves a degree of freedom in setting the camera parameters as compared with a case of using indexes. Furthermore, the camera parameters may be transmitted from the encoding side to the decoding side, so that the camera parameters that have been set more freely can be applied to rendering on the decoding side. Consequently, rendering can be performed more easily and more appropriately.
[0140]
[0141] Furthermore, the information regarding the camera may include information regarding a movement trajectory of the camera for rendering, as shown in the sixth row (excluding the item name row) from the top of Table 21 in FIG. 2, for example. For example, the information regarding the movement trajectory of the camera for rendering may include a camera movement trajectory parameter (camera path) indicating the movement trajectory of the camera for rendering.
[0142] A camera movement trajectory parameter (camera path) indicates a trajectory of a movement in a case where the camera for rendering is moved to a different position, orientation, or the like. Such information may be transmitted from the encoding side to the decoding side so that, for example, a sequence creator can provide a recommended camera work to the decoding side. Furthermore, on the decoding side, the information transmitted as described above can be used for easier generation of an image obtained by rendering in which the recommended camera work is replicated. Note that a movement of the camera indicated by this trajectory may be a continuous movement or a discrete movement.
[0143]
[0144] Furthermore, the information regarding the camera may include information regarding a movement trajectory of an object to be the subject of the camera, as shown in the seventh row (excluding the item name row) from the top of Table 21 in FIG. 2, for example. For example, the information regarding the movement trajectory of the object may include an object movement trajectory parameter (object path) indicating the movement trajectory of the object.
[0145] As described above, a 3D data object is capable of a variety of motions and deformations. For example, the object can move, turn, deform, expand, or shrink. An object movement trajectory parameter (object path) indicates a trajectory of such motions and deformations of the object. Such information may be transmitted from the encoding side to the decoding side so that, for example, motions and deformations of the object can be more easily grasped on the decoding side. Consequently, for example, it is possible to set, more easily, a more appropriate camera work (a camera work more appropriate for motions and deformations of the object) of the camera for rendering. Note that a movement of the object indicated by this trajectory may be a continuous movement or a discrete movement.
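Either trajectory (camera path or object path) could be carried, for example, as a list of time-stamped keyframes. The keyframe representation and the linear interpolation below are assumptions for illustration; a discrete movement would simply use the keyframes as-is.

```python
def position_on_path(keyframes, t):
    """Interpolate a movement trajectory given as (time, (x, y, z)) keyframes.
    The same representation works for a camera movement trajectory (camera
    path) and an object movement trajectory (object path)."""
    keyframes = sorted(keyframes)
    if t <= keyframes[0][0]:
        return keyframes[0][1]
    for (t0, p0), (t1, p1) in zip(keyframes, keyframes[1:]):
        if t0 <= t <= t1:
            a = (t - t0) / (t1 - t0)
            return tuple(c0 + a * (c1 - c0) for c0, c1 in zip(p0, p1))
    return keyframes[-1][1]   # after the last keyframe, hold the position

# A recommended camera work described by three keyframes, sampled at t = 1.5.
camera_path = [(0.0, (0, 1400, 2000)), (1.0, (500, 1400, 1800)), (2.0, (1000, 1400, 1500))]
print(position_on_path(camera_path, 1.5))   # -> (750.0, 1400.0, 1650.0)
```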
[0146]
[0147] The above-described information regarding the camera for rendering 3D data may be, for example, added as metadata to a bitstream that includes coded data of a two-dimensional plane image obtained by projecting 3D data.
[0148] In that case, for example, as illustrated in A of FIG. 12, such information may be added to a bitstream as metadata (e.g., as a picture parameter set) of the first frame of a moving image constituted by frame images, which are two-dimensional plane images obtained by projecting 3D data. For example, in A of FIG. 12, information regarding the camera for rendering 3D data is added to a bitstream as metadata 91 (picture parameter set) of the first frame (frame #0) of a moving image.
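A rough sketch of that arrangement follows, assembling a toy container in which the rendering metadata travels once with the first frame; the container layout is purely illustrative and does not reproduce the actual parameter-set syntax.

```python
def build_bitstream(coded_frames, rendering_metadata):
    """Attach the rendering metadata (camera parameter index, category
    index, conversion rate, and so on) to the first frame only, as in
    A of FIG. 12, and interleave it with the coded frame data."""
    stream = []
    for i, coded_frame in enumerate(coded_frames):
        entry = {"frame": i, "coded_data": coded_frame}
        if i == 0:
            entry["metadata"] = rendering_metadata
        stream.append(entry)
    return stream
```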
……
……
……