Sony Patent | Transmission Device, Transmission Method, Reception Device, And Reception Method

Publication Number: 20200186780

Publication Date: 2020-06-11

Applicants: Sony

Abstract

It is made possible to obtain a common image between a VR-compatible terminal and a VR non-compatible terminal when distributing VR content. A projection picture having a rectangular shape is obtained by cutting off a part or the whole of a spherical capture image and performing in-plane packing on the cut-off spherical capture image. A video stream is obtained by encoding image data of this projection picture. A container containing this video stream is transmitted. Meta information for rendering the projection picture is inserted into a layer of the container and/or the video stream. The center of a cut-out position indicated by cut-out position information inserted in a layer of the video stream is adjusted to coincide with a reference point of the projection picture indicated by the meta information for rendering.

TECHNICAL FIELD

[0001] The present technology relates to a transmission device, a transmission method, a reception device, and a reception method. In more detail, the present technology relates to a transmission device and the like that transmit a projection picture obtained by in-plane packing of a cut-out image from a spherical capture image.

BACKGROUND ART

[0002] Recently, distribution of virtual reality (VR) content has been considered. For example, Patent Document 1 describes that a front image and a back image with an ultra-wide viewing angle having a viewing angle of 180.degree. or more are obtained by performing imaging using a back-to-back technique, and an equidistant cylindrical image is created from these two images and transmitted to a communication terminal. Here, the front image and the back image with an ultra-wide viewing angle having a viewing angle of 180.degree. or more constitute a spherical capture image (360.degree. VR image), and the equidistant cylindrical method is one of in-plane packing methods.

CITATION LIST

Patent Document

[0003] Patent Document 1: Japanese Patent Application Laid-Open No. 2016-194784

SUMMARY OF THE INVENTION

Problems to be Solved by the Invention

[0004] VR content distribution is effective with VR-compatible terminals. However, if consideration is not given to a case where a conventional VR non-compatible terminal receives VR content, a distorted image is displayed on the VR non-compatible terminal. It is necessary to ensure backward compatibility with the conventional terminal with respect to display.

[0005] An object of the present technology is to make it possible to obtain a common image between a VR-compatible terminal and a VR non-compatible terminal when distributing VR content.

Solutions to Problems

[0006] A concept of the present technology is in

[0007] a transmission device including:

[0008] a processing unit that cuts off a part or whole of a spherical capture image and performs in-plane packing on the cut-off spherical capture image to obtain a projection picture having a rectangular shape;

[0009] an encoding unit that encodes image data of the projection picture to obtain a video stream;

[0010] a transmission unit that transmits a container including the video stream; and

[0011] an insertion unit that inserts meta information for rendering the projection picture into a layer of the container and/or the video stream, in which

[0012] a center of a cut-out position indicated by cut-out position information inserted in a layer of the video stream coincides with a reference point of the projection picture indicated by the meta information for rendering.

[0013] In the present technology, a part or whole of the spherical capture image (360.degree. VR image) is cut off, and in-plane packing is further performed on the cut-off spherical capture image by the processing unit such that a projection picture having a rectangular shape is obtained. For example, the spherical capture image is constituted by a front image and a back image with an ultra-wide viewing angle having a viewing angle of 180.degree. or more. Furthermore, examples of the format type of in-plane packing include equirectangular, cross-cubic, and the like.
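The equirectangular format named above can be sketched as follows. This is a minimal illustration of the standard equirectangular mapping from a direction on the capture sphere to a pixel on the rectangular projection picture; the function and parameter names are illustrative and do not appear in the patent text.

```python
def equirect_pack(lon_deg, lat_deg, width, height):
    """Map a sphere direction (longitude/latitude in degrees) to a pixel
    (u, v) on an equirectangular projection picture of size width x height.
    Longitude spans [-180, 180] across the width; latitude spans
    [90, -90] down the height."""
    u = (lon_deg + 180.0) / 360.0 * (width - 1)   # longitude -> horizontal pixel
    v = (90.0 - lat_deg) / 180.0 * (height - 1)   # latitude  -> vertical pixel
    return round(u), round(v)
```

Under this mapping the forward direction (longitude 0, latitude 0) lands at the center of the picture, which is why an equirectangular picture naturally admits a centered reference point.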

[0014] The image data of the projection picture is encoded by the encoding unit such that a video stream is obtained. A container including this video stream is transmitted by the transmission unit. For example, the container is an International Organization for Standardization base media file format (ISOBMFF) stream, a moving picture experts group 2-transport stream (MPEG2-TS), a moving picture experts group media transport (MMT) stream, or the like. Meta information for rendering the projection picture is inserted into a layer of the container and/or the video stream by the insertion unit. By inserting the meta information for rendering into the layer of the video stream, the meta information for rendering can be dynamically changed regardless of the container type.

[0015] The center of the cut-out position indicated by cut-out position information inserted in a layer of the video stream is adjusted to coincide with the reference point of the projection picture indicated by the meta information for rendering. For example, the projection picture may be made up of a plurality of regions including a default region whose position is centered on the reference point, and a position indicated by the cut-out position information may be adjusted to coincide with the position of the default region.
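The adjustment described in paragraph [0015] can be sketched as follows: given the reference point RP(x, y) and a desired cut-out size, compute a cut-out window whose center coincides with RP. This is a simplified pixel-box sketch, not the HEVC conformance-window offset syntax; the helper names are hypothetical.

```python
def cutout_for_reference_point(rp_x, rp_y, out_w, out_h):
    """Return a cut-out box (left, top, right, bottom) of size out_w x out_h
    whose center coincides with the reference point RP(rp_x, rp_y)."""
    left = rp_x - out_w // 2
    top = rp_y - out_h // 2
    return (left, top, left + out_w, top + out_h)

def cutout_center(box):
    """Center O(p, q) of a cut-out box, as in FIG. 7."""
    l, t, r, b = box
    return ((l + r) // 2, (t + b) // 2)
```

A transmitter following the patent's scheme would emit cut-out position information satisfying `cutout_center(box) == (rp_x, rp_y)`, so that a cropping receiver and a rendering receiver show the same view.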

[0016] For example, the meta information for rendering may have position information on the reference point. Furthermore, for example, the meta information for rendering may have backward compatibility information indicating that the center of the cut-out position indicated by the cut-out position information inserted in the layer of the video stream coincides with the reference point of the projection picture indicated by the meta information for rendering. In addition, for example, the transmission unit may further transmit a metafile having meta information regarding the video stream, and identification information indicating the fact that the meta information for rendering is inserted in a layer of the container and/or the video stream may be further inserted into the metafile.

[0017] For example, the container may be in ISOBMFF, and the insertion unit may insert the meta information for rendering into a moov box. Furthermore, for example, the container may be an MPEG2-TS, and the insertion unit may insert the meta information for rendering into a program map table. In addition, for example, the container may be an MMT stream, and the insertion unit may insert the meta information for rendering into an MMT package table.

[0018] As described above, in the present technology, the center of the cut-out position indicated by the cut-out position information inserted in a layer of the video stream coincides with the reference point of the projection picture indicated by the meta information for rendering. Therefore, it is possible to obtain a common image between a VR-compatible terminal and a VR non-compatible terminal when distributing VR content.

[0019] Furthermore, another concept of the present technology is in

[0020] a reception device including

[0021] a reception unit that receives a container including a video stream obtained by encoding image data of a projection picture having a rectangular shape, in which

[0022] the projection picture is obtained by cutting off a part or whole of a spherical capture image and performing in-plane packing on the cut-off spherical capture image,

[0023] meta information for rendering the projection picture is inserted in a layer of the container and/or the video stream, and

[0024] a center of a cut-out position indicated by cut-out position information inserted in a layer of the video stream coincides with a reference point of the projection picture indicated by the meta information for rendering,

[0025] the reception device further including a control unit that controls: processing of decoding the video stream to obtain the projection picture; processing of rendering the projection picture on the basis of the meta information for rendering to obtain a first display image; processing of cutting out the projection picture on the basis of the cut-out position information to obtain a second display image; and processing of selectively retrieving the first display image or the second display image.

[0026] In the present technology, a container including a video stream obtained by encoding image data of a projection picture having a rectangular shape is received by the reception unit. This projection picture is obtained by cutting off a part or the whole of a spherical capture image and performing in-plane packing on the cut-off spherical capture image. Furthermore, meta information for rendering the projection picture is inserted in a layer of the container and/or the video stream. In addition, the center of the cut-out position indicated by cut-out position information inserted in a layer of the video stream coincides with the reference point of the projection picture indicated by the meta information for rendering.

[0027] Processing of decoding the video stream to obtain the projection picture, processing of rendering the obtained projection picture on the basis of the meta information for rendering to obtain a first display image, processing of cutting out the projection picture on the basis of the cut-out position information to obtain a second display image, and processing of selectively retrieving the first display image or the second display image are controlled by the control unit.
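The control flow of paragraph [0027] can be sketched as below: a VR-compatible receiver renders using the rendering metadata, while a VR non-compatible receiver simply crops at the cut-out position. The default rendering is reduced here to a crop centered on the reference point, which is all that is needed to illustrate why the two paths yield a common image; all names are illustrative.

```python
def crop(pic, box):
    """Second display image: crop the decoded picture at the cut-out box."""
    l, t, r, b = box
    return [row[l:r] for row in pic[t:b]]

def render_default_view(pic, rp, view_w, view_h):
    """First display image, simplified: the default (zero look-direction)
    rendering is a view centered on the reference point RP."""
    x, y = rp
    return crop(pic, (x - view_w // 2, y - view_h // 2,
                      x + view_w // 2, y + view_h // 2))

def display_image(pic, rp, cutout_box, view_size, vr_capable):
    """Selectively retrieve the first or second display image."""
    if vr_capable:
        return render_default_view(pic, rp, *view_size)
    return crop(pic, cutout_box)
```

Because the cut-out box's center is constrained to coincide with RP, both branches return the same pixels at the default view, which is the backward-compatibility property the patent claims.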

[0028] As described above, in the present technology, the first display image obtained by rendering the projection picture on the basis of the meta information for rendering, or the second display image obtained by cutting out the projection picture on the basis of the cut-out position information is selectively retrieved. Therefore, these two images can be selectively displayed. In this case, the center of the cut-out position indicated by cut-out position information inserted in a layer of the video stream coincides with the reference point of the projection picture indicated by the meta information for rendering, and the first display image and the second display image form a common image.

Effects of the Invention

[0029] According to the present technology, it is possible to obtain a common image between a VR-compatible terminal and a VR non-compatible terminal when distributing VR content. Note that the effects described herein are not necessarily limited and any effects described in the present disclosure may be applied.

BRIEF DESCRIPTION OF DRAWINGS

[0030] FIG. 1 is a block diagram illustrating a configuration example of a moving picture experts group-dynamic adaptive streaming over HTTP (MPEG-DASH)-based stream distribution system.

[0031] FIG. 2 is a diagram illustrating an example of a relationship between respective structures arranged hierarchically in a media presentation description (MPD) file.

[0032] FIG. 3 is a block diagram illustrating a configuration example of a transmission/reception system serving as an embodiment.

[0033] FIG. 4 is a diagram schematically illustrating a configuration example of the entire system of the transmission/reception system.

[0034] FIG. 5 is a diagram for explaining in-plane packing for obtaining a projection picture from a spherical capture image.

[0035] FIG. 6 is a diagram illustrating a structure example of a sequence parameter set network abstraction layer (SPS NAL) unit in high efficiency video coding (HEVC) encoding.

[0036] FIG. 7 is a diagram for explaining that a center O (p, q) of a cut-out position is adjusted to coincide with a reference point RP (x, y) of the projection picture.

[0037] FIG. 8 is a diagram for explaining that a position indicated by cut-out position information is adjusted to coincide with a position of a default region.

[0038] FIG. 9 is a diagram illustrating a structure example of rendering metadata.

[0039] FIG. 10 is a diagram for explaining each piece of information in the structure example illustrated in FIG. 9.

[0040] FIG. 11 is a diagram illustrating another structure example of the rendering metadata.

[0041] FIG. 12 is a diagram for explaining each piece of information in the structure example illustrated in FIG. 11.

[0042] FIG. 13 is a diagram illustrating the contents of primary information in the structure examples illustrated in FIGS. 9 and 11.

[0043] FIG. 14 is a diagram illustrating an example of an MP4 stream as a distribution stream.

[0044] FIG. 15 is a diagram illustrating a description example of the MPD file.

[0045] FIG. 16 is a diagram illustrating “Value” semantics of “SupplementaryDescriptor”.

[0046] FIG. 17 is a diagram for explaining processing of a VR-compatible terminal and a VR non-compatible terminal with respect to the projection picture.

[0047] FIG. 18 is a block diagram illustrating a configuration example of a service transmission system.

[0048] FIG. 19 is a block diagram illustrating a configuration example of a service receiver (VR-compatible terminal).

[0049] FIG. 20 is a diagram illustrating a display changeover sequence in the service receiver (VR-compatible terminal).

[0050] FIG. 21 is a block diagram illustrating a configuration example of a service receiver (VR non-compatible terminal).

[0051] FIG. 22 is a diagram illustrating a configuration example of a transport stream.

[0052] FIG. 23 is a diagram illustrating a configuration example of an MMT stream.

[0053] FIG. 24 is a diagram illustrating a projection picture whose format type is cross-cubic.

[0054] FIG. 25 is a diagram for explaining the specification of a reference point RP (x, y) and the specification of a cut-out position “Conformance_window”.

[0055] FIG. 26 is a diagram illustrating an example in which six views (regions) of cross-cubic, namely, “top”, “front”, “bottom”, “right”, “back”, and “left” are divided into four partitions and transferred in four MP4 streams.

[0056] FIG. 27 is a diagram illustrating a structure example of rendering metadata in partitioned cross-cubic.

[0057] FIG. 28 is a diagram illustrating another structure example of rendering metadata when the format type is partitioned cross-cubic.

[0058] FIG. 29 is a diagram illustrating the contents of primary information in the structure examples illustrated in FIGS. 27 and 28.

[0059] FIG. 30 is a diagram illustrating a description example of an MPD file when the format type is partitioned cross-cubic.

[0060] FIG. 31 is a diagram schematically illustrating MP4 streams (tracks) corresponding to four partitions.

[0061] FIG. 32 is a diagram illustrating a configuration example of a transport stream.

[0062] FIG. 33 is a diagram illustrating a configuration example of an MMT stream.

[0063] FIG. 34 is a diagram illustrating another configuration example of the transmission/reception system.

[0064] FIG. 35 is a diagram illustrating a structure example of a high-definition multimedia interface (HDMI) info frame for rendering metadata.

[0065] FIG. 36 is a diagram illustrating a structure example of the HDMI info frame for rendering metadata.

MODE FOR CARRYING OUT THE INVENTION

[0066] Modes for carrying out the invention (hereinafter, referred to as “embodiments”) will be described below. Note that the description will be given in the following order.

[0067] 1. Embodiments

[0068] 2. Modifications

1. Embodiments

[0069] [Outline of MPEG-DASH-based Stream Distribution System]

[0070] First, an outline of an MPEG-DASH-based stream distribution system to which the present technology can be applied will be described.

[0071] FIG. 1(a) illustrates a configuration example of an MPEG-DASH-based stream distribution system 30A. In this configuration example, a media stream and a media presentation description (MPD) file are transmitted through a communication network transfer path (communication transfer path). This stream distribution system 30A has a configuration in which N service receivers 33-1, 33-2, … , 33-N are connected to a DASH stream file server 31 and a DASH MPD server 32 via a content delivery network (CDN) 34.

[0072] The DASH stream file server 31 generates a stream segment meeting the DASH specifications (hereinafter referred to as "DASH segment" as appropriate) on the basis of media data (video data, audio data, caption data, and the like) of a predetermined piece of content, and sends out a segment according to an HTTP request from the service receiver. This DASH stream file server 31 may be a dedicated streaming server, or its role is sometimes performed by a web server.

[0073] Furthermore, in response to a request for a segment of a predetermined stream sent from the service receiver 33 (33-1, 33-2, … , 33-N) via the CDN 34, the DASH stream file server 31 transmits the requested segment of the stream to the requesting receiver via the CDN 34. In this case, the service receiver 33 refers to the rate values described in the media presentation description (MPD) file to select a stream with the optimum rate according to the state of the network environment in which the client is located, and makes a request.

[0074] The DASH MPD server 32 is a server that generates an MPD file for acquiring a DASH segment generated in the DASH stream file server 31. The MPD file is generated in accordance with content metadata from a content management server (not illustrated) and the address (url) of a segment generated in the DASH stream file server 31. Note that the DASH stream file server 31 and the DASH MPD server 32 may be physically the same server.

[0075] In the MPD format, each attribute is described using an element called "Representation" for every single stream such as the video stream and the audio stream. For example, in the MPD file, for each of a plurality of video data streams having different rates, the respective rates are each described using an individual representation. The service receiver 33 can select an optimum stream according to the state of the network environment in which the service receiver 33 is located, with reference to the value of each rate, as described above.

[0076] FIG. 1(b) illustrates a configuration example of an MPEG-DASH-based stream distribution system 30B. In this configuration example, the media stream and the MPD file are transmitted through a radio frequency (RF) transfer path (broadcast transfer path). This stream distribution system 30B is constituted by a broadcast sending system 36, to which the DASH stream file server 31 and the DASH MPD server 32 are connected, and M service receivers 35-1, 35-2, … , 35-M.

[0077] In the case of this stream distribution system 30B, the broadcast sending system 36 carries the stream segments meeting the DASH specifications (DASH segments) generated by the DASH stream file server 31, and the MPD file generated by the DASH MPD server 32, on a broadcast wave for transmission.

[0078] FIG. 2 illustrates an example of a relationship between respective structures arranged hierarchically in the MPD file. As illustrated in FIG. 2(a), a media presentation (Media Presentation) for the whole MPD file contains a plurality of periods (Periods) separated by time intervals. For example, the first period starts from zero seconds, the next period starts from 100 seconds, and so forth.

[0079] As illustrated in FIG. 2(b), the period contains a plurality of adaptation sets (AdaptationSet). Each adaptation set depends on variations in media types such as video and audio, and variations in language, viewpoints, and the like even with the same media type. As illustrated in FIG. 2(c), the adaptation set contains a plurality of representations (Representations). Each representation depends on stream attributes, such as variations in rates, for example.

[0080] As illustrated in FIG. 2(d), the representation includes segment info (SegmentInfo). This segment info contains, as illustrated in FIG. 2(e), an initialization segment (Initialization Segment), and a plurality of media segments (Media Segments) that describe information on each segment (Segment) obtained by further separating the period. The media segment contains address (url) information and the like for actually acquiring segment data of video, audio, and the like.

[0081] Note that stream switching can be freely performed between a plurality of representations included in the adaptation set. With this configuration, a stream with the optimum rate can be selected according to the state of the network environment at the receiving side, and video distribution without interruption is enabled.
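The rate-adaptive switching described in paragraphs [0073], [0075], and [0081] can be sketched as follows: the receiver picks the highest-bandwidth representation that fits the measured throughput, falling back to the lowest rate otherwise. The dictionary keys mirror MPD attribute names (`bandwidth`, `id`); the selection policy shown is one simple choice, not one mandated by the text.

```python
def select_representation(representations, measured_bps):
    """Pick the Representation with the highest bandwidth not exceeding
    the measured network throughput; fall back to the lowest-rate one
    when even that exceeds the measurement."""
    viable = [r for r in representations if r["bandwidth"] <= measured_bps]
    if viable:
        return max(viable, key=lambda r: r["bandwidth"])
    return min(representations, key=lambda r: r["bandwidth"])
```

Because all representations in an adaptation set carry the same content, re-running this selection as throughput changes gives the interruption-free switching noted above.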

[0082] [Configuration Example of Transmission/Reception System]

[0083] FIG. 3 illustrates a configuration example of a transmission/reception system 10 serving as an embodiment. This transmission/reception system 10 is constituted by a service transmission system 100 and a service receiver 200. In this transmission/reception system 10, the service transmission system 100 corresponds to the DASH stream file server 31 and the DASH MPD server 32 of the stream distribution system 30A illustrated in FIG. 1(a) described above. Furthermore, in this transmission/reception system 10, the service transmission system 100 corresponds to the DASH stream file server 31, the DASH MPD server 32, and the broadcast sending system 36 of the stream distribution system 30B illustrated in FIG. 1(b) described above.

[0084] In addition, in the transmission/reception system 10, the service receiver 200 corresponds to the service receiver 33 (33-1, 33-2, … , 33-N) of the stream distribution system 30A illustrated in FIG. 1(a) described above. Likewise, in this transmission/reception system 10, the service receiver 200 corresponds to the service receiver 35 (35-1, 35-2, … , 35-M) of the stream distribution system 30B illustrated in FIG. 1(b) described above.

[0085] The service transmission system 100 transmits DASH/MP4, that is, the MPD file as a metafile, and MP4 (ISOBMFF) including a media stream (media segment) of video, audio, or the like, through the communication network transfer path (see FIG. 1(a)) or the RF transfer path (see FIG. 1(b)).

[0086] In this embodiment, a video stream obtained by encoding image data of a rectangular projection picture is included as the media stream. The projection picture is obtained by cutting off a part or the whole of a spherical capture image and performing in-plane packing on the cut-off spherical capture image.

[0087] Meta information for rendering the projection picture is inserted in a layer of a container and/or the video stream. By inserting the meta information for rendering into the layer of the video stream, the meta information for rendering can be dynamically changed regardless of the container type.

[0088] Furthermore, the center of a cut-out position indicated by cut-out position information inserted in a layer of the video stream is adjusted to coincide with the reference point of the projection picture indicated by the meta information for rendering. For example, the projection picture is made up of a plurality of regions including a default region whose position is centered on the reference point, and a position indicated by the cut-out position information is adjusted to coincide with the position of the default region.

[0089] The meta information for rendering has information for calculating the reference point. In addition, the meta information for rendering also has backward compatibility information. This backward compatibility information indicates that the center of the cut-out position indicated by the cut-out position information inserted in a layer of the video stream coincides with the reference point of the projection picture indicated by the meta information for rendering.
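The shape of the rendering metadata described in paragraphs [0088] and [0089] can be sketched as a record like the one below. The actual field syntax is given by the patent's structure examples (FIGS. 9 and 11), which are not reproduced here; these field names are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class RenderingMeta:
    """Illustrative stand-in for the rendering metadata carried in a
    layer of the container and/or the video stream."""
    format_type: int           # in-plane packing format, e.g. 0 = equirectangular,
                               # 1 = cross-cubic (illustrative coding)
    ref_point_x: int           # reference point RP(x, y) on the projection picture
    ref_point_y: int
    backward_compatible: bool  # True: cut-out center coincides with RP
```

A receiver could check `backward_compatible` to learn, without inspecting pixel geometry, that cropping at the cut-out position yields the same view as default rendering.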

[0090] In the MPD file, identification information indicating the fact that the meta information for rendering is inserted in a layer of the container and/or the video stream, and the backward compatibility information, as well as format type information on the projection picture are inserted.

……
……
……