

Patent: Information processing apparatus, information processing method, and program


Publication Number: 20220053179

Publication Date: 20220217

Applicant: Sony

Abstract

An information processing apparatus according to an embodiment of the present technology includes a processor. The processor switches between display of a first real space image and display of a second real space image by performing switching processing on the basis of metadata related to the switching between the display of the first real space image and the display of the second real space image, the switching processing corresponding to an angle of view of the first real space image, the first real space image being displayed on a virtual space, the second real space image being displayed on a region including a region, in the virtual space, on which the first real space image is displayed, the region on which the second real space image is displayed being larger than the region on which the first real space image is displayed.

Claims

  1. An information processing apparatus, comprising a processor that switches between display of a first real space image and display of a second real space image by performing switching processing on a basis of metadata related to the switching between the display of the first real space image and the display of the second real space image, the switching processing corresponding to an angle of view of the first real space image, the first real space image being displayed on a virtual space, the second real space image being displayed on a region including a region, in the virtual space, on which the first real space image is displayed, the region on which the second real space image is displayed being larger than the region on which the first real space image is displayed.

  2. The information processing apparatus according to claim 1, wherein on the basis of the metadata, the processor determines whether the time has come to perform the switching processing, and the processor performs the switching processing when the time has come to perform the switching processing.

  3. The information processing apparatus according to claim 1, wherein on the basis of the metadata, the processor determines whether a switching condition for performing the switching processing is satisfied, and the processor performs the switching processing when the switching condition is satisfied.

  4. The information processing apparatus according to claim 3, wherein the switching condition includes a condition that a difference in image-capturing position between the first real space image and the second real space image is equal to or less than a specified threshold.

  5. The information processing apparatus according to claim 3, wherein the switching condition includes a condition that a difference in image-capturing time between the first real space image and the second real space image is equal to or less than a specified threshold.

  6. The information processing apparatus according to claim 1, wherein the switching processing includes generating a restriction image in which display on a range other than a corresponding range in the second real space image is restricted, the corresponding range corresponding to the angle of view of the first real space image, and switching between the display of the first real space image and display of the restriction image.

  7. The information processing apparatus according to claim 6, wherein the switching processing includes changing a size of the first real space image such that the first real space image has a size of the corresponding range in the second real space image, and then switching between the display of the first real space image and the display of the restriction image.

  8. The information processing apparatus according to claim 6, wherein the switching processing includes generating the restriction image such that display content displayed on the corresponding range in the restriction image and display content of the first real space image are the same display content.

  9. The information processing apparatus according to claim 1, wherein the first real space image is an image captured from a specified image-capturing position in a real space.

  10. The information processing apparatus according to claim 1, wherein the second real space image is an image obtained by combining a plurality of images captured from a specified image-capturing position in a real space.

  11. The information processing apparatus according to claim 1, wherein the second real space image is a full 360-degree spherical image.

  12. The information processing apparatus according to claim 1, wherein the first real space image is a moving image including a plurality of frame images, and the processor switches between display of a specified frame image from among the plurality of frame images of the first real space image and the display of the second real space image.

  13. The information processing apparatus according to claim 12, wherein the second real space image is a moving image including a plurality of frame images, and the processor switches between the display of the specified frame image of the first real space image and display of a specified frame image from among the plurality of frame images of the second real space image.

  14. The information processing apparatus according to claim 1, wherein the metadata includes information regarding the angle of view of the first real space image.

  15. The information processing apparatus according to claim 1, wherein the metadata includes first image-capturing information including an image-capturing position of the first real space image, and second image-capturing information including an image-capturing position of the second real space image.

  16. The information processing apparatus according to claim 15, wherein the first image-capturing information includes an image-capturing direction and an image-capturing time of the first real space image, and the second image-capturing information includes an image-capturing time of the second real space image.

  17. The information processing apparatus according to claim 1, wherein the metadata includes information regarding a timing of performing switching processing.

  18. The information processing apparatus according to claim 1, wherein the processor controls the display of the first real space image and the display of the second real space image on a head-mounted display (HMD).

  19. An information processing method that is performed by a computer system, the information processing method comprising switching between display of a first real space image and display of a second real space image by performing switching processing on a basis of metadata related to the switching between the display of the first real space image and the display of the second real space image, the switching processing corresponding to an angle of view of the first real space image, the first real space image being displayed on a virtual space, the second real space image being displayed on a region including a region, in the virtual space, on which the first real space image is displayed, the region on which the second real space image is displayed being larger than the region on which the first real space image is displayed.

  20. A program that causes a computer system to perform a process comprising switching between display of a first real space image and display of a second real space image by performing switching processing on a basis of metadata related to the switching between the display of the first real space image and the display of the second real space image, the switching processing corresponding to an angle of view of the first real space image, the first real space image being displayed on a virtual space, the second real space image being displayed on a region including a region, in the virtual space, on which the first real space image is displayed, the region on which the second real space image is displayed being larger than the region on which the first real space image is displayed.

Description

TECHNICAL FIELD

[0001] The present technology relates to an information processing apparatus, an information processing method, and a program that are applicable to display of, for example, a full 360-degree spherical video.

BACKGROUND ART

[0002] Patent Literature 1 discloses an image processing apparatus in which, when a captured panoramic image is created, another captured image such as a moving image or a high-resolution image is attached to the captured panoramic image to be integrated with the captured panoramic image. This makes it possible to create a panoramic image that provides a greater sense of realism and a greater sense of immersion without imposing an excessive burden on a user (for example, paragraph [0075] of the specification in Patent Literature 1).

CITATION LIST

Patent Literature

[0003] Patent Literature 1: Japanese Patent Application Laid-open No. 2018-11302

DISCLOSURE OF INVENTION

Technical Problem

[0004] There is a need for a technology that can provide a high-quality viewing experience in, for example, a system that enables viewing of a panoramic video, a full 360-degree spherical video, and the like using, for example, a head-mounted display (HMD).

[0005] In view of the circumstances described above, it is an object of the present technology to provide an information processing apparatus, an information processing method, and a program that are capable of providing a high-quality viewing experience.

Solution to Problem

[0006] In order to achieve the object described above, an information processing apparatus according to an embodiment of the present technology includes a processor.

[0007] The processor switches between display of a first real space image and display of a second real space image by performing switching processing on the basis of metadata related to the switching between the display of the first real space image and the display of the second real space image, the switching processing corresponding to an angle of view of the first real space image, the first real space image being displayed on a virtual space, the second real space image being displayed on a region including a region, in the virtual space, on which the first real space image is displayed, the region on which the second real space image is displayed being larger than the region on which the first real space image is displayed.

[0008] In this information processing apparatus, switching processing corresponding to an angle of view of the first real space image is performed on the basis of metadata related to display switching, and switching is performed between display of the first real space image and display of the second real space image. This makes it possible to provide a high-quality viewing experience.

[0009] The processor may determine, on the basis of the metadata, whether the time has come to perform the switching processing, and the processor may perform the switching processing when the time has come to perform the switching processing.

[0010] The processor may determine, on the basis of the metadata, whether a switching condition for performing the switching processing is satisfied, and the processor may perform the switching processing when the switching condition is satisfied.

[0011] The switching condition may include a condition that a difference in image-capturing position between the first real space image and the second real space image is equal to or less than a specified threshold.

[0012] The switching condition may include a condition that a difference in image-capturing time between the first real space image and the second real space image is equal to or less than a specified threshold.
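
As an illustration only, the two conditions above amount to simple threshold comparisons on the image-capturing metadata. The following sketch assumes hypothetical field names, units, and threshold values that are not specified in the present disclosure.

```python
import math

# Hypothetical metadata records; field names, units, and values are illustrative only.
first_meta = {"position": (1.0, 0.0, 1.5), "capture_time": 120.00}    # first real space image
second_meta = {"position": (1.1, 0.0, 1.5), "capture_time": 120.03}   # second real space image

POSITION_THRESHOLD = 0.5  # specified threshold for the position difference (assumed meters)
TIME_THRESHOLD = 0.1      # specified threshold for the time difference (assumed seconds)

def switching_condition_satisfied(first, second):
    """Return True when both example switching conditions hold."""
    dx, dy, dz = (a - b for a, b in zip(first["position"], second["position"]))
    position_diff = math.sqrt(dx * dx + dy * dy + dz * dz)
    time_diff = abs(first["capture_time"] - second["capture_time"])
    return position_diff <= POSITION_THRESHOLD and time_diff <= TIME_THRESHOLD

print(switching_condition_satisfied(first_meta, second_meta))  # True for the sample values
```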

[0013] The switching processing may include generating a restriction image in which display on a range other than a corresponding range in the second real space image is restricted, the corresponding range corresponding to the angle of view of the first real space image; and switching between the display of the first real space image and display of the restriction image.

[0014] The switching processing may include changing a size of the first real space image such that the first real space image has a size of the corresponding range in the second real space image, and then switching between the display of the first real space image and the display of the restriction image.

[0015] The switching processing may include generating the restriction image such that display content displayed on the corresponding range in the restriction image and display content of the first real space image are the same display content.
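
A minimal sketch of the restriction image described in the three paragraphs above is shown below, treating the images as plain arrays. The array layout, the helper name, and the rectangular description of the corresponding range are assumptions made here for illustration, not the implementation of the present technology.

```python
import numpy as np

def make_restriction_image(second_image, first_image, corresponding_range, fill_value=0):
    """Restrict display outside the range of the second image that corresponds to
    the angle of view of the first image, and fill that range with the content
    of the first image (assumed array layout: height x width x channels).

    corresponding_range is (top, left, height, width) in the second image.
    """
    top, left, height, width = corresponding_range
    restriction = np.full_like(second_image, fill_value)  # everything outside the range is restricted
    # The step of resizing the first image (cf. the preceding paragraph on changing
    # its size) is omitted; first_image is assumed to already match the range.
    restriction[top:top + height, left:left + width] = first_image
    return restriction

# Usage with dummy data
second = np.zeros((1080, 1920, 3), dtype=np.uint8)   # e.g., a frame of the second real space image
first = np.ones((540, 960, 3), dtype=np.uint8)       # first real space image, pre-resized
masked = make_restriction_image(second, first, (270, 480, 540, 960))
```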

[0016] The first real space image may be an image captured from a specified image-capturing position in a real space.

[0017] The second real space image may be an image obtained by combining a plurality of images captured from a specified image-capturing position in a real space.

[0018] The second real space image may be a full 360-degree spherical image.

[0019] The first real space image may be a moving image including a plurality of frame images. In this case, the processor may switch between display of a specified frame image from among the plurality of frame images of the first real space image and the display of the second real space image.

[0020] The second real space image may be a moving image including a plurality of frame images. In this case, the processor may switch between the display of the specified frame image of the first real space image and display of a specified frame image from among the plurality of frame images of the second real space image.

[0021] The metadata may include information regarding the angle of view of the first real space image.

[0022] The metadata may include first image-capturing information including an image-capturing position of the first real space image, and second image-capturing information including an image-capturing position of the second real space image.

[0023] The first image-capturing information may include an image-capturing direction and an image-capturing time of the first real space image. In this case, the second image-capturing information may include an image-capturing time of the second real space image.

[0024] The metadata may include information regarding a timing of performing switching processing.

[0025] The processor may control the display of the first real space image and the display of the second real space image on a head-mounted display (HMD).

[0026] An information processing method according to an embodiment of the present technology is an information processing method that is performed by a computer system, the information processing method including switching between display of a first real space image and display of a second real space image by performing switching processing on the basis of metadata related to the switching between the display of the first real space image and the display of the second real space image, the switching processing corresponding to an angle of view of the first real space image, the first real space image being displayed on a virtual space, the second real space image being displayed on a region including a region, in the virtual space, on which the first real space image is displayed, the region on which the second real space image is displayed being larger than the region on which the first real space image is displayed.

[0027] A program according to an embodiment of the present technology causes a computer system to perform a process including switching between display of a first real space image and display of a second real space image by performing switching processing on the basis of metadata related to the switching between the display of the first real space image and the display of the second real space image, the switching processing corresponding to an angle of view of the first real space image, the first real space image being displayed on a virtual space, the second real space image being displayed on a region including a region, in the virtual space, on which the first real space image is displayed, the region on which the second real space image is displayed being larger than the region on which the first real space image is displayed.

Advantageous Effects of Invention

[0028] As described above, the present technology makes it possible to provide a high-quality viewing experience. Note that the effect described here is not necessarily limitative, and any of the effects described in the present disclosure may be provided.

BRIEF DESCRIPTION OF DRAWINGS

[0029] FIG. 1 schematically illustrates an example of a configuration of a VR providing system according to an embodiment of the present technology.

[0030] FIG. 2 illustrates an example of a configuration of an HMD.

[0031] FIG. 3 is a block diagram illustrating an example of a functional configuration of the HMD.

[0032] FIG. 4 is a block diagram illustrating an example of a functional configuration of a server apparatus.

[0033] FIG. 5 is a schematic diagram for describing planar video data.

[0034] FIG. 6 schematically illustrates a planar video displayed by the HMD.

[0035] FIG. 7 is a schematic diagram for describing full 360-degree spherical video data.

[0036] FIG. 8 schematically illustrates a full 360-degree spherical video displayed by the HMD.

[0037] FIG. 9 illustrates an example of metadata.

[0038] FIG. 10 illustrates an example of the metadata.

[0039] FIG. 11 illustrates an example of the metadata.

[0040] FIG. 12 is a flowchart illustrating an example of processing of display switching from the full 360-degree spherical video to the planar video.

[0041] FIG. 13 is a flowchart illustrating an example of processing of display switching from the planar video to the full 360-degree spherical video.

[0042] FIG. 14 is a schematic diagram for describing an example of controlling the full 360-degree spherical video.

[0043] FIG. 15 is a schematic diagram for describing an example of controlling the planar video.

[0044] FIG. 16 schematically illustrates an example of how a video looks to a user when display switching processing is performed.

[0045] FIG. 17 schematically illustrates an example of a transition image.

[0046] FIG. 18 schematically illustrates an example of how a video looks to a user when display switching processing is performed.

[0047] FIG. 19 is a block diagram illustrating an example of a configuration of hardware of the server apparatus.

MODE(S) FOR CARRYING OUT THE INVENTION

[0048] Embodiments according to the present technology will now be described below with reference to the drawings.

[0049] [Virtual Reality (VR) Providing System]

[0050] FIG. 1 schematically illustrates an example of a configuration of a VR providing system according to an embodiment of the present technology. A VR providing system 100 corresponds to an embodiment of an information processing system according to the present technology.

[0051] The VR providing system 100 includes an HMD 10 and a server apparatus 50.

[0052] The HMD 10 is used by being attached to the head of a user 1. The number of HMDs 10 included in the VR providing system 100 is not limited, although a single HMD 10 is illustrated in FIG. 1. In other words, the number of users 1 allowed to simultaneously participate in the VR providing system 100 is not limited.

[0053] The server apparatus 50 is communicatively connected to the HMD 10 through a network 3. The server apparatus 50 is capable of receiving various information from the HMD 10 through the network 3. Further, the server apparatus 50 is capable of storing various information in a database 60, and is capable of reading various information stored in the database 60 to transmit the read information to the HMD 10.

[0054] In the present embodiment, the database 60 stores therein full 360-degree spherical video data 61, planar video data 62, and metadata 63 (all of which are illustrated in FIG. 4). In the present embodiment, the server apparatus 50 transmits, to the HMD 10, content that includes display of a full 360-degree spherical video and display of a planar video. Further, the server apparatus 50 controls display of the full 360-degree spherical video and display of the planar video on the HMD 10. The server apparatus 50 serves as an embodiment of an information processing apparatus according to the present technology.

[0055] Note that, in the present disclosure, an "image" includes both a still image and a moving image. Further, a video is a type of moving image. Thus, an "image" also includes a video.

[0056] The network 3 is built using, for example, the Internet or a wide area communication network. Moreover, any wide area network (WAN), any local area network (LAN), or the like may be used, and the protocol used to build the network 3 is not limited.

[0057] In the present embodiment, so-called cloud services are provided by the network 3, the server apparatus 50, and the database 60. Thus, the HMD 10 is also considered to be connected to a cloud network.

[0058] Note that the method for communicatively connecting the server apparatus 50 and the HMD 10 is not limited. For example, the server apparatus 50 and the HMD 10 may be connected using near field communication such as Bluetooth (registered trademark) without building a cloud network.

[0059] [HMD]

[0060] FIG. 2 illustrates an example of a configuration of the HMD 10. A of FIG. 2 is a schematic perspective view of an appearance of the HMD 10, and B of FIG. 2 is a schematic exploded perspective view of the HMD 10.

[0061] The HMD 10 includes a base 11, an attachment band 12, a headphone 13, a display unit 14, an inward-oriented camera 15 (15a, 15b), an outward-oriented camera 16, and a cover 17.

[0062] The base 11 is a member arranged in front of the right and left eyes of the user 1, and the base 11 is provided with a front-of-head support 18 that is brought into contact with the front of the head of the user 1.

[0063] The attachment band 12 is attached to the head of the user 1. As illustrated in FIG. 2, the attachment band 12 includes a side-of-head band 19 and a top-of-head band 20. The side-of-head band 19 is connected to the base 11, and is attached to surround the head of the user 1 from the side to the back of the head. The top-of-head band 20 is connected to the side-of-head band 19, and is attached to surround the head of the user 1 from the side to the top of the head.

[0064] The headphone 13 is connected to the base 11 and arranged to cover the right and left ears of the user 1. The headphone 13 includes right and left speakers. The position of the headphone 13 is manually or automatically controllable. The configuration for that is not limited, and any configuration may be adopted.

[0065] The display unit 14 is inserted into the base 11 and arranged in front of the eyes of the user 1. A display 22 (refer to FIG. 3) is arranged within the display unit 14. Any display device using, for example, liquid crystal or electroluminescence (EL) may be used as the display 22. Further, a lens system (of which an illustration is omitted) that guides an image displayed using the display 22 to the right and left eyes of the user 1 is arranged in the display unit 14.

[0066] The inward-oriented camera 15 includes a left-eye camera 15a and a right-eye camera 15b that are respectively capable of capturing images of the left eye and the right eye of the user 1. The left-eye camera 15a and the right-eye camera 15b are respectively arranged in specified positions in the HMD 10, specifically, in specified positions in the base 11. For example, it is possible to detect, for example, line-of-sight information regarding a line of sight of the user 1 on the basis of the images of the left eye and the right eye that are respectively captured by the left-eye camera 15a and the right-eye camera 15b.

[0067] A digital camera that includes, for example, an image sensor such as a complementary metal-oxide semiconductor (CMOS) sensor or a charge coupled device (CCD) sensor is used as the left-eye camera 15a and the right-eye camera 15b. Further, for example, an infrared camera that includes an infrared illumination such as an infrared LED may be used.

[0068] The outward-oriented camera 16 is arranged in a center portion of the cover 17 to be oriented outward (toward the side opposite to the user 1). The outward-oriented camera 16 is capable of capturing an image of a real space on a front side of the user 1. A digital camera that includes, for example, an image sensor such as a CMOS sensor or a CCD sensor is used as the outward-oriented camera 16.

[0069] The cover 17 is mounted on the base 11, and is configured to cover the display unit 14. The HMD 10 having such a configuration serves as an immersive head-mounted display configured to cover the field of view of the user 1. For example, a three-dimensional virtual space is displayed by the HMD 10. When the user 1 wears the HMD 10, this provides, for example, a virtual reality (VR) experience to the user 1.

[0070] FIG. 3 is a block diagram illustrating an example of a functional configuration of the HMD 10. The HMD 10 further includes a connector 23, an operation button 24, a communication section 25, a sensor section 26, a storage 27, and a controller 28.

[0071] The connector 23 is a terminal used to establish a connection with another device. For example, a terminal such as a universal serial bus (USB) terminal or a high-definition multimedia interface (HDMI) (registered trademark) terminal is provided. Further, during charging, the connector 23 is connected to a charging terminal of a charging dock (cradle) to perform charging.

[0072] The operation button 24 is provided at, for example, a specified position in the base 11. The operation button 24 makes it possible to perform an ON/OFF operation of a power supply, and operations related to various functions of the HMD 10, such as functions related to image display and sound output, and a network communication function.

[0073] The communication section 25 is a module used to perform network communication, near-field communication, or the like with another device. For example, a wireless LAN module such as Wi-Fi, or a communication module such as Bluetooth is provided. When the communication section 25 is operated, this makes it possible to perform wireless communication with the server apparatus 50.

[0074] The sensor section 26 includes a nine-axis sensor 29, a GPS 30, a biological sensor 31, and a microphone 32.

[0075] The nine-axis sensor 29 includes a three-axis acceleration sensor, a three-axis gyroscope, and a three-axis compass sensor. The nine-axis sensor 29 makes it possible to detect acceleration, angular velocity, and azimuth of the HMD 10 in three axes. The GPS 30 acquires information regarding the current position of the HMD 10. Results of detection performed by the nine-axis sensor 29 and the GPS 30 are used to detect, for example, the pose and the position of the user 1 (the HMD 10), and the movement (motion) of the user 1. These sensors are provided at, for example, specified positions in the base 11.

[0076] The biological sensor 31 is capable of detecting biological information regarding the user 1. For example, a brain wave sensor, a myoelectric sensor, a pulse sensor, a perspiration sensor, a temperature sensor, a blood flow sensor, a body motion sensor, and the like are provided as the biological sensor 31.

[0077] The microphone 32 detects information regarding sound around the user 1. For example, a voice from speech of the user 1 is detected as appropriate. This enables the user 1 to, for example, enjoy a VR experience while making a voice call, and to operate the HMD 10 using voice input.

[0078] The type of sensor provided as the sensor section 26 is not limited, and any sensor may be provided. For example, a temperature sensor, a humidity sensor, or the like that is capable of measuring a temperature, humidity, or the like of the environment in which the HMD 10 is used may be provided. The inward-oriented camera 15 and the outward-oriented camera 16 can also be considered a portion of the sensor section 26.

[0079] The storage 27 is a nonvolatile storage device, and, for example, a hard disk drive (HDD), a solid state drive (SSD), or the like is used. Moreover, any non-transitory computer readable storage medium may be used.

[0080] The storage 27 stores therein a control program 33 used to control an operation of the overall HMD 10. The method for installing the control program 33 on the HMD 10 is not limited.

[0081] The controller 28 controls operations of the respective blocks of the HMD 10. The controller 28 is configured by hardware, such as a CPU and a memory (a RAM and a ROM), that is necessary for a computer. Various processes are performed by the CPU loading, into the RAM, the control program 33 stored in the storage 27 and executing the control program 33.

[0082] For example, a programmable logic device (PLD) such as a field programmable gate array (FPGA), or other devices such as an application specific integrated circuit (ASIC) may be used as the controller 28.

[0083] In the present embodiment, a tracking section 35, a display control section 36, and an instruction determination section 37 are implemented as functional blocks by the CPU of the controller 28 executing a program (such as an application program) according to the present embodiment. Then, the information processing method according to the present embodiment is performed by these functional blocks. Note that, in order to implement each functional block, dedicated hardware such as an integrated circuit (IC) may be used as appropriate.

[0084] The tracking section 35 performs head tracking for detecting the movement of the head of the user 1, and eye tracking for detecting a side-to-side movement of a line of sight of the user 1. In other words, the tracking section 35 makes it possible to detect in which direction the HMD 10 is oriented and in which direction the line of sight of the user 1 is oriented. Data of tracking detected by the tracking section 35 is included in information regarding a pose of the user 1 (the HMD 10) and information regarding a line of sight of the user 1 (the HMD 10).

[0085] The head tracking and the eye tracking are calculated on the basis of a result of detection performed by the sensor section 26 and images captured by the inward-oriented camera 15 and the outward-oriented camera 16. The algorithm used to perform the head tracking and the eye tracking is not limited, and any algorithm may be used. Any machine-learning algorithm using, for example, a deep neural network (DNN) may be used. For example, it is possible to improve the tracking accuracy by using, for example, artificial intelligence (AI) that performs deep learning.
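
As the paragraph above notes, any algorithm may be used; as one conventional (non-limiting) example, head orientation can be estimated by fusing the gyroscope and accelerometer outputs of the nine-axis sensor with a complementary filter. The function below is a generic sketch under common axis conventions, not the implementation of the present embodiment.

```python
import math

def complementary_filter(pitch, roll, gyro, accel, dt, alpha=0.98):
    """Generic pitch/roll estimate from gyroscope rates (rad/s) and
    accelerometer readings (m/s^2); alpha weights the gyro integration."""
    gx, gy, _ = gyro
    ax, ay, az = accel
    # Short-term estimate: integrate angular velocity (accurate but drifts).
    pitch_gyro = pitch + gx * dt
    roll_gyro = roll + gy * dt
    # Long-term estimate: tilt from the gravity direction (stable but noisy).
    pitch_acc = math.atan2(ay, math.sqrt(ax * ax + az * az))
    roll_acc = math.atan2(-ax, az)
    # Blend the two estimates.
    return (alpha * pitch_gyro + (1 - alpha) * pitch_acc,
            alpha * roll_gyro + (1 - alpha) * roll_acc)
```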

[0086] The display control section 36 controls an image display performed using the display unit 14 (the display 22). The display control section 36 performs, for example, image processing and a display control as appropriate. In the present embodiment, rendering data used to display an image on the display 22 is transmitted to the HMD 10 by the server apparatus 50. The display control section 36 performs image processing and a display control on the basis of the rendering data transmitted by the server apparatus 50, and displays the image on the display 22.

[0087] The instruction determination section 37 determines an instruction that is input by the user 1. For example, the instruction determination section 37 determines the instruction of the user 1 on the basis of an operation signal generated in response to an operation performed on the operation button 24. Further, the instruction determination section 37 determines the instruction of the user 1 on the basis of a voice of the user 1 that is input through the microphone 32.

[0088] Further, for example, the instruction determination section 37 determines the instruction of the user 1 on the basis of a gesture that is given using the hand or the like of the user 1 and of which an image is captured by the outward-oriented camera 16. Furthermore, it is also possible to determine the instruction of the user 1 on the basis of the movement of a line of sight of the user 1. Of course, the determination of the instruction is not limited to being performed when it is possible to perform all of voice input, gesture input, and input using the movement of a line of sight. Moreover, another method for inputting an instruction may also be performed.

[0089] A specific algorithm used to determine an instruction input by the user 1 is not limited, and any technique may be used. Further, any machine-learning algorithm may also be used.

[0090] [Server Apparatus]

[0091] FIG. 4 is a block diagram illustrating an example of a functional configuration of the server apparatus 50.

[0092] The server apparatus 50 includes hardware, such as a CPU, a ROM, a RAM, and an HDD, that is necessary for a configuration of a computer (refer to FIG. 19). A decoder 51, a meta-parser 52, a user interface 53, a switching timing determination section 54, a parallax determination section 55, a switching determination section 56, a section 57 for controlling a full 360-degree spherical video, a planar video control section 58, and a rendering section 59 are implemented as functional blocks by the CPU loading, into the RAM, a program according to the present technology that has been recorded in the ROM or the like and executing the program, and this results in the information processing method according to the present technology being performed.

[0093] The server apparatus 50 can be implemented by any computer such as a personal computer (PC). Of course, hardware such as an FPGA or an ASIC may be used. In order to implement each block illustrated in FIG. 4, dedicated hardware such as an integrated circuit (IC) may be used.

[0094] The program is installed on the server apparatus 50 through, for example, various recording media. Alternatively, the installation of the program may be performed via, for example, the Internet.

[0095] The decoder 51 decodes the full 360-degree spherical video data 61 and the planar video data 62 that are read from the database 60. The decoded full 360-degree spherical video data 61 is output to the section 57 for controlling a full 360-degree spherical video. The decoded planar video data 62 is output to the planar video control section 58. Note that encode/decode formats and the like for image data are not limited.

[0096] The meta-parser 52 reads metadata 63 from the database 60 and outputs the read metadata 63 to the switching timing determination section 54 and the parallax determination section 55. The metadata 63 is metadata related to switching between display of a full 360-degree spherical video and display of a planar video, and will be described in detail later.

[0097] The user interface 53 receives tracking data transmitted from the HMD 10 and an instruction input by the user 1. The received tracking data and input instruction are output as appropriate to the switching determination section 56 and the planar video control section 58.

[0098] The switching timing determination section 54, the parallax determination section 55, the switching determination section 56, the section 57 for controlling a full 360-degree spherical video, the planar video control section 58, and the rendering section 59 are blocks used to perform display switching processing according to the present technology. The display switching processing according to the present technology is processing performed to switch between display of a full 360-degree spherical video (a full 360-degree spherical image) and display of a planar video (a planar image), and corresponds to switching processing.

[0099] In the present embodiment, an embodiment of a processor according to the present technology is implemented by functions of the switching timing determination section 54, the parallax determination section 55, the switching determination section 56, the section 57 for controlling a full 360-degree spherical video, the planar video control section 58, and the rendering section 59. Thus, it can also be said that an embodiment of the processor according to the present technology is implemented by hardware, such as a CPU, that configures a computer. The respective blocks that are the switching timing determination section 54 and the others will be described together with the display switching processing described later.

[0100] Note that the server apparatus 50 includes a communication section (refer to FIG. 19) used to perform network communication, near-field communication, or the like with another device. When the communication section is operated, this makes it possible to perform wireless communication with the HMD 10.

[0101] [Planar Video]

[0102] FIG. 5 is a schematic diagram for describing planar video data. The planar video data 62 is data of a moving image that includes a plurality of frame images 64.

[0103] An image (a video) and its image data (video data) may be referred to interchangeably below. For example, the planar video 62 is denoted by the same reference numeral as the planar video data 62.

[0104] In the present embodiment, a moving image is captured from a specified image-capturing position in a specified real space in order to create desired VR content. In other words, in the present embodiment, the planar video 62 is generated using a real space image that is an image of a real space. Further, in the present embodiment, the planar video 62 corresponds to a rectangle-shaped video of a real space that is captured using perspective projection.

[0105] The specified real space is a real space that is selected to obtain a virtual space, and any place such as indoor places including, for example, a stadium and a concert hall, and outdoor places including, for example, a mountain and a river, may be selected. The image-capturing position is also selected as appropriate. For example, any image-capturing position such as an entrance of a stadium, a specified auditorium, an entrance of a mountain trail, and a top of a mountain, may be selected.

[0106] In the present embodiment, the rectangular frame image 64 is generated by performing image-capturing at a specified aspect ratio and a specified resolution. The plurality of frame images 64 is captured at a specified frame rate to generate the planar video 62. The frame image 64 of the planar video 62 is hereinafter referred to as a planar frame image 64.

[0107] For example, a full HD image with 1920 pixels in width and 1080 pixels in height that has an aspect ratio of 16:9 is captured at 60 frames per second. Of course, the planar frame image 64 is not limited to this, and the aspect ratio, the resolution, the frame rate, and the like of the planar frame image 64 may be set discretionarily. Further, the shape of the planar video 62 (the planar frame image 64) is not limited to a rectangular shape. The present technology is also applicable to an image having another shape such as a circle or a triangle.

[0108] FIG. 6 schematically illustrates the planar video 62 displayed by the HMD 10. A of FIG. 6 illustrates the user 1 who is looking at the planar video 62 as viewed from the front (from the side of the planar video 62). B of FIG. 6 illustrates the user 1 who is looking at the planar video 62 as viewed from the diagonally rear of the user 1.

[0109] In the present embodiment, a space covering the complete 360 degrees circumference of the user 1 who is wearing the HMD 10, from back and forth, from side to side, and up and down, is a virtual space S represented by VR content. In other words, the user 1 is looking at a region in the virtual space S when the user 1 faces any direction around the user 1.

[0110] As illustrated in FIG. 6, the planar video 62 (the planar frame image 64) is displayed on the display 22 of the HMD 10. For the user 1 who is wearing the HMD 10, the planar video 62 is displayed on a region that is a portion of the virtual space S. The region, in the virtual space S, on which the planar video 62 is displayed is hereinafter referred to as a first display region R1.

[0111] For example, the planar video 62 is displayed on the front of the user 1. Thus, the position of the first display region R1 on which the planar video 62 is displayed can be changed according to, for example, the movement of the head of the user 1. Of course, it is also possible to adopt a display method that includes displaying the planar video 62 at a specified position in a fixed manner, which does not allow the user 1 to view the planar video 62 unless the user 1 looks in that direction.

[0112] Further, the size and the like of the planar video 62 can be changed by, for example, an instruction being given by the user 1. When the size of the planar video 62 is changed, the size of the first display region R1 is also changed. Note that, for example, in the virtual space S, a background image or the like is displayed on a region other than the first display region R1 on which the planar video 62 is displayed. The background image may be a single-color image such as a black or green image, or may be an image related to content. The background image may be generated using, for example, three-dimensional or two-dimensional CG.

[0113] In the present embodiment, the planar video 62 (the planar frame image 64) corresponds to a first real space image displayed on a virtual space. Further, the planar video 62 (the planar frame image 64) corresponds to an image captured from a specified image-capturing position in a real space. Note that the planar video 62 can also be referred to as an image having a specified shape. In the present embodiment, a rectangular shape is adopted as the specified shape, but another shape such as a circular shape may be adopted as the specified shape.

[0114] [Full 360-Degree Spherical Video]

[0115] FIG. 7 is a schematic diagram for describing full 360-degree spherical video data. In the present embodiment, a plurality of real space images 66 is captured from a specified image-capturing position in a specified real space. The plurality of real space images 66 is captured in different image-capturing directions from the same image-capturing position so as to cover a real space covering the complete 360 degrees circumference from back and forth, from side to side, and up and down. Further, the plurality of real space images 66 is captured such that the angles of view (the image-capturing ranges) of adjacent captured images overlap.

[0116] When the plurality of real space images 66 is combined on the basis of a specified format, this results in generating the full 360-degree spherical video data 61 illustrated in FIG. 7. In the present embodiment, the plurality of real space images 66 captured using perspective projection is combined on the basis of a specified format. Examples of a format used to generate the full 360-degree spherical video data 61 include equirectangular projection and a cubemap. Of course, the format is not limited to this, and any projection method or the like may be used. Note that FIG. 7 merely schematically illustrates the full 360-degree spherical video data 61.
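
For reference, equirectangular projection maps each viewing direction (longitude and latitude around the image-capturing position) to a pixel of a rectangular image. The sketch below illustrates that mapping with assumed angle conventions; it is not tied to the format actually used for the full 360-degree spherical video data 61.

```python
import math

def direction_to_equirect_pixel(yaw, pitch, width, height):
    """Map a viewing direction to pixel coordinates in an equirectangular image.

    yaw:   horizontal angle in radians in [-pi, pi), 0 = forward
    pitch: vertical angle in radians in [-pi/2, pi/2], 0 = horizon
    """
    u = (yaw + math.pi) / (2.0 * math.pi) * width   # longitude -> column
    v = (math.pi / 2.0 - pitch) / math.pi * height  # latitude  -> row
    return int(u) % width, min(int(v), height - 1)

# Example: the forward, horizontal direction maps to the center of the image.
print(direction_to_equirect_pixel(0.0, 0.0, 3840, 1920))  # (1920, 960)
```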

[0117] FIG. 8 schematically illustrates the full 360-degree spherical video 61 displayed by the HMD 10. A of FIG. 8 illustrates the user 1 who is looking at the full 360-degree spherical video 61 as viewed from the front. B of FIG. 8 illustrates the user 1 who is looking at the full 360-degree spherical video 61 as viewed from the diagonally rear of the user 1.

[0118] In the present embodiment, the full 360-degree spherical video data 61 is attached to a sphere virtually arranged around the HMD 10 (the user 1). Thus, for the user 1 who is wearing the HMD 10, the full 360-degree spherical video 61 is displayed on an entire region of the virtual space S covering the complete 360 degrees circumference from back and forth, from side to side, and up and down. This results in being able to provide a considerably great sense of immersion into content, and to provide the user 1 with an excellent viewing experience.

[0119] The region, in the virtual space S, on which the full 360-degree spherical video 61 is displayed is referred to as a second display region R2. The second display region R2 is all of the region in the virtual space S around the user 1. Compared with the first display region R1 on which the planar video 62 illustrated in FIG. 6 is displayed, the second display region R2 is a region that includes the first display region R1 and is larger than the first display region R1.

[0120] FIG. 8 illustrates a display region 67 of the display 22. A range in the full 360-degree spherical video 61 that can be viewed by the user 1 is a range corresponding to the display region 67 of the display 22. The position of the display region 67 of the display 22 is changed according to, for example, the movement of the head of the user 1, and the viewable range in the full 360-degree spherical video 61 is changed. This enables the user 1 to view the full 360-degree spherical video 61 in all directions.

[0121] Note that, in FIG. 8, the display region 67 of the display 22 has a shape along an inner peripheral surface of a sphere. Actually, a rectangular image similar to the planar video 62 illustrated in FIG. 6 is displayed on the display 22. This provides the user 1 with a visual effect in which the video covers the surroundings of the user 1.

[0122] In the present disclosure, a display region of an image in the virtual space S refers to a region, in the virtual space S, on which the image is to be displayed, and not a region corresponding to a range actually displayed by the display 22. Thus, the first display region R1 is a rectangular region corresponding to the planar video 62 in the virtual space S. The second display region R2 is an entire region of the virtual space S that corresponds to the full 360-degree spherical video 61 and covers the complete 360 degrees circumference from back and forth, from side to side, and up and down.

[0123] Further, in the present embodiment, moving images each including a plurality of frame images are captured as the plurality of real space images 66 illustrated in FIG. 7. Then, for example, the corresponding frame images are combined to generate the full 360-degree spherical video 61. Accordingly, in the present embodiment, it is possible to view the full 360-degree spherical video 61 in the form of a moving image.

[0124] For example, the plurality of real space images 66 (moving images) is simultaneously captured in all directions. Then, the corresponding frame images are combined to generate the full 360-degree spherical video 61. Without being limited thereto, another method may be used.

[0125] Full 360-degree spherical images (still images) that are included in the full 360-degree spherical video 61 in the form of a moving image and that are sequentially displayed along a time axis are the frame images of the full 360-degree spherical video 61. The frame rate and the like of the frame images of the full 360-degree spherical video 61 are not limited, and may be set discretionarily. As illustrated in FIG. 7, the frame image of the full 360-degree spherical video 61 is referred to as a full 360-degree spherical frame image 68.

[0126] Note that the apparent size of the full 360-degree spherical video 61 (the full 360-degree spherical frame image 68) as viewed by the user 1 remains unchanged even when, for example, the scale of the full 360-degree spherical video 61 (the scale of the virtually set sphere) is changed centering on the user 1. In this case, the distance between the user 1 and the full 360-degree spherical video 61 (the inner peripheral surface of the virtual sphere) also changes according to the change in scale, and as a result the apparent size of the full 360-degree spherical video 61 remains unchanged.
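
This can be seen with elementary geometry: the angular size of a feature drawn on the virtual sphere depends only on the ratio of its linear size to its distance, and scaling the sphere about the user 1 changes both by the same factor. The symbols below (s, d, k) are introduced here only for this explanation.

```latex
\theta = 2\arctan\!\left(\frac{s}{2d}\right), \qquad
\theta' = 2\arctan\!\left(\frac{ks}{2kd}\right) = 2\arctan\!\left(\frac{s}{2d}\right) = \theta
```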

[0127] In the present embodiment, the full 360-degree spherical video 61 corresponds to a second real space image displayed on a region that includes a region, in a virtual space, on which the first real space image is displayed, the region on which the second real space image is displayed being larger than the region on which the first real space image is displayed. Further, the full 360-degree spherical video 61 corresponds to an image obtained by combining a plurality of images captured from a specified image-capturing position in a real space. Note that the full 360-degree spherical video 61 can also be referred to as a virtual reality video.

[0128] FIGS. 9 to 11 illustrate examples of the metadata 63. The metadata 63 is metadata related to switching between display of the planar video 62 and display of the full 360-degree spherical video 61. As illustrated in, for example, FIG. 9, metadata 63a related to the planar video 62 is stored. In the example illustrated in FIG. 9, the information indicated below is stored as the metadata 63a.

[0129] ID: identification information given for each planar frame image 64

[0130] Angle of view: angle of view of the planar frame image 64

[0131] Image-capturing position: image-capturing position of the planar frame image 64

[0132] Image-capturing direction: image-capturing direction of the planar frame image 64

[0133] Rotation (roll, pitch, yaw): rotation position (rotation angle) of the planar frame image 64

[0134] Image-capturing time: date and time upon capturing the planar frame image 64

[0135] Image-capturing environment: image-capturing environment upon capturing the planar frame image 64

[0136] The angle of view of the planar frame image 64 is determined by, for example, the angle of view and the focal length of the lens of the image-capturing apparatus used to capture the planar frame image 64. The angle of view of the planar frame image 64 can also be considered a parameter corresponding to an image-capturing range of the planar frame image 64. Thus, information regarding an image-capturing range of the planar frame image 64 may be stored as the metadata 63a. In the present embodiment, the angle of view of the planar frame image 64 corresponds to information regarding an angle of view of the first real space image.

[0137] The image-capturing position, the image-capturing direction, and the rotation position of the planar frame image 64 are determined by, for example, a specified XYZ coordinate system defined in advance. For example, an XYZ coordinate value is stored as the image-capturing position. A direction of an image-capturing optical axis of an image-capturing apparatus used to capture the planar frame image 64 is stored as the image-capturing direction using the XYZ coordinate value based on the image-capturing position. For example, a pitch angle, a roll angle, and a yaw angle when an X-axis is a pitch axis, a Y-axis is a roll axis, and a Z-axis is a yaw axis are stored as the rotation position. Of course, the present technology is not limited to the case in which such data is generated.

[0138] The date and time when the planar frame image 64 is captured is stored as the image-capturing time. Examples of the image-capturing environment include the weather upon capturing the planar frame image 64. The type of the metadata 63a related to the planar video 62 is not limited. Further, the data format in which each piece of information is stored is not limited.

[0139] In the present embodiment, the metadata 63a related to the planar video 62 corresponds to first image-capturing information. Of course, other information may be stored as the first image-capturing information.

[0140] Further, as illustrated in FIG. 10, metadata 63b related to the full 360-degree spherical video 61 is stored. In the example illustrated in FIG. 10, information indicated below is stored as the metadata 63b.

[0141] ID: identification information given for each full 360-degree spherical frame image 68

[0142] Image-capturing position: image-capturing position of the full 360-degree spherical frame image 68

[0143] Image-capturing time: date and time upon capturing the full 360-degree spherical frame image 68

[0144] Image-capturing environment: image-capturing environment upon capturing the full 360-degree spherical frame image 68

[0145] Format: format for the full 360-degree spherical video 61

[0146] The image-capturing position of the full 360-degree spherical frame image 68 is generated on the basis of the respective image-capturing positions of the plurality of real space images 66 illustrated in FIG. 7. Typically, the plurality of real space images 66 is captured at the same image-capturing position, and that image-capturing position is stored. When the real space images 66 are captured at positions that are slightly offset with respect to one another, an average of the respective image-capturing positions, or the like, is stored.
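
Purely as an illustration of how the metadata 63a and 63b listed above might be held in memory, the sketch below mirrors the fields of FIGS. 9 and 10; the field names, types, and units are assumptions and do not represent the data format actually used.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class PlanarFrameMetadata:  # metadata 63a: one record per planar frame image 64
    frame_id: str
    angle_of_view_deg: float                        # angle of view
    capture_position: Tuple[float, float, float]    # XYZ coordinate value
    capture_direction: Tuple[float, float, float]   # image-capturing optical-axis direction
    rotation_rpy_deg: Tuple[float, float, float]    # roll, pitch, yaw
    capture_time: str                               # date and time of capture
    capture_environment: str                        # e.g., weather

@dataclass
class SphericalFrameMetadata:  # metadata 63b: one record per full 360-degree spherical frame image 68
    frame_id: str
    capture_position: Tuple[float, float, float]
    capture_time: str
    capture_environment: str
    projection_format: str                          # e.g., "equirectangular" or "cubemap"
```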

……
……
……
