Sony Patent | Information processing apparatus, information processing method, and computer readable medium

Editor: 映维 | Category: Sony | January 13, 2022

Patent: Information processing apparatus, information processing method, and computer readable medium

Publication Number: 20220012922

Publication Date: 20220113

Applicant: Sony

Abstract

An information processing apparatus according to an embodiment of the present technology includes an acquisition unit, a motion detection unit, an area detection unit, and a display control unit. The acquisition unit acquires one or more captured images in which the actual space is captured. The motion detection unit detects a contact motion, which is a series of motions when a user contacts an actual object in the actual space. The area detection unit detects a target area including the actual object according to the detected contact motion. The display control unit generates a virtual image of the actual object by extracting a partial image corresponding to the target area from the one or more captured images, and controls display of the virtual image according to the contact motion.

Claims

1. An information processing apparatus, comprising: an acquisition unit that acquires one or more captured images obtained by capturing an actual space; a motion detection unit that detects a contact motion, which is a series of motions when a user contacts an actual object in the actual space; an area detection unit that detects a target area including the actual object according to the detected contact motion; and a display control unit that generates a virtual image of the actual object by extracting a partial image corresponding to the target area from the one or more captured images, and controls display of the virtual image according to the contact motion.
2. The information processing apparatus according to claim 1, wherein the display control unit generates the virtual image representing the actual object not shielded by a shielding object.
3. The information processing apparatus according to claim 2, wherein the display control unit generates the partial image from the captured image that does not include the shielding object in the target area among the one or more captured images.
4. The information processing apparatus according to claim 1, wherein the display control unit superimposes and displays the virtual image on the actual object.
5. The information processing apparatus according to claim 1, wherein the acquisition unit acquires the one or more captured images from at least one of a capturing apparatus that captures the actual space and a database that stores an output of the capturing apparatus.
6. The information processing apparatus according to claim 5, wherein the contact motion includes a motion of bringing a hand of the user closer to the actual object, the motion detection unit determines whether or not a state of the contact motion is a pre-contact state in which a contact of the hand of the user with respect to the actual object is predicted, and the acquisition unit acquires the one or more captured images by controlling the capturing apparatus if the state of the contact motion is determined as the pre-contact state.
7. The information processing apparatus according to claim 6, wherein the acquisition unit increases a capturing resolution of the capturing apparatus if the state of the contact motion is determined as the pre-contact state.
8. The information processing apparatus according to claim 1, wherein the motion detection unit detects a contact position between the actual object and the hand of the user, and the area detection unit detects the target area on a basis of the detected contact position.
9. The information processing apparatus according to claim 8, wherein the area detection unit detects a boundary of the actual object including the contact position as the target area.
10. The information processing apparatus according to claim 9, further comprising: a line-of-sight detection unit that detects a line-of-sight direction of the user, wherein the area detection unit detects the boundary of the actual object on a basis of the line-of-sight direction of the user.
11. The information processing apparatus according to claim 10, wherein the line-of-sight detection unit detects a gaze position on a basis of the line-of-sight direction of the user, and the area detection unit detects the boundary of the actual object including the contact position and the gaze position as the target area.
12. The information processing apparatus according to claim 9, wherein the area detection unit detects the boundary of the actual object on a basis of at least one of a shadow, a size, and a shape of the actual object.
13. The information processing apparatus according to claim 1, wherein the motion detection unit detects a fingertip position of the hand of the user, and the area detection unit detects the target area on a basis of a trajectory of the fingertip position accompanying a movement of the fingertip position.
14. The information processing apparatus according to claim 1, wherein the display control unit superimposes and displays an area image representing the target area on the actual object.
15. The information processing apparatus according to claim 14, wherein the area image is displayed such that at least one of a shape, a size, and a position can be edited, and the area detection unit changes the target area on a basis of the edited area image.
16. The information processing apparatus according to claim 1, wherein the motion detection unit detects a contact position between the actual object and the hand of the user, and the display control unit controls the display of the virtual image according to the detected contact position.
17. The information processing apparatus according to claim 1, wherein the motion detection unit detects a gesture of the hand of the user contacting the actual object, and the display control unit controls a display of the virtual image according to the detected gesture of the hand of the user.
18. The information processing apparatus according to claim 1, wherein the virtual image is at least one of a two-dimensional image and a three-dimensional image of the actual object.
19. An information processing method comprising, executed by a computer system: acquiring one or more captured images obtained by capturing an actual space; detecting a contact motion, which is a series of motions when a user contacts an actual object in the actual space; detecting a target area including the actual object according to the detected contact motion; and generating a virtual image of the actual object by extracting a partial image corresponding to the target area from the one or more captured images, and controlling display of the virtual image according to the contact motion.
20. A computer readable medium with program stored thereon, the program causes a computer system to execute: a step of acquiring one or more captured images obtained by capturing an actual space; a step of detecting a contact motion, which is a series of motions when a user contacts an actual object in the actual space; a step of detecting a target area including the actual object according to the detected contact motion; and a step of generating a virtual image of the actual object by extracting a partial image corresponding to the target area from the one or more captured images, and controlling display of the virtual image according to the contact motion.

Description

TECHNICAL FIELD

[0001] The present technology relates to an information processing apparatus, an information processing method, and a computer readable medium for providing a virtual experience.

BACKGROUND ART

[0002] Patent Literature 1 describes a system for providing a virtual experience using an image of an actual space. In this system, an image representing a field of view of a first user is generated using a wearable display worn by the first user and a wide-angle camera. This image is presented to a second user. The second user may enter a virtual object such as text and an icon into the presented image. Also, the input virtual object is presented to the first user. This makes it possible to realize a virtual experience of sharing vision among users (Patent Literature 1, paragraphs [0015]-[0017], [0051], [0062], FIGS. 1 and 3, etc.).

CITATION LIST

Patent Literature

[0003] Patent Literature 1: Japanese Patent Application Laid-open No. 2015-95802

DISCLOSURE OF INVENTION

Technical Problem

[0004] As described above, a technique for providing various virtual experiences using an image of an actual space or the like has been developed, and a technique capable of seamlessly connecting the actual space and the virtual space is demanded.

[0005] In view of the above circumstances, an object of the present technology is to provide an information processing apparatus, an information processing method, and a computer readable medium capable of seamlessly connecting the actual space and the virtual space.

Solution to Problem

[0006] In order to achieve the above object, an information processing apparatus according to an embodiment of the present technology includes an acquisition unit, a motion detection unit, an area detection unit, and a display control unit.

[0007] The acquisition unit acquires one or more captured images in which the actual space is captured.

[0008] The motion detection unit detects a contact motion, which is a series of motions when a user contacts an actual object in the actual space.

[0009] The area detection unit detects a target area including the actual object according to the detected contact motion.

[0010] The display control unit generates a virtual image of the actual object by extracting a partial image corresponding to the target area from the one or more captured images, and controls display of the virtual image according to the contact motion.

[0011] In this information processing apparatus, the contact motion of the user contacting the actual object is detected, and the target area including the actual object is detected according to the contact motion. The partial image corresponding to the target area is extracted from the captured image obtained by capturing the actual space in which the actual object exists, and the virtual image of the actual object is generated. Then, the display control of the virtual image is executed according to the contact motion of the user. Thus, it becomes possible to easily display the virtual image in which the actual object is captured, and to seamlessly connect the actual space and the virtual space.
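
The extraction of the partial image corresponding to the target area can be illustrated with a minimal Python sketch. It assumes that the captured image is a grid of pixel values and that the target area is an axis-aligned rectangle; the names `TargetArea` and `extract_partial_image` are hypothetical and do not appear in the disclosure, which does not limit how the target area is represented.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values (illustrative only)


@dataclass(frozen=True)
class TargetArea:
    """Axis-aligned rectangle in image coordinates (hypothetical representation)."""
    top: int
    left: int
    bottom: int  # exclusive
    right: int   # exclusive


def extract_partial_image(captured: Image, area: TargetArea) -> Image:
    """Crop the pixels of the captured image that fall inside the target area."""
    height = len(captured)
    width = len(captured[0]) if height else 0
    # Clamp the rectangle to the image so an oversized area does not fail.
    top, bottom = max(0, area.top), min(height, area.bottom)
    left, right = max(0, area.left), min(width, area.right)
    return [row[left:right] for row in captured[top:bottom]]


captured_image = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 7, 8, 0],
    [0, 0, 0, 0],
]
virtual_image = extract_partial_image(captured_image, TargetArea(1, 1, 3, 3))
print(virtual_image)  # [[5, 6], [7, 8]]
```

In this sketch the cropped pixels stand in for the virtual image; an actual apparatus would likely also mask out pixels outside the boundary of the actual object.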

[0012] The display control unit may generate the virtual image representing the actual object that is not shielded by a shielding object.

[0013] This makes it possible to bring a clear image of the actual object which is not shielded by the shielding object into the virtual space, and to seamlessly connect the actual space and the virtual space.

[0014] The display control unit may generate the partial image from the captured image in which the shielding object is not included in the target area among the one or more captured images.

[0015] This makes it possible to easily bring the virtual image representing the actual object without shielding into the virtual space. As a result, it becomes possible to seamlessly connect the actual space and the virtual space.
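
One way to picture this selection is the following Python sketch. It assumes that, for each captured image, a set of pixels occupied by the shielding object (for example, the hand of the user) is already available, and that the most recent unshielded image is preferred. Both assumptions, and the name `select_unshielded_frame`, are illustrative and are not taken from the disclosure.

```python
from typing import List, Optional, Set, Tuple

Pixel = Tuple[int, int]  # (row, column)


def select_unshielded_frame(
    shield_masks: List[Set[Pixel]],
    target_area: Set[Pixel],
) -> Optional[int]:
    """Return the index of the newest frame whose shielding-object mask does
    not overlap the target area, or None if every frame is shielded.

    shield_masks is ordered from oldest to newest (an assumption of this sketch).
    """
    for index in range(len(shield_masks) - 1, -1, -1):
        if shield_masks[index].isdisjoint(target_area):
            return index
    return None


target = {(0, 0), (0, 1), (1, 0), (1, 1)}
masks = [
    set(),     # frame 0: no shielding object in view
    {(5, 5)},  # frame 1: hand visible but outside the target area
    {(1, 1)},  # frame 2: hand overlaps the target area
]
print(select_unshielded_frame(masks, target))  # 1
```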

[0016] The display control unit may superimpose and display the virtual image on the actual object.

[0017] Thus, the virtual image in which the actual object is duplicated is displayed on the actual object. As a result, the virtual image can be easily handled, and excellent usability can be demonstrated.

[0018] The acquisition unit may acquire the one or more captured images from at least one of a capturing apparatus that captures the actual space and a database that stores an output of the capturing apparatus.

[0019] Thus, for example, it becomes possible to easily generate the virtual image with high accuracy representing an actual object without shielding.

[0020] The contact motion may include a motion of bringing a user’s hand closer to the actual object. In this case, the motion detection unit may determine whether or not a state of the contact motion is a pre-contact state in which the contact of the user’s hand with respect to the actual object is predicted. In addition, if it is determined that the state of the contact motion is the pre-contact state, the acquisition unit may acquire the one or more captured images by controlling the capturing apparatus.

[0021] Thus, for example, it becomes possible to capture the actual object immediately before the user contacts the actual object. This makes it possible to sufficiently improve the accuracy of the virtual image.

[0022] The acquisition unit may increase a capturing resolution of the capturing apparatus if the state of the contact motion is determined as the pre-contact state.

[0023] This makes it possible to generate the virtual image with high resolution, for example.
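
The following Python sketch illustrates the pre-contact determination and the resolution increase described above. It assumes that contact is predicted from the distance between the hand and the actual object; the thresholds, the resolutions, and the `StubCamera` class are hypothetical stand-ins, since the disclosure gives no concrete values or camera interface at this point.

```python
from enum import Enum
from typing import Optional


class ContactState(Enum):
    NO_CONTACT = "no_contact"
    PRE_CONTACT = "pre_contact"
    CONTACT = "contact"


# Hypothetical thresholds in metres; not taken from the disclosure.
CONTACT_DISTANCE_M = 0.01
PRE_CONTACT_DISTANCE_M = 0.10


def classify_contact_state(hand_to_object_m: float) -> ContactState:
    """Classify the contact motion from the hand-to-object distance."""
    if hand_to_object_m <= CONTACT_DISTANCE_M:
        return ContactState.CONTACT
    if hand_to_object_m <= PRE_CONTACT_DISTANCE_M:
        return ContactState.PRE_CONTACT
    return ContactState.NO_CONTACT


class StubCamera:
    """Stand-in for the capturing apparatus; not a real camera API."""

    def __init__(self) -> None:
        self.resolution = (640, 480)

    def set_resolution(self, width: int, height: int) -> None:
        self.resolution = (width, height)

    def capture(self) -> dict:
        return {"resolution": self.resolution}


def acquire_on_pre_contact(camera: StubCamera, hand_to_object_m: float) -> Optional[dict]:
    """Raise the capturing resolution and capture only in the pre-contact state."""
    if classify_contact_state(hand_to_object_m) is ContactState.PRE_CONTACT:
        camera.set_resolution(1920, 1080)
        return camera.capture()
    return None


camera = StubCamera()
print(acquire_on_pre_contact(camera, 0.50))  # None
print(acquire_on_pre_contact(camera, 0.05))  # {'resolution': (1920, 1080)}
```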

[0024] The motion detection unit may detect a contact position between the actual object and the hand of the user. In this case, the area detection unit may detect the target area on the basis of the detected contact position.

[0025] Thus, for example, it becomes possible to designate a capture target, a range, and the like by a simple motion, and to seamlessly connect the actual space and the virtual space.

[0026] The area detection unit may detect a boundary of the actual object including the contact position as the target area.

[0027] Thus, for example, it becomes possible to accurately separate the actual object and the other areas, and to generate a highly precise virtual image.
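
As a rough illustration of detecting an area that includes the contact position, the Python sketch below grows a region outward from the contact position over pixels of similar brightness. Region growing is a technique substituted here for illustration only; the disclosure does not state that the area detection unit works this way, and the function name and `tolerance` parameter are hypothetical.

```python
from collections import deque
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, column)


def detect_target_area(image: List[List[int]], contact: Pixel, tolerance: int = 10) -> Set[Pixel]:
    """Collect the 4-connected pixels whose value is within `tolerance`
    of the pixel at the contact position."""
    rows, cols = len(image), len(image[0])
    seed_value = image[contact[0]][contact[1]]
    area = {contact}
    queue = deque([contact])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (
                0 <= nr < rows
                and 0 <= nc < cols
                and (nr, nc) not in area
                and abs(image[nr][nc] - seed_value) <= tolerance
            ):
                area.add((nr, nc))
                queue.append((nr, nc))
    return area


# A bright document (values near 200) lying on a darker table (value 50).
scene = [
    [50, 50, 50, 50, 50],
    [50, 200, 205, 50, 50],
    [50, 198, 202, 50, 50],
    [50, 50, 50, 50, 50],
]
print(sorted(detect_target_area(scene, contact=(1, 1))))
# [(1, 1), (1, 2), (2, 1), (2, 2)]
```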

[0028] The information processing apparatus may further include a line-of-sight detection unit for detecting a line-of-sight direction of the user. In this case, the area detection unit may detect the boundary of the actual object on the basis of the line-of-sight direction of the user.

[0029] Thus, it becomes possible to improve separation accuracy between the actual object to be captured and the target area. As a result, it becomes possible to generate an appropriate virtual image.

[0030] The line-of-sight detection unit may detect a gaze position on the basis of the line-of-sight direction of the user. In this case, the area detection unit may detect the boundary of the actual object including the contact position and the gaze position as the target area.

[0031] Thus, it becomes possible to greatly improve the separation accuracy between the actual object to be captured and the target area, and to sufficiently improve the reliability of the apparatus.

[0032] The area detection unit may detect the boundary of the actual object on the basis of at least one of a shadow, a size, and a shape of the actual object.

[0033] This makes it possible to accurately detect, for example, the boundary of the actual object regardless of the state of the actual object or the like. As a result, it becomes possible to sufficiently improve the usability of the apparatus.

[0034] The motion detection unit may detect a fingertip position of a hand of the user. In this case, the area detection unit may detect the target area on the basis of a trajectory of the fingertip position accompanying a movement of the fingertip position.

[0035] This makes it possible to easily set the capture range, for example.
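
A simple way to derive an area from a fingertip trajectory is to take the rectangle that encloses all sampled fingertip positions, as in the Python sketch below. The bounding-rectangle choice and the name `area_from_trajectory` are assumptions made for illustration; the disclosure only states that the target area is detected on the basis of the trajectory.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) fingertip position in image coordinates


def area_from_trajectory(trajectory: List[Point]) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) enclosing the fingertip trajectory."""
    if not trajectory:
        raise ValueError("trajectory must contain at least one fingertip position")
    xs = [x for x, _ in trajectory]
    ys = [y for _, y in trajectory]
    return (min(xs), min(ys), max(xs), max(ys))


# The fingertip roughly traces the outline of a document.
traced = [(2.0, 3.0), (8.0, 3.0), (8.0, 9.0), (2.0, 9.0)]
print(area_from_trajectory(traced))  # (2.0, 3.0, 8.0, 9.0)
```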

[0036] The display control unit may superimpose and display an area image representing the target area on the actual object.

[0037] Thus, for example, it becomes possible to confirm the target area as a range of capture, and to sufficiently avoid a situation in which an unnecessary virtual image is generated.

[0038] The area image may be displayed such that at least one of a shape, a size, and a position can be edited. In this case, the area detection unit may change the target area on the basis of the edited area image.

[0039] Thus, it becomes possible to accurately set the capture range, and, for example, to easily generate the virtual image or the like of a desired actual object.

[0040] The motion detection unit may detect a contact position between the actual object and the hand of the user. In this case, the display control unit may control the display of the virtual image according to the detected contact position.

[0041] Thus, for example, it becomes possible to display the virtual image without a sense of discomfort according to the contact position, and to seamlessly connect the actual space and the virtual space.

[0042] The motion detection unit may detect a gesture of a hand of the user contacting the actual object. In this case, the display control unit may control the display of the virtual image according to the detected gesture of the hand of the user.

[0043] Thus, for example, it becomes possible to switch a display method of the virtual image corresponding to the gesture of the hand, and to provide an easy-to-use interface.

[0044] The virtual image may be at least one of a two-dimensional image and a three-dimensional image of the actual object.

[0045] Thus, it becomes possible to generate virtual images of various actual objects existing in the actual space, and to seamlessly connect the actual space and the virtual space.

[0046] An information processing method according to an embodiment of the present technology is an information processing method executed by a computer system, and includes acquiring one or more captured images obtained by capturing an actual space.

[0047] A contact motion, which is a series of motions when a user contacts an actual object in the actual space, is detected.

[0048] A target area including the actual object is detected according to the detected contact motion.

[0049] A partial image corresponding to the target area is extracted from the one or more captured images to generate a virtual image of the actual object, and display of the virtual image is controlled according to the contact motion.

[0050] A computer readable medium according to an embodiment of the present technology has a program stored thereon, and the program causes a computer system to execute the following steps:

[0051] a step of acquiring one or more captured images obtained by capturing an actual space;

[0052] a step of detecting a contact motion, which is a series of motions when a user contacts an actual object in the actual space;

[0053] a step of detecting a target area including the actual object according to the detected contact motion; and

[0054] a step of generating a virtual image of the actual object by extracting a partial image corresponding to the target area from the one or more captured images, and controlling display of the virtual image according to the contact motion.

Advantageous Effects of Invention

[0055] As described above, according to the present technology, it is possible to seamlessly connect the actual space and the virtual space. Note that the effect described here is not necessarily limitative, and any of the effects described in the present disclosure may be provided.

BRIEF DESCRIPTION OF DRAWINGS

[0056] FIG. 1 is a schematic diagram for explaining an outline of a motion of an HMD according to an embodiment of the present technology.

[0057] FIG. 2 is a perspective view schematically showing an appearance of the HMD according to an embodiment of the present technology.

[0058] FIG. 3 is a block diagram showing a configuration example of the HMD shown in FIG. 2.

[0059] FIG. 4 is a flowchart showing an example of the motion of the HMD 100.

[0060] FIG. 5 is a schematic diagram showing an example of a contact motion of the user with respect to the actual object.

[0061] FIG. 6 is a schematic diagram showing an example of detection processing of a capture area in an area automatic detection mode.

[0062] FIG. 7 is a schematic diagram showing another example of the detection processing of the capture area in the area automatic detection mode.

[0063] FIG. 8 is a schematic diagram showing an example of correction processing of the capture area.

[0064] FIG. 9 is a schematic diagram showing an example of a captured image used for generating a virtual image.

[0065] FIG. 10 is a schematic diagram showing an example of a display of the virtual image.

[0066] FIG. 11 is a schematic diagram showing an example of a display of the virtual image.

[0067] FIG. 12 is a schematic diagram showing an example of a display of the virtual image.

[0068] FIG. 13 is a schematic diagram showing an example of a display of the virtual image.

[0069] FIG. 14 is a schematic diagram showing another example of a display of the virtual image.

[0070] FIG. 15 is a schematic diagram showing an example of the detection processing of the capture area including a shielding object.

[0071] FIG. 16 is a schematic diagram showing an example of a virtual image generated by the detection processing shown in FIG. 15.

[0072] FIG. 17 is a flowchart showing another example of the motion of the HMD.

[0073] FIG. 18 is a schematic diagram showing an example of a capture area designated by the user.

[0074] FIG. 19 is a perspective view schematically showing an appearance of the HMD according to another embodiment.

[0075] FIG. 20 is a perspective view schematically showing the appearance of a mobile terminal according to another embodiment.

MODE(S) FOR CARRYING OUT THE INVENTION

[0076] Embodiments according to the present technology will now be described below with reference to the drawings.

[0077] [Configuration of HMD]

[0078] FIG. 1 is a schematic diagram for explaining an outline of a motion of an HMD according to an embodiment of the present technology. An HMD 100 (Head Mounted Display) is a spectacle type apparatus having a transmission type display, and is used by being worn on a head of a user 1.

[0079] The user 1 wearing the HMD 100 will be able to visually recognize an actual scene and at the same time visually recognize an image displayed on the transmission type display. That is, by using the HMD 100, virtual images or the like can be superimposed and displayed on a real space (actual space) around the user 1. Thus, the user 1 will be able to experience an Augmented Reality (AR) or the like.

[0080] FIG. 1A is a schematic diagram showing an example of a virtual space (AR space) visually seen by the user 1. A user 1a wearing the HMD 100 sits on a left-side chair in FIG. 1A. An image of another user 1b sitting on the other side of a table, for example, is displayed on a display of the HMD 100. As a result, the user 1a wearing the HMD 100 can experience the augmented reality as if the user 1a were sitting face-to-face with the other user 1b.

[0081] Note that the portions indicated by solid lines in the drawing (such as the chair on which the user 1a sits, the table, and the document 2 on the table) are actual objects 3 arranged in the actual space in which the user actually exists. Furthermore, the portions indicated by dotted lines in the drawing (such as the other user 1b and his chair) are images displayed on the transmission type display, and are virtual images 4 in the AR space. In the present disclosure, the virtual image 4 is an image for displaying various objects (virtual objects) displayed, for example, in the virtual space.

[0082] By wearing the HMD 100 in this manner, even when the other user 1b is at a remote location, for example, conversations with gestures and the like can be naturally performed, and good communications become possible. Of course, even when the user 1a and the other user 1b are in the same space, the present technology can be applied.

[0083] The HMD 100 includes a capture function that generates the virtual image 4 of the actual object 3 in the actual space and displays it in the AR space. For example, suppose that the user 1a wearing the HMD 100 extends his hand to the document 2 on the table and contacts the document 2. In this case, in the HMD 100, the virtual image 4 of the document 2 that the user 1a contacts is generated. In the present embodiment, the document 2 is an example of the actual object 3 in the actual space.

[0084] FIG. 1B schematically shows an example contact motion in which the user 1a contacts the document 2. For example, when the user 1a contacts the document 2, an area of the document 2 to be captured (boundary of document 2) is detected. On the basis of the detected result, the virtual image 4 (hatched area in the drawing) representing the document 2 contacted by the user 1a is generated and displayed on the HMD 100 display (AR space). A method of detecting the area to be captured, a method of generating the virtual image 4, and the like will be described in detail later.

[0085] For example, as shown in FIG. 1B, when the user 1a peels the document 2 off the table by hand, the captured document 2 (virtual image 4) is displayed as if the actual document 2 were being turned over. That is, the generated virtual image 4 is superimposed and displayed on the actual document 2 as if the actual document 2 were turned over. Note that the user 1a does not need to actually turn over the document 2, and can generate the virtual image 4 only by performing a gesture of turning over the document 2, for example.

[0086] Thus, in the HMD 100, the actual object 3 (document 2) to be captured is designated by the user 1a’s hand, and a target virtual image 4 is generated. The captured virtual image 4 is superimposed and displayed on a target actual object. The virtual image 4 of the document 2 displayed in the AR space can be freely displayed in the AR space according to various gestures of the user 1a such as grabbing, deforming, or moving the virtual image 4, for example.

[0087] Furthermore, the document 2 brought into the AR space as the virtual image 4 can be freely moved in the virtual AR space. For example, FIG. 1C shows that the user 1a grabs the virtual object document 2 (virtual image 4) and hands it to the other user 1b at the remote location displayed on the HMD 100 display. By using the virtual image 4, for example, such communication becomes possible.

[0088] As described above, in the HMD 100, the actual object 3 existing in the actual space (real world) is simply captured and presented in the virtual space (virtual world). That is, it can be said that the HMD 100 has a function of simply capturing the actual space. This makes it possible to easily bring the object in the actual space into the virtual space such as the AR space, and to seamlessly connect the actual space and the virtual space. Hereinafter, the configuration of the HMD 100 will be described in detail.

[0089] FIG. 2 is a perspective view schematically showing an appearance of the HMD 100 according to the embodiment of the present technology. FIG. 3 is a block diagram showing an example configuration of the HMD 100 shown in FIG. 2.

[0090] The HMD 100 includes a frame 10, a left-eye lens 11a and a right-eye lens 11b, a left-eye display 12a and a right-eye display 12b, a left-eye camera 13a and a right-eye camera 13b, and an outward camera 14.

[0091] The frame 10 has a shape of glasses, and includes a rim portion 15 and temple portions 16. The rim portion 15 is a portion disposed in front of the left and right eyes of the user 1, and supports each of the left-eye lens 11a and the right-eye lens 11b. The temple portions 16 extend rearward from both ends of the rim portion 15 toward both ears of the user 1, and their tips rest on both ears. The rim portion 15 and the temple portions 16 are formed of, for example, a material such as synthetic resin or metal.

[0092] The left-eye lens 11a and the right-eye lens 11b are respectively disposed in front of the left and right eyes of the user so as to cover at least a part of a field of view of the user. Typically, each lens is designed to correct the user’s vision. Needless to say, it is not limited to this, and a so-called no-degree lens may be used.

[0093] The left-eye display 12a and the right-eye display 12b are transmission type displays, and are disposed so as to cover partial areas of the left-eye and right-eye lenses 11a and 11b, respectively. That is, the left-eye and right-eye displays 12a and 12b are respectively disposed in front of the left and right eyes of the user.

[0094] Images for the left eye and the right eye and the like are displayed on the left eye and the right eye displays 12a and 12b, respectively. A virtual display object (virtual object) such as the virtual image 4 is displayed on each of the displays 12a and 12b. Therefore, the user 1 wearing the HMD 100 visually sees the actual space scene, such as the actual object 3, on which the virtual images 4 displayed on the displays 12a and 12b are superimposed.

[0095] As the left-eye and right-eye displays 12a and 12b, for example, a transmission type organic electroluminescence display, a liquid crystal display (LCD), or the like is used. In addition, a specific configuration of the left-eye and right-eye displays 12a and 12b is not limited, and, for example, a transmission type display of an arbitrary method such as a method of projecting and displaying an image on a transparent screen or a method of displaying an image using a prism or the like may be used, as appropriate.

[0096] The left-eye camera 13a and the right-eye camera 13b are appropriately placed in the frame 10 so that the left eye and the right eye of the user 1 can be imaged. For example, it is possible to detect a line of sight of the user 1, a gaze point that the user 1 is gazing at, and the like, on the basis of the images of the left eye and the right eye captured by the left eye and right eye cameras 13a and 13b.

[0097] As the left-eye and right-eye cameras 13a and 13b, for example, digital cameras including image sensors such as a CMOS (Complementary Metal-Oxide Semiconductor) sensor and a CCD (Charge Coupled Device) sensor are used. Furthermore, for example, an infrared camera equipped with an infrared illumination such as an infrared LED may be used.

[0098] Hereinafter, the left-eye lens 11a and the right-eye lens 11b are both referred to as lenses 11, and the left-eye display 12a and the right-eye display 12b are both referred to as transmission type displays 12 in some cases. The left-eye camera 13a and the right-eye camera 13b are referred to as inward cameras 13 in some cases.

[0099] The outward camera 14 is disposed at a center of the frame 10 (rim portion 15) so as to face outward (the side opposite to the user 1). The outward camera 14 captures an actual space around the user 1 and outputs a captured image in which the actual space is captured. A capturing range of the outward camera 14 is set to be substantially the same as the field of view of the user 1 or to be a range wider than the field of view of the user 1, for example. That is, it can be said that the outward camera 14 captures the field of view of the user 1. In the present embodiment, the outward camera 14 corresponds to a capturing apparatus.

[0100] As the outward camera 14, for example, a digital camera including an image sensor such as a CMOS sensor or a CCD sensor is used. In addition, for example, a stereo camera capable of detecting depth information of the actual space or the like, a camera equipped with a TOF (Time of Flight) sensor, or the like may be used as the outward camera 14. The specific configuration of the outward camera 14 is not limited, and any camera capable of capturing the actual space with a desired accuracy, for example, may be used as the outward camera 14.

[0101] As shown in FIG. 3, the HMD 100 further includes a sensor unit 17, a communication unit 18, a storage unit 20, and a controller 30.

[0102] The sensor unit 17 includes various sensor elements for detecting a state of a surrounding environment, a state of the HMD 100, a state of the user 1, and the like. In the present embodiment, as the sensor element, a distance sensor (Depth sensor) for measuring a distance to a target is mounted. For example, the stereo camera or the like described above is an example of a distance sensor. In addition, a LiDAR sensor, various radar sensors, or the like may be used as the distance sensor.

[0103] In addition, as the sensor elements, for example, a 9-axis sensor including a 3-axis acceleration sensor, a 3-axis gyro sensor, and a 3-axis compass sensor, a GPS sensor for acquiring information on a current position of the HMD 100, or the like may be used. Furthermore, a biometric sensor such as an electroencephalogram sensor, an electromyographic sensor, or a pulse (heart rate) sensor for detecting biometric information of the user 1 may be used.

[0104] The sensor unit 17 includes a microphone for detecting sound information of a user’s voice or a surrounding sound. For example, voice uttered by the user is detected, as appropriate. Thus, for example, the user can experience the AR while making a voice call and perform an operation input of the HMD 100 using a voice input. In addition, the sensor element or the like provided as the sensor unit 17 is not limited.

[0105] The communication unit 18 is a module for executing network communication, short-range wireless communication, and the like with other devices. For example, a wireless LAN module such as Wi-Fi and a communication module such as Bluetooth (registered trademark) are provided.

[0106] The storage unit 20 is a nonvolatile storage device, and, for example, a hard disk drive (HDD), a solid state drive (SSD), or the like is used.

[0107] The storage unit 20 stores a captured image database 21. The captured image database 21 is a database that stores, for example, an image of the actual space captured by the outward camera 14. An image or the like of the actual space captured by another camera or the like different from the outward camera 14 may also be stored in the captured image database 21.

[0108] The captured image database 21 stores, for example, the captured image of the actual space and capture information relating to a capturing state of each captured image in association with each other. As the capture information, for example, a capturing time, a position of the HMD 100 at the time of capturing, a capturing direction (HMD 100 attitude, etc.), a capturing resolution, a capturing magnification, an exposure time, and the like at the time the image was captured are stored. In addition, a specific configuration of the captured image database 21 is not limited. In the present embodiment, the captured image database 21 corresponds to a database in which an output of the capturing apparatus is stored.
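
The association between each captured image and its capture information could be modeled as a simple record type. The following is a minimal sketch under stated assumptions, not the configuration used in the embodiment: the names `CaptureInfo`, `CapturedImageDatabase`, and `captured_between`, the field units, and the in-memory list are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class CaptureInfo:
    """Capture information for one image (hypothetical field names and units)."""
    time_s: float                 # capturing time, in seconds
    hmd_position: Vec3            # position of the HMD at the time of capturing
    hmd_direction: Vec3           # capturing direction (HMD attitude)
    resolution: Tuple[int, int]   # capturing resolution as (width, height)
    magnification: float          # capturing magnification
    exposure_s: float             # exposure time, in seconds


@dataclass
class CapturedImageDatabase:
    """In-memory stand-in that keeps each image with its capture information."""
    records: List[Tuple[bytes, CaptureInfo]] = field(default_factory=list)

    def store(self, image: bytes, info: CaptureInfo) -> None:
        # Store the image and its capture information in association.
        self.records.append((image, info))

    def captured_between(self, start_s: float, end_s: float) -> List[Tuple[bytes, CaptureInfo]]:
        # Return the records whose capturing time lies in [start_s, end_s].
        return [(img, info) for img, info in self.records
                if start_s <= info.time_s <= end_s]
```

Keeping the capturing time in each record is what would allow an image captured in the past to be looked up later by time.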

[0109] Furthermore, the storage unit 20 stores a control program 22 for controlling an overall motion of the HMD 100. The method of installing the captured image database 21 and the control program 22 in the HMD 100 is not limited.

[0110] The controller 30 corresponds to the information processing apparatus according to the present embodiment, and controls motions of respective blocks of the HMD 100. The controller 30 includes a hardware configuration necessary for a computer such as a CPU and a memory (RAM, ROM). When the CPU loads the control program 22 stored in the storage unit 20 into the RAM and executes it, various processes are executed.

[0111] As the controller 30, for example, a PLD (Programmable Logic Device) such as an FPGA (Field Programmable Gate Array), or another device such as an ASIC (Application Specific Integrated Circuit) may be used.

[0112] In the present embodiment, the CPU of the controller 30 executes the program according to the present embodiment, whereby an image acquisition unit 31, a contact detection unit 32, a line-of-sight detection unit 33, an area detection unit 34, and an AR display unit 35 are realized as functional blocks. The information processing method according to the present embodiment is executed by these functional blocks. Note that in order to realize each functional block, dedicated hardware such as an IC (integrated circuit) may be used, as appropriate.

[0113] The image acquisition unit 31 acquires one or more captured images in which the actual space is captured. For example, the image acquisition unit 31 reads the captured image captured by the outward camera 14 by appropriately controlling the outward camera 14. In this case, the image acquisition unit 31 can acquire the image captured in real time.

[0114] For example, when a notification that the user 1 and the actual object 3 are about to come into contact with each other is received from the contact detection unit 32, which will be described later, the image acquisition unit 31 controls the outward camera 14 to start capturing the actual object 3 to be captured. Also, in a case where the outward camera 14 is performing continuous capturing, a capturing parameter of the outward camera 14 is changed and switched to capturing a higher resolution image. That is, the image acquisition unit 31 controls the outward camera 14 so as to switch to a mode of capturing the actual object 3 to be captured. This point will be described in detail below with reference to FIG. 5 and the like.

[0115] Furthermore, for example, the image acquisition unit 31 accesses the storage unit 20 as appropriate to read a captured image 40 stored in the captured image database 21. That is, the image acquisition unit 31 can appropriately refer to the captured image database 21 and acquire the captured image captured in the past.

[0116] Thus, in the present embodiment, the image acquisition unit 31 acquires one or more captured images from at least one of the outward camera 14 for capturing the actual space and the captured image database 21 in which the output of the outward camera 14 is stored. The acquired captured image is supplied to, for example, other functional blocks, as appropriate. In addition, the captured image acquired from the outward camera 14 is appropriately stored in the captured image database 21. In this embodiment, the image acquisition unit 31 corresponds to the acquisition unit.

[0117] The contact detection unit 32 detects a contact motion, which is a series of motions performed when the user 1 contacts the actual object 3 in the actual space. For the detection of the contact motion, for example, the depth information detected by the distance sensor or the like mounted as the sensor unit 17, an image of the field of view of the user 1 captured by the outward camera 14 (captured image), or the like is used.

[0118] In the present disclosure, the contact motion is a series of motions (gestures) performed when the user 1 contacts the actual object 3, and is typically a motion performed by the user 1 so that the hand (fingers) of the user 1 contacts the actual object 3. For example, a hand gesture of the user’s fingers when the hand of the user 1 contacts the actual object 3 is the contact motion. For example, hand gestures such as pinching, turning over, grabbing, tapping, and shifting the document 2 (actual object 3) are included in the contact motion. Incidentally, the hand gesture is not limited to the gesture performed while contacting the actual object 3. For example, a hand gesture or the like performed in a state where the user 1 does not contact the actual object 3, such as spreading or narrowing fingers to pinch the actual object 3, is also the contact motion.

[0119] The contact motion includes a motion of bringing the hand of the user 1 closer to the actual object 3. That is, in order to contact the actual object 3, a motion of the user 1 extending the hand to the actual object 3 to be a target is also included in the contact motion. For example, the motion (approaching motion) in which the user 1 moves the hand to approach the document 2 (actual object 3) is the contact motion. Therefore, it can be said that the contact detection unit 32 detects a series of motions performed when the user contacts the actual object 3, such as an approach motion and a hand gesture at the time of contacting as the contact motion of the user 1.

[0120] The contact detection unit 32 determines the state of the contact motion. For example, the contact detection unit 32 determines whether or not the state of the contact motion is a pre-contact state in which the contact of the hand of the user 1 with respect to the actual object 3 is predicted. That is, it is determined whether or not the hand of the user 1 is likely to contact the actual object 3. For example, when a distance between the fingers of the user 1 and the surrounding actual object 3 is smaller than a certain threshold, it is determined that the hand of the user 1 is likely to contact the actual object 3, and the contact motion of the user 1 is in the pre-contact state (see Step 102 of FIG. 4). In this case, the state in which the distance between the fingers and the actual object 3 is smaller than the threshold and the fingers are not in contact with the actual object 3 is the pre-contact state.

[0121] In addition, the contact detection unit 32 determines whether or not the state of the contact motion is the contact state in which the hand of the user 1 and the actual object 3 are in contact with each other. That is, the contact detection unit 32 detects the contact of the fingers of the user 1 with a surface (plane) of the actual object 3.
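
The pre-contact state and the contact state can be illustrated as a classification of the measured distance between the fingers and the surface. This is a minimal sketch, assuming a single scalar distance in meters; `ContactState`, `classify_contact_state`, and both default values are hypothetical, and the comparison follows the "equal to or less than" wording used for Step 102.

```python
from enum import Enum


class ContactState(Enum):
    NO_CONTACT = "no_contact"    # fingers sufficiently away from the object
    PRE_CONTACT = "pre_contact"  # contact is predicted, fingers not yet touching
    CONTACT = "contact"          # fingers touch the surface of the object


def classify_contact_state(distance_m: float,
                           pre_contact_threshold_m: float = 0.10,
                           contact_tolerance_m: float = 0.005) -> ContactState:
    """Classify the contact motion from the finger-to-surface distance.

    Both thresholds are illustrative assumptions, not values from the source.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    if distance_m <= contact_tolerance_m:
        # Within measurement tolerance of the surface: treat as touching.
        return ContactState.CONTACT
    if distance_m <= pre_contact_threshold_m:
        # Close to the surface but not touching: contact is predicted.
        return ContactState.PRE_CONTACT
    return ContactState.NO_CONTACT
```

A small contact tolerance is used here because a distance sensor rarely reports exactly zero; the appropriate value would depend on the sensor.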

[0122] When the contact between the user 1 and the actual object 3 is detected, the contact detection unit 32 detects a contact position P between the hand of the user 1 and the actual object 3. As the contact position P, for example, a coordinate of a position where the hand of the user 1 and the actual object 3 contact each other in a predetermined coordinate system set in the HMD 100 is detected.

[0123] A method of detecting the contact motion or the like is not limited. For example, the contact detection unit 32 appropriately measures the position of the hand of the user 1 and the position of the surrounding actual object 3 using the distance sensor or the like attached to the HMD 100. On the basis of measurement results of the respective positions, for example, it is determined whether or not the state is the pre-contact state, and it is detected whether or not the hand of the user 1 is likely to contact the actual object 3. Furthermore, for example, it is determined whether or not it is a contact state and whether or not the hand contacts the actual object 3.

[0124] In order to detect whether or not it is likely to contact, for example, prediction processing by machine learning, prediction processing using a fact that the distance between the hand of the user 1 and the actual object 3 is shortened, or the like is used. Alternatively, on the basis of a movement direction, a movement speed, and the like of the hand of the user 1, processing of predicting the contact between the user 1 and the actual object 3 may be performed.
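
As one illustration of prediction based on the movement direction and movement speed of the hand, the time until the hand reaches the surface can be estimated. This is a sketch only: it assumes the surface is locally a plane, the hand moves at constant velocity, and the plane normal points toward the side where the hand is. The function name `time_to_contact` is hypothetical.

```python
import math
from typing import Optional, Sequence


def time_to_contact(hand_pos: Sequence[float],
                    hand_vel: Sequence[float],
                    surface_point: Sequence[float],
                    surface_normal: Sequence[float]) -> Optional[float]:
    """Estimate seconds until the hand reaches a planar surface.

    Returns 0.0 if the hand is already on (or past) the plane, and None if
    the hand is not moving toward the plane.
    """
    norm = math.sqrt(sum(c * c for c in surface_normal))
    if norm == 0:
        raise ValueError("surface_normal must be non-zero")
    n = [c / norm for c in surface_normal]
    # Distance from the hand to the plane, measured along the normal.
    distance = sum((h - s) * c for h, s, c in zip(hand_pos, surface_point, n))
    if distance <= 0:
        return 0.0
    # Speed component directed toward the plane (opposite to the normal).
    approach_speed = -sum(v * c for v, c in zip(hand_vel, n))
    if approach_speed <= 0:
        return None
    return distance / approach_speed
```

Contact could then be predicted when the estimate falls below some chosen time horizon, which would be a tuning parameter.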

[0125] Furthermore, the contact detection unit 32 detects the hand gesture of the user 1 on the basis of the captured image or the like captured by the outward camera 14. For example, a method of detecting the gesture by detecting an area of the fingers in the captured image, a method of detecting a fingertip of each finger and detecting the gesture, or the like may be used, as appropriate. Processing of detecting the hand gesture using machine learning or the like may be performed. In addition, a method of detecting the hand gesture or the like is not limited.

[0126] The line-of-sight detection unit 33 detects a line-of-sight direction of the user 1. For example, the line-of-sight direction of the user 1 is detected on the basis of the images of the left eye and the right eye of the user 1 captured by the inward camera 13. The line-of-sight detection unit 33 detects a gaze position Q on the basis of the line-of-sight direction of the user 1. For example, in a case where the user 1 is looking at a certain actual object 3 in the actual space, the position where the actual object 3 and the line-of-sight direction of the user 1 intersect is detected as the gaze position Q of the user 1.
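
The intersection that defines the gaze position Q can be illustrated as a ray-plane intersection. This is a minimal sketch, assuming the surface of the actual object is approximated by a plane and the line of sight by a ray from the eye position; `gaze_position` and its parameters are hypothetical names.

```python
from typing import Optional, Sequence, Tuple


def gaze_position(eye: Sequence[float],
                  direction: Sequence[float],
                  plane_point: Sequence[float],
                  plane_normal: Sequence[float]) -> Optional[Tuple[float, ...]]:
    """Return the point Q where the line of sight meets a planar surface.

    Returns None if the line of sight is parallel to the plane or the plane
    lies behind the eye.
    """
    denom = sum(d * n for d, n in zip(direction, plane_normal))
    if abs(denom) < 1e-9:
        return None  # looking along the surface, no intersection
    # Ray parameter t such that eye + t * direction lies on the plane.
    t = sum((p - e) * n for p, e, n in zip(plane_point, eye, plane_normal)) / denom
    if t < 0:
        return None  # intersection would be behind the user
    return tuple(e + t * d for e, d in zip(eye, direction))
```

For a non-planar object, the same idea would apply per surface element of whatever shape representation the distance sensor provides.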

[0127] The method of detecting the line-of-sight direction and the gaze position Q of the user 1 is not limited. For example, in a configuration in which an infrared camera (inward camera 13) and an infrared light source are mounted, an image of the eyeball showing the reflection (bright spot) of the infrared light emitted from the infrared light source is captured. In this case, the line-of-sight direction is estimated from the bright spot of the infrared light and a pupil position, and the gaze position Q is detected.

[0128] In addition, a method of estimating the line-of-sight direction and the gaze position Q from the image of the eyeball on the basis of a feature point such as a corner of the eye may be used. Furthermore, the line-of-sight direction or the gaze position Q may be detected on the basis of a change in an eye potential or the like generated by charging of the eyeball. In addition, any algorithm or the like capable of detecting the line-of-sight direction, the gaze position Q, and the like of the user 1 may be used.

[0129] The area detection unit 34 detects the capture area including the actual object 3 according to the contact motion detected by the contact detection unit 32. The capture area is, for example, an area for generating the virtual image 4 in which the actual object 3 is captured. That is, an area including the actual object 3 to be captured as the virtual image 4 can be said to be the capture area. In the present embodiment, the capture area corresponds to a target area.

[0130] For example, the captured image (hereinafter, referred to as contact image) that captures a state in which the user 1 is in contact with the actual object 3 is acquired. The area detection unit 34 analyzes the contact image and detects a range in the contact image to be captured as the virtual image 4. Note that it is not limited to the case where the capture area is detected from the contact image. For example, the capture area may be detected from the captured image other than the contact image on the basis of the contact position of the user 1 or the like.

[0131] In the present embodiment, an area automatic detection mode for automatically detecting the capture area is executed. In the area automatic detection mode, for example, the actual object 3 contacted by the user 1 is automatically identified as a capture target. Then, an area representing an extension of the surface of the actual object 3 to be captured, that is, the boundary (periphery) of the actual object 3 contacted by the user 1 may be detected as the capture area. In addition, an area representing the boundary (periphery) of another actual object 3 related to the actual object 3 contacted by the user 1 may be detected as the capture area. For example, a boundary of a document on a top surface, a back surface, or the like of a document contacted by the user 1 may be detected as the capture area. Alternatively, when one of several documents bound with a binder or the like is contacted, the capture area may be detected so as to include the other documents.

[0132] In this manner, in the area automatic detection mode, it is detected which surface the user 1 is about to contact and how far that surface extends. This makes it possible to identify the range of the surface contacted by the user 1 (range of document 2, white board, or the like). A method of automatically detecting the capture area is not limited, and, for example, arbitrary image analysis processing capable of detecting an object, recognizing a boundary, or the like, or detection processing by the machine learning or the like may be used, as appropriate.

[0133] Furthermore, in the present embodiment, the area manual designation mode for detecting the capture area designated by the user 1 is executed. In the area manual designation mode, for example, a motion in which the user 1 traces the actual object 3 is detected as appropriate, and the range designated by the user 1 is detected as the capture area. The area automatic detection mode and the area manual designation mode will be described later in detail.

[0134] The AR display unit 35 generates an AR image (virtual image 4) displayed on a transmission type display 12 of the HMD 100 and controls the display thereof. For example, according to the state of the HMD 100, the state of the user 1, and the like, the position, the shape, the attitude, and the like of displaying the AR image are calculated.

[0135] The AR display unit 35 extracts a partial image corresponding to the capture area from one or more captured images to generate the virtual image 4 of the actual object 3. The partial image is, for example, an image generated by cutting out a portion of the captured image corresponding to the capture area. On the basis of the cut-out partial image, the virtual image 4 for displaying in the AR space is generated. Therefore, it can be said that the virtual image 4 is a partial image processed corresponding to the AR space.
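
Cutting out the partial image can be illustrated as a crop of the captured image. This is a minimal sketch, assuming the image is a list of pixel rows and the capture area is an axis-aligned rectangle `(left, top, right, bottom)` in pixel coordinates with exclusive right and bottom edges; the function name and this representation are hypothetical, and an actual capture area need not be rectangular.

```python
from typing import List, Sequence, Tuple, TypeVar

Pixel = TypeVar("Pixel")


def extract_partial_image(image: Sequence[Sequence[Pixel]],
                          area: Tuple[int, int, int, int]) -> List[List[Pixel]]:
    """Cut out the portion of the captured image inside the capture area."""
    left, top, right, bottom = area
    height = len(image)
    width = len(image[0]) if height else 0
    # Clamp the area to the image bounds so an oversized area is still valid.
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    return [list(row[left:right]) for row in image[top:bottom]]
```

The cropped result would then still need to be processed for display in the AR space, for example by correcting perspective, which is outside this sketch.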

[0136] For example, if the actual object 3 having a two-dimensional spread such as the document 2 and a whiteboard is captured, the virtual image 4 having a two-dimensional spread for displaying content written on the surface of the actual object 3 is generated. In this case, the virtual image 4 is a two-dimensional image of the actual object 3.

[0137] In addition, in the HMD 100, the actual object 3 having a three-dimensional shape can be captured. For example, the virtual image 4 is generated so that a stereoscopic shape of the actual object 3 can be represented in the AR space. In this case, the virtual image 4 is a three-dimensional image of the actual object 3. In this manner, the AR display unit 35 generates the virtual image 4 according to the shape of the actual object 3.

[0138] Furthermore, the AR display unit 35 generates the virtual image 4 representing the actual object 3 which is not shielded by a shielding object. Here, the state of being shielded by the shielding object (other object) is a state in which a part of the actual object 3 is hidden by the shielding object. For example, in the contact image captured in a state in which the hand of the user 1 is in contact with the actual object 3, it is conceivable that a part of the actual object 3 is hidden by the hand of the user 1. In this case, the hand of the user 1 becomes the shielding object that shields the actual object 3.

[0139] In the present embodiment, the AR display unit 35 generates the virtual image 4 in which the entire actual object 3 is displayed without being shielded. Therefore, the virtual image 4 is a clear image representing the entire actual object 3 to be captured (see FIG. 9, etc.). Such a virtual image 4 can be generated, for example, from a partial image of a captured image in which the actual object 3 is captured without being shielded. Incidentally, the virtual image 4 in which a part of the actual object 3 is shielded may be generated (see FIG. 16A, etc.).

[0140] The AR display unit 35 displays the generated virtual image 4 on the transmission type display 12 so as to overlap with the actual object 3. That is, the clear image (virtual image 4) of the actual object 3 is superimposed and displayed on the actual object 3. In addition, the virtual image 4 is displayed corresponding to the action (hand gesture) of the hand of the user 1 in contact with the actual object 3 and the like. For example, a type of the display of the virtual image 4 is changed for each type of motion that contacts the actual object 3 (such as tapping or rubbing the actual object 3). In this manner, the AR display unit 35 controls the display of the virtual image 4 according to the contact motion of the user 1.

[0141] A method of generating the virtual image 4 of the actual object 3, a method of displaying the virtual image 4, and the like will be described in detail later. In the present embodiment, the AR display unit 35 corresponds to the display control unit.

[0142] [Motion of HMD]

[0143] FIG. 4 is a flowchart showing an example of a motion of the HMD 100. Processing shown in FIG. 4 is processing executed in the area automatic detection mode, and is, for example, loop processing repeatedly executed during the motion of the HMD 100.

[0144] The contact detection unit 32 measures a finger position of the user 1 and a surface position of the actual object 3 existing around the fingers of the user 1 (Step 101). Here, for example, the position of the surface of the arbitrary actual object 3 existing around the fingers is measured. Incidentally, at this timing, the actual object 3 to be contacted by the user 1 need not be identified.

[0145] For example, on the basis of the depth information detected by the distance sensor, the position of the fingers of the user 1 and the surface position of the actual object 3 in the coordinate system set to the HMD 100 (distance sensor) are measured. In this case, it can be said that a spatial arrangement relationship between the fingers of the user 1 and the actual object 3 around the fingers is measured. As the finger position, for example, each fingertip of the user 1 directed toward the actual object 3 is detected. In addition, as the surface position, for example, a shape or the like representing the surface of the actual object 3 near the fingers of the user 1 is detected.

[0146] Furthermore, in a case where the field of view of the user 1 is captured by the outward camera 14 or the like, the finger position and the surface position (arrangement of fingers and actual object) may be appropriately detected from the depth information and the captured image. By using the outward camera 14, it is possible to improve a detection accuracy of each position. In addition, a method of detecting the finger position and the surface position is not limited.

[0147] The contact detection unit 32 determines whether or not the fingers of the user 1 are likely to contact the surface of the actual object 3 (Step 102). That is, it is determined whether or not the state of the contact motion of the user 1 is the pre-contact state in which the contact is predicted.

[0148] As the determination of the pre-contact state, for example, a threshold determination of the distance between the finger position and the surface position is performed. That is, it is determined whether or not the distance between the finger position and the surface position is larger than a predetermined threshold. The predetermined threshold is appropriately set, for example, so that capture processing of the actual object 3 can be appropriately executed.

[0149] For example, if the distance between the finger position of the user 1 and the surface position of the actual object 3 is larger than the predetermined threshold, it is determined that the fingers of the user 1 are sufficiently away from the actual object 3 and that the state is not the pre-contact state (No in Step 102). In this case, the processing returns to Step 101, the finger position and the surface position are measured at a next timing, and it is determined whether or not the state is the pre-contact state.

[0150] If the distance between the finger position and the surface position is equal to or less than the predetermined threshold, it is determined that the fingers of the user 1 are approaching the actual object 3 and that the state is the pre-contact state in which the contact is predicted (Yes in Step 102). In this case, the image acquisition unit 31 controls the outward camera 14, and starts capturing of the actual space with a setting suitable for capture (Step 103). That is, when an occurrence of an interaction between the actual object 3 and the user 1 is predicted, a capturing mode is switched and a detailed capture is started.
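
The flow of Steps 101 to 103 can be sketched as a loop over successive distance measurements. This is an illustration only: the measurements are assumed to arrive as a sequence of finger-to-surface distances, and `wait_for_pre_contact` and the `start_capture` callback are hypothetical stand-ins for the contact detection unit 32 and the image acquisition unit 31.

```python
from typing import Callable, Iterable, Optional


def wait_for_pre_contact(distances_m: Iterable[float],
                         threshold_m: float,
                         start_capture: Callable[[], None]) -> Optional[int]:
    """Run Steps 101-103 and return the index of the measurement that
    triggered capturing, or None if the pre-contact state never occurred."""
    for index, distance in enumerate(distances_m):  # Step 101: measure
        if distance > threshold_m:                  # Step 102: No
            continue                                # return to Step 101
        start_capture()                             # Step 103: start capturing
        return index
    return None
```

For example, with a threshold of 0.1 and measurements `[0.5, 0.3, 0.1, 0.05]`, the callback is invoked once at the third measurement, since a distance equal to the threshold already counts as the pre-contact state.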

[0151] Specifically, by the image acquisition unit 31, each capturing parameter such as the capturing resolution, the exposure time, and a capturing interval of the outward camera 14 is set to a value for capturing. The value for capturing is appropriately set so that a desired virtual image 4 can be generated, for example.

[0152] For example, in a configuration in which the outward camera 14 always captures the field of view of the user 1, the capturing resolution for monitoring is set so as to suppress an amount of image data. The capturing resolution for monitoring is changed to a capturing resolution for more detailed capturing. That is, the image acquisition unit 31 increases the capturing resolution of the outward camera 14 in a case where the state of the contact motion is determined to be the pre-contact state. This makes it possible to generate a detailed captured image (virtual image 4) with high resolution, for example.

[0153] Furthermore, for example, the exposure time of the outward camera 14 is appropriately set so that the image having desired brightness and contrast is captured. Alternatively, the capturing interval is appropriately set so that a sufficient number of captured images can be captured as will be described later.
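
The switch between the capturing parameters for monitoring and the values for capturing can be illustrated with two parameter sets. All numeric values below are assumptions chosen only so that the capture setting has a higher resolution and a shorter capturing interval than the monitoring setting; `CaptureParams`, `MONITORING`, `CAPTURE`, and `select_params` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CaptureParams:
    resolution: Tuple[int, int]  # capturing resolution as (width, height)
    exposure_s: float            # exposure time, in seconds
    interval_s: float            # capturing interval, in seconds


# Low resolution to suppress the amount of image data while monitoring.
MONITORING = CaptureParams(resolution=(640, 480), exposure_s=1 / 60, interval_s=0.5)
# Higher resolution and shorter interval for detailed capture (assumed values).
CAPTURE = CaptureParams(resolution=(3840, 2160), exposure_s=1 / 120, interval_s=0.1)


def select_params(pre_contact: bool) -> CaptureParams:
    """Return the parameter set to apply to the camera for the given state."""
    return CAPTURE if pre_contact else MONITORING
```

In practice the exposure time and interval would be chosen from the scene brightness and the number of images needed, rather than fixed in advance.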

[0154] When each capturing parameter of the outward camera 14 is set to the value for capturing and the capturing mode is switched, capturing of the actual space by the outward camera 14 (capturing of field of view of user 1) is started. The captured image captured by the outward camera 14 is appropriately read by the image acquisition unit 31. Capturing processing is repeatedly executed until a predetermined condition for generating the virtual image 4 is satisfied, for example.

[0155] FIG. 5 is a schematic diagram showing an example of the contact motion of the user 1 with respect to the actual object 3. FIG. 5A schematically shows fingers 5 of the user 1 and the actual object 3 (document 2) at a timing determined to be in the pre-contact state. Note that whether or not the document 2 shown in FIG. 5A is the target of the contact motion (target to be captured) is not identified in the state shown in FIG. 5A.

[0156] In the state shown in FIG. 5A, the capturing area of the outward camera 14 (dotted line in FIG. 5A) includes the fingers 5 of the user 1 and a part of the document 2. For example, the captured image with high resolution is captured in such a capturing range. In this case, the captured image is an image in which only a part of the document 2 is captured.

[0157] FIG. 5B shows the pre-contact state in which the fingers 5 of the user 1 approach the actual object 3 closer than the state shown in FIG. 5A. In the state shown in FIG. 5B, the entire document 2 is included in the capturing area of the outward camera 14. The fingers 5 of the user 1 are not in contact with the document 2, and the document 2 is captured without being shielded by the shielding object. That is, the captured image captured in the state shown in FIG. 5B becomes an image in which the document 2 (actual object 3) that is not shielded by the shielding object is captured.

[0158] FIG. 5C shows a contact state in which the fingers 5 of the user 1 and the actual object 3 are in contact with each other. The capturing processing by the outward camera 14 may be continued even in the contact state. In this case, the entire document 2 is included in the capturing range of the outward camera 14, but a part of the document 2 is shielded by the fingers of the user 1. In this case, the captured image is an image in which a part of the document 2 is shielded.

[0159] In the capturing processing by the outward camera 14, capturing is performed in the states as shown in, for example, FIG. 5A to FIG. 5C, and the captured images in the respective states are appropriately read. Thus, in a case where the state of the contact motion is determined to be the pre-contact state, the image acquisition unit 31 controls the outward camera 14 to acquire one or more captured images. That is, it can be said that the image acquisition unit 31 acquires the images captured with the setting for capture (capture images).
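
Given captured images in the states of FIG. 5A to FIG. 5C, one way to choose a source for an unshielded virtual image is to prefer the most recent image in which the whole object is in view and nothing hides it. This is a sketch under stated assumptions, not the selection method of the embodiment: the `Frame` fields are hypothetical and are assumed to be filled in by some earlier image analysis.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Frame:
    time_s: float               # capturing time, in seconds
    object_fully_in_view: bool  # entire object inside the capturing range
    shielded_fraction: float    # fraction of the object hidden, 0.0 to 1.0


def pick_unshielded_frame(frames: Iterable[Frame]) -> Optional[Frame]:
    """Return the latest frame showing the whole object with no shielding,
    or None if no such frame was captured."""
    candidates = [f for f in frames
                  if f.object_fully_in_view and f.shielded_fraction == 0.0]
    if not candidates:
        return None
    return max(candidates, key=lambda f: f.time_s)
```

With one frame per state, the frame corresponding to FIG. 5B would be selected, since FIG. 5A shows only part of the document and FIG. 5C is partly shielded by the fingers.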

……
……
……

Link to this article: https://patent.nweon.com/21718