Sony Patent | Information Processing Device And Information Processing Method, And Recording Medium

Editor: 映维 | Category: Sony | October 9, 2020

Patent: Information Processing Device And Information Processing Method, And Recording Medium

Publication Number: 20200322595

Publication Date: 20201008

Applicants: Sony

Abstract

The present disclosure relates to an information processing device and an information processing method that enable achievement of an improvement regarding the localization of a visual line, for example, in pointing or an object operation with the visual line, and a recording medium. A display device is controlled such that a stereoscopic object is displayed, the stereoscopic object being disposed in a predetermined direction in a visual field of a user, the stereoscopic object indicating distance regarding the predetermined direction. The present disclosure can be applied to a wearable display device, such as a head-mounted display and the like.

TECHNICAL FIELD

[0001] The present disclosure relates to an information processing device and an information processing method, and a recording medium, and particularly relates to an information processing device and an information processing method that enable, for example, a comfortable hands-free operation by improving the localization of a visual line in pointing or an object operation with the visual line, and a recording medium.

BACKGROUND ART

[0002] A number of devices and methods for an operation of an object in real-world three-dimensional space, including a dedicated device, such as a 3-dimension (3D) mouse, gesture with a fingertip, and the like have been proposed (refer to Patent Document 1).

CITATION LIST

Patent Document

[0003] Patent Document 1: Japanese Patent Publication No. 5807686

SUMMARY OF THE INVENTION

Problems to be Solved by the Invention

[0004] However, for the dedicated device, such as the 3D mouse, it is necessary that the dedicated device is operated by hand. For the gesture with a fingertip, the latency of pointing is large.

[0005] Furthermore, due to a human vision-adjustment mechanism, it is desirable that an improvement regarding the localization of a visual line is made in pointing or an object operation with the visual line.

[0006] The present disclosure has been made in consideration of the situations, and an object of the present disclosure is to enable an improvement regarding the localization of a visual line, to be achieved.

Solutions to Problems

[0007] An information processing device according to the present disclosure, includes a display control unit that controls a display device such that a stereoscopic object is displayed, the stereoscopic object being disposed in a predetermined direction in a visual field of a user, the stereoscopic object indicating distance regarding the predetermined direction. A recording medium according to the present disclosure records a program for causing a computer to function as the information processing device.

[0008] An information processing method according to the present disclosure, includes controlling a display device such that a stereoscopic object is displayed, the stereoscopic object being disposed in a predetermined direction in a visual field of a user, the stereoscopic object indicating distance regarding the predetermined direction.

[0009] According to the present disclosure, a stereoscopic object is displayed on a display device, the stereoscopic object being disposed in a predetermined direction in a visual field of a user, the stereoscopic object indicating distance regarding the predetermined direction.

Effects of the Invention

[0010] According to the present disclosure (the present technology), a displayed stereoscopic object assists localization of a visual field of a user in three-dimensional space. As a result, for example, an operation can be comfortably performed in a handsfree manner.

[0011] Note that, the effects described in the present specification are just exemplifications. The effects of the present technology are not limited to the effects described in the present specification, and thus additional effects may be provided.

BRIEF DESCRIPTION OF DRAWINGS

[0012] FIG. 1 is a diagram for describing an overview of the present technology.

[0013] FIG. 2 is a diagram for describing a virtual object operation (Example 1).

[0014] FIG. 3 is a diagram for describing a real object operation in a real world (Example 2).

[0015] FIG. 4 is a diagram for describing a virtual camera visual-point movement in a virtual world (Example 3).

[0016] FIG. 5 is a diagram illustrating an exemplary different virtual measure.

[0017] FIG. 6 is a diagram for describing exemplary object fine-adjustment.

[0018] FIG. 7 is a diagram for describing exemplary object fine-adjustment in Example 1.

[0019] FIG. 8 is a diagram for describing the exemplary object fine-adjustment in Example 1.

[0020] FIG. 9 is a diagram for describing exemplary object fine-adjustment in Example 2.

[0021] FIG. 10 is a diagram for describing the exemplary object fine-adjustment in Example 2.

[0022] FIG. 11 is a diagram for describing exemplary object fine-adjustment in Example 3.

[0023] FIG. 12 is a diagram for describing the exemplary object fine-adjustment in Example 3.

[0024] FIG. 13 is a diagram illustrating an exemplary configuration of the external appearance of a wearable display device to which the present technology has been applied.

[0025] FIG. 14 is a block diagram illustrating an exemplary configuration of the wearable display device of FIG. 13.

[0026] FIG. 15 is a flowchart for describing virtual-object operation processing.

[0027] FIG. 16 is a flowchart for describing environment recognition processing at step S11 of FIG. 15.

[0028] FIG. 17 is a flowchart for describing visual-line estimation processing at step S12 of FIG. 15.

[0029] FIG. 18 is a flowchart for describing drawing processing at step S13 of FIG. 15.

[0030] FIG. 19 is a diagram illustrating an exemplary configuration of the external appearance of a wearable display device to which the present technology has been applied.

[0031] FIG. 20 is a block diagram illustrating an exemplary configuration of the wearable display device of FIG. 19.

[0032] FIG. 21 is a flowchart for describing real-object operation processing.

[0033] FIG. 22 is a flowchart for describing visual-line estimation processing at step S112 of FIG. 21.

[0034] FIG. 23 is a flowchart for describing drone control processing at step S114 of FIG. 21.

[0035] FIG. 24 is a diagram illustrating an exemplary configuration of the external appearance of a wearable display device to which the present technology has been applied.

[0036] FIG. 25 is a block diagram illustrating an exemplary configuration of the wearable display device of FIG. 24.

[0037] FIG. 26 is a flowchart for describing visual-line estimation processing at step S12 of FIG. 15.

[0038] FIG. 27 is a diagram illustrating the relationship between coordinate systems according to the present technology.

[0039] FIG. 28 is a diagram for describing a method of acquiring a 3D gaze point in virtual space, according to the present technology.

[0040] FIG. 29 is a diagram for describing a method of acquiring a 3D gaze point in virtual space, according to the present technology.

[0041] FIG. 30 is a block diagram illustrating an exemplary configuration of an image processing system to which the present technology has been applied.

[0042] FIG. 31 is a block diagram illustrating an exemplary configuration of the hardware of a personal computer.

MODE FOR CARRYING OUT THE INVENTION

[0043] Modes for carrying out the present disclosure (hereinafter, referred to as embodiments) will be described below. Note that the descriptions will be given in the following order.

[0044] 1. First Embodiment (Overview)

[0045] 2. Second Embodiment (Virtual Object Operation)

[0046] 3. Third Embodiment (Real Object Operation)

[0047] 4. Fourth Embodiment (Virtual Camera Visual-Point Movement)

[0048] 5. Additional Descriptions

[0049] 6. Fifth Embodiment (Image Processing System)

1. First Embodiment

Overview

[0050] First, an overview of the present technology will be described with reference to FIG. 1.

[0051] A number of devices and methods for an operation of an object in real-world three-dimensional space, including a dedicated device, such as a 3-dimension (3D) mouse, gesture with a fingertip, and the like have been proposed. However, for the dedicated device, such as the 3D mouse, it is necessary that the dedicated device is operated by hand. For the gesture with a fingertip, the latency of pointing is large.

[0052] Furthermore, for midair (empty-field), a visual line cannot be localized (referred to as empty-field myopia) due to a human vision-adjustment mechanism, and thus pointing or an object operation with the visual line is difficult to perform.

[0053] In other words, as illustrated in A of FIG. 1, when a visually recognizable object A is present, a user 1 can focus thereon. In contrast to this, although the user 1 desires to focus on the position of the object A so as to gaze, when no object is present as indicated with a dotted-line star, the user 1 has difficulty in focusing.

[0054] Thus, according to the present technology, even when no real object is present, display of a virtual measure 4 in a wearable display device 3 assists the user 1 in focusing. In other words, according to the present technology, as illustrated in B of FIG. 1, display control of displaying the virtual measure 4 that is a virtual object that assists the visual line in localizing in midair, is performed in the wearable display device 3 (display device), so that the focusing of the user 1 is assisted. The virtual measure 4 that is one stereoscopic object including a virtual object to be viewed stereoscopically (visible stereoscopically), is disposed, for example, in a predetermined direction, such as a depth direction extending ahead of the user 1, a horizontal direction, an oblique direction, or a bent direction, in the visual field of the user 1, and indicates distance in the predetermined direction. The virtual measure 4 assists the visual line in localizing in the midair, and makes an improvement such that the localizing of the visual line into the midair is facilitated. Note that the wearable display device 3 includes, for example, a see-through display, a head-mounted display, or the like.

[0055] This arrangement can achieve three-dimensional pointing including midair space, with the visual line and the virtual measure.

Example 1: Exemplary Virtual Object Operation

[0056] FIG. 2 is a diagram for describing exemplary disposition simulation of virtual furniture as a virtual object operation. In the example of FIG. 2, a user 1 who is wearing a wearable display device 3 is located in real-world three-dimensional space (or virtual three-dimensional space) 11. A table 13 is disposed as one piece of furniture in the real-world three-dimensional space 11. The wearable display device 3 is provided with an environment recognition camera 12 that captures an image of the inside of the real-world three-dimensional space 11 in order to recognize the environment, and a display 20. Then, on the right side of FIG. 2, the image captured by the environment recognition camera 12 in the real-world three-dimensional space 11 (the image of the inside of the real-world three-dimensional space 11), is displayed on the display 20.

[0057] As illustrated in A of FIG. 2, although the user 1 attempts to dispose a virtual thing into an empty-field 14 above the table 13 in the real-world three-dimensional space 11, namely, in midair, the user 1 cannot focus on the empty-field 14 above the table 13 in the real-world three-dimensional space 11 due to the human vision-adjustment mechanism as described above.

[0058] Thus, the wearable display device 3 displays, as indicated with an arrow P1, a virtual ruler 21 having a scale that enables a gaze, onto the display 20 displaying the inside of the real-world three-dimensional space 11. This arrangement enables the user 1 to focus on, with the virtual ruler 21, a desired position 22 on the virtual ruler 21, as indicated with an arrow P2. Note that the desired position 22 is displayed on the virtual ruler 21 when the user 1 focuses on the position.

[0059] In other words, the wearable display device 3 displays the virtual ruler 21 that is one example of the virtual measure 4, onto the display 20 displaying the inside of the real-world three-dimensional space 11. The virtual ruler 21 includes a flat-shaped stereoscopic object imitating a ruler, and has a scale having substantially regular intervals as information indicating distance. For example, the virtual ruler 21 is disposed slightly obliquely in a region (space) including the midair in which no object visible stereoscopically in the real space is present, with a longitudinal direction (a direction in which the scale is added) along the depth direction and a lateral direction facing vertically, in the visual field of the user 1. Note that the disposition direction in which the virtual ruler 21 (longitudinal direction) is disposed, is not limited to the depth direction. Furthermore, the disposition timing of the virtual ruler 21 may be determined on the basis of retention of the visual line or may be determined on the basis of an operation of the user 1 on a graphical user interface (GUI), such as a setting button 51 illustrated in FIG. 6 to be described later.
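As a rough illustration of a scale with substantially regular intervals along a disposition direction, the following sketch generates tick positions for such a ruler. The function name `ruler_ticks`, the coordinate convention (x to the right, y up, z in the depth direction), and all numeric values are assumptions made for illustration and do not appear in the disclosure.

```python
import math


def ruler_ticks(origin, direction, interval, count):
    """Return `count` tick positions spaced `interval` apart along `direction`.

    `origin` is the position of the first tick and `direction` is the
    longitudinal direction of the ruler (it need not be normalized).
    """
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    unit = [c / norm for c in direction]
    return [
        tuple(o + unit[i] * interval * k for i, o in enumerate(origin))
        for k in range(count)
    ]


# Hypothetical ruler starting 1 m ahead of the user, one tick every 0.5 m in depth.
ticks = ruler_ticks((0.0, 0.0, 1.0), (0.0, 0.0, 2.0), 0.5, 4)
```

Passing an oblique or horizontal `direction` gives the other disposition directions mentioned above without changing the function.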

[0060] While the user 1 is continuously gazing at the desired position 22 at which the user 1 desires to dispose the virtual thing, as illustrated in B of FIG. 2, the wearable display device 3 measures whether the retention level of a 3D attention point is a threshold value or less. Here, a circle surrounding the desired position 22 indicates a retention-level threshold range 25 in which the retention level of the 3D attention point is the threshold value or less. Then, as indicated with an arrow P11, the wearable display device 3 displays, at a location at which the retention level of the 3D attention point on the display 20 is the threshold value or less, the desired position 22 indicating the location, and displays a progress mark 23 indicating that the same position is being viewed, in the neighborhood of the desired position 22. Thereafter, as indicated with an arrow P12, the virtual thing 24 can be set in the empty-field 14.

[0061] In other words, the wearable display device 3 determines the gaze of the user 1, on the basis of the intersection between the visual line of the user 1 and the virtual ruler 21. Specifically, the intersection between the visual line of the user 1 and the virtual ruler 21, is detected in the wearable display device 3. The intersection is a point to which the user 1 attempts to give attention with performance of focusing (with fixation of the visual line) (a point on which the user 1 fixes the visual line in the real-world three-dimensional space or the virtual three-dimensional space), and hereinafter is also referred to as the 3D attention point. As illustrated in B of FIG. 2, the wearable display device 3 determines whether a retention level corresponding to a level of retention range in which the 3D attention point is being retained, is the threshold value or less over a predetermined period (retention-level threshold determination). For example, in a case where the retention level is the threshold value or less over the predetermined period, the wearable display device 3 determines that the user 1 has gazed at the position 22 in the retention range of the 3D attention point. Therefore, while the user 1 is continuously performing the focusing on the position 22 (continuously fixing the visual line), it is determined that the user 1 has gazed. During the performance of the retention-level threshold determination, the wearable display device 3 displays, at the position 22 in the retention range in which the retention level of the 3D attention point on the display 20 is the threshold value or less, a point as an object indicating the position 22, and displays the progress mark 23 indicating the progress of the state in which the same position 22 is being viewed, in the neighborhood of the position 22, as indicated with the arrow P11. The progress mark 23 indicates the time (elapse) during which the retention level is the threshold value or less. After it is determined that the user 1 has gazed at the position 22, for example, the position 22 is regarded as a 3D gaze point at which the user 1 is gazing, and the wearable display device 3 sets the virtual thing 24 at the position 22, as indicated with the arrow P12.
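The retention-level threshold determination can be sketched roughly as a dwell detector. The disclosure does not define how the retention level is computed, so this sketch assumes it is the largest distance of the recent 3D attention points from their centroid. The class name `GazeDwellDetector` and the parameters `threshold`, `dwell_time`, and `window` are likewise hypothetical.

```python
import math


class GazeDwellDetector:
    """Decide that the user has gazed once the 3D attention point stays put."""

    def __init__(self, threshold, dwell_time, window=5):
        self.threshold = threshold    # largest allowed retention level
        self.dwell_time = dwell_time  # the "predetermined period", in seconds
        self.window = window          # number of recent samples considered
        self.samples = []             # recent 3D attention points
        self.start = None             # time at which the current retention began

    def retention_level(self):
        # Assumed definition: spread of the recent samples around their centroid.
        n = len(self.samples)
        centroid = [sum(p[i] for p in self.samples) / n for i in range(3)]
        return max(math.dist(p, centroid) for p in self.samples)

    def update(self, t, point):
        """Feed one attention point; return (progress in [0, 1], gazed flag)."""
        self.samples.append(point)
        self.samples = self.samples[-self.window:]
        if self.retention_level() > self.threshold:
            # The visual line moved away: restart from the newest sample only.
            self.samples = [point]
            self.start = t
            return 0.0, False
        if self.start is None:
            self.start = t
        progress = min((t - self.start) / self.dwell_time, 1.0)
        return progress, progress >= 1.0
```

The returned `progress` value plays the role of the progress mark 23, and the flag becoming true corresponds to treating the position as the 3D gaze point.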

[0062] Then, after the setting of the virtual thing 24, as illustrated in C of FIG. 2, when the user 1 strikes any pose, such as coming close to the table 13, or the like in the real-world three-dimensional space 11, as indicated with an arrow P21, the wearable display device 3 displays, in response to the pose of the user 1, the virtual thing 24 onto the display 20 with simultaneous localization and mapping (SLAM) to be described later with reference to FIG. 6. Therefore, the user can verify the virtual thing 24 in response to the pose of the user 1.

Example 2: Exemplary Real Object Operation

[0063] FIG. 3 is a diagram for describing an exemplary drone operation as a real object operation in a real world. In the example of FIG. 3, a user 1 who is wearing a wearable display device 3 is located in real-world three-dimensional space 32. A drone 31 is disposed in the real-world three-dimensional space 32. The wearable display device 3 is provided with an environment recognition camera 12 and a display 20, similarly to the example of FIG. 2. On the right side of FIG. 3, an image captured by the environment recognition camera 12 (an image of the sky having clouds floating) in the real-world three-dimensional space 32, is displayed on the display 20.

[0064] As illustrated in A of FIG. 3, even when the user 1 attempts to move the drone 31 into an empty-field 14 that is midair in the real-world three-dimensional space 32, the user 1 cannot focus on the empty-field 14 in the real-world three-dimensional space 32 due to the human vision-adjustment mechanism as described above.

[0065] Thus, as indicated with an arrow P31, the wearable display device 3 displays a virtual ruler 21 that enables a gaze, onto the display 20. This arrangement enables the user 1 to focus on, with the virtual ruler 21, a desired position 22 in the empty-field 14, as indicated with an arrow P32.

[0066] While the user 1 is continuously gazing at the desired position 22 to which the user 1 desires to move the drone 31, as illustrated in B of FIG. 3, the wearable display device 3 measures whether the retention level of a 3D attention point is a threshold value or less. Then, as indicated with an arrow P41, the wearable display device 3 displays, at a location at which the retention level of the 3D attention point on the display 20 is the threshold value or less, the desired position 22 indicating the location, and displays a progress mark 23 indicating that the same position is being viewed, in the neighborhood of the desired position 22. Thereafter, as indicated with an arrow P42, the drone 31 can be moved into the empty-field 14 (desired position 22 thereof). Note that, in practice, the wearable display device 3 transmits positional information to the drone 31, to move the drone 31.

[0067] In other words, while the user 1 is continuously fixing the visual line at the position 22 to which the user 1 desires to move the drone 31, an object indicating the position 22 and the progress mark 23 are displayed (the arrow P41 of B of FIG. 3), similarly to the case of FIG. 2. Thereafter, when it is determined that the user 1 has gazed, the drone 31 is moved to the position 22 at which the user 1 is gazing, as indicated with the arrow P42.

[0068] Then, after the movement of the drone 31, as illustrated in C of FIG. 3, the user 1 can verify, for example, the drone 31 moved to the desired position 22 in the real-world three-dimensional space 32.

Example 3: Exemplary Virtual Camera Visual-Point Movement

[0069] FIG. 4 is a diagram for describing an exemplary visual-point warp as a virtual camera visual-point movement in a virtual world. In the example of FIG. 4, a user 1 who is wearing a wearable display device 3, is located in virtual three-dimensional space 35. The wearable display device 3 is provided with an environment recognition camera 12 and a display 20, similarly to the example of FIG. 2. On the right side of FIG. 4, an image captured by the environment recognition camera 12 (an image of a house viewed diagonally from the front) in the virtual three-dimensional space 35, is displayed on the display 20.

[0070] As illustrated in A of FIG. 4, the user 1 who is playing with a subjective visual point, attempts to view an empty-field 14 that is midair and is the position of a visual-point switching destination in the virtual three-dimensional space 35, in order to make a switch to a bird’s-eye visual point. However, even when the user 1 attempts to view the empty-field 14 that is the midair, as indicated with an arrow P51, the user 1 cannot focus on the empty-field 14 in the virtual three-dimensional space 35 due to the human vision-adjustment mechanism as described above.

[0071] Thus, as indicated with an arrow P52, the wearable display device 3 displays a virtual ruler 21 that enables a gaze, onto the display 20 displaying the midair (an image of the sky in which clouds are floating). The virtual ruler 21 superimposed on the image of the empty-field 14 (namely, the sky), is displayed on the display 20. This arrangement enables the user 1 to focus on, with the virtual ruler 21, a desired position 22 in the visual-point switching destination (empty-field 14), as indicated with an arrow P52.

[0072] While the user 1 is continuously gazing at the desired position 22 in the visual-point switching destination, as illustrated in B of FIG. 4, the wearable display device 3 measures whether the retention level of a 3D attention point is a threshold value or less. Then, as indicated with an arrow P61, the wearable display device 3 displays, at a location at which the retention level of the 3D attention point on the display 20 is the threshold value or less, the desired position 22 indicating the location, and displays a progress mark 23 indicating that the same position is being viewed, in the neighborhood of the desired position 22. Thereafter, as indicated with an arrow P62, the camera visual point can be switched to the desired position 22 in the empty-field 14. As a result, an image of the house viewed from above (desired position 22) (bird’s-eye image) is displayed on the display 20.

[0073] In other words, while the user 1 is continuously fixing the visual line at the desired position 22 in the visual-point switching destination, an object indicating the position 22 and the progress mark 23 are displayed (the arrow P61 of B of FIG. 4), similarly to the case of FIG. 2. Thereafter, when it is determined that the user 1 has gazed, the camera visual point (a visual point from which an image to be displayed on the display 20 is viewed) is switched to the position 22 at which the user 1 is gazing, as indicated with the arrow P62. As a result, the image of the house viewed from above (desired position 22) (bird’s-eye image) is displayed on the display 20.

[0074] Then, as illustrated in C of FIG. 4, the user 1 can have, for example, a bird’s-eye view from the desired position 22 as the camera visual point in the virtual three-dimensional space 35.

Modification 1: Exemplary Virtual Measure

[0075] FIG. 5 is a diagram illustrating an exemplary different virtual measure. In the example of FIG. 5, as a virtual measure instead of a virtual ruler 21, spheres 41 as a plurality of virtual objects disposed at substantially regular intervals, are displayed on a display 20. In other words, in FIG. 5, the virtual measure includes the spheres 41 as the plurality of virtual objects, and the plurality of spheres 41 is disposed at the substantially regular intervals correspondingly in a depth direction and a horizontal direction as predetermined directions. The disposition of the plurality of spheres 41 at the substantially regular intervals correspondingly in the depth direction and the horizontal direction, allows the plurality of spheres 41 to indicate distance (interval) correspondingly in the depth direction and the horizontal direction. Although a 2D visual-point pointer 42 of a user 1 is located at a position different from those of the plurality of spheres 41, as indicated with an arrow P71, a gaze can be allowed immediately. The 2D visual-point pointer 42 indicates the position the user 1 is viewing (is performing focusing on).

[0076] For example, the color of the sphere 41 at which the 2D visual-point pointer 42 of the user 1 is disposed (namely, the visual line of the user 1 is fixed), is changed, or the like so that feedback can be promptly performed to the user 1. In other words, for the plurality of spheres 41 as the virtual measure, the display of at least one of the plurality of spheres 41 can be changed in response to the visual line of the user 1. Specifically, for example, the color, the luminance, the shape, the size, or the like of the sphere 41 at which the visual line of the user 1 is fixed, can be changed.

[0077] Moreover, as indicated with an arrow P72, additional information indicating the position of the 2D visual-point pointer 42, for example, “an altitude of 15 m and a distance of 25 m”, is added to only the sphere 41 at which the 2D visual-point pointer 42 is disposed (namely, at which the visual line of the user 1 is fixed). Such a display facilitates viewing and additionally prevents the visual field of the user 1 from being obstructed as much as possible. In other words, for the plurality of spheres 41 as the virtual measure, the additional information regarding at least one of the plurality of spheres 41 can be displayed in response to the visual line of the user 1. Specifically, for example, information indicating the position of the sphere 41 at which the visual line of the user 1 is fixed, or the like can be displayed.
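The sphere-based virtual measure and its gaze-dependent feedback might be sketched as follows. The helper names `sphere_grid`, `fixated_sphere`, and `label`, the hit radius, and the reading of "distance" as horizontal distance from the user are all assumptions made for this illustration.

```python
import math


def sphere_grid(depths, offsets, altitude):
    """Sphere centers (x, y, z) at every combination of horizontal offset and depth."""
    return [(x, altitude, z) for z in depths for x in offsets]


def fixated_sphere(spheres, gaze_point, radius):
    """Return the index of the sphere within `radius` of the gaze point, or None."""
    best = min(range(len(spheres)), key=lambda i: math.dist(spheres[i], gaze_point))
    return best if math.dist(spheres[best], gaze_point) <= radius else None


def label(sphere):
    """Additional information shown only for the fixated sphere."""
    x, y, z = sphere
    return f"altitude {y:g} m, distance {math.hypot(x, z):g} m"


# Hypothetical layout: spheres every 5 m in depth and in the horizontal direction.
spheres = sphere_grid([5, 10, 15, 20, 25], [-5, 0, 5], 15)
index = fixated_sphere(spheres, (0.2, 15.1, 24.9), radius=1.0)
if index is not None:
    text = label(spheres[index])
```

Only the sphere returned by `fixated_sphere` would have its color, luminance, shape, or size changed and its label drawn, which keeps the rest of the visual field unobstructed.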

[0078] Note that, in the example of FIG. 5, although the plurality of spheres is provided, any object may be provided as long as assistance is given. In other words, in the example of FIG. 5, although the virtual measure includes the plurality of spheres, any virtual object having a shape different from a sphere may be provided as long as the focusing of the user 1 is assisted.

Modification 2: Exemplary Object Fine-Adjustment

[0079] Next, object fine-adjustment from a plurality of visual points with SLAM will be described with reference to FIG. 6. Note that the SLAM (position and attitude estimation) is a technique of estimating, with an image of a camera, a map and a position from change information regarding the image, to acquire the position and the attitude of the camera itself in real time.

[0080] In the example of FIG. 6, a user 1 who is wearing a wearable display device 3, attempts to set an object on a table 13. The wearable display device 3 is provided with an environment recognition camera 12 and a visual-line recognition camera 50. Thus, a case is assumed where the wearable display device 3 performs first-time visual-line estimation and gaze determination and second-time visual-line estimation and gaze determination. The visual-line estimation includes processing of estimating the visual line of the user 1, and the gaze determination includes processing of determining whether the user 1 has gazed, with the visual line of the user 1. Note that, of the “visual-line estimation” and the “gaze determination”, only the “visual-line estimation” is indicated in FIG. 6, and the indication of the “gaze determination” is omitted.

[0081] In the example of FIG. 6, a display 20-1 represents a display 20 after the first-time gaze estimation, and a display 20-2 represents the display 20 after the gaze estimation due to a second-time gaze. In other words, the display 20-1 represents the display 20 after the first-time visual-line estimation and gaze determination, and the display 20-2 represents the display 20 after the second-time visual-line estimation and gaze determination. A setting button 51, a provisionally-setting button 52, and a cancel button 53 displayed on the displays 20-1 and 20-2, each can be selected by gazing. Note that, as indicated with hatching, the display 20-1 has the provisionally-setting button 52 selected, and the display 20-2 has the setting button 51 selected.

[0082] In other words, a first-time 3D gaze point 61 is calculated by the first-time visual-line estimation and gaze determination, and is provisionally set as indicated with the hatching of the provisionally-setting button 52. At that time, the display 20-1 displays, on the table 13, an object 55 provisionally set by the gaze in the first-time visual-line estimation. For example, the display is made with dotted lines because of the provisional setting.

[0083] Although the user 1 attempts to place the object at the center of the table 13, in practice, as interpreted from the position of the first-time 3D gaze point 61 calculated by the first-time visual-line estimation and gaze determination and the position of a second-time 3D gaze point 62 calculated by the second-time visual-line estimation and gaze determination, there is a possibility that the positions in the depth direction or the like are in disagreement even in agreement in the right-and-left direction.

[0084] At this time, the use of the technique of the SLAM in the wearable display device 3, enables the position of the provisionally-set first-time 3D gaze point 61 to be verified with the object 55 on the display 20-2 from a second-time visual point different from a first-time visual point, on the basis of a result of the position and attitude estimation due to the SLAM. Moreover, the first-time 3D gaze point 61 is readjusted from the second-time visual point with verification as an object 56 on the display 20-2, so that the object 56 can be set as indicated with the hatching of the setting button 51. Note that, the display 20-2 displays the object 56 more clearly than the object 55.
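To see why a second visual point helps, the following sketch expresses a point fixed in world coordinates in the frame of a camera whose pose is known. In practice the pose would come from the SLAM; here it is simply passed in. The function name `world_to_camera`, the yaw-only pose, and the axis convention (x to the right, y up, z forward, with yaw 0 looking along the world z axis) are assumptions for illustration.

```python
import math


def world_to_camera(point, cam_pos, cam_yaw):
    """Express a world point in a camera frame rotated by `cam_yaw` about the y axis."""
    dx, dy, dz = (p - c for p, c in zip(point, cam_pos))
    cos_y, sin_y = math.cos(cam_yaw), math.sin(cam_yaw)
    # Project the offset onto the camera's right and forward axes.
    x = cos_y * dx - sin_y * dz
    z = sin_y * dx + cos_y * dz
    return (x, dy, z)


# A provisionally set point placed 0.3 m too deep along the first line of sight.
provisional = (0.0, 0.0, 2.3)
first_view = world_to_camera(provisional, (0.0, 0.0, 0.0), 0.0)
# Second visual point: the user has walked to the side and looks along -x.
second_view = world_to_camera(provisional, (2.0, 0.0, 2.0), -math.pi / 2)
```

From the first visual point the error lies along the line of sight (x is 0), so it is hard to notice. From the second visual point the same 0.3 m appears as a sideways offset, which is what makes readjustment in the depth direction practical.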

[0085] Note that specific examples of the object fine-adjustment of FIG. 6 in Examples 1 to 3 will be described individually.

Object Fine-Adjustment in Example 1

[0086] Next, object fine-adjustment with the virtual object operation described in FIG. 2, will be described with reference to FIGS. 7 and 8.

[0087] A of FIG. 7 and A of FIG. 8 each illustrate the visual field of the user viewed through the display 20 that is see-through, for example. B of FIG. 7 and B of FIG. 8 are bird’s-eye views in world coordinates, illustrating the cases of A of FIG. 7 and A of FIG. 8, respectively.

[0088] In the example of A of FIG. 7, the table 13 is disposed as one piece of furniture in the real-world three-dimensional space 11 viewed through the display 20, and the wearable display device 3 displays the virtual ruler 21 having the scale that enables a gaze, on the display 20.

[0089] In B of FIG. 7, the virtual ruler 21 is displayed at a constant angle with respect to the facing direction of the user 1. In other words, the virtual ruler 21 is disposed (substantially) in the depth direction in the visual field of the user. Furthermore, the virtual ruler 21 has the scale indicating distance regarding the depth direction, and the scale is disposed (displayed) such that the distance regarding the depth direction is indicated. Note that the intervals of the scale of the virtual ruler 21 and the display direction are not limited to the example of A of FIG. 7 (namely, the user 1 can set the intervals and the display direction). After the intervals and the display direction are determined, the virtual ruler 21 moves in conjunction with movement of the head of the user 1. As illustrated in A of FIG. 7 and B of FIG. 7, a 3D gaze point 61 is acquired at the intersection between the visual line of the user indicated with a dotted-line arrow and the virtual ruler 21, above the table 13.
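Acquiring the 3D gaze point at the intersection between the visual line and the virtual ruler can be illustrated as a ray-plane intersection. This is a minimal sketch that treats the flat ruler as an unbounded plane and ignores its finite extent. The function name `gaze_point_on_ruler` and its parameters are hypothetical.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gaze_point_on_ruler(eye, gaze_dir, ruler_point, ruler_normal, eps=1e-9):
    """Intersect the visual line (a ray from `eye`) with the plane of the ruler.

    Returns the intersection as an (x, y, z) tuple, or None when the visual
    line is parallel to the plane or the intersection lies behind the user.
    """
    denom = _dot(gaze_dir, ruler_normal)
    if abs(denom) < eps:
        return None  # visual line is parallel to the ruler plane
    offset = [r - e for r, e in zip(ruler_point, eye)]
    t = _dot(offset, ruler_normal) / denom
    if t < 0:
        return None  # intersection lies behind the user
    return tuple(e + t * d for e, d in zip(eye, gaze_dir))


# Ruler plane set obliquely to the facing direction; the user looks straight ahead.
point = gaze_point_on_ruler(
    eye=(0.0, 0.0, 0.0),
    gaze_dir=(0.0, 0.0, 1.0),
    ruler_point=(0.5, 0.0, 2.0),
    ruler_normal=(1.0, 0.0, -1.0),
)
```

Because the ruler is set at an angle to the facing direction, looking at a different scale mark changes the depth of the intersection, which is how a single visual line yields a full 3D point.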

[0090] In the example of A of FIG. 8, after the user 1 moves from the position of B of FIG. 7 to the position illustrated in B of FIG. 8, the SLAM technique causes a result 55 based on the 3D gaze point 61 before the movement and a result 56 based on the current 3D gaze point 62 to be superimposed on the display 20, with the virtual ruler 21 from before the movement remaining displayed. In other words, the object 55 disposed at the 3D gaze point 61 before the movement and the object 56 disposed at the current 3D gaze point 62 are displayed on the display 20. Then, because the virtual ruler 21 from before the movement remains displayed, after the movement of the user 1 the virtual ruler 21 is disposed (substantially) in the horizontal direction when viewed from the user, and the scale included in the virtual ruler 21 indicates distance regarding the horizontal direction.

[0091] The user 1 can update the set location, that is, the result 56 based on the current 3D gaze point 62, and can perform fine adjustment any number of times from any position.

Object Fine-Adjustment in Example 2

[0092] Next, object fine-adjustment with the real object operation described in FIG. 3, will be described with reference to FIGS. 9 and 10.

[0093] A of FIG. 9 and A of FIG. 10 each illustrate the visual field of the user viewed through the display 20. B of FIG. 9 and B of FIG. 10 are bird’s-eye views in world coordinates, illustrating the cases of A of FIG. 9 and A of FIG. 10, respectively.

[0094] In the example of A of FIG. 9, the sky having clouds floating is present in the real-world three-dimensional space 32 viewed through the display 20, and the wearable display device 3 displays the virtual ruler 21 having the scale that enables a gaze, on the display 20.

[0095] In B of FIG. 9, the virtual ruler 21 is displayed at a constant angle with respect to the facing direction of the user 1. Note that the intervals of the scale of the virtual ruler 21 and the display direction are not limited to the example of A of FIG. 9 (namely, the user 1 can set the intervals and the display direction). After the intervals and the display direction are determined, the virtual ruler 21 moves in conjunction with movement of the head of the user 1. As illustrated in A of FIG. 9 and B of FIG. 9, a 3D gaze point 61 is acquired at the intersection between the visual line of the user indicated with a dotted-line arrow and the virtual ruler 21.

[0096] In the example of A of FIG. 10, after the user 1 moves from the position illustrated in B of FIG. 9 to the position illustrated in B of FIG. 10, the SLAM technique causes a drone 65, drawn at the position of a result based on the 3D gaze point 61 before the movement, and a moved position 66, which is a result based on the current 3D gaze point 62, to be superimposed on the display 20, with the virtual ruler 21 from before the movement remaining displayed.

[0097] The user 1 can update the moved position 66 that is a result on the current 3D gaze point 62 basis and can perform fine adjustment, any number of times from any position.

Object Fine-Adjustment in Example 3

[0098] Next, object fine-adjustment with the virtual camera visual-point movement described in FIG. 4, will be described with reference to FIGS. 11 and 12.

[0099] A of FIG. 11 and A of FIG. 12 each illustrate the visual field of the user viewed through the display 20. B of FIG. 11 and B of FIG. 12 are bird’s-eye views in world coordinates, illustrating the cases of A of FIG. 11 and A of FIG. 12, respectively.

[0100] In the example of A of FIG. 11, the sky having clouds floating is present in the virtual three-dimensional space 35 viewed through the display 20, and the wearable display device 3 displays the virtual ruler 21 having the scale that enables a gaze, on the display 20.

[0101] In B of FIG. 11, the virtual ruler 21 is displayed at a constant angle with respect to the facing direction of the user 1. Note that the intervals of the scale of the virtual ruler 21 and the display direction are not limited to the example of A of FIG. 11 (namely, the user 1 can set the intervals and the display direction). After the intervals and the display direction are determined, the virtual ruler 21 moves in conjunction with movement of the head of the user 1. As illustrated in A of FIG. 11 and B of FIG. 11, a 3D gaze point 61 is acquired at the intersection between the visual line of the user indicated with a dotted-line arrow and the virtual ruler 21.

[0102] In the example of A of FIG. 12, after the user 1 moves from the position illustrated in B of FIG. 11 to the position illustrated in B of FIG. 12, the SLAM technique causes the user itself 67, drawn at the position of a result based on the 3D gaze point 61 before the movement, and a moved position 68, which is a result based on the current 3D gaze point 62, to be superimposed on the display 20, with the virtual ruler 21 from before the movement remaining displayed.

[0103] The user 1 can update the moved position 68 that is a result on the current 3D gaze point 62 basis and can perform fine adjustment, any number of times from any position.

[0104] As described above, according to the present technology, the use of the SLAM (or a position-estimation technique similar to the SLAM or the like) enables the object fine-adjustment to be performed from a plurality of visual points.

[0105] Note that the virtual objects to be displayed on the display 20 described above (e.g., a virtual thing, a virtual measure, a progress mark, and a sphere) are stereoscopic images to be viewed stereoscopically (visible stereoscopically), each including a right-eye image and a left-eye image each having a binocular disparity and a vergence angle. That is, each virtual object has a virtual-image position in the depth direction (each virtual object is displayed as if each virtual object is present at a predetermined position in the depth direction). In other words, for example, setting the binocular disparity or the vergence angle enables each virtual object to have a desired virtual-image position (each virtual object is displayed to the user as if each virtual object is present at the desired position in the depth direction).

2. Second Embodiment

External Appearance of Wearable Display Device

[0106] FIG. 13 is a diagram illustrating an exemplary configuration of the external appearance of a wearable display device as an image processing device that is one information processing device to which the present technology has been applied. Note that the wearable display device of FIG. 13 performs the virtual object operation described with reference to FIG. 2.

[0107] In the example of FIG. 13, the wearable display device 3 has a spectacle type, and is worn on the face of a user 1. The casing of the wearable display device 3 is provided with, for example, a display 20 (display unit) including a right-eye display unit 20A and a left-eye display unit 20B, environment recognition cameras 12, visual-line recognition cameras 50, and LEDs 71.

[0108] The lens portions of the wearable display device 3 are included in, for example, the display 20 that is see-through, and the environment recognition cameras 12 are provided at portions above both eyes, on the outside of the display 20. At least one environment recognition camera 12 may be provided. The environment recognition cameras 12 each may include, but are not limited to, an RGB camera.

[0109] The LEDs 71 are provided individually above and below, and to the right and left of, both eyes, the LEDs 71 facing inward (to the face) from the display 20. Note that the LEDs 71 are used for visual-line recognition, and preferably at least two LEDs 71 are provided for one eye.

[0110] Moreover, the visual-line recognition cameras 50 are provided at portions below both eyes, the visual-line recognition cameras 50 facing inward from the display 20. Note that at least one visual-line recognition camera 50 may be provided for one eye. For visual-line recognition for both eyes, at least two infrared cameras are provided. Furthermore, in visual-line recognition with a corneal reflex method, at least two LEDs 71 are provided for one eye and at least four LEDs 71 are provided for the visual-line recognition for both eyes.

[0111] In the wearable display device 3, the portions corresponding to the lenses of a pair of spectacles, are included in the display 20 (the right-eye display unit 20A and the left-eye display unit 20B). When the user 1 wears the wearable display device 3, the right-eye display unit 20A is located in the neighborhood ahead of the right eye of the user 1 and the left-eye display unit 20B is located in the neighborhood ahead of the left eye of the user.

[0112] The display 20 includes a transmissive display that transmits light therethrough. Therefore, the right eye of the user 1 can view, through the right-eye display unit 20A, a real-world sight (transmissive image) on the back side thereof, namely, ahead of the right-eye display unit 20A (in front when viewed from the user 1 (in the depth direction)). Similarly, the left eye of the user 1 can view, through the left-eye display unit 20B, a real-world sight (transmissive image) on the back side thereof, namely, ahead of the left-eye display unit 20B. Therefore, the user 1 views an image displayed on the display 20, the image being superimposed on the near side of a real-world sight ahead of the display 20.

[0113] The right-eye display unit 20A displays an image (right-eye image) to be viewed by the right eye of the user 1, and the left-eye display unit 20B displays an image (left-eye image) to be viewed by the left eye of the user 1. That is, the display 20 causes each of the right-eye display unit 20A and the left-eye display unit 20B to display an image having a disparity, so that a stereoscopic image to be viewed stereoscopically (stereoscopic object) is displayed.

[0114] The stereoscopic image includes the right-eye image and the left-eye image each having the disparity. Controlling the disparity (or vergence angle), namely, for example, controlling, to the position of a subject viewed in one of the right-eye image and the left-eye image, a shift amount in the horizontal direction of the position of the same subject viewed in the other image, enables the subject to be viewed at a position far away from the user 1 or to be viewed at a position near the user 1. That is, the stereoscopic image can be controlled in terms of the depth position (not the actual display position of the image, but the position at which the user 1 views the image as if the image is present thereat (virtual-image position)).
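The relationship between the horizontal shift amount and the virtual-image position can be illustrated with a minimal geometric sketch. It assumes a simplified model that is not part of the disclosure: both eyes lie on a horizontal baseline separated by an interpupillary distance `ipd_m`, the subject is straight ahead at `depth_m`, and both images are formed on a single plane at `screen_m`. The function names and the numeric values are hypothetical.

```python
import math


def vergence_angle(ipd_m: float, depth_m: float) -> float:
    """Angle in radians between the two lines of sight when both eyes
    fixate a point straight ahead at depth_m (assumed symmetric geometry)."""
    return 2.0 * math.atan(ipd_m / (2.0 * depth_m))


def screen_disparity(ipd_m: float, screen_m: float, depth_m: float) -> float:
    """Horizontal offset (right-eye image x minus left-eye image x), measured
    on an assumed image plane at screen_m, that makes a subject appear at
    depth_m. Positive: behind the plane. Negative: in front of the plane."""
    return ipd_m * (depth_m - screen_m) / depth_m


if __name__ == "__main__":
    ipd = 0.064  # assumed interpupillary distance in metres
    for depth in (1.0, 2.0, 4.0):
        print(depth, vergence_angle(ipd, depth), screen_disparity(ipd, 2.0, depth))
```

Under these assumptions, a larger shift toward the positive side moves the subject away from the user and a smaller vergence angle corresponds to a farther virtual-image position, which matches the qualitative behavior described above.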

[0115] FIG. 14 is a block diagram illustrating an exemplary configuration of the wearable display device of FIG. 13.

[0116] In the example of FIG. 14, the wearable display device 3 includes an environment recognition camera 12, a display 20, a visual-line recognition camera 50, and an image processing unit 80. The image processing unit 80 includes a visual-line estimation unit 81, a 2D visual-line operation reception unit 82, a 2D visual-line information DB 83, a coordinate-system conversion unit 84, a 3D attention-point calculation unit 85, a gaze determination unit 86, a coordinate-system conversion unit 87, a gaze-point DB 88, a camera-display relative position and attitude DB 89, a coordinate-system conversion unit 90, a position and attitude estimation unit 91, an environment-camera position and attitude DB 92, a drawing control unit 93, and a 3D attention-point time-series DB 94. Note that the drawing control unit 93 may be exemplarily regarded as at least one of a display control unit or an object control unit according to the present disclosure.

[0117] The visual-line estimation unit 81 consecutively estimates the visual line of the user 1 from an image input from the visual-line recognition camera 50. The estimated visual line includes, for example, a “pupillary position” and a “visual-line vector” in a visual-line recognition camera coordinate system having the visual-line recognition camera 50 as the origin. The information is supplied to the 2D visual-line operation reception unit 82, the 2D visual-line information DB 83, and the coordinate-system conversion unit 84. The visual-line recognition adopts, for example, a pupillary and corneal reflex method, but may adopt another visual-line recognition method, such as a sclerotic reflex method, a Double Purkinje method, an image processing method, a search coil method, or an electro-oculography (EOG) method. Note that the visual line of the user 1 may be estimated, for example, as the orientation of the environment recognition camera 12 (the optical axis of the environment recognition camera 12). Specifically, the orientation of the camera to be estimated with an image captured by the camera 12, may be estimated as the visual line of the user. In other words, it should be noted that the adoption of a visual-line recognition method of capturing an eyeball of the user 1 is not essential to the estimation of the visual line of the user 1.

[0118] With the visual line from the visual-line estimation unit 81 and data in camera-display relative position and attitude relationship from the camera-display relative position and attitude DB 89, the 2D visual-line operation reception unit 82 acquires 2D visual-line coordinates on the display 20 (2D gaze-point coordinates), receives a menu operation, and selects and sets a virtual measure. The 2D visual-line coordinates (2D gaze-point coordinates) on the display 20 mean two-dimensional coordinate information indicating where the visual line of the user is located on the display 20.
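One way to picture how 2D visual-line coordinates could be obtained is to intersect the visual line with the display surface. The sketch below assumes, purely for illustration, that the visual line has already been expressed in a coordinate system in which the display is the plane z = `display_z`; the function name and that planar model are assumptions, not details from the disclosure.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def gaze_on_display(pupil: Vec3, direction: Vec3,
                    display_z: float) -> Optional[Tuple[float, float]]:
    """Intersect the visual line (pupil + t * direction, t > 0) with the
    assumed display plane z = display_z and return the (x, y) hit point.
    Returns None when the line is parallel to the plane or points away."""
    px, py, pz = pupil
    dx, dy, dz = direction
    if abs(dz) < 1e-9:
        return None  # visual line never reaches the plane
    t = (display_z - pz) / dz
    if t <= 0.0:
        return None  # plane is behind the eye
    return (px + t * dx, py + t * dy)


if __name__ == "__main__":
    # Eye at the origin looking slightly to the right, display 2 cm ahead.
    print(gaze_on_display((0.0, 0.0, 0.0), (0.1, 0.0, 1.0), 0.02))
```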

[0119] The 2D visual-line information DB 83 records the menu operation received by the 2D visual-line operation reception unit 82 and information regarding the virtual measure (e.g., the desired position 22 of FIG. 2) as a state. The type of the virtual measure with the 2D visual line and the position and attitude of the virtual measure in a viewpoint coordinate system are recorded in the 2D visual-line information DB 83.

[0120] With the data in camera-display relative position and attitude relationship from the camera-display relative position and attitude DB 89, the coordinate-system conversion unit 84 converts the visual line in the visual-line recognition camera coordinate system from the visual-line estimation unit 81, into the visual line in the viewpoint coordinate system of the display 20.

[0121] The 3D attention-point calculation unit 85 acquires the intersection between the virtual measure recorded in the 2D visual-line information DB 83 and the visual line in the viewpoint coordinate system converted by the coordinate-system conversion unit 84, and calculates 3D attention-point coordinates. The calculated 3D attention-point coordinates are accumulated in the 3D attention-point time-series DB 94.

[0122] In other words, the 3D attention-point calculation unit 85 calculates a 3D attention point that is the intersection between the virtual measure recorded in the 2D visual-line information DB 83 and the visual line in the viewpoint coordinate system converted by the coordinate-system conversion unit 84.
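The intersection can be sketched as follows. Because a measured visual line and a thin virtual measure rarely meet exactly in three dimensions, this sketch substitutes the closest point on the measure to the visual line and reports the remaining gap. That substitution, the modeling of the virtual measure as a straight segment, and all names are assumptions made for illustration, not the method stated in the disclosure.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def attention_point_on_ruler(eye: Vec3, gaze_dir: Vec3, ruler_start: Vec3,
                             ruler_end: Vec3) -> Tuple[Vec3, float, float]:
    """Return (point on the ruler, ruler parameter t in [0, 1], gap to the ray).

    The ray is eye + s * gaze_dir with s >= 0; the ruler is the segment
    ruler_start + t * (ruler_end - ruler_start)."""
    u = gaze_dir
    v = _sub(ruler_end, ruler_start)
    w = _sub(eye, ruler_start)
    a, b, c = _dot(u, u), _dot(u, v), _dot(v, v)
    d, e = _dot(u, w), _dot(v, w)
    if a == 0.0 or c == 0.0:
        raise ValueError("gaze direction and ruler must have non-zero length")
    denom = a * c - b * b  # zero when the ray and the ruler are parallel
    s = max(0.0, (b * e - c * d) / denom) if denom > 1e-12 else 0.0
    t = (e + s * b) / c
    if t < 0.0:  # clamp to the ruler start and re-solve for the ray
        t, s = 0.0, max(0.0, -d / a)
    elif t > 1.0:  # clamp to the ruler end and re-solve for the ray
        t, s = 1.0, max(0.0, (b - d) / a)
    on_ruler = (ruler_start[0] + t * v[0], ruler_start[1] + t * v[1],
                ruler_start[2] + t * v[2])
    on_ray = (eye[0] + s * u[0], eye[1] + s * u[1], eye[2] + s * u[2])
    return on_ruler, t, math.dist(on_ruler, on_ray)


if __name__ == "__main__":
    # Ruler laid out in the depth direction, 0.5 m below eye level.
    print(attention_point_on_ruler((0.0, 0.0, 0.0), (0.0, -0.25, 1.0),
                                   (0.0, -0.5, 1.0), (0.0, -0.5, 3.0)))
```

The returned parameter `t` indicates where along the measure the user is looking, so it could be mapped onto the scale of the measure; a large gap would suggest the user is not looking at the measure at all.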

[0123] The gaze determination unit 86 determines whether or not the user has gazed, with 3D attention-point time-series data from the 3D attention-point time-series DB 94. The average value, the mode value, or the median (intermediate value) of the time-series data is adopted as the final 3D gaze-point coordinates.

[0124] On the speed basis, the gaze determination unit 86 compares the speed of a coordinate variation in the 3D attention-point time-series data in a section, with a threshold value, and determines a gaze when the speed is the threshold value or less. On the dispersion basis, the gaze determination unit 86 compares the dispersion of a coordinate variation in the 3D attention-point time-series data in a section, with a threshold value, and determines a gaze when the dispersion is the threshold value or less. The coordinate variation, the speed, and the dispersion each correspond to the retention level described above. Note that both methods on the speed basis and the dispersion basis can make a determination from a one-eye visual line, but can also use a both-eyes visual line. In that case, the midpoint between the 3D attention points is handled as the 3D attention point of both eyes.
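The two determination bases can be sketched over one window of 3D attention points. The sketch assumes evenly spaced samples with period `dt`, takes the mean speed over the window as "the speed", takes the sum of per-axis variances as "the dispersion", and uses the per-axis median as the final 3D gaze point. The thresholds, these specific definitions, and the function names are illustrative assumptions.

```python
import math
from statistics import median
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def mean_speed(points: Sequence[Vec3], dt: float) -> float:
    """Mean speed of the attention point over the window (units per second)."""
    steps = [math.dist(p, q) for p, q in zip(points, points[1:])]
    return sum(steps) / (dt * len(steps))


def dispersion(points: Sequence[Vec3]) -> float:
    """Sum of the per-axis population variances over the window."""
    n = len(points)
    total = 0.0
    for axis in range(3):
        mean = sum(p[axis] for p in points) / n
        total += sum((p[axis] - mean) ** 2 for p in points) / n
    return total


def determine_gaze(points: Sequence[Vec3], dt: float, basis: str,
                   threshold: float) -> Optional[Vec3]:
    """Return the final 3D gaze point if the window counts as a gaze, else None."""
    if len(points) < 2:
        return None
    if basis == "speed":
        level = mean_speed(points, dt)
    elif basis == "dispersion":
        level = dispersion(points)
    else:
        raise ValueError("basis must be 'speed' or 'dispersion'")
    if level > threshold:
        return None  # retention level too high: not a gaze
    # Per-axis median as the final gaze point (mean or mode are alternatives).
    return tuple(median(p[axis] for p in points) for axis in range(3))


if __name__ == "__main__":
    steady = [(0.0, 0.0, 2.0), (0.001, 0.0, 2.0), (0.0, 0.001, 2.0), (0.0, 0.0, 2.0)]
    print(determine_gaze(steady, 1 / 60, "speed", 0.1))
    print(determine_gaze(steady, 1 / 60, "dispersion", 1e-6))
```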

[0125] With camera-display relative position and attitude data from the camera-display relative position and attitude DB 89, environment-camera position and attitude in the latest world coordinate system that is a world standard, from the environment-camera position and attitude DB 92, and a 3D gaze point in the viewpoint coordinate system from the gaze determination unit 86, the coordinate-system conversion unit 87 converts the 3D gaze point in the viewpoint coordinate system into the 3D gaze point in the world coordinate system, and records the 3D gaze point into the gaze-point DB 88. The coordinate-system conversion unit 87 can function as a gaze-point calculation unit that calculates the 3D gaze point in the world coordinate system, on the basis of the environment-camera position and attitude (the position and attitude of the user) in the latest world coordinate system that is a world standard, from the environment-camera position and attitude DB 92, and the 3D gaze point (a point acquired from the 3D attention point that is the intersection between the visual line and the virtual measure) in the viewpoint coordinate system from the gaze determination unit 86.

[0126] The 3D gaze point in the world coordinate system, converted by the coordinate-system conversion unit 87, is accumulated in the gaze-point DB 88.
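The conversion into the world coordinate system can be sketched as a chain of two rigid transforms. The sketch assumes each pose is stored as a rotation matrix plus a translation, that `cam_from_view` stands for the camera-display relative relationship, and that `world_from_cam` stands for the latest environment-camera pose. These representations and names are assumptions for illustration only.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]
Pose = Tuple[Mat3, Vec3]  # (rotation R, translation t): x_out = R @ x_in + t


def apply_pose(pose: Pose, p: Vec3) -> Vec3:
    """Apply a rigid transform to a 3D point."""
    rot, trans = pose
    return tuple(sum(rot[i][j] * p[j] for j in range(3)) + trans[i]
                 for i in range(3))


def view_to_world(p_view: Vec3, cam_from_view: Pose, world_from_cam: Pose) -> Vec3:
    """Viewpoint coordinates -> environment-camera coordinates -> world coordinates."""
    return apply_pose(world_from_cam, apply_pose(cam_from_view, p_view))


if __name__ == "__main__":
    identity: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    yaw_90: Mat3 = ((0.0, 0.0, 1.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0))
    cam_from_view: Pose = (identity, (0.0, 0.03, 0.0))  # hypothetical calibration
    world_from_cam: Pose = (yaw_90, (1.0, 1.6, 0.0))    # hypothetical estimated pose
    print(view_to_world((0.0, 0.0, 2.0), cam_from_view, world_from_cam))
```

Storing the result in world coordinates is what lets the gaze point stay fixed in the scene while the user moves.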

[0127] Data in position and attitude relationship between the visual-line recognition camera 50, the environment recognition camera 12, and the display 20 is recorded in the camera-display relative position and attitude DB 89. The position and attitude relationship therebetween is calculated in advance in factory calibration.

[0128] With the camera-display relative position and attitude data from the camera-display relative position and attitude DB 89, the environment-camera position and attitude in the latest world coordinate system from the environment-camera position and attitude DB 92, and the coordinates of the 3D gaze point in the world coordinate system from the gaze-point DB 88, the coordinate-system conversion unit 90 converts the 3D gaze point in the world coordinate system into the 3D gaze point in the viewpoint coordinate system at the point in time.
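The reverse conversion can be sketched by inverting the same kind of rigid transforms. As an assumption for illustration, poses are stored as a rotation matrix plus a translation, `world_from_cam` is the latest environment-camera pose, and `cam_from_view` is the camera-display relative relationship; none of these names come from the disclosure.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]
Pose = Tuple[Mat3, Vec3]  # (rotation R, translation t): x_out = R @ x_in + t


def apply_pose(pose: Pose, p: Vec3) -> Vec3:
    """Apply a rigid transform to a 3D point."""
    rot, trans = pose
    return tuple(sum(rot[i][j] * p[j] for j in range(3)) + trans[i]
                 for i in range(3))


def invert_pose(pose: Pose) -> Pose:
    """Inverse of a rigid transform: rotation R^T, translation -R^T t."""
    rot, trans = pose
    rot_t = tuple(tuple(rot[j][i] for j in range(3)) for i in range(3))
    trans_inv = tuple(-sum(rot_t[i][j] * trans[j] for j in range(3))
                      for i in range(3))
    return rot_t, trans_inv


def world_to_view(p_world: Vec3, cam_from_view: Pose, world_from_cam: Pose) -> Vec3:
    """World coordinates -> environment-camera coordinates -> viewpoint coordinates."""
    p_cam = apply_pose(invert_pose(world_from_cam), p_world)
    return apply_pose(invert_pose(cam_from_view), p_cam)


if __name__ == "__main__":
    identity: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    yaw_90: Mat3 = ((0.0, 0.0, 1.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0))
    cam_from_view: Pose = (identity, (0.0, 0.03, 0.0))  # hypothetical calibration
    stored_gaze_point = (3.0, 1.6, 0.0)                  # hypothetical world point
    # The same stored point seen from two different user poses.
    print(world_to_view(stored_gaze_point, cam_from_view, (yaw_90, (1.0, 1.6, 0.0))))
    print(world_to_view(stored_gaze_point, cam_from_view, (identity, (3.0, 1.6, -1.0))))
```

Re-running this conversion with each newly estimated pose is what would allow an object placed at the stored gaze point to be redrawn correctly from a second visual point.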

[0129] The environment-camera position and attitude estimation unit 91 consecutively estimates the position and attitude of the environment recognition camera 12 (the user 1 who is wearing the environment recognition camera 12), from the image of the environment recognition camera 12. Self-position estimation adopts the environment recognition camera 12 and the technique of the SLAM described above. Examples of other self-position estimation technologies include GPS, WIFI, IMU (a triaxial acceleration sensor+a triaxial gyroscope sensor), RFID, visible-light communication positioning, and object recognition (image authentication). Although the techniques have problems in terms of processing speed and precision, the techniques can be used instead of the SLAM. Even for the use of the environment recognition camera 12 and the SLAM, any of the techniques described above is available for standard determination of the world coordinate system (initialization). For example, the environment-camera position and attitude estimation unit 91 can be regarded as a position and attitude estimation unit that estimates the position and attitude of the user who is wearing the wearable display device 3 in the real-world or virtual three-dimensional space.

[0130] The environment-camera position and attitude DB 92 records the latest position and attitude at the point in time from the environment-camera position and attitude estimation unit 91.

[0131] The drawing control unit 93 controls drawing on the display 20 with the 2D visual line and drawing of the virtual measure, based on the information in the 2D visual-line information DB 83, and drawing of a virtual object at the 3D gaze point, based on the 3D gaze point in the viewpoint coordinate system converted by the coordinate-system conversion unit 90. In other words, the drawing control unit 93 can function as the display control unit that controls the display of the point on the display 20 that the user is viewing, the display of the virtual measure, and the display of the virtual object at the 3D gaze point, based on the 3D gaze point in the viewpoint coordinate system converted by the coordinate-system conversion unit 90, or can function as the object control unit that controls the object. The time-series data of the 3D attention-point coordinates calculated by the 3D attention-point calculation unit 85 is recorded in the 3D attention-point time-series DB 94.

[0132] Note that the drawing control unit 93 performs processing of generating a stereoscopic object (stereoscopic image) including a left-eye image and a right-eye image, the stereoscopic object being to be displayed on the display 20 as a drawing. Then, the drawing control unit 93 causes the display 20 to display the generated stereoscopic object.

[0133] For example, the drawing control unit 93 sets the virtual-image position of each stereoscopic object. Then, the drawing control unit 93 controls the display 20 to display such that the stereoscopic object is viewed stereoscopically as if the stereoscopic object is present at the virtual-image position set to the stereoscopic object.

[0134] In order to display the stereoscopic object such that it is viewed stereoscopically as if it is present at the virtual-image position set to the stereoscopic object, the drawing control unit 93 sets a disparity or a vergence angle for the stereoscopic object, and generates the left-eye image and the right-eye image as the stereoscopic object in which the disparity or the vergence angle occurs. Any method of generating the stereoscopic image may be used. For example, Japanese Patent Application Laid-Open No. H08-322004 discloses a stereoscopic display device having a means that electrically shifts, in the horizontal direction, an image displayed on a display screen such that the vergence angle to a diopter scale is substantially constant in real time. Furthermore, Japanese Patent Application Laid-Open No. H08-211332 discloses a three-dimensional image reproduction device that acquires a stereoscopic image with a binocular disparity, the three-dimensional image reproduction device having: a vergence-angle selection means that sets a vergence angle for viewing of a reproduction image; and a control means that controls the relative reproduction position between right and left images on the basis of information regarding the selected vergence angle. For example, the drawing control unit 93 can generate the stereoscopic object with any of the described methods.

Operation of Wearable Display Device

[0135] Next, virtual-object operation processing will be described with reference to the flowchart of FIG. 15. Note that the steps of FIG. 15 are performed in parallel. In other words, although the steps are shown in sequence for convenience in the flowchart of FIG. 15, each step is performed in parallel as appropriate. The same applies to the other flowcharts.

[0136] The image from the environment recognition camera 12 is input into the environment-camera position and attitude estimation unit 91. At step S11, the environment-camera position and attitude estimation unit 91 performs environment recognition processing. Although the details of the environment recognition processing are to be described later with reference to FIG. 16, the processing causes the position and attitude of the environment recognition camera 12, estimated from the image from the environment recognition camera 12, to be recorded into the environment-camera position and attitude DB 92.

[0137] Furthermore, the image from the visual-line recognition camera 50 is input into the visual-line estimation unit 81. At step S12, the visual-line estimation unit 81, the 2D visual-line operation reception unit 82, the coordinate-system conversion unit 84, the 3D attention-point calculation unit 85, and the gaze determination unit 86 perform visual-line estimation processing. Although the details of the visual-line estimation processing are to be described later with reference to FIG. 17, the processing causes a 2D gaze point to be acquired, a 3D gaze point to be acquired from the 2D gaze point, and the 3D gaze point to be converted into the 3D gaze point in the latest viewpoint coordinate system.

[0138] At step S13, the drawing control unit 93 performs drawing processing with the information in the 2D visual-line information DB 83 and the 3D gaze point in the viewpoint coordinate system converted by the coordinate-system conversion unit 90. Although the drawing processing is to be described later with reference to FIG. 18, the processing causes the drawing on the display 20 with the 2D visual line (drawing of the 2D visual-line coordinates on the display 20), the drawing of the virtual measure, and the drawing of the virtual object at the 3D gaze point, to be controlled, so that the drawings are made on the display 20. In other words, the display 20 displays, for example, the virtual measure and the virtual object disposed at the 3D gaze point.

[0139] At step S14, the 2D visual-line operation reception unit 82 determines whether or not the virtual-object operation processing is to be finished. In a case where it is determined at step S14 that the virtual-object operation processing is to be finished, the virtual-object operation processing of FIG. 15 is finished. Meanwhile, in a case where it is determined at step S14 that the virtual-object operation processing is not to be finished yet, the processing goes back to step S11 and the processing at and after step S11 is repeated.
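The control flow of steps S11 to S14 can be summarized in a short sketch. For readability it runs the steps one after another, whereas the description notes that the steps are performed in parallel as appropriate. The four callables and the `max_iterations` safety limit are hypothetical placeholders, not elements of the disclosure.

```python
from typing import Callable


def run_virtual_object_operation(recognize_environment: Callable[[], None],
                                 estimate_visual_line: Callable[[], None],
                                 draw: Callable[[], None],
                                 should_finish: Callable[[], bool],
                                 max_iterations: int = 1000) -> int:
    """Repeat steps S11 to S13 until step S14 decides to finish.

    Returns the number of completed iterations."""
    iterations = 0
    while iterations < max_iterations:
        recognize_environment()  # S11: record the environment-camera pose
        estimate_visual_line()   # S12: acquire the 2D and 3D gaze points
        draw()                   # S13: draw the measure and the virtual object
        iterations += 1
        if should_finish():      # S14: finish, or go back to S11
            break
    return iterations


if __name__ == "__main__":
    log = []
    count = run_virtual_object_operation(
        lambda: log.append("S11"),
        lambda: log.append("S12"),
        lambda: log.append("S13"),
        lambda: len(log) >= 9,
    )
    print(count, log)
```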

……
……
……

Article link: https://patent.nweon.com/13280
