
Microsoft Patent | Translating combinations of user gaze direction and predetermined facial gestures into user input instructions for Near-Eye-Display (NED) devices

Patent: Translating combinations of user gaze direction and predetermined facial gestures into user input instructions for Near-Eye-Display (NED) devices


Publication Number: 20210041949

Publication Date: 20210211

Applicant: Microsoft

Abstract

A Near-Eye-Display (NED) device that translates combinations of user gaze direction and predetermined facial gestures into user input instructions. The NED device includes an eye tracking system and a display that renders computer-generated images within a user’s field-of-view. The eye tracking system may continually track the user’s eye movements with a high degree of accuracy to identify specific computer-generated images that a user is focused on. The eye tracking system may also identify various facial gestures such as, for example, left-eye blinks and/or right-eye blinks that are performed while the specific computer-generated images are being focused on. In this way, NED devices are enabled to identify combinations of user gaze direction and predetermined facial gestures and to translate these identified combinations into user input instructions that correspond to specific computer-generated images.

Claims

  1. A Near-Eye-Display (NED) device, comprising: at least one sensor that is configured to generate eye tracking data associated with a first eye of a user and a second eye of the user; one or more processors; and at least one computer storage medium having computer executable instructions stored thereon which are executable by the one or more processors to: receive visual axis offset data that defines: a first spatial relationship between a first visual axis of the first eye and a first optical axis of the first eye, and a second spatial relationship between a second visual axis of the second eye and a second optical axis of the second eye; receive the eye tracking data from the at least one sensor; determine, based on the eye tracking data, pupil orientation parameters that indicate directions of the first optical axis of the first eye and the second optical axis of the second eye; determine, based on the visual axis offset data and the pupil orientation parameters, a vergence between the first visual axis, of the first eye, and the second visual axis of the second eye; and identify an object of focus within a field-of-view of the user based on the vergence between the first visual axis and the second visual axis.

  2. The NED device of claim 1, wherein the computer executable instructions are further executable by the one or more processors to cause at least one display component to render, within the field-of-view, a virtual scene that includes a plurality of virtual objects, wherein the object of focus that is identified based on the vergence is an individual virtual object of the plurality of virtual objects.

  3. The NED device of claim 2, wherein the computer executable instructions are further executable by the one or more processors to adjust, responsive to the individual virtual object being identified as the object of focus, a graphical appearance of the individual virtual object to visually distinguish the individual virtual object from at least one other virtual object of the plurality of virtual objects.

  4. The NED device of claim 1, wherein the computer executable instructions are further executable by the one or more processors to cause at least one display component to render, within a virtual scene, a graphical indicator that indicates a current location of the vergence between the first visual axis and the second visual axis.

  5. The NED device of claim 1, wherein the computer executable instructions are further executable by the one or more processors to: perform a predetermined computing action in association with the object of focus in response to the user performing a predetermined facial gesture while the vergence is aligned with the object of focus.

  6. The NED device of claim 5, wherein the predetermined computing action includes exposing a virtual menu in association with the object of focus.

  7. The NED device of claim 1, wherein the visual axis offset data is generated based on a user-specific calibration process that is performed in association with the user.

  8. The NED device of claim 1, wherein the computer executable instructions are further executable by the one or more processors to: determine orientation parameters that correspond to a first iris-pupil plane of the first eye and a second iris-pupil plane of the second eye, wherein determining the vergence is further based on the orientation parameters.

  9. The NED device of claim 1, wherein the computer executable instructions are further executable by the one or more processors to: determine, based on a minimization function, an accommodation plane that corresponds to a minimum distance between the first visual axis and the second visual axis, wherein the object of focus is further identified based on the accommodation plane.

  10. A computer-implemented method, comprising: receiving visual axis offset data that is indicative of spatial relationships between visual axes of a user and optical axes of a user; receiving eye tracking data from at least one sensor; determining, based on the eye tracking data, pupil orientation parameters that are indicative of real-time directions of the optical axes of the user; determining, based on the visual axis offset data and the pupil orientation parameters, a vergence between a first visual axis, of the visual axes, and a second visual axis of the visual axes; and generating data that is indicative of a position of an object of focus based on the vergence between the first visual axis and the second visual axis.

  11. The computer-implemented method of claim 10, further comprising: causing at least one display component to render a virtual scene that includes a plurality of virtual objects, wherein the object of focus that is identified based on the vergence is an individual virtual object of the plurality of virtual objects.

  12. The computer-implemented method of claim 11, further comprising: responsive to the individual virtual object being identified as the object of focus, adjusting a graphical appearance of the individual virtual object to visually distinguish the individual virtual object from at least one other virtual object of the plurality of virtual objects.

  13. The computer-implemented method of claim 10, further comprising: determining an accommodation plane that corresponds to a minimum distance between the first visual axis and the second visual axis, wherein generating the data that is indicative of the position of the object of focus is further based on the accommodation plane that corresponds to the minimum distance between the first visual axis and the second visual axis.

  14. The computer-implemented method of claim 10, wherein the visual axis offset data is indicative of a user-specific calibration profile.

  15. The computer-implemented method of claim 10, further comprising determining a focal point within a real-world environment based on the vergence between the first visual axis and the second visual axis.

  16. The computer-implemented method of claim 15, wherein determining the focal point includes determining an accommodation plane that corresponds to the vergence between the first visual axis and the second visual axis.

  17. A computing system, comprising: at least one sensor that is configured to generate eye tracking data; one or more processors; and at least one computer storage medium having computer executable instructions stored thereon which are executable by the one or more processors to: receive visual axis offset data that is generated for a user based on a user-specific calibration process; receive the eye tracking data from the at least one sensor; determine pupil orientation parameters based on the eye tracking data, wherein the pupil orientation parameters are indicative of real-time directions of optical axes of the user; determine, based on the visual axis offset data and the pupil orientation parameters, a vergence between a first visual axis, of the user, and a second visual axis of the user; and cause a display to render a computer-generated graphic within a field-of-view of the user based on the vergence between the first visual axis and the second visual axis.

  18. The computing system of claim 17, wherein causing the display to render the computer-generated graphic includes causing the display to render a focal location indicated within a computer-generated scene.

  19. The computing system of claim 17, wherein causing the display to render the computer-generated graphic includes adjusting a graphical appearance of the computer-generated graphic to visually distinguish the computer-generated graphic from at least one other computer-generated graphic that is being rendered by the display within a virtual scene.

  20. The computing system of claim 17, wherein the computer executable instructions are further executable by the one or more processors to: determine a focal point within a real-world environment based on the vergence between the first visual axis and the second visual axis.

Description

PRIORITY APPLICATION

[0001] This U.S. non-provisional application is a continuation application that claims benefit of and priority to U.S. Non-Provisional application Ser. No. 16/383,474, filed on Apr. 12, 2019, entitled TRANSLATING COMBINATIONS OF USER GAZE DIRECTION AND PREDETERMINED FACIAL GESTURES INTO USER INPUT INSTRUCTIONS FOR NEAR-EYE-DISPLAY (NED) DEVICES, which claims the benefit of and priority to U.S. Non-Provisional application Ser. No. 16/168,319, filed on Oct. 23, 2018, entitled EYE TRACKING SYSTEMS AND METHODS FOR NEAR-EYE-DISPLAY (NED) DEVICES, the entire contents of which are incorporated herein by reference.

BACKGROUND

[0002] Near-Eye-Display (NED) systems are promising tools for increasing the productivity and efficiency with which people are able to perform a variety of complex tasks. This is largely due to the ability of NED systems to superimpose computer-generated images (“CG images”) over a person’s view of a real-world environment while that person is performing a complex task. In this way, the person is provided with information that is temporally pertinent to the task being performed (e.g., step-by-step instructions, real-time sensor readings, etc.) precisely when it is needed. As a specific example, a healthcare professional may wear a NED system while performing some task during which it is important to monitor a patient’s vitals. In this example, the NED system may superimpose a readout of the patient’s vitals over some portion of the healthcare professional’s field-of-view.

[0003] Some conventional NED systems are designed to track users’ hand movements in order to identify predetermined hand gestures that are assigned to different user input instructions. For example, while wearing a conventional NED system, a user may make an “Air Tap” gesture to adjust the information that is currently being rendered. Unfortunately, performing some tasks requires uninterrupted use of a person’s hands, which inherently limits the person’s ability to perform hand gestures. Thus, conventional NED systems are ill-suited for maximally increasing the productivity and efficiency with which people are able to perform these hand-intensive tasks.

[0004] It is with respect to these and other considerations that the disclosure made herein is presented.

SUMMARY

[0005] Technologies described herein provide for Near-Eye-Display (NED) devices that utilize eye tracking systems to translate combinations of user gaze direction and predetermined facial gestures into user input instructions. Generally described, an exemplary NED device includes an eye tracking system and a display component that renders computer-generated images within the user’s field-of-view. The eye tracking system may continually track the user’s eye movements with a high degree of accuracy to identify specific computer-generated images that a user is focused on. The eye tracking system may also identify various facial gestures that are performed while the specific computer-generated images are being focused on. In this way, NED devices are enabled to identify combinations of user gaze direction and predetermined facial gestures and to translate these identified combinations into user input instructions that are provided by the user in association with specific computer-generated images. Exemplary user input instructions include, but are not limited to, various computing instructions that are commonly associated with a standard left mouse button and right mouse button. As a specific example, a user wearing a NED device as described herein may use their gaze direction to controllably place a cursor over a graphical control element within a virtual menu and then perform a double left-blink facial gesture to activate the graphical control element.

[0006] Technologies described herein provide a marked improvement over conventional NED devices in that users are enabled to provide a wide array of “hands-free” user input instructions to, for example, adjust what type of information is currently being rendered, adjust the format with which information is currently being rendered, and so on. Real-life practical applications of these technologies include scenarios where users are performing hand-intensive tasks that render conventional hand gestures impractical but that may benefit in terms of productivity and/or efficiency by providing the users with an ability to provide “hands-free” user input instructions.

[0007] For illustrative purposes, consider a scenario where a person is performing a complex task such as a surgical procedure that requires uninterrupted use of the person’s hands. Further suppose that it is important for the person performing the surgical procedure to retain an ability to toggle between viewing various types of information that are pertinent to the task being performed. Such a scenario is well-suited for the user to wear a NED device that displays the pertinent information to the user while the task is being performed. It can be appreciated, however, that it is impractical in such a scenario for the person to perform hand gestures to interact with graphical control elements to toggle between viewing the various types of pertinent information. Using the techniques described herein, the person may simply and intuitively focus her attention onto a specific graphical control element that is being rendered by the NED device and then perform a predefined facial gesture to enter user input instructions with respect to the specific graphical control element that is being focused on. For example, the NED device may be rendering a sequence of step-by-step instructions for the person to follow in performance of the surgical procedure. Upon completion of an individual instruction that is currently being rendered, the person may focus her attention onto a “next step” button being rendered within her field-of-view and then deliberately blink her left eye to select the “next step” button to change the instruction being rendered. Upon identifying this combination of user gaze (e.g., the user is focused on the “next step” button) and the deliberate facial gesture (e.g., the user blinks her left eye twice within a threshold amount of time), the NED device may respond by toggling to the next instruction.

[0008] In an exemplary embodiment, a Near-Eye-Display (NED) device includes an eye tracking system having one or more sensors that generate eye tracking data that indicates a gaze direction (e.g., as defined by a visual axis or an optical axis) of one or both of a user’s eyes. Based on the eye tracking data, the NED device may determine one or more computer-generated images that the NED device is generating via a display component and that the user is focused on. For example, the NED device may be rendering a graphical control element and may monitor the eye tracking data to determine when the user is focused on the graphical control element. It will be appreciated that the gaze direction of the user can be loosely analogized to the mouse cursor of a typical operating system. For example, the user focusing on a specific graphical control element may be treated similar to the user hovering a mouse cursor over the graphical control element. The NED device may also be configured to monitor for one or more predetermined facial gestures such as, for example, a user blinking a left eye while a right eye remains open, the user blinking the right eye while the left eye remains open, the user blinking both eyes concurrently, or any other suitable facial gesture that a typical user may deliberately perform for purposes of providing a user input instruction to the NED device. In this way, the NED device identifies combinations of user gaze direction and predetermined facial gestures and, ultimately, translates these identified combinations into user input instructions.
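
For illustration only, the following sketch shows one way such a combination could be translated into an input instruction. The gesture names, the GazeSample and Gesture types, and the action mapping are hypothetical and are not taken from the patent.

```python
# Hypothetical sketch (not from the patent): mapping combinations of gaze
# target and predetermined facial gestures to input instructions, roughly
# analogous to left/right mouse-button events.

from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple


class Gesture(Enum):
    LEFT_BLINK = auto()         # left eye blinks while the right eye stays open
    RIGHT_BLINK = auto()        # right eye blinks while the left eye stays open
    DOUBLE_LEFT_BLINK = auto()  # two left-eye blinks within a threshold time


@dataclass
class GazeSample:
    target_id: Optional[str]    # id of the UI element the gaze/vergence falls on
    timestamp: float


# Assumed mapping of gestures onto mouse-like input instructions.
GESTURE_TO_ACTION = {
    Gesture.LEFT_BLINK: "primary_select",      # like a left click
    Gesture.RIGHT_BLINK: "context_menu",       # like a right click
    Gesture.DOUBLE_LEFT_BLINK: "activate",     # like a double click
}


def translate(gaze: GazeSample, gesture: Gesture) -> Optional[Tuple[str, str]]:
    """Return (target_id, action) if the gesture occurred while a UI element
    was being focused on; otherwise the gesture is ignored."""
    if gaze.target_id is None:
        return None
    action = GESTURE_TO_ACTION.get(gesture)
    if action is None:
        return None
    return (gaze.target_id, action)


# Example: the user focuses the "next step" button and double-blinks the left eye.
print(translate(GazeSample("next_step_button", 12.4), Gesture.DOUBLE_LEFT_BLINK))
# -> ('next_step_button', 'activate')
```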

[0009] In the exemplary embodiment, the NED device may utilize the display to render one or more computer-generated images within at least a portion of the user’s field-of-view (FOV). For example, the NED device may render an individual instruction page of an ordered sequence of instruction pages within the user’s FOV. The individual instruction page may include some form of data that is useful to the user in performing a task. The individual instruction page may also include one or more individual computer-generated images that are designed for use as user interface elements (e.g., graphical control elements) that enable the user to provide user input instructions. Exemplary user interface elements include, but are not limited to, input controls (e.g., checkboxes, radio buttons, dropdown lists, toggle buttons, etc.), navigational components (e.g., search fields, slider bars or track bars, etc.), and any other type of user interface element. A benefit of the techniques described herein is that a user who is wearing the NED device is enabled to provide “hands-free” user input instructions that select a user interface element simply by gazing at the user interface element and then deliberately performing some predefined facial gesture.

[0010] With respect to a non-limiting but exemplary technique for monitoring gaze direction (e.g., to determine which particular user interface elements a user is focusing on), the eye tracking data may be associated with one or more substantially circular features of one or both of a user’s eyes. Exemplary such “substantially” circular features include pupils and irises which are generally very close to circular and, therefore, may be modeled as perfect circles for purposes of the calculations described herein. The individual sensors have corresponding sensor planes that are angularly skewed with respect to the planes on which the circular features reside (e.g., an Iris-Pupil Plane). Based on the eye tracking data, the eye tracking system determines ellipse parameters for ellipses that result from these sensor planes being angularly skewed from the Iris-Pupil Planes. In some embodiments, the eye tracking system may track only one of the user’s eyes. In other embodiments, the eye tracking system may track both of the user’s eyes. In embodiments that track both eyes, the eye tracking system may determine ellipse parameters that define: (i) first ellipses that correspond to projections of an iris and/or pupil of a right eye onto a first sensor plane; and (ii) second ellipses that correspond to projections of an iris and/or pupil of a left eye onto a second sensor plane. The projections of each of the iris(es) and/or pupil(s) onto the corresponding sensor plane(s) may in some embodiments pass through a predetermined point such as, for example, an entrance pupil of each corresponding sensor.
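
As a rough illustration of this step, the sketch below recovers ellipse parameters (center, semi-axes, rotation) from detected pupil-boundary pixels using a generic least-squares ellipse fit, here OpenCV's cv2.fitEllipse. The patent does not prescribe any particular fitting routine, and the synthetic boundary points are placeholders.

```python
# Hypothetical sketch (the patent does not prescribe a fitting method): recover
# the ellipse parameters of a pupil/iris projection on the sensor plane from
# detected boundary pixels using OpenCV's least-squares ellipse fit.

import numpy as np
import cv2


def fit_pupil_ellipse(boundary_px: np.ndarray):
    """boundary_px: (N, 2) array of (x, y) pixel coordinates on the pupil edge.
    Returns center (x, y), semi-axes (a, b), and rotation angle in radians."""
    pts = boundary_px.astype(np.float32).reshape(-1, 1, 2)
    (cx, cy), (w, h), angle_deg = cv2.fitEllipse(pts)   # w, h are full axis lengths
    semi_major, semi_minor = max(w, h) / 2.0, min(w, h) / 2.0
    return (cx, cy), (semi_major, semi_minor), np.deg2rad(angle_deg)


# Example with synthetic boundary points of a rotated ellipse:
t = np.linspace(0, 2 * np.pi, 60)
x = 320 + 40 * np.cos(t) * np.cos(0.3) - 25 * np.sin(t) * np.sin(0.3)
y = 240 + 40 * np.cos(t) * np.sin(0.3) + 25 * np.sin(t) * np.cos(0.3)
print(fit_pupil_ellipse(np.stack([x, y], axis=1)))
```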

[0011] Based on the ellipse parameters, the eye tracking system may then generate propagation data that defines three-dimensional (3D) propagations of the ellipses. The 3D propagation data may define a series of lines (e.g., rays) that extend from individual ellipses that are detected on the sensor plane. For example, individual lines of the series of lines may begin on the sensor plane at individual points along a perimeter of a detected ellipse. The individual lines may all commonly propagate from the sensor plane through a predetermined point toward the user’s eyes. In some implementations, the predetermined point through which all lines of a particular 3D propagation pass is an entrance pupil of a corresponding sensor. Since all of the lines of these 3D propagations extend from the ellipse through the predetermined point, the 3D propagations may be graphically represented as an elliptic cone that extends from the predetermined point toward the eye.
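
A minimal sketch of this idea follows: it builds the bundle of ray directions that run from ellipse perimeter points on the sensor plane through the entrance pupil, which together form the elliptic cone described above. The coordinate convention (entrance pupil at the origin, sensor plane at z = -f) is an assumption for illustration.

```python
# Hypothetical sketch: build the 3D propagation (an elliptic cone) as unit ray
# directions running from ellipse perimeter points on the sensor plane through
# the sensor's entrance pupil toward the eye. Coordinate conventions are
# assumptions for illustration only.

import numpy as np


def propagation_rays(ellipse_pts_px: np.ndarray, focal_length: float) -> np.ndarray:
    """ellipse_pts_px: (N, 2) ellipse perimeter points on the sensor plane,
    expressed relative to the sensor center. Returns (N, 3) unit direction
    vectors pointing from the entrance pupil toward the eye."""
    n = ellipse_pts_px.shape[0]
    # A point on the sensor plane sits at (x, y, -f) relative to the entrance
    # pupil; the ray toward the eye continues through the origin, so its
    # direction is simply the negated sensor-plane coordinate.
    sensor_pts = np.column_stack([ellipse_pts_px, -focal_length * np.ones(n)])
    directions = -sensor_pts
    return directions / np.linalg.norm(directions, axis=1, keepdims=True)


# Example: 8 perimeter points of a small detected ellipse, 8 mm focal length.
t = np.linspace(0, 2 * np.pi, 8, endpoint=False)
pts = np.stack([1.2 * np.cos(t), 0.8 * np.sin(t)], axis=1)   # mm on the sensor
print(propagation_rays(pts, focal_length=8.0).shape)          # (8, 3)
```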

[0012] The eye tracking system may utilize the propagation data to determine pupil orientation parameters that define various characteristics of the user’s eye(s). Exemplary pupil orientation parameters may define optical axes for one or both of the user’s eyes (e.g., an axis of an eye lens), visual axes for one or both of the user’s eyes (e.g., axes that extend from the fovea through the lens and into the real-world environment), rotational angles of the user’s eyes (e.g., an angle of rotation between a semi-axis of an ellipse and a horizontal axis of the sensor), Iris-Pupil Planes of the user’s eyes (e.g., a plane on which the pupil resides), and center points for the user’s eyes (e.g., a point at which the optical axis (or alternatively the visual axis) intersects the Iris-Pupil plane). Additionally, or alternatively, the pupil orientation parameters may define various other characteristics of the user’s eyes.

[0013] As described in detail below, the eye tracking system may utilize the pupil orientation parameters to continually determine a current (e.g., real-time) interpupillary distance (IPD) for a user, i.e., while the NED device is operating. For example, the eye tracking system may dynamically track the center points for each of the user’s two eyes and continually calculate and re-calculate the user’s interpupillary distance in near real time. Additionally, or alternatively, the eye tracking system may utilize the pupil orientation parameters to determine a vergence of the two visual axes (which are different from the optical axes) of the user. For example, the eye tracking system may dynamically track the visual axis of each of the user’s two eyes and continually calculate a location in space at which the distance between these two visual axes is the smallest. In various implementations, the visual axes are determined based on visual axis offset data that indicates at least an angular relationship between the optical axis and the visual axis. As described in detail below, this visual axis offset data may be specific to a particular user and may be determined through a user-specific calibration process. It can be appreciated that although vergence is generally understood as the “point” at which the user’s two visual axes intersect, in a practical sense these axes rarely mathematically intersect but rather simply become closest at the user’s accommodation plane. Thus, as described herein, the vergence of the visual axes may be determined by calculating a point in space at which the separation between the two visual axes is the least (i.e., wherever the two axes become closest together).
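
A small sketch of that closest-approach computation is shown below, assuming the origin and direction of each visual axis are already known; the function name vergence_point and the numeric values are illustrative placeholders, not the patent's implementation.

```python
# Hypothetical sketch: approximate the vergence of the two visual axes as the
# point in space where the (generally skew) axes come closest together. Axis
# origins/directions would come from the pupil centers and the visual axis
# offset data; the values below are placeholders.

import numpy as np


def vergence_point(o1, d1, o2, d2):
    """o1, o2: 3D origin of each visual axis (e.g., near each pupil center).
    d1, d2: 3D direction of each visual axis. Returns the midpoint of the
    segment of closest approach and the minimum separation distance."""
    o1, d1, o2, d2 = map(np.asarray, (o1, d1, o2, d2))
    d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)
    w0 = o1 - o2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b
    if abs(denom) < 1e-12:                    # (nearly) parallel axes: no vergence
        return None, None
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1, p2 = o1 + s * d1, o2 + t * d2         # closest points on each axis
    return (p1 + p2) / 2.0, float(np.linalg.norm(p1 - p2))


# Example: eyes ~62 mm apart, both visual axes converging toward a point
# roughly 500 mm in front of the user (units are millimeters).
point, gap = vergence_point([-31, 0, 0], [0.062, 0, 1.0],
                            [31, 0, 0], [-0.062, 0, 1.0])
print(point, gap)   # approximately (0, 0, 500) with a near-zero gap
```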

[0014] In some embodiments, the pupil orientation parameters may be determined by analyzing the propagation data with respect to an ocular rotation model to calculate an orientation of the Iris-pupil plane for an eye, a distance from a predetermined point of the sensor to a center of an entrance pupil of the eye, and/or a radius of the pupil of the eye. The ocular rotation model may be usable to model rotation of a circular feature of an eye around that eye’s center of rotation. For example, the ocular rotation model may be (or be based on) an equation that defines coordinates for a circle of a particular radius as that circle is rotated around the center of an eye. It can be appreciated that a circle of a specific radius will mathematically match the “elliptical” 3D propagations only at a single plane. Therefore, utilizing various error minimization algorithms to analyze the propagation data with respect to the ocular rotation model may yield the Iris-Pupil plane’s specific location in space and the circular pupil’s specific location and rotation thereon. Although some specific error minimization algorithms are described herein, such descriptions are provided for exemplary purposes only and other error minimization algorithms may also be used.

[0015] Although exemplary and non-limiting, the foregoing techniques provide for highly accurate monitoring of gaze direction and are suitable for determining which particular user interface elements a user is focusing on. Moreover, since the substantially circular feature that is being tracked in the foregoing techniques will become obstructed by the user’s eyelid while a blink is being performed, the foregoing techniques are also suitable for monitoring for certain types of predetermined facial gestures. Thus, such techniques may be used by the exemplary NED device to identify combinations of user gaze direction and predetermined facial gestures which are then translated into user input instructions for “hands-free” control of various aspects of the NED device and/or the content being displayed thereby.

[0016] It should be appreciated that any reference to “first,” “second,” etc. items and/or abstract concepts within the Summary and/or Detailed Description is not intended to and should not be construed to necessarily correspond to any reference of “first,” “second,” etc. elements of the claims. In particular, within the Summary and/or Detailed Description, items and/or abstract concepts such as, for example, three-dimensional (3D) propagations and/or circular features of eyes and/or sensor entrance pupils may be distinguished by numerical designations without such designations corresponding to the claims or even other paragraphs of the Summary and/or Detailed Description. For example, any designation of a “first 3D propagation” and “second 3D propagation” of the eye tracking system within any specific paragraph of this Summary and/or Detailed Description is used solely to distinguish two different 3D propagations of the eye tracking system within that specific paragraph, not any other paragraph and particularly not the claims.

[0017] These and various other features will be apparent from a reading of the following Detailed Description and a review of the associated drawings. This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended that this Summary be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.

DRAWINGS

[0018] The Detailed Description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same reference numbers in different figures indicate similar or identical items. References made to individual items of a plurality of items can use a reference number with another number included within a parenthetical (and/or a letter without a parenthetical) to refer to each individual item. Generic references to the items may use the specific reference number without the sequence of letters.

[0019] FIG. 1 illustrates an exemplary hardware layout for a Near-Eye-Display (NED) device that is configured to implement the methods described herein.

[0020] FIG. 2 illustrates a pair of three-dimensional (3D) propagations that extend from ellipses that result from circular features of a user’s eyes being projected into the sensors.

[0021] FIG. 3 illustrates an exemplary ellipse that is projected onto a sensor plane within a sensor that is angularly skewed with respect to the Iris-Pupil plane (not shown in FIG. 3) so that circular features on the Iris-Pupil plane appear elliptical on the sensor plane.

[0022] FIG. 4 illustrates a side view of a 3D propagation of the ellipse of FIG. 3 from the sensor plane through a predetermined point and toward the Iris-Pupil plane.

[0023] FIG. 5A illustrates exemplary eye tracking data in the form of pixel data that is generated by the sensors and that is usable to implement the techniques described herein.

[0024] FIG. 5B illustrates exemplary eye tracking data in the form of pixel data that has changed in relation to FIG. 5A due to the user’s focus shifting to the left.

[0025] FIG. 6 illustrates exemplary positions of a user’s fovea in relation to the optical axes of the user’s left and right eyes.

[0026] FIG. 7 illustrates exemplary positions of a user’s right fovea and left fovea in relation to the optical axes of the user’s right eye and left eye, respectively.

[0027] FIG. 8 illustrates a side view of a user’s eye showing how the offset position of the user’s fovea in relation to the optical axis results in the visual axis diverging from the optical axis.

[0028] FIG. 9 illustrates an exemplary environment in which a user may perform vergence movements of the eyes to shift a vergence of the two visual axes (e.g., a focal point) from a first accommodation plane to a second accommodation plane.

[0029] FIG. 10 illustrates an exemplary anatomical eye model that defines geometrical relationships between various portions of an eye.

[0030] FIG. 11 illustrates a pair of visual axes that are determinable based on visual axis offset data defining a spatial relationship between the individual visual axes and corresponding optical axes.

[0031] FIG. 12 illustrates an exemplary environment in which a plurality of virtual stimuli can be sequentially generated at a predetermined accommodation plane for performance of a user-specific calibration process.

[0032] FIG. 13 is a flow diagram of a process 1300 to generate propagation data that defines three-dimensional (3D) propagations from ellipses detected at a sensor plane to determine pupil orientation parameters.

[0033] FIG. 14 illustrates an exemplary computing environment in which a user is providing user input instructions to a NED device in the form of combinations of user gaze direction and predetermined facial gestures.

[0034] FIG. 15A illustrates a NED device that is rendering a virtual scene while tracking eye movements of the user to continually monitor a gaze direction.

[0035] FIG. 15B is similar to FIG. 15A with the exception that the user’s gaze direction has changed from being on a work space below a virtual scene to being focused on the virtual scene.

[0036] FIG. 15C is similar to FIG. 15B with the exception that the user’s gaze direction has changed from being on a first portion of the virtual scene to a second portion of the virtual scene.

[0037] FIG. 15D is similar to FIG. 15C with the exception that the virtual scene has been adjusted in response to the combination of user gaze direction and facial gesture shown in FIG. 15C.

[0038] FIG. 15E is similar to FIG. 15D with the exception that the user’s gaze direction has been focused onto a particular graphical element just prior to the user deliberately performing a particular blinking gesture.

[0039] FIG. 16 is a flow diagram of a process to translate combinations of user gaze direction and predetermined facial gestures into user input instructions for a Near-Eye-Display (NED) device.

DETAILED DESCRIPTION

[0040] The following Detailed Description describes technologies for utilizing eye tracking systems to identify user input instructions provided in the form of combinations of user gaze direction and predetermined facial gestures. The technologies described herein enable a user to provide “hands-free” user input instructions to a Near-Eye-Display (NED) device that includes a display to render computer-generated images within the user’s field-of-view (FOV) and an eye tracking system to monitor where within the FOV the user is currently focused. Specifically, the eye tracking system may continually track the user’s eye movements with a high degree of accuracy to identify specific user interface elements (e.g., graphical control elements) that a user is focused on. The eye tracking system may also identify various facial gestures that are performed deliberately (e.g., voluntarily as opposed to spontaneously or reflexively) while the specific user interface elements are being focused on. In this way, various computing systems such as NED devices are enabled to identify combinations of user gaze direction and predetermined facial gestures and, ultimately, to translate these identified combinations into user input instructions that are provided in association with specific user interface elements.

[0041] Technologies described herein provide a marked improvement over conventional NED devices in that users are enabled to provide a wide array of “hands-free” user input instructions to, for example, adjust what type of information is currently being rendered, adjust the format with which information is currently being rendered, and so on. Real-life practical applications of these technologies include providing a user with an ability to provide “hands-free” user input instructions in scenarios where the user is performing a hand-intensive task that renders conventional hand gestures impractical. The disclosed techniques therefore represent a substantial advance toward providing users with access to and control over deeply immersive augmented-reality (AR) content in a variety of practical scenarios.

[0042] Aspects of the techniques described herein are primarily described in the context of a specific scenario where a person is performing a complex task such as a surgical procedure that requires uninterrupted use of the person’s hands. While the disclosed techniques are not necessarily limited to such a scenario where a user’s hands are temporarily unavailable to provide gesture input, an appreciation of various aspects of the invention is best gained through a discussion of an example in such a context. However, the techniques described herein are applicable to a variety of other contexts, such as simply providing a user that is in a public setting (e.g., a restaurant) with an ability to discreetly control a NED device. Various techniques described herein are extendable to facilitate “hands-free” user input instructions in any other suitable context.

[0043] Turning now to FIG. 1, illustrated is an exemplary hardware layout for a Near-Eye-Display (NED) device 100 that is configured to implement the methods described herein. In the exemplary hardware layout the NED device 100 includes a pair of sensors 102 that are each directed toward a corresponding eye 104 of a user. More specifically, the illustrated NED device 100 includes a first sensor 102(1) that is angularly offset from and directed toward a right eye 104(R) and also a second sensor 102(2) that is angularly offset from and directed toward a left eye 104(L). The right eye 104(R) includes a corresponding pupil 106(R) and a corresponding iris 108(R). The left eye 104(L) includes a corresponding pupil 106(L) and a corresponding iris 108(L). The sensors 102 can be in any suitable form such as, for example, a non-contact sensor configured to use optical-based tracking (e.g., video-camera-based and/or some other specially designed optical-sensor-based eye tracking technique) to monitor one or more physical characteristics of the user’s eyes. Exemplary physical characteristics include, but are not limited to, pupil size, a rate of change of pupil size, gaze direction, and/or a rate of change of a gaze direction.

[0044] FIG. 1 is illustrated from a perspective that is directly in front of the optical axes of the eyes 104 so that the pupils 106 and irises 108 appear perfectly circular. It will be appreciated by one skilled in the art that in humans (and many other vertebrates for that matter) the pupils 106 and irises 108 of the eyes 104 are almost perfect circles. Therefore, in various calculations described below, the pupils 106 and/or irises 108 are mathematically modeled as and/or presumed to be perfectly circular in shape. From the perspective of the individual sensors 102, however, the pupils 106 and irises 108 of the eyes 104 appear to be elliptical as described herein. This is because the sensors 102 are angularly offset from the eyes 104 in the sense that the optical axis of each individual sensor 102 is not parallel to the optical axis of the eye 104 it is tracking. The position of the sensors 102 shown in FIG. 1 is for illustrative purposes only. It will be appreciated that the techniques described herein can be performed with the sensors 102 being located in a variety of positions with respect to the eyes. As a specific but nonlimiting example, the sensors could be embedded within a lens or other substrate directly in front of the eyes.

[0045] In the illustrated embodiment, the NED device 100 further includes a controller 110 that is configured to implement the various operations of the methods described herein. The controller 110 may be communicatively coupled to the sensors 102 to receive eye tracking data that is generated by the sensors 102 in association with the circular features of the eyes. The controller 110 may further be communicatively coupled to other componentry of the NED device 100. The controller 110 includes one or more logic devices and one or more computer memory devices storing instructions executable by the logic device(s) to deploy functionalities described herein in relation to the NED device 100. The controller 110 can comprise one or more processing units 112 and one or more computer-readable media 114 for storing an operating system 116 and data such as, for example, eye tracking data, visual axis offset data, application data, etc. The computer-readable media 114 may further include an eye tracking engine 118 configured to receive the eye tracking data from the sensors 102 and, based thereon, determine one or more physical characteristics of the user’s eyes using the methods and techniques described herein. The components of the NED device 100 are operatively connected, for example, via a bus 120, which can include one or more of a system bus, a data bus, an address bus, a PCI bus, a Mini-PCI bus, and any variety of local, peripheral, and/or independent buses.

[0046] The processing unit(s) 112, can represent, for example, a CPU-type processing unit, a GPU-type processing unit, a field-programmable gate array (FPGA), another class of digital signal processor (DSP), or other hardware logic components that may, in some instances, be driven by a CPU. For example, and without limitation, illustrative types of hardware logic components that can be used include Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), System-on-a-Chip Systems (SOCs), Complex Programmable Logic Devices (CPLDs), etc.

[0047] As used herein, computer-readable media, such as computer-readable media 114, can store instructions executable by the processing unit(s). Computer-readable media can also store instructions executable by external processing units such as by an external CPU, an external GPU, and/or executable by an external accelerator, such as an FPGA type accelerator, a DSP type accelerator, or any other internal or external accelerator. In various examples, at least one CPU, GPU, and/or accelerator is incorporated in a computing device, while in some examples one or more of a CPU, GPU, and/or accelerator is external to a computing device.

[0048] Computer-readable media can include computer storage media and/or communication media. Computer storage media can include one or more of volatile memory, nonvolatile memory, and/or other persistent and/or auxiliary computer storage media, removable and non-removable computer storage media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Thus, computer storage media includes tangible and/or physical forms of media included in a device and/or hardware component that is part of a device or external to a device, including but not limited to random access memory (RAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), phase change memory (PCM), read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory, rotating media, optical cards or other optical storage media, magnetic storage, magnetic cards or other magnetic storage devices or media, solid-state memory devices, storage arrays, network attached storage, storage area networks, hosted computer storage or any other storage memory, storage device, and/or storage medium that can be used to store and maintain information for access by a computing device.

[0049] In contrast to computer storage media, communication media can embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transmission mechanism. As defined herein, computer storage media does not include communication media. That is, computer storage media does not include communications media consisting solely of a modulated data signal, a carrier wave, or a propagated signal, per se.

[0050] The NED device 100 may further include various other components, for example speakers, microphones, accelerometers, gyroscopes, magnetometers, temperature sensors, touch sensors, biometric sensors, other image sensors, energy-storage components (e.g. battery), a communication facility, a GPS receiver, etc.

[0051] Turning now to FIG. 2, illustrated is a pair of three-dimensional (3D) propagations 202 that extend from ellipses 204 that result from circular features (e.g., pupils 106 and/or irises 108) of a user’s eyes 104 being projected into the sensors 102. As illustrated, a first 3D propagation 202(1) extends from a first ellipse 204(1), which is detected at the first sensor 102(1), through a first point P1. As further illustrated, a second 3D propagation 202(2) extends from a second ellipse 204(2), which is detected at the second sensor 102(2), through a second point P2. Each of the 3D propagations 202 extends toward a corresponding Iris-Pupil plane 206 that is angularly offset with respect to the sensors 102. The angularly offset nature of the Iris-Pupil planes 206 results in the pupils 106 and irises 108 appearing elliptical from the perspectives of the sensors 102.

[0052] As illustrated, each of the individual 3D propagations 202 may include a series of lines that extend from a perimeter of a corresponding individual ellipse 204 through a corresponding predetermined point and, ultimately, to the perimeter of a circular feature (e.g., pupil 106 or iris 108) that resides within a corresponding Iris-Pupil plane 206. The predetermined points (e.g., P1 and P2) may correspond to specific points in space that are measurable in relation to corresponding sensors 102. For example, the first predetermined point P1 may correspond to a center of an entrance pupil of the first sensor 102(1) whereas the second predetermined point P2 may correspond to a center of an entrance pupil of the second sensor 102(2). Thus, it can be appreciated that P1 may correspond to a point in space at which light rays cross prior to forming an image within the first sensor 102(1) and that P2 may correspond to a point in space at which light rays cross prior to forming an image within the second sensor 102(2).

[0053] As described in more detail below, these 3D propagations 202 may be used to determine pupil orientation parameters that define various characteristics of the user’s pupil(s) 106. For example, it can be appreciated that the 3D propagations 202 can be mathematically modeled as elliptical cones. This is because individual ones of the 3D propagations 202 originate at a corresponding ellipse 204 and pass through a singular point. It can further be appreciated that a cross-section of an elliptical cone will be circular in shape if that cross-section is taken at a specific orientation. Thus, by using the mathematical assumption that the pupils 106 and irises 108 are circular in shape, the 3D propagations 202 may enable a determination of the specific orientation of the Iris-Pupil planes 206. Additionally, as described in more detail below, performing various error minimization techniques of the 3D propagations with respect to an ocular rotation model may further enable a determination of the center points of the pupils 106. It can be appreciated that once the location in space of the center point of a pupil 106 and an orientation of an Iris-Pupil plane 206 is known for a particular eye, the optical axis (illustrated as dashed lines for each eye) for that particular eye is also known.

[0054] Turning now to FIG. 3, illustrated is an exemplary ellipse 204 that is projected from a circular feature of an eye 104 (e.g., an iris 108) onto a sensor plane 302 of a sensor 102. The sensor plane 302 may correspond to a substantially planar surface within the sensor 102 that is angularly skewed with respect to a corresponding Iris-Pupil plane 206 (not shown in FIG. 3) so that circular features on the Iris-Pupil plane appear elliptical on the sensor plane 302. In some embodiments, the sensors 102 may be image sensors such as, for example, complementary metal oxide semiconductor (CMOS) sensors and/or charge-coupled device (CCD) sensors. In such embodiments, the sensors 102 may generate eye tracking data in the form of pixel data that defines images of the eyes. These images may be formed based on ambient light surrounding the user. Thus, in contrast to conventional eye tracking systems that rely on illuminating the eye(s) with near-infrared light to cause first Purkinje reflections (e.g., “glints”) that are distributed around the iris, the techniques disclosed herein do not require active emission of near-infrared light toward the user’s eyes. The numerous benefits of the techniques disclosed herein include providing a system that can track the user’s eyes using ambient light rather than having to expend battery resources to generate near-infrared light. Moreover, the disclosed techniques provide a system that is highly sensitive and accurate in the detection of eye movements (e.g., the systems are sensitive enough to accurately track even saccadic eye movements).

[0055] Semi-axes for the “elliptically shaped” iris 108 and/or pupil 106 are uniquely oriented within the sensor plane 302 for any particular subtended angle of the sensor 102 and rotation of the eye being tracked. The size of the semi-axes of the elliptically shaped iris 108 and pupil 106 depends on the original size of each and on any magnification caused by optical components (e.g., lenses, etc.) of the sensor 102. In FIG. 3, the semi-major axis of the elliptically shaped iris 108 is labelled $p_{ip}^{M}$ and the semi-minor axis is labelled $p_{ip}^{m}$. The sensor plane 302 is illustrated with a sensor coordinate system centered thereon. The sensor coordinate system includes a vertical y-axis and a horizontal x-axis. Additionally, as illustrated, the elliptically shaped iris 108 is rotated by an angle $\alpha$ with respect to the horizontal x-axis. Therefore, within the sensor plane 302, an ellipse 204 that is centered at $(\bar{x}_{ip}^{d}, \bar{y}_{ip}^{d})$ with semi-major axis $p_{ip}^{M}$ and semi-minor axis $p_{ip}^{m}$, and that is rotated by an angle $\alpha$ with respect to the horizontal x-axis, is given by Equation 1 below:

$$E_{ip}(i,j) \equiv \left\{ \bar{x}_{ip}^{d} + p_{ip}^{M}\cos[\phi(i,j)]\cos(\alpha) - p_{ip}^{m}\sin[\phi(i,j)]\sin(\alpha),\;\; \bar{y}_{ip}^{d} + p_{ip}^{M}\cos[\phi(i,j)]\sin(\alpha) + p_{ip}^{m}\sin[\phi(i,j)]\cos(\alpha) \right\} \tag{1}$$
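
As an illustration, a small sketch that evaluates Equation 1 for placeholder parameter values follows; the center, semi-axes, and rotation angle shown are not taken from the patent.

```python
# Minimal sketch of Equation 1: parametric points of an ellipse centered at
# (x_d, y_d) with semi-axes (p_M, p_m), rotated by alpha on the sensor plane.
# The numeric values below are placeholders for illustration only.

import numpy as np


def ellipse_points(x_d, y_d, p_M, p_m, alpha, n=64):
    phi = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    x = x_d + p_M * np.cos(phi) * np.cos(alpha) - p_m * np.sin(phi) * np.sin(alpha)
    y = y_d + p_M * np.cos(phi) * np.sin(alpha) + p_m * np.sin(phi) * np.cos(alpha)
    return np.stack([x, y], axis=1)       # (n, 2) points E_ip on the sensor plane


E_ip = ellipse_points(x_d=0.4, y_d=-0.2, p_M=1.5, p_m=1.1, alpha=np.deg2rad(20))
print(E_ip.shape)   # (64, 2)
```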

[0056] Turning now to FIG. 4, illustrated is a side view of a 3D propagation 202 of the ellipse 204 of FIG. 3 from the sensor plane 302 through a predetermined point. In the illustrated embodiment, the predetermined point is labeled $\vec{r}_{o}$ and is defined as the center of the entrance pupil of the sensor 102. To improve the clarity of the illustration, only two individual rays of the 3D propagation 202 are shown. Each individual ray extends from a point on the sensor plane 302 that falls along the perimeter of the ellipse 204, through the point $\vec{r}_{o}$, and, ultimately, to a point on the Iris-Pupil plane 206 that falls along the perimeter of the pupil 106 or iris 108. In plain terms, the 3D propagation 202 represents the reverse of the projection of the pupil 106 or iris 108 through the point $\vec{r}_{o}$ onto the sensor plane 302. Thus, in three-dimensional terms, the rays that start from the sensor plane 302, pass through the point $\vec{r}_{o}$ (e.g., the center of the entrance pupil of the sensor 102), and then travel some additional distance to reach the circular perimeter of the pupil 106 or iris 108 at the Iris-Pupil plane 206 are given by Equation 2 below:

$$\vec{r}_{ip}^{\,d}(i,j) = \vec{r}_{o} + \left[\sqrt{p_{ip}^{2} + d_{ipo}^{2}} + \sqrt{|D_{cip}(i,j)|^{2} + f^{2}}\,\right]\hat{T}_{oip}(i,j) \tag{2}$$

where $\vec{r}_{o}$ is the point at which all of the rays of a particular image cross prior to forming an image on the sensor plane 302, $d_{ipo}$ is the distance from the point $\vec{r}_{o}$ to the center of the iris/pupil $\vec{r}_{ip}^{\,O}$ (as labeled in FIG. 4), $D_{cip}$ is the radial distance between the center of the sensor 102 and the ellipse points $E_{ip}$, $f$ is the focal length of the sensor 102, and $\hat{T}_{oip}(i,j)$ is the unit vector going from the points in the ellipse 204 to the point $\vec{r}_{o}$.
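
A hedged sketch of Equation 2 follows, assuming the entrance pupil sits at the origin and the sensor plane at z = -f; the candidate values of d_ipo and p_ip are placeholders that, in practice, would come from the minimization described further below.

```python
# Hypothetical sketch of Equation 2: back-project detected ellipse points E_ip
# through the entrance pupil r_o to candidate points on the Iris-Pupil plane.
# Coordinate conventions (r_o at the origin, sensor plane at z = -f) and the
# candidate values of d_ipo and p_ip are assumptions for illustration.

import numpy as np


def backproject(E_ip, f, d_ipo, p_ip, r_o=np.zeros(3)):
    """E_ip: (N, 2) ellipse points relative to the sensor center (same units as f).
    Returns (N, 3) candidate points r_ip_d on the Iris-Pupil plane."""
    D_cip = np.linalg.norm(E_ip, axis=1)                  # radial distance on the sensor
    sensor_pts = np.column_stack([E_ip, -f * np.ones(len(E_ip))])
    T_oip = r_o - sensor_pts                              # from ellipse points toward r_o
    T_oip /= np.linalg.norm(T_oip, axis=1, keepdims=True)
    reach = np.sqrt(p_ip**2 + d_ipo**2) + np.sqrt(D_cip**2 + f**2)
    return r_o + reach[:, None] * T_oip


phi = np.linspace(0, 2 * np.pi, 16, endpoint=False)
E_ip = np.stack([1.4 * np.cos(phi), 1.0 * np.sin(phi)], axis=1)   # mm
print(backproject(E_ip, f=8.0, d_ipo=35.0, p_ip=2.0).shape)        # (16, 3)
```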

[0057] In some embodiments, the systems described herein may determine one or more of an orientation $\mathrm{Rot}(\phi, \theta)$ of the Iris-Pupil plane 206, a radius $p_{ip}$ of the pupil 106 or iris 108 (e.g., whichever circular feature is being observed to perform eye tracking), and the distance $d_{ipo}$ from the point $\vec{r}_{o}$ to the center $\vec{r}_{ip}^{\,O}$ of the iris/pupil by analyzing the 3D propagations 202 with respect to an ocular rotation model. The ocular rotation model may be usable to model rotation of a circular feature of an eye around that eye’s center of rotation $\vec{r}_{c}$. For example, an ocular rotation model may define the coordinates of a circle with center $\vec{r}_{ip}^{\,O}(i,j)$ and radius $p_{ip}$ that is rotated around the eye’s center of rotation $\vec{r}_{c}$ by an elevation angle $\theta$ and an azimuth angle $\phi$, as given by Equation 3 below:

$$\vec{r}_{ip}^{\,r} = \mathrm{Rot}(\phi, \theta)\left(\vec{r}_{ip}^{\,O} + \vec{r}_{ip}^{\,c}(i,j) - \vec{r}_{c}\right) \tag{3}$$

where the position of the center of the circle is given by $\vec{r}_{ip}^{\,O} = \{x_{ip}^{O}, y_{ip}^{O}, z_{ip}^{O}\}$, and the parametrized coordinates of the circle are defined as $\vec{r}_{ip}^{\,c}(i,j) = \{p_{ip}\cos\phi,\; p_{ip}\sin\phi,\; 0\}$. In various embodiments, the center of the iris/pupil circle and the center of rotation of the eye $\vec{r}_{c}$ are defined from one or more anatomical eye models such as, for example, the Gullstrand model, the Arizona model, the Liou-Brennan model, and/or the Navarro model. Moreover, as described in more detail below, a user-specific calibration may be performed to complete a global minimization of the various parameters used in Equation 3 and thereby customize the ocular rotation model to a specific user.
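
The following sketch evaluates Equation 3 under an assumed rotation convention (yaw about the vertical axis followed by pitch about the horizontal axis); the eye-center and pupil-center coordinates are placeholder values, and a real implementation would take them from an anatomical eye model and the user-specific calibration.

```python
# Hypothetical sketch of the ocular rotation model in Equation 3: a circle of
# radius p_ip, parametrized in its own plane, rotated about the eye's center of
# rotation r_c by azimuth phi_az and elevation theta_el. The rotation
# convention and all numeric values are assumptions for illustration.

import numpy as np


def rot(phi_az, theta_el):
    """Assumed Rot(phi, theta): yaw about the y-axis, then pitch about the x-axis."""
    cy, sy = np.cos(phi_az), np.sin(phi_az)
    cx, sx = np.cos(theta_el), np.sin(theta_el)
    yaw = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    pitch = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    return pitch @ yaw


def rotated_pupil_circle(r_ip_O, r_c, p_ip, phi_az, theta_el, n=32):
    phi = np.linspace(0, 2 * np.pi, n, endpoint=False)
    circle = np.stack([p_ip * np.cos(phi), p_ip * np.sin(phi), np.zeros(n)], axis=1)
    # Equation 3 as written: rotate the circle points expressed relative to r_c.
    return (rot(phi_az, theta_el) @ (r_ip_O + circle - r_c).T).T


r_c = np.array([0.0, 0.0, 45.0])       # assumed eye center of rotation (mm)
r_ip_O = np.array([0.0, 0.0, 35.0])    # assumed pupil-circle center (mm)
pts = rotated_pupil_circle(r_ip_O, r_c, p_ip=2.0,
                           phi_az=np.deg2rad(10), theta_el=np.deg2rad(-5))
print(pts.shape)                        # (32, 3)
```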

[0058] As a specific but non-limiting example, the orientation $\mathrm{Rot}(\phi, \theta)$ of the Iris-Pupil plane 206, the radius $p_{ip}$ of the pupil 106 or iris 108, and the distance $d_{ipo}$ from the point $\vec{r}_{o}$ to the center $\vec{r}_{ip}^{\,O}$ of the iris/pupil are determined by minimizing the error between the 3D propagations 202 of the detected points $\vec{r}_{ip}^{\,d}$ (e.g., in the sensor plane 302) through the vector $\hat{T}_{oip}(i,j)$ and a circle of radius $p_{ip}$ rotated around the eye center $\vec{r}_{c}$. An exemplary such error minimization is given by Equation 4 below:

$$\mathrm{Err}\left(p_{ip}, d_{ipo}, \mathrm{Rot}(\phi, \theta)\right) = \operatorname{argmin} \sum_{i,j} \left\| \vec{r}_{ip}^{\,d}(i,j) - \vec{r}_{ip}^{\,r}(i,j) \right\|^{2} \tag{4}$$

It will be appreciated that upon determining the orientation $\mathrm{Rot}(\phi, \theta)$ of the Iris-Pupil plane 206 and the distance $d_{ipo}$ from the point $\vec{r}_{o}$ to the center $\vec{r}_{ip}^{\,O}$ of the iris/pupil, the systems disclosed herein can then determine where an optical axis for a tracked eye begins and in which direction it propagates with respect to the sensor 102. Additionally, in embodiments that include two sensors 102 separated by a known distance, upon determining the location of the center $\vec{r}_{ip}^{\,O}$ of the pupil for both eyes in relation to the sensors 102, the systems disclosed herein can dynamically determine an interpupillary distance (IPD) for the user (as shown in FIG. 2).
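
A simplified, self-contained sketch of this minimization is shown below using scipy.optimize; the point correspondence by shared index, the rotation convention, the coordinate frame, and the placeholder ellipse data are all assumptions made for illustration, not the patent's calibrated procedure.

```python
# Hypothetical sketch of the error minimization in Equation 4: fit (d_ipo,
# p_ip, azimuth, elevation) by minimizing the squared distance between the
# back-projected ellipse points of Equation 2 and the rotated model circle of
# Equation 3. All numeric values are placeholders.

import numpy as np
from scipy.optimize import minimize

F = 8.0                                  # assumed sensor focal length (mm)
R_C = np.array([0.0, 0.0, 45.0])         # assumed eye center of rotation (mm)

phi_par = np.linspace(0, 2 * np.pi, 32, endpoint=False)
E_ip = np.stack([1.4 * np.cos(phi_par), 1.0 * np.sin(phi_par)], axis=1)  # placeholder ellipse (mm)


def backprojected_points(d_ipo, p_ip):                       # Equation 2 (r_o at the origin)
    D_cip = np.linalg.norm(E_ip, axis=1)
    sensor = np.column_stack([E_ip, -F * np.ones(len(E_ip))])
    T_oip = -sensor / np.linalg.norm(sensor, axis=1, keepdims=True)
    reach = np.sqrt(p_ip**2 + d_ipo**2) + np.sqrt(D_cip**2 + F**2)
    return reach[:, None] * T_oip


def rotated_circle(d_ipo, p_ip, az, el):                     # Equation 3 (simplified)
    circle = np.stack([p_ip * np.cos(phi_par), p_ip * np.sin(phi_par),
                       np.zeros_like(phi_par)], axis=1)
    center = np.array([0.0, 0.0, d_ipo])
    cy, sy, cx, sx = np.cos(az), np.sin(az), np.cos(el), np.sin(el)
    rot = (np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
           @ np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]))
    return (rot @ (center + circle - R_C).T).T


def err(params):                                             # Equation 4 objective
    d_ipo, p_ip, az, el = params
    diff = backprojected_points(d_ipo, p_ip) - rotated_circle(d_ipo, p_ip, az, el)
    return np.sum(diff ** 2)


fit = minimize(err, x0=[35.0, 2.0, 0.0, 0.0], method="Nelder-Mead")
print(fit.x)    # fitted (d_ipo, p_ip, azimuth, elevation)
```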

……
……
……
