
Microsoft Patent | User-Specific Eye Tracking Calibration For Near-Eye-Display (NED) Devices

Patent: User-Specific Eye Tracking Calibration For Near-Eye-Display (NED) Devices

Publication Number: 20200125166

Publication Date: 20200423

Applicants: Microsoft

Abstract

Technologies for performing user-specific calibration of eye tracking systems for Near-Eye-Display (NED) devices. The NED device may sequentially present different virtual stimuli to a user while concurrently capturing instances of eye tracking data. The eye tracking data reveals calibration ellipse centers that uniquely correspond to individual virtual stimuli. The calibration ellipse centers may be used to define a polygon grid in association with a sensor plane. The resulting polygon grid is used during operation to interpolate the real-time gaze direction of the user. For example, a real-time instance of eye tracking data may be analyzed to determine which particular polygon of the polygon grid a real-time ellipse center falls within. Then, distances between the real-time ellipse center and the vertices of the particular polygon may be determined. A proportionality factor is then determined based on these distances and is used to interpolate the real-time eye gaze of the user.

PRIORITY APPLICATION

[0001] This U.S. non-provisional application is a continuation-in-part application that claims benefit of and priority to U.S. Non-Provisional Application No. 16/168,319, filed Oct. 23, 2018, entitled EYE TRACKING SYSTEMS AND METHODS FOR NEAR-EYE-DISPLAY (NED) DEVICES, the entire contents of which are incorporated herein by reference.

BACKGROUND

[0002] Near-Eye-Display (NED) systems superimpose computer-generated images (“CG images”) over a user’s view of a real-world environment. For example, a NED system may generate composite views to enable a user to visually perceive a CG image superimposed over a physical object that exists within the real-world environment. In some instances, the NED system may track a depth at which the user is focusing within the real-world environment. One reason for tracking the user’s focal depth (e.g., accommodation plane) is that the user may experience motion sickness or vertigo if CG images are rendered at a depth that is different (i.e., closer to or farther from the user) than the depth at which the user is focusing. For this reason, a user’s experience may be highly dependent on the NED system accurately tracking the user’s eye movements. This is but one of many aspects of how the accuracy of eye tracking impacts a user’s experience with a NED system.

[0003] Some conventional eye tracking systems may undergo a calibration process whereby geometric characteristics of a specific user’s eyes are accounted for. During such a calibration process, a particular user may be prompted to sequentially direct their focus onto various known points on a display while images of the particular user’s eyes are captured. An anatomical model of the particular user’s eyes may be refined based on these images to more accurately reflect the true geometric characteristics of the particular user’s eyes. Following the calibration process, the refined anatomical model may be used to calculate the user’s gaze during real-time operation of the eye tracking system. Unfortunately, these calculations are computationally intensive and negatively impact computing resource consumption on compact computing systems such as, for example, NED systems.

[0004] It is with respect to these and other considerations that the disclosure made herein is presented.

SUMMARY

[0005] Technologies described herein provide techniques for performing user-specific calibration of eye tracking systems for Near-Eye-Display (NED) devices. Generally described, a NED device may sequentially present a plurality of virtual stimuli to a user in a random or pseudo-random fashion. While presenting each individual stimulus, the NED device deploys an eye tracking system to capture instances of eye tracking data (e.g., images of the eyes) in association with each individual virtual stimulus. The individual instances of eye tracking data may be analyzed to identify calibration ellipse centers that uniquely correspond to individual ones of the plurality of virtual stimuli. These calibration ellipse centers may be represented in a sensor plane and may be used to define a polygon grid in association with the sensor plane. As a specific example, a grid of triangles may be formed in the sensor plane by interconnecting individual calibration ellipse centers that are represented in the sensor plane. The resulting polygon grid may then be used during operation to interpolate the real-time gaze direction of the user. More particularly, the real-time gaze direction of the user may be interpolated based on a location of a real-time ellipse center within the polygon grid that is formed by interconnecting the calibration ellipse centers. Continuing with the example in which the polygon grid is a grid of triangles, a real-time instance of eye tracking data may be analyzed to determine which particular triangle a real-time ellipse center falls within. Then, distances between the real-time ellipse center and the three calibration ellipse centers that form the particular triangle may be determined. Finally, a proportionality factor is determined based on these distances and then used to interpolate the real-time eye gaze (e.g., optical axis) of the user.

[0006] In an exemplary implementation of performing user-specific eye tracking calibration, a Near-Eye-Display (NED) device includes a display that is positioned within a user’s field of view when the NED device is being properly worn by the user. For example, the NED device may include a transparent display that is positioned slightly forward of the user’s eyes. The NED device further includes an eye tracking system having one or more sensors that generate eye tracking data associated with one or both of the user’s eyes. In some embodiments, the individual sensors have corresponding sensor planes that are angularly skewed with respect to Iris-Pupil Plane(s) of the user’s eye(s). Thus, based on the eye tracking data, the eye tracking system may determine ellipse parameters for ellipses that result from these sensor planes being angularly skewed from the Iris-Pupil Planes. In particular, the angular skew of the sensor plane with respect to the Iris-Pupil Plane results in circular features of the eyes (e.g., the pupil and/or iris) appearing elliptical in shape. The ellipse parameters may indicate the center points for the ellipses as described in more detail below.
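For illustration only, the following sketch shows one way the ellipse parameters mentioned above (center, semi-axes, and rotation) might be extracted from a frame of eye tracking data. The dark-pupil thresholding and the use of OpenCV's fitEllipse are assumptions made for the example; the disclosure does not prescribe a particular fitting method.

```python
# Illustrative sketch only: one way to obtain ellipse parameters from a
# grayscale eye image. The threshold value, contour selection, and use of
# cv2.fitEllipse are assumptions, not a prescribed method.
import cv2
import numpy as np

def fit_pupil_ellipse(eye_frame: np.ndarray):
    """eye_frame: 8-bit grayscale image of one eye.
    Returns ((cx, cy), (axis_1, axis_2), angle_deg) or None."""
    # Dark-pupil assumption: the pupil is the darkest large blob in the frame.
    _, mask = cv2.threshold(eye_frame, 40, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    if not contours:
        return None
    pupil = max(contours, key=cv2.contourArea)
    if len(pupil) < 5:                    # cv2.fitEllipse needs at least 5 points
        return None
    return cv2.fitEllipse(pupil)          # the center (cx, cy) is the tracked quantity
```

The fitted center (cx, cy) is the quantity that is subsequently treated as a calibration ellipse center or a real-time ellipse center.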

[0007] To perform a user-specific eye tracking calibration, the NED device may sequentially present a plurality of virtual stimuli to a user in a random or pseudo-random fashion. In this way, the user’s focus may be drawn from one virtual stimulus to another virtual stimulus, and so on. The plurality of virtual stimuli may be arranged according to a predetermined pattern such as, for example, an ordered grid that includes rows and columns of virtual stimuli. The NED device may present individual ones of the virtual stimuli while simultaneously capturing instances of eye tracking data (e.g., images of the eyes). The eye tracking data may be indicative of one or more center points for elliptical images of the user’s pupils and/or irises, each center point uniquely corresponding to an individual virtual stimulus. As described in detail below, these center points may be used to define and/or form a polygon grid by interconnecting the center points as they are represented within the sensor planes of each sensor. Furthermore, the resulting polygon grids may be usable to interpolate a user’s eye gaze in near real-time when the NED device is being used in real-time operation (e.g., following the calibration process). In a specific implementation, the polygon grid is a grid of triangles from which a user’s gaze may be calculated using a Delaunay decomposition and by estimating barycentric coordinates.
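As a minimal sketch of the grid-building step, and assuming the Delaunay decomposition mentioned above, the user-specific triangle grid could be constructed from the per-stimulus calibration ellipse centers as follows. SciPy and the variable names are illustrative assumptions.

```python
# Illustrative sketch: build the user-specific triangle grid from the nominal
# calibration ellipse centers. SciPy's Delaunay and the names are assumptions.
import numpy as np
from scipy.spatial import Delaunay

def build_calibration_profile(calibration_centers, stimulus_gazes):
    """calibration_centers: (N, 2) nominal ellipse centers in the sensor plane,
    one per virtual stimulus. stimulus_gazes: (N, 2) known gaze values (e.g.,
    azimuth/elevation) at which the corresponding stimuli were rendered."""
    grid = Delaunay(np.asarray(calibration_centers, dtype=float))
    return grid, np.asarray(stimulus_gazes, dtype=float)
```

Here stimulus_gazes would hold the known gaze directions at which each virtual stimulus was rendered; these are the values later blended at the vertices of the grid during real-time interpolation.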

[0008] In some embodiments, the polygon grid is formed by interconnecting averaged values of numerous calibration ellipse centers that uniquely correspond to individual ones of the plurality of virtual stimuli. The reason for such averaging is that, as an individual one of the virtual stimuli is presented, the user’s focus will move rapidly around that virtual stimulus due to the saccadic movements that naturally occur as the user collects information about the scene. Thus, multiple instances (e.g., frames) of eye tracking data may be collected in association with each individual virtual stimulus, each instance of eye tracking data having a different ellipse center. As a specific but non-limiting example, if the user’s eyes make three saccadic movements per second and the NED device monitors these movements for four seconds while the user focuses on an individual virtual stimulus, then the NED device may collect twelve unique instances of eye tracking data in association with this individual virtual stimulus. In this specific example, these twelve unique instances of eye tracking data may be averaged to determine an average or nominal calibration ellipse center in association with the individual virtual stimulus. Then, these average or nominal calibration ellipse centers (one per virtual stimulus) may be used to form a polygon grid that is specific to the user.
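A minimal sketch of this averaging step is shown below; the median-based rejection of stray saccadic samples is an added illustrative refinement, not something the disclosure specifies.

```python
# Illustrative sketch of the averaging step: collapse the saccade-scattered
# samples captured for a single virtual stimulus into one nominal calibration
# ellipse center. The median-based outlier rejection is an added assumption.
import numpy as np

def nominal_center(samples: np.ndarray, max_dev_px: float = 5.0) -> np.ndarray:
    """samples: (K, 2) ellipse centers captured while one stimulus is shown."""
    median = np.median(samples, axis=0)
    keep = np.linalg.norm(samples - median, axis=1) <= max_dev_px
    if not keep.any():
        return median                        # fall back if every frame was rejected
    return samples[keep].mean(axis=0)        # average of the retained frames
```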

[0009] Following the user-specific eye tracking calibration, the NED device may utilize the eye tracking system to monitor movements of the user’s eyes during real-time operation. Similar to the eye tracking data that is captured during calibration, the eye tracking data that is captured during real-time operation may be indicative of one or more center points for elliptical images of the user’s pupils and/or irises. However, these so-called “real-time” ellipse centers will in most cases be located somewhere in between groupings of the “calibration” ellipse centers. This is because the user is no longer being presented with virtual stimuli but is instead focusing on various objects of interest that exist within a real-world environment. Thus, the “real-time” ellipse centers that are identified when eye tracking is being performed during actual use of the NED device (e.g., following the calibration phase in which virtual stimuli are presented) are located within the boundaries of individual polygons of the user-specific polygon grid. For example, a “real-time” ellipse center might fall within an individual triangle that is formed by interconnecting a group of three “calibration” ellipse centers.

[0010] To determine the user’s real-time gaze direction (e.g., in terms of the optical axis and/or visual axis), the NED device may determine which particular polygon of the polygon grid a current “real-time” ellipse center falls within. For example, if the polygon grid is a grid of triangles, then a real-time instance of eye tracking data (e.g., a particular frame or image of an eye) may be analyzed to determine which particular triangle a real-time ellipse center currently falls within. Once this triangle has been determined, the Euclidean distances between the “real-time” ellipse center and each of the calibration ellipse centers that form the triangle may be determined. Then, proportionality factors α_R may be calculated with respect to each of the calibration ellipse centers that form the polygon bounding the real-time ellipse center. For example, where the polygon grid is a grid of triangles, three proportionality factors (α_A, α_B, and α_C) may be calculated based on the real-time ellipse center. These proportionality factors may then be used as weights in a weighted sum to calculate the user’s real-time gaze direction.
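The following sketch ties the real-time steps together for the triangle-grid example: locate the containing triangle, derive three weights that sum to one, and blend the gaze values calibrated at the triangle's vertices. The description above frames the proportionality factors in terms of Euclidean distances to the three calibration centers; the barycentric computation below (via SciPy's stored affine transforms) is one common way to obtain such weights and is shown only as an assumed implementation, not as the claimed method.

```python
# Illustrative sketch of the real-time interpolation for the triangle-grid
# example. The barycentric weights play the role of the proportionality
# factors (alpha_A, alpha_B, alpha_C); SciPy usage and names are assumptions.
import numpy as np
from scipy.spatial import Delaunay

def interpolate_gaze(grid: Delaunay, vertex_gazes: np.ndarray, p: np.ndarray):
    """grid: Delaunay grid over calibration centers; vertex_gazes: (N, 2) gaze
    values associated with those centers; p: (2,) real-time ellipse center."""
    simplex = int(grid.find_simplex(p))
    if simplex < 0:
        return None                                   # outside the calibrated grid
    # SciPy stores an affine map per triangle; it yields two barycentric
    # coordinates, and the third is whatever remains so the weights sum to one.
    T = grid.transform[simplex]
    bary2 = T[:2].dot(p - T[2])
    weights = np.append(bary2, 1.0 - bary2.sum())     # (alpha_A, alpha_B, alpha_C)
    vertices = grid.simplices[simplex]
    return weights @ vertex_gazes[vertices]           # weighted-sum gaze estimate
```

Used with the earlier sketches, grid, gazes = build_calibration_profile(centers, stimulus_gazes) followed by interpolate_gaze(grid, gazes, p) would yield an interpolated gaze for a real-time ellipse center p.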

[0011] It should be appreciated that any reference to “first,” “second,” etc. items and/or abstract concepts within the Summary and/or Detailed Description is not intended to and should not be construed to necessarily correspond to any reference of “first,” “second,” etc. elements of the claims. In particular, within the Summary and/or Detailed Description, items and/or abstract concepts such as, for example, three-dimensional (3D) propagations and/or circular features of eyes and/or sensor entrance pupils may be distinguished by numerical designations without such designations corresponding to the claims or even other paragraphs of the Summary and/or Detailed Description. For example, any designation of a “first 3D propagation” and “second 3D propagation” of the eye tracking system within any specific paragraph of this Summary and/or Detailed Description is used solely to distinguish two different 3D propagations of the eye tracking system within that specific paragraph, not any other paragraph and particularly not the claims.

[0012] These and various other features will be apparent from a reading of the following Detailed Description and a review of the associated drawings. This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended that this Summary be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.

DRAWINGS

[0013] The Detailed Description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same reference numbers in different figures indicate similar or identical items. References made to individual items of a plurality of items can use a reference number with another number included within a parenthetical (and/or a letter without a parenthetical) to refer to each individual item. Generic references to the items may use the specific reference number without the sequence of letters.

[0014] FIG. 1 illustrates an exemplary hardware layout for a Near-Eye-Display (NED) device that is configured to implement the methods described herein.

[0015] FIG. 2 illustrates a pair of three-dimensional (3D) propagations that extend from ellipses that result from circular features of a user’s eyes being projected into the sensors.

[0016] FIG. 3 illustrates an exemplary ellipse that is projected onto a sensor plane within a sensor that is angularly skewed with respect to the Iris-Pupil plane (not shown in FIG. 3) so that circular features on the Iris-Pupil plane appear elliptical on the sensor plane.

[0017] FIG. 4 illustrates a side view of a 3D propagation of the ellipse of FIG. 3 from the sensor plane through a predetermined point and toward the Iris-Pupil plane.

[0018] FIG. 5A illustrates exemplary eye tracking data in the form of pixel data that is generated by the sensors and that is usable to implement the techniques described herein.

[0019] FIG. 5B illustrates exemplary eye tracking data in the form of pixel data that has changed in relation to FIG. 5A due to the user’s focus shifting to the left.

[0020] FIG. 6 illustrates exemplary positions of a user’s fovea in relation to the optical axes of the user’s left and right eyes.

[0021] FIG. 7 illustrates exemplary positions of a user’s right fovea and left fovea in relation to the optical axes of the user’s right eye and left eye, respectively.

[0022] FIG. 8 illustrates a side view of a user’s eye showing how the offset position of the user’s fovea in relation to the optical axis results in the visual axis diverging from the optical axis.

[0023] FIG. 9 illustrates an exemplary environment in which a user may perform vergence movements of the eyes to shift a vergence of the two visual axes (e.g., a focal point) from a first accommodation plane to a second accommodation plane.

[0024] FIG. 10 illustrates an exemplary anatomical eye model that defines geometrical relationships between various portions of an eye.

[0025] FIG. 11 illustrates a pair of visual axes that are determinable based on visual axis offset data defining a spatial relationship between the individual visual axes and corresponding optical axes.

[0026] FIG. 12 is a flow diagram of a process to generate propagation data that defines three-dimensional (3D) propagations from ellipses detected at a sensor plane to determine pupil orientation parameters.

[0027] FIG. 13 illustrates an exemplary environment in which a plurality of virtual stimuli can be sequentially generated at a predetermined accommodation plane to facilitate a user-specific calibration of an eye tracking system.

[0028] FIG. 14 illustrates an exemplary sequence of individual virtual stimuli being generated at a predetermined accommodation plane at predetermined locations.

[0029] FIG. 15A illustrates an exemplary aggregation of focal points for which instances of eye tracking data are captured in association with individual ones of the virtual stimuli.

[0030] FIG. 15B illustrates an enlarged view of the aggregation of focal points that surrounds a particular virtual stimulus shown in FIG. 15A.

[0031] FIG. 16 illustrates exemplary aggregations of numerous instances of eye tracking data that are captured in association with targeted focal points.

[0032] FIG. 17 illustrates exemplary calibration profiles that define polygon grids that are formed by interconnecting instances of the eye tracking data shown in FIG. 16 within a corresponding sensor plane.

[0033] FIG. 18 illustrates a schematic diagram of how a “real-time” ellipse center (“P”) may fall within a polygon grid during real-time operation and may be used to interpolate the real-time gaze direction of the user.

[0034] FIG. 19 is a flow diagram of a process to generate a calibration profile for a user based on instances of eye tracking data that are received in association with virtual stimuli and to use the calibration profile to track a real-time gaze of the user.

DETAILED DESCRIPTION

[0035] The following Detailed Description describes technologies for performing user-specific calibration of eye tracking systems for compact electronic devices such as, for example, Near-Eye-Display (NED) devices, laptop computers, etc. In some implementations, a compact electronic device may sequentially present a plurality of virtual stimuli to a user in a random or pseudo-random fashion. While presenting each individual stimulus, an eye tracking system captures instances of eye tracking data (e.g., images of the eyes) in association with each individual virtual stimulus. The eye tracking data reveals one or more calibration ellipse centers that uniquely correspond to individual ones of the plurality of virtual stimuli. These calibration ellipse centers may be represented in a sensor plane and may be used to define a polygon grid in association with the sensor plane. As a specific example, a grid of triangles may be formed in the sensor plane by interconnecting individual calibration ellipse centers that are represented in the sensor plane. The resulting polygon grid may then be used during operation to interpolate the real-time gaze direction of the user. Continuing with the example in which the polygon grid is a grid of triangles, a real-time instance of eye tracking data may be analyzed to determine which particular triangle a real-time ellipse center falls within. Then, distances between the real-time ellipse center and the three calibration ellipse centers that form the particular triangle may be determined. Finally, a proportionality factor is determined based on these distances and then used to interpolate the real-time eye gaze (e.g., optical axis) of the user.

[0036] Aspects of the techniques described herein are primarily described in the context of the sensors being cameras that contain one or more lenses that define an entrance pupil disposed in front of an image sensor (e.g., a CMOS sensor). In such embodiments, the image sensor may generate eye tracking data in the form of pixel data that defines images of the user’s eyes. While the disclosed techniques are not necessarily limited to using cameras, an appreciation of various aspects of the invention is best gained through a discussion of examples in such a context. However, any type of sensor that is suitable for observing a shape and/or orientation of the iris and/or pupil of the user’s eye shall be considered a variation of the techniques described herein. For example, it will be appreciated that various other forms of sensors, with or without lenses, may also be suitable for implementing the techniques described herein.

[0037] Turning now to FIG. 1, illustrated is an exemplary hardware layout for a Near-Eye-Display (NED) device 100 that is configured to implement the methods described herein. In the exemplary hardware layout, the NED device 100 includes a pair of sensors 102 that are each directed toward a corresponding eye 104 of a user. More specifically, the illustrated NED device 100 includes a first sensor 102(1) that is angularly offset from and directed toward a right eye 104(R) and also a second sensor 102(2) that is angularly offset from and directed toward a left eye 104(L). The right eye 104(R) includes a corresponding pupil 106(R) and a corresponding iris 108(R). The left eye 104(L) includes a corresponding pupil 106(L) and a corresponding iris 108(L). The sensors 102 can be in any suitable form such as, for example, a non-contact sensor configured to use optical-based tracking (e.g., video-camera-based and/or some other specially designed optical-sensor-based eye tracking technique) to monitor one or more physical characteristics of the user’s eyes. Exemplary physical characteristics include, but are not limited to, pupil size, a rate of change of pupil size, gaze direction, and/or a rate of change of gaze direction.

[0038] FIG. 1 is illustrated from a perspective that is directly in front of the optical axes of the eyes 104 so that the pupils 106 and irises 108 appear perfectly circular. It will be appreciated by one skilled in the art that in humans (and many other vertebrates for that matter) the pupils 106 and irises 108 of the eyes 104 are almost perfect circles. Therefore, in various calculations described below, the pupils 106 and/or irises 108 are mathematically modeled as and/or presumed to be perfectly circular in shape. From the perspective of the individual sensors 102, however, the pupils 106 and irises 108 of the eyes 104 appear to be elliptical as described herein. This is because the sensors 102 are angularly offset from the eyes 104 in the sense that the optical axis of each individual sensor 102 is not parallel to the optical axis of the eye 104 it is tracking. The position of the sensors 102 shown in FIG. 1 is for illustrative purposes only. It will be appreciated that the techniques described herein can be performed with the sensors 102 being located in a variety of positions with respect to the eyes. As a specific but nonlimiting example, the sensors could be embedded within a lens or other substrate directly in front of the eyes.

[0039] The NED device 100 may be configured to render computer-generated images (CGIs) in front of a user’s eye(s). For example, the NED device 100 can be used for augmented reality (AR) and/or virtual reality (VR) applications. In implementations where the NED device 100 is an AR-type Head Mounted Device (HMD), a display element 101 may protrude into the user’s field of view. An exemplary type of display component is a transparent waveguide display that enables the user to concurrently see both the real-world environment surrounding him or her and AR content generated by the display element 101. In the illustrated embodiment, the NED device includes a right display element 101(R) that generates images in front of the user’s right eye and a left display element 101(L) that generates images in front of the user’s left eye. The one or more display elements 101 may be deployed to present virtual stimuli to the user to perform the calibration techniques described herein.

[0040] In the illustrated embodiment, the NED device 100 further includes a controller 110 that is configured to implement the various operations of the methods described herein. The controller 110 may be communicatively coupled to the sensors 102 to receive the eye tracking data that is generated by the sensors 102 in association with the circular features of the eyes. The controller 110 may further be communicatively coupled to other componentry of the NED device 100. The controller 110 includes one or more logic devices and one or more computer memory devices storing instructions executable by the logic device(s) to deploy the functionalities described herein with relation to the NED device 100. The controller 110 can comprise one or more processing units 112 and one or more computer-readable media 114 for storing an operating system and data such as, for example, eye tracking data, visual axis offset data, application data, etc. The computer-readable media 114 may further include an eye tracking engine (e.g., module) configured to receive the eye tracking data from the sensors 102 and, based thereon, determine one or more physical characteristics of the user’s eyes using the methods and techniques described herein. The components of the NED device 100 are operatively connected, for example, via a bus 120, which can include one or more of a system bus, a data bus, an address bus, a PCI bus, a Mini-PCI bus, and any variety of local, peripheral, and/or independent buses.

[0041] The processing unit(s) 112, can represent, for example, a CPU-type processing unit, a GPU-type processing unit, a field-programmable gate array (FPGA), another class of digital signal processor (DSP), or other hardware logic components that may, in some instances, be driven by a CPU. For example, and without limitation, illustrative types of hardware logic components that can be used include Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), System-on-a-Chip Systems (SOCs), Complex Programmable Logic Devices (CPLDs), etc.

[0042] As used herein, computer-readable media, such as computer-readable media 114, can store instructions executable by the processing unit(s). Computer-readable media can also store instructions executable by external processing units such as by an external CPU, an external GPU, and/or executable by an external accelerator, such as an FPGA type accelerator, a DSP type accelerator, or any other internal or external accelerator. In various examples, at least one CPU, GPU, and/or accelerator is incorporated in a computing device, while in some examples one or more of a CPU, GPU, and/or accelerator is external to a computing device.

[0043] Computer-readable media can include computer storage media and/or communication media. Computer storage media can include one or more of volatile memory, nonvolatile memory, and/or other persistent and/or auxiliary computer storage media, removable and non-removable computer storage media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Thus, computer storage media includes tangible and/or physical forms of media included in a device and/or hardware component that is part of a device or external to a device, including but not limited to random access memory (RAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), phase change memory (PCM), read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory, rotating media, optical cards or other optical storage media, magnetic storage, magnetic cards or other magnetic storage devices or media, solid-state memory devices, storage arrays, network attached storage, storage area networks, hosted computer storage or any other storage memory, storage device, and/or storage medium that can be used to store and maintain information for access by a computing device.

[0044] In contrast to computer storage media, communication media can embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transmission mechanism. As defined herein, computer storage media does not include communication media. That is, computer storage media does not include communications media consisting solely of a modulated data signal, a carrier wave, or a propagated signal, per se.

[0045] The NED device 100 may further include various other components, for example speakers, microphones, accelerometers, gyroscopes, magnetometers, temperature sensors, touch sensors, biometric sensors, other image sensors, energy-storage components (e.g. battery), a communication facility, a GPS receiver, etc.

[0046] Turning now to FIG. 2, illustrated is a pair of three-dimensional (3D) propagations 202 that extend from ellipses 204 that result from circular features (e.g., pupils 106 and/or irises 108) of a user’s eyes 104 being projected into the sensors 102. As illustrated, a first 3D propagation 202(1) extends from a first ellipse 204(1), which is detected at the first sensor 102(1), through a first point P1. As further illustrated, a second 3D propagation 202(2) extends from a second ellipse 204(2), which is detected at the second sensor 102(2), through a second point P2. Each of the 3D propagations 202 extends toward a corresponding Iris-Pupil plane 206 that is angularly offset with respect to the sensors 102. The angularly offset nature of the Iris-Pupil planes 206 results in the pupils 106 and irises 108 appearing elliptical from the perspectives of the sensors 102.

[0047] As illustrated, each of the individual 3D propagations 202 may include a series of lines that extend from a perimeter of a corresponding individual ellipse 204 through a corresponding predetermined point and, ultimately, to the perimeter of a circular feature (e.g., pupil 106 or iris 108) that resides within a corresponding Iris-Pupil plane 206. The predetermined points (e.g., P1 and P2) may correspond to specific points in space that are measurable in relation to corresponding sensors 102. For example, the first predetermined point P1 may correspond to a center of an entrance pupil of the first sensor 102(1) whereas the second predetermined point P2 may correspond to a center of an entrance pupil of the second sensor 102(2). Thus, it can be appreciated that P1 may correspond to a point in space at which light rays cross prior to forming an image within the first sensor 102(1) and that P2 may correspond to a point in space at which light rays cross prior to forming an image within the second sensor 102(2).

[0048] As described in more detail below, these 3D propagations 202 may be used to determine pupil orientation parameters that define various characteristics of the user’s pupil(s) 106. For example, it can be appreciated that the 3D propagations 202 can be mathematically modeled as elliptical cones. This is because individual ones of the 3D propagations 202 originate at a corresponding ellipse 204 and pass through a singular point. It can further be appreciated that a cross-section of an elliptical cone will be circular in shape if that cross-section is taken at a specific orientation. Thus, by using the mathematical assumption that the pupils 106 and irises 108 are circular in shape, the 3D propagations 202 may enable a determination of the specific orientation of the Iris-Pupil planes 206. Additionally, as described in more detail below, performing various error minimization techniques on the 3D propagations with respect to an ocular rotation model may further enable a determination of the center points of the pupils 106. It can be appreciated that once the location in space of the center point of a pupil 106 and the orientation of an Iris-Pupil plane 206 are known for a particular eye, the optical axis (illustrated as dashed lines for each eye) for that particular eye is also known.

[0049] Turning now to FIG. 3, illustrated is an exemplary ellipse 204 that is projected from a circular feature of an eye 104 (e.g., an iris 108) onto a sensor plane 302 of a sensor 102. The sensor plane 302 may correspond to a substantially planar surface within the sensor 102 that is angularly skewed with respect to a corresponding Iris-Pupil plane 206 (not shown in FIG. 3) so that circular features on the Iris-Pupil plane appear elliptical on the sensor plane 302. In some embodiments, the sensors 102 may be image sensors such as, for example, complementary metal oxide semiconductor (CMOS) sensors and/or charge-coupled device (CCD) sensors. In such embodiments, the sensors 102 may generate eye tracking data in the form of pixel data that defines images of the eyes. These images may be formed based on ambient light surrounding the user. Thus, in contrast to conventional eye tracking systems that rely on illuminating the eye(s) with near infrared light to cause first Purkinje reflections (e.g., “glints”) that are distributed around the iris, the techniques disclosed herein do not require active emission of near infrared light toward the user’s eyes. The numerous benefits of the techniques disclosed herein include providing a system that can track the user’s eyes using ambient light rather than having to expend battery resources to generate near infrared light. Moreover, the disclosed techniques provide a system that is highly sensitive and accurate in the detection of eye movements (e.g., the systems are sensitive enough to accurately track even saccadic eye movements).

[0050] Semi-axes for the “elliptically shaped” iris 108 and/or pupil 106 are uniquely oriented within the sensor plane 302 for any particular subtended angle of the sensor 102 and rotation of the eye being tracked. The sizes of the semi-axes of the elliptically shaped iris 108 and pupil 106 depend on the original size of each and on any magnification caused by optical components (e.g., lenses, etc.) of the sensor 102. In FIG. 3, the semi-major axis of the elliptically shaped iris 108 is labelled $p_{ip}^{M}$ and the semi-minor axis of the elliptically shaped iris 108 is labelled $p_{ip}^{m}$. The sensor plane 302 is illustrated with a sensor coordinate system centered thereon. The sensor coordinate system includes a vertical y-axis and a horizontal x-axis. Additionally, as illustrated, the elliptically shaped iris 108 is rotated an angle $\alpha$ with respect to the horizontal x-axis. Therefore, within the sensor plane 302, an ellipse 204 that is centered at $(x_{ip}^{d}, y_{ip}^{d})$ with semi-major axis $p_{ip}^{M}$ and semi-minor axis $p_{ip}^{m}$, and that is also rotated an angle $\alpha$ with respect to the horizontal x-axis, is given by Equation 1 shown below:

$$E_{ip}(i,j) \equiv \left\{\, x_{ip}^{d} + p_{ip}^{M}\cos[\phi(i,j)]\cos(\alpha) - p_{ip}^{m}\sin[\phi(i,j)]\sin(\alpha),\;\; y_{ip}^{d} + p_{ip}^{M}\cos[\phi(i,j)]\sin(\alpha) + p_{ip}^{m}\sin[\phi(i,j)]\cos(\alpha) \,\right\} \tag{1}$$
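A small numeric sketch of Equation 1, sampling perimeter points of the rotated ellipse in the sensor plane, is given below; NumPy and the function name are illustrative assumptions.

```python
# Illustrative numeric sketch of Equation 1: sample perimeter points of an
# ellipse with semi-axes (p_M, p_m), center (x_d, y_d), and rotation alpha
# in the sensor plane.
import numpy as np

def ellipse_points(x_d, y_d, p_M, p_m, alpha, n=64):
    phi = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    x = x_d + p_M * np.cos(phi) * np.cos(alpha) - p_m * np.sin(phi) * np.sin(alpha)
    y = y_d + p_M * np.cos(phi) * np.sin(alpha) + p_m * np.sin(phi) * np.cos(alpha)
    return np.stack([x, y], axis=1)          # (n, 2) samples of E_ip
```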

[0051] Turning now to FIG. 4, illustrated is a side view of a 3D propagation 202 of the ellipse 204 of FIG. 3 from the sensor plane 302 through a predetermined point. In the illustrated embodiment, the predetermined point is labeled $\vec{r}_{o}$ and is defined as the center of the entrance pupil for the sensor 102. To improve the clarity of the illustration, only two individual rays of the 3D propagation 202 are shown. Each individual ray extends from a point on the sensor plane 302 that falls along the perimeter of the ellipse 204, through the point $\vec{r}_{o}$, and, ultimately, to a point on the Iris-Pupil plane 206 that falls along the perimeter of the pupil 106 or iris 108. In plain terms, the 3D propagation 202 represents the reverse of the projection of the pupil 106 or iris 108 through the point $\vec{r}_{o}$ and onto the sensor plane 302. Thus, in three-dimensional terms, the rays that start from the sensor plane 302, pass through the point $\vec{r}_{o}$ (e.g., the center of the entrance pupil of the sensor 102), and then travel some additional distance to reach the circular perimeter of the pupil 106 or iris 108 at the Iris-Pupil plane 206 are given by Equation 2 shown below:

$$\vec{r}_{ip}^{\,d}(i,j) = \vec{r}_{o} + \left[\sqrt{p_{ip}^{2} + d_{ipo}^{2}} + \sqrt{D_{cip}(i,j)^{2} + f^{2}}\,\right]\hat{T}_{oip}(i,j) \tag{2}$$

where $\vec{r}_{o}$ is the point at which all of the rays of a particular image cross prior to forming an image on the sensor plane 302, $d_{ipo}$ is the distance from the point $\vec{r}_{o}$ to the center of the iris/pupil $\vec{r}_{ip}^{\,o}$ (as labeled in FIG. 4), $D_{cip}$ is the radial distance between the center of the sensor 102 and the ellipse points $E_{ip}$, $f$ is the focal length of the sensor 102, and $\hat{T}_{oip}(i,j)$ is the unit vector going from the points in the ellipse 204 to the point $\vec{r}_{o}$.
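As an illustrative sketch of the 3D propagation in Equation 2 (as reconstructed above), each sensor-plane ellipse point can be carried along the ray through $\vec{r}_{o}$ by the bracketed length. The sensor-centered coordinate frame, the sign convention placing the sensor plane one focal length behind $\vec{r}_{o}$, and all names below are assumptions made for the example.

```python
# Illustrative sketch of Equation 2: carry each sensor-plane ellipse point
# along the ray through r_o out toward the Iris-Pupil plane. Frame and sign
# conventions are assumptions.
import numpy as np

def propagate_ellipse_point(e_xy, r_o, f, p_ip, d_ipo):
    """e_xy: (2,) ellipse point in sensor-plane coordinates; r_o: (3,) entrance
    pupil center; f: focal length; p_ip, d_ipo: candidate radius and distance."""
    D_cip = np.linalg.norm(e_xy)                          # radial offset from sensor center
    e_xyz = r_o + np.array([e_xy[0], e_xy[1], -f])        # 3D position of the ellipse point
    T_hat = (r_o - e_xyz) / np.linalg.norm(r_o - e_xyz)   # unit ray toward (and past) r_o
    length = np.sqrt(D_cip**2 + f**2) + np.sqrt(p_ip**2 + d_ipo**2)
    return e_xyz + length * T_hat                         # candidate point on the Iris-Pupil plane
```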

[0052] In some embodiments, the systems described herein may determine one or more of an orientation $\mathrm{Rot}(\Phi,\Theta)$ of the Iris-Pupil plane 206, a radius $p_{ip}$ of the pupil 106 or iris 108 (e.g., whichever circular feature is being observed to perform eye tracking), and the distance $d_{ipo}$ from the point $\vec{r}_{o}$ to the center $\vec{r}_{ip}^{\,o}$ of the iris/pupil by analyzing the 3D propagations 202 with respect to an ocular rotation model. The ocular rotation model may be usable to model rotation of a circular feature of an eye around that eye’s center of rotation $\vec{r}_{c}$. For example, an ocular rotation model may define the coordinates of a circle with center $\vec{r}_{ip}^{\,o}(i,j)$ and radius $p_{ip}$ that is rotated around the eye’s center of rotation $\vec{r}_{c}$ by an elevation angle $\Theta$ and an azimuth angle $\Phi$, as given by Equation 3 shown below:

$$\vec{r}_{ip}^{\,r} = \mathrm{Rot}(\Phi,\Theta)\left(\vec{r}_{ip}^{\,o} + \vec{r}_{ip}^{\,c}(i,j) - \vec{r}_{c}\right) + \vec{r}_{c} \tag{3}$$

where the position of the center of the circle is given by $\vec{r}_{ip}^{\,o} = \{x_{ip}^{o}, y_{ip}^{o}, z_{ip}^{o}\}$ and the parametrized coordinates of the circle are defined as $\vec{r}_{ip}^{\,c}(i,j) = \{p_{ip}\cos\phi,\; p_{ip}\sin\phi,\; 0\}$. In various embodiments, the center of the iris/pupil circle and the center of rotation of the eye $\vec{r}_{c}$ are defined from one or more anatomical eye models such as, for example, the Gullstrand model, the Arizona model, the Liou-Brennan model, and/or the Navarro model. Moreover, as described in more detail below, a user-specific calibration may be performed to complete a global minimization of the various parameters used in Equation 3 to customize the ocular rotation model to a specific user.
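A sketch of the ocular rotation model of Equation 3 is shown below, rotating the parametrized circle about the eye's center of rotation. The azimuth-then-elevation rotation order is an assumption made for illustration; the disclosure defers the geometric constants to the anatomical eye models named above.

```python
# Illustrative sketch of the ocular rotation model in Equation 3. The
# azimuth-then-elevation rotation order is an assumption.
import numpy as np

def rotation(phi, theta):
    """Rot(Phi, Theta): azimuth about the y-axis, then elevation about the x-axis."""
    c, s = np.cos(phi), np.sin(phi)
    R_az = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    c, s = np.cos(theta), np.sin(theta)
    R_el = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
    return R_el @ R_az

def rotated_circle(r_ip_o, r_c, p_ip, phi, theta, n=64):
    """Points of a circle of radius p_ip centered at r_ip_o, rotated by
    Rot(phi, theta) about the eye's center of rotation r_c (Equation 3)."""
    ang = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    circle = np.stack([p_ip * np.cos(ang), p_ip * np.sin(ang), np.zeros(n)], axis=1)
    pts = r_ip_o + circle                   # un-rotated circle in the Iris-Pupil plane
    return (rotation(phi, theta) @ (pts - r_c).T).T + r_c
```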

[0053] As a specific but non-limiting example, the orientation $\mathrm{Rot}(\Phi,\Theta)$ of the Iris-Pupil plane 206, the radius $p_{ip}$ of the pupil 106 or iris 108, and the distance $d_{ipo}$ from the point $\vec{r}_{o}$ to the center $\vec{r}_{ip}^{\,o}$ of the iris/pupil are determined by minimizing the error between the 3D propagations 202 of the points $\vec{r}_{ip}^{\,d}$ detected in the sensor plane 302 (propagated through the vector $\hat{T}_{oip}(i,j)$) and a circle of radius $p_{ip}$ rotated around the eye center $\vec{r}_{c}$. An exemplary such error minimization technique is given by Equation 4 shown below:

$$\mathrm{Err}\left(p_{ip}, d_{ipo}, \mathrm{Rot}(\Phi,\Theta)\right) = \arg\min \sum_{i,j} \left\| \vec{r}_{ip}^{\,d}(i,j) - \vec{r}_{ip}^{\,r}(i,j) \right\|^{2} \tag{4}$$

It will be appreciated that upon determining the orientation $\mathrm{Rot}(\Phi,\Theta)$ of the Iris-Pupil plane 206 and the distance $d_{ipo}$ from the point $\vec{r}_{o}$ to the center $\vec{r}_{ip}^{\,o}$ of the iris/pupil, the systems disclosed herein can then determine where the optical axis of a tracked eye begins and in which direction it propagates with respect to the sensor 102. Additionally, in embodiments that include two sensors 102 separated by a known distance, upon determining the location of the center $\vec{r}_{ip}^{\,o}$ of the pupil for both eyes in relation to the sensors 102, the systems disclosed herein can dynamically determine an interpupillary distance (IPD) for the user (as shown in FIG. 2).
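The minimization of Equation 4 could be sketched as a small least-squares fit over $p_{ip}$, $d_{ipo}$, $\Phi$, and $\Theta$, reusing the propagate_ellipse_point and rotated_circle helpers from the earlier sketches. The Nelder-Mead solver, the initial guess, the assumption that the propagated and model points are index-matched, and the ray used to place the iris/pupil center are all illustrative choices, not requirements of the disclosure.

```python
# Illustrative sketch of the error minimization in Equation 4, reusing the
# propagate_ellipse_point and rotated_circle helpers sketched above.
import numpy as np
from scipy.optimize import minimize

def fit_pupil_pose(ellipse_pts_2d, r_o, f, r_c, center_ray_hat,
                   x0=(2.0e-3, 25.0e-3, 0.0, 0.0)):
    """ellipse_pts_2d: (n, 2) ellipse perimeter samples in the sensor plane;
    center_ray_hat: unit ray from r_o through the detected ellipse center
    (assumed, for this sketch, to pass through the iris/pupil center)."""
    def cost(x):
        p_ip, d_ipo, phi, theta = x
        r_ip_o = r_o + d_ipo * center_ray_hat          # candidate iris/pupil center
        propagated = np.array([propagate_ellipse_point(e, r_o, f, p_ip, d_ipo)
                               for e in ellipse_pts_2d])
        model = rotated_circle(r_ip_o, r_c, p_ip, phi, theta, n=len(ellipse_pts_2d))
        return np.sum((propagated - model) ** 2)       # Equation 4 residual
    return minimize(cost, x0, method="Nelder-Mead")
```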

[0054] Turning now to FIG. 5A, exemplary eye tracking data is shown in the form of pixel data 502 that is generated by the sensors 102 and that is usable to implement the techniques described herein. As illustrated in FIG. 5A, a NED device 100 includes a first sensor 102(1) that is angularly offset from and directed toward a user’s right eye 104(R) and a second sensor 102(2) that is angularly offset from and directed toward a user’s left eye 104(L). As the user’s eyes move around to look at and/or track various objects within the user’s field-of-view (FOV), the sensors 102 continually capture images of the pupils 106 and/or irises 108 of the user’s eyes.
