Google Patent | Multi-Perspective Eye-Tracking For VR/AR Systems

Patent: Multi-Perspective Eye-Tracking For VR/AR Systems

Publication Number: 20190129174

Publication Date: 2019-05-02

Applicants: Google

Abstract

A display system, such as a head mounted display, tracks a pose of a user’s eye based on multiple perspectives of the eye captured via a segmented optics array, such as a lenslet array or Fresnel lens. The display system reflects light (e.g., infra-red light) off each segment of the segmented optics array, and captures an image based on the reflected light. Because of the segmented optics, the captured image represents multiple concurrent perspectives of the user’s eye. The display system analyzes the different perspectives and selects a perspective, or combination of perspectives, and based on the selected perspective or combination, identifies a pose of the user’s eye.

BACKGROUND

Field of the Disclosure

[0001] The present disclosure relates generally to virtual reality/augmented reality (VR/AR) systems and, more particularly, to eye-tracking for VR/AR systems.

Description of the Related Art

[0002] Some computer display systems, such as some head-mounted display (HMD) systems, employ eye-tracking techniques to enhance the user experience. For example, some display systems employ an eye-tracking system that applies machine vision techniques to a captured image of a user’s eye to identify a pose of the eye. Based on the identified pose, the display system can modify content displayed to the user, can identify that the user has selected a displayed icon or other displayed item, and the like. However, the user’s eye is often occluded by surrounding facial features (e.g., an eyelash) or other objects in the user’s environment, or by glints of light on the eye itself. These occlusions can interfere with the eye-tracking system’s ability to identify the pose of the eye, and therefore negatively impact the user’s experience. Moreover, the occlusions are often dynamic in nature, and are therefore difficult to account for by conventional eye-tracking systems.

BRIEF DESCRIPTION OF THE DRAWINGS

[0003] The present disclosure may be better understood, and its numerous features and advantages made apparent to those skilled in the art by referencing the accompanying drawings. The use of the same reference symbols in different drawings indicates similar or identical items.

[0004] FIG. 1 is a block diagram of a display system that identifies a pose of a user’s eye based on multiple perspectives of the eye captured via a segmented optics array in accordance with some embodiments.

[0005] FIG. 2 is a diagram illustrating the display system of FIG. 1 employing a lenslet array as the segmented optics array in accordance with some embodiments.

[0006] FIG. 3 is a diagram illustrating the display system of FIG. 1 employing a Fresnel lens as the segmented optics array in accordance with some embodiments.

[0007] FIG. 4 is a block diagram of an eye-tracking module of the display system of FIG. 1 employing a neural net to select an eye perspective in accordance with some embodiments.

[0008] FIG. 5 is a flow diagram of a method of identifying a pose of a user’s eye based on multiple perspectives of the eye captured via a segmented optics array in accordance with some embodiments.

DETAILED DESCRIPTION

[0009] FIGS. 1-5 illustrate techniques for tracking a pose of a user’s eye based on multiple perspectives of the eye captured via a segmented optics array, such as a lenslet array or Fresnel lens. A display system reflects light (e.g., infra-red light) off each segment of the segmented optics array, and captures an image based on the reflected light. Because of the segmented optics, the captured image represents multiple concurrent perspectives of the user’s eye. The display system analyzes the different perspectives and selects a perspective, or combination of perspectives, and based on the selected perspective or combination, identifies a pose of the user’s eye. By employing multiple perspectives of the eye, the display system reduces the impact of occlusions on eye tracking, thereby improving the user experience.

[0010] To illustrate, conventional eye-tracking systems identify the position of the eye at a given instant in time based on a single image of the eye, taken from a single perspective, at or near the given instant in time. However, the eye, or a particular portion of the eye such as the pupil, may be occluded in the image by an eyelash, by a particularly bright reflection of light, or by other features. These occlusions can make it difficult or impossible to identify the pose of the eye. Using the techniques and systems described herein, an eye-tracking system employs the segmented optics array to capture an image having multiple concurrent, and different, perspectives of the eye, so that at least one of the perspectives is likely to depict the eye without occlusions. For example, an eyelash may occlude the eye in one of the perspectives, but a different perspective, depicting the eye from a different angle, is un-occluded by the eyelash. The eye-tracking system can analyze each of the different perspectives to identify the perspective, or combination of perspectives, that is predicted to result in the most accurate identification of the eye’s pose, and employ the selected perspective (or combination of perspectives) to identify the pose of the eye.

[0011] FIG. 1 illustrates a head mounted display (HMD) system 100 incorporating multi-perspective eye-tracking in accordance with at least one embodiment. In the depicted example, the HMD system 100 includes a computational display sub-system 102, a rendering component 104, and one or more eye-tracking modules, such as one or both of an eye-tracking module 106 for tracking a user’s left eye and an eye-tracking module 108 for tracking the user’s right eye. The computational display sub-system 102 includes a left-eye display 110 and a right-eye display 112 mounted in an apparatus 114 (e.g., goggles, glasses, etc.) that places the displays 110, 112 in front of the left and right eyes, respectively, of the user.

[0012] As shown by view 116, each of the displays 110, 112 includes at least one display panel 118 to display a sequence or succession of integral lightfield frames (hereinafter, “lightfield frame” for ease of reference), each of which comprises an array 120 of elemental images 122. For ease of reference, an array 120 of elemental images 122 may also be referred to herein as a lightfield frame 120. Each of the displays 110, 112 further includes a segmented optics array 124 of optical segments 126 overlying the display panel 118. As used herein, a segmented optics array refers to an optical array that includes a plurality of optical segments, wherein each optical segment modifies (e.g., reflects) received light, and wherein the light received by each optical segment is different from the light received by the other optical segments of the array. The segmented optics array can thereby provide light to generate an image (e.g., image 120) including a plurality of segments, referred to herein as sub-images (e.g., sub-image 122), wherein each sub-image reflects a different perspective of an environment or object (e.g., one of the user’s eyes). In particular, each sub-image corresponds to an image generated from the light modified by a different one of the optical segments. Further, the sub-images display concurrent, substantially simultaneous perspectives of the environment or object. Examples of segmented optics arrays, as described further herein with respect to FIGS. 2 and 3, include an array of lenslets and a Fresnel lens having a plurality of sections.

[0013] Typically, the number of optical segments 126 in the segmented optics array 124 is equal to the number of elemental images 122 in the array 120, but in other implementations the number of optical segments 126 may be fewer or greater than the number of elemental images 122. Note that while the example of FIG. 1 illustrates a 5×4 array 120 of elemental images 122 and a corresponding 5×4 segmented optics array 124 of optical segments 126 for ease of illustration, in a typical implementation the number of elemental images 122 in a lightfield frame 120 and the number of optical segments 126 in the segmented optics array 124 is much higher. Further, in some embodiments, a separate display panel 118 is implemented for each of the displays 110, 112, whereas in other embodiments the left-eye display 110 and the right-eye display 112 share a single display panel 118, with the left half of the display panel 118 used for the left-eye display 110 and the right half of the display panel 118 used for the right-eye display 112.

[0014] In at least one embodiment, the segmented optics array 124 overlies the display surface 130 of the display panel 118 so as to be disposed between the display surface 130 and the corresponding eye of the user. In this configuration, each optical segment 126 focuses a corresponding region of the display surface 130 onto the pupil of the eye, with each such region at least partially overlapping with one or more adjacent regions. Thus, in such computational display configurations, when an array of elemental images is displayed at the display surface 130 of the display panel 118 and then viewed by the eye through the segmented optics array 124, the user perceives the array of elemental images as a single image of a scene. Thus, when this process is performed in parallel for both the left eye and right eye of the user with the proper parallax implemented therebetween, the result is the presentation of autostereoscopic three-dimensional (3D) imagery to the user.
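
For context, the patent does not give the underlying lens equations, but the focusing behavior described above can be sketched with the standard thin-lens relation; the following is an assumed illustration of how such lenslet-based near-eye displays are commonly analyzed, not a detail taken from the disclosure. With an elemental image at distance $d_o$ from its optical segment, inside the segment's focal length $f$,

$$\frac{1}{d_o} + \frac{1}{d_i} = \frac{1}{f}, \qquad d_o < f \;\Rightarrow\; d_i < 0,$$

so each segment forms a magnified virtual image behind the display panel, and the overlapping virtual images produced by adjacent segments fuse into the single perceived scene described in paragraph [0014].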

[0015] As also shown in FIG. 1, the rendering component 104 includes a set of one or more processors, such as the illustrated central processing unit (CPU) 136 and graphics processing units (GPUs) 138, 140, and one or more storage components, such as system memory 142, to store software programs or other executable instructions that are accessed and executed by the processors 136, 138, 140 so as to manipulate one or more of the processors 136, 138, 140 to perform various tasks as described herein. Such software programs include, for example, rendering program 144 comprising executable instructions for an accommodation range adjustment process, as described below, as well as an eye-tracking program 146 comprising executable instructions for an eye-tracking process, as also described below.

[0016] In operation, the rendering component 104 receives rendering information 148 from a local or remote content source 150, where the rendering information 148 represents graphics data, video data, or other data representative of an object or scene that is the subject of imagery to be rendered and displayed at the display sub-system 102. Executing the rendering program 144, the CPU 136 uses the rendering information 148 to send drawing instructions to the GPUs 138, 140, which in turn utilize the drawing instructions to render, in parallel, a series of lightfield frames 151 for display at the left-eye display 110 and a series of lightfield frames 153 for display at the right-eye display 112 using any of a variety of well-known VR/AR computational/lightfield rendering processes. As part of this rendering process, the CPU 136 may receive pose information 150 from an inertial measurement unit (IMU) 154, where the pose information 150 is representative of a current pose of the display sub-system 102, and control the rendering of one or more pairs of lightfield frames 151, 153 to reflect the viewpoint of the object or scene from the current pose.

[0017] The rendering component 104 further may use eye pose information from one or both of the eye-tracking modules 106, 108 to change the elemental images 122 to be displayed. To this end, the eye-tracking modules 106, 108 each may include one or more infrared (IR) light sources (referred to herein as “IR illuminators”) to illuminate the corresponding eye with IR light, and one or more imaging cameras (e.g., camera 109) to capture the IR light reflected off the segmented optics array 124 as a corresponding eye image 120.

[0018] As explained above, the eye image 120 includes a plurality of sub-images (e.g., sub-image 122), wherein each sub-image displays a different perspective of the eye. Thus, for example, one sub-image may correspond to a perspective of the eye from below the eye, another sub-image may correspond to a perspective of the eye from below and to the left of the eye, still another sub-image may correspond to a perspective of the eye from above and to the right of the eye, and the like. The camera 109 provides the different perspectives of the eye to the eye-tracking module 108 as a set of eye perspectives 117. In one embodiment, the eye perspectives 117 are provided as the single image 120. In other embodiments, the camera 109 may divide the different sub-images of the image 120 into one or more subsets, and provide the subsets of sub-images as the eye perspectives 117.
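
A minimal sketch of how the sub-images could be carved out of the captured image 120 follows. It assumes the sub-images tile the sensor in a regular grid; the grid shape and frame size used here are illustrative values, not details from the patent.

```python
# Hedged sketch: split one captured IR frame into per-segment sub-images.
import numpy as np

def split_into_sub_images(eye_image: np.ndarray, rows: int, cols: int) -> list[np.ndarray]:
    """Return one sub-image (perspective) per optical segment, in row-major order."""
    h, w = eye_image.shape[:2]
    tile_h, tile_w = h // rows, w // cols
    return [
        eye_image[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w]
        for r in range(rows)
        for c in range(cols)
    ]

# Example: a 5x4 segmented optics array yields 20 concurrent perspectives.
frame = np.zeros((400, 500), dtype=np.uint8)  # stand-in for an IR camera frame
perspectives = split_into_sub_images(frame, rows=4, cols=5)
assert len(perspectives) == 20
```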

[0019] Based on the eye perspectives 117, the eye-tracking module 108 determines a pose (one or both of a position and orientation) of the corresponding eye. In at least one embodiment, the eye-tracking module 108 selects one of the perspectives of the eye perspectives 117 likely to produce the most reliable or accurate pose, and identifies, according to conventional eye tracking techniques, the pose of the eye based on the sub-image corresponding to the selected perspective. For example, the eye-tracking module 108 can employ machine vision techniques to identify one or more edges of the eye, or portions of the eye such as the pupil, and identify a pose of the eye based on the eye edges or eye portions.

[0020] To select the eye perspective for pose identification, the eye-tracking module 108 can apply machine vision techniques to each sub-image (or a selected subset of the sub-images) and generate a confidence score for each sub-image based, for example, on the number of “eye-like” features identified in the corresponding sub-image. The eye-tracking module 108 then selects the sub-image having the highest confidence score to identify the pose of the corresponding eye.
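
The selection step could look like the following hedged sketch. The patent does not define the “eye-like” feature detector behind the confidence score, so the dark-pixel (pupil-like) fraction used here, and its threshold, are stand-ins for whatever machine-vision score an implementation would actually compute.

```python
# Hedged sketch: score each perspective and pick the highest-confidence one.
import numpy as np

def confidence_score(sub_image: np.ndarray, pupil_threshold: int = 60) -> float:
    """Crude proxy for 'eye-like' features: fraction of pupil-dark pixels."""
    return float((sub_image < pupil_threshold).mean())

def select_best_perspective(perspectives: list[np.ndarray]) -> int:
    """Return the index of the sub-image most likely to yield a reliable pose."""
    scores = [confidence_score(p) for p in perspectives]
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
perspectives = [rng.integers(0, 256, size=(100, 100), dtype=np.uint8) for _ in range(20)]
best = select_best_perspective(perspectives)  # index of the chosen perspective
```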

[0021] It will be appreciated that at different times, different perspectives of the eye may be the best perspective from which to identify the pose of the eye. Thus, for example, at a given point in time a perspective from above and to the left of the eye may be the best perspective from which to identify the eye’s pose, while at a later point in time the eye may be occluded from that perspective (e.g., an eyelash may be partially occluding the pupil of the eye). Accordingly, at the later point in time a different perspective, such as a perspective from above and to the right of the eye, may be the best perspective from which to identify the eye’s pose. Thus, at different times the perspective having the best confidence score will change, and therefore the eye-tracking module 108 will select different perspectives of the eye.

[0022] In other embodiments, the eye-tracking module 108 employs a neural network (also referred to herein as a “neural net”) that has been trained to identify an eye pose from the sub-images supplied by the eye perspectives 117. The neural net is initially trained based on training data having a plurality of images of an eye, wherein different images correspond to different poses of the eye. During training, the neural net is given feedback according to whether it correctly identifies an eye pose from an image or combination of images, and the neural net adjusts its pathways according to the feedback, thus training the neural net to more accurately identify eye poses from images. After training, the eye-tracking module 108 employs the neural network to analyze the eye perspectives 117 to identify the eye pose. In some embodiments, the neural net uses a plurality of eye perspectives to identify the eye pose. For example, the neural net may identify a composite of different perspectives of the eye from the eye perspectives 117, and identify the eye pose based on the composite of perspectives.
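
A minimal sketch of such a network is shown below, written in PyTorch. The patent does not specify an architecture, input layout, or pose parameterization; the small CNN, the 20-channel stack of perspectives, and the 3-value pose output (e.g., gaze pitch, gaze yaw, and pupil diameter) are all assumptions made for illustration.

```python
# Hedged sketch: a pose regressor fed all concurrent perspectives at once.
import torch
import torch.nn as nn

class MultiPerspectivePoseNet(nn.Module):
    def __init__(self, num_perspectives: int = 20, pose_dims: int = 3):
        super().__init__()
        # Each perspective is one input channel, so the network can fuse
        # un-occluded perspectives and down-weight occluded ones on its own.
        self.features = nn.Sequential(
            nn.Conv2d(num_perspectives, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, pose_dims)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_perspectives, H, W) -> (batch, pose_dims)
        return self.head(self.features(x).flatten(1))

net = MultiPerspectivePoseNet()
pose = net(torch.randn(1, 20, 64, 64))  # toy input: one frame of 20 sub-images
```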

[0023] In at least one embodiment, the eye-tracking module 106 identifies an eye pose of the corresponding eye in similar fashion to the eye-tracking module 108, based on an image including multiple sub-images corresponding to different eye perspectives of the eye. The eye-tracking modules 106 and 108 indicate the respective eye poses to the rendering component 104 via information 156. Based on the eye poses, the CPU 136 can modify the images displayed to the user via the display sub-system 102. For example, based on the eye pose, the CPU 136 can determine that the user intends to look at a particular portion of a virtual reality environment, and, in conjunction with the GPUs 138 and 140, generate images corresponding to that portion of the virtual reality environment. The rendering component 104 provides the generated images to the display sub-system 102 for display as explained above.

[0024] It will be appreciated that, as explained above, the segmented optics array 124 is employed both to display images to the user and to reflect light to identify multiple concurrent perspectives of the eye for eye tracking. The HMD system 100 is thus able to more accurately and reliably identify eye poses without employing substantial additional hardware to determine the multiple concurrent perspectives. Accordingly, the eye-tracking of the HMD system 100 is improved while maintaining relatively low power and other resource consumption and a relatively comfortable form factor.

[0025] FIG. 2 is a diagram illustrating the display system of FIG. 1 employing a lenslet array as the segmented optics array 124 in accordance with some embodiments. In the depicted example, the segmented optics array includes a plurality of lenslets (e.g., lenslets 126-1, 126-2, and 126-3), also sometimes referred to as microlenses. Each of the lenslets 126 is coated on its eye-facing side with an infra-red (IR) reflective coating 237. Thus, each of the lenslets 126 passes visible light (that is, light in the visible spectrum) from the display panel 118 and reflects IR light (that is, light in the infra-red spectrum).

[0026] In the illustrated example, each of the lenslets 126 of the lenslet array 124 serves as a separate “projector” of visible light onto an eye 232, with each “projector” overlapping with one or more adjacent projectors in forming a composite virtual image from the array of elemental images displayed at the display panel 118. To illustrate, the lenslet 126-1 projects a corresponding elemental image (represented by region 204) from one region of the virtual image, the lenslet 126-2 projects a corresponding elemental image (represented by region 206) from a different region of the virtual image, and the lenslet 126-3 projects a corresponding elemental image (represented by region 208) from still another region 214 of the virtual image.

[0027] Each of the lenslets 126 also reflects rays (e.g., ray 233) of IR light that have, in turn, been reflected off the eye 232. In some embodiments, the IR light reflected off the eye 232, and in turn reflected off the lenslets 126, is ambient IR light in the environment of the HMD system 100. In other embodiments, the HMD system 100 includes an IR illuminator (not shown) that continuously or periodically illuminates the eye 232 with IR light, and the light reflected off the eye 232 and in turn reflected off the lenslets 126 is at least in part light generated by the IR illuminator.

[0028] The camera 109 is an IR camera that captures images based on the IR light reflected from the lenslets 126. It will be appreciated that the light reflected from each lenslet provides an image of the eye 232 from a different perspective. Thus, for example, the light reflected from the lenslet 126-1 provides an image of the eye from a given perspective, and the light reflected from the lenslet 126-2 provides a concurrent image of the eye from a different perspective than the lenslet 126-1. Accordingly, an image captured by the camera 109 will be composed of a set of sub-images, with each sub-image corresponding to a different one of the lenslets 126, and each sub-image displaying the perspective of the eye provided by the IR light reflected from the corresponding lenslet. The HMD system 100 employs the sub-images for eye-tracking as described above with respect to FIG. 1.

[0029] FIG. 3 is a diagram illustrating the display system of FIG. 1 employing a Fresnel lens as the segmented optics array 124 in accordance with some embodiments. As will be appreciated by one skilled in the art, the Fresnel lens 124 is a lens formed into a plurality of sections (e.g., sections 126-1, 126-2). Each of the sections is curved on its eye-facing side (that is, the side of the section facing an eye 332), and the curves of the sections are aligned such that the refractive properties of the lens (for visible light passing through the lens) are substantially similar to, or the same as, those of a larger plano-convex lens with similar focusing power. The Fresnel lens 124 therefore can provide the focusing power of a larger convex lens for an image displayed at the display panel 118, but at substantially less weight and size.

[0030] The Fresnel lens 124 is coated on its eye-facing side with an infra-red (IR) reflective coating 337. Thus, each of the sections 126 passes visible light (that is, light in the visible spectrum) from the display panel 118 and reflects IR light (that is, light in the infra-red spectrum). In particular, each of the sections 126 also reflects rays (e.g., ray 333) of IR light that have, in turn, been reflected off the eye 332. As with the example of FIG. 2, the IR light may be ambient light, or may at least in part be light generated by an IR illuminator (not shown).

[0031] The camera 109 is an IR camera that captures images based on the IR light reflected from the sections 126. As with the lenslets of FIG. 2, the light reflected from each section of the Fresnel lens provides an image of the eye 332 from a different perspective. Thus, for example, the light reflected from the section 126-1 provides an image of the eye from a given perspective, and the light reflected from the section 126-2 provides a concurrent image of the eye from a different perspective than the section 126-1. Accordingly, an image captured by the camera 109 will be composed of a set of sub-images, with each sub-image corresponding to a different one of the sections 126, and each sub-image displaying the perspective of the eye provided by the IR light reflected from the corresponding section. The HMD system 100 employs the sub-images for eye-tracking as described above with respect to FIG. 1.

[0032] FIG. 4 is a block diagram of the eye-tracking module 108 of FIG. 1 employing a neural net to select an eye perspective in accordance with some embodiments. In the illustrated example, the eye-tracking module 108 includes an artificial neural network 440. As will be appreciated by one skilled in the art, the neural network 440 includes a plurality of nodes connected via an interconnect, such that each node is connected to at least one, and in many cases more than one, other node. Each of the connections between nodes can be assigned a strength value. Further, data can be transmitted between nodes based on one or more propagation rules associated with the neural network 440. The states of each node, the strength values of the connections between nodes, and other parameters of the neural network 440 are governed by one or more learning rules. For the embodiment of FIG. 4, the learning rules for the neural network 440 are specified to configure, or “train”, the neural network 440 to identify an eye pose based on images representing multiple concurrent perspectives of the eye.

[0033] To illustrate, the neural network 440 is initially configured with a baseline set of connections, connection strengths, propagation rules, and learning rules. Baseline eye-image data 450 is supplied to the neural network 440, wherein the baseline eye-image data is a set of known images (e.g., images captured in a test environment), with a corresponding eye pose defined for each image. Based on the baseline eye-image data 450, the neural network 440 generates a set of corresponding test eye poses. The test eye poses are compared to the eye poses defined for each image and, based on the differences, feedback is supplied to the neural network 440 to adjust one or more of the network parameters, such as the set of connections, connection strengths, propagation rules, learning rules, and the like. The baseline eye-image data 450 is again provided as inputs to the neural network 440 to generate another set of test eye poses, the test eye poses are compared to the defined eye poses, and feedback is again supplied to the neural network 440 to again adjust the network parameters. This process is repeated iteratively until the test eye poses match the defined eye poses within a threshold correlation.
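
The iterative train-compare-adjust cycle described above might look like the following hedged sketch. The random stand-in data, the tiny linear model, the mean-squared-error loss, the Adam optimizer, and the stopping threshold are all illustrative choices; the patent describes only the feedback loop, not these specifics. Any pose regressor (such as the CNN sketched earlier for FIG. 1) could take the model's place.

```python
# Hedged sketch: train until the test eye poses match the defined poses
# within a threshold, as in paragraph [0033].
import torch
import torch.nn as nn

baseline_images = torch.randn(128, 20, 64, 64)  # stand-in for baseline eye-image data 450
defined_poses = torch.randn(128, 3)              # known pose defined for each image

net = nn.Sequential(nn.Flatten(), nn.Linear(20 * 64 * 64, 3))  # placeholder pose regressor
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

threshold = 1e-2
for iteration in range(1_000):
    test_poses = net(baseline_images)            # generate test eye poses
    loss = loss_fn(test_poses, defined_poses)    # compare against the defined poses
    if loss.item() < threshold:                  # stop once they match within the threshold
        break
    optimizer.zero_grad()
    loss.backward()                              # feedback adjusts the network parameters
    optimizer.step()
```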

[0034] After the neural network 440 has been trained based on the baseline eye-image data 450, the eye-tracking module 108 employs the neural network 440 to identify an eye pose of a user’s eye. In particular, the eye-tracking module 108 provides as input to the neural network 440 the eye perspectives 117, representing multiple concurrent perspectives of the user’s eye as described above with respect to FIG. 1. Based on the eye perspectives 117, and based on the network parameters established during training, the neural network 440 generates an eye pose 419. By employing the neural network 440, the eye-tracking module 108 can efficiently identify an eye pose from a wide variety of eye perspectives and image types. For example, in some embodiments the neural network 440 can identify the eye pose based on a combination of perspectives of the eye, rather than selecting a single perspective. Further, in some embodiments the neural network 440 can identify the eye pose from images that appear distorted or are otherwise difficult to process with conventional machine vision techniques.

[0035] FIG. 5 is a flow diagram of a method 500 of identifying a pose of a user’s eye based on multiple perspectives of the eye captured via a segmented optics array in accordance with some embodiments. The method 500 is described with respect to an example embodiment at the HMD system 100. At block 502, visible light is passed through the segmented optics array 124, thereby displaying VR/AR content to the user. At block 504, IR light is reflected off each segment of the segmented optics array 124. In some embodiments, the IR light is generated by an IR illuminator of the HMD system 100.

[0036] At block 506 the camera 109 captures the image 120 based on the reflected IR light. The captured image 120 includes a plurality of sub-images, each sub-image depicting a different perspective of the user’s eye. At block 508 the camera 109 provides the eye perspectives 117 based on the image 120 to the neural network 440 (FIG. 4). At block 510 the neural network 440 identifies the eye pose 419 based on the eye perspectives 117. At block 512 the eye-tracking module 108 provides the eye pose 419 to the rendering component 104, which modifies the VR/AR content based on the eye pose. The rendering component 104 then displays the modified VR/AR content to the user via the segmented optics array.
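
Putting the blocks of method 500 together, the control flow could resemble the sketch below. Every function body is a hypothetical placeholder for hardware, driver, or model calls that the patent does not specify; only the ordering of the steps follows the flow diagram.

```python
# Hypothetical end-to-end sketch of method 500 (blocks 506-512).
import numpy as np

def capture_ir_image() -> np.ndarray:
    """Block 506: camera 109 captures an image from the reflected IR light."""
    return np.random.randint(0, 255, (400, 500), dtype=np.uint8)

def extract_perspectives(image: np.ndarray) -> np.ndarray:
    """Block 508: split the image into a 4x5 grid of sub-images, one per segment."""
    return image.reshape(4, 100, 5, 100).transpose(0, 2, 1, 3).reshape(20, 100, 100)

def identify_eye_pose(perspectives: np.ndarray) -> np.ndarray:
    """Block 510: stand-in for the neural network's pose output."""
    return np.zeros(3)

def render_content(eye_pose: np.ndarray) -> None:
    """Block 512: the rendering component would update the VR/AR content here."""
    pass

# Blocks 502-504 (passing visible light and reflecting IR) happen in the optics.
image = capture_ir_image()
perspectives = extract_perspectives(image)
eye_pose = identify_eye_pose(perspectives)
render_content(eye_pose)
```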

[0037] In some embodiments, certain aspects of the techniques described above may be implemented by one or more processors of a processing system executing software. The software comprises one or more sets of executable instructions stored or otherwise tangibly embodied on a non-transitory computer readable storage medium. The software can include the instructions and certain data that, when executed by the one or more processors, manipulate the one or more processors to perform one or more aspects of the techniques described above. The non-transitory computer readable storage medium can include, for example, a magnetic or optical disk storage device, solid state storage devices such as Flash memory, a cache, random access memory (RAM), or other non-volatile memory device or devices, and the like. The executable instructions stored on the non-transitory computer readable storage medium may be in source code, assembly language code, object code, or other instruction format that is interpreted or otherwise executable by one or more processors.

[0038] A computer readable storage medium may include any storage medium, or combination of storage media, accessible by a computer system during use to provide instructions and/or data to the computer system. Such storage media can include, but is not limited to, optical media (e.g., compact disc (CD), digital versatile disc (DVD), Blu-Ray disc), magnetic media (e.g., floppy disc, magnetic tape, or magnetic hard drive), volatile memory (e.g., random access memory (RAM) or cache), non-volatile memory (e.g., read-only memory (ROM) or Flash memory), or microelectromechanical systems (MEMS)-based storage media. The computer readable storage medium may be embedded in the computing system (e.g., system RAM or ROM), fixedly attached to the computing system (e.g., a magnetic hard drive), removably attached to the computing system (e.g., an optical disc or Universal Serial Bus (USB)-based Flash memory), or coupled to the computer system via a wired or wireless network (e.g., network accessible storage (NAS)).

[0039] Note that not all of the activities or elements described above in the general description are required, that a portion of a specific activity or device may not be required, and that one or more further activities may be performed, or elements included, in addition to those described. Still further, the order in which activities are listed is not necessarily the order in which they are performed. Also, the concepts have been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present disclosure as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present disclosure.

[0040] Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any feature(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature of any or all the claims. Moreover, the particular embodiments disclosed above are illustrative only, as the disclosed subject matter may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. No limitations are intended to the details of construction or design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above may be altered or modified and all such variations are considered within the scope of the disclosed subject matter. Accordingly, the protection sought herein is as set forth in the claims below.
