Microsoft Patent | Waveguide display assembly (https://patent.nweon.com/26135)
Patent: Waveguide display assembly

Patent PDF: Available to Nweon Patent members

Publication Number: 20220390666

Publication Date: 2022-12-08

Assignee: Microsoft Technology Licensing

Abstract

A waveguide display assembly comprises a waveguide, including an in-coupling grating configured to in-couple light of a first wavelength band emitted by a light source into the waveguide, and cause propagation of the light of the first wavelength band through the waveguide via total internal reflection. An out-coupling grating is configured to out-couple the light of the first wavelength band from the waveguide and toward a user eye. One or more diffractive gratings are disposed along an optical path between the in-coupling grating and the out-coupling grating, the one or more diffractive gratings configured to diffract light outside the first wavelength band out of the waveguide and away from the user eye.

Claims

1.A waveguide display assembly, comprising: a waveguide, including: an in-coupling grating configured to in-couple light of a first wavelength band emitted by a light source into the waveguide, and cause propagation of the light of the first wavelength band through the waveguide via total internal reflection; an out-coupling grating configured to out-couple the light of the first wavelength band from the waveguide and toward a user eye; and one or more diffractive gratings disposed along an optical path between the in-coupling grating and the out-coupling grating, the one or more diffractive gratings configured to diffract light outside the first wavelength band out of the waveguide and away from the user eye.

2.The waveguide display assembly of claim 1, wherein the light outside the first wavelength band is diffracted out of the waveguide and toward a waste-light area.

3.The waveguide display assembly of claim 2, wherein the waste-light area includes an optically absorbent material to absorb the light outside the first wavelength band.

4.The waveguide display assembly of claim 2, wherein the light outside the first wavelength band is diffracted out of the waveguide and into a waste-light collection waveguide, the waste-light collection waveguide configured to direct the light outside the first wavelength band toward the waste-light area.

5.The waveguide display assembly of claim 1, wherein the in-coupling grating is configured to selectively in-couple light of the first wavelength band while rejecting at least some light outside the first wavelength band.

6.The waveguide display assembly of claim 5, wherein the waveguide is a first waveguide, and the waveguide display assembly further comprises a second waveguide including a second in-coupling grating configured to selectively in-couple light of a second wavelength band, and a third waveguide including a third in-coupling grating configured to selectively in-couple light of a third wavelength band.

7.The waveguide display assembly of claim 6, wherein the first wavelength band corresponds to blue light, the second wavelength band corresponds to green light, and the third wavelength band corresponds to red light.

8.The waveguide display assembly of claim 6, wherein the first waveguide, the second waveguide, and the third waveguide are arranged in a stack such that the first, second, and third in-coupling gratings are aligned with each other and with the light source.

9.The waveguide display assembly of claim 6, further comprising a first color-specific filter disposed between the first waveguide and the second waveguide, and a second color-specific filter disposed between the second waveguide and the third waveguide.

10.The waveguide display assembly of claim 9, wherein the first color-specific filter is configured to filter light of the first wavelength band while passing light of the second wavelength band and the third wavelength band, and wherein the second color-specific filter is configured to filter light of the second wavelength band while passing light of the third wavelength band.

11.The waveguide display assembly of claim 9, wherein the first in-coupling grating is configured to in-couple light having a first polarization direction while rejecting at least some light having a second polarization direction, orthogonal to the first polarization direction.

12.The waveguide display assembly of claim 11, wherein the first color-specific filter is configured to rotate a current polarization direction of light passed by the first color-specific filter, and the second color-specific filter is configured to rotate a current polarization direction of light passed by the second color-specific filter.

13.The waveguide display assembly of claim 1, further comprising an image controller configured to control activity of the light source, such that light emitted by the light source toward the waveguide forms a virtual image for viewing by the user eye.

14.The waveguide display assembly of claim 13, wherein at least some light outside the first wavelength band is out-coupled by the out-coupling grating toward the user eye, and the image controller is further configured to modify the virtual image formed by the light emitted by the light source to account for the out-coupling of the light outside the first wavelength band.

15.A display device, comprising: a light source configured to emit light; an image controller configured to control activity of the light source, such that the light emitted by the light source forms a virtual image for viewing by a user eye; and a waveguide display assembly, comprising: a waveguide, including: an in-coupling grating configured to in-couple light of a first wavelength band emitted by the light source into the waveguide, and cause propagation of the light of the first wavelength band through the waveguide via total internal reflection; an out-coupling grating configured to out-couple the light of the first wavelength band from the waveguide and toward the user eye; and one or more diffractive gratings disposed along an optical path between the in-coupling grating and the out-coupling grating, the one or more diffractive gratings configured to diffract light outside the first wavelength band out of the waveguide and away from the user eye.

16.The display device of claim 15, wherein the light outside the first wavelength band is diffracted out of the waveguide and toward a waste-light area.

17.The display device of claim 15, wherein the waveguide is a first waveguide, and the waveguide display assembly further comprises a second waveguide including a second in-coupling grating configured to selectively in-couple light of a second wavelength band, and a third waveguide including a third in-coupling grating configured to selectively in-couple light of a third wavelength band.

18.The display device of claim 17, wherein the first waveguide, the second waveguide, and the third waveguide are arranged in a stack such that the first, second, and third in-coupling gratings are aligned with each other and with the light source, and wherein the waveguide display assembly further comprises a first color-specific filter disposed between the first waveguide and the second waveguide, and a second color-specific filter disposed between the second waveguide and the third waveguide.

19.The display device of claim 17, wherein the second in-coupling grating is configured to in-couple light having a first polarization direction while rejecting at least some light having a second polarization direction, orthogonal to the first polarization direction.

20.A waveguide display assembly, comprising: a first waveguide, including: a first in-coupling grating configured to in-couple light of a first wavelength band emitted by a light source into the first waveguide, and cause propagation of the light of the first wavelength band through the first waveguide via total internal reflection; a first out-coupling grating configured to out-couple the light of the first wavelength band from the first waveguide and toward a user eye; and one or more first diffractive gratings disposed along an optical path between the first in-coupling grating and the first out-coupling grating, the one or more first diffractive gratings configured to diffract light outside the first wavelength band out of the first waveguide and away from the user eye; and a second waveguide, including: a second in-coupling grating configured to in-couple light of a second wavelength band emitted by the light source into the second waveguide, and cause propagation of the light of the second wavelength band through the second waveguide via total internal reflection; a second out-coupling grating configured to out-couple the light of the second wavelength band from the second waveguide and toward the user eye; and one or more second diffractive gratings disposed along an optical path between the second in-coupling grating and the second out-coupling grating, the one or more second diffractive gratings configured to diffract light outside the second wavelength band out of the second waveguide and away from the user eye.

Description

BACKGROUND

Some display devices use waveguides to steer light from a light source toward a user eye. In some cases, different waveguides may be used for different wavelengths of light.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 schematically illustrates use of a head-mounted display device (HMD).

FIG. 2 schematically illustrates use of an example waveguide display assembly to direct light toward a user eye.

FIG. 3 schematically depicts the example waveguide display assembly of FIG. 2 in more detail.

FIG. 4 schematically depicts the example waveguide display assembly of FIG. 2 in more detail.

FIG. 5 schematically shows an example computing system.

DETAILED DESCRIPTION

Waveguides may be used in a variety of different display contexts for directing light toward a user eye. For instance, FIG. 1 schematically shows a user 100 using a head-mounted display device (HMD) 102 in a real-world environment 104. As will be described in more detail below, HMD 102 may include one or more waveguide display assemblies configured to direct light toward one or more user eyes—e.g., the light may form a virtual image viewable by a user eye.

HMD 102 includes a waveguide display assembly 106 integrated into a near-eye display device of the HMD. Via the near-eye display, user 100 has a field-of-view 108 of a virtual environment 110. The virtual environment is presented as a series of virtual image frames visible to the eyes of the user via the near-eye display, such that the user’s view of their surrounding real-world environment is partially, or entirely, augmented or replaced by virtual content (e.g., the near-eye display may include two waveguide display assemblies, each configured to direct virtual image light toward a different user eye). Virtual environment 110 includes a virtual tree 112, which is a computer-generated representation of a tree that does not exist in the real world, as well as a virtual landscape that differs from the user’s real-world surroundings.

The present disclosure primarily focuses on waveguide display assemblies in the context of a near-eye display device of an HMD—e.g., a device that provides virtual imagery that augments or replaces the user’s view of their own real-world environment. This may include fully virtual scenarios, in which the user’s view of their real-world environment is substantially replaced by computer-generated imagery. Waveguide display assemblies may also be used in augmented reality settings, in which the real world remains at least partially visible to the user—e.g., via a partially transparent display, live video feed, or other approach. It will be understood, however, that waveguide display assemblies as described herein may be used in any scenario in which a waveguide is used to direct light from a light source toward a user eye. As another example, waveguide display assemblies may be used to provide head-up displays (HUDs) in vehicle windscreens, eyeglasses, and/or other suitable contexts.

In any case, a waveguide display assembly as described herein may be integrated into, or otherwise used with, any suitable computing device having any suitable capabilities, form factor, and hardware configuration. As one example, a waveguide display assembly may be used with, or otherwise controlled by, computing system 500 described below with respect to FIG. 5.

In designing display devices that incorporate waveguides, it is sometimes desirable to use multiple waveguides corresponding to different wavelength bands of light. For example, a display device may direct image light toward a user eye using a combination of three or more different waveguides—e.g., one waveguide for red wavelengths of light, one for blue wavelengths of light, and one for green wavelengths of light. Each different waveguide may have a suitable optical element (e.g., an in-coupling grating) configured to accept (or “in-couple”) light of an intended wavelength band while rejecting at least some light outside the intended wavelength band.

However, though different wavelength bands of light may be intended for different waveguides, some amount of cross-coupling often occurs, in which different wavelength bands of light enter and propagate through the same waveguide. This may occur, for example, due to overlap between the wavelengths of light accepted by different in-coupling gratings of different waveguides. For example, while one waveguide may have an in-coupling grating that selectively accepts a first wavelength band of light (e.g., blue light), and a second waveguide may have an in-coupling grating that selectively accepts a second wavelength band of light (e.g., green light), there may be some amount of overlap between these two wavelength bands. This may cause some amount of green light to be in-coupled into the blue waveguide, and some amount of blue light to be in-coupled into the green waveguide, as examples. Similarly, blue and/or green light may be in-coupled into a red waveguide, and red light may be in-coupled into either or both of the blue and green waveguides. This can negatively affect any images displayed to the user eye—e.g., causing a loss of clarity, color uniformity problems, and/or other undesirable artifacts.
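
To illustrate the band overlap described above, the short sketch below models each in-coupling grating's spectral response as a smooth efficiency curve centered on its design wavelength. The Gaussian shape, center wavelengths, bandwidths, and the `incoupling_efficiency` helper are illustrative assumptions, not measured grating responses from this disclosure.

```python
# Illustrative model of spectral overlap between in-coupling gratings. The
# Gaussian efficiency shape, center wavelengths, and bandwidths are assumed
# for demonstration and are not taken from the disclosure.
import math

def incoupling_efficiency(wavelength_nm, center_nm, bandwidth_nm=40.0, peak=0.9):
    """Assumed smooth efficiency curve for a grating tuned to center_nm."""
    return peak * math.exp(-((wavelength_nm - center_nm) / bandwidth_nm) ** 2)

blue_center, green_center = 460.0, 530.0

# A blue-tuned grating still in-couples a small fraction of green light (and
# vice versa); that leaked light is the cross-coupling described above.
print(incoupling_efficiency(460.0, blue_center))    # ~0.90  intended blue
print(incoupling_efficiency(530.0, blue_center))    # ~0.04  leaked green
print(incoupling_efficiency(460.0, green_center))   # ~0.04  leaked blue
```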

Accordingly, the present disclosure is directed to a waveguide display assembly that includes additional diffractive grating elements disposed along a waveguide. The diffractive grating elements are configured to mitigate the negative effects of cross-coupling of different wavelength bands of light into the waveguide. Specifically, the diffractive grating elements may selectively diffract one or more wavelength bands of light out of the waveguide, while allowing one or more other wavelength bands of light to continue propagating within the waveguide. In this manner, light of an intended wavelength band (e.g., blue light) may be in-coupled into the waveguide and propagate toward an out-coupling element, which directs the light toward a user eye. Any unintended wavelength bands (e.g., red and/or green light) may still be in-coupled to some extent, although much of this unintended light may be diffracted out by the diffractive grating elements, instead of being out-coupled toward the user eye. In this manner, a waveguide display assembly may provide clearer and more accurate images to a user eye, without significantly increasing the weight, size, or complexity of the overall device.

FIG. 2 schematically shows waveguide display assembly 106 in more detail. Waveguide display assembly 106 is in the process of directing light toward a user eye 200 for viewing. In some cases, waveguide display assembly 106 may be one of two or more different waveguide display assemblies in the same display device. For example, HMD 102 may include a second waveguide display assembly that is similar to waveguide display assembly 106 and configured to direct light toward a second user eye. In other examples, a device need only include one waveguide display assembly—e.g., for directing light toward a single user eye, or two user eyes at once.

It will be understood that FIG. 2 is highly simplified for the sake of illustration. The example waveguide display assembly is shown schematically and is not drawn to scale. FIGS. 3, 4, and 5 are similarly schematic in nature.

In FIG. 2, user eye 200 is shown relative to an eyebox 202. An “eyebox” refers to a region of space defining a field-of-view in which a user eye can receive virtual images generated by a display device with acceptable quality. As will be described in more detail below, optical components of the waveguide display assembly may serve to perform pupil expansion, in which a relatively small exit pupil of light emitted by a light source is expanded to a larger size. This allows for a larger eyebox in which images formed by the light are viewable. A larger eyebox can enable a larger field-of-view for the user, and can also allow the display device to be used by a wider population of users (e.g., having different eye shapes, sizes, and spacings) without requiring extensive adjustment or calibration.

Waveguide display assembly 106 includes an image controller 204 communicatively coupled with a light source 206. Image controller 204 is configured to control activity of the light source. For example, based on instructions from image controller 204, light source 206 may emit light toward a waveguide that forms a virtual image for viewing by user eye 200. The light emitted by the light source may be updated at any suitable rate—e.g., a refresh rate of 60 frames-per-second (FPS), 90 FPS, or 120 FPS.

The image controller may take any suitable form. For example, the image controller may be implemented as a suitable computer processor, application-specific integrated circuit (ASIC), or other computer logic hardware. In some examples, image controller 204 may be implemented as logic subsystem 502 described below with respect to FIG. 5.

Light source 206 may include one or more light-emitting diodes (LEDs) or lasers, as non-limiting examples. In some cases, light source 206 may include suitable components for spatially modulating light emitted by the light source such that the light forms a virtual image. Light source 206 may be configured to output light at one fixed angle, or at any of a range of angles. For example, the light source may in some cases include a steerable micromirror, and/or other suitable elements for controlling the angle of the emitted light. The light source may include any suitable elements for focusing, or otherwise modifying the optical power of, light emitted by the light source—e.g., the light source may include one or more suitable lenses or dynamic elements (e.g., dynamic digital holograms). Additionally, or alternatively, such optical focusing elements may be disposed elsewhere in the waveguide display assembly along the optical path between the light source and user eye.

It will be understood that, in general, the light source may include any suitable optics and other components for emitting light, and such light may be emitted at any suitable time and for any suitable reason. The term “light source” as used herein refers to any structures and components suitable for emitting light toward a waveguide.

Waveguide display assembly 106 also includes a waveguide 208. Light source 206 is emitting light 210 toward waveguide 208, which serves to steer the light and facilitate its viewing by user eye 200. Specifically, waveguide 208 includes an in-coupling grating configured to in-couple light emitted by light source 206 into the waveguide. Once in-coupled, the light may propagate through the waveguide via total internal reflection, indicated in FIG. 2 by the repeated reflections of light within waveguide 208. It will be understood that light may propagate within the waveguide at any suitable reflection angle.

In FIG. 2, and in other FIGS. described herein, different reference numerals are used to refer to light at different points along the optical path between the light source and the user eye. Specifically, after being in-coupled to waveguide 208, light 210 is labeled as 210R. After being out-coupled from the waveguide and directed toward user eye 200, light 210 is labeled as 210T.

In some examples, waveguide 208 may have a geometry other than the flat, rectangular shape shown in FIG. 2. For example, the waveguide may have the shape of a wedge or a curved wedge. In general, waveguide 208 and other waveguides described herein may each have any suitable shapes, sizes, and geometries. Furthermore, the waveguides described herein may be constructed from any suitable materials—e.g., suitable dielectric glasses or plastics may be used.

In-coupling grating 212 may take any suitable form. In general, the in-coupling grating takes the form of a physical pattern that is printed or etched onto the surface of the waveguide in a manner that alters the path of light passing through the in-coupling grating (e.g., via diffraction or reflection). The optical properties of the in-coupling grating may depend on the spatial frequency of the gratings, an incident angle of the gratings, and/or the geometry of the individual gratings (e.g., thickness, size, curved gratings vs a sawtooth pattern), as non-limiting examples. For the purposes of this disclosure, an “in-coupling grating” refers to an optical element on the surface of a waveguide that accepts at least some light incident on the in-coupling grating into a waveguide, in which the light begins propagating via total internal reflection.
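
For orientation, the standard relations below connect a grating's period (its spatial frequency, mentioned above) to the angle at which in-coupled light propagates. They are general diffraction-optics identities, not parameter values from this disclosure. For light incident from air at angle θ_i on a grating of period Λ, diffracted into order m inside a waveguide of refractive index n:

```latex
% General grating and total-internal-reflection relations (standard optics,
% not parameters taken from this disclosure).
\begin{align*}
  n \sin\theta_d &= \sin\theta_i + m\,\frac{\lambda}{\Lambda}
    && \text{(grating equation for the diffracted angle } \theta_d\text{)}\\
  \theta_c &= \arcsin\!\left(\tfrac{1}{n}\right)
    && \text{(critical angle at the waveguide--air interface)}\\
  \sin\theta_d &> \tfrac{1}{n}
    && \text{(condition for the in-coupled light to be guided by TIR)}
\end{align*}
```

Tuning the period Λ therefore determines which wavelengths land at guided angles for a given diffraction order.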

After being in-coupled to waveguide 208, light 210R propagates via a series of internal reflections until reaching an out-coupling grating 214 configured to out-couple the light from the waveguide and toward user eye 200. As with the in-coupling grating, the out-coupling grating may take any suitable form, but may often be implemented as a physical pattern that is printed or etched onto the surface of the waveguide in a manner that alters the path of light passing through the out-coupling grating (e.g., via diffraction or reflection). Again, optical properties of the out-coupling grating may depend on the spatial frequency of the gratings, an incident angle of the gratings, and/or the geometry of the individual gratings, as non-limiting examples. For the purposes of this disclosure, an “out-coupling grating” refers to an optical element on the surface of a waveguide that accepts at least some light incident on the out-coupling grating and directs it out of the waveguide—e.g., toward a user eye.

In some cases, the in-coupling grating may be at least partially wavelength-selective. For example, light 210 may specifically include light of a first wavelength band, while excluding other wavelengths of visible light. The in-coupling grating may be configured to selectively in-couple light of the first wavelength band (e.g., blue light) while rejecting at least some light outside the first wavelength band (e.g., red and/or green light). As will be described in more detail below, the waveguide display assembly may in some cases include additional waveguides having additional in-coupling gratings, configured to in-couple light of other wavelength bands. Thus, for example, the waveguide display assembly may include three different waveguides, one each for blue light, red light, and green light.

However, an in-coupling grating may not in-couple all of the light of its intended wavelength band, and similarly may not reject all of the light outside its intended wavelength band. For example, in the case where in-coupling grating 212 is configured to in-couple blue wavelengths of light, some amount of blue light may pass through the in-coupling grating without being in-coupled into the waveguide. Instead, some amount of blue light may continue toward other waveguides of the waveguide display assembly, as will be described in more detail below. Similarly, some amount of red and/or green light may be in-coupled into waveguide 208, and ultimately out-coupled toward the user eye, which can negatively affect a virtual image formed by the emitted light.
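
As a rough numerical illustration of why rejection cannot rely on the guiding geometry alone, the sketch below applies the grating-equation and TIR relations given earlier with an assumed grating period and refractive index (values chosen for illustration, not taken from the disclosure). At normal incidence, first-order light is guided whenever Λ < λ < nΛ, so a period chosen for blue light also places green and red light inside the guided range; wavelength selectivity therefore comes from diffraction efficiency, which is never perfect, and some cross-coupling is expected.

```python
# Illustrative check of the geometric guiding condition for first-order,
# normal-incidence in-coupling. The grating period and refractive index are
# assumed values for demonstration only.
N_WG = 1.8          # assumed waveguide refractive index
PITCH_NM = 400.0    # assumed grating period, chosen with blue light in mind

def guided(wavelength_nm, n=N_WG, pitch=PITCH_NM):
    """True if first-order diffracted light propagates in the waveguide by TIR."""
    sin_theta_d = wavelength_nm / (n * pitch)   # grating equation, normal incidence
    if sin_theta_d >= 1.0:                      # no propagating diffracted order
        return False
    return sin_theta_d > 1.0 / n                # steeper than the critical angle

for name, wl_nm in [("blue", 460.0), ("green", 530.0), ("red", 620.0)]:
    print(f"{name} ({wl_nm:.0f} nm): guided = {guided(wl_nm)}")
# All three bands satisfy the geometric condition with these assumed values, so
# any green or red light the blue-tuned grating happens to diffract will also
# propagate -- the cross-coupling that the added diffractive gratings address.
```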

Accordingly, as discussed above, a waveguide display assembly may in some cases include additional diffractive gratings configured to mitigate the cross-coupling of light of different wavelength bands into the same waveguide. This is illustrated with respect to FIG. 3, again schematically showing waveguide display assembly 106. However, FIG. 3 provides a more detailed view of the waveguide display assembly, in which additional components are depicted that were not shown in FIG. 2.

Specifically, in FIG. 3, light source 206 is emitting two different wavelength bands of light, including light 210 of a first wavelength band, and light 300 of a second wavelength band. The first wavelength band may, for example, correspond to blue light, while the second wavelength band corresponds to green light, and/or the two different wavelength bands may correspond to any other suitable wavelengths of visible light (or non-visible electromagnetic radiation). While FIG. 3 depicts these two wavelength bands of light as different, spatially-separate emissions of light, it will be understood that this is done only for the sake of illustration. In practical applications, the two different wavelength bands of light may be emitted from the light source as a single beam (e.g., a beam of white light emitted by an LED). Alternatively, the light source may include multiple wavelength-specific light emitters—e.g., a laser that emits blue light, and a laser that emits green light.

Regardless, as shown, both light 210 and light 300 are incident on in-coupling grating 212 of waveguide 208. In-coupling grating 212 is configured to selectively in-couple light of the first wavelength band (e.g., light 210), while rejecting at least some light outside of the first wavelength band (e.g., light 300). In FIG. 3, some amount of light 210 is in-coupled into the waveguide and begins propagating via total internal reflection as light 210R, as discussed above. However, some amount of light 210 also passes through the in-coupling grating without being in-coupled into the waveguide. Similarly, some amount of light 300 ends up in-coupled into waveguide 208 and propagating via total internal reflection as light 300R, even as much of light 300 is rejected by the in-coupling grating and is not in-coupled into the waveguide. In the event that light 300 is out-coupled toward user eye 200, the clarity and/or accuracy of an image formed by the light may suffer, as discussed above.

Accordingly, a waveguide display assembly may include one or more diffractive gratings disposed along an optical path between the in-coupling grating and the out-coupling grating. In FIG. 3, these include diffractive gratings 302A-302F, disposed along an optical path 303 between the in-coupling and out-coupling gratings. The one or more diffractive gratings are configured to diffract light outside the first wavelength band out of the waveguide and away from the user eye. In FIG. 3, each of the diffractive gratings 302 directs some amount of light 300R out of the waveguide as waste light 300W, while substantially all of light 210R continues propagating through waveguide 208 toward the out-coupling grating.

In practical examples, it will be understood that some amount of light 210R may also be diffracted out of the waveguide by the diffractive gratings 302, and that not all of the light 300R may be diffracted out of the waveguide by the diffractive gratings 302. In general, however, use of one or more diffractive gratings as described herein may improve the ratio of intended wavelengths vs unintended wavelengths that are out-coupled toward the user eye, by diffracting much of the unintended light out of the waveguide before the out-coupling grating.
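
One simple way to reason about that benefit is a per-pass model: suppose each of the N diffractive gratings removes a fraction η of the unintended band and a much smaller fraction ε of the intended band on each encounter. The symbols η, ε, and N are introduced here purely for illustration; the disclosure does not specify such values.

```latex
% Simplified per-pass attenuation model; \eta, \epsilon, and N are
% illustrative parameters, not values from the disclosure.
\begin{align*}
  P_{\mathrm{unintended}}(N) &= P_{\mathrm{unintended}}(0)\,(1-\eta)^{N},
  \qquad
  P_{\mathrm{intended}}(N) = P_{\mathrm{intended}}(0)\,(1-\epsilon)^{N},\\[2pt]
  \frac{P_{\mathrm{intended}}(N)}{P_{\mathrm{unintended}}(N)}
  &= \frac{P_{\mathrm{intended}}(0)}{P_{\mathrm{unintended}}(0)}
     \left(\frac{1-\epsilon}{1-\eta}\right)^{\!N}.
\end{align*}
```

Under this model, the intended-to-unintended ratio reaching the out-coupling grating grows with the number of gratings whenever η > ε.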

It will be understood that the specific quantity, spacing, and configuration of diffractive gratings 302A-302F shown in FIG. 3 is non-limiting. In other examples, a waveguide display assembly may include a different number of diffractive gratings (including only one), and may include diffractive gratings on other surfaces of the waveguide than the upper surface shown in FIG. 3. In general, a waveguide display assembly may include any suitable number of diffractive gratings, each separated by any suitable distance, and each disposed on any suitable surfaces of the waveguide.

Each diffractive grating may be implemented in any suitable way. As with the in-coupling and out-coupling gratings, the diffractive gratings will typically be implemented as a physical pattern printed or etched onto the waveguide surface in a way that controls light incident on the diffractive grating in an intended manner. For example, a diffractive grating may diffract particular wavelengths of light out of a waveguide, while allowing other wavelengths of light to continue propagating through the waveguide. Such functionality may be achieved by tuning any suitable optical properties of each diffractive grating. As non-limiting examples, such optical properties can include the spatial frequency of the gratings, an incident angle of the gratings, the geometry of the individual gratings and/or other properties.

In the example of FIG. 3, the light diffracted out of the waveguide by diffractive gratings 302A-302F (otherwise referred to as “waste light”) goes in an opposite direction from the light 210 that is out-coupled toward the user eye. It will be understood, however, that the diffractive gratings may diffract light out of the waveguide in any suitable direction, although it is generally desirable that the waste light be directed such that it is not visible to the user eye. Thus, waste light 300W need not be diffracted out of the waveguide in the specific upwards direction shown in FIG. 3, but rather may be diffracted out of any suitable surface of the waveguide at any suitable angle.

In some examples, light outside the first wavelength band (e.g., the waste light) may be diffracted out of the waveguide and toward a waste-light area. The waste-light area may, for example, include an optically absorbent material to absorb the light outside the first wavelength band, thereby mitigating any potential visibility of the waste light to the user eye. The waste light may be diffracted directly from the waveguide into the waste-light area. Alternatively, the waste light may be steered toward the waste light area by one or more suitable optical elements of the waveguide display assembly. Any suitable optically absorbent material may be used—e.g., suitable plastics, rubbers, or coated/painted metals.

For example, in FIG. 3, the light outside of the first wavelength band (e.g., waste light 300W) is diffracted out of waveguide 208 and into a waste-light collection waveguide 304. From there, the waste-light collection waveguide directs light 300W to a waste-light area 306, including an optically absorbent material 308. In this manner, at least some of the light 300 that is in-coupled into waveguide 208 by in-coupling grating 212 may ultimately be absorbed by optically absorbent material 308, rather than being out-coupled toward user eye 200 along with light 210.

It will be understood that the specific arrangement of waveguide 208, waste-light collection waveguide 304, waste-light area 306, and optically absorbent material 308 shown in FIG. 3 is a non-limiting example. In practical scenarios, each of these components may have any suitable shapes, sizes, and spatial positions with respect to one another and the rest of waveguide display assembly 106. Furthermore, though no optical elements (e.g., in-coupling gratings or out-coupling gratings) are specifically shown on waste-light collection waveguide 304, it will be understood that such a waveguide may have any suitable combination of optical elements for accepting waste light and directing the waste light toward a waste-light area.

Turning now to FIG. 4, another schematic view of waveguide display assembly 106 is shown, in which some components depicted in FIG. 3 are omitted while other components are added. As shown in FIG. 4, the light source emits light 210 of a first wavelength band and light 300 of a second wavelength band toward waveguide 208, along with light 400 of a third wavelength band. For example, light 210 may correspond to blue light, while light 300 corresponds to green light and light 400 corresponds to red light. Similar to FIG. 3, although FIG. 4 shows light emissions 210, 300, and 400 as being distinct and spatially-separate, the light source may in some cases emit a single beam of light toward the waveguide—e.g., the light source may emit full spectrum white light that includes light of the first, second, and third wavelength bands 210, 300, and 400.

Each of light 210, 300, and 400 is incident on in-coupling grating 212 of waveguide 208, and some amount of each wavelength band is in-coupled into the waveguide, resulting in propagation of light 210R, 300R, and 400R through the waveguide. As described above, waveguide 208 includes one or more diffractive gratings, including diffractive grating 302A, configured to diffract light outside the first wavelength band from the waveguide. Thus, as shown in FIG. 4, some amount of light 300 is diffracted out of the waveguide as waste light 300W, and some amount of light 400 is diffracted out of the waveguide as waste light 400W. Much of light 210 continues propagating through waveguide 208 until ultimately being out-coupled toward the user eye as light 210T.

The present disclosure has thus far focused primarily on propagation of light through waveguide 208. As discussed above, however, a waveguide display assembly may in some cases include multiple different waveguides for directing different wavelength bands of light toward the user eye. For example, because in-coupling grating 212 selectively in-couples light of the first wavelength band, much of the light 300 of the second wavelength band and light 400 of the third wavelength band will pass through in-coupling grating 212. The waveguide display assembly may include additional waveguides configured to in-couple and steer these other wavelength bands of light.

For example, in FIG. 4, waveguide 208 is a first waveguide, and the waveguide display assembly further comprises a second waveguide 402 including a second in-coupling grating 404 configured to selectively in-couple light 300 of the second wavelength band. Specifically, in-coupling grating 404 in-couples at least some of the incident light 300 of the second wavelength band, while rejecting at least some of any incident light outside the second wavelength band. This results in propagation of light 300 within waveguide 402 as light 300R and, to a lesser extent, propagation of light 400 in waveguide 402 as light 400R. Some amount of light 210 of the first wavelength band may in some cases also be in-coupled by waveguide 402, although as will be discussed in more detail below, such light may in some cases be filtered out before reaching in-coupling grating 404.

Similar to waveguide 208, waveguide 402 includes an out-coupling grating 406 configured to out-couple light 300R of the second wavelength band toward user eye 200 as light 300T. To mitigate the amount of light 400R that is out-coupled by out-coupling grating 406, waveguide 402 also includes one or more diffractive gratings, including a diffractive grating 408A. These diffractive gratings may be configured to selectively diffract light outside the second wavelength band out of the waveguide (e.g., light 400R), while allowing light 300R of the second wavelength band to continue propagating through the waveguide via total internal reflection. Thus, in FIG. 4, some amount of light 400 of the third wavelength band is diffracted out of waveguide 402 as waste light 400W by diffractive grating 408A. Similar to waste light 300W described above with respect to FIG. 3, waste light 400W may be diffracted out of the waveguide and toward a waste-light area, and/or generally directed away from the user eye.

In FIG. 4, waveguide display assembly 106 also includes a third waveguide 410 including a third in-coupling grating 412 configured to selectively in-couple light 400 of the third wavelength band. Specifically, in-coupling grating 412 in-couples at least some of the incident light 400 of the third wavelength band, while rejecting at least some of any incident light outside the third wavelength band. This results in propagation of light 400 within waveguide 410 as light 400R. Waveguide 410 also includes an out-coupling grating 414 configured to out-couple light 400R from the waveguide toward the user eye as light 400T.

In some cases, some amount of light 210 and/or light 300 may also be in-coupled into waveguide 410. However, as will be described in more detail below, such light may in some cases be filtered before reaching in-coupling grating 412. In cases where substantial amounts of light 210 and/or light 300 is in-coupled into waveguide 410, the waveguide may include diffractive gratings similar to those of waveguides 208 and 402, although this is not shown in FIG. 4.

In the example of FIG. 4, the first waveguide 208, second waveguide 402, and third waveguide 410 are arranged in a stack, such that the first, second, and third in-coupling gratings 212, 404, and 412 are aligned with each other and with light source 206. It will be understood, however, that the various waveguides of the waveguide display assembly may have any suitable spatial arrangements with respect to one another, the light source, and any other components of the waveguide display assembly.

As discussed above, though in-coupling grating 212 selectively in-couples light 210 of the first wavelength band, some amount of light 210 may pass through the in-coupling grating without being in-coupled into the waveguide. Rather, such light may continue toward the second and third waveguides, where it may potentially be in-coupled and ultimately directed toward the user eye along with light of the second and/or third wavelength bands. This can introduce visual artifacts and/or negatively impact the clarity of the virtual images presented to the user eye, as discussed above.

Accordingly, in some examples, the waveguide display assembly may include a first color-specific filter disposed between the first waveguide and the second waveguide, and/or a second color-specific filter disposed between the second waveguide and the third waveguide. This is shown in FIG. 4, in which waveguide display assembly 106 includes a first color-specific filter 416 between waveguide 208 and waveguide 402. A second color-specific filter 418 is between waveguide 402 and waveguide 410.

Each color-specific filter may be configured to selectively filter one or more wavelengths of light, while passing other wavelengths of light. For example, first color-specific filter 416 may filter light 210 of the first wavelength band while passing light 300 of the second wavelength band and light 400 of the third wavelength band. This is schematically shown in FIG. 4, in which light 210 that passes through in-coupling grating 212 without being in-coupled into waveguide 208 does not reach in-coupling grating 404 of waveguide 402, as it is instead filtered by color-specific filter 416. Similarly, second color-specific filter 418 may be configured to filter light of the second wavelength band while passing light of the third wavelength band. In FIG. 4, light 300 that passes through in-coupling grating 404 without being in-coupled into waveguide 402 does not reach in-coupling grating 412 of waveguide 410, as it is instead filtered by color-specific filter 418.

The first and second color-specific filters may be implemented in any suitable way. In general, the first and second color-specific filters may take the form of any suitable optical elements that are usable to filter light of one or more wavelengths while passing light of other wavelengths. Furthermore, it will be understood that the specific shapes, sizes, and positions of the color-specific filters shown in FIG. 4 are non-limiting examples.

In some examples, light polarization may be used to mitigate cross-coupling of light of different wavelength bands into the same waveguide. For example, the first in-coupling grating 212 may be configured to accept light having a first polarization direction (e.g., vertical) while rejecting at least some light having a second polarization direction, orthogonal to the first polarization direction (e.g., horizontal). In this example, the light source may emit light of the first wavelength band (e.g., blue light) and the third wavelength band (e.g., red light) with a vertical polarization direction, while light of the second wavelength band is emitted with a horizontal polarization direction. In this manner, while light of the first and third wavelength bands may have the correct polarization direction for in-coupling into the first waveguide, light of the second wavelength band may be substantially rejected. This may reduce the amount of light of the second wavelength band that is cross-coupled into the first waveguide.

Polarization-dependent in-coupling gratings may in some cases be used in tandem with optical elements configured to change the polarization direction of incident light. In some examples, these polarization-changing optical elements may be implemented as the first and/or second color-specific filters described above, or other suitable optical elements may instead be used. To continue with the above example, the first color-specific filter may be configured to rotate a current polarization direction of light passed by the first color-specific filter. In this manner, the polarization direction of the light of the second and third wavelength bands will be rotated as the light passes through the first color-specific filter. As a result, the light of the second wavelength band may have the correct polarization direction for in-coupling into the second waveguide when the light is incident on the second in-coupling grating, while light of the third wavelength band may be substantially rejected. The polarization direction of the light of the third wavelength band may again be rotated when the light passes through the second color-specific filter, causing the light to have the correct polarization direction for in-coupling into the third waveguide 410.
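
A minimal Jones-vector sketch of this polarization bookkeeping is given below, under the simplifying assumptions that every in-coupling grating acts as an ideal polarizer accepting the same ("vertical") direction and that each color-specific filter rotates the polarization of passed light by exactly 90 degrees. The matrices, band assignments, and helper names are illustrative, not taken from the disclosure.

```python
# Minimal Jones-calculus sketch of the polarization scheme described above,
# assuming ideal vertical-accepting in-couplers and ideal 90-degree rotation
# in the color-specific filters. All values are illustrative assumptions.
import numpy as np

V = np.array([0.0, 1.0])                      # vertically polarized light
H = np.array([1.0, 0.0])                      # horizontally polarized light
P_V = np.array([[0.0, 0.0], [0.0, 1.0]])      # ideal in-coupler accepting vertical
R90 = np.array([[0.0, -1.0], [1.0, 0.0]])     # ideal 90-degree polarization rotator

def accepted(jones_vec):
    """Fraction of power an ideal vertical-accepting in-coupler takes in."""
    return float(np.linalg.norm(P_V @ jones_vec) ** 2)

blue, green, red = V, H, V                    # assumed emitted polarizations

# First waveguide: blue is in-coupled, green is rejected by polarization; red
# shares blue's polarization, so it is handled by the waste gratings instead.
print(accepted(blue), accepted(green), accepted(red))   # 1.0 0.0 1.0

# The first color-specific filter removes blue and rotates what it passes, so
# green now matches the second in-coupler while red no longer does.
green, red = R90 @ green, R90 @ red
print(accepted(green), accepted(red))                   # 1.0 0.0

# The second color-specific filter removes green and rotates red back into the
# accepted polarization for the third waveguide's in-coupler.
red = R90 @ red
print(accepted(red))                                    # 1.0
```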

The present disclosure has described various approaches to mitigating the amount of light outside an intended wavelength band that is in-coupled into a waveguide and out-coupled toward a user eye. However, even when the above-described techniques are implemented, some amount of cross-coupling of light may still occur. For example, referring to the first waveguide 208, at least some light 300 or 400 outside the first wavelength band may be out-coupled by the out-coupling grating toward the user eye.

Any resulting negative effects on the virtual images displayed to the user eye may in some cases be corrected for in software by manipulating the image source. For example, the image controller may be configured to modify the virtual image formed by the light emitted by the light source to account for the out-coupling of the light outside the first wavelength band. Notably, when cross-coupling of light occurs, the resulting virtual image may include localized color uniformity issues—e.g., when red light is cross-coupled into a waveguide intended for blue light, it may cause a particular portion of the image (such as a corner) to have an overly-red appearance. The image controller may correct for this by, for example, instructing the light source to reduce the amount of red light used to form that part of the image, to correct for the excess red light out-coupled by the waveguide. It will be understood, however, that this is one non-limiting example scenario, and visual artifacts in a displayed image may be corrected for in software in any suitable way.
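
As a hedged sketch of what such a software correction could look like, assume the display has been characterized ahead of time by a per-region, per-channel gain that folds in both the intended throughput and any cross-coupled contribution. The gain values, region layout, and the `corrected_drive` helper below are illustrative assumptions, not calibration data from the disclosure; the controller simply pre-compensates the drive levels so the delivered intensities approach the target.

```python
# Illustrative software pre-compensation for cross-coupling-induced color
# non-uniformity. The gains and region layout are assumed calibration values,
# not figures from the disclosure.
import numpy as np

# delivered[channel] ~= gain[region, channel] * driven[channel]; cross-coupled
# light inflates the gain of some channels in some regions (e.g., extra red
# reaching the eye in one corner of the image).
gain = np.array([
    [1.00, 1.00, 1.00],   # region 0: nominal throughput for all three channels
    [1.00, 1.00, 1.12],   # region 1: a corner with ~12% excess in one channel
])

def corrected_drive(target_rgb, region):
    """Per-channel drive levels that compensate the characterized gain."""
    drive = np.asarray(target_rgb, dtype=float) / gain[region]
    return np.clip(drive, 0.0, 1.0)   # drive levels cannot exceed full scale

target = [0.5, 0.5, 0.5]              # desired neutral gray
print(corrected_drive(target, 0))     # [0.5  0.5  0.5  ]
print(corrected_drive(target, 1))     # [0.5  0.5  0.446...] less drive where excess light arrives
```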

The methods and processes described herein may be tied to a computing system of one or more computing devices. In particular, such methods and processes may be implemented as an executable computer-application program, a network-accessible computing service, an application-programming interface (API), a library, or a combination of the above and/or other compute resources.

FIG. 5 schematically shows a simplified representation of a computing system 500 configured to provide any or all of the compute functionality described herein. Computing system 500 may take the form of one or more personal computers, network-accessible server computers, tablet computers, home-entertainment computers, gaming devices, mobile computing devices, mobile communication devices (e.g., smart phone), virtual/augmented/mixed reality computing devices, wearable computing devices, Internet of Things (IoT) devices, embedded computing devices, and/or other computing devices. In some cases, HMD 102 described above may be implemented as computing system 500.

Computing system 500 includes a logic subsystem 502 and a storage subsystem 504. Computing system 500 may optionally include a display subsystem 506, input subsystem 508, communication subsystem 510, and/or other subsystems not shown in FIG. 5.

Logic subsystem 502 includes one or more physical devices configured to execute instructions. For example, the logic subsystem may be configured to execute instructions that are part of one or more applications, services, or other logical constructs. Image controller 204 may in some cases be implemented as logic subsystem 502. The logic subsystem may include one or more hardware processors configured to execute software instructions. Additionally, or alternatively, the logic subsystem may include one or more hardware or firmware devices configured to execute hardware or firmware instructions. Processors of the logic subsystem may be single-core or multi-core, and the instructions executed thereon may be configured for sequential, parallel, and/or distributed processing. Individual components of the logic subsystem optionally may be distributed among two or more separate devices, which may be remotely located and/or configured for coordinated processing. Aspects of the logic subsystem may be virtualized and executed by remotely-accessible, networked computing devices configured in a cloud-computing configuration.

Storage subsystem 504 includes one or more physical devices configured to temporarily and/or permanently hold computer information such as data and instructions executable by the logic subsystem. When the storage subsystem includes two or more devices, the devices may be collocated and/or remotely located. Storage subsystem 504 may include volatile, nonvolatile, dynamic, static, read/write, read-only, random-access, sequential-access, location-addressable, file-addressable, and/or content-addressable devices. Storage subsystem 504 may include removable and/or built-in devices. When the logic subsystem executes instructions, the state of storage subsystem 504 may be transformed—e.g., to hold different data.

Aspects of logic subsystem 502 and storage subsystem 504 may be integrated together into one or more hardware-logic components. Such hardware-logic components may include program- and application-specific integrated circuits (PASIC/ASICs), program- and application-specific standard products (PSSP/ASSPs), system-on-a-chip (SOC), and complex programmable logic devices (CPLDs), for example.

The logic subsystem and the storage subsystem may cooperate to instantiate one or more logic machines. As used herein, the term “machine” is used to collectively refer to the combination of hardware, firmware, software, instructions, and/or any other components cooperating to provide computer functionality. In other words, “machines” are never abstract ideas and always have a tangible form. A machine may be instantiated by a single computing device, or a machine may include two or more sub-components instantiated by two or more different computing devices. In some implementations a machine includes a local component (e.g., software application executed by a computer processor) cooperating with a remote component (e.g., cloud computing service provided by a network of server computers). The software and/or other instructions that give a particular machine its functionality may optionally be saved as one or more unexecuted modules on one or more suitable storage devices.

When included, display subsystem 506 may be used to present a visual representation of data held by storage subsystem 504. In some embodiments, waveguide display assemblies 106L and 106R (left-eye and right-eye instances of waveguide display assembly 106) may be implemented as display subsystem 506. This visual representation may take the form of a graphical user interface (GUI). Display subsystem 506 may include one or more display devices utilizing virtually any type of technology. In some implementations, display subsystem may include one or more virtual-, augmented-, or mixed reality displays.

When included, input subsystem 508 may comprise or interface with one or more input devices. An input device may include a sensor device or a user input device. Examples of user input devices include a keyboard, mouse, touch screen, or game controller. In some embodiments, the input subsystem may comprise or interface with selected natural user input (NUI) componentry. Such componentry may be integrated or peripheral, and the transduction and/or processing of input actions may be handled on- or off-board. Example NUI componentry may include a microphone for speech and/or voice recognition; an infrared, color, stereoscopic, and/or depth camera for machine vision and/or gesture recognition; a head tracker, eye tracker, accelerometer, and/or gyroscope for motion detection and/or intent recognition.

When included, communication subsystem 510 may be configured to communicatively couple computing system 500 with one or more other computing devices. Communication subsystem 510 may include wired and/or wireless communication devices compatible with one or more different communication protocols. The communication subsystem may be configured for communication via personal-, local- and/or wide-area networks.

This disclosure is presented by way of example and with reference to the associated drawing figures. Components, process steps, and other elements that may be substantially the same in one or more of the figures are identified coordinately and are described with minimal repetition. It will be noted, however, that elements identified coordinately may also differ to some degree. It will be further noted that some figures may be schematic and not drawn to scale. The various drawing scales, aspect ratios, and numbers of components shown in the figures may be purposely distorted to make certain features or relationships easier to see.

In an example, a waveguide display assembly comprises: a waveguide including: an in-coupling grating configured to in-couple light of a first wavelength band emitted by a light source into the waveguide, and cause propagation of the light of the first wavelength band through the waveguide via total internal reflection; an out-coupling grating configured to out-couple the light of the first wavelength band from the waveguide and toward a user eye; and one or more diffractive gratings disposed along an optical path between the in-coupling grating and the out-coupling grating, the one or more diffractive gratings configured to diffract light outside the first wavelength band out of the waveguide and away from the user eye. In this example or any other example, the light outside the first wavelength band is diffracted out of the waveguide and toward a waste-light area. In this example or any other example, the waste-light area includes an optically absorbent material to absorb the light outside the first wavelength band. In this example or any other example, the light outside the first wavelength band is diffracted out of the waveguide and into a waste-light collection waveguide, the waste-light collection waveguide configured to direct the light outside the first wavelength band toward the waste-light area. In this example or any other example, the in-coupling grating is configured to selectively in-couple light of the first wavelength band while rejecting at least some light outside the first wavelength band. In this example or any other example, the waveguide is a first waveguide, and the waveguide display assembly further comprises a second waveguide including a second in-coupling grating configured to selectively in-couple light of a second wavelength band, and a third waveguide including a third in-coupling grating configured to selectively in-couple light of a third wavelength band. In this example or any other example, the first wavelength band corresponds to blue light, the second wavelength band corresponds to green light, and the third wavelength band corresponds to red light. In this example or any other example, the first waveguide, the second waveguide, and the third waveguide are arranged in a stack such that the first, second, and third in-coupling gratings are aligned with each other and with the light source. In this example or any other example, the waveguide display assembly further comprises a first color-specific filter disposed between the first waveguide and the second waveguide, and a second color-specific filter disposed between the second waveguide and the third waveguide. In this example or any other example, the first color-specific filter is configured to filter light of the first wavelength band while passing light of the second wavelength band and the third wavelength band, and wherein the second color-specific filter is configured to filter light of the second wavelength band while passing light of the third wavelength band. In this example or any other example, the first in-coupling grating is configured to in-couple light having a first polarization direction while rejecting at least some light having a second polarization direction, orthogonal to the first polarization direction. 
In this example or any other example, the first color-specific filter is configured to rotate a current polarization direction of light passed by the first color-specific filter, and the second color-specific filter is configured to rotate a current polarization direction of light passed by the second color-specific filter. In this example or any other example, the waveguide display assembly further comprises an image controller configured to control activity of the light source, such that light emitted by the light source toward the waveguide forms a virtual image for viewing by the user eye. In this example or any other example, at least some light outside the first wavelength band is out-coupled by the out-coupling grating toward the user eye, and the image controller is further configured to modify the virtual image formed by the light emitted by the light source to account for the out-coupling of the light outside the first wavelength band.

In an example, a display device comprises: a light source configured to emit light; an image controller configured to control activity of the light source, such that the light emitted by the light source forms a virtual image for viewing by a user eye; and a waveguide display assembly, comprising: a waveguide, including: an in-coupling grating configured to in-couple light of a first wavelength band emitted by the light source into the waveguide, and cause propagation of the light of the first wavelength band through the waveguide via total internal reflection; an out-coupling grating configured to out-couple the light of the first wavelength band from the waveguide and toward the user eye; and one or more diffractive gratings disposed along an optical path between the in-coupling grating and the out-coupling grating, the one or more diffractive gratings configured to diffract light outside the first wavelength band out of the waveguide and away from the user eye. In this example or any other example, the light outside the first wavelength band is diffracted out of the waveguide and toward a waste-light area. In this example or any other example, the waveguide is a first waveguide, and the waveguide display assembly further comprises a second waveguide including a second in-coupling grating configured to selectively in-couple light of a second wavelength band, and a third waveguide including a third in-coupling grating configured to selectively in-couple light of a third wavelength band. In this example or any other example, the first waveguide, the second waveguide, and the third waveguide are arranged in a stack such that the first, second, and third in-coupling gratings are aligned with each other and with the light source, and wherein the waveguide display assembly further comprises a first color-specific filter disposed between the first waveguide and the second waveguide, and a second color-specific filter disposed between the second waveguide and the third waveguide. In this example or any other example, the second in-coupling grating is configured to in-couple light having a first polarization direction while rejecting at least some light having a second polarization direction, orthogonal to the first polarization direction.

In an example, a waveguide display assembly comprises: a first waveguide, including: a first in-coupling grating configured to in-couple light of a first wavelength band emitted by a light source into the first waveguide, and cause propagation of the light of the first wavelength band through the first waveguide via total internal reflection; a first out-coupling grating configured to out-couple the light of the first wavelength band from the first waveguide and toward a user eye; and one or more first diffractive gratings disposed along an optical path between the first in-coupling grating and the first out-coupling grating, the one or more first diffractive gratings configured to diffract light outside the first wavelength band out of the first waveguide and away from the user eye; and a second waveguide, including: a second in-coupling grating configured to in-couple light of a second wavelength band emitted by the light source into the second waveguide, and cause propagation of the light of the second wavelength band through the second waveguide via total internal reflection; a second out-coupling grating configured to out-couple the light of the second wavelength band from the second waveguide and toward the user eye; and one or more second diffractive gratings disposed along an optical path between the second in-coupling grating and the second out-coupling grating, the one or more second diffractive gratings configured to diffract light outside the second wavelength band out of the second waveguide and away from the user eye.
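
As a rough illustration of the band separation described in these examples, the following sketch models a stack in which each waveguide selectively in-couples one band and out-of-band light is diverted away from the user eye; the band edges and hard cutoffs are assumptions made for illustration, not values taken from the claims.

```python
# Minimal sketch (illustrative band edges, hard cutoffs assumed).
BANDS = [("blue", 430, 480), ("green", 500, 550), ("red", 600, 650)]

def trace(wavelength_nm):
    for name, lo, hi in BANDS:  # waveguides in stack order
        if lo <= wavelength_nm <= hi:
            return f"{wavelength_nm} nm: in-coupled by the {name} waveguide -> user eye"
        # Otherwise the light either passes to the next waveguide through the
        # color-specific filter or is diffracted out and away from the eye.
    return f"{wavelength_nm} nm: outside every band -> waste-light area"

for wl in (455, 525, 625, 575):
    print(trace(wl))
```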

It will be understood that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered in a limiting sense, because numerous variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated and/or described may be performed in the sequence illustrated and/or described, in other sequences, in parallel, or omitted. Likewise, the order of the above-described processes may be changed.

The subject matter of the present disclosure includes all novel and non-obvious combinations and sub-combinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as any and all equivalents thereof.

Microsoft Patent | Near-eye display system having multiple pass in-coupling for waveguide display https://patent.nweon.com/26153 Thu, 08 Dec 2022 13:40:56 +0000 https://patent.nweon.com/?p=26153 ...

Patent: Near-eye display system having multiple pass in-coupling for waveguide display

Patent PDF: Available by joining Nweon (映维网) membership

Publication Number: 20220390744

Publication Date: 2022-12-08

Assignee: Microsoft Technology Licensing

Abstract

A waveguide display for use in a near-eye display system includes a waveguide stack having at least one waveguide substrate, an input coupler coupling light into the waveguide substrate and an optical arrangement that includes a birefringent reflective polarizer, a mirror and a polarization state converting element configured to convert light in a linear polarization state to a circular polarization state and to convert light in a circular polarization state to a linear polarization state. The mirror is arranged to receive light from the polarization state converting element and reflect the light back to the polarization state converting element. The optical arrangement causes a transmission path of light that traverses the waveguide stack a first time to be folded back through the waveguide stack such that at least a portion of light not coupled into the waveguide substrate is caused to traverse the waveguide stack a plurality of additional times.

Claims

1.A waveguide display, comprising: a waveguide stack that includes at least one waveguide substrate; an input coupler for coupling light into the waveguide substrate, the input coupler being configured to in-couple a first range of wavelengths into the waveguide substrate; and an optical arrangement that includes a birefringent reflective polarizer, a mirror and a polarization state converting element configured to convert light in a linear polarization state to a circular polarization state and to convert light in a circular polarization state to a linear polarization state, the mirror being arranged to receive light from the polarization state converting element and reflect the light back to the polarization state converting element, the optical arrangement causing a transmission path of light that traverses the waveguide stack a first time to be folded back through the waveguide stack such that at least a portion of light not coupled into the at least one waveguide substrate is caused to traverse the waveguide stack a plurality of additional times, wherein the polarization state converting element is an achromatic wide-angle quarter-wave plate at a 45° orientation.

2.The waveguide display of claim 1, wherein the birefringent reflective polarizer is located to direct the light into the waveguide stack so that the light that traverses the waveguide stack the first time is linearly polarized or circularly polarized.

3.(canceled)

4.The waveguide display of claim 1, wherein the polarization state converting element is a Faraday rotator located to receive light after traversing the waveguide stack the first time.

5.The waveguide display of claim 1, wherein the optical arrangement is configured so that the polarization state converting element receives linearly polarized light and causes linearly polarized light to be in-coupled to the at least one waveguide substrate.

6.The waveguide display of claim 5, wherein the polarization state converting element is located to receive light after the light traverses the waveguide stack the first time.

7.The waveguide display of claim 1, wherein the optical arrangement is configured so that the polarization state converting element receives linearly polarized light and causes circularly polarized light to be in-coupled to the at least one waveguide substrate.

8.The waveguide display of claim 7, wherein the polarization state converting element is located to receive the light before the light traverses the waveguide stack the first time.

9.The waveguide display of claim 1, wherein the at least one waveguide substrate includes at least first and second waveguide substrates and the input coupler includes first and second input couplers for coupling light into the first and second waveguide substrates, respectively, the first input coupler being configured to in-couple a first range of wavelengths into the first waveguide substrate and transmit other wavelengths and the second input coupler being configured to in-couple a second range of wavelengths into the second waveguide substrate and transmit other wavelengths, and further comprising at least first and second output couplers for coupling light out of the first and second waveguide substrates, respectively, the first output coupler being configured to out-couple the first range of wavelengths from the first waveguide substrate and the second output coupler being configured to out-couple the second range of wavelengths from the second waveguide substrate.

10.The waveguide display of claim 9, wherein the first input coupler further comprises a plurality of first input couplers for coupling light in the first waveguide substrate and the second input coupler further comprises a plurality of second input couplers for coupling light in the second waveguide substrate, and wherein the optical arrangement further comprises a plurality of optical arrangements, each of the optical arrangements being associated with one of the plurality of first input couplers or one of the plurality of second input couplers and being tailored for operation at wavelengths to be in-coupled by the input coupler with which it is respectively associated.

11.The waveguide display of claim 9, wherein the optical arrangement includes a dielectric filter disposed along the transmission path of the light for reflecting light of selected wavelengths back to one or more of the at least first and second waveguide plates that in-couple light of the selected wavelengths and to transmit therethrough wavelengths other than the selected wavelengths.

12.A see-through, near eye display system, comprising: an imager for providing an output image; an exit pupil expander (EPE); a display engine for coupling the output image in a first polarization state from the imager into the EPE, the EPE including: a waveguide stack that includes at least first and second waveguide plates, each of the waveguide plates including a substrate having an input coupling diffractive optical element (DOE) for in-coupling image light of a range of wavelengths to the substrate and transmitting other wavelengths of image light and at least one output coupling DOE for out-coupling image light of the range of wavelengths from the substrate, the range of wavelengths of the image light for each of the waveguide plates differing at least in part from each of the other waveguide plates; a birefringent reflective polarizer configured to reflect light in a second polarization state orthogonal to the first polarization state and transmit therethrough to the waveguide stack light in the first polarization state; a polarization state converting element configured to receive light in the first polarization state after traversing the waveguide stack and convert the light in the first polarization state to circularly polarized light; and a reflector for receiving the circularly polarized light and reflecting the circularly polarized light back to the polarization state converting element to thereby convert the reflected circularly polarized light to reflected light in the second linear polarization state, the reflected light in the second linear polarization state being directed from the polarization state converting element back to the waveguide stack such that the reflected light in the second linear polarization state traversing the waveguide stack is reflected back to the waveguide stack by the birefringent reflective polarizer.

13.The see-through, near eye display system of claim 12, wherein the polarization state converting element is an achromatic wide-angle quarter-wave plate at a 45° orientation.

14.The see-through, near eye display system of claim 12, wherein the first linearly polarized state is a TE or TM polarization state.

15.The see-through, near eye display system of claim 12, further comprising a dielectric filter disposed along an optical path traversed by the light to reflect light of selected wavelengths back to one or more of the at least first and second waveguide plates that in-couple light of the selected wavelengths and to transmit therethrough wavelengths other than the selected wavelengths.

16.A head mounted display comprising: a head mounted retention system for wearing on a head of a user; a visor assembly secured to the head mounted retention system, the visor assembly including: a chassis; a near-eye optical display system secured to the chassis that includes a waveguide display, the waveguide display including: a waveguide stack that includes at least one waveguide substrate; an input coupler for coupling light into the waveguide substrate, the input coupler being configured to in-couple a first range of wavelengths into the waveguide substrate; and an optical arrangement that includes a birefringent reflective polarizer, a mirror and a polarization state converting element configured to convert light in a linear polarization state to a circular polarization state and to convert light in a circular polarization state to a linear polarization state, the mirror being arranged to receive light from the polarization state converting element and reflect the light back to the polarization state converting element, the optical arrangement causing a transmission path of light that traverses the waveguide stack a first time to be folded back through the waveguide stack such that at least a portion of light not coupled into the at least one waveguide substrate is caused to traverse the waveguide stack a plurality of additional times, wherein the input coupler is configured to alter a polarization state of light being in-coupled, the polarization state converting element being a wave plate configured to provide a phase difference that enhances in-coupling of light into the waveguide plate.

17.The head mounted display of claim 16, wherein the birefringent reflective polarizer is located to direct the light into the waveguide stack so that the light that traverses the waveguide stack the first time is linearly polarized or circularly polarized.

18.(canceled)

19.The head mounted display of claim 16, wherein the waveguide stack is configured to function as the polarization state converting element in the optical arrangement.

20.The head mounted display of claim 16, wherein an incoming polarization state of light received by the waveguide display is a linear polarization state that is oriented at a selected angle relative to an angle of gratings of the input coupler to enhance in-coupling of light into the waveguide plate.

Description

BACKGROUND

Mixed-reality computing devices, such as wearable head mounted display (HMD) systems and mobile devices (e.g. smart phones, tablet computers, etc.), may be configured to display information to a user about virtual and/or real objects in a field of view of the user and/or a field of view of a camera of the device. For example, an HMD device may be configured to display, using a see-through display system, virtual environments with real-world objects mixed in, or real-world environments with virtual objects mixed in.

SUMMARY

In embodiments, a near eye display system includes a waveguide display that presents to the eyes of a viewer mixed-reality or virtual-reality images. The waveguide display includes two or more waveguide plates that are stacked over one another with an air gap between them. In certain embodiments each of the waveguide plates in the stack is used to transfer different wavelengths or colors of light to the viewer. The waveguide plates each include a transparent substrate and input and output couplers such as diffractive optical elements (DOEs) for coupling light into and out of the waveguide substrates, respectively. Typically, the image only passes through the input couplers of the waveguide stack a single time, thereby limiting the efficiency of the in-coupling. Sometimes a mirror is provided behind the input couplers to increase the amount of light coupled into the waveguide plates. In some illustrative embodiments the efficiency of the in-coupling of the light to the waveguide plates is increased by folding the transmission path of the light through the waveguide stack so that some of the light in the main beam is able to traverse the waveguide stack up to four times.

In certain embodiments the transmission path is folded using the polarization of light. For instance, if the image light is in a first linear polarization state, a birefringent reflective polarizer is used to direct the image light to the waveguide stack. The birefringent reflective polarizer transmits the light in the first linear polarization state and reflects light in a second linear polarization state orthogonal to the first linear polarization state. After traversing the waveguide stack, any light not in-coupled to the waveguide plates exits the waveguide stack and is converted to circularly polarized light by a polarization state converting element such as an achromatic wide-angle quarter-wave plate at a 45° orientation. The circularly polarized light is then reflected by a mirror back to the quarter-wave plate with its phase reversed. The quarter-wave plate converts the light to the second linear polarization state and directs it back through the waveguide stack. In this way the transmission path can be folded back through the waveguide stack so that at least a portion of the light that has not yet been in-coupled to the waveguide plates traverses the waveguide stack two or more additional times.
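
The polarization bookkeeping in this folding scheme can be checked with a few Jones matrices. The sketch below is a minimal model in a fixed transverse frame (x = TE, y = TM): a single pass through a quarter-wave plate at 45° turns TE light circular, and the quarter-wave plate / mirror / quarter-wave plate round trip acts as a half-wave plate that maps TE to the orthogonal TM state. Sign and frame conventions for reflection vary, so the mirror's effect is folded into the double pass; the numbers are illustrative and not taken from the disclosure.

```python
import numpy as np

def waveplate(retardance, theta):
    """Jones matrix of a linear retarder with the given retardance (radians)
    and fast axis at angle theta, in a fixed transverse (x=TE, y=TM) frame."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return R @ np.diag([1.0, np.exp(1j * retardance)]) @ R.T

QWP45 = waveplate(np.pi / 2, np.pi / 4)   # quarter-wave plate at 45 degrees
TE = np.array([1.0, 0.0])                 # incident linear (TE) state

circ = QWP45 @ TE                         # after the first pass: circular
# Ideal normal-incidence mirror: handedness reverses; in a single fixed frame
# the QWP -> mirror -> QWP round trip is equivalent to a half-wave plate.
round_trip = QWP45 @ QWP45
out = round_trip @ TE

print(np.round(circ, 3))   # [0.5+0.5j, 0.5-0.5j]: circular polarization
print(np.round(out, 3))    # [0, 1]: the orthogonal (TM) linear state
```

The half-wave behavior of the double pass is what lets the folded path send the not-yet-coupled light back through the stack in the orthogonal linear state.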

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure. These and various other features will be apparent from a reading of the following Detailed Description and a review of the associated drawings.

DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a block diagram of an illustrative near-eye optical display system.

FIG. 2 shows a view of an illustrative exit pupil expander.

FIG. 3 shows a view of an illustrative exit pupil expander (EPE) in which the exit pupil is expanded along two directions.

FIG. 4 shows an illustrative input to an exit pupil expander in which the FOV is described by angles in horizontal, vertical, or diagonal orientations.

FIG. 5 shows a pictorial front view of a sealed visor that may be used as a component of a head mounted display (HMD) device.

FIG. 6 shows a partially disassembled view of the sealed visor.

FIG. 7 shows an alternative example of an EPE in which a stack of two or more waveguide plates is employed.

FIG. 8 illustrates the operation of the EPE shown in FIG. 7.

FIG. 9 shows one example of an EPE in which the waveguide display includes a stack of two or more waveguide plates as described above in which some of the light in the main beam from a display engine is able to traverse the waveguide stack up to four times.

FIG. 10 shows another example of an EPE in which the waveguide display includes a stack of two or more waveguide plates as described above in which some of the light in the main beam from a display engine is able to traverse the waveguide stack up to four times.

FIG. 11 shows an illustrative example of a mixed-reality or virtual-reality HMD device.

FIG. 12 shows a functional block diagram of the mixed-reality or virtual-reality HMD device shown in FIG. 11.

Like reference numerals indicate like elements in the drawings. Elements are not drawn to scale unless otherwise indicated.

DETAILED DESCRIPTION

FIG. 1 shows a block diagram of an illustrative near-eye optical display system 100 which may incorporate a combination of optical couplers such as diffractive optical elements (DOEs) that provide in-coupling of incident light into a waveguide plate, exit pupil expansion in two directions, and out-coupling of light out of the waveguide plate. Near-eye optical display systems are often used, for example, in head mounted display (HMD) devices in industrial, commercial, and consumer applications. Other devices and systems may also use near-eye display systems, as described below. The near-eye optical display system 100 is an example that is used to provide context and illustrate various features and aspects of the present waveguide display.

System 100 may include one or more imagers (representatively indicated by reference numeral 105) that work with an optical system 110 to deliver images as a virtual display to a user’s eye 115. The imager 105 may include, for example, RGB (red, green, blue) light emitting diodes (LEDs), LCOS (liquid crystal on silicon) devices, OLED (organic light emitting diode) arrays, lasers, laser diodes, or any other suitable displays or micro-displays operating in transmission, reflection, or emission. The optical system 110 can typically include a display engine 120, pupil forming optics 125, and one or more waveguide plates 130. The imager 105 may include or incorporate an illumination unit and/or light engine (not shown) that may be configured to provide illumination in a range of wavelengths and intensities in some implementations.

In a near-eye optical display system the imager 105 does not actually shine the images on a surface such as a glass lens to create the visual display for the user. This is not feasible because the human eye cannot focus on something that is that close. Rather than create a visible image on a surface, the near-eye optical display system 100 uses the pupil forming optics 125 to form a pupil and the eye 115 acts as the last element in the optical chain and converts the light from the pupil into an image on the eye’s retina as a virtual display.

The waveguide plate 130 facilitates light transmission between the imager and the eye. One or more waveguide plates can be utilized in the near-eye optical display system because they are transparent and because they are generally small and lightweight (which is desirable in applications such as HMD devices where size and weight are generally sought to be minimized for reasons of performance and user comfort). For example, the waveguide plate 130 can enable the imager 105 to be located out of the way, for example, on the side of the user’s head or near the forehead, leaving only a relatively small, light, and transparent waveguide optical element in front of the eyes. The waveguide plate 130 operates using a principle of total internal reflection (TIR).
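
As a rough numerical illustration of the TIR condition, the sketch below computes the critical angle at a waveguide/air interface; the refractive indices are assumed, representative values and are not specified in the disclosure.

```python
import math

# Critical angle for total internal reflection: sin(theta_c) = n_outside / n_substrate.
# Rays hitting the surface at internal angles (from the normal) greater than theta_c
# are totally internally reflected and stay guided; shallower rays partially escape.
def critical_angle_deg(n_substrate, n_outside=1.0):
    return math.degrees(math.asin(n_outside / n_substrate))

print(round(critical_angle_deg(1.5), 1))   # ~41.8 deg for a typical glass substrate
print(round(critical_angle_deg(1.8), 1))   # ~33.7 deg for a higher-index substrate
```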

FIG. 2 shows a view of an illustrative exit pupil expander (EPE) 305 that may be used in the pupil forming optics 125 shown in FIG. 1. EPE 305 receives an input optical beam from the imager 105 and the display engine 120 as an entrance pupil to produce one or more output optical beams with expanded exit pupil in one or two directions relative to the input (in general, the input may include more than one optical beam which may be produced by separate sources). The display engine replaces magnifying and/or collimating optics that are typically used in conventional display systems. The expanded exit pupil typically facilitates a virtual display to be sufficiently sized to meet the various design requirements such as image resolution, field of view, and the like of a given optical system while enabling the imager and associated components to be relatively light and compact.

The EPE 305 is configured, in this illustrative example, to provide binocular operation for both the left and right eyes which may support stereoscopic viewing. Components that may be utilized for stereoscopic operation such as scanning mirrors, lenses, filters, beam splitters, MEMS devices, or the like are not shown in FIG. 3 for sake of clarity in exposition. The EPE 305 utilizes a waveguide display with a waveguide plate 130 that includes a transparent substrate 126, two out-coupling gratings 310L and 310R, and a central in-coupling grating 340 that are supported on or in the substrate 126. The substrate 126 may be made, for instance, from glass or plastic. The in-coupling and out-coupling gratings may be configured using multiple DOEs. Each DOE is an optical element comprising a periodic structure that can modulate various properties of light in a periodic pattern such as the direction of optical axis, optical path length, and the like. The structure can be periodic in one dimension, such as a one-dimensional (1D) grating, and/or periodic in two dimensions, such as a two-dimensional (2D) grating. While the waveguide plate 130 is depicted as having a planar configuration, other shapes may also be utilized including, for example, curved or partially spherical shapes, in which case the gratings disposed thereon are non-co-planar.

While the illustrative EPE 305 shown in FIG. 3 employs a single waveguide plate for binocular operation, in other examples a separate waveguide plate may be used for each eye. In this case each waveguide plate may have its own coupling gratings, imager and display engine.

As shown in FIG. 3, the EPE 305 may be configured to provide an expanded exit pupil in two directions (i.e., along each of a first and second coordinate axis). As shown, the exit pupil is expanded in both the vertical and horizontal directions. It may be understood that the terms “left,” “right,” “up,” “down,” “direction,” “horizontal,” and “vertical” are used primarily to establish relative orientations in the illustrative examples shown and described herein for ease of description. These terms may be intuitive for a usage scenario in which the user of the near-eye optical display device is upright and forward facing, but less intuitive for other usage scenarios. The listed terms are not to be construed to limit the scope of the configurations (and usage scenarios therein) of near-eye optical display features utilized in the present arrangement. The entrance pupil to the EPE 305 at the in-coupling grating 340 is generally described in terms of field of view (FOV), for example, using horizontal FOV, vertical FOV, or diagonal FOV as shown in FIG. 4.
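
Where a diagonal FOV needs to be related to the horizontal and vertical FOVs, one simple relation adds the half-angle tangents in quadrature, as in the sketch below; this assumes a flat, rectangular virtual image plane, which the disclosure does not state, and the sample angles are illustrative.

```python
import math

# Diagonal FOV from horizontal and vertical FOVs for a flat rectangular image
# plane (an assumed geometry): the half-angle tangents add in quadrature.
def diagonal_fov_deg(h_fov_deg, v_fov_deg):
    th = math.tan(math.radians(h_fov_deg) / 2)
    tv = math.tan(math.radians(v_fov_deg) / 2)
    return math.degrees(2 * math.atan(math.hypot(th, tv)))

print(round(diagonal_fov_deg(40, 30), 1))  # ~48.6 deg diagonal for a 40 x 30 deg field
```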

FIG. 5 shows an illustrative example of a visor 600 that incorporates an internal near-eye optical display system that is used in a head mounted display (HMD) device 605 application worn by a user 615. The visor 600, in this example, is sealed to protect the internal near-eye optical display system. The visor 600 typically interfaces with other components of the HMD device 605 such as head mounting/retention systems and other subsystems including sensors, power management, controllers, etc., as illustratively described in conjunction with FIGS. 11 and 12. Suitable interface elements (not shown) including snaps, bosses, screws and other fasteners, etc. may also be incorporated into the visor 600.

The visor 600 includes see-through front and rear shields, 604 and 606 respectively, that can be molded using transparent materials to facilitate unobstructed vision to the optical displays and the surrounding real world environment. Treatments may be applied to the front and rear shields such as tinting, mirroring, anti-reflective, anti-fog, and other coatings, and various colors and finishes may also be utilized. The front and rear shields are affixed to a chassis 705 shown in the disassembled view in FIG. 6.

The sealed visor 600 can physically protect sensitive internal components, including a near-eye optical display system 702 (shown in FIG. 6), when the HMD device is used in operation and during normal handling for cleaning and the like. The near-eye optical display system 702 includes left and right waveguide displays 710 and 715 that respectively provide virtual world images to the user’s left and right eyes for mixed- and/or virtual-reality applications. The visor 600 can also protect the near-eye optical display system 702 from environmental elements and damage should the HMD device be dropped or bumped, impacted, etc.

As shown in FIG. 6, the rear shield 606 is configured in an ergonomically suitable form to interface with the user’s nose, and nose pads and/or other comfort features can be included (e.g., molded-in and/or added-on as discrete components). The sealed visor 600 can also incorporate some level of optical diopter curvature (i.e., eye prescription) within the molded shields in some cases.

FIG. 7 shows an alternative example of an EPE 307 in which a waveguide display employs a stack of two or more waveguide plates instead of the single waveguide plate shown in the EPE 305 of FIG. 3. In this example each waveguide plate, each of which may be of the type described above in connection with FIG. 3, can be used to transfer different optical wavelengths or colors of an image. For instance, in the particular example of FIG. 7, waveguide plate 230 may be used to transmit wavelengths corresponding to the red portion of an image and waveguide plate 330 may be used to transmit wavelengths corresponding to the blue and green portions of the image. The use of a waveguide stack instead of a single waveguide plate addresses the problem that may arise because the optical path lengths within the waveguide plates differ for different wavelengths of light, which can adversely impact the uniform distribution of light. In accordance with an embodiment, the red wavelength range is from 600 nm to 650 nm, the green wavelength range is from 500 nm to 550 nm, and the blue wavelength range is from 430 nm to 480 nm. Other wavelength ranges are also possible.

More specifically, an input coupler 212 of the waveguide 230 can be configured to couple light (corresponding to the image) within the red wavelength range into the waveguide 230, and the output couplers 210 and 216 of the waveguide 230 can be configured to couple light (corresponding to the image) within the red wavelength range (which has travelled from the input coupler 212 to the output couplers 210 and 216 by way of TIR) out of the waveguide 230. Similarly, an input coupler 312 of the waveguide 330 can be configured to couple light (corresponding to the image) within the blue and green wavelength ranges into the waveguide 330, and the output couplers 310 and 316 of the waveguide 330 can be configured to couple light (corresponding to the image) within the blue and green wavelength ranges (which has travelled from the input coupler 312 to the output couplers 310 and 316 by way of TIR) out of the waveguide 330.
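
A minimal sketch of this wavelength routing is shown below. The band edges are the ones quoted above; treating the couplers as hard band-pass selectors is a simplification, since real gratings have gradual spectral responses.

```python
# Route a wavelength to the waveguide whose input coupler in-couples its band.
RED, GREEN, BLUE = (600, 650), (500, 550), (430, 480)

def in_band(wavelength_nm, band):
    lo, hi = band
    return lo <= wavelength_nm <= hi

def route(wavelength_nm):
    if in_band(wavelength_nm, RED):
        return "waveguide 230 (in-coupled by 212, out-coupled by 210/216)"
    if in_band(wavelength_nm, GREEN) or in_band(wavelength_nm, BLUE):
        return "waveguide 330 (in-coupled by 312, out-coupled by 310/316)"
    return "transmitted through the stack without being in-coupled"

for wl in (450, 520, 630, 580):
    print(wl, "nm ->", route(wl))
```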

FIG. 7 also shows left and right eyes 115L and 115R. The left eye 115L is viewing the image (as a virtual image) that is proximate to the output couplers 210 and 310 and the right eye 115R is viewing the image (as a virtual image) that is proximate to the output couplers 216 and 316. Explained another way, the eyes 115L and 115R are viewing the image from an exit pupil associated with the waveguides 230 and 330.

The distance between adjacent waveguides 230 and 330 can be, e.g., between approximately 50 micrometers and 300 micrometers, but is not limited thereto. While not specifically shown, spacers can be located between adjacent waveguides to maintain a desired spacing therebetween.

In other examples of the EPE, the number of waveguide plates in the stack of waveguide plates may vary, with each waveguide plate transmitting a different range of wavelengths or colors. For instance, if three waveguide plates are employed, one may be configured to transmit wavelengths corresponding to red light, another may be configured to transmit wavelengths corresponding to green light and the third waveguide plate may be configured to transmit wavelengths corresponding to blue light. Of course, other combinations of waveguide plates and wavelengths or colors of light may also be employed. Additionally, the wavelength ranges transmitted by each waveguide plate may be different and nonoverlapping from every other plate (as in the examples mentioned above), or, alternatively, the wavelength ranges may overlap for two or more of the waveguide plates. Moreover, the order in which the waveguide plates are stacked may differ in different examples.

As previously mentioned, the input and output couplers can each be implemented as a diffraction grating, or more generally, as a diffractive optical element (DOE). A diffraction grating is an optical component that may contain a periodic structure that causes incident light to split and change direction due to an optical phenomenon known as diffraction. The splitting (known as optical orders) and angle change depend on the characteristics of the diffraction grating. When the periodic structure is on the surface of an optical component, it is referred to as a surface grating. When the periodic structure is due to varying of the surface itself, it is referred to as a surface relief grating (SRG). For example, an SRG can include uniform straight grooves in a surface of an optical component that are separated by uniform straight groove spacing regions. Groove spacing regions can be referred to as “lines”, “grating lines” or “filling regions”. A DOE having uniform straight grooves is an example of a one-dimensional (1D) ruled grating. The DOE is not limited to 1D ruled gratings. The DOE could include a two-dimensional (2D) grating. For example, the DOE could include a 2D crossed grating. A crossed grating may also be referred to as a doubly periodic grating. Examples of doubly periodic DOEs include, but are not limited to, a 2D array of holes and a 2D array of pillars. The two periods of a doubly periodic DOE do not have to be perpendicular to each other. The nature of the diffraction by an SRG depends on the wavelength, polarization and angle of light incident on the SRG and various optical characteristics of the SRG, such as refractive index, line spacing, groove depth, groove profile, groove fill ratio and groove slant angle. An SRG can be fabricated by way of a suitable microfabrication process, which may involve etching of and/or deposition on a substrate to fabricate a desired periodic microstructure on the substrate to form an optical component, which may then be used as a production master such as a mold or mask for manufacturing further optical components. An SRG is an example of a Diffractive Optical Element (DOE). When a DOE is present on a surface (e.g., when the DOE is an SRG), the portion of that surface spanned by that DOE can be referred to as a DOE area.

A diffraction grating, instead of being a surface grating, can alternatively be a volume grating, such as a Bragg diffraction grating. It is also possible that one or more of the couplers are manufactured as SRGs and then covered with another material, e.g., using an atomic layer deposition process or an aluminum deposition process, thereby essentially burying the SRGs such that the major planar waveguide surface(s) including the SRG(s) is/are substantially smooth. Such a coupler is one example of a hybrid of a surface and volume diffraction grating. Any one of the input and output couplers can be, e.g., a surface diffraction grating, or a volume diffraction grating, or a hybrid of a surface and volume diffraction grating. In some embodiments any one of the input and output couplers can be a polarization grating. In accordance with some embodiments described herein, each diffraction grating can have a preferential linear polarization orientation specified by a direction of the grating lines of the diffraction grating, wherein the coupling efficiency for light having the preferential linear polarization orientation will be higher than for light having a non-preferential linear polarization orientation.

FIG. 8 illustrates the operation of the EPE 307 shown in FIG. 7. For clarity of illustration, only the rightmost portions of the waveguides 230 and 330 are shown, which direct light to the right eye 115R. The leftmost portion of the EPE operates in a similar fashion. In FIG. 8, the solid arrowed line 322 is representative of red light of the image that is output by the display engine 120 and the dashed arrowed line 325 is representative of blue and green light of the image that is output by the display engine 120.

When implemented as an input diffraction grating, the input coupler 212 is designed to diffract, e.g., red light within an input angular range (e.g., +/−15 degrees relative to the normal) into the waveguide plate 230, such that an angle of the diffractively in-coupled light exceeds the critical angle for the waveguide 230 and can thereby travel by way of TIR from the input coupler 212 to the output coupler 216. Further, the input coupler 212 is designed to transmit light outside the wavelength range that is diffracted so that light outside this wavelength range will pass through the waveguide plate 230. However, note that for the waveguide plates in the waveguide stack of FIG. 8 there may be some amount of cross-coupling between the waveguides. Likewise, output coupler 216 outputs, e.g., red light for viewing by the eye 115R.

Similarly, when implemented as an input diffraction grating, the input coupler 312 is designed to diffract, e.g., blue and green light within an input angular range (e.g., +/−15 degrees relative to the normal) into the waveguide plate 330, such that an angle of the diffractively in-coupled blue and green light exceeds the critical angle for the waveguide plate 330 and can thereby travel by way of TIR from the input coupler 312 to the output coupler 316. Further, the input coupler 312 is designed to transmit light outside the (e.g., blue and green) wavelength ranges that it diffracts, so that light outside the blue and green wavelength ranges will pass through the waveguide plate 330. Likewise, output coupler 316 outputs blue and green light for viewing by the eye 115R.

More generally, each of the waveguide plates can include an input coupler that is configured to couple-in light within an input angular range (e.g., +/−15 degrees relative to the normal) and within a specific wavelength range into the waveguide plate, such that an angle of the in-coupled light exceeds the critical angle for the waveguide plate and can thereby travel by way of TIR from the input coupler to the output coupler of the waveguide, and such that light outside the specific wavelength range is transmitted and passes through the waveguide plate.
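
The in-coupling condition described above can be checked with the standard transmission grating equation, n_sub·sin(θ_d) = n_in·sin(θ_i) + m·λ/d, together with the critical angle. The grating pitch, substrate index and wavelength in the sketch below are illustrative assumptions, not values from the disclosure.

```python
import math

# Rough check that the first diffraction order inside the substrate exceeds the
# critical angle, so the in-coupled light is trapped by TIR. The pitch (460 nm),
# substrate index (1.7) and 625 nm wavelength are illustrative assumptions.
def diffraction_angle_deg(wavelength_nm, pitch_nm, incidence_deg,
                          n_in=1.0, n_sub=1.7, order=1):
    s = (n_in * math.sin(math.radians(incidence_deg))
         + order * wavelength_nm / pitch_nm) / n_sub
    if abs(s) > 1.0:
        return None  # evanescent: this order does not propagate in the substrate
    return math.degrees(math.asin(s))

n_sub = 1.7
critical = math.degrees(math.asin(1.0 / n_sub))   # ~36.0 deg
for incidence in (-15, 0, 15):                    # the +/-15 degree input range
    angle = diffraction_angle_deg(625, 460, incidence, n_sub=n_sub)
    guided = angle is not None and abs(angle) > critical
    label = "evanescent" if angle is None else f"{angle:.1f} deg"
    print(f"incidence {incidence:+3d} deg -> angle in substrate {label}, TIR: {guided}")
```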

Because near eye display systems are generally designed to be compact with small, energy efficient imagers and display engines, it is generally important to ensure that the light coupled into the waveguide plates is coupled with high efficiency. To accomplish this, a reflector is sometimes placed behind the input couplers of the waveguide stack so that light from the display engine passes through the stack twice, thereby increasing the amount of light that is coupled into the waveguide plates.

To further increase the amount of light that is coupled into the waveguide plates, an optical arrangement may be provided, which in one embodiment, causes the main incoming light beam from the display engine to pass through the waveguide stack up to four times, doubling the number of times the light can traverse the waveguide stack in comparison to an arrangement that simply uses a mirror located behind the input couplers of the waveguide stack.
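
A simple way to see the benefit of the folded path: if each traversal of the input-coupler region captures a fixed fraction of the still-uncoupled light (ignoring polarization- and angle-dependent efficiency), the cumulative in-coupled fraction grows as 1 − (1 − η)^N with the number of passes N. The value of η below is an assumed illustration.

```python
# Cumulative in-coupling after N passes, assuming each pass couples a fixed
# fraction eta of the light that is still uncoupled (a simplification).
def cumulative_in_coupling(eta, passes):
    return 1 - (1 - eta) ** passes

for passes in (1, 2, 4):
    print(passes, "pass(es):", round(cumulative_in_coupling(0.20, passes), 3))
# 1 -> 0.2, 2 -> 0.36 (mirror behind the couplers), 4 -> 0.59 (folded path)
```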

FIG. 9 shows one example of an EPE 407 in which the waveguide display includes a stack 409 of two or more waveguide plates as described above in which the main incoming light beam from a display engine 403 is able to traverse the waveguide stack multiple times (e.g., up to four times in some embodiments). The light from the display engine 403 is assumed to be in a first polarization state, which for purposes of illustration is assumed to be a linearly polarized state (i.e., a TE polarization state) in this example. While in some embodiments the light from the display engine may be unpolarized, such embodiments will not be energy efficient since a portion of the light will be lost and not be coupled into the waveguide plates, potentially causing more optical loss than would be gained by having the light traverse the waveguide stack four times.

As shown, light from the display engine 403 is directed to a birefringent reflective polarizer 405 that transmits TE polarized light (long dashed lines) toward the waveguide stack 409 and reflects the orthogonal polarization state (i.e., TM polarized light, short dashed lines). Accordingly, the TE polarized light passes through the birefringent reflective polarizer 405 and is directed to the input couplers of the waveguide stack. For clarity, the input couplers (as well as the output couplers) are not shown in FIG. 9. The TE-polarized light passes through the waveguide stack 409 and the image is coupled into the waveguides for the first time. After exiting the waveguide stack 409 the light is received by an achromatic wide-angle quarter-wave plate 411 at a 45° orientation (referred to hereinafter as quarter-wave plate 411) that is located behind the waveguide stack 409 (i.e., on the side of waveguide stack 409 opposite to the display engine 403). As those of ordinary skill in the art will recognize, the quarter-wave plate 411 converts linearly polarized light (TE polarized light in this particular case) to left-handedly or right-handedly circularly polarized light. The circularly polarized light (dotted-dashed lines) exits the 45° quarter-wave plate 411 and is directed to a mirror 413 such as a metallic mirror employing, for instance, an Al/Ag coating.

Next, the circularly polarized light is reflected from the mirror 413 and in the process reverses its phase. In this way the mirror changes the handedness of the circularly polarized light (i.e., from left-handedly circularly polarized light to right-handedly circularly polarized light or vice versa). The light with the reversed handedness (dotted lines) then passes through the quarter-wave plate 411 a second time and is converted to TM-polarized light. The TM polarized light from the quarter-wave plate 411 is coupled back into the waveguide stack 409 from the backside direction and the image is coupled into the waveguide for a second time. The remaining TM polarized light passes through the waveguide stack 409 for the second time, is reflected from the birefringent reflective polarizer 405, and passes through the waveguide stack 409 for a third time.

After traversing the waveguide stack 409 for the third time, the TM-polarized light is converted to circularly polarized light by the quarter-wave plate 411, reflected by the mirror 413, which again reverses the handedness of the circularly polarized light, and is converted to TE-polarized light by the quarter-wave plate 411. The TE polarized light from the quarter-wave plate 411 is coupled back into the waveguide stack 409 from the backside direction and traverses the waveguide stack 409 for the fourth time. Any remaining TE polarized light that is not coupled into the waveguides of the waveguide stack 409 passes through the birefringent reflective polarizer 405 and travels back toward the display engine 403. In some cases a filter, absorber, or the like may be provided to prevent the remaining light from reaching the display engine 403 or from being scattered from other components of the system.
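
The pass-by-pass behavior described above for FIG. 9 can be summarized as a small state trace, shown below, using two idealized rules: the birefringent reflective polarizer transmits TE and reflects TM, and the quarter-wave plate / mirror / quarter-wave plate round trip swaps TE and TM. Coupling losses and imperfect polarization handling are ignored.

```python
# Idealized trace of the folded transmission path of FIG. 9.
def trace_fig9():
    state, log = "TE", []
    for n in range(1, 5):
        log.append(f"pass {n}: {state} light traverses the waveguide stack")
        if n in (1, 3):
            # behind the stack: QWP -> mirror -> QWP flips the linear state
            state = "TM" if state == "TE" else "TE"
            log.append(f"  QWP/mirror/QWP round trip -> {state}")
        elif n == 2:
            # front side: TM is reflected back by the birefringent polarizer
            log.append("  birefringent reflective polarizer reflects TM back")
        else:
            log.append("  residual TE is transmitted toward the display engine")
    return log

print("\n".join(trace_fig9()))
```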

While in the example shown in FIG. 9 the light from display engine 403 is linearly polarized in the TE-polarized state, in an alternative embodiment the light may be linearly polarized in the TM-polarized state. In this case the birefringent reflective polarizer 405 is configured to transmit TM polarized light and reflect TE polarized light. Likewise, in yet other embodiments instead of being in a linearly polarized state, the light from the display engine may be in a circularly polarized state with suitable adjustments to the configuration of the birefringent reflective polarizer 405 and the quarter-wave plate 411. In this case, for instance, it may be advantageous to locate the quarter-wave plate 411 before the birefringent reflective polarizer 405 to convert the light to linear polarization.

In some embodiments, instead of employing a single input coupler for each waveguide plate, multiple input couplers may be employed. In this case each individual input coupler can be optimized for the particular wavelengths that it is to in-couple into its associated waveguide plate, and the set of mirrors and quarter-wave plates associated with each input coupler can likewise be optimized for those wavelengths. That is, by way of example, an input coupler that is optimized to in-couple red light may be associated with red-optimized quarter-wave plates and mirrors, an input coupler that is optimized to in-couple green light may be associated with green-optimized quarter-wave plates and mirrors, and so on.

In the example shown in FIG. 9 the quarter-wave plate 411 is located behind the waveguide stack 409 so that it receives the light after the light traverses the waveguide stack. In this way the light in-coupled to the waveguide plates is linearly polarized. In an alternative embodiment, illustrated in FIG. 10 and described below, the quarter-wave plate may be located before the light is received by the waveguide stack 409 so that the light in-coupled to the waveguide plates is circularly polarized. In yet another embodiment, the quarter-wave plate may be located between the waveguide plates.

In one embodiment, the mirror 413 behind the waveguide stack 409 could be replaced or combined with one or more reflective dielectric filters. Alternatively, or additionally, one or more reflective dielectric filters may be situated at different positions between the waveguide plates in the waveguide stack. For example, a reflective dielectric filter may be provided behind the waveguide plate that supports blue wavelengths, which reflects blue wavelengths but transmits green and red wavelengths to the green and red supporting waveguide plates behind the blue plate. Similarly, in another embodiment a reflective dielectric filter may be provided behind the waveguide plates that respectively support blue and green wavelengths, which reflects blue and green wavelengths but transmits red wavelengths to the red supporting waveguide plate behind the green and blue supporting waveguide plates. In this latter embodiment a red reflecting dielectric filter or a broadband metallic mirror (e.g., an Al/Ag mirror) may be located behind the red supporting waveguide plate. In these cases it may be advantageous to locate the quarter-wave plate so that it receives the light before the light enters the waveguide stack.

In some embodiments a Faraday rotator could be used instead of the quarter-wave plate to achieve the polarization conversion. In this case, the linear polarization angle of the light is changed without conversion to circular polarization.

In another embodiment, the waveguide input couplers are configured to alter the polarization state of the light traversing through the waveguide stack and the quarter-wave plate may be replaced with a wave plate component providing not exactly a quarter-wave phase difference, but some other optimal phase difference so that the overall in-coupling of light into the waveguide is maximized, taking into account the polarization effects of the waveguides.

In yet another embodiment, the waveguide stack could be arranged to replace the function of the quarter-wave plate and provide the same required quarter-wave phase shift at the important wavelengths to reverse the polarization of the light traveling through it back and forth.

In yet another embodiment, the incoming polarization state is selected to be a linear polarization state that is oriented at an optimized angle relative to the gratings of the input couplers and the reflective polarizer angle is selected accordingly so that the selected polarization angle is fully transmitted and the linear polarization state at a 90-degree angle to that polarization angle is fully reflected. In this case, the angle of the wave plate can be optimized according to the polarization state of the in-coupled light.
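
One way to quantify the effect of the polarization angle relative to the grating orientation is to decompose the incoming linear state into the preferential and orthogonal components, each with its own coupling efficiency; the efficiencies in the sketch below are assumed, illustrative numbers rather than values from the disclosure.

```python
import math

# Decompose an incoming linear polarization at angle alpha to the grating's
# preferential orientation: cos^2(alpha) of the power is in the preferential
# state and sin^2(alpha) in the orthogonal state, each coupled with its own
# (assumed) efficiency.
def effective_coupling(alpha_deg, eta_pref=0.30, eta_orth=0.05):
    a = math.radians(alpha_deg)
    return eta_pref * math.cos(a) ** 2 + eta_orth * math.sin(a) ** 2

for alpha in (0, 30, 45, 90):
    print(f"alpha = {alpha:2d} deg -> effective coupling {effective_coupling(alpha):.3f}")
```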

FIG. 10 shows an embodiment of the EPE 407 in which the quarter-wave plate may be located before the light is received by the waveguide stack so that the light in-coupled to the waveguide plates is circularly polarized. Similar to the embodiment of FIG. 9, the light from the display engine 403 is assumed to be in a first polarization state, which for purposes of illustration is assumed to be a linearly polarized state (i.e., a TE polarization state) in this example. As shown, light from the display engine 403 is directed to the birefringent reflective polarizer 405, which transmits TE polarized light (long dashed lines) to the quarter-wave plate 411 and reflects the orthogonal polarization state (i.e., TM polarized light, short dashed lines). The quarter-wave plate 411 converts the TE-polarized light to left-handedly or right-handedly circularly polarized light. The light passes through the waveguide stack 409 and the image is coupled in for the first time.

The circularly polarized light is reflected from the mirror 413 behind the waveguide stack 409 and its handedness is reversed. The light is then coupled into the waveguide stack 409 from the backside direction and passes through the waveguide stack 409 a second time, with some of it being in-coupled to the waveguide plates. Any remaining light, which is circularly polarized with reversed handedness, passes through the quarter-wave plate 411, which converts it to TM polarized light and directs it to the birefringent reflective polarizer 405. The TM-polarized light is reflected from the birefringent reflective polarizer 405 and passes through the quarter-wave plate 411 again, which converts it to circularly polarized light with the reversed handedness and directs it into the waveguide stack 409 for a third time.

Any light that traverses the waveguide stack 409 for the third time is reflected back into the waveguide stack 409 by the mirror 413 with its handedness reversed so that the light has its original handedness. The light then passes into the waveguide stack 409 for a fourth time and any light that traverses the waveguide stack 409 without being in-coupled to one of the waveguide plates is converted back to TE polarized light by the quarter-wave plate 411. Any remaining light passes through the birefringent reflective polarizer 405 and is directed toward the display engine 403. As discussed in connection with the embodiment of FIG. 9, a filter, absorber or the like may be provided to prevent the light from reaching the display engine 403 or from being scattered from other components of the system.

It should be noted that while the embodiments shown in FIGS. 9 and 10 show a waveguide stack with two or more waveguide plates, in some alternative embodiments only a single waveguide plate may be employed.

Embodiments of the waveguide display described above may be utilized in mixed-reality or virtual-reality applications. FIG. 11 shows one particular illustrative example of a mixed-reality or virtual-reality HMD device 3100, and FIG. 12 shows a functional block diagram of the device 3100. HMD device 3100 comprises one or more waveguide displays 3102 that form a part of a see-through display subsystem 3104, so that images may be displayed. HMD device 3100 further comprises one or more outward-facing image sensors 3106 configured to acquire images of a background scene and/or physical environment being viewed by a user, and may include one or more microphones 3108 configured to detect sounds, such as voice commands from a user. Outward-facing image sensors 3106 may include one or more depth sensors and/or one or more two-dimensional image sensors. In alternative arrangements, as noted above, a mixed reality or virtual reality display system, instead of incorporating a see-through display subsystem, may display mixed reality or virtual reality images through a viewfinder mode for an outward-facing image sensor.

The HMD device 3100 may further include a gaze detection subsystem 3110 configured for detecting a direction of gaze of each eye of a user or a direction or location of focus, as described above. Gaze detection subsystem 3110 may be configured to determine gaze directions of each of a user’s eyes in any suitable manner. For example, in the illustrative example shown, a gaze detection subsystem 3110 includes one or more glint sources 3112, such as infrared light sources, that are configured to cause a glint of light to reflect from each eye of a user, and one or more image sensors 3114, such as inward-facing sensors, that are configured to capture an image of each eyeball of the user. Changes in the glints from the user’s eye and/or a location of a user’s pupil, as determined from image data gathered using the image sensor(s) 3114, may be used to determine a direction of gaze.
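
As a heavily simplified illustration of how glint and pupil data of the kind described above can yield a gaze direction, the sketch below maps the pupil-to-glint vector to gaze angles through hypothetical calibration gains; this is one common approach and is not stated to be the method used by gaze detection subsystem 3110.

```python
import numpy as np

# Map the pupil-center-to-glint vector (image pixels) to gaze angles using
# per-user calibration gains; the gains and offsets here are hypothetical.
def gaze_angles_deg(pupil_px, glint_px, gain=(0.12, 0.12), offset=(0.0, 0.0)):
    dx, dy = np.subtract(pupil_px, glint_px)
    return gain[0] * dx + offset[0], gain[1] * dy + offset[1]

# Example feature locations from an inward-facing image sensor such as 3114.
print(gaze_angles_deg(pupil_px=(320, 250), glint_px=(300, 246)))  # ~(2.4, 0.48) deg
```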

In addition, a location at which gaze lines projected from the user’s eyes intersect the external display may be used to determine an object at which the user is gazing (e.g. a displayed virtual object and/or real background object). Gaze detection subsystem 3110 may have any suitable number and arrangement of light sources and image sensors. In some implementations, the gaze detection subsystem 3110 may be omitted.

The HMD device 3100 may also include additional sensors. For example, HMD device 3100 may comprise a global positioning system (GPS) subsystem 3116 to allow a location of the HMD device 3100 to be determined. This may help to identify real-world objects, such as buildings, etc. that may be located in the user’s adjoining physical environment.

The HMD device 3100 may further include one or more motion sensors 3118 (e.g., inertial, multi-axis gyroscopic, or acceleration sensors) to detect movement and position/orientation/pose of a user’s head when the user is wearing the system as part of a mixed reality or virtual reality HMD device. Motion data may be used, potentially along with eye-tracking glint data and outward-facing image data, for gaze detection, as well as for image stabilization to help correct for blur in images from the outward-facing image sensor(s) 3106. The use of motion data may allow changes in gaze direction to be tracked even if image data from outward-facing image sensor(s) 3106 cannot be resolved.

In addition, motion sensors 3118, as well as microphone(s) 3108 and gaze detection subsystem 3110, also may be employed as user input devices, such that a user may interact with the HMD device 3100 via gestures of the eye, neck and/or head, as well as via verbal commands in some cases. It may be understood that sensors illustrated in FIGS. 11 and 12 and described in the accompanying text are included for the purpose of example and are not intended to be limiting in any manner, as any other suitable sensors and/or combination of sensors may be utilized to meet the needs of a particular implementation. For example, biometric sensors (e.g., for detecting heart and respiration rates, blood pressure, brain activity, body temperature, etc.) or environmental sensors (e.g., for detecting temperature, humidity, elevation, UV (ultraviolet) light levels, etc.) may be utilized in some implementations.

The HMD device 3100 can further include a controller 3120 such as one or more processors having a logic subsystem 3122 and a data storage subsystem 3124 in communication with the sensors, gaze detection subsystem 3110, display subsystem 3104, and/or other components through a communications subsystem 3126. The communications subsystem 3126 can also facilitate the display system being operated in conjunction with remotely located resources, such as processing, storage, power, data, and services. That is, in some implementations, an HMD device can be operated as part of a system that can distribute resources and capabilities among different components and subsystems.

The storage subsystem 3124 may include instructions stored thereon that are executable by logic subsystem 3122, for example, to receive and interpret inputs from the sensors, to identify location and movements of a user, to identify real objects using surface reconstruction and other techniques, and dim/fade the display based on distance to objects so as to enable the objects to be seen by the user, among other tasks.

The HMD device 3100 is configured with one or more audio transducers 3128 (e.g., speakers, earphones, etc.) so that audio can be utilized as part of a mixed reality or virtual reality experience. A power management subsystem 3130 may include one or more batteries 3132 and/or protection circuit modules (PCMs) and an associated charger interface 3134 and/or remote power interface for supplying power to components in the HMD device 3100.

It may be appreciated that the HMD device 3100 is described for the purpose of example, and thus is not meant to be limiting. It may be further understood that the display device may include additional and/or alternative sensors, cameras, microphones, input devices, output devices, etc. than those shown without departing from the scope of the present arrangement. Additionally, the physical configuration of an HMD device and its various sensors and subcomponents may take a variety of different forms without departing from the scope of the present arrangement.

Various exemplary embodiments of the present display system are now presented by way of illustration and not as an exhaustive list of all embodiments. An example includes a waveguide display, comprising: a waveguide stack that includes at least one waveguide substrate; an input coupler for coupling light into the waveguide substrate, the input coupler being configured to in-couple a first range of wavelengths into the waveguide substrate; and an optical arrangement that includes a birefringent reflective polarizer, a mirror and a polarization state converting element configured to convert light in a linear polarization state to a circular polarization state and to convert light in a circular polarization state to a linear polarization state, the mirror being arranged to receive light from the polarization state converting element and reflect the light back to the polarization state converting element, the optical arrangement causing a transmission path of light that traverses the waveguide stack a first time to be folded back through the waveguide stack such that at least a portion of light not coupled into the at least one waveguide substrate is caused to traverse the waveguide stack a plurality of additional times.

In another example the birefringent reflective polarizer is located to direct the light into the waveguide stack so that the light that traverses the waveguide stack the first time is linearly polarized or circularly polarized. In another example the polarization state converting element is an achromatic wide-angle quarter-wave plate at a 45° orientation. In another example the polarization state converting element is a Faraday rotator located to receive light after traversing the waveguide stack the first time. In another example the optical arrangement is configured so that the polarization state converting element receives linearly polarized light and causes linearly polarized light to be in-coupled to the at least one waveguide substrate. In another example the polarization state converting element is located to receive light after the light traverses the waveguide stack the first time. In another example the optical arrangement is configured so that the polarization state converting element receives linearly polarized light and causes circularly polarized light to be in-coupled to the at least one waveguide substrate. In another example the polarization state converting element is located to receive the light before the light traverses the waveguide stack the first time. In another example the at least one waveguide substrate includes at least first and second waveguide substrates and the input coupler includes first and second input couplers for coupling light into the first and second waveguide substrates, respectively, the first input coupler being configured to in-couple a first range of wavelengths into the first waveguide substrate and transmit other wavelengths and the second input coupler being configured to in-couple a second range of wavelengths into the second waveguide substrate and transmit other wavelengths, and further comprising at least first and second output couplers for coupling light out of the first and second waveguide substrates, respectively, the first output coupler being configured to out-couple the first range of wavelengths from the first waveguide substrate and the second output coupler being configured to out-couple the second range of wavelengths from the second waveguide substrate. In another example the first input coupler further comprises a plurality of first input couplers for coupling light in the first waveguide substrate and the second input coupler further comprises a plurality of second input couplers for coupling light in the second waveguide substrate, and wherein the optical arrangement further comprises a plurality of optical arrangements, each of the optical arrangements being associated with one of the plurality of first input couplers or one of the plurality of second input couplers and being tailored for operation at wavelengths to be in-coupled by the input coupler with which it is respectively associated. In another example the optical arrangement includes a dielectric filter disposed along the transmission path of the light for reflecting light of selected wavelengths back to one or more of the at least first and second waveguide plates that in-couple light of the selected wavelengths and to transmit therethrough wavelengths other than the selected wavelengths.

A further example includes a see-through, near eye display system, comprising: an imager for providing an output image; an exit pupil expander (EPE); a display engine for coupling the output image in a first polarization state from the imager into the EPE, the EPE including: a waveguide stack that includes at least first and second waveguide plates, each of the waveguide plates including a substrate having an input coupling diffractive optical element (DOE) for in-coupling image light of a range of wavelengths to the substrate and transmitting other wavelengths of image light and at least one output coupling DOE for out-coupling image light of the range of wavelengths from the substrate, the range of wavelengths of the image light for each of the waveguide plates differing at least in part from each of the other waveguide plates; a birefringent reflective polarizer configured to reflect light in a second polarization state orthogonal to the first polarization state and transmit therethrough to the waveguide stack light in the first polarization state; a polarization state converting element configured to receive light in the first polarization state after traversing the waveguide stack and convert the light in the first polarization state to circularly polarized light; and a reflector for receiving the circularly polarized light and reflecting the circularly polarized light back to the polarization state converting element to thereby convert the reflected circularly polarized light to reflected light in the second linear polarization state, the reflected light in the second linear polarization state being directed from the polarization state converting element back to the waveguide stack such that the reflected light in the second linear polarization state traversing the waveguide stack is reflected back to the waveguide stack by the birefringent reflective polarizer.
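
The polarization round trip described in this example (light in the first linear state traverses the stack, becomes circular at the reflector, and returns in the orthogonal linear state so the birefringent reflective polarizer sends it back into the stack) can be checked numerically with Jones calculus. The following is a minimal sketch under simplifying assumptions only (ideal quarter-wave plate, normal incidence, mirror modeled as the identity in a fixed lab frame); it is an illustration, not an implementation from the patent.

import numpy as np

# Jones vector for the first linear polarization state (illustrative).
FIRST_LINEAR = np.array([1, 0], dtype=complex)

def quarter_wave_plate(theta):
    """Jones matrix of an ideal quarter-wave plate with fast axis at angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    retarder = np.diag([np.exp(-1j * np.pi / 4), np.exp(1j * np.pi / 4)])
    return rot @ retarder @ rot.T

qwp45 = quarter_wave_plate(np.pi / 4)

after_qwp = qwp45 @ FIRST_LINEAR          # circularly polarized at the reflector
# The mirror at normal incidence is modeled as the identity in the fixed lab
# frame (a common simplification), so the return pass is a second trip through
# the same quarter-wave plate.
after_return = qwp45 @ after_qwp

print(np.round(np.abs(after_qwp), 3))     # [0.707 0.707] -> circular
print(np.round(np.abs(after_return), 3))  # [0. 1.] -> orthogonal linear state

Because the light that returns to the stack is in the orthogonal linear state, the reflective polarizer reflects it rather than transmitting it, which is what folds otherwise unused light back for additional passes through the waveguide stack.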

In another example the polarization state converting element is a 45° quarter-wave plate. In another example the first linearly polarized state is a TE or TM polarization state. In another example a dielectric filter is disposed along an optical path traversed by the light to reflect light of selected wavelengths back to one or more of the at least first and second waveguide plates that in-couple light of the selected wavelengths and to transmit therethrough wavelengths other than the selected wavelengths. A further example includes a head mounted display comprising: a head mounted retention system for wearing on a head of a user; a visor assembly secured to the head mounted retention system, the visor assembly including: a chassis; a near-eye optical display system secured to the chassis that includes a waveguide display, the waveguide display including: a waveguide stack that includes at least one waveguide substrate; an input coupler for coupling light into the waveguide substrate, the input coupler being configured to in-couple a first range of wavelengths into the waveguide substrate; and an optical arrangement that includes a birefringent reflective polarizer, a mirror and a polarization state converting element configured to convert light in a linear polarization state to a circular polarization state and to convert light in a circular polarization state to a linear polarization state, the mirror being arranged to receive light from the polarization state converting element and reflect the light back to the polarization state converting element, the optical arrangement causing a transmission path of light that traverses the waveguide stack a first time to be folded back through the waveguide stack such that at least a portion of light not coupled into the at least one waveguide substrate is caused to traverse the waveguide stack a plurality of additional times.

In another example the birefringent reflective polarizer is located to direct the light into the waveguide stack so that the light that traverses the waveguide stack the first time is linearly polarized or circularly polarized. In another example the polarization state converting element is an achromatic wide-angle quarter-wave plate at a 45° orientation. In another example the polarization state converting element is a Faraday rotator located to receive light after traversing the waveguide stack the first time. In another example the optical arrangement is configured so that the polarization state converting element receives linearly polarized light and causes linearly polarized light to be in-coupled to the at least one waveguide substrate. In another example the input coupler is configured to alter a polarization state of light being in-coupled, the polarization state converting element being a wave plate configured to provide a phase difference that enhances in-coupling of light into the waveguide plate. In another example the waveguide stack is configured to function as the polarization state converting element in the optical arrangement. In another example an incoming polarization state of light received by the waveguide display is a linear polarization state that is oriented at a selected angle relative to an angle of gratings of the input coupler to enhance in-coupling of light into the waveguide plate.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

The article "Microsoft Patent | Near-eye display system having multiple pass in-coupling for waveguide display" was first published on Nweon Patent.

Microsoft Patent | Depth sensing via device case https://patent.nweon.com/26117 Thu, 08 Dec 2022 12:42:13 +0000 https://patent.nweon.com/?p=26117 ...

The article "Microsoft Patent | Depth sensing via device case" was first published on Nweon Patent.

Patent: Depth sensing via device case

Patent PDF: Join Nweon (映维网) membership to obtain

Publication Number: 20220392093

Publication Date: 2022-12-08

Assignee: Microsoft Technology Licensing

Abstract

Examples are disclosed that relate to displaying a hologram via an HMD. One disclosed example provides a method comprising obtaining depth data from a direct-measurement depth sensor included in the case for the HMD, the depth data comprising a depth map of a real-world environment. The method further comprises determining a distance from the HMD to an object in the real-world environment using the depth map, obtaining holographic imagery for display based at least upon the distance, and outputting the holographic imagery for display on the HMD.

Claims

1.On a computing system comprising a head-mounted display device (HMD) and a case for the head-mounted display device, a method for displaying a hologram via the HMD, the method comprising: obtaining depth data from a direct-measurement depth sensor included in the case for the HMD, the depth data comprising a depth map of a real-world environment; determining a distance from the HMD to an object in the real-world environment using the depth map; obtaining holographic imagery for display based at least upon the distance; and outputting the holographic imagery for display on the HMD.

2.The method of claim 1, wherein obtaining the depth data from the direct-measurement depth sensor comprises obtaining the depth data from a time-of-flight sensor on the case.

3.The method of claim 1, wherein determining the distance comprises sending the depth map from the case to the HMD, and determining the distance on the HMD.

4.The method of claim 1, further comprising obtaining a depth image of a user of the HMD via the direct-measurement depth sensor during a holographic communication session, sending the depth image of the user to the HMD during the holographic communication session, and displaying on the HMD a representation of the user based upon the depth image of the user.

5.The method of claim 4, further comprising obtaining texture data representing an appearance of the user of the HMD, and displaying the texture data in the representation of the user.

6.The method of claim 5, further comprising sending the depth image of the user to another HMD participating in the holographic communication session.

7.The method of claim 1, wherein the depth data further comprises a depth image of a plurality of people including a user of the HMD, the method further comprising segmenting each person of the plurality of people.

8.The method of claim 1, wherein the depth data comprises first depth data from a first direct-measurement depth sensor, wherein the depth map comprises a first depth map, and the method further comprising: obtaining second depth data from a second direct-measurement depth sensor, the second depth data comprising a second depth map of the real-world environment; detecting an overlapping region in the first depth map and the second depth map; and using the overlapping region to combine the first depth map and the second depth map.

9.The method of claim 1, wherein the HMD includes a depth imaging system comprising a stereo camera arrangement configured to obtain indirect-measurement depth data, wherein the distance is a first determined distance, and wherein the method further comprises calibrating the depth imaging system by obtaining indirect-measurement depth data for the real-world environment via the depth imaging system of the HMD; determining a second determined distance from the HMD to the object in the real-world environment using the indirect-measurement depth data; comparing the first determined distance and the second determined distance to determine a correction for the indirect-measurement depth data; and applying the correction to subsequently measured indirect-measurement depth data.

10.The method of claim 1, further comprising obtaining a three-dimensional model of a target object by scanning the target object with the direct-measurement depth sensor included in the case from a plurality of angles.

11.A system comprising: a head-mounted display device (HMD) and a case for the head-mounted display device, the HMD comprising a see-through display system, a first communications system, an indirect-measurement depth sensing system, and a first computing system configured to control the display of images via the see-through display system and to control communication with the case via the first communications system; and the case comprising a direct-measurement depth sensor, a second communications system, and a second computing system configured to control the direct-measurement depth sensor to acquire depth data of a real-world environment and to control the second communications system to send depth data to the HMD.

12.The system of claim 11, wherein the direct-measurement depth sensor comprises a time-of-flight sensor.

13.The system of claim 11, wherein the case further comprises a red/green/blue (RGB) intensity image sensor.

14.The system of claim 13, wherein one or more of the first computing system and the second computing system comprises instructions executable to use RGB image data acquired via the RGB intensity image sensor as texture data for corresponding depth data.

15.(canceled)

16.The system of claim 11, wherein one or more of the first computing system and the second computing system comprises instructions executable to calibrate the indirect-measurement depth sensing system using depth data from the direct-measurement depth sensor.

17.The system of claim 11, wherein one or more of the first computing system and the second computing system comprises instructions executable to determine a distance from the HMD to an object in the real-world environment using the depth data of the real-world environment, and to compute a holographic image for display based upon the distance.

18.The system of claim 11, wherein the depth data further comprises a three-dimensional model of an object in the real-world environment.

19.A system comprising: a head-mounted display device (HMD) and a case for the head-mounted display device, the HMD comprising a see-through display system, a first communications system, and a first computing system, the case comprising a direct-measurement depth sensor, a second communications system, and a second computing system, and the system also comprising a microphone located on the case or the HMD, wherein the system is configured to conduct a holographic communication session by acquiring a depth image of a user of the HMD via the direct-measurement depth sensor on the case, acquiring first acoustic data capturing a voice of the user of the HMD via the microphone, sending the depth image of the user of the HMD and the first acoustic data to another computing device, receiving second acoustic data and image data from the other computing device, and presenting the second acoustic data and image data received from the other computing device.

20.The system of claim 19, wherein the case further comprises a red/green/blue (RGB) image sensor, and wherein the system is further configured to obtain texture data representing an appearance of the user of the HMD via the RGB image sensor, and to send the texture data to the other computing device.

21.The system of claim 11, wherein the case is operatively configured to obtain a depth image of a user of the HMD via the direct-measurement depth sensor during a holographic communication session.

Description

BACKGROUND

An augmented reality head-mounted display device (HMD) may use depth information to display holograms with respect to a real-world environment. For example, an augmented reality HMD may use an on-board depth imaging system to sense distances to objects in the real-world environment. The determined distances then may be used to compute holographic imagery for display via a see-through display device of the HMD.

SUMMARY

Examples are disclosed that relate to displaying a hologram via an HMD. One disclosed example provides a method comprising obtaining depth data from a direct-measurement depth sensor included in the case for the HMD, the depth data comprising a depth map of a real-world environment. The method further comprises determining a distance from the HMD to an object in the real-world environment using the depth map, obtaining holographic imagery for display based at least upon the distance, and outputting the holographic imagery for display on the HMD.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A shows a scenario in which an example head-mounted display device (HMD) presents holographic imagery based upon incorrect distance information.

FIG. 1B shows a scenario in which the holographic imagery of FIG. 1A is displayed based upon direct-measurement depth data from an example HMD case comprising a direct-measurement depth sensor.

FIG. 2 shows a schematic diagram of an example system comprising an HMD and a case for the HMD, wherein the case includes a direct-measurement depth sensor.

FIG. 3 shows an example HMD suitable for use as the HMD of FIGS. 1A-1B and 2.

FIG. 4 shows an example HMD case comprising a direct measurement depth sensor.

FIG. 5 shows a flow diagram depicting an example method for displaying a hologram via an HMD using direct-measurement depth data acquired via a case for the HMD.

FIG. 6 shows an example scenario depicting a holographic communication session.

FIG. 7 shows a flow diagram depicting an example method for conducting a holographic communication session using direct-measurement depth data acquired via a case for an HMD.

FIG. 8 shows another example scenario depicting a holographic communications session.

FIG. 9 shows yet another example scenario depicting a holographic communications session.

FIG. 10 shows a scenario in which a case for an HMD is used to scan a physical object to form a digital 3D model of the physical object.

FIG. 11 shows a flow diagram depicting an example method for calibrating a depth imaging system of an HMD using direct-measurement depth data acquired via a case for the HMD.

FIG. 12 shows a schematic diagram of an example computing system.

DETAILED DESCRIPTION

As introduced above, an augmented reality head-mounted display device (HMD) uses depth information to display holograms with respect to a real-world environment. FIG. 1A shows one example of a real-world environment in the form of a room 100 in which a user 102 is wearing an HMD 104. The HMD 104 is used to display a hologram 106.

The HMD 104 can use depth imaging to determine a distance between the device 104 and one or more objects in the environment (e.g., an edge of a table 108), and to generate a depth map of the real-world environment 100. The depth map is used to determine a placement of the hologram, including a position, an orientation, and/or a scale factor for the hologram 106.

Some HMDs may utilize a direct measurement depth sensor, such as a time of flight (ToF) camera or a structured light depth camera, to measure distances to objects in a use environment. However, ToF cameras may be larger than cameras that sense two-dimensional intensity images. Thus, other HMDs having smaller form factors may use a stereo camera arrangement comprising a pair of two-dimensional cameras positioned at spaced-apart locations on the HMD to determine depth by performing triangulation. However, stereoscopic depth measurement is sensitive to changes in the positions and orientations of the stereoscopic cameras. A deviation in the position and/or orientation of one camera with respect to the other camera in the stereo pair may result in erroneous depth measurements. For example, bending or dropping the HMD can change the position and/or orientation of one or both cameras in the stereo pair away from their calibrated positions and/or orientations, which can introduce error in the depth measurement.
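As a rough illustration of this sensitivity (not taken from the patent), rectified stereo depth follows Z = f·B/d, so a disparity error of only a pixel or two caused by a bent frame shifts the depth estimate noticeably. The focal length, baseline, and disparity values below are assumptions chosen only to show the scale of the effect.

# Illustrative-only numbers for a rectified stereo pair on a compact HMD.
FOCAL_PX = 500.0       # focal length in pixels (assumed)
BASELINE_M = 0.10      # spacing between the two cameras in meters (assumed)

def stereo_depth(disparity_px: float) -> float:
    """Depth from disparity for a calibrated, rectified stereo pair."""
    return FOCAL_PX * BASELINE_M / disparity_px

true_disparity = 25.0                       # pixels, for an object ~2 m away
print(stereo_depth(true_disparity))         # 2.0 m
print(stereo_depth(true_disparity - 2.0))   # ~2.17 m: roughly a 9% depth error
                                            # from a 2-pixel miscalibration shift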

A hologram may be improperly displayed based on the erroneous depth measurement. For example, the hologram 106 is computed to be displayed adjacent to a table 108, at location 106′. However, using stereoscopic depth alone, the HMD 104 may not have an accurate sense of where an edge of the table 108 is located, and may erroneously place the hologram at the position of hologram 106, making the hologram appear to be on or extending through the table 108.

Thus, examples are disclosed that relate to the use of a direct measurement depth camera integrated with a case for a wearable device to help address issues as discussed above. Referring to FIG. 1B, the use of direct measurement depth data acquired via a direct measurement depth sensor on a case 110 may allow the hologram to be placed in an intended location, where the HMD and the table are both visible to the direct measurement depth sensor.

FIG. 2 shows a block diagram of an example system 200 comprising an HMD 202 and a case 204 for the HMD. As introduced above, the HMD 202 may include an indirect-measurement depth sensor, such as a stereo camera system 206. As described in more detail below with reference to FIG. 3, the HMD 202 further comprises a see-through display system 208, a first communications system 210, and a first computing system 212. The first computing system 212 is configured to control the display of images via the see-through display system 208 and to control communication with the case 204 via the first communications system 210. In some examples, the case 204 and the HMD 202 communicate directly with one another. In other examples, the case 204 and the HMD 202 are communicatively coupled via a network 214.

The case 204 comprises a direct measurement depth sensor 216, a second communications system 218, and a second computing system 220. The second computing system 220 is configured to control the direct-measurement depth sensor 216 to acquire depth data 222 of a real-world environment and to control the second communications system 218 to send the depth data 222 to the HMD 202. In this manner, the HMD 202 can display a hologram based at least upon the depth data 222 output by the direct-measurement depth sensor 216, without incorporating a direct measurement depth sensor into the HMD itself. In some examples, the case further may comprise a red/green/blue (RGB) image sensor 224. As described in more detail below, the RGB image sensor may be used to acquire image data for use in texture mapping, as illustrated by texture data 226. Such texture mapping data can be used in holographic communications scenarios, and/or other possible scenarios. The HMD 202 and/or the case 204 each also may optionally include a microphone 228, 230. As described in more detail below, the microphone 228 and/or microphone 230 may be configured to capture a user’s voice for use in a communication session.
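Purely as a hypothetical sketch of this data flow, the case's computing system might package each sensor update roughly as follows before handing it to its communications system. None of the type, field, or function names below come from the disclosure; they are illustrative stand-ins.

from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class CaseFrame:
    """One update the case might send to the HMD (field names are assumptions)."""
    depth_map: np.ndarray                  # HxW distances in meters (depth data 222)
    texture: Optional[np.ndarray] = None   # HxWx3 RGB image for texture mapping (226)
    audio_chunk: Optional[bytes] = None    # microphone samples, if captured on the case
    timestamp_s: float = 0.0

def send_to_hmd(frame: CaseFrame) -> None:
    """Stand-in for the second communications system; a real device would
    serialize the frame and send it to the HMD directly or via a network."""
    ...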

The system 200 may communicate with a remote computing system 240. For example, the remote computing system 240 may comprise a server (e.g., a cloud-based computing service) configured to generate holographic imagery based upon depth data 222 and/or texture data 226 received from the case 204, among other possible functions. The remote computing system further may facilitate the conduction of holographic communication sessions between users, and/or various other functions.

FIG. 3 illustrates an example HMD device 300. HMD device 300 is an example implementation of HMD device 104 of FIGS. 1A and 1B, and of HMD device 202 of FIG. 2. The HMD device 300 comprises a frame 302, a first camera 304, a second camera 306, a display, and temple pieces 308A, 308B. In this example, the display comprises a first display 310 and a second display 311 supported by the frame 302, wherein each of the first display 310 and the second display 311 takes the form of a waveguide configured to deliver a projected image to a respective eye of a user. The first camera 304 and the second camera 306 are located respectively at left and right sides of the frame 302, wherein each of the first camera and the second camera is located on the frame adjacent to an outer edge of the frame. The first camera 304 and the second camera 306 can be operated as a stereoscopic camera pair to make indirect depth measurements.

Wearable display device 300 further comprises a first display module 312 positioned adjacent to the first camera 304 for displaying a first image of the stereo image and a second display module 328 positioned adjacent to the second camera 306 for displaying a second image of the stereo image. Each display module may comprise any suitable display technology, such as a scanned beam projector, a microLED (light emitting diode) panel, a microOLED (organic light emitting diode) panel, or an LCoS (liquid crystal on silicon) panel, as examples. Further, various optics, such as the above-mentioned waveguides, one or more lenses, prisms, and/or other optical elements may be used to deliver displayed images to a user’s eyes.

In addition to cameras, a wearable display device further may include other types of sensors. For example, wearable display device 300 comprises an inertial measurement unit system (IMU) comprising a first IMU 314 positioned adjacent to the first display module 312 and a second IMU 330 positioned adjacent to the second display module 328. IMU data can be used to adjust a displayed image based upon head motion.

FIG. 4 shows an example case 400 for an HMD. The case 400 is configured to house the HMD (e.g., HMD 300) when it is not in use. As introduced above, the case 400 is also configured to augment the functionality of an HMD. The case 400 includes a direct-measurement depth sensor in the form of a time-of-flight (ToF) camera comprising a ToF image sensor 402 and ToF illuminator 404. While the case of FIG. 4 includes a single depth sensor, in some examples a case may include two or more depth sensors. Further, in some examples, the case may include a direct-measurement depth sensor other than a ToF camera, such as a structured light depth camera. The ToF camera is configured to resolve distance between sensor pixels of the ToF image sensor 402 and a surface by measuring, for each sensor pixel, the round-trip travel time of a light signal (e.g., amplitude-modulated infrared (IR) light) emitted by the ToF illuminator 404. In this manner, the case 400 may be used to obtain a direct measurement of depth that is independent of the HMD.
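The round-trip-time principle can be summarized with a small worked example. For amplitude-modulated IR light, each pixel's measured phase shift corresponds to a round-trip time and hence a distance; the modulation frequency below is an assumed value, not one specified in the disclosure.

import math

C_M_PER_S = 299_792_458.0   # speed of light
F_MOD_HZ = 50e6             # amplitude-modulation frequency (assumed)

def tof_distance(phase_rad: float) -> float:
    """Distance implied by the phase shift of the returned modulated signal."""
    round_trip_s = phase_rad / (2.0 * math.pi * F_MOD_HZ)
    return C_M_PER_S * round_trip_s / 2.0

print(round(tof_distance(math.pi / 2), 3))   # ~0.75 m for a quarter-cycle shift
# Distances repeat every c / (2 * F_MOD_HZ), here about 3 m, which is why ToF
# cameras often combine several modulation frequencies to extend the
# unambiguous range.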

Further, case 400 includes an optional RGB intensity image sensor 406, which may be used, for example, to acquire image data for use in texture mapping. In some examples, the ToF camera may include aspects of the RGB image sensor 406. In such an example, the ToF camera can comprise a camera configured to image in both IR and visible light modes, wherein the camera includes executable instructions to operate in an intensity mode for RGB imaging as well as in a ToF mode for depth imaging.

FIG. 5 illustrates a flow diagram depicting an example method 500 for displaying a hologram via an HMD. Method 500 can be performed using any suitable display device, including but not limited to those described herein. In other examples, various steps of method 500 may be omitted or performed in a different order than described, and/or method 500 may include additional and/or alternative steps relative to those illustrated in FIG. 5.

The method 500 includes, at 502, obtaining depth data via a direct-measurement depth sensor included in a case for the HMD, the depth data comprising a depth map of a real-world environment. In some examples, as indicated at 504, the depth data may be obtained via a time-of-flight sensor, while in other examples another suitable depth imaging system can be used, such as a structured light depth sensor. As shown at 506, the depth data includes a depth image capturing the HMD and one or more objects in the real-world environment. For example, referring to FIG. 1B, the case 110 is placed on the table 108 such that the depth sensor 112 has a field of view that encompasses the HMD 104 and a portion of the table 108. As illustrated by example in FIGS. 1B and 4, the case 110 may be shaped such that it can be set on the table 108 or any other suitable surface and provide a suitable field of view of the real-world environment without the use of a stand.

At 510, the method 500 includes using the depth data to determine a distance from the HMD to another object in the real-world environment. In the example of FIG. 1B, depth data from the case 110 may be used to determine a distance from the HMD 104 to a feature of the table 108 (e.g., an edge of the table). In some examples, the determination may be performed on the HMD. As such, at 512, method 500 may comprise sending depth data from the case to the HMD.

Continuing, the method 500 comprises obtaining holographic imagery for display by the HMD based at least upon the determined distance, as indicated at 514. The method 500 further comprises, at 516, outputting the holographic imagery for display by the HMD. Holographic imagery may be generated on the HMD, or obtained from a device remote from the HMD (e.g., the case, a remote server, a remote peer, or other suitable device).

For example, and with reference again to FIG. 1B, a depth map generated by the direct-measurement depth sensor of the case 110 includes depth values for locations on the surfaces of objects within the room, including the HMD 104. As such, a distance from a selected location on the HMD to an object in the real-world environment can be directly computed from the depth map, thus avoiding any uncertainty with indirect depth measurements acquired using a stereo camera arrangement on the HMD.
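A minimal sketch of that direct computation, assuming a pinhole model for the case's depth camera: deproject the pixel that falls on the HMD and the pixel on the table edge into 3D points in the sensor frame and take their separation. The intrinsics, image size, and pixel coordinates below are all illustrative assumptions.

import numpy as np

FX = FY = 200.0            # focal lengths in pixels (assumed)
CX, CY = 160.0, 120.0      # principal point for a 320x240 depth map (assumed)

def deproject(u: int, v: int, depth_m: float) -> np.ndarray:
    """Pixel (u, v) with depth in meters -> 3D point in the depth-sensor frame."""
    x = (u - CX) * depth_m / FX
    y = (v - CY) * depth_m / FY
    return np.array([x, y, depth_m])

depth_map = np.full((240, 320), 2.5)        # fake depth map, meters
hmd_px, table_px = (100, 80), (220, 160)    # pixels labeled as HMD / table edge

p_hmd = deproject(*hmd_px, depth_map[hmd_px[1], hmd_px[0]])
p_table = deproject(*table_px, depth_map[table_px[1], table_px[0]])
print(np.linalg.norm(p_hmd - p_table))      # distance used to place the hologram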

As mentioned above, depth images acquired by an HMD case can be used in holographic communications. FIG. 6 shows a scenario in which a first user 600 and a second user 602 are participating in a holographic communication session. The first user 600 is located in a first real-world environment 604 and is wearing a first HMD 606. The second user 602 is located in a second real-world environment 608 and is wearing a second HMD 610.

As described in more detail below with reference to FIG. 7, a first case 612 for the first HMD 606 is positioned to capture depth data comprising a depth image of the first user 600 and, optionally, texture data comprising a visible image of the first user 600, e.g., using an RGB image sensor optionally included on the case. The texture data maps to the depth data, such that the texture data can be applied to the depth data to generate a hologram of the first user 614 for display to the second user 602 via the second HMD 610. Similarly, a second case 616 for the second HMD 610 is configured to capture a depth image and a visible image of the second user 602 to generate a hologram of the second user 618 for display by the first HMD 606.
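As a hedged sketch of what "the texture data maps to the depth data" can amount to, assume the RGB and depth images are already registered pixel-for-pixel (a real pipeline would reproject between the two cameras); each depth point is then simply colored by the RGB sample at the same pixel. The function and parameter names here are illustrative, not from the disclosure.

import numpy as np

def textured_point_cloud(depth_map: np.ndarray, rgb: np.ndarray,
                         fx: float, fy: float, cx: float, cy: float):
    """Build a colored point cloud from registered depth and RGB images."""
    h, w = depth_map.shape
    points, colors = [], []
    for v in range(h):
        for u in range(w):
            z = depth_map[v, u]
            if z <= 0:                 # skip invalid depth samples
                continue
            points.append([(u - cx) * z / fx, (v - cy) * z / fy, z])
            colors.append(rgb[v, u])   # per-point RGB texture
    return np.array(points), np.array(colors)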

In other examples, the cases may be configured to capture depth data and not texture data. In such examples, the depth data can be used to control an avatar that is displayed to the other user. Depth and texture data acquired by the first case can also be sent to the first HMD to display visual feedback 620 to the user of the first case during a call. Likewise, depth and texture data acquired by the second case can also be sent to the second HMD to display visual feedback 622 to the user of the second case during the call.

FIG. 7 illustrates a flow diagram depicting an example method 700 for conducting a holographic communication session. Method 700 can be performed using any suitable display device, including but not limited to those described herein. In other examples, various steps of method 700 may be omitted or performed in a different order than described, and/or method 700 may include additional and/or alternative steps relative to those illustrated in FIG. 7.

At 702, the method 700 includes acquiring a depth image of a user of the HMD via a direct-measurement depth sensor on a case for the HMD and acquiring acoustic data capturing a voice of the user of the HMD via a microphone located on the case or the HMD (e.g., the microphone 228 or the microphone 230 of FIG. 2, respectively). To provide visual feedback to a caller during a call, at 703, the method 700 may include sending depth data from the case to the HMD for display by the HMD.

In some examples, the method 700 may include, at 704, obtaining texture data representing an appearance of the user of the HMD via an RGB image sensor included in the case. In some such examples, the method 700 further may include sending the texture data 705 from the case to the HMD to provide visual feedback during a call. In various examples, depth data alone, RGB image data alone, or depth data plus texture mapped RGB data may be sent to the HMD to provide visual feedback. Where depth data alone is sent, the depth data may be presented as an avatar, or used to control an avatar representing the user.

In some use contexts, the depth image and/or visible image may capture a plurality of users. FIG. 8 shows an example of such a use context, in which a first user 800 and a second user 802 are wearing a first HMD 804 and a second HMD 806, respectively, to participate in a joint holographic communication session with a third user, represented by hologram 808. The first user 800 and the second user 802 are located in a real-world environment 812, while the third user represented by hologram 808 is at a remote location.

As illustrated by example in FIG. 8, the environment 812 includes one case 820 for both the first user 800 and the second user 802. The case 820 is positioned such that the field of view 822 of a direct-measurement depth sensor on the case 820 includes the first user 800 and the second user 802. As such, referring briefly back to FIG. 7, where the depth image includes one or more people, the method 700 may include, at 706, segmenting each person of the one or more people. In the example of FIG. 8, the first user 800 and the second user 802 are segmented from depth image data captured by the depth sensor of the case 820. In some examples, the first user 800 and the second user 802 may be segmented by fitting a skeletal model to the depth image. In other examples, the depth image may be segmented using a thresholding method. It will also be appreciated that any other suitable image segmentation technique may be used. Segmenting may be used, for example, to allow the two users to be presented separately in holographic imagery, such as shown via visual feedback 830 of users 800 and 802. Such visual feedback may be displayed to first and second users 800 and 802 during a call.
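As one concrete, intentionally simple version of the thresholding approach mentioned above, a foreground depth band can be turned into per-person masks with connected-component labeling. This sketch uses scipy.ndimage and assumed thresholds; it is an illustration rather than the disclosed segmentation method.

import numpy as np
from scipy import ndimage

def segment_people(depth_map: np.ndarray, near_m: float = 0.5, far_m: float = 3.0):
    """Return one boolean mask per sufficiently large foreground component."""
    foreground = (depth_map > near_m) & (depth_map < far_m)
    labels, count = ndimage.label(foreground)   # connected components
    return [labels == i for i in range(1, count + 1)
            if np.count_nonzero(labels == i) > 500]   # size cutoff (assumed)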

In some scenarios, more than one HMD case comprising a direct measurement depth sensor may be in a use environment. In such an example, as indicated at 708, first and second depth maps received from first and second HMD cases can be combined into a larger field depth map using an overlapping region in the depth images to align the first and second depth maps. FIG. 9 shows a scenario in which a case 900 comprising a direct measurement depth sensor is positioned to capture a depth image of a first user 902 wearing a first HMD 903, and in which a second case 904 is positioned to capture a depth image of the second user 906 wearing a second HMD 907. In the example of FIG. 9, both users are participating in the same holographic communication session with a remote user represented by hologram 908.

A computing device that is facilitating the call (e.g., a cloud-based communication service, as an example) may receive a first depth map (and optionally visual texture data) from the first case 900 and a second depth map (and optionally visual texture data) from the second case 904. Where depth maps obtained from each overlap, the first and second depth maps can be combined using an overlapping region to create a combined depth map. Various types of data received from the HMD devices 903 and/or 907 may be used to determine that both the first user 902 and the second user 906 are in the same environment 920 for the purpose of combining depth images. For example, the HMD devices 903 and/or 907 may have GPS capabilities that can be used to localize the devices. In other examples, the HMD devices 903 and/or 907 may be localized using Wi-Fi or cellular data signals. It will also be appreciated that the locations of the HMD devices may be determined in any other suitable manner.

The combined depth map can be used to recreate the real-world environment as a 3D model, and/or to capture interactions between a plurality of people in the environment by determining their relative positions with respect to each other. In another potential advantage of the present disclosure, a combined depth map may be transmitted to each device participating in a shared augmented reality (AR)/virtual reality (VR) experience, instead of two or more separate depth maps.
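One way the overlapping region can be used to combine the two depth maps, sketched under the assumption that corresponding 3D points in the overlap have already been matched: estimate a rigid transform with the Kabsch/SVD solution and map the second point set into the first map's frame. This is an illustration, not the disclosed algorithm.

import numpy as np

def rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Least-squares rotation R and translation t with dst ~= src @ R.T + t."""
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:           # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst.mean(0) - src.mean(0) @ R.T
    return R, t

def combine(points_a: np.ndarray, points_b: np.ndarray,
            overlap_b: np.ndarray, overlap_a: np.ndarray) -> np.ndarray:
    """Merge map B into map A's frame using matched overlap points."""
    R, t = rigid_transform(overlap_b, overlap_a)
    return np.vstack([points_a, points_b @ R.T + t])

In practice the correspondence step itself (for example, iterative closest point on the overlapping region) does most of the work; the closed-form transform above is the final alignment once matches are available.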

Returning to FIG. 7, at 714, the method 700 includes sending the depth image of the user of the HMD and the acoustic data to another computing device. Texture data also may be optionally sent, as indicated at 716. Referring to FIG. 6, the other device may comprise, for example, a cloud-based computing system (not shown) facilitating the holographic communication session of FIG. 6. In such an example, the cloud-based computing system can provide the HMD 610 of the second user 602 with depth image data and/or texture data of the first user 600 captured by the case 612 of FIG. 6 (e.g., for rendering by case 616 or HMD 610), or can provide holographic video image frames (e.g., stereo image frames) rendered by the cloud-based system based upon the depth data and potentially texture data for presentation by HMD 610.

The method 700 further includes, at 718, receiving acoustic data and image data from the other computing device. In some examples, the image data can include depth image data as indicated at 720 and/or texture data as indicated at 722 capturing a second user at a remote location. The method 700 also comprises presenting the received acoustic data and image data at 724. For example, the depth image data received from the case 616 can be used to render the hologram 618 of the second user 602 that is displayed to the first user 600. As indicated at 726, the received texture data can be presented, for example, by applying the texture data (e.g., a visible appearance of the second user 602) to the depth data to generate the hologram of the second user 618. In other examples, the received image data may comprise video frames rendered by a remote service.

As described above, and as indicated at 728, in some instances texture data may not be available. In such examples, an avatar that depicts the second user can be generated and displayed to the first user based upon depth data received from the other user. Similarly, an avatar that depicts the first user can be generated and displayed to the second user.

An HMD case comprising a direct-measurement depth sensor also may be used as a convenient scanning device to form three-dimensional scans of a target object to obtain a three-dimensional model of the object. Scanning the object with a depth sensor on a case may provide a more convenient experience than scanning the object using a depth sensor on an HMD. FIG. 10 shows a scenario in which a case 1000 comprising a direct measurement depth sensor is used to generate a 3D model of an object. In this example, the user 1002 is moving the case 1000 around a potted plant 1004 to obtain depth and/or visual image data that captures the potted plant 1004 from a plurality of angles. In this manner, the user 1002 can build a three-dimensional model of the object without having to manipulate the object itself or undertake uncomfortable head gyrations to capture the object using the HMD 1006.
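A compact sketch of how scans from a plurality of angles could be accumulated into one model, assuming a pose (rotation and translation into a shared frame) is available for each view, for example from frame-to-frame registration. The de-duplication step is a crude stand-in for real surface fusion and is not taken from the disclosure.

import numpy as np

def fuse_scans(views):
    """views: iterable of (points Nx3, R 3x3, t length-3) with R, t mapping each
    view's points into a shared world frame."""
    world_points = [pts @ R.T + t for pts, R, t in views]
    merged = np.vstack(world_points)
    # Crude de-duplication on a ~1 mm grid; real systems would use volumetric
    # fusion (e.g., a truncated signed distance field) instead.
    return np.unique(np.round(merged, 3), axis=0)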

In some examples, and as described in more detail below with reference to FIG. 11, depth data obtained using the case can be compared to depth data obtained using the HMD to compensate for any errors in the depth data obtained by the HMD. For example, the HMD 300 of FIG. 3 may undergo a factory calibration process such that the first camera 304 and the second camera 306 may be used to obtain depth information via stereoscopic imaging. However, events such as temperature changes, humidity changes, and shocks can cause one or both of the cameras to go out of calibration. For example, if the HMD 300 is bent, one or both of the cameras may face slightly different directions, and depth information provided by the stereo pair will be different than information from the HMD in its original state. Accordingly, the depth data obtained using a direct-measurement depth sensor on a case for the HMD can be used to perform field calibration or data correction on one or more indirect-measurement depth sensors (e.g., the first camera 304 and the second camera 306).

FIG. 11 illustrates a flow diagram depicting an example method 1100 for calibrating a depth imaging system of an HMD. Method 1100 can be performed using any suitable display device, including but not limited to those described herein. In other examples, various steps of method 1100 may be omitted or performed in a different order than described, and/or method 1100 may include additional and/or alternative steps relative to those illustrated in FIG. 11.

The method 1100 includes, at 1102, obtaining indirect-measurement depth data for a real-world environment. As introduced above, the indirect-measurement depth data is acquired by the depth imaging system of the HMD. For example, the indirect-measurement depth data may comprise depth information obtained from the first camera 304 and the second camera 306 of the HMD 300 of FIG. 3.

The method 1100 further includes, at 1104, obtaining direct-measurement depth data for the real-world environment. As described above, the direct-measurement depth data is acquired by a direct-measurement depth sensor included in a case for the HMD. For example, the direct-measurement depth data may be obtained using an IR time-of-flight depth sensor (e.g., the ToF sensor 112 of FIG. 1). The direct-measurement depth data includes a depth image of both the HMD and another object in the environment. In this manner, the depth data can be used to make a direct measurement of distance from the HMD to the other object.

At 1106, the method 1100 includes determining a first determined distance from the HMD to an object in the real-world environment using the indirect-measurement depth data. At 1108, the method 1100 includes determining a second determined distance from the HMD to the object in the environment using the direct-measurement depth data.

A correction for the indirect-measurement depth data is determined based upon comparing the first determined distance and the second determined distance, as indicated at 1110. For example, one or more error measurements (e.g., in the position and orientation of the first camera 304 and the second camera 306) may be determined if there is a discrepancy between a depth map assembled using the direct-measurement depth data and the indirect-measurement depth data while fusing the coordinate systems of the HMD and the case. Accordingly, direct-measurement depth data obtained from the case may be used to determine one or more correction operations to apply to the indirect-measurement depth data. For example, the direct-measurement depth data can be used to determine one or more mathematical terms describing how the first camera 304 and the second camera 306 could be repositioned or rotated for the stereoscopic depth information to match the direct-measurement depth data.

The determined correction is applied to real-time indirect-measurement depth data at 1112. In this manner, an independent depth measurement can be used to correct for any errors in the HMD’s depth perception while the HMD is running. In some examples, the correction may be stored as calibration data to apply to subsequent indirect measurements. In this manner, the indirect-measurement depth data may be corrected even if the case is offline.
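The comparison-and-correction loop above can be illustrated with the simplest possible correction, a per-measurement scale factor. The actual correction terms contemplated (camera repositioning and rotation) are richer, so this is a deliberately reduced sketch with made-up numbers.

def scale_correction(direct_m: float, indirect_m: float) -> float:
    """Ratio that maps the stereo (indirect) distance onto the ToF (direct) one."""
    return direct_m / indirect_m

def corrected(indirect_m: float, scale: float) -> float:
    """Apply a stored scale correction to a subsequent indirect measurement."""
    return indirect_m * scale

# Example: the case measures 2.00 m; the miscalibrated stereo pair reports 2.17 m.
scale = scale_correction(direct_m=2.00, indirect_m=2.17)
print(round(corrected(2.17, scale), 2))   # 2.0 -> stored and applied to later frames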

In some embodiments, the methods and processes described herein may be tied to a computing system of one or more computing devices. In particular, such methods and processes may be implemented as a computer-application program or service, an application-programming interface (API), a library, and/or other computer-program product.

FIG. 12 schematically shows an example of a computing system 1200 that can enact one or more of the devices and methods described above. Computing system 1200 is shown in simplified form. Computing system 1200 may take the form of one or more personal computers, server computers, tablet computers, home-entertainment computers, network computing devices, gaming devices, mobile computing devices, mobile communication devices (e.g., smart phone), and/or other computing devices, and wearable computing devices such as smart wristwatches and head mounted augmented reality devices. In some examples, the computing system 1200 may embody the HMD 104, the case 110, the HMD 202, the case 204, the remote computing system 240, the HMD 300, the HMD 606, the HMD 610, the case 612, the case 616, the HMD 804, the HMD 806, the case 820, the HMD 903, the HMD 907, the case 900, the case 904, the case 1000, and/or the HMD 1006.

The computing system 1200 includes a logic processor 1202, volatile memory 1204, and a non-volatile storage device 1206. The computing system 1200 may optionally include a display subsystem 1208, input subsystem 1210, communication subsystem 1212, and/or other components not shown in FIG. 12.

Logic processor 1202 includes one or more physical devices configured to execute instructions. For example, the logic processor may be configured to execute instructions that are part of one or more applications, programs, routines, libraries, objects, components, data structures, or other logical constructs. Such instructions may be implemented to perform a task, implement a data type, transform the state of one or more components, achieve a technical effect, or otherwise arrive at a desired result.

The logic processor may include one or more physical processors (hardware) configured to execute software instructions. Additionally or alternatively, the logic processor may include one or more hardware logic circuits or firmware devices configured to execute hardware-implemented logic or firmware instructions. Processors of the logic processor 1202 may be single-core or multi-core, and the instructions executed thereon may be configured for sequential, parallel, and/or distributed processing. Individual components of the logic processor optionally may be distributed among two or more separate devices, which may be remotely located and/or configured for coordinated processing. Aspects of the logic processor may be virtualized and executed by remotely accessible, networked computing devices configured in a cloud-computing configuration. It will be understood that, in such a case, these virtualized aspects may be run on different physical logic processors of various different machines.

Non-volatile storage device 1206 includes one or more physical devices configured to hold instructions executable by the logic processors to implement the methods and processes described herein. When such methods and processes are implemented, the state of non-volatile storage device 1206 may be transformed—e.g., to hold different data.

Non-volatile storage device 1206 may include physical devices that are removable and/or built-in. Non-volatile storage device 1206 may include optical memory (e.g., CD, DVD, HD-DVD, Blu-Ray Disc, etc.), semiconductor memory (e.g., ROM, EPROM, EEPROM, FLASH memory, etc.), and/or magnetic memory (e.g., hard-disk drive, floppy-disk drive, tape drive, MRAM, etc.), or other mass storage device technology. Non-volatile storage device 1206 may include nonvolatile, dynamic, static, read/write, read-only, sequential-access, location-addressable, file-addressable, and/or content-addressable devices. It will be appreciated that non-volatile storage device 1206 is configured to hold instructions even when power is cut to the non-volatile storage device 1206.

Volatile memory 1204 may include physical devices that include random access memory. Volatile memory 1204 is typically utilized by logic processor 1202 to temporarily store information during processing of software instructions. It will be appreciated that volatile memory 1204 typically does not continue to store instructions when power is cut to the volatile memory 1204.

Aspects of logic processor 1202, volatile memory 1204, and non-volatile storage device 1206 may be integrated together into one or more hardware-logic components. Such hardware-logic components may include field-programmable gate arrays (FPGAs), program- and application-specific integrated circuits (PASIC/ASICs), program- and application-specific standard products (PSSP/ASSPs), system-on-a-chip (SOC), and complex programmable logic devices (CPLDs), for example.

The terms “module,” “program,” and “engine” may be used to describe an aspect of computing system 1200 typically implemented in software by a processor to perform a particular function using portions of volatile memory, which function involves transformative processing that specially configures the processor to perform the function. Thus, a module, program, or engine may be instantiated via logic processor 1202 executing instructions held by non-volatile storage device 1206, using portions of volatile memory 1204. It will be understood that different modules, programs, and/or engines may be instantiated from the same application, service, code block, object, library, routine, API, function, etc. Likewise, the same module, program, and/or engine may be instantiated by different applications, services, code blocks, objects, routines, APIs, functions, etc. The terms “module,” “program,” and “engine” may encompass individual or groups of executable files, data files, libraries, drivers, scripts, database records, etc.

When included, display subsystem 1208 may be used to present a visual representation of data held by non-volatile storage device 1206. The visual representation may take the form of a graphical user interface (GUI). As the herein described methods and processes change the data held by the non-volatile storage device, and thus transform the state of the non-volatile storage device, the state of display subsystem 1208 may likewise be transformed to visually represent changes in the underlying data. Display subsystem 1208 may include one or more display devices utilizing virtually any type of technology. Such display devices may be combined with logic processor 1202, volatile memory 1204, and/or non-volatile storage device 1206 in a shared enclosure, or such display devices may be peripheral display devices.

When included, input subsystem 1210 may comprise or interface with one or more user-input devices such as a keyboard, mouse, touch screen, or game controller. In some examples, the input subsystem may comprise or interface with selected natural user input (NUI) componentry. Such componentry may be integrated or peripheral, and the transduction and/or processing of input actions may be handled on- or off-board. Example NUI componentry may include a microphone for speech and/or voice recognition; an infrared, color, stereoscopic, and/or depth camera for machine vision and/or gesture recognition; a head tracker, eye tracker, accelerometer, and/or gyroscope for motion detection and/or intent recognition; as well as electric-field sensing componentry for assessing brain activity; and/or any other suitable sensor.

When included, communication subsystem 1212 may be configured to communicatively couple various computing devices described herein with each other, and with other devices. Communication subsystem 1212 may include wired and/or wireless communication devices compatible with one or more different communication protocols. For example, the communication subsystem may be configured for communication via a wireless telephone network, or a wired or wireless local- or wide-area network. In some examples, the communication subsystem may allow computing system 1200 to send and/or receive messages to and/or from other devices via a network such as the Internet.

Another example provides, on a computing system comprising a head-mounted display device (HMD) and a case for the head-mounted display device, a method for displaying a hologram via the HMD, the method comprising: obtaining depth data from a direct-measurement depth sensor included in the case for the HMD, the depth data comprising a depth map of a real-world environment; determining a distance from the HMD to an object in the real-world environment using the depth map; obtaining holographic imagery for display based at least upon the distance; and outputting the holographic imagery for display on the HMD. Obtaining the depth data from the direct-measurement depth sensor may additionally or alternatively include obtaining the depth data from a time-of-flight sensor on the case. Determining the distance may additionally or alternatively include sending the depth map from the case to the HMD, and determining the distance on the HMD. The method may additionally or alternatively include obtaining a depth image of a user of the HMD via the direct-measurement depth sensor during a holographic communication session, sending the depth image of the user to the HMD during the holographic communication session, and displaying on the HMD a representation of the user based upon the depth image of the user. The method may additionally or alternatively include obtaining texture data representing an appearance of the user of the HMD, and displaying the texture data in the representation of the user. The method may additionally or alternatively include sending the depth image of the user to another HMD participating in the holographic communication session. The depth data may additionally or alternatively include a depth image of a plurality of people including a user of the HMD, and the method may additionally or alternatively include segmenting each person of the plurality of people. The depth data may additionally or alternatively include first depth data from a first direct-measurement depth sensor, the depth map may additionally or alternatively include a first depth map, and the method may additionally or alternatively include: obtaining second depth data from a second direct-measurement depth sensor, the second depth data comprising a second depth map of the real-world environment; detecting an overlapping region in the first depth map and the second depth map; and using the overlapping region to combine the first depth map and the second depth map. The HMD may additionally or alternatively include a depth imaging system comprising a stereo camera arrangement configured to obtain indirect-measurement depth data, the distance may additionally or alternatively include a first distance, and the method may additionally or alternatively include calibrating the depth imaging system by obtaining indirect-measurement depth data for the real-world environment via the depth imaging system of the HMD; determining a second determined distance from the HMD to the object in the real-world environment using the indirect-measurement depth data; comparing the first determined distance and the second determined distance to determine a correction for the indirect-measurement depth data; and applying the correction to subsequently measured indirect-measurement depth data. The method may additionally or alternatively include obtaining a three-dimensional model of a target object by scanning the target object with the direct-measurement depth sensor included in the case from a plurality of angles.

Another example provides a system comprising: a head-mounted display device (HMD) and a case for the head-mounted display device, the HMD comprising a see-through display system, a first communications system, and a first computing system configured to control the display of images via the see-through display system and to control communication with the case via the first communications system; and the case comprising a direct-measurement depth sensor, a second communications system, and a second computing system configured to control the direct-measurement depth sensor to acquire depth data of a real-world environment and to control the second communications system to send depth data to the HMD. The direct-measurement depth sensor may additionally or alternatively include a time-of-flight sensor. The case may additionally or alternatively include a red/green/blue (RGB) intensity image sensor. One or more of the first computing system and the second computing system may additionally or alternatively include instructions executable to use RGB image data acquired via the RGB intensity image sensor as texture data for corresponding depth data. The HMD may additionally or alternatively include an indirect-measurement depth sensing system. One or more of the first computing system and the second computing system may additionally or alternatively include instructions executable to calibrate the indirect-measurement depth sensing system using depth data from the direct-measurement depth sensor. One or more of the first computing system and the second computing system may additionally or alternatively include instructions executable to determine a distance from the HMD to an object in the real-world environment using the depth data of the real-world environment, and to compute a holographic image for display based upon the distance. The depth data may additionally or alternatively include a three-dimensional model of an object in the real-world environment.

Another example provides a system comprising: a head-mounted display device (HMD) and a case for the head-mounted display device, the HMD comprising a see-through display system, a first communications system, and a first computing system, the case comprising a direct-measurement depth sensor, a second communications system, and a second computing system, and the system also comprising a microphone located on the case or the HMD, wherein the system is configured to conduct a holographic communication session by acquiring a depth image of a user of the HMD via the direct-measurement depth sensor on the case, acquiring first acoustic data capturing a voice of the user of the HMD via the microphone, sending the depth image of the user of the HMD and the first acoustic data to another computing device, receiving second acoustic data and image data from the other computing device, and presenting the second acoustic data and image data received from the other computing device. The case may additionally or alternatively include a red/green/blue (RGB) image sensor, and the system may be additionally or alternatively configured to obtain texture data representing an appearance of the user of the HMD via the RGB image sensor, and to send the texture data to the other computing device.
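For illustration only, the following Python sketch shows one way the method described in these examples could be organized in software: a depth map acquired by the case is used to determine a distance to an object, and holographic imagery is then obtained and output on the HMD. The DepthFrame container and the compose_hologram and render calls are hypothetical stand-ins and do not correspond to any actual device API.

from dataclasses import dataclass

import numpy as np


@dataclass
class DepthFrame:
    depth_map: np.ndarray  # HxW array of distances (meters) from the case's depth sensor


def distance_to_object(depth_map: np.ndarray, object_mask: np.ndarray) -> float:
    # Estimate the distance to the object of interest as the median depth over
    # the pixels that belong to that object in the depth map from the case.
    return float(np.median(depth_map[object_mask]))


def display_hologram(frame: DepthFrame, object_mask: np.ndarray, hmd) -> None:
    # 1) Determine a distance using the depth map acquired by the case.
    distance = distance_to_object(frame.depth_map, object_mask)
    # 2) Obtain holographic imagery appropriate for that distance (hypothetical HMD call).
    imagery = hmd.compose_hologram(distance)
    # 3) Output the imagery on the HMD's see-through display (hypothetical HMD call).
    hmd.render(imagery)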

It will be understood that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered in a limiting sense, because numerous variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated and/or described may be performed in the sequence illustrated and/or described, in other sequences, in parallel, or omitted. Likewise, the order of the above-described processes may be changed.

The subject matter of the present disclosure includes all novel and non-obvious combinations and sub-combinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as any and all equivalents thereof.

Microsoft Patent | Reinforced differentiable attribute for 3d face reconstruction https://patent.nweon.com/26131 Thu, 08 Dec 2022 12:39:03 +0000 https://patent.nweon.com/?p=26131 ...

Patent: Reinforced differentiable attribute for 3d face reconstruction

Patent PDF: 加入映维网会员获取

Publication Number: 20220392166

Publication Date: 2022-12-08

Assignee: Microsoft Technology Licensing

Abstract

Techniques performed by a data processing system for reconstructing a three-dimensional (3D) model of the face of a human subject herein include obtaining source data comprising a two-dimensional (2D) image, three-dimensional (3D) image, or depth information representing a face of a human subject. Reconstructing the 3D model of the face also includes generating a 3D model of the face of the human subject based on the source data by analyzing the source data to produce a coarse 3D model of the face of the human subject, and refining the coarse 3D model through free form deformation to produce a fitted 3D model. The coarse 3D model may be a 3D Morphable Model (3DMM), and the coarse 3D model may be refined through free-form deformation in which the deformation of the mesh is limited by applying an as-rigid-as-possible (ARAP) deformation constraint.

Claims

What is claimed is:

1.A data processing system comprising: a processor; and a computer-readable medium storing executable instructions for causing the processor to perform operations of: analyzing source data comprising a two-dimensional (2D) image, three-dimensional (3D) image, or depth information representing a face of a human subject to produce a coarse 3D model of the face of the human subject; providing the source data to a neural network trained to analyze the source data and output a corrective shape residual, the corrective shape residual modeling a deformation of a mesh of the coarse 3D model for generating a fitted 3D model; obtaining the corrective shape residual from the neural network; and applying free-form deformation to the mesh of the coarse 3D model to refine a shape of the mesh according to the corrective shape residual.

2.The data processing system of claim 1, wherein to analyze the source data to produce the coarse 3D model the computer-readable medium includes instructions configured to cause the processor to perform the operation of: producing the coarse 3D model of the face using a 3D Morphable Model (3DMM).

3.The data processing system of claim 1, wherein to deform the mesh according to the corrective shape residual the computer-readable medium includes instructions configured to cause the processor to perform the operation of: limiting the deformation of the mesh by applying an as-rigid-as-possible (ARAP) deformation constraint.

4.The data processing system of claim 1, wherein the computer-readable medium includes executable instructions for causing the processor to perform operations of: rendering the 2D image from the coarse 3D model using a rendering pipeline that utilizes one or more differentiable attributes that can be used to further refine the coarse 3D model.

5.The data processing system of claim 4, wherein the computer-readable medium includes executable instructions for causing the processor to perform operations of: comparing the 2D image to a reference ground-truth image to determine a photometric loss function for further refining the coarse 3D model.

6.The data processing system of claim 5, wherein the one or more differentiable attributes include depth, color, and mask attributes.

7.The data processing system of claim 5, wherein the computer-readable medium includes executable instructions for causing the processor to perform operations of: rendering the 2D image using a soft rasterization process that applies a convolutional kernel to blur the rendered 2D image to propagate attributes across vertices of the mesh.

8.A method performed by a data processing system for generating a model, the method comprising: analyzing source data comprising a two-dimensional (2D) image, three-dimensional (3D) image, or depth information representing a face of a human subject to produce a coarse 3D model of the face of the human subject; providing the source data to a neural network trained to analyze the source data and output a corrective shape residual, the corrective shape residual modeling a deformation of a mesh of the coarse 3D model for generating a fitted 3D model; obtaining the corrective shape residual from the neural network; and applying free-form deformation to the mesh of the coarse 3D model to refine a shape of the mesh according to the corrective shape residual.

9.The method of claim 8, wherein analyzing the 2D image of the face to produce the coarse 3D model of the face of the human subject includes producing the coarse 3D model of the face using a 3D Morphable Model (3DMM).

10.The method of claim 8, wherein deforming the mesh according to the corrective shape residual includes limiting the deformation of the mesh by applying an as-rigid-as-possible (ARAP) deformation constraint.

11.The method of claim 8, further comprising: rendering the 2D image from the coarse 3D model using a rendering pipeline that utilizes one or more differentiable attributes that can be used to further refine the coarse 3D model.

12.The method of claim 11, further comprising: comparing the 2D image to a reference ground-truth image to determine a photometric loss function for further refining the coarse 3D model.

13.The method of claim 12, wherein the one or more differentiable attributes include depth, color, and mask attributes.

14.The method of claim 12, further comprising: rendering the 2D image using a soft rasterization process that applies a convolutional kernel to blur the rendered 2D image to propagate attributes across vertices of the mesh.

15.A machine-readable medium storing instructions that, when executed on a processor of a data processing system, cause the data processing system to generate a model, by: obtaining source data comprising a two-dimensional (2D) image, three-dimensional (3D) image, or depth information representing a face of a human subject; analyzing the source data of the face to produce a coarse 3D model of the face of the human subject; providing the source data to a neural network trained to analyze the source data and output a corrective shape residual, the corrective shape residual modeling a deformation of a mesh of the coarse 3D model for generating a fitted 3D model; obtaining the corrective shape residual from the neural network; and applying free-form deformation to the mesh of the coarse 3D model to refine a shape of the mesh according to the corrective shape residual.

16.The machine-readable medium of claim 15, wherein to analyze the 2D image of the face to produce the coarse 3D model, the machine-readable medium includes instructions configured to cause the processor to perform an operation of producing the coarse 3D model of the face using a 3D Morphable Model (3DMM).

17.The machine-readable medium of claim 15, wherein to deform the mesh according to the corrective shape residual the machine-readable medium includes instructions configured to cause the processor to perform an operation of limiting the deformation of the mesh by applying an as-rigid-as-possible (ARAP) deformation constraint.

18.The machine-readable medium of claim 15, wherein the machine-readable medium includes executable instructions for causing the processor to perform operations of: rendering the 2D image from the coarse 3D model using a rendering pipeline that utilizes one or more differentiable attributes that can be used to further refine the coarse 3D model.

19.The machine-readable medium of claim 18, wherein the machine-readable medium includes executable instructions for causing the processor to perform operations of: comparing the 2D image to a reference ground-truth image to determine a photometric loss function for further refining the coarse 3D model.

20.The machine-readable medium of claim 19, wherein the machine-readable medium includes executable instructions for causing the processor to perform an operation of rendering the 2D image using a soft rasterization process that applies a convolutional kernel to blur the rendered 2D image to propagate attributes across vertices of the mesh.

Description

CROSS-REFERENCE TO RELATED APPLICATION

This application claims the benefit of priority to U.S. Provisional Patent Application No. 63/025,774, filed May 15, 2020 and entitled “Reinforced Differentiable Attribute for 3D Face Reconstruction,” and to U.S. patent application Ser. No. 16/930,161, filed Jul. 15, 2020 and entitled “Reinforced Differentiable Attribute for 3D Face Reconstruction,” the entire disclosures of which are incorporated herein by reference.

BACKGROUND

Three-dimensional (“3D”) face shape reconstruction has become an important research topic in both the computer vision and graphics literature. Significant progress has been made in the past decade in areas such as face recognition, face reenactment and visual dubbing, and avatar creation and animation. Despite this progress, face reconstruction is still an ill-posed problem for monocular images due to depth ambiguity and albedo-illumination ambiguity. Various techniques have been developed for reconstructing the 3D shape of a human face from image data. A key challenge in 3D face shape reconstruction is building a correct dense face correspondence between a deformable mesh and a single input image. Conventional approaches to this problem, such as 3D Morphable Models (“3DMM”), provide solutions for recovering 3D facial shape and texture from a single image of a face. 3DMM attempts to infer 3D face shape and texture as well as scene properties such as pose and illumination through a fitting process. However, given the ill-posed nature of the problem of 3D face reconstruction, 3DMM and other such conventional solutions rely on prior knowledge to reduce depth ambiguity when analyzing the input image. Other techniques such as Differentiable Rendering (“DR”) have also been used to try to solve the problem of 3D face reconstruction. DR attempts to infer 3D geometry, lighting, materials, and other elements of the scene such that a renderer may realistically reproduce the observed scene using the information inferred from the image of the scene. However, DR typically requires an extensive amount of training data, which renders this approach impractical in many situations. Thus, there is still significant room for improving the correspondence so that the projected face shape better aligns with the regions of a face represented in an image.

SUMMARY

An example data processing system according to the disclosure may include a processor and a computer-readable medium storing executable instructions. The executable instructions include instructions configured to cause the processor to perform operations including obtaining source data comprising a two-dimensional (2D) image, three-dimensional (3D) image, or depth information representing a face of a human subject, and generating a 3D model of the face of the human subject based on the source data by analyzing the source data to produce a coarse 3D model of the face of the human subject and refining the coarse 3D model through free form deformation to produce a fitted 3D model.

An example method performed by a data processing system for generating a model includes obtaining source data comprising a two-dimensional (2D) image, three-dimensional (3D) image, or depth information representing a face of a human subject; and generating a 3D model of the face of the human subject based on the source data by: analyzing the source data to produce a coarse 3D model of the face of the human subject; and refining the coarse 3D model through free form deformation to produce a fitted 3D model.

An example memory device according to the disclosure stores instructions that, when executed on a processor of a data processing system, cause the data processing system to generate a model, by: obtaining source data comprising a two-dimensional (2D) image, three-dimensional (3D) image, or depth information representing a face of a human subject; and generating a 3D model of the face of the human subject based on the source data by analyzing the source data of the face to produce a coarse 3D model of the face of the human subject, and refining the coarse 3D model through free form deformation to produce a fitted 3D model.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

The drawing figures depict one or more implementations in accord with the present teachings, by way of example only, not by way of limitation. In the figures, like reference numerals refer to the same or similar elements. Furthermore, it should be understood that the drawings are not necessarily to scale.

FIGS. 1A and 1B are block diagrams illustrating example computing environments in which the techniques disclosed herein may be implemented.

FIG. 2 is a diagram showing a comparison of the ReDA rasterizer and the SoftRas soft rasterizer and a comparison of outputs from both rasterizers.

FIG. 3 is a diagram providing a comparison of 3D face reconstruction results with ReDA and a mask, with ReDA but without the mask, and without ReDA.

FIG. 4 is a diagram showing a comparison of 3D face reconstruction results with and without the use of free-form deformation.

FIG. 5 is a diagram showing an example 3D face fitting pipeline.

FIG. 6 is a diagram providing a comparison of 3D face reconstruction using ReDA versus RingNet.

FIG. 7 is a diagram providing a comparison of 3D face reconstruction using ReDA versus Face Model Learning (FML).

FIG. 8 is a flow chart of an example process for 3D face reconstruction.

FIG. 9 is a block diagram illustrating an example software architecture, various portions of which may be used in conjunction with various hardware architectures herein described, which may implement any of the features herein described.

FIG. 10 is a block diagram illustrating components of an example machine configured to read instructions from a machine-readable medium and perform any of the features described herein.

FIG. 11 is a table depicting results of a comparison of the ReDA rasterization to Z-buffer rasterization on a first data set.

FIG. 12 is a table depicting results of a comparison of the ReDA rasterization to Z-buffer rasterization on a second data set.

FIG. 13 is a table depicting results of a comparison of the ReDA rasterization utilizing different numbers of pyramid layers.

DETAILED DESCRIPTION

In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant teachings. However, it should be apparent that the present teachings may be practiced without such details. In other instances, well known methods, procedures, components, and/or circuitry have been described at a relatively high-level, without detail, in order to avoid unnecessarily obscuring aspects of the present teachings.

Techniques for 3D face shape reconstruction are provided. These techniques provide a novel framework for 3D face reconstruction from a monocular source image, referred to herein as “Reinforced Differentiable Attributes” or “ReDA.” ReDA provides a technical solution to the technical problem of depth ambiguity during 3D face reconstruction from a monocular 2D source image, a 3D image, and/or depth data associated with the face of a human subject. ReDA reduces such ambiguities by utilizing attributes beyond just the color attribute used by conventional DR approaches, such as depth attributes and a face parsing mask. A technical benefit of ReDA is a projected face shape that better aligns with the silhouette of each face region, such as the eyes, nose, mouth, and cheeks, of the face of the human subject in the source image and/or depth data.

The technical solution provided by ReDA also includes improvements to the renderer that permit the renderer to be more differentiable through a set of convolution operations with multiscale kernel sizes. The technical solution provided by ReDA also includes a new free-form deformation layer that sits on top of 3DMM to provide both the prior knowledge and out-of-space modeling. Both improvements may be easily integrated into existing 3D face reconstruction pipelines to provide improved 3D face reconstruction from a monocular image.

Another technical benefit provided by ReDA is that ReDA may significantly reduce the processing resources, network resources, and/or memory resources of the computing device(s) used to perform 3D face reconstruction compared to conventional approaches to 3D face reconstruction. Many of these approaches require an extensive amount of training data for their machine learning models, which may consume significant amounts of memory and processor resources to train. ReDA eliminates the need to obtain, store, and process such extensive amounts of training data to train the machine learning models used therein. Furthermore, ReDA may also significantly reduce processing resources, network resources, and/or memory resources for additional reasons discussed with respect to the example implementations that follow.

FIG. 1A is a diagram of an example computing environment 100, in which aspects of this disclosure may be implemented. The computing environment 100 includes a face data source 105, a 3D face reconstruction module 110, and a 3D face model 115. The face data source 105 may be a monocular camera, a depth camera, or other image capture device associated with the computing device. The face data source 105 may capture a 2D (RGB) image, a 3D (RGB-D) image, and/or depth (D) information associated with a face of a human subject for whom a 3D face model representing the geometry of the face is to be generated from the face data obtained from the face data source 105. The depth information may be a point cloud representing a set of data points that represent the geometry of the face of the human subject. The depth information may be a depth map that represents a distance of the surface(s) of the face of the human subject from the camera or other device used to capture the depth map. The depth information may be captured using various means, such as a stereoscopic camera, a time-of-flight (ToF)-enabled camera sensor, or another device capable of capturing depth information of a scene. In some implementations, the depth information may be captured by an imaging device that is capable of capturing both image and depth data.
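As a brief illustration of the relationship between a depth map and a point cloud mentioned above, the following sketch back-projects per-pixel depths through assumed pinhole camera intrinsics (fx, fy, cx, cy). The intrinsic values are placeholders, not parameters of any particular sensor described in this disclosure.

import numpy as np


def depth_map_to_point_cloud(depth: np.ndarray, fx: float, fy: float,
                             cx: float, cy: float) -> np.ndarray:
    # depth: (H, W) array of per-pixel distances along the camera Z axis.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx          # back-project pixel columns
    y = (v - cy) * z / fy          # back-project pixel rows
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)   # (H*W, 3) points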

The face data source 105 and/or the 3D face reconstruction module 110 may be implemented in one or more computing device(s). The computing device may be a laptop computing device, a personal computing device, a game console, a tablet computing device, a kiosk or point of sale device, a mobile phone or smartphone, a wearable computing device, or other computing device that may implement the 3D face reconstruction techniques disclosed herein. In some implementations, the face data source 105 may be separate from the computing device that implements the 3D face reconstruction module 110, and the face data source 105 may provide a 2D image, 3D image, and/or depth map to the computing device via a wired or wireless connection with the computing device. For example, the face data source 105 may be configured to communicate with the computing device via a Bluetooth connection or via a Universal Serial Bus (USB) connection. Other types of wired and/or wireless connections may be used in addition to or instead of these examples.

The 3D face reconstruction module 110 may be configured to receive a 2D image of a face of a human subject and to generate a 3D model of the face of the human subject based on the 2D image input. The 3D face reconstruction module 110 may be configured to implement ReDA for generating a 3D model of a face from a 2D image of the face disclosed herein. The 3D face reconstruction module 110 may implement at least a part of the 3D face fitting pipeline 500 shown in FIG. 5. The 3D face reconstruction module 110 may output a 3D face model 115 of the face included in the 2D input image.

The 3D face model 115 includes geometric data for a 3D representation of the face of the human subject depicted in the 2D image. The geometric data may represent the shape of the face of the human subject using a polygon mesh. The polygon mesh may define a polyhedral object representing the shape of the face of the human subject. The polygon mesh includes a collection of vertices and edges that connect the vertices. Multiple edges of the mesh are connected to form polygonal faces. The faces may define triangles, quadrilaterals, or other simple convex polygons. The example implementations described herein use triangular faces, but other convex polygonal shapes may also be used.
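The following minimal sketch illustrates the kind of geometric data described above: a triangle mesh stored as a vertex array plus an array of vertex-index triples, from which an edge list can be derived. It is an illustrative container only, not the data structure used by any particular implementation.

from dataclasses import dataclass

import numpy as np


@dataclass
class FaceMesh:
    vertices: np.ndarray   # (N, 3) float array of vertex positions
    faces: np.ndarray      # (F, 3) int array; each row holds the indices of a triangle's vertices

    def edges(self) -> np.ndarray:
        # Each triangle contributes three edges; duplicates shared between
        # adjacent triangles are removed.
        e = np.concatenate([self.faces[:, [0, 1]],
                            self.faces[:, [1, 2]],
                            self.faces[:, [2, 0]]])
        return np.unique(np.sort(e, axis=1), axis=0)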

The 3D face model 115 may be utilized by an application on the computing device. The application may be, but is not limited to, a video game, a 3D modeling application, rendering software for rendering images and/or video of a scene that includes a representation of the human subject whose 2D image was captured, an augmented reality or mixed reality application, a communications platform offering video chat and/or other types of messaging, volumetric capture or holographic capture software, and/or another application in which the 3D model obtained from the 2D image may be utilized. In other implementations, the 3D face model 115 may also be provided to a remote computing device or cloud-based service for use therein, as will be described with respect to FIG. 1B.

FIG. 1B is a diagram of another example computing environment 195, in which aspects of this disclosure may be implemented. The computing environment 195 includes a client device 120, a network 125, cloud-based application services 160, and 3D face reconstruction services 170.

The client device 120 may include an image and/or depth information capture device 145, an image and depth information datastore 140, a model datastore 155, a 3D face reconstruction module 150, a native application 130, and a browser application 135. The client device 120 may be a laptop computing device, a personal computing device, a game console, a tablet computing device, a kiosk or point of sale device, a mobile phone or smartphone, a wearable computing device, or other computing device that may implement at least a portion of or use the 3D face reconstruction techniques disclosed herein.

The client device 120 may include an image and/or depth information capture device 145 configured to capture a 2D (RGB) image, a 3D (RGB-D) image, and/or depth (D) information associated with a face of a human subject for whom a 3D model of their face is to be generated. The image and/or depth information capture device 145 may be a camera and/or cameras built into the client device 120 or may be a camera or cameras connected with the client device 120 via a wired or wireless connection. The image and/or depth information capture device 145 may be configured to capture the 2D (RGB) image, the 3D (RGB-D) image, and/or the depth (D) information using an image sensor or sensors and to output the 2D (RGB) image, the 3D (RGB-D) image, and/or the depth (D) information. The image and/or depth information capture device 145 may be configured to capture video content using the image sensor and to output the video content, and a 2D or 3D image of a human subject for whom a 3D face model is to be generated may be extracted from one or more frames of the video content.

The image and/or depth information capture device 145 may output images, depth information, and/or video captured by the image and/or depth information capture device 145 to the image and depth information datastore 140. The image and depth information datastore 140 may be a persistent memory of the client device 120 configured to maintain the contents of the memory even if the client device 120 is powered down and/or rebooted. In some implementations, the contents of the image and depth information datastore 140 may be organized as a set of files accessible by other components of the client device 120. In some implementations, the image and depth information datastore 140 may be implemented as a relational database or other such data store in which the image and/or video data stored there is organized and may be searched by one or more components of the client device 120. Furthermore, the client device 120 may include an image capture application (not shown) that may be used to control the image and/or depth information capture device 145 to capture image and/or video content using the image and/or depth information capture device 145.

The 3D face reconstruction module 150 may be implemented on some client devices 120. 3D face reconstruction module 150 may operate similarly to the 3D face reconstruction module 110 described above with respect to FIG. 1A to implement ReDA. The 3D face reconstruction module 150 may receive a 2D image, a 3D image, and/or depth information associated with a human subject and generate a 3D face model of a subject included in the 2D image using ReDA. The 3D face reconstruction module 150 may store the 3D model in the model datastore 155. The 3D face reconstruction module 150 may include a user interface configured to display a representation of the 3D face model on a display of the client device 120.

The model datastore 155 may be a persistent memory of the client device 120 configured to maintain the contents of the memory even if the client device 120 is powered down and/or rebooted. In some implementations, the contents of the model datastore 155 may be organized as a set of files accessible by other components of the client device 120. For example, the 3D models may be organized by the name and/or other identifier associated with the person who is the subject of the 3D model. In some implementations, the model datastore 155 may be implemented as a relational database or other such data store in which the model data stored there is organized and may be searched by one or more components of the client device 120.

The 3D face reconstruction services 170 is a cloud-based service accessible via a network 125. The network 125 may comprise one or more public or private networks and may be implemented by the Internet. The 3D face reconstruction services 170 may implement ReDA for generating a 3D face model of a human subject in a 2D image, a 3D image, and/or depth information. A client device, such as the client device 120, may send a request for a 3D face model to the 3D face reconstruction services 170. The request may include a 2D image, a 3D image, and/or depth information of a subject for whom the 3D face model is being requested. The 3D face reconstruction services 170 may generate the requested 3D face model and send the 3D face model to the client device 120 responsive to the request. One or more applications or components of the client device may generate and send the requests for a 3D face model to the 3D face reconstruction services 170, including but not limited to the native application 130, the 3D face reconstruction module 150, and/or the browser application 135. In some implementations, the 3D face reconstruction module 150 of the client device 120 may rely on the 3D face reconstruction services 170 to perform the processing on the 2D image to generate the 3D face model. In such implementations, 3D face reconstruction module 150 may provide an interface for receiving the 2D image, the 3D image, and/or the depth information, for sending the 2D image, the 3D image, and/or the depth information to the 3D face reconstruction services 170, and for receiving the 3D face model from the 3D face reconstruction services 170.
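For illustration, a client request to such a cloud-based reconstruction service might look like the following sketch. The endpoint URL, field names, and JSON response format are assumptions made for this example and are not a documented API of the service described here.

import requests  # assumes the third-party requests package is installed


def request_face_model(image_path: str,
                       service_url: str = "https://example.invalid/reconstruct"):
    # Send the 2D image of the subject to the reconstruction service and
    # return whatever model data the service responds with.
    with open(image_path, "rb") as f:
        response = requests.post(service_url, files={"image": f}, timeout=60)
    response.raise_for_status()
    return response.json()   # e.g., mesh vertices/faces for the reconstructed 3D face model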

The client device 120 may include a native application 130 developed for use on the client device 120. The native application 130 may be configured for use with an operating system and/or the specific hardware of the client device 120. The native application 130 may be a video game, a 3D modeling application, rendering software for rendering images and/or video of a scene that includes a representation of the human subject whose representation was captured in a 2D image, a 3D image, and/or depth information, an augmented reality or mixed reality application, a communications platform offering video chat and/or other types of messaging, volumetric capture or holographic capture software, and/or another application in which the 3D model may be utilized. In some implementations, the native application 130 may include the functionality of the 3D face reconstruction module 150. In other implementations, the functionality of the 3D reconstruction module 150 may be implemented as a separate application on the client device 120 and/or may be implemented by an operating system of the client device 120. The native application 130 may provide a 2D image, a 3D image, and/or depth information associated with a subject for which a 3D model is desired and the 3D face reconstruction module 150 may output the 3D face model of the subject. The native application 130 may utilize the 3D face model of the user with one or more other models of the subject included in the 2D image to create a full-body model of the subject. The native application 130 may combine the 3D face model with models of other people and/or objects of a scene to be rendered by the native application 130 or into a larger model. The larger model may be rendered by the native application 130 or by another application on the client device or on another computing device.

The cloud-based application services 160 may implement a cloud-based application, such as a video game, a 3D modeling application, rendering software for rendering images and/or video of a scene that includes a representation of the human subject whose representation was captured in a 2D image, a 3D image, and/or depth information, an augmented reality or mixed reality application, a communications platform offering video chat and/or other types of messaging, volumetric capture or holographic capture software, and/or another application in which the 3D face model may be utilized. The cloud-based application services 160 may provide software-as-a-service (SaaS) that is accessible over the network 125 from the client device 120. The cloud-based application services 160 may be accessed from a web browser, such as the browser application 135, in some implementations. In other implementations, the cloud-based application services 160 may be accessible via a native application, such as the native application 130, which may be configured to implement a web browser and/or to utilize content provided by the cloud-based application services 160. In some implementations, the cloud-based application services 160 may receive a 2D image, a 3D image, and/or depth information representing a human subject from the client device 120, send the 2D image, the 3D image, and/or the depth information to the 3D face reconstruction services 170, and receive the 3D face model from the 3D face reconstruction services 170 in response to the request. The cloud-based application services 160 may use the 2D image, the 3D image, and/or the depth information and/or the 3D face model of the subject when providing services to a user of the client device 120 and/or other client devices (not shown). Furthermore, the cloud-based application services 160 may utilize the 3D face model of the user with one or more other models of the subject to create a model of the subject and/or models of other people and/or objects of a scene to be rendered by the cloud-based application services 160.

The examples that follow describe various aspects of ReDA. A comparison of the technical benefits of ReDA over conventional 3D face model reconstruction approaches is discussed first. Example implementations of ReDA follow the discussion of the benefits of ReDA over conventional approaches to 3D face model reconstruction.

Research into 3D face reconstruction may be divided into separate groups based on the input modality (e.g., RGB inputs, which include 2D color information, or RGB-D inputs, which include depth information in addition to the color information), single view or multi-view, optimization-based or learning-based, the face models used, and the different constraints being used. Deep learning-based 3D reconstruction approaches have also been developed that target either geometry only or both geometry and texture for monocular input. Most of these conventional approaches attempt to boost the reconstruction accuracy through the addition of prior knowledge, such as by using a parametric face model, or by adding more constraints, such as sparse landmark loss, perception loss, or photometric loss. ReDA follows the latter approach, adding more discriminating constraints to reduce ambiguities. ReDA utilizes discriminating constraints that go beyond the color constraint used by conventional approaches to 3D face reconstruction such as 3DMM, providing significant improvements in 3D face reconstruction. Implementations of ReDA may utilize depth constraints and a face parsing mask to provide significant improvements in the resulting 3D face model. Other constraints may be used in addition to and/or instead of one or more of these additional constraints to further improve the resulting 3D face model.

Differentiable rendering, or “DR” as it is referred to herein, is an example of one conventional approach that attempts to boost reconstruction accuracy through prior knowledge. DR is a type of reverse rendering of the 3D model from a 2D image and has become widely used in deep learning systems used for face reconstruction. One conventional approach to applying DR to 3D face reconstruction trains a progressive generative adversarial network (GAN) to learn the highly nonlinear texture representation of the face as opposed to using the traditional linear principal component analysis (“PCA”) model. This approach may provide high quality results but is impractical in many situations. The GAN requires an extensive amount of training data to properly train the machine learning models used by the GAN. For example, a typical implementation may require tens of thousands of high-quality face texture scans to be used as training data for the GAN. Acquiring such an extensive amount of training data is difficult and impractical. In contrast, ReDA relies on additional constraints rather than prior knowledge to avoid the need to obtain such an extensive set of training data.

Many conventional DR implementations also have another significant limitation, which is that these implementations use Z-buffer rasterization, which is not truly differentiable. This shortcoming arises because each pixel is only influenced by the three discrete vertices of its enclosing triangle. One attempt to address this shortcoming of DR is the Soft Rasterizer (“SoftRas”), which is fully differentiable. However, while SoftRas has shown impressive results for some 3D objects, SoftRas is not designed for 3D face reconstruction. SoftRas also exhibits several shortcomings that impact the resulting 3D model, including: (1) SoftRas uses the single constraint of color, (2) SoftRas performs aggregation over individual triangles rather than mesh slices, and (3) SoftRas operates on vertex color. ReDA overcomes each of these shortcomings of SoftRas by: (1) operating on additional constraints such as depth and a face parsing mask, (2) using multi-scale convolution operations to perform aggregation across mesh slices, and (3) operating on UV coordinates rather than vertex color. The implementation details associated with each of these improvements are discussed in the examples that follow.

Semantic Face Segmentation is another conventional approach to 3D face reconstruction. One approach to Semantic Face Segmentation proposes a real-time facial segmentation model which masks out occluded facial regions before sending the masked data to a Displaced Dynamic Expression (DDE) tracking model for processing. Another conventional approach to Semantic Face Segmentation leverages a face segmentation model to exclude areas of the face occluded by glasses, hair, and/or the person’s hand or hands so that these elements do not contribute to the optimization process. Yet another conventional approach uses segmentation information to assign heuristically defined weights to different facial regions in the reconstruction loss function used in that approach. However, none of the conventional approaches have directly leveraged a face parsing mask to build the dense correspondence and to improve the reconstruction as in ReDA. Details of the usage of the face parsing mask will be described in greater detail in the examples that follow.

Dense Face Correspondence (“DFC”) is another conventional technique for obtaining explicit dense correspondence by directly regressing the per-pixel UV position (or equivalent flow). However, the per-pixel ground truth UV in DFC was obtained through 3DMM fitting, which limits the expressiveness space due to the limits of 3DMM capacity. Hence, any dense correspondence regression model trained through such supervised learning would also be limited. ReDA overcomes this capacity limit by adding a free-form deformation layer that can support out-of-space modeling.

A goal of 3D face reconstruction is to build dense correspondence between the 3D face model and the geometry of the face of the human subject included in a 2D image. Many face reconstruction techniques use a deformable mesh to represent the facial features. A significant challenge in 3D face reconstruction is building a dense correspondence between the 2D input image that includes the face and the 3D deformable mesh representing the face in the input image. Conventional approaches to 3D face reconstruction include both implicit and explicit approaches for building dense correspondence. One commonly used implicit approach is the “Analysis-by-Synthesis” approach. The “Analysis-by-Synthesis” approach attempts to minimize the visual differences between an input 2D image and a 2D synthesis of an estimated 3D face through a simplified image formulation model. A commonly used explicit approach is to learn the dense correspondence first by directly regressing the per-pixel UV position (or equivalent flow) and fitting the 3D face model afterwards. This explicit approach to 3D face reconstruction uses 3DMM fitting to obtain the ground truth. The regression model must then be trained through supervised learning. While this approach can provide more accurate 3D reconstruction, training the model through supervised learning may not be practical.

ReDA addresses several fundamental technical problems that have not been addressed by the conventional approaches to 3D face reconstruction discussed above. A first fundamental technical problem overcome by ReDA is that the capacity of the 3DMM significantly limits the representation power to support diverse geometry variations. Some approaches to 3D face reconstruction propose directly learning dense correspondence through UV mapping and claim to be model-free. However, the ground truth space of these approaches is still limited by the capacity of 3DMM. Recently, attempts have been made to represent the geometry in a free-form manner, but ReDA provides better correspondence between the projected face shape and the regions of the face represented in the 2D image by using additional discriminating constraints, as discussed in the examples that follow. A second fundamental technical problem solved by ReDA is that the differentiable renderer used in the “Analysis-by-Synthesis” paradigm is not truly “differentiable.” Most of the conventional techniques simply use Z-buffer rendering, which is not necessarily differentiable where the nearest vertex indices change for each pixel during the optimization. A third fundamental technical problem solved by ReDA is that the expressiveness of the pretrained texture models used by some conventional approaches to 3D face reconstruction was a limiting factor in the correspondence between the projected face shape and the regions of the face represented in the 2D image. If the texture used is overly smooth, the texture will not be useful as a discriminating constraint to drive optimization and correct the correspondence between the projected face shape and the regions of the face represented in the source 2D image of the human subject. For at least these reasons, ReDA may significantly improve correspondence between the projected face shape and the regions of the face represented in the source 2D image of the human subject.

ReDA may implement a face fitting pipeline based on the standard “Analysis-by-Synthesis” pipeline, in which, for a given input image, the pipeline outputs the parameters of a 3D face model such that a 2D projection of that model matches the input image. The pipeline may be optimized by: (1) replacing differential rendering with Reinforced Differentiable Attribute (ReDA) rendering, and (2) introducing a free-form deformation layer that expands the modeling capacity for better geometry representation. FIG. 5 illustrates an example face fitting pipeline 500 according to these techniques. The elements of the face fitting pipeline 500 will be discussed in greater detail in the examples that follow.

ReDA may also determine photometric loss and 2D landmark loss on a rendered color image generated by the ReDA rasterizer used by the face fitting pipeline 500. The photometric loss and the 2D landmark loss may be used to refine a machine learning model used by the face fitting pipeline 500 to analyze the 2D input images of human subjects. The photometric loss may be determined by measuring the differences between the 2D input image and the 2D projection of the 3D face model, and the 2D landmark loss may be determined by measuring differences between facial landmarks in the input image and the 2D projection. ReDA focuses on obtaining a better face shape using these constraints. A parametric model may be used to represent the base mesh of the face, which may provide a coarse 3D representation of the facial features. A free-form deformation layer may then optimize the per-vertex displacement of the 3D face model after optimizing the parameters of the pretrained face model. To avoid nonsensible displacements, as-rigid-as-possible constraints are added to regularize the deformation between the base mesh and the final mesh after adding the displacement during training. The ReDA module itself includes: (1) a convolution-based soft rasterizer that supports error propagation from one pixel to every vertex (see FIG. 5), and (2) a pipeline that aggregates multiple attributes as constraints to drive the optimization.
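The following sketch outlines, in simplified form, an optimization loop in the spirit of the fitting pipeline described above: 3DMM coefficients and per-vertex displacements are optimized against photometric, mask, landmark, and rigidity losses. The renderer, landmark projector, and ARAP energy are passed in as assumed callables because their details are described elsewhere; this is a sketch under those assumptions, not the pipeline's actual implementation.

import torch


def fit_face(base_model, render_fn, landmark_fn, arap_fn,
             image, gt_mask, gt_landmarks, steps=200, lr=1e-2):
    # 3DMM shape/expression coefficients and per-vertex free-form displacements.
    alpha = torch.zeros(base_model.num_shape, requires_grad=True)
    beta = torch.zeros(base_model.num_expr, requires_grad=True)
    delta = torch.zeros(base_model.num_vertices, 3, requires_grad=True)
    opt = torch.optim.Adam([alpha, beta, delta], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        base_mesh = base_model(alpha, beta)            # coarse parametric mesh
        mesh = base_mesh + delta                       # free-form corrected mesh
        rendered_rgb, rendered_mask = render_fn(mesh)  # soft (ReDA-style) rendering
        loss = (torch.nn.functional.l1_loss(rendered_rgb, image)        # photometric loss
                + torch.nn.functional.l1_loss(rendered_mask, gt_mask)   # face parsing mask loss
                + torch.nn.functional.mse_loss(landmark_fn(mesh), gt_landmarks)  # 2D landmark loss
                + arap_fn(base_mesh, mesh))            # as-rigid-as-possible regularizer
        loss.backward()
        opt.step()
    return (base_model(alpha, beta) + delta).detach()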

ReDA: Reinforced Differentiable Attribute

The examples that follow illustrate various implementation details of ReDA. ReDA provides an optimization framework that steers the mesh deformation toward the correct shape until the final correspondence between the source image and the 2D projection of the 3D model is achieved. The optimization framework is based on the “Analysis-by-Synthesis” pipeline. Furthermore, ReDA extends the differentiable attributes beyond the color attribute relied upon by conventional differential rendering techniques to include depth and/or face parsing mask attributes. Unless otherwise specified, in the examples that follow, the term $\mathcal{A}$ is used to represent the differentiable attributes, including color ($\mathcal{C}$), mask ($\mathcal{M}$), and depth ($\mathcal{D}$), respectively. The color, mask, and depth attributes may be used together or in subcombinations of these differentiable attributes. $\mathcal{A}$ may be augmented with additional attributes instead of or in addition to one or more of these differentiable attributes.

ReDA may extend the differentiable attributes to include a face parsing mask in the differentiable procedure and use the face parsing mask to drive the correspondence learning. The following examples illustrate how ReDA can be applied to an input image. For an input image $I$, the term $\mathcal{M}(I)$ represents the face parsing output of ReDA and the term $\mathcal{M}_{gt}(I)$ represents the face parsing mask ground truth. The ground truth may be obtained by either human annotation or a well-trained face parsing model. The term $\mathcal{M}_{UV}$ represents the mask UV map for the mesh template, which defines the semantic label (i.e., eyebrow, upper lip, or other region of the face) of each vertex of the face parsing mask. When color is used as the differentiable attribute, represented by the term $\mathcal{C}$, a corresponding texture UV map $\mathcal{C}_{UV}$ may also be provided. In the following example, a cylindrical unwrap function $\phi$ is used to map a triangle vertex $p$ into the corresponding position in the UV map, where $UV(p) = \phi(p)$. For any surface point $V_S$ on the surface of the shape $S$, the UV coordinates can be determined using the equation:

$$UV(V_S) = (u, v) = \sum_{p \in t} \lambda_p\, \phi(p) \qquad (1)$$

where $t = \{p_a, p_b, p_c\}$ represents the three vertices of the triangle that encloses the point $V_S$, and $\lambda_p$ represents the barycentric coordinate of vertex $p$. Where $\mathcal{M}$ is used, the mask attribute value $\mathcal{A}_S(V_S)$ for the surface point $V_S$ is computed via bi-linear sampling as:

$$\mathcal{A}_S(V_S) = \sum_{u' \in \{\lfloor u \rfloor, \lceil u \rceil\}} \sum_{v' \in \{\lfloor v \rfloor, \lceil v \rceil\}} (1 - |u - u'|)(1 - |v - v'|) \cdot \mathcal{M}_{UV}(u', v') \qquad (2)$$

A rendering pipeline may then be used to convert the per-vertex attribute values on 3D shapes to per-pixel attribute values on 2D images. For example, the ReDA rasterizer pipeline 555 shown in FIG. 5 may be used to render the 2D images based on the 3D shapes and per-pixel attribute values. The term $P_{cam}$ represents the camera projection matrix, and the term $P_{pos}$ represents the pose of the mesh in the camera coordinate system. Assuming that the closest surface point $V_j$ (based on the depth value) on the shape $S$ maps to the pixel $I_i$ on the 2D image $I$ after rendering, then the corresponding mask value $\mathcal{A}_I(I_i)$ can be computed through the rendering function $\mathcal{R}$:

$$\mathcal{A}_I(I_i) = \mathcal{R}(P_{pos}, P_{cam}, V_j, \mathcal{A}_S(V_j)) \qquad (3)$$

A process similar to that illustrated in equations 1, 2, and 3 may be applied for other attributes, such as color $\mathcal{C}$, if the term $\mathcal{M}_{UV}$ is replaced with the term $\mathcal{C}_{UV}$ in the UV space. This approach to DR is quite different from conventional approaches in which $\mathcal{R}$ is simply defined as the Z-buffer rendering function, where each pixel is only influenced by the nearest triangle that encloses $V_j$, which is not truly differentiable.
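A small NumPy sketch of equations (1) and (2) is shown below: a surface point's UV coordinate is the barycentric combination of the per-vertex UVs of its enclosing triangle, and the attribute value is then read from the attribute UV map (such as the mask map) by bi-linear sampling. Array shapes and boundary handling are illustrative assumptions.

import numpy as np


def surface_uv(vertex_uv: np.ndarray, tri: np.ndarray, bary: np.ndarray) -> np.ndarray:
    # Equation (1): UV of a surface point is the barycentric combination of the
    # per-vertex UVs (phi(p)) of its enclosing triangle.
    # vertex_uv: (N, 2) per-vertex UV coordinates; tri: (3,) vertex indices of
    # the enclosing triangle; bary: (3,) barycentric coordinates of the point.
    return (bary[:, None] * vertex_uv[tri]).sum(axis=0)


def bilinear_sample(uv_map: np.ndarray, u: float, v: float):
    # Equation (2): attribute value read from the UV map (e.g., the mask map
    # M_UV) as a bi-linear blend of the four neighboring texels.
    h, w = uv_map.shape[:2]
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    value = 0.0
    for uu in (u0, u0 + 1):
        for vv in (v0, v0 + 1):
            weight = (1.0 - abs(u - uu)) * (1.0 - abs(v - vv))
            value = value + weight * uv_map[np.clip(uu, 0, h - 1), np.clip(vv, 0, w - 1)]
    return value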

Soft Rasterization via Convolution Kernel

ReDA may utilize soft rasterization via a convolution kernel to remedy the Z-buffer limitation of differentiable rendering. To do so, the discrete sampling (through the enclosed triangle) is relaxed into a continuous probabilistic procedure in which each pixel is influenced by all the vertices of the mesh with a corresponding weighted probability. After projection, the closer a pixel is to a projected vertex, the higher the probability that the pixel is influenced by that vertex. Before projection, the further a vertex is along the Z (depth) direction, the less weight is imposed on the corresponding probability.

One way to achieve this is to project each triangle $t$ onto the image plane and to rasterize all the enclosed pixels to get a rendered image. In this way, the triangle $t$ is only influenced by those enclosed pixels and their corresponding attribute (color, mask, or depth) values if the triangle is visible to the camera. To make this rasterization “soft,” a convolutional kernel may be applied to “blur” the rendered image so that the attribute may be propagated outside of the triangle. The term $\mathcal{A}_t^j$ and the term $\mathcal{Z}_t^j$ represent the attribute and the Z value, respectively, for each enclosed pixel $j$ of triangle $t$, $\mathcal{N}(t)$ represents the enclosed pixel set of $t$, so $j \in \mathcal{N}(t)$, and $S$ represents the whole triangle set. The soft rendering results may then be aggregated across all the triangles:

$$\mathcal{A}_I(I_i) = \sum_{t \in S} \sum_{j \in \mathcal{N}(t)} w_j^i\, \mathcal{A}_t^j \qquad (4)$$

where

$$w_j^i = \frac{D_j^i \exp(\mathcal{Z}_t^j/\gamma)}{\sum_k D_k^i \exp(\mathcal{Z}_t^k/\gamma)}, \qquad D_k^i = \mathrm{sigmoid}\!\left(-\frac{\lVert i - k \rVert^2}{\sigma}\right), \quad k \in \bigcup_{t \in S} \mathcal{N}(t),$$

and both $\sigma$ and $\gamma$ are set to $1 \times 10^{-4}$. Each enclosed pixel attribute value $\mathcal{A}_t^j$ of triangle $t$ is first obtained via traditional per-triangle rasterization. The soft rasterization is then implemented as spatial Gaussian filtering operations with varying kernel sizes to help propagate the attribute values outside of the triangle. The softening and aggregation may be performed on a per-triangle basis. However, this approach may be too computationally intensive and memory inefficient. Thus, an alternate approach is illustrated in FIG. 2 (described below), in which approximation is performed on mesh slices: all the triangles belonging to the same depth zone are rendered in the same image representing a mesh slice, and aggregation across the mesh slices is then performed to generate a rendered image. In some implementations, the slices are taken along the Z-axis.
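The slice-based soft rasterization described above might be sketched as follows: per-slice attribute images (assumed to be already rasterized) are blurred with a convolution so attributes propagate beyond triangle boundaries, and the slices are aggregated with depth-weighted softmax weights in the spirit of equation (4). The kernel, tensor shapes, and temperature value are illustrative choices, not the exact implementation.

import torch
import torch.nn.functional as F


def soft_aggregate(slice_attrs: torch.Tensor, slice_depths: torch.Tensor,
                   kernel_size: int = 7, gamma: float = 1e-4) -> torch.Tensor:
    # slice_attrs:  (S, C, H, W) per-slice rasterized attribute images
    # slice_depths: (S, 1, H, W) per-slice Z values (larger = closer here)
    s, c, h, w = slice_attrs.shape
    # Simple blur implemented as a depthwise convolution (a Gaussian kernel
    # could be substituted); this propagates attributes outside triangles.
    kernel = torch.ones(c, 1, kernel_size, kernel_size) / (kernel_size ** 2)
    blurred = F.conv2d(slice_attrs, kernel, padding=kernel_size // 2, groups=c)
    # Depth-based softmax weighting across slices, so closer slices dominate.
    weights = torch.softmax(slice_depths / gamma, dim=0)   # (S, 1, H, W)
    return (weights * blurred).sum(dim=0)                  # (C, H, W) aggregated image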

Equation 4, shown above, may be implemented as a multi-channel 2D convolution operation, where the kernel size can be varied for different scales of softening. The bigger the kernel size, the broader the impact each pixel will have on the other vertices. In some implementations, the same convolution kernel may be stacked several times with stride 2 to generate a pyramid of rendered attribute images. A photometric-like loss may then be applied at each scale of the pyramid between the rendered attribute image and the corresponding ground-truth image (color, mask, or depth):

$$L_{ReDA} = \sum_k \left\lVert \mathrm{Pyd}\big(\mathcal{A}_I(I), k\big) - \mathrm{Pyd}\big(\mathcal{A}_{gt}(I), k\big) \right\rVert_1 \qquad (5)$$

where $\mathrm{Pyd}$ is a function returning the $k$-th scale of the softened version.
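A toy version of the multi-scale loss in equation (5) is sketched below, using average pooling with stride 2 as the softening/downsampling operator between pyramid levels; that operator choice is an assumption made for illustration.

import torch
import torch.nn.functional as F


def reda_pyramid_loss(rendered: torch.Tensor, target: torch.Tensor, levels: int = 3) -> torch.Tensor:
    # rendered, target: (N, C, H, W) attribute images (color, mask, or depth).
    loss = torch.zeros((), dtype=rendered.dtype)
    for _ in range(levels):
        loss = loss + F.l1_loss(rendered, target)        # L1 term at this scale
        rendered = F.avg_pool2d(rendered, kernel_size=3, stride=2, padding=1)
        target = F.avg_pool2d(target, kernel_size=3, stride=2, padding=1)
    return loss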

FIG. 2 provides a comparison 200 of the ReDA rasterizer described above (shown in the upper portion of the diagram) and the SoftRas soft rasterizer (shown in the lower portion of the diagram). FIG. 2 shows implementation differences between the two rasterization processes and provides a comparison of results provided by each process. The rasterization is performed on mesh slices to provide a less computationally intensive and more memory efficient approach to rasterization than performing the softening and aggregation on all triangles.

In the example shown in FIG. 2, the soft rasterizer receives two inputs: (1) a 3D mesh 260 representing the geometry of the face of a subject, and (2) an RGB texture 255 to be applied to the 3D mesh 260. The soft rasterizer applies an aggregation function to a plurality of per-triangle color maps, based on probability maps 265 and the triangles’ relative depths, to obtain the final rendering results 270. In contrast, the ReDA rasterizer receives three inputs: (1) a 3D mesh 215 representing the geometry of the face of the subject, (2) an RGB texture 205 to be applied to the 3D mesh 215, and (3) a semantic mask 210. The 3D mesh 215 and the 3D mesh 260 represent the facial structure of the same subject in this example, and the RGB texture 205 and the RGB texture 255 likewise represent the texturing of the face of the same subject. The semantic mask 210 (also referred to herein as a “face parsing mask”) represents a map for the 3D mesh 215 which defines a semantic label for each vertex. The semantic label may indicate a part of the face with which the vertex is associated, such as but not limited to an eyebrow, nose, upper lip, or other region of the face.

In the ReDA rasterizer, all triangles belonging to the same depth zone may be rendered into the same image, and the results are then aggregated across the different slices 220. For example, the mesh may be sliced along the Z axis into multiple pieces as illustrated in FIG. 2. Rendering results 225 provide an example of the rendering results obtained using the ReDA rasterizer. A comparison of the rendering results 225 and the rendering results 270 shows that the rasterization provided by the ReDA rasterizer provides significantly improved results over the soft rasterizer. The magnified portion of the ReDA rasterizer results 230 and the magnified portion of the soft rasterizer results 275 illustrate the improved results that may be provided by the ReDA rasterizer.

FIG. 3 further illustrates the improved results that may be produced by ReDA. FIG. 3 is a diagram that provides a side-by-side comparison 300 of results produced through Z-buffer rasterization and through the continuous probabilistic procedure provided by ReDA. Column 305 of FIG. 3 includes 2D images of two subjects that serve as input images. Column 310 provides example 3D face shape reconstruction results from rendering each of the subjects using color and a face parsing mask as differentiable attributes when applying ReDA to the input images. Column 315 provides example 3D face shape reconstruction results from rendering each of the subjects using color but not a face parsing mask when applying ReDA to the input images. Column 320 provides an example of 3D face shape reconstruction results in which ReDA was not applied to the input images. The resulting geometries shown in FIG. 3 demonstrate that applying color and mask as differentiable attributes with ReDA can reduce fitting errors and provide geometries that more closely resemble the subjects in the input images.

Free Form Deformation

ReDA introduces a free-form deformation layer that sits on top of 3DMM to provide additional technical benefits that improve 3D face reconstruction. The free-form deformation layer uses both prior knowledge and out-of-space modeling to significantly improve the 3D face reconstruction results over the use of 3DMM alone. The examples that follow describe a parametric base model that may be used by ReDA and shape correction of the parametric base model through free-form deformation.

Parametric Base Model for Free-Form Deformation

Even though a parametric base model, like that provided by 3DMM, has limited modeling capacity, it still provides decent coarse-scale geometry that represents the shape of the face of the subject in the 2D image. The parametric base model may be further refined through shape correction as described in the next section. The parametric base model may significantly reduce the burden of learning for the machine learning model. The techniques disclosed herein may use the following parametric face model to represent the basic face shape S0(α, β):

$$S_0(\alpha, \beta) = \bar{S} + \sum_{k_s=1}^{m_s} \alpha_{k_s} B^{s}_{k_s} + \sum_{k_e=1}^{m_e} \beta_{k_e} B^{e}_{k_e} \qquad (6)$$

where $\bar{S} \in \mathbb{R}^{3N}$ is the average facial geometry. The matrices $[B^s_1, \ldots, B^s_{m_s}]$ and $[B^e_1, \ldots, B^e_{m_e}]$ respectively represent the shape and expression PCA bases learned from high-quality face scans. The numbers of shape and expression bases are represented by $m_s$ and $m_e$, respectively. For a given face image I, the coefficients $[\alpha_1, \ldots, \alpha_{m_s}]$ and $[\beta_1, \ldots, \beta_{m_e}]$ describe the shape of the face. The reflectance model may be similarly defined.
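
A minimal sketch of Equation (6) is shown below, under the assumption that the mean shape is a flattened vector and the bases are stored as column matrices; the names and shapes are illustrative rather than taken from the patent.

```python
import numpy as np

def basic_face_shape(S_bar, B_shape, B_expr, alpha, beta):
    """
    S_bar:   (3N,)      average facial geometry, packed as x0, y0, z0, x1, ...
    B_shape: (3N, m_s)  shape PCA basis
    B_expr:  (3N, m_e)  expression PCA basis
    alpha:   (m_s,)     shape coefficients
    beta:    (m_e,)     expression coefficients
    """
    S0 = S_bar + B_shape @ alpha + B_expr @ beta   # Equation (6)
    return S0.reshape(-1, 3)                       # N vertices as rows
```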

Shape Correction via Free-form Deformation

Free-form deformation may be used to provide improved fitting results that capture finer details in a fitted 3D model than results obtained without the use of free-form deformation. FIG. 4 is a diagram providing a comparison 400 of examples of 3D face fitting results both with and without free-form deformation. The input images for two subjects are in the left-most column 405 of the diagram. The middle column 410 illustrates results using free-form deformation on the two input images, and the right-most column 415 illustrates results that were generated without free-form deformation being performed on the input images. As can be seen from this example, the use of free-form deformation can significantly improve the geometry details in important face regions to better convey the input identity. The example illustrated in FIG. 4 demonstrates that free-form deformation may provide a fitted 3D model with significant improvements in the details around the cheek and mouth regions. Free-form deformation may provide fitted 3D models with improvements in the details of other regions of the face in addition to or instead of the regions discussed in this example.

In contrast with some conventional techniques for 3D face reconstruction that model the correction in parameter space, the techniques disclosed herein directly model the displacement in vertex space. As shown in FIG. 5, the network 515 outputs a corrective shape residual ΔS in parallel with the 3DMM parameters. The term S′ represents the final deformed mesh, hence S′ = S0 + ΔS. As discussed above, S0 models the coarse geometry of the face, and ΔS models the deformation needed to fill the gap between S0 and the final correct shape S′. As S0 and S′ have a natural per-vertex correspondence, the transformation from S0 to S′ is referred to herein as free-form deformation.
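
The vertex-space correction itself reduces to a per-vertex addition. A small sketch is shown below, in which regressor is a placeholder standing in for the network 515 rather than the patent's actual architecture.

```python
import torch

def deform_mesh(S0, image_features, regressor):
    # S0: (N, 3) coarse 3DMM vertices; regressor maps image features to a
    # (N * 3)-dimensional corrective residual, reshaped to match S0.
    delta_S = regressor(image_features).view_as(S0)
    return S0 + delta_S        # S' = S0 + delta_S
```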

The techniques disclosed herein use an as-rigid-as-possible (ARAP) deformation constraint with respect to the free-form deformation. Such regularization may be necessary to prevent the mesh from deforming into a nonsensical shape. The term $C_l$ represents the cell of all triangles centered at vertex $p_l$, and the term $C'_l$ represents the deformed version of that cell. If the deformation is rigid, then there exists a rotation matrix $R_l$ such that:

$$p'_l - p'_m = R_l\,(p_l - p_m), \quad \forall m \in \mathcal{N}(l) \qquad (7)$$

for each edge emanating from vertex $p_l$ (respectively $p'_l$) to its neighbor $p_m$ ($p'_m$) in the cell, where $\mathcal{N}(l)$ denotes the set of vertex indices connected to vertex $p_l$. In the context of the ARAP constraint, the following per-cell loss function is minimized:

$$L(C_l, C'_l) = \sum_{m \in \mathcal{N}(l)} w_{lm} \left\lVert (p'_l - p'_m) - R_l\,(p_l - p_m) \right\rVert^2 \qquad (8)$$

With respect to the whole mesh, the total rigidity may be encouraged by summing the above loss over every cell:

$$L_{ARAP} = \sum_{l=1}^{n} w_l \sum_{m \in \mathcal{N}(l)} w_{lm} \left\lVert (p'_l - p'_m) - R_l\,(p_l - p_m) \right\rVert^2 \qquad (9)$$

where both $w_l$ and $w_{lm}$ are set according to the techniques disclosed in "As-rigid-as-possible surface modeling" by Olga Sorkine-Hornung and Marc Alexa, in Symposium on Geometry Processing, 2007, which is incorporated herein by reference. In addition to the above loss, a smoothness term is also added to penalize the rotation difference between adjacent cells. The final free-form deformation layer minimizes the following loss (referred to as "FFD ARAP" 545 in FIG. 5):

$$L(R, \Delta S) = L_{ARAP} + \lambda \sum_{l=1}^{n} \sum_{m \in \mathcal{N}(l)} \left\lVert R_l - R_m \right\rVert^2 \qquad (10)$$

where R is the set of all $R_l$, $l \in [1, \ldots, n]$, and λ is set empirically to 0.001 in this example implementation. Each $R_l$ is initialized as the identity matrix, and the process alternates between optimizing ΔS while fixing R and optimizing R while fixing ΔS. At the end, the entire system can be trained end-to-end by combining the attribute loss $\mathcal{L}_{ReDA}$ and L(R, ΔS) together with the 2D landmark loss.
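
A hedged sketch of Equations (8) through (10) and of the alternating scheme just described is given below. It uses uniform weights instead of the cotangent weights from the cited ARAP paper, omits the reflection fix in the per-cell rotation fit, and assumes delta_S is a leaf tensor with requires_grad=True; none of these choices is prescribed by the patent.

```python
import torch

def arap_loss(P, P_def, neighbors, R):
    # Equation (9) with uniform weights. P, P_def: (N, 3) rest and deformed
    # vertices; neighbors[l] lists the vertex indices connected to vertex l;
    # R: (N, 3, 3) per-cell rotations.
    loss = P.new_zeros(())
    for l, nbrs in enumerate(neighbors):
        e = P[l] - P[nbrs]                 # rest-pose edges of cell l
        e_def = P_def[l] - P_def[nbrs]     # deformed edges of cell l
        loss = loss + ((e_def - e @ R[l].T) ** 2).sum()
    return loss

def rotation_smoothness(neighbors, R):
    # Smoothness term of Equation (10): penalize rotation differences
    # between adjacent cells.
    loss = R.new_zeros(())
    for l, nbrs in enumerate(neighbors):
        loss = loss + ((R[l] - R[nbrs]) ** 2).sum()
    return loss

def fit_rotations(P, P_def, neighbors):
    # Classic ARAP local step: the best rotation per cell from an SVD of the
    # edge covariance (reflection fix omitted for brevity).
    R = torch.empty(P.shape[0], 3, 3)
    for l, nbrs in enumerate(neighbors):
        e = P[l] - P[nbrs]
        e_def = P_def[l] - P_def[nbrs]
        U, _, Vt = torch.linalg.svd(e.T @ e_def)
        R[l] = (U @ Vt).T
    return R

def free_form_deformation(S0, delta_S, neighbors, steps=100, lam=1e-3, lr=1e-3):
    # Alternate between optimizing delta_S with Adam while R is fixed and
    # re-fitting R to the current deformation while delta_S is fixed.
    R = torch.eye(3).repeat(S0.shape[0], 1, 1)     # each R_l starts as identity
    opt = torch.optim.Adam([delta_S], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = arap_loss(S0, S0 + delta_S, neighbors, R) \
               + lam * rotation_smoothness(neighbors, R)
        loss.backward()
        opt.step()
        with torch.no_grad():
            R = fit_rotations(S0, S0 + delta_S, neighbors)
    return S0 + delta_S
```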

FIG. 5 is a diagram of an example 3D face fitting pipeline 500 that may implement the various ReDA techniques disclosed herein. The face fitting pipeline 500 may receive source data 505 representing the face of a human subject for whom a 3D face model is to be constructed. The source data 505 may be a 2D (RGB) image, a 3D (RGB-D) image, and/or depth (D) information representing the face of a human subject. The network 515 may analyze the source data 505 and output various parameters for various modules of the face fitting pipeline 500. The network 515 may be implemented by various types of machine learning architectures, such as deep neural networks (DNNs), recurrent neural networks (RNNs), convolutional neural networks (CNNs), and/or other types of neural networks.

The network 515 may output the corrective shape residual ΔS 520 in parallel with the 3DMM parameters. The 3DMM parameters include the coefficients 525 and the parameters 510. The coefficients 525 describe the shape of the face (α) and the skin reflectance (β) of the subject in the source data 505. The parameters 510 include Pcam, representing the camera projection matrix, and Ppose, representing the pose of the mesh in the camera coordinate system. The 3DMM module 530 provides the parametric base model S0, which models the coarse geometry of the face from the source data 505. S′ represents the final deformed mesh, in which the mesh is deformed according to the residual ΔS 520. The residual ΔS 520 models the deformation needed to fill the gap between S0 and the final correct shape S′. Therefore, S′ = S0 + ΔS.

The ReDA rasterization pipeline 555 generates the image(s) 560, which provide a 2D representation of the 3D model of the face from the source data 505. The image 560 can be compared with the ground-truth masked image 565 to determine the loss function 570. The loss function 570 represents a difference between the ground-truth masked image 565 and the output from the ReDA rasterization pipeline 555. FIG. 2 illustrates an example implementation of the ReDA rasterization pipeline 555.

The free-form deformation (FFD) layer includes the FFD loss module 545 and the FFD modules 535, 540, and 550 in this example implementation. The FFD layer minimizes the free-form deformation loss using the techniques discussed in the preceding examples. Other losses are omitted in FIG. 5 for the sake of clarity but may be included in other implementations for training the elements of the pipeline according to these additional attributes. The FFD loss module 545 may be configured to determine the free-form deformation loss according to equation (10) discussed above. The FFD module 535 sits on top of the 3DMM module 530 and provides out-of-space modeling that ensures that the mesh geometry has enough space to fit any 2D image, 3D image, and/or depth information included in the source data 505. This approach overcomes the capacity limitations of 3DMM by deforming the mesh geometry outside of 3DMM and providing the deformed mesh shape S′ to the 3DMM module 530. The FFD module 550 provides the deformed mesh shape S′ to the ReDA rasterization pipeline 555, which generates the image(s) 560 from the fitted model.

FIG. 6 is a diagram illustrating a comparison 600 of the results of the techniques disclosed herein with another 3D face shape reconstruction technique referred to as "RingNet." RingNet learns to compute 3D face shape from a single image. However, as can be seen in FIG. 6, the ReDA techniques disclosed herein may provide fits that are much closer to the input identities than the results produced by RingNet. The diagram in FIG. 6 includes a row of input images 605. The row of images 605 is analyzed by the techniques disclosed herein and by RingNet. The row of images 610 illustrates the output of ReDA, and the row of images 615 shows the results obtained from RingNet. The row of images 620 shows the results from row 610 rendered with 0.7 alpha blending to show the high alignment quality obtained from ReDA.

FIG. 7 is a diagram illustrating a comparison 700 of the results of the techniques disclosed herein with another 3D face shape reconstruction technique referred to as Face Model Learning ("FML"). As can be seen in FIG. 7, the ReDA techniques disclosed herein can provide fits that are much closer to the input identities than the results produced by FML. The diagram in FIG. 7 includes a row of input images 705, which are the same input images 605 from FIG. 6. These images are analyzed by the techniques disclosed herein and by FML. The row of images 710 illustrates the output of ReDA, and the row of images 715 illustrates the results obtained from FML. The row of images 720 illustrates the results obtained from ReDA rendered with 0.7 alpha blending to show the high alignment quality obtained from ReDA.

FIG. 8 is a flow diagram of a process 800 for generating a 3D model of a face from a 2D image. The process 800 may be implemented on a data processing system, such as the machine 1000 illustrated in FIG. 10. The process 800 may be implemented on a client device, such as the client device 120. The process 800 may also be implemented by 3D face reconstruction services, such as 3D face reconstruction services 170.

The process 800 may include an operation 810 of obtaining a 2D image of a face of a human subject. The 2D image may be obtained from a camera or other image sensor of the device, as discussed with respect to FIGS. 1A and 1B. The 2D image may also be obtained from an external source. For example, the 2D image may be obtained from an image archive, a social media platform, or other source of 2D images. The 2D image may be stored in one of many digital image file formats, including but not limited to Joint Photographic Experts Group (JPEG), Graphics Interchange Format (GIF), Tagged Image File Format (TIFF), Device-Independent Bitmap (DIB), Bitmap Image File (BMP), Portable Network Graphics (PNG), and/or other digital image file formats. The 2D image may be received over a network, such as the network 125. For example, the client device 120 or the cloud-based application services 160 may send a request to the 3D face reconstruction services 170 for a 3D face model of a subject in the 2D image.

The process 800 may also include an operation 820 of generating a three-dimensional (3D) model of the face of the human subject based on the 2D image by analyzing the 2D image of the face to produce a coarse 3D model of the face of the human subject, and refining the coarse 3D model through free-form deformation to produce a fitted 3D model. The operation 820 may be implemented by the face fitting pipeline 500 illustrated in FIG. 5. Various techniques may be used to produce the coarse 3D model of the face of the human subject included in the 2D image. Some implementations may utilize 3DMM to produce a parametric base model (also referred to herein as a "coarse 3D model") that provides coarse-scale geometry of the face of the subject. The coarse 3D model may be refined through free-form deformation to generate the fitted 3D model, with an as-rigid-as-possible (ARAP) deformation constraint applied to regularize the deformation and prevent the coarse 3D model from deforming into nonsensical shapes.

ReDA was tested on two datasets: (1) the Media Integration and Communication Center (MICC) dataset, and (2) the 3D Facial Expression Database provided by Binghamton University (BU-3DFE).

The MICC dataset includes scans of 53 subjects. Texture images from frontal pose scans were used for fitting experiments. The texture images in the dataset include both left-side and right-side views. The left-side views were selected for testing, and the scans were cropped at a 95 mm radius around the tip of the nose of the subject included in the selected scans to better evaluate reconstruction of the inner face.

The BU-3DFE dataset includes scans of 100 subjects from diverse racial, age, and gender groups. Each subject has 25 scans with different expressions. For testing ReDA, scans and images of neutral faces were selected. Furthermore, left-side view texture images were selected for use in testing.

To directly test the effectiveness of ReDA, experiments with the fitting-based method shown in FIG. 5 were performed. The pipeline disclosed herein may also be utilized with learning-based methods. The fitting method utilized by ReDA implements stochastic gradient descent (SGD) optimization using the ADAM optimizer, and 2D landmark loss is used by default. First, landmark detection (including the invisible contour line) and face parsing are performed on the input image to extract face landmarks and facial masks. Second, landmark loss is applied to optimize the rigid pose Ppose in Equation 4 so that the pose of the template mesh is roughly aligned with the input image. The attribute loss (Equation 5) and landmark loss are then applied to jointly optimize the rigid pose and other model parameters. Free-form deformation is performed after optimization of the model parameters.
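
The staged schedule might be organized roughly as follows. The parameter container, the loss callables, the learning rate, and the step counts are placeholders used for illustration; they are not the patent's actual implementation.

```python
import torch

def fit_face(params, landmark_loss, attribute_loss, ffd_loss,
             steps=(200, 300, 200), lr=1e-2):
    # params is assumed to expose pose, alpha, beta, and delta_S as tensors
    # with requires_grad=True.

    # Stage 1: roughly align the template pose using only the landmark loss.
    opt = torch.optim.Adam([params.pose], lr=lr)
    for _ in range(steps[0]):
        opt.zero_grad()
        landmark_loss(params).backward()
        opt.step()

    # Stage 2: jointly optimize pose and model coefficients with the
    # attribute loss (Equation 5) plus the landmark loss.
    opt = torch.optim.Adam([params.pose, params.alpha, params.beta], lr=lr)
    for _ in range(steps[1]):
        opt.zero_grad()
        (attribute_loss(params) + landmark_loss(params)).backward()
        opt.step()

    # Stage 3: free-form deformation last, after the model parameters fit.
    opt = torch.optim.Adam([params.delta_S], lr=lr)
    for _ in range(steps[2]):
        opt.zero_grad()
        (attribute_loss(params) + ffd_loss(params)).backward()
        opt.step()
    return params
```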

To measure the error between ground truth and predictions produced using these techniques, the iterative closest point (ICP) algorithm is applied to automatically find the correspondence between meshes. Point-to-plane errors, measured in millimeters, are then calculated. The results for MICC are listed in Table 1100 of FIG. 11 and the results for BU-3DFE are listed in Table 1200 of FIG. 12. Table 1100 provides results of ablation studies on the MICC dataset, in which Z-buffer rasterization was used unless ReDA rasterization is specified. Table 1200 provides results of ablation studies on the BU-3DFE dataset, in which ReDA rasterization is used by default and ground-truth depth is assumed to be given where the depth attribute is used.
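
For reference, once ICP has supplied the correspondences, the point-to-plane metric can be computed as in the sketch below; the ICP step itself is assumed to come from an external implementation, and the values are in millimeters only if the meshes are expressed in millimeters.

```python
import numpy as np

def point_to_plane_error(pred_pts, gt_pts, gt_normals, correspondences):
    """
    pred_pts:        (M, 3) predicted mesh vertices
    gt_pts:          (K, 3) ground-truth mesh vertices
    gt_normals:      (K, 3) unit normals of the ground-truth mesh
    correspondences: (M,)   index of the matched ground-truth vertex (from ICP)
    """
    diff = pred_pts - gt_pts[correspondences]
    # Project each residual onto the ground-truth surface normal.
    dist = np.abs(np.sum(diff * gt_normals[correspondences], axis=1))
    return dist.mean(), dist.std()
```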

The effectiveness of differentiable attributes was tested by applying photometric loss to enforce color consistency between the images and the color projected from the 3D shapes. 3D shape color was approximated by utilizing a PCA texture model trained from 112 scans, with lighting approximated by Spherical Harmonics illumination. For the mask attribute image, a face parsing model was first applied to the images to obtain the ground-truth face parsing masks. To enable facial parsing from 3D shapes, UV maps (e.g., the semantic mask 210 of FIG. 2) are painted in which each facial region (e.g., eyes, nose, ears, etc.) is painted with a discrete color that corresponds to the ground-truth facial mask labels. Since both color and mask attributes have exact correspondence in UV space, those attributes can be directly rendered as images. For images with depth information, the depth attribute is included in the experiments by default. To add the depth attribute to the pipeline, the depth image is rendered for both the ground-truth mesh and the predicted mesh. The rendered depth image can be consumed in the same way as other attribute images by the pipeline, in which the loss between the predicted depth image and the ground-truth depth image is computed. Consistent improvements have been observed as more attributes are combined in the optimization pipeline. As the results in Table 1100 and Table 1200 show, by jointly optimizing the color and mask attributes, relative improvements of 5.1% and 16.1% can be achieved on the MICC dataset compared to optimizing the color attribute and the mask attribute alone, and of 13.9% and 18.4% on the BU-3DFE dataset with the same settings. With the additional depth attribute, the fitting error can be further improved by 52.6%, 47.4%, and 52.5% compared to the color-attribute-alone, mask-attribute-alone, and color+mask attribute settings, respectively. FIG. 5 shows the effectiveness of our proposed differentiable attributes in ReDA.

The effectiveness of ReDA rasterization was also tested. The ReDA rasterization disclosed herein turns discrete sampling into a continuous probabilistic procedure in which a change of one pixel can influence every vertex in the mesh. The ablation study on the MICC dataset in Table 1100 compares ReDA rasterization to traditional Z-buffer rasterization. The results show that such a procedure can effectively reduce the numerical reconstruction error. Consistent improvements in reconstruction error across various attribute constraints, compared to Z-buffer rasterization, have also been observed. ReDA rasterization reduces the fitting error on MICC by 14.3%, 26.6%, and 23.3% with the color, mask, and color+mask settings, respectively, relative to the Z-buffer rasterization baseline. FIG. 3 also shows this effectiveness in a side-by-side comparison between ReDA in column 310 and the default Z-buffer rasterization in column 320. One factor that may affect the effectiveness of ReDA rasterization is the number of pyramid levels. The ablation study in Table 1300 shows that more pyramid levels can lead to improved performance. Six pyramid levels were used in the ReDA rasterization experiments described herein for testing the effectiveness of ReDA. However, in actual implementations, a greater or fewer number of levels may be used.

The effectiveness of free-form deformation was also tested. To better leverage the image attributes, ARAP free-form deformation is used to ensure that the fitting results are not limited by the capacity of the 3D face model. Free-form deformation is added in the last stage of fitting, after the color, face mask, and depth attributes have already been added. The free-form deformation provided a relative improvement of 11.7% on the BU-3DFE dataset. FIG. 4 shows two examples of fitting results with and without free-form deformation. As shown in FIG. 4, adding the free-form deformation helps add more geometric detail in the important face regions, such as the details around the cheek and mouth, to better convey the input identity.

Quantitatively, due to slight differences in the experimental setup, it may be difficult to compare these tests with conventional 3D face reconstruction techniques. Nevertheless, the fitting errors may still be compared as a reference. On the MICC dataset, Generative Adversarial Network Fitting for High Fidelity 3D Face Reconstruction (GANFit) reports a historically low fitting error (mean: 0.94 mm, SD: 0.106 mm) by using a high-quality texture (GAN) model trained on a large-scale set of 3D scans. Although the input images are different, ReDA achieves a comparable mean point-to-plane error of 0.962 mm with a standard deviation (SD) of 0.146 mm. On the BU-3DFE dataset, a comparison is made with FML, which is a learning-based method taking multiple RGB images as input. ReDA achieves a better result of 1.331 mm mean point-to-plane error with a standard deviation of 0.346 mm, compared to the FML error of 1.78 mm with an SD of 0.45 mm. Qualitatively, FIGS. 6 and 7 show that ReDA provided fits much closer to the input identities.

The detailed examples of systems, devices, and techniques described in connection with FIGS. 1-8 and 11-13 are presented herein for illustration of the disclosure and its benefits. Such examples of use should not be construed to be limitations on the logical process embodiments of the disclosure, nor should variations of user interface methods from those described herein be considered outside the scope of the present disclosure. It is understood that references to displaying or presenting an item (such as, but not limited to, presenting an image on a display device, presenting audio via one or more loudspeakers, and/or vibrating a device) include issuing instructions, commands, and/or signals causing, or reasonably expected to cause, a device or system to display or present the item. In some embodiments, various features described in FIGS. 1-8 and 11-13 are implemented in respective modules, which may also be referred to as, and/or include, logic, components, units, and/or mechanisms. Modules may constitute either software modules (for example, code embodied on a machine-readable medium) or hardware modules.

In some examples, a hardware module may be implemented mechanically, electronically, or with any suitable combination thereof. For example, a hardware module may include dedicated circuitry or logic configured to perform certain operations. For example, a hardware module may include a special-purpose processor, such as a field-programmable gate array (FPGA) or an Application Specific Integrated Circuit (ASIC). A hardware module may also include programmable logic or circuitry that is temporarily configured by software to perform certain operations and may include a portion of machine-readable medium data and/or instructions for such configuration. For example, a hardware module may include software encompassed within a programmable processor configured to execute a set of software instructions. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (for example, configured by software) may be driven by cost, time, support, and engineering considerations.

Accordingly, the phrase “hardware module” should be understood to encompass a tangible entity capable of performing certain operations and may be configured or arranged in a certain physical manner, be that an entity that is physically constructed, permanently configured (for example, hardwired), and/or temporarily configured (for example, programmed) to operate in a certain manner or to perform certain operations described herein. As used herein, “hardware-implemented module” refers to a hardware module. Considering examples in which hardware modules are temporarily configured (for example, programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where a hardware module includes a programmable processor configured by software to become a special-purpose processor, the programmable processor may be configured as respectively different special-purpose processors (for example, including different hardware modules) at different times. Software may accordingly configure a processor or processors, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time. A hardware module implemented using one or more processors may be referred to as being “processor implemented” or “computer implemented.”

Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple hardware modules exist contemporaneously, communications may be achieved through signal transmission (for example, over appropriate circuits and buses) between or among two or more of the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory devices to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output in a memory device, and another hardware module may then access the memory device to retrieve and process the stored output.

In some examples, at least some of the operations of a method may be performed by one or more processors or processor-implemented modules. Moreover, the one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS). For example, at least some of the operations may be performed by, and/or among, multiple computers (as examples of machines including processors), with these operations being accessible via a network (for example, the Internet) and/or via one or more software interfaces (for example, an application program interface (API)). The performance of certain of the operations may be distributed among the processors, not only residing within a single machine, but deployed across several machines. Processors or processor-implemented modules may be in a single geographic location (for example, within a home or office environment, or a server farm), or may be distributed across multiple geographic locations.

FIG. 9 is a block diagram 900 illustrating an example software architecture 902, various portions of which may be used in conjunction with various hardware architectures herein described, which may implement any of the above-described features. FIG. 9 is a non-limiting example of a software architecture and it will be appreciated that many other architectures may be implemented to facilitate the functionality described herein. The software architecture 902 may execute on hardware such as a machine 1000 of FIG. 10 that includes, among other things, processors 1010, memory 1030, and input/output (I/O) components 1050. A representative hardware layer 904 is illustrated and can represent, for example, the machine 1000 of FIG. 10. The representative hardware layer 904 includes a processing unit 906 and associated executable instructions 908. The executable instructions 908 represent executable instructions of the software architecture 902, including implementation of the methods, modules and so forth described herein. The hardware layer 904 also includes a memory/storage 910, which also includes the executable instructions 908 and accompanying data. The hardware layer 904 may also include other hardware modules 912. Instructions 908 held by the processing unit 906 may be portions of the instructions 908 held by the memory/storage 910.

The example software architecture 902 may be conceptualized as layers, each providing various functionality. For example, the software architecture 902 may include layers and components such as an operating system (OS) 914, libraries 916, frameworks 918, applications 920, and a presentation layer 944. Operationally, the applications 920 and/or other components within the layers may invoke API calls 924 to other layers and receive corresponding results 926. The layers illustrated are representative in nature and other software architectures may include additional or different layers. For example, some mobile or special purpose operating systems may not provide the frameworks/middleware 918.

The OS 914 may manage hardware resources and provide common services. The OS 914 may include, for example, a kernel 928, services 930, and drivers 932. The kernel 928 may act as an abstraction layer between the hardware layer 904 and other software layers. For example, the kernel 928 may be responsible for memory management, processor management (for example, scheduling), component management, networking, security settings, and so on. The services 930 may provide other common services for the other software layers. The drivers 932 may be responsible for controlling or interfacing with the underlying hardware layer 904. For instance, the drivers 932 may include display drivers, camera drivers, memory/storage drivers, peripheral device drivers (for example, via Universal Serial Bus (USB)), network and/or wireless communication drivers, audio drivers, and so forth depending on the hardware and/or software configuration.

The libraries 916 may provide a common infrastructure that may be used by the applications 920 and/or other components and/or layers. The libraries 916 typically provide functionality for use by other software modules to perform tasks, rather than interacting directly with the OS 914. The libraries 916 may include system libraries 934 (for example, C standard library) that may provide functions such as memory allocation, string manipulation, and file operations. In addition, the libraries 916 may include API libraries 936 such as media libraries (for example, supporting presentation and manipulation of image, sound, and/or video data formats), graphics libraries (for example, an OpenGL library for rendering 2D and 3D graphics on a display), database libraries (for example, SQLite or other relational database functions), and web libraries (for example, WebKit that may provide web browsing functionality). The libraries 916 may also include a wide variety of other libraries 938 to provide many functions for applications 920 and other software modules.

The frameworks 918 (also sometimes referred to as middleware) provide a higher-level common infrastructure that may be used by the applications 920 and/or other software modules. For example, the frameworks 918 may provide various graphic user interface (GUI) functions, high-level resource management, or high-level location services. The frameworks 918 may provide a broad spectrum of other APIs for applications 920 and/or other software modules.

The applications 920 include built-in applications 940 and/or third-party applications 942. Examples of built-in applications 940 may include, but are not limited to, a contacts application, a browser application, a location application, a media application, a messaging application, and/or a game application. Third-party applications 942 may include any applications developed by an entity other than the vendor of the particular platform. The applications 920 may use functions available via OS 914, libraries 916, frameworks 918, and presentation layer 944 to create user interfaces to interact with users.

Some software architectures use virtual machines, as illustrated by a virtual machine 948. The virtual machine 948 provides an execution environment where applications/modules can execute as if they were executing on a hardware machine (such as the machine 1000 of FIG. 10, for example). The virtual machine 948 may be hosted by a host OS (for example, OS 914) or hypervisor, and may have a virtual machine monitor 946 which manages operation of the virtual machine 948 and interoperation with the host operating system. A software architecture, which may be different from software architecture 902 outside of the virtual machine, executes within the virtual machine 948 such as an operating system 950, libraries 952, frameworks 954, applications 956, and/or a presentation layer 958.

FIG. 10 is a block diagram illustrating components of an example machine 1000 configured to read instructions from a machine-readable medium (for example, a machine-readable storage medium) and perform any of the features described herein. The example machine 1000 is in the form of a computer system, within which instructions 1016 (for example, in the form of software components) for causing the machine 1000 to perform any of the features described herein may be executed. As such, the instructions 1016 may be used to implement modules or components described herein. The instructions 1016 cause an unprogrammed and/or unconfigured machine 1000 to operate as a particular machine configured to carry out the described features. The machine 1000 may be configured to operate as a standalone device or may be coupled (for example, networked) to other machines. In a networked deployment, the machine 1000 may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a node in a peer-to-peer or distributed network environment. Machine 1000 may be embodied as, for example, a server computer, a client computer, a personal computer (PC), a tablet computer, a laptop computer, a netbook, a set-top box (STB), a gaming and/or entertainment system, a smart phone, a mobile device, a wearable device (for example, a smart watch), or an Internet of Things (IoT) device. Further, although only a single machine 1000 is illustrated, the term "machine" includes a collection of machines that individually or jointly execute the instructions 1016.

The machine 1000 may include processors 1010, memory 1030, and I/O components 1050, which may be communicatively coupled via, for example, a bus 1002. The bus 1002 may include multiple buses coupling various elements of machine 1000 via various bus technologies and protocols. In an example, the processors 1010 (including, for example, a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), an ASIC, or a suitable combination thereof) may include one or more processors 1012a to 1012n that may execute the instructions 1016 and process data. In some examples, one or more processors 1010 may execute instructions provided or identified by one or more other processors 1010. The term “processor” includes a multi-core processor including cores that may execute instructions contemporaneously. Although FIG. 10 shows multiple processors, the machine 1000 may include a single processor with a single core, a single processor with multiple cores (for example, a multi-core processor), multiple processors each with a single core, multiple processors each with multiple cores, or any combination thereof. In some examples, the machine 1000 may include multiple processors distributed among multiple machines.

The memory/storage 1030 may include a main memory 1032, a static memory 1034, or other memory, and a storage unit 1036, each accessible to the processors 1010 such as via the bus 1002. The storage unit 1036 and memory 1032, 1034 store instructions 1016 embodying any one or more of the functions described herein. The memory/storage 1030 may also store temporary, intermediate, and/or long-term data for the processors 1010. The instructions 1016 may also reside, completely or partially, within the memory 1032, 1034, within the storage unit 1036, within at least one of the processors 1010 (for example, within a command buffer or cache memory), within memory of at least one of the I/O components 1050, or any suitable combination thereof, during execution thereof. Accordingly, the memory 1032, 1034, the storage unit 1036, memory in the processors 1010, and memory in the I/O components 1050 are examples of machine-readable media.

As used herein, “machine-readable medium” refers to a device able to temporarily or permanently store instructions and data that cause machine 1000 to operate in a specific fashion, and may include, but is not limited to, random-access memory (RAM), read-only memory (ROM), buffer memory, flash memory, optical storage media, magnetic storage media and devices, cache memory, network-accessible or cloud storage, other types of storage and/or any suitable combination thereof. The term “machine-readable medium” applies to a single medium, or combination of multiple media, used to store instructions (for example, instructions 1016) for execution by a machine 1000 such that the instructions, when executed by one or more processors 1010 of the machine 1000, cause the machine 1000 to perform one or more of the features described herein. Accordingly, a “machine-readable medium” may refer to a single storage device, as well as “cloud-based” storage systems or storage networks that include multiple storage apparatus or devices. The term “machine-readable medium” excludes signals per se.

The I/O components 1050 may include a wide variety of hardware components adapted to receive input, provide output, produce output, transmit information, exchange information, capture measurements, and so on. The specific I/O components 1050 included in a particular machine will depend on the type and/or function of the machine. For example, mobile devices such as mobile phones may include a touch input device, whereas a headless server or IoT device may not include such a touch input device. The particular examples of I/O components illustrated in FIG. 10 are in no way limiting, and other types of components may be included in machine 1000. The grouping of I/O components 1050 is merely for simplifying this discussion, and the grouping is in no way limiting. In various examples, the I/O components 1050 may include user output components 1052 and user input components 1054. User output components 1052 may include, for example, display components for displaying information (for example, a liquid crystal display (LCD) or a projector), acoustic components (for example, speakers), haptic components (for example, a vibratory motor or force-feedback device), and/or other signal generators. User input components 1054 may include, for example, alphanumeric input components (for example, a keyboard or a touch screen), pointing components (for example, a mouse device, a touchpad, or another pointing instrument), and/or tactile input components (for example, a physical button or a touch screen that provides location and/or force of touches or touch gestures) configured for receiving various user inputs, such as user commands and/or selections.

In some examples, the I/O components 1050 may include biometric components 1056, motion components 1058, environmental components 1060, and/or position components 1062, among a wide array of other physical sensor components. The biometric components 1056 may include, for example, components to detect body expressions (for example, facial expressions, vocal expressions, hand or body gestures, or eye tracking), measure biosignals (for example, heart rate or brain waves), and identify a person (for example, via voice-, retina-, fingerprint-, and/or facial-based identification). The motion components 1058 may include, for example, acceleration sensors (for example, an accelerometer) and rotation sensors (for example, a gyroscope). The environmental components 1060 may include, for example, illumination sensors, temperature sensors, humidity sensors, pressure sensors (for example, a barometer), acoustic sensors (for example, a microphone used to detect ambient noise), proximity sensors (for example, infrared sensing of nearby objects), and/or other components that may provide indications, measurements, or signals corresponding to a surrounding physical environment. The position components 1062 may include, for example, location sensors (for example, a Global Position System (GPS) receiver), altitude sensors (for example, an air pressure sensor from which altitude may be derived), and/or orientation sensors (for example, magnetometers).

The I/O components 1050 may include communication components 1064, implementing a wide variety of technologies operable to couple the machine 1000 to network(s) 1070 and/or device(s) 1080 via respective communicative couplings 1072 and 1082. The communication components 1064 may include one or more network interface components or other suitable devices to interface with the network(s) 1070. The communication components 1064 may include, for example, components adapted to provide wired communication, wireless communication, cellular communication, Near Field Communication (NFC), Bluetooth communication, Wi-Fi, and/or communication via other modalities. The device(s) 1080 may include other machines or various peripheral devices (for example, coupled via USB).

In some examples, the communication components 1064 may detect identifiers or include components adapted to detect identifiers. For example, the communication components 1064 may include Radio Frequency Identification (RFID) tag readers, NFC detectors, optical sensors (for example, to read one- or multi-dimensional bar codes or other optical codes), and/or acoustic detectors (for example, microphones to identify tagged audio signals). In some examples, location information may be determined based on information from the communication components 1064, such as, but not limited to, geo-location via Internet Protocol (IP) address, location via Wi-Fi, cellular, NFC, Bluetooth, or other wireless station identification and/or signal triangulation.

While various embodiments have been described, the description is intended to be exemplary, rather than limiting, and it is understood that many more embodiments and implementations are possible that are within the scope of the embodiments. Although many possible combinations of features are shown in the accompanying figures and discussed in this detailed description, many other combinations of the disclosed features are possible. Any feature of any embodiment may be used in combination with or substituted for any other feature or element in any other embodiment unless specifically restricted. Therefore, it will be understood that any of the features shown and/or discussed in the present disclosure may be implemented together in any suitable combination. Accordingly, the embodiments are not to be restricted except in light of the attached claims and their equivalents. Also, various modifications and changes may be made within the scope of the attached claims.

While the foregoing has described what are considered to be the best mode and/or other examples, it is understood that various modifications may be made therein and that the subject matter disclosed herein may be implemented in various forms and examples, and that the teachings may be applied in numerous applications, only some of which have been described herein. It is intended by the following claims to claim any and all applications, modifications and variations that fall within the true scope of the present teachings.

Unless otherwise stated, all measurements, values, ratings, positions, magnitudes, sizes, and other specifications that are set forth in this specification, including in the claims that follow, are approximate, not exact. They are intended to have a reasonable range that is consistent with the functions to which they relate and with what is customary in the art to which they pertain.

The scope of protection is limited solely by the claims that now follow. That scope is intended and should be interpreted to be as broad as is consistent with the ordinary meaning of the language that is used in the claims when interpreted in light of this specification and the prosecution history that follows and to encompass all structural and functional equivalents. Notwithstanding, none of the claims are intended to embrace subject matter that fails to satisfy the requirement of Sections 101, 102, or 103 of the Patent Act, nor should they be interpreted in such a way. Any unintended embracement of such subject matter is hereby disclaimed.

Except as stated immediately above, nothing that has been stated or illustrated is intended or should be interpreted to cause a dedication of any component, step, feature, object, benefit, advantage, or equivalent to the public, regardless of whether it is or is not recited in the claims.

It will be understood that the terms and expressions used herein have the ordinary meaning as is accorded to such terms and expressions with respect to their corresponding respective areas of inquiry and study except where specific meanings have otherwise been set forth herein. Relational terms such as first and second and the like may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms “comprises,” “comprising,” or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. An element preceded by “a” or “an” does not, without further constraints, preclude the existence of additional identical elements in the process, method, article, or apparatus that comprises the element.

The Abstract of the Disclosure is provided to allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, it can be seen that various features are grouped together in various examples for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claims require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed example. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separately claimed subject matter.

The article "Microsoft Patent | Reinforced differentiable attribute for 3d face reconstruction" was first published on Nweon Patent.

Microsoft Patent | Detecting heart rates using eye-tracking cameras https://patent.nweon.com/25257 Thu, 01 Dec 2022 07:15:12 +0000 https://patent.nweon.com/?p=25257 ...


Patent: Detecting heart rates using eye-tracking cameras

Patent PDF: Join Nweon membership to obtain

Publication Number: 20220378310

Publication Date: 20221201

Assignee: Microsoft

Abstract

A head-mounted device includes one or more eye-tracking cameras and one or more computer-readable hardware storage devices having stored thereon computer-executable instructions, including a machine-learned artificial intelligence (AI) model. The head-mounted device is configured to cause the one or more eye-tracking cameras to take a series of images of one or more areas of skin around one or more eyes of a wearer, and use the machine-learned AI model to analyze the series of images to extract a photoplethysmography waveform. A heart rate is then detected based on the photoplethysmography waveform.

Claims

What is claimed is:

Description

BACKGROUND

A heart rate (HR) monitor is a personal monitoring device that allows one to measure and/or display heart rate in real time or record the heart rate for later study. It is sometimes used to gather heart rate data while performing various types of physical exercise. Medical heart rate monitoring devices used in hospitals usually include wired multiple sensors. One type of commonly used HR monitor that uses electrical sensors to measure heart rate is referred to as electrocardiography (also referred to as ECG or EKG). The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.

BRIEF SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that is further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

The embodiments described herein are related to detecting heart rates using eye-tracking cameras of head-mounted devices. A head-mounted device includes one or more eye-tracking cameras, one or more processors, and one or more computer-readable hardware storage devices. The computer-readable hardware storage devices store computer-executable instructions, including a machine-learned AI model. The computer-executable instructions are structured such that, when executed by the one or more processors, the head-mounted device is configured to cause the one or more eye-tracking cameras to take a series of images of an area of skin around one or more eyes of a wearer. The head-mounted device is further configured to use the machine-learned AI model to analyze the series of images to extract a photoplethysmography (PPG) waveform and detect a heart rate based on the PPG waveform.

The embodiments described herein are also related to training a machine-learned AI model for detecting heart rates based on a series of images taken by one or more eye-tracking cameras of a head-mounted device. Training the machine-learned AI model includes providing a machine learning network configured to train an AI model based on images taken by eye-tracking cameras of head-mounted devices. A plurality of series of images of one or more areas of skin around one or more eyes of the wearer is taken by the one or more eye-tracking cameras of the head-mounted device as training data. The plurality of series of images is then sent to a machine learning network to train a machine-learned AI model in a particular manner, such that the machine-learned AI model is trained to extract a PPG waveform and detect a heart rate based on the PPG waveform.

Additional features and advantages will be set forth in the description which follows, and in part will be obvious from the description or may be learned by the practice of the teachings herein. Features and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. Features of the present invention will become more fully apparent from the following description and appended claims or may be learned by the practice of the invention as set forth hereinafter.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to describe the manner in which the above-recited and other advantages and features can be obtained, a more particular description of the subject matter briefly described above will be rendered by reference to specific embodiments which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments and are not, therefore, to be considered to be limiting in scope, embodiments will be described and explained with additional specificity and details through the use of the accompanying drawings in which:

FIG. 1 illustrates an example of an architecture of a head-mounted device, in which principles described herein are implemented;

FIGS. 2A and 2B illustrate an example of a simplified structure of a head-mounted device including one or more eye-tracking cameras;

FIG. 3A illustrates an example of an embodiment of training an artificial intelligence (AI) model configured to generate a photoplethysmography (PPG) waveform based on images taken by one or more eye-tracking cameras;

FIG. 3B illustrates another example of an embodiment of training an artificial intelligence (AI) model configured to generate a photoplethysmography (PPG) waveform based on images taken by one or more eye-tracking cameras;

FIG. 4A illustrates an example of an embodiment of using a machine-trained AI model shown in FIG. 3A to detect a heart rate based on images generated by one or more eye-tracking cameras of a head-mounted device;

FIG. 4B illustrates an example of an embodiment of using a machine-trained AI model shown in FIG. 3B to detect a heart rate based on images generated by one or more eye-tracking cameras of a head-mounted device;

FIG. 5 illustrates a flowchart of an example of a method for training a machine-learned AI model configured to detect heart rates based on images taken by one or more eye-tracking cameras;

FIG. 6 illustrates a flowchart of an example of a method for using a machine-learned AI model to detect heart rates based on images taken by one or more eye-tracking cameras;

FIG. 7 illustrates a flowchart of an example of a method for segmenting out data generated during noisy time windows; and

FIG. 8 illustrates an example computing system in which the principles described herein may be employed.

DETAILED DESCRIPTION

The principles described herein are related to detecting heart rates using eye-tracking cameras of head-mounted devices. A head-mounted device includes one or more eye-tracking cameras, one or more processors, and one or more computer-readable hardware storage devices. The computer-readable hardware storage devices store computer-executable instructions, including a machine-learned AI model. The computer-executable instructions are structured such that, when executed by the one or more processors, the head-mounted device is configured to cause the one or more eye-tracking cameras to take a series of images of an area of skin around one or more eyes of a wearer. The head-mounted device is further configured to use the machine-learned AI model to analyze the series of images to extract a photoplethysmography (PPG) waveform and detect a heart rate based on the PPG waveform.
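
As a hedged illustration of that flow, the sketch below assumes a model that maps a window of eye-tracking frames to a one-dimensional PPG signal and then reads the heart rate off the dominant spectral peak inside a plausible band; the model interface and the 0.7-4.0 Hz band are illustrative assumptions, not details from the disclosure.

```python
import numpy as np

def heart_rate_from_frames(frames, ppg_model, fps):
    # frames: (T, H, W) stack of eye-tracking images; ppg_model is a
    # placeholder callable returning a 1-D PPG waveform of length T;
    # fps: camera frame rate in Hz.
    ppg = np.asarray(ppg_model(frames), dtype=float)
    ppg = ppg - ppg.mean()
    spectrum = np.abs(np.fft.rfft(ppg))
    freqs = np.fft.rfftfreq(len(ppg), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 4.0)            # roughly 42-240 BPM
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_hz                             # beats per minute
```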

In some embodiments, at least one of the one or more eye-tracking cameras is an infrared camera. In some embodiments, the head-mounted device further includes one or more infrared light sources configured to emit infrared light at the one or more areas of skin around the one or more eyes of the wearer.

In some embodiments, the head-mounted device further includes an inertial measurement unit configured to detect head motion of the wearer, and the head-mounted device is further configured to remove at least a portion of noise artifacts generated by the head motion from the PPG waveform. In some embodiments, the inertial measurement unit includes at least one of (1) an accelerometer, (2) a gyroscope, and/or (3) a magnetometer.

In some embodiments, data related to the head motion of the wearer is also input into the machine-learned AI model, causing the machine-learned AI model to cancel out at least a portion of noise artifacts generated by the head motion from the PPG waveform. In some embodiments, the head-mounted device is further configured to process data generated by the inertial measurement unit to identify one or more frequency bands of the noise artifacts generated by the head motion and filter the one or more frequency bands out of the PPG waveform.
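
One way such frequency-band filtering could be realized is sketched below: the dominant motion frequency is estimated from the accelerometer magnitude and notched out of the PPG waveform. The filter choice and its parameters are assumptions for illustration, not details from the disclosure.

```python
import numpy as np
from scipy.signal import iirnotch, filtfilt

def remove_motion_band(ppg, accel_magnitude, fs, quality=10.0):
    # ppg and accel_magnitude are 1-D arrays sampled at fs (Hz).
    accel = accel_magnitude - accel_magnitude.mean()
    spectrum = np.abs(np.fft.rfft(accel))
    freqs = np.fft.rfftfreq(len(accel), d=1.0 / fs)
    motion_hz = freqs[np.argmax(spectrum[1:]) + 1]    # skip the DC bin
    if motion_hz <= 0.0 or motion_hz >= fs / 2.0:
        return ppg
    # In practice one might skip bands that overlap the expected heart rate.
    b, a = iirnotch(motion_hz, quality, fs=fs)        # narrow band-stop filter
    return filtfilt(b, a, ppg)
```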

In some embodiments, the head-mounted device is further configured to determine whether a period of time is too noisy based on data generated by the inertial measurement unit during the period. In response to determining that the period is too noisy, data generated during the period is segmented out. Such data includes one or more images among the series of images taken during the period. In some embodiments, determining whether the period of time is too noisy includes determining a standard deviation of values obtained from the inertial measurement unit during the period of time. When the standard deviation is greater than a predetermined threshold, it is determined that the period of time is too noisy. When the standard deviation is no greater than the predetermined threshold, it is determined that the period of time is not too noisy.
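
A minimal sketch of that segmentation rule is shown below, assuming the IMU samples have been aligned one-to-one with the camera frames; the window length and threshold are illustrative assumptions.

```python
import numpy as np

def keep_quiet_windows(frames, imu_values, window_len, threshold):
    # frames: (T, ...) images; imu_values: (T,) aligned IMU magnitudes.
    kept = []
    for start in range(0, len(frames) - window_len + 1, window_len):
        window = imu_values[start:start + window_len]
        if np.std(window) <= threshold:               # period is not too noisy
            kept.append(frames[start:start + window_len])
    if not kept:
        return np.empty((0,) + frames.shape[1:])
    return np.concatenate(kept)
```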

In some embodiments, the head-mounted device is further configured to allow a wearer to perform a calibration operation to improve the machine-learned AI model based on the individual wearer. The calibration includes (1) detecting a first heart rate dataset of the wearer based on a series of images taken by the one or more eye-tracking cameras and the machine-learned AI model, and (2) detecting a second heart rate dataset of the wearer via a heart rate monitor while the series of images are taken. The second heart rate dataset is then used as feedback to calibrate the machine-learned AI model.

In some embodiments, the head-mounted device further includes one or more displays configured to display one or more images in front of one or more eyes of the wearer. The head-mounted device is further configured to remove at least a portion of noise artifacts generated by the one or more displays from the PPG waveform. In some embodiments, data related to the one or more images displayed on the one or more displays is input into the machine-learned AI model, causing the machine-learned AI model to cancel out at least a portion of noise artifacts generated by the one or more displays.

In some embodiments, the head-mounted device is further configured to determine whether a period of time is too noisy based on data generated by the one or more displays during the period. In response to determining that the period is too noisy, data generated during the period is segmented out. Such data includes one or more images among the series of images taken during the period. In some embodiments, determining whether the period of time is too noisy includes determining a standard deviation of values obtained from the one or more displays during the period of time. When the standard deviation is greater than a predetermined threshold, it is determined that the period of time is too noisy. When the standard deviation is no greater than the predetermined threshold, it is determined that the period of time is not too noisy.

FIG. 1 illustrates an example of an architecture of a head-mounted device 100. The head-mounted device 100 includes one or more processors 110, one or more system memories 120, and one or more persistent storage devices 130. In some embodiments, the head-mounted device 100 also includes one or more displays 140 and/or one or more eye-tracking cameras 150. The one or more displays 140 are configured to display one or more images in front of the eyes of a wearer. The one or more eye-tracking cameras 150 are configured to track the eye movement of the wearer.

In some embodiments, the head-mounted device 100 also includes one or more light sources 152 configured to emit light onto one or more areas around the one or more eyes of the wearer to help the one or more eye-tracking cameras 150 better track the eye movement of the wearer. In some embodiments, the eye-tracking camera(s) 150 are infrared (IR) cameras, and the light source(s) 152 are IR light source(s). In some embodiments, the head-mounted device 100 also includes an inertial measurement unit 160 configured to measure the wearer’s head motion, such as (but not limited to) speed, acceleration, angular rate, and/or orientation of the wearer’s head. In some embodiments, the inertial measurement unit 160 includes at least one of an accelerometer 162, a gyroscope 164, and/or a magnetometer 166.

In some embodiments, an operating system (OS) 170 is stored in the one or more persistent storage devices 130 and loaded in the one or more system memories 120. In some embodiments, one or more applications 172 are also stored in the one or more persistent storage devices 130 and loaded in the one or more system memories 120. In the embodiments described herein, among the one or more applications, there is a heart rate detecting application 174 configured to detect a heart rate of a wearer. In particular, the heart rate detecting application 174 includes one or more machine-learned artificial intelligence (AI) models 176 configured to extract a PPG waveform based on the images taken by the eye-tracking cameras 150 and detect a heart rate based on the PPG waveform.

FIG. 2A illustrates a side view of a simplified example structure of a head-mounted device 200, which corresponds to the head-mounted device 100 of FIG. 1. As illustrated, the head-mounted device 200 includes one or more displays 220 and one or more eye-tracking cameras 210. The one or more displays 220 are configured to display images in front of one or more eyes 230 of a wearer, and the one or more eye-tracking cameras 210 are configured to track the movements of eye(s) 230 of the wearer. Notably, when the eye-tracking camera(s) 210 track movements of eye(s) 230 of the wearer, they are configured to capture images of area(s) of skin surrounding the eye(s) 230.

FIG. 2B illustrates a front view of the simplified example structure of the head-mounted device 200. As illustrated, the eyes 230 of the wearer and areas of skin 240 surrounding the eyes 230 are captured by the one or more eye-tracking cameras 210. The images taken by the eye-tracking cameras 210 can be used not only to track eye movements of the wearer but also to detect heart rates of the wearer.

The principles described herein are also related to training a machine-learned AI model for detecting heart rates based on a series of images taken by one or more eye-tracking cameras of a head-mounted device. Training the machine-learned AI model (also referred to as the AI model) includes providing, at a computing system, a machine learning network configured to train an AI model based on images taken by eye-tracking cameras of head-mounted devices. In some embodiments, the computing system is the head-mounted device 100, 200. In some embodiments, the computing system is a separate computing system that is different from the head-mounted device 100, 200.

During the training of the AI model, a plurality of series of images of one or more areas of skin around one or more eyes of the wearer is taken by the one or more eye-tracking cameras of the head-mounted device as training data. The plurality of series of images is then sent to a machine learning network to train a machine-learned AI model in a particular manner, such that the machine-learned AI model is trained to extract a PPG waveform from images taken by the eye-tracking cameras and detect a heart rate based on the PPG waveform.

In some embodiments, the machine learning network is an unsupervised network that trains the machine-learned AI model based on unlabeled image data. In some embodiments, the machine learning network is a supervised network that trains the AI model based on labeled image data. In some embodiments, the method further includes gathering a plurality of heart rate datasets via a heart rate monitor while the plurality of series of images is gathered. The plurality of series of images is then labeled with the plurality of heart rate datasets. The series of images that are labeled with the heart rate datasets are then used as training data to train the machine-learned AI model.

In some embodiments, each image in each series of images includes a plurality of pixels. Each pixel corresponds to a color value. The method further includes, for each image in each series of images, computing an average value based on color values corresponding to a plurality of pixels in an image. The machine learning network is used to train the AI model configured to extract a PPG waveform based on average values of images in the plurality of series of images.
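
To make the averaging step concrete, the following is a minimal Python sketch that computes a per-frame mean pixel value and estimates a heart rate from the dominant frequency of the resulting signal. It is not the machine-learned AI model described herein; the frame rate, the 0.7-3.0 Hz search band, and all function names are illustrative assumptions.

    # Minimal sketch (not the patented AI model): average each frame's pixels to
    # form a PPG-like signal, then estimate heart rate from its dominant frequency.
    import numpy as np

    def frame_means(frames):
        # frames: array of shape (num_frames, height, width) of pixel intensity values
        return frames.reshape(len(frames), -1).mean(axis=1)

    def estimate_heart_rate_bpm(frames, fps=30.0, band=(0.7, 3.0)):
        signal = frame_means(frames)
        signal = signal - signal.mean()                    # remove the DC component
        spectrum = np.abs(np.fft.rfft(signal))
        freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
        in_band = (freqs >= band[0]) & (freqs <= band[1])  # roughly 42-180 beats per minute
        peak_freq = freqs[in_band][np.argmax(spectrum[in_band])]
        return peak_freq * 60.0                            # Hz -> beats per minute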

In some embodiments, the head-mounted device further includes an inertial measurement unit configured to detect the head motion of the wearer. A plurality of datasets associated with the head motion of the wearer is gathered by the inertial measurement unit. The plurality of datasets associated with the head motion of the wearer is also used as training data in training the AI model, such that the AI model is trained to cancel out at least a portion of noise artifacts generated by head motions.

In some embodiments, the head-mounted device further includes one or more displays configured to display images in front of one or more eyes of the wearer. A plurality of datasets associated with the images displayed by the one or more displays is also gathered. The plurality of datasets associated with the images displayed by the one or more displays is also used as training data in training the AI model, such that the AI model is trained to cancel out at least a portion of noise artifacts generated by the images displayed on the one or more displays.

FIGS. 3A-3B illustrate examples of embodiments 300A and 300B for training an AI model 370A, 370B. Referring to FIG. 3A or 3B, data generated by one or more eye-tracking camera(s) 310 of a head-mounted device 302 is used as training data. The training data is sent to a machine learning network 360A, 360B to train an AI model 370A, 370B configured to detect heart rates based on images captured by the eye-tracking camera(s) 310. In some embodiments, the machine learning network 360A, 360B is a machine learning neural network. In some embodiments, the machine learning network 360A, 360B is a machine learning convolutional neural network. As illustrated, the head-mounted device 302 corresponds to the head-mounted device 100, 200 of FIGS. 1 and 2A-2B. The eye-tracking camera(s) 310 of the head-mounted device 302 are configured to capture a plurality of series of images 312. Each series of images is taken continuously over a time period, such as 30 seconds or 60 seconds.

In some embodiments, the machine learning network 360A, 360B is an unsupervised training network, which uses unlabeled images 312 to generate an AI model 370A, 370B. In some embodiments, the machine learning network 360A, 360B is a supervised training network, which uses a plurality of heartbeat datasets 352 generated by one or more heartbeat monitor(s) 350 as feedback. In some embodiments, the training method is an unsupervised signal processing method. In some embodiments, the plurality of heartbeat datasets 352 are generated while the eye-tracking camera(s) 310 are taking the plurality of series of images 312, and the plurality of series of images 312 are labeled or paired with the plurality of heartbeat datasets 352. The labeled dataset pairs, including the plurality of series of images 312 and the plurality of heartbeat datasets 352, are used as training data to train the AI model 370A, 370B.

Notably, each image taken by the eye-tracking camera(s) includes a plurality of pixels, and each of the plurality of pixels is represented by a value. In some embodiments, the values of the plurality of pixels in each image are averaged to generate a mean value of the image, and a series of images would result in a series of mean values. In some embodiments, the machine learning network 360A, 360B is configured to identify patterns based on a plurality of series of mean values corresponding to the plurality of series of images 312 to train an AI model with or without labeling the image data 312.

Such training methods and the trained model 370A, 370B might work sufficiently well when the wearer is sitting still and not moving. However, when the wearer is moving, such as playing a video game and/or watching a video, noise artifacts would be introduced due to the movement of the wearer and/or the display of the head-mounted device 302.

The principles described herein introduce several different solutions to the above-described problems. Referring to FIG. 3A, in some embodiments, data associated with head motion 324, 328, 334 and/or data associated with the display(s) 340 are also used as training data to train the AI model 370A. As illustrated, in some embodiments, data associated with head motion is obtained via an inertial measurement unit 320, which corresponds to the inertial measurement unit 160 of FIG. 1. The inertial measurement unit 320 includes at least one of one or more accelerometer(s) 322, one or more gyroscope(s) 326, and/or one or more magnetometer(s) 332. As illustrated, the accelerometer(s) 322 are configured to generate a first plurality of motion datasets 324, the gyroscope(s) 326 are configured to generate a second plurality of motion datasets 328, and the magnetometer(s) 332 are configured to generate a third plurality of motion datasets 334. In some embodiments, the motion datasets 324, 328, 334 are used as training data (in addition to the images taken by the eye-tracking cameras 310) to train the AI model 370A, such that the AI model 370A is trained to cancel at least a portion of the noise artifacts generated by the head motion.
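
As a rough illustration of how the motion datasets 324, 328, 334 could be paired with the camera-derived signal as model inputs, the sketch below assembles a per-frame feature matrix. The array shapes, the linear resampling of IMU samples onto frame timestamps, and all function names are assumptions rather than the architecture of the machine learning network 360A, 360B.

    # Illustrative sketch only: one way to assemble per-frame training inputs that
    # pair the camera-derived signal with IMU channels, so a model can learn to
    # cancel motion-related noise. Shapes and resampling choices are assumptions.
    import numpy as np

    def build_training_matrix(frame_mean_series, accel, gyro, mag, fps, imu_rate):
        # frame_mean_series: (num_frames,) per-frame mean pixel values
        # accel, gyro, mag: (num_imu_samples, 3) raw IMU channels (assumed same length)
        n = len(frame_mean_series)
        frame_times = np.arange(n) / fps
        imu_times = np.arange(len(accel)) / imu_rate

        def resample(channels):
            # Linearly interpolate each IMU axis onto the camera frame timestamps.
            return np.stack([np.interp(frame_times, imu_times, channels[:, k])
                             for k in range(channels.shape[1])], axis=1)

        features = np.column_stack([frame_mean_series,
                                    resample(accel), resample(gyro), resample(mag)])
        return features   # shape: (num_frames, 1 + 3 + 3 + 3)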

Additionally, when the wearer is watching a video or playing a game, the images generated by the display 340 of the head-mounted device 302 may also create noise artifacts. In some embodiments, dataset 342 associated with the display(s) 340 is also used as training data to train the AI model 370A, such that the AI model is also trained to cancel at least a portion of the noise artifacts generated by the display(s) 340.

In some cases, the noise artifacts generated by the head motion and/or display(s) 340 may be so severe that they render the images generated by the eye-tracking camera(s) 310 unusable for detecting heart rates. For example, when the wearer’s head is moving rapidly, or the display(s) 340 is displaying fast-moving images, the rapid head movement and fast-moving images may cause the images captured by the eye-tracking camera(s) 310 to be too noisy for PPG waveforms to be extracted.

To address the above-described problems, in some embodiments, a noise detector 390A is further implemented to determine whether the noise artifacts generated by the head motion and/or display(s) 340 are too significant; in other words, the noise detector 390A determines whether the images captured by the eye-tracking camera(s) 310 during a period of time are too noisy, i.e., whether the period of time is too noisy. Different methods may be implemented to determine whether a given period of time is too noisy.

In some embodiments, in a given period of time, the datasets 324, 328, 334, and/or 342 captured by the inertial measurement unit 320 and/or the display(s) 340 are processed to compute one or more standard deviations. When at least one of the standard deviations is greater than a predetermined threshold, the noise detector 390A determines that the period is too noisy. When the period is determined to be too noisy, the data generated during that period is segmented out. In some embodiments, a segment selector 392 is implemented to segment out the data generated during periods that are too noisy. Such data includes the images 312 taken by the eye-tracking camera(s) 310, datasets 324, 328, 334 generated by the inertial measurement unit 320, and/or dataset 342 generated by the display(s) 340. As such, only the data generated during periods that are not too noisy is used as training data to train the AI model 370A.

FIG. 3B illustrates another embodiment for removing noise artifacts generated by head motion and display(s) 340. As illustrated in FIG. 3B, a signal processor 390B is implemented to process the data associated with the inertial measurement unit 320 and the display(s) 340. In some embodiments, the signal processor 390B is configured to determine one or more frequency bands of the noise artifacts associated with the inertial measurement unit 320 and the display(s) 340. Based on the detected noise, a noise filter 394 is generated. The noise filter 394 is configured to filter out the one or more frequency bands of the noise artifacts. In some embodiments, the noise filter 394 is applied to the plurality of series of images 312 generated by the eye-tracking camera(s) 310. In some embodiments, the noise filter 394 is sent to the machine learning network 360B to filter data processed or partially processed by the machine learning network 360B. In some embodiments, a segment selector (not shown) is also implemented after the signal processor 390B to segment out the training datasets generated during particular periods that are too noisy.
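
One plausible (though not prescribed) realization of the signal processor 390B and noise filter 394 is sketched below: dominant motion frequencies are estimated from the accelerometer magnitude and then notched out of the PPG waveform with standard IIR notch filters. The peak-selection heuristic, the Q factor, and the function names are assumptions.

    # Sketch of one possible noise filter: find dominant motion frequencies in the
    # accelerometer magnitude and notch them out of the PPG waveform. The peak
    # threshold and Q factor are arbitrary illustrative choices.
    import numpy as np
    from scipy.signal import iirnotch, filtfilt

    def motion_noise_frequencies(accel_magnitude, imu_rate, max_peaks=3):
        x = accel_magnitude - accel_magnitude.mean()
        spectrum = np.abs(np.fft.rfft(x))
        freqs = np.fft.rfftfreq(len(x), d=1.0 / imu_rate)
        order = np.argsort(spectrum)[::-1]
        # Ignore near-DC content and keep the strongest few peaks.
        return [freqs[i] for i in order if freqs[i] > 0.3][:max_peaks]

    def remove_motion_noise(ppg, ppg_rate, noise_freqs, q_factor=10.0):
        filtered = np.asarray(ppg, dtype=float)
        for f0 in noise_freqs:
            if 0.0 < f0 < ppg_rate / 2.0:          # only notch frequencies below Nyquist
                b, a = iirnotch(w0=f0, Q=q_factor, fs=ppg_rate)
                filtered = filtfilt(b, a, filtered)
        return filtered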

Once the AI model 370A, 370B is sufficiently trained, the AI model 370A, 370B is provided to a head-mounted device 100, 200, such that the head-mounted device 100, 200 can detect heart rates of a wearer based on images captured by the eye-tracking cameras 150, 210. In some embodiments, the AI model 370A, 370B is deployed onto the head-mounted device 100, 200. In some embodiments, the AI model 370A, 370B is provided as a cloud service, and the head-mounted device 100, 200 sends the images captured by the eye-tracking camera 150 to the cloud service, which in turn performs computations using the AI model 370A, 370B to determine the heart rates of the wearer.

FIG. 4A illustrates an example of an embodiment 400A, in which a machine-trained AI model 470A (corresponding to the AI model 370A) is provided to a head-mounted device 402 (corresponding to the head-mounted device 100, 200, and/or 302). The head-mounted device 402 includes one or more eye-tracking cameras 410 configured to capture a series of images of one or more areas of skin around the one or more eyes of a wearer. In some embodiments, the machine-learned AI model 470A is configured to generate a PPG waveform 480 and detect a heart rate 482 of the wearer based on the PPG waveform.

In some embodiments, the head-mounted device 402 also includes an inertial measurement unit 420, which includes at least one of one or more accelerometer(s) 422, one or more gyroscope(s) 426, and/or one or more magnetometer(s) 432. In some embodiments, data generated by the inertial measurement unit 420 is also input into the machine-learned AI model 470A, and the machine-learned AI model 470A is configured to cancel out at least a portion of noise artifacts associated with the head motion of the wearer based on the data generated by the inertial measurement unit 420 (i.e., datasets 424, 428, 434).

In some embodiments, the head-mounted device 402 also includes one or more displays 440, which displays one or more images in front of one or more eyes of the wearer. In some embodiments, data associated with images 412 displayed on the display(s) 440 is also input into the machine-learned AI model 470A, and the machine-learned AI model 470A is further configured to cancel out at least a portion of noise artifacts associated with the images displayed on the display(s) 440.

In some embodiments, a noise detector 490A (which corresponds to the noise detector 390A) and a segment selector 492 (which corresponds to the segment selector 392) are also provided with the AI model 470A to the head-mounted device 402. The noise detector 490A is configured to determine whether the head motion and/or the display(s) 440 are creating too much noise during a given period. If the period is too noisy, the segment selector 492 is configured to segment out the data generated during the period, such as images taken by the eye-tracking camera(s) 410, data generated by the inertial measurement unit 420, and/or images displayed at the display(s) 440 during the period. As such, only data generated during the periods that are not too noisy is input to the AI model 470A to generate the PPG waveform 480, which is then used to detect a heart rate.

FIG. 4B illustrates an example of an embodiment 400B, in which a machine-trained AI model 470B (corresponding to the AI model 370B) is provided to a head-mounted device 402 (corresponding to the head-mounted device 100, 200, and/or 302). As illustrated in FIG. 4B, the data generated by the inertial measurement unit 420 and the display(s) 440 is sent to a signal processor 490B. In some embodiments, the signal processor 490B is configured to process the data generated by the inertial measurement unit 420 and the display(s) 440 to obtain one or more frequency bands of the noise artifacts. Based on the one or more frequency bands of the noise artifacts, a noise filter 494 is generated to filter out the one or more frequency bands of the noise artifacts from the images captured by the eye-tracking camera(s) 410 or from the PPG waveform 480 generated by the AI model 470B. The PPG waveform 480 is then processed to detect a heart rate 482.

In some embodiments, the machine-learned model 470A, 470B can further be calibrated for individual wearers using a separate heartbeat monitor 450. For example, the heartbeat monitor 450 may be a fitness tracker, a watch, and/or a medical heartbeat monitor configured to track the heartbeats of a wearer. The dataset 452 generated by the heartbeat monitor 450 can be sent to the machine-learned AI model 470A, 470B to further calibrate and improve the machine-learned model 470A, 470B based on individual wearers of the head-mounted device 402.
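
As one simple, assumed form of such per-wearer calibration, the sketch below fits a linear correction that maps the model’s heart rate estimates onto readings from the reference heartbeat monitor 450; the calibration described herein could equally involve fine-tuning the model itself, so this is only one illustrative option.

    # Minimal per-wearer calibration sketch: fit a linear correction mapping the
    # model's heart rate estimates onto readings from a reference monitor gathered
    # at the same time. The paired sample values below are hypothetical.
    import numpy as np

    def fit_calibration(model_hr, monitor_hr):
        # Least-squares fit of monitor_hr ~= scale * model_hr + offset.
        scale, offset = np.polyfit(model_hr, monitor_hr, deg=1)
        return scale, offset

    def apply_calibration(model_hr, scale, offset):
        return scale * np.asarray(model_hr) + offset

    # Example: estimates from the AI model vs. a wrist monitor for the same windows.
    scale, offset = fit_calibration([62, 75, 88, 101], [65, 78, 90, 104])
    print(apply_calibration([70, 95], scale, offset))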

The following discussion now refers to a number of methods and method acts that may be performed. Although the method acts may be discussed in a certain order or illustrated in a flow chart as occurring in a particular order, no particular ordering is required unless specifically stated, or required because an act is dependent on another act being completed prior to the act being performed.

FIG. 5 illustrates a flowchart of an example method 500 for training an AI model for detecting heart rates using a machine learning network. In some embodiments, the training of the AI model may be performed at a head-mounted device. Alternatively, the training of the AI model may be performed at a separate computing system or at a combination of the head-mounted device and the separate computing system. The method 500 includes taking a plurality of series of images of areas of skin around eye(s) of a wearer by one or more eye-tracking camera(s) of a head-mounted device (act 510). In some embodiments, the method 500 further includes gathering a plurality of datasets associated with the head motion of the wearer (act 520) and/or gathering a plurality of datasets associated with one or more display(s) (act 530).

In some embodiments, the machine learning network is an unsupervised learning network that uses unlabeled data as training data. Alternatively, the machine learning network is a supervised learning network that uses labeled data as training data. When a supervised learning network is used, the method 500 further includes gathering a plurality of heart rate datasets via a heart rate monitor (act 540) and labeling the plurality of series of images taken by the eye-tracking camera(s) with the heart rate datasets (act 550).

In some embodiments, method 500 further includes determining whether a given period is too noisy (act 560). If the period is too noisy, data gathered during the period is discarded (act 570). If the period is not too noisy, data gathered during the period is kept (act 580), and the kept data is sent to a machine learning network (which may be unsupervised or supervised) to train an AI model (act 590).

FIG. 6 illustrates a flowchart of an example of a method 600 for using a machine-learned AI model to detect a heart rate based on images taken by one or more eye-tracking camera(s) of a head-mounted device. The method 600 includes taking a series of images by one or more eye-tracking camera(s) (act 610). In some embodiments, the method 600 further includes gathering a dataset associated with head motion (act 620) and/or gathering a dataset associated with display(s) (act 630). In some embodiments, the method 600 further includes determining whether a given period is too noisy (act 640). If the period is determined to be too noisy, data gathered during the period is discarded (act 650); otherwise, data gathered during the period is kept (act 660). The kept data is then sent to a machine-trained AI model to extract a PPG waveform (act 670). Based on the PPG waveform, a heart rate is then detected (act 680).

FIG. 7 illustrates a flowchart of an example method 700 for determining whether a given period is noisy, and segmenting data associated with noisy periods, which corresponds to acts 560, 570, 580 of FIG. 5 or acts 640, 650, 660 of FIG. 6. The method 700 includes dividing a period of time into a plurality of (N) time windows (act 710), where N is a natural number. The period of time may be any length that is sufficient to identify a heart rate of a wearer, such as 30 seconds, 1 minute, etc. Each time window has a predetermined size, such as 0.5 seconds, 1 second, 2 seconds, etc. The method 700 also includes, for an nth time window, determining a standard deviation of values obtained from an inertial measurement unit during the nth time window (act 710), where n is a natural number and n<=N. For example, in some embodiments, initially, n=1, and a standard deviation of values obtained from an inertial measurement unit is determined for the first time window among the plurality of N time windows.

The method 700 further includes determining whether the standard deviation is greater than a predetermined threshold (act 720). If the answer is yes, it is determined that the time window is too noisy (act 730), and data generated during the time window is segmented out (act 732). On the other hand, if the answer is no, it is determined that the time window is not too noisy (act 740), and data generated during the time window is kept (act 742) as input to the machine learning network 360A, 360B or to the machine-learned AI model 470A, 470B. Next, if n is less than N, n is increased by one, and the next time window is considered (act 750). For example, after it is determined whether data generated during the first time window is to be segmented out or kept, a second time window (i.e., n=2) is considered, and acts 710, 720, 730, 732 (or 740, 742), and 750 are repeated based on n=2. In some embodiments, the acts 710, 720, 730, 732 (or 740, 742), and 750 repeat as many times as needed until each of the plurality of N time windows is analyzed.
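
A compact sketch of this windowed selection is shown below; the window length, the threshold value, and the function names are illustrative assumptions rather than required values.

    # Sketch of the windowed noise check in method 700: split a recording into N
    # time windows, compute the standard deviation of IMU values in each window,
    # and keep only the windows at or below a threshold.
    import numpy as np

    def select_quiet_windows(frames, imu_values, fps, imu_rate, window_s=1.0, threshold=0.5):
        kept_frames = []
        n_windows = int(len(imu_values) / (imu_rate * window_s))           # act 710: divide into N windows
        for n in range(n_windows):
            imu_slice = imu_values[int(n * window_s * imu_rate):int((n + 1) * window_s * imu_rate)]
            frame_slice = frames[int(n * window_s * fps):int((n + 1) * window_s * fps)]
            if np.std(imu_slice) > threshold:                              # acts 720/730: window is too noisy
                continue                                                   # act 732: segment out this window
            kept_frames.append(frame_slice)                                # acts 740/742: keep this window
        return np.concatenate(kept_frames) if kept_frames else frames[:0]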

Finally, because the principles described herein may be performed in the context of a computing system (for example, each of the head-mounted device 100 and the machine learning network 360A, 360B may include or be implemented as one or more computing systems), some introductory discussion of a computing system will be described with respect to FIG. 8.

Computing systems are now increasingly taking a wide variety of forms. Computing systems may, for example, be handheld devices, appliances, laptop computers, desktop computers, mainframes, distributed computing systems, data centers, or even devices that have not conventionally been considered a computing system, such as wearables (e.g., glasses). In this description and in the claims, the term “computing system” is defined broadly as including any device or system (or a combination thereof) that includes at least one physical and tangible processor, and a physical and tangible memory capable of having thereon computer-executable instructions that may be executed by a processor. The memory may take any form and may depend on the nature and form of the computing system. A computing system may be distributed over a network environment and may include multiple constituent computing systems.

As illustrated in FIG. 8, in its most basic configuration, a computing system 800 typically includes at least one processing unit 802 and memory 804. The at least one processing unit 802 may include a general-purpose processor and may also include a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), or any other specialized circuit. The memory 804 may be physical system memory, which may be volatile, non-volatile, or some combination of the two. The term “memory” may also be used herein to refer to non-volatile mass storage such as physical storage media. If the computing system is distributed, the processing, memory and/or storage capability may be distributed as well.

The computing system 800 also has thereon multiple structures often referred to as an “executable component”. For instance, memory 804 of the computing system 800 is illustrated as including executable component 806. The term “executable component” is the name for a structure that is well understood to one of ordinary skill in the art in the field of computing as being a structure that can be software, hardware, or a combination thereof. For instance, when implemented in software, one of ordinary skill in the art would understand that the structure of an executable component may include software objects, routines, methods, and so forth, that may be executed on the computing system, whether such an executable component exists in the heap of a computing system, or whether the executable component exists on computer-readable storage media.

In such a case, one of ordinary skill in the art will recognize that the structure of the executable component exists on a computer-readable medium such that, when interpreted by one or more processors of a computing system (e.g., by a processor thread), the computing system is caused to perform a function. Such a structure may be computer-readable directly by the processors (as is the case if the executable component were binary). Alternatively, the structure may be structured to be interpretable and/or compiled (whether in a single stage or in multiple stages) so as to generate such binary that is directly interpretable by the processors. Such an understanding of example structures of an executable component is well within the understanding of one of ordinary skill in the art of computing when using the term “executable component”.

The term “executable component” is also well understood by one of ordinary skill as including structures, such as hardcoded or hardwired logic gates, that are implemented exclusively or near-exclusively in hardware, such as within a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), or any other specialized circuit. Accordingly, the term “executable component” is a term for a structure that is well understood by those of ordinary skill in the art of computing, whether implemented in software, hardware, or a combination. In this description, the terms “component”, “agent”, “manager”, “service”, “engine”, “module”, “virtual machine” or the like may also be used. As used in this description and in the claims, these terms (whether expressed with or without a modifying clause) are also intended to be synonymous with the term “executable component”, and thus also have a structure that is well understood by those of ordinary skill in the art of computing.

In the description above, embodiments are described with reference to acts that are performed by one or more computing systems. If such acts are implemented in software, one or more processors (of the associated computing system that performs the act) direct the operation of the computing system in response to having executed computer-executable instructions that constitute an executable component. For example, such computer-executable instructions may be embodied in one or more computer-readable media that form a computer program product. An example of such an operation involves the manipulation of data. If such acts are implemented exclusively or near-exclusively in hardware, such as within an FPGA or an ASIC, the computer-executable instructions may be hardcoded or hardwired logic gates. The computer-executable instructions (and the manipulated data) may be stored in the memory 804 of the computing system 800. Computing system 800 may also contain communication channels 808 that allow the computing system 800 to communicate with other computing systems over, for example, network 810.

While not all computing systems require a user interface, in some embodiments, the computing system 800 includes a user interface system 812 for use in interfacing with a user. The user interface system 812 may include output mechanisms 812A as well as input mechanisms 812B. The principles described herein are not limited to the output mechanisms 812A or input mechanisms 812B as such will depend on the nature of the device. However, output mechanisms 812A might include, for instance, speakers, displays, tactile output, holograms and so forth. Examples of input mechanisms 812B might include, for instance, microphones, touchscreens, holograms, cameras, keyboards, mouse or other pointer input, sensors of any type, and so forth.

Embodiments described herein may comprise or utilize a special purpose or general-purpose computing system including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below. Embodiments described herein also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general-purpose or special purpose computing system. Computer-readable media that store computer-executable instructions are physical storage media. Computer-readable media that carry computer-executable instructions are transmission media. Thus, by way of example, and not limitation, embodiments of the invention can comprise at least two distinctly different kinds of computer-readable media: storage media and transmission media.

Computer-readable storage media includes RAM, ROM, EEPROM, CD-ROM, or other optical disk storage, magnetic disk storage, or other magnetic storage devices, or any other physical and tangible storage medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general-purpose or special purpose computing system.

A “network” is defined as one or more data links that enable the transport of electronic data between computing systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired and wireless) to a computing system, the computing system properly views the connection as a transmission medium. Transmission media can include a network and/or data links which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general-purpose or special-purpose computing system. Combinations of the above should also be included within the scope of computer-readable media.

Further, upon reaching various computing system components, program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to storage media (or vice versa). For example, computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a “NIC”), and then eventually transferred to computing system RAM and/or to less volatile storage media at a computing system. Thus, it should be understood that storage media can be included in computing system components that also (or even primarily) utilize transmission media.

Computer-executable instructions comprise, for example, instructions and data which, when executed at a processor, cause a general-purpose computing system, special purpose computing system, or special purpose processing device to perform a certain function or group of functions. Alternatively, or in addition, the computer-executable instructions may configure the computing system to perform a certain function or group of functions. The computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code that undergoes some translation (such as compilation) before direct execution by the processors.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the described features or acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.

Those skilled in the art will appreciate that the invention may be practiced in network computing environments with many types of computing system configurations, including personal computers, desktop computers, laptop computers, message processors, handheld devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, pagers, routers, switches, data centers, wearables (such as glasses) and the like. The invention may also be practiced in distributed system environments where local and remote computing systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks. In a distributed system environment, program modules may be located in both local and remote memory storage devices.

Those skilled in the art will also appreciate that the invention may be practiced in a cloud computing environment. Cloud computing environments may be distributed, although this is not required. When distributed, cloud computing environments may be distributed internationally within an organization and/or have components possessed across multiple organizations. In this description and the following claims, “cloud computing” is defined as a model for enabling on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services). The definition of “cloud computing” is not limited to any of the other numerous advantages that can be obtained from such a model when properly deployed.

The remaining figures may discuss various computing systems that may correspond to the computing system 800 previously described. The computing systems of the remaining figures include various components or functional blocks that may implement the various embodiments disclosed herein, as will be explained. The various components or functional blocks may be implemented on a local computing system or may be implemented on a distributed computing system that includes elements resident in the cloud or that implement aspects of cloud computing. The various components or functional blocks may be implemented as software, hardware, or a combination of software and hardware. The computing systems of the remaining figures may include more or fewer components than those illustrated in the figures, and some of the components may be combined as circumstances warrant. Although not necessarily illustrated, the various components of the computing systems may access and/or utilize a processor and memory, such as the at least one processing unit 802 and memory 804, as needed to perform their various functions.

For the processes and methods disclosed herein, the operations performed in the processes and methods may be implemented in differing order. Furthermore, the outlined operations are only provided as examples, and some of the operations may be optional, combined into fewer steps and operations, supplemented with further operations, or expanded into additional operations without detracting from the essence of the disclosed embodiments.

The present invention may be embodied in other specific forms without departing from its spirit or characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Microsoft Patent | Optical attenuation via switchable grating https://patent.nweon.com/25244 Wed, 30 Nov 2022 20:46:11 +0000

Patent: Optical attenuation via switchable grating

Patent PDF: 加入映维网会员获取

Publication Number: 20220382054

Publication Date: 2022-12-01

Assignee: Microsoft Technology Licensing, LLC (Redmond, WA, US)

Abstract

Examples are disclosed relating to tunable attenuation of incident light using a switchable grating. One example provides an optical attenuator comprising a switchable grating configured to diffract light within a wavelength band at a diffraction angle. The optical attenuator further comprises an electrode pair configured to apply a voltage across the switchable grating to tune a proportion of incident light diffracted at the diffraction angle, and an optical dump to receive the proportion of incident light diffracted.

Claims

1.An optical attenuator, comprising: a switchable grating configured to diffract light within a wavelength band at a diffraction angle; an electrode pair configured to apply a voltage across the switchable grating to tune a proportion of incident light diffracted at the diffraction angle; and an optical dump to receive the proportion of incident light diffracted.

Description

BACKGROUND

Display devices may raster scan one or more laser beams while controlling a brightness of each to display an image. In the case of a head-mounted display (HMD) device, the delivery of such images using a transparent combiner placed in front of the eye allows for the display of augmented reality images in which virtual holograms appear to be mixed with real world objects.

SUMMARY

Examples are disclosed that relate to tunable attenuation of light using a switchable grating. One example provides an optical attenuator comprising a switchable grating configured to diffract light within a wavelength band at a diffraction angle. The optical attenuator further comprises an electrode pair configured to apply a voltage across the switchable grating to tune a proportion of incident light diffracted, and an optical dump to receive the proportion of incident light diffracted. Examples are also disclosed that relate to devices utilizing such attenuators, including display devices and other types of optical devices.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an example wearable optical device.

FIG. 2 shows a block diagram of an example optical device including one or more light sources, an optical attenuator, and a waveguide.

FIG. 3 shows a cross sectional view of an example optical attenuator comprising a switchable grating and a control electrode pair to tune a proportion of incident light diffracted towards an optical dump.

FIG. 4 shows an example optical attenuator comprising three switchable gratings in a stacked arrangement.

FIG. 5 shows an example optical attenuator comprising three switchable gratings in a spatially demultiplexed arrangement.

FIG. 6 shows an optical attenuator comprising a waveguide to direct diffracted light via total internal reflection to an optical dump.

FIG. 7 shows an example attenuator comprising three switchable gratings in a stacked arrangement, and three corresponding waveguides to direct light diffracted by each grating to an optical dump.

FIG. 8 shows an example optical device in the form of a helmet visor comprising a switchable grating.

FIG. 9 shows another example optical device in the form of an airplane window comprising a switchable grating.

FIG. 10 shows another example optical device in the form of a camera comprising an optical attenuator configured to tune an amount of attenuated light that reaches an image sensor.

FIG. 11 is a flow diagram depicting an example method for attenuating an optical signal.

FIG. 12 is a block diagram of an example computing system.

DETAILED DESCRIPTION

As mentioned above, a display device, such as a head-mounted display (HMD), may utilize a scanning laser projector to raster scan an image for display. However, in some low-light environments, the display luminosity of such a display device may be too bright for some mixed-reality/augmented reality use cases. For example, when using a head-mounted display device in a low-light environment, a relatively bright display may cause a user’s pupil to constrict, thus making it difficult to see the surrounding environment. Similarly, a relatively low brightness display may be desirable for night photography, and for other low light uses.

FIG. 1 shows an example head-mounted display system 100 including a display device 102 positioned near a wearer’s eyes. Display device 102 includes left-eye and right-eye displays 104a, 104b each comprising see-through waveguide combiners positioned to display virtual imagery in front of a view of a real-world environment to enable augmented reality applications, such as the display of mixed reality imagery. In other examples a display device may include a single display extending over one or both eyes, rather than separate right and left eye displays. Display device 102 includes an image producing system, for example a laser scanner, a liquid crystal on silicon (LCoS) microdisplay, a transmissive liquid crystal microdisplay, or digital micromirror device (DMD), to produce images for display.

As described above, in some low-light environments, it may be advantageous to lower the brightness of display device 102. However, achieving suitably low brightness operation may pose various challenges. For example, one possible solution is to dim a laser by modulating the current supplied to drive the laser. However, it may be difficult to dim the laser beyond a threshold in this manner, as laser operation at such low current levels may be unstable.

Instead of controlling the laser brightness directly by modulating current provided to the laser, various mechanical devices may be used for optical attenuation, such as devices that utilize voice coils, piezoelectric actuators, MEMS (micro-electromechanical systems) devices, or mechanical actuators. However, these devices may be slow, bulky, and/or power hungry, and may lead to unsuitable insertion losses. As another option, fiber pigtailed attenuation with integrated variable optical attenuators may be used. However, the pigtailing of each laser may be bulky, and add costs and insertion losses to the system. Such costs may be multiplied for full-color systems that utilize red, green and blue lasers.

Accordingly, examples are disclosed that relate to attenuation of light using a solid-state wavelength-selective attenuator system comprising a switchable grating that diffracts a tunable proportion of light toward an optical dump (e.g. a suitable absorber, reflector or an optical path exiting the device) based on an applied voltage. The remaining proportion of light passes through the attenuator, thereby delivering attenuated light to the intended destination. The disclosed example attenuator systems may provide high repeatability, fast response times, power efficiency, wavelength selectivity, and compact size, and avoid unsuitable insertion losses. Further, the disclosed systems may achieve high dynamic range with regard to diffraction efficiency, allowing the tunable attenuation of a proportion of incident light between 1% and 99% in some examples. Additionally, as the switchable grating can have a planar configuration, an optical attenuator according to the disclosed examples may be compactly incorporated into different devices, some examples of which are described below.

FIG. 2 shows a block diagram of an example display device 200 comprising an optical attenuator. Display device 200 includes light sources 202a, 202b, 202c, which each may be configured to output coherent, collimated light. Beam combiner 204 combines light from light sources 202a-202c and directs the combined light to optical attenuator 206. Light from optical attenuator 206 is directed to scanner 208 (e.g., a MEMS device for raster scanning an image), which forms an image by raster scanning the light received from the attenuator. The light is directed to waveguide 210 and output to an eyebox 212 for viewing by a user 214. In the example of FIG. 2, three light sources are shown, but any other suitable number of light sources can be used in other examples. It will be understood that various optical components not shown in FIG. 2 may be included in an optical system, and that various components shown in FIG. 2 may be omitted in some examples.

As described in more detail below, the optical attenuator comprises one or more switchable gratings that are tunable to controllably attenuate the brightness of the image output to eyebox 212. The terms “switchable” and the like as used herein indicate that the grating can be operated to selectively diffract a proportion of incident light toward an optical dump, while the terms “tunable” and the like as used herein indicate that the proportion of light diffracted is controllable over a range. In some examples, the display brightness may be tuned to output between 1% and 99% of light incident on the attenuator. Further, in some examples, an even larger dynamic range of luminosities may be achieved by using different attenuation methods for different luminosity ranges. As one such example, a display comprising a full brightness of 1000 nits may be dimmed to a first threshold (e.g., 100 nits) using laser power control, and then as low as one nit using an optical attenuator according to the disclosed examples. The disclosed example attenuators further may have fast response times, transitioning from non-diffractive to diffractive in 10-20 microseconds (μs) in some examples.
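
The two-stage dimming example above can be illustrated with a short sketch that splits a target luminance between laser power control and the attenuator, using the 1000-nit, 100-nit, and 1-nit figures from the preceding paragraph; the split policy itself is an assumption, not a prescribed control law.

    # Illustrative split of a target luminance between laser power control and the
    # switchable-grating attenuator, using the example figures above (1000-nit full
    # brightness, laser dimming usable down to 100 nits).
    def brightness_settings(target_nits, full_nits=1000.0, laser_floor_nits=100.0):
        if target_nits >= laser_floor_nits:
            laser_fraction = target_nits / full_nits       # dim with laser current only
            attenuator_transmission = 1.0
        else:
            laser_fraction = laser_floor_nits / full_nits  # keep the laser in its stable range
            attenuator_transmission = target_nits / laser_floor_nits
        return laser_fraction, attenuator_transmission

    print(brightness_settings(500.0))   # ~ (0.5, 1.0)
    print(brightness_settings(1.0))     # ~ (0.1, 0.01) -> 1 nit via the attenuator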

In some examples, optical attenuator 206 may comprise a plurality of wavelength-selective switchable gratings to selectively attenuate light of different wavelength bands. For example, light sources 202a, 202b, 202c may be configured to output red, green, and blue light, respectively. As such, optical attenuator 206 may comprise respective switchable gratings to selectively attenuate the red, green, and blue light.

FIG. 3 shows an example optical attenuator 300 comprising a switchable grating 302 positioned between a cover plate 304 and a substrate 306. A control electrode pair comprising a first electrode 320 positioned between switchable grating 302 and cover plate 304, and a second electrode 322 positioned between switchable grating 302 and substrate 306, is controllable to apply a voltage across the switchable grating 302. It will be understood that the terms “substrate” and “cover plate” are not intended to imply any particular orientation for optical attenuator 300. Likewise, the term “electrode pair” includes electrode configurations having two or more electrodes. Switchable grating 302 is configured to diffract light within a wavelength band at a diffraction angle. As shown in FIG. 3, incident light 308 is received at switchable grating 302, and a proportion of the incident light is diffracted at angle 310. Diffracted light 312 is directed to an optical dump 314, while a remaining proportion of the incident light is not diffracted. As such, light 316 is attenuated relative to incident light 308.

The diffraction efficiency of switchable grating 302 is tunable based on the voltage applied across the grating. Switchable grating 302 may comprise a polymer dispersed liquid crystal (PDLC) grating. The diffraction efficiency of a PDLC switchable grating 302 decreases with increasing electric field. In the absence of an electric field across a PDLC, the incident light may be diffracted with a relatively high diffraction efficiency. As an applied voltage increases, a proportion of light diffracted decreases. As such, applying a voltage across switchable grating 302 tunes the diffraction efficiency, and thus the proportion of diffracted light 312. In some examples switchable grating 302 may be tuned to diffract between 1% and 99% of incident light. In other examples, a switchable grating may have any other suitable dynamic range. Cover plate 304, substrate 306, first electrode 320, and second electrode 322 each may comprise any suitable material. For example, the cover plate 304 and substrate 306 each can be formed from a material that is transparent to wavelengths of interest, including visibly transparent oxides and polymers for visible light applications, and/or materials transparent to other wavelengths (e.g. ultraviolet, infrared) for other applications. Likewise, the electrodes can be formed from a transparent conductor such as indium tin oxide.
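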
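
To illustrate how an applied voltage might be chosen for a desired attenuation given the voltage-tunable diffraction efficiency described above, the sketch below interpolates over a hypothetical efficiency-versus-voltage calibration for a PDLC grating; the calibration points are invented purely for illustration and are not measured data.

    # Sketch of voltage selection for a PDLC grating: given a calibration of
    # diffraction efficiency vs. applied voltage (the sample points below are
    # hypothetical), interpolate to find the drive voltage that yields a desired
    # transmitted fraction (transmission ~= 1 - diffraction efficiency).
    import numpy as np

    # Hypothetical calibration: efficiency decreases monotonically with voltage.
    calib_voltages   = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])   # volts
    calib_efficiency = np.array([0.99, 0.90, 0.60, 0.25, 0.05, 0.01])

    def voltage_for_transmission(target_transmission):
        target_efficiency = 1.0 - target_transmission
        # np.interp needs increasing x, so interpolate over the reversed curve.
        return np.interp(target_efficiency,
                         calib_efficiency[::-1], calib_voltages[::-1])

    print(voltage_for_transmission(0.50))   # voltage giving ~50% pass-through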

FIG. 4 shows an example optical attenuator 400 comprising three switchable gratings 402a-402c arranged in a stacked configuration. A proportion of incident light 404 is diffracted at each grating toward an optical dump 406. In this example, each of the switchable gratings is configured to diffract light in a different wavelength band. For example, switchable grating 402a may be configured to diffract light in a blue wavelength band, switchable grating 402b may be configured to diffract light in a green wavelength band, and switchable grating 402c may be configured to diffract light in a red wavelength band. In other examples, optical attenuator 400 may comprise a different number of switchable gratings, each of which may be configured to diffract light within any suitable wavelength band.

At each switchable grating 402a-402c, the proportion of diffracted light is tunable based on an applied voltage across the grating. Electrode 408a applies a voltage across switchable grating 402a relative to a common electrode 422 to tune the proportion of incident light in the first wavelength band that is diffracted. Similarly, electrode 408b applies a voltage across switchable grating 402b relative to the common electrode 422 to tune a proportion of incident light in the second wavelength band that is diffracted. Further, electrode 408c applies a voltage across switchable grating 402c relative to the common electrode 422 to tune a proportion of incident light in the third wavelength band that is diffracted. As such, by controlling the voltages applied across switchable gratings 402a-402c, a proportion of incident light diffracted in each of the three wavelength bands can be tuned to thereby attenuate incident light 404 with wavelength selectivity. In the example depicted, the three diffraction angles are different such that the diffracted light is directed to similar locations on the optical dump. In other examples, the diffraction angles may be equivalent, and the light of different colors may be diffracted toward different portions of an optical dump (which may be separate from one another or part of a contiguous structure).

Optical attenuator 400 further comprises a substrate 410, inter-grating layers 412a-412b located between switchable gratings, and cover plate 414. In some examples, substrate 410 may comprise a relatively greater thickness (e.g., 150-250 μm), while inter-grating layers 412a-412b and cover plate 414 may comprise a relatively lesser thickness (e.g., 25-75 μm). In other examples, these components may have any other suitable thicknesses. Inter-grating layers 412a-412b may be formed from any suitable optically transparent dielectric materials, such as various oxides and polymers. In other examples, optical attenuator 400 may comprise one, two, four, or a greater number of switchable gratings.

FIG. 5 shows an optical attenuator 500 comprising three switchable gratings 502a-502c in a spatially demultiplexed arrangement. Each switchable grating 502a-502c is configured to diffract light within a different wavelength band. At each switchable grating 502a-502c, incident light 504a-504c is diffracted at diffraction angle 506a-506c, and the corresponding proportion of diffracted light 508a-508c is directed to an optical dump 510a-510c. The remaining proportion of light that is not diffracted (i.e., attenuated light 512a-512c) may be combined using beam combiners 514a-514c for output.

FIG. 6 shows an example optical attenuator 600 comprising a waveguide 602 configured to direct diffracted light to an optical dump via total internal reflection (TIR). In the example shown, incident light 604 is diffracted at switchable grating 606 at a sufficiently high diffraction angle that diffracted light 608a, 608b is coupled into waveguide 602 and propagates through waveguide 602 via TIR to optical dump 610. The remaining proportion of light that is not diffracted is output as attenuated light 612. As described above, the proportion of light diffracted at switchable grating 606 is tunable based on a voltage applied across switchable grating 606.

FIG. 7 shows another example optical attenuator 700 comprising three switchable gratings 702a-702c and three corresponding waveguides 704a-704c in a stacked arrangement. Similar to the example described in FIG. 4, each of the three switchable gratings 702a-702c is configured to diffract light in a different wavelength band. For example, incident light 706 may comprise light in first, second, and third wavelength bands (e.g., red, green, and blue light). Light in the first wavelength band is diffracted at switchable grating 702a and propagates through waveguide 704a via TIR (dotted lines) to optical dump 710. Light in the second wavelength band is diffracted at switchable grating 702b and propagates via TIR through waveguide 704b to the optical dump. Likewise, light in the third wavelength band is diffracted at switchable grating 702c and propagates through waveguide 704c to the optical dump. As described above, the proportion of light diffracted at each switchable grating 702a-702c is tunable based on a voltage applied across the switchable grating.

A variety of devices may incorporate optical attenuators according to the present disclosure for various purposes. For example, in addition to controlling a brightness of a displayed image, an optical attenuator in a display device additionally or alternatively may be used to perform color correction. As a more specific example, due to defects and inconsistencies in waveguide manufacturing, a population of display devices may suffer from color non-uniformity. As such, an optical attenuator may be used to correct for color non-uniformity in display devices within the population via wavelength-selective attenuation.

As another example, an optical attenuator may be used with telecommunications laser devices to modulate different colors of light to encode communications signals. Further, in such an example, a switchable tunable grating according to the present disclosure may be used to demultiplex the different wavelengths of light on a receiving end, such as by diffracting each signal-carrying wavelength in a combined beam into a separate waveguide or otherwise toward separate detectors.

As yet another example, an optical attenuator as disclosed may be used in a helmet visor, as illustrated by example helmet 800 of FIG. 8. Such a helmet may be used, for example, in military applications. Viewing window 801 of helmet 800 can include an integrated optical attenuator including a switchable grating 802, substrate/cover plate layers, transparent electrodes, and a waveguide 804 configured to receive light from the switchable grating and deliver the light to an optical dump. Such a tunable attenuator can be used to selectively protect against lasers, as an example. A controller may tune the diffraction efficiency of switchable grating 802 by applying a voltage across the grating, thereby diffracting a proportion of incident light in a wavelength band through the waveguide to an optical dump.

As another example, an optical attenuator as disclosed herein may be integrated into a cockpit window of an aircraft. Malicious actors have been known to attack airplanes with high-power lasers. Thus, an optical attenuator incorporated into an airplane window may provide protection against such attacks. FIG. 9 shows an example optical device in the form of an airplane viewing window 900. As shown in the inset, airplane viewing window 900 comprises a switchable grating 902 to diffract a proportion of incoming light into a waveguide 904. The diffracted proportion of incident light propagates through waveguide 904 to an optical dump located at the edges of the window. In some examples, the switchable grating 902 is configured to diffract wavelengths corresponding to laser light that is known to pose dangers to pilots. The diffraction efficiency of switchable grating 902 is tunable based on a voltage applied across the grating. As such, airplane viewing window 900 may be controlled to have a higher diffraction efficiency in environments with a greater risk of laser attacks (e.g., closer to the ground, closer to airports), and a lower diffraction efficiency in other, relatively safer environments (e.g., at higher altitudes). In other examples, an optical attenuation system comprising a switchable grating may be incorporated into viewing windows of other vehicles (e.g., automobile, helicopter, etc.).
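For illustration only, a controller policy of the kind described above might resemble the following Python sketch. The altitude threshold, efficiency values, and function name are assumptions made for the example and are not values from the disclosure.

def target_diffraction_efficiency(altitude_m, near_airport):
    # Hypothetical policy: attenuate harder where laser exposure risk is
    # greater (low altitude or near an airport), and remain mostly
    # transparent at cruise altitude.
    if near_airport or altitude_m < 3000:
        return 0.95  # diffract most laser-band light into the waveguide/dump
    return 0.10

print(target_diffraction_efficiency(500, near_airport=True))     # 0.95
print(target_diffraction_efficiency(11000, near_airport=False))  # 0.1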

FIG. 10 schematically shows an example camera 1000 including an optical attenuator 1002. Optical attenuator 1002 may be configured to provide for color balancing, dynamic color corrections, and full bit color depth for camera 1000. Optical attenuator 1002 comprises switchable gratings 1004a-1004c, each configured to diffract incident light in a different wavelength band (e.g., red, green, and blue light). Camera 1000 further comprises a lens 1006 configured to focus light 1008 from a scene onto an image sensor 1010. Image sensor 1010 comprises a plurality of pixels, each pixel configured to measure an intensity of light received and output a corresponding value in bits.

In some examples, image sensor 1010 may be more sensitive to some colors of light than others. In this case, controller 1016 may control the voltage across each switchable grating 1004a-1004c to tune a proportion of incident light that is diffracted at the grating, thereby attenuating the light received at image sensor 1010 with wavelength selectivity. As such, camera 1000 may achieve color balance. Controller 1016 may control optical attenuator 1002 based on pixel sensor data from image sensor 1010, for example.

Optical attenuator 1002 may also be used in color correction. For example, different environments may have different relative intensities of light across the visible spectrum (e.g., outdoor sunlight versus indoor neon light). Software may be employed to perform color corrections via gamma correction. However, this may lead to loss of bit color depth. In such cases, controller 1016 controls optical attenuator 1002 to perform wavelength-selective optical attenuation by controlling each switchable grating 1004a-1004c to diffract a different proportion of incident light in the corresponding wavelength band. By attenuating different colors of light by different proportions, optical attenuator 1002 may perform color correction before the light is sensed at image sensor 1010, and help avoid performing gamma correction (with loss of bit color depth) after the light is sensed. Such wavelength-selective attenuation may help preserve bit color depth and provide more vibrant color images in various lighting environments.
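A minimal sketch of such a controller policy, assuming per-channel mean intensities are available from image sensor 1010, is shown below in Python. The balancing rule (pass the weakest channel in full and attenuate the others toward it) and the function name are illustrative assumptions, not the disclosure's method.

def channel_transmission_targets(channel_means):
    # channel_means: measured mean signal per color channel from the sensor.
    # Attenuate the stronger channels optically so all channels reach the
    # sensor balanced, avoiding digital gamma correction and its loss of
    # bit color depth.
    weakest = min(channel_means.values())
    return {color: min(1.0, weakest / mean) for color, mean in channel_means.items()}

# Example: green is over-represented, blue is weakest.
print(channel_transmission_targets({"red": 200.0, "green": 260.0, "blue": 140.0}))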

FIG. 11 is a flow diagram depicting an example method 1100 for attenuating light using a switchable grating. At 1102, method 1100 comprises receiving incident light at a switchable grating configured to diffract light within a wavelength band at a diffraction angle with a voltage-dependent diffraction efficiency. In some examples, the incident light is received from an image producing element of a display device. In other examples, the light is received from the environment. In still further examples, the light is received from a telecom device. In some examples, the incident light comprises light in one or more relatively narrow wavelength bands, such as red, green, and/or blue laser light. In other examples, the incident light comprises light in a relatively broad wavelength band of IR, visible, and/or UV light.

At 1104, the method further comprises applying a voltage across the switchable grating to tune a proportion of incident light within the wavelength band diffracted at the diffraction angle and diffract a proportion of the incident light to an optical dump. In some examples, at 1106, the method comprises tuning the diffraction efficiency to output a proportion of incident light between 1% and 99%. In some examples, at 1108 the method further comprises applying a voltage across a second switchable grating to tune a proportion of incident light within a second wavelength band diffracted at a second diffraction angle. Further, in some such examples, the method further comprises applying a voltage across a third switchable grating to tune a proportion of incident light within a third wavelength band. In such examples, the first, second, and third wavelength bands may correspond to red, blue, and green light. In other examples, two or fewer switchable gratings may be used, or four or more switchable gratings may be used to attenuate light with wavelength selectivity for a corresponding four or more colors.

In some examples, at 1112, the light is diffracted into a waveguide and propagates via total internal reflection to the optical dump. In some examples when 1108 is included, the method comprises, at 1114, diffracting the proportion of incident light within the second wavelength band to the optical dump.

Method 1100 further comprises, at 1116, outputting a proportion of incident light not diffracted. In some examples, at 1118, the output light is used for displaying an image. In other examples, the method comprises outputting the attenuated light to an image sensor or into a telecommunications fiber optic channel.
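To summarize the flow of method 1100, a non-authoritative Python sketch is given below. The linear voltage response is an assumption for illustration (consistent with the example above in which the diffracted proportion decreases with increasing voltage), and all names are invented.

def attenuate(incident, voltages, gratings):
    # Each switchable grating diffracts a voltage-dependent fraction of its
    # wavelength band to the optical dump (1104-1114); the remainder is
    # output, e.g., toward a display, image sensor, or fiber (1116-1118).
    output, dumped = {}, {}
    for band, power in incident.items():
        efficiency = gratings[band](voltages[band])  # fraction diffracted, 0..1
        dumped[band] = power * efficiency
        output[band] = power * (1.0 - efficiency)
    return output, dumped

# Assumed response: diffraction efficiency falls linearly with voltage.
linear = lambda v, v_max=5.0: max(0.0, min(1.0, 1.0 - v / v_max))
out, dump = attenuate(
    incident={"red": 1.0, "green": 1.0, "blue": 1.0},
    voltages={"red": 4.0, "green": 2.5, "blue": 0.5},
    gratings={"red": linear, "green": linear, "blue": linear},
)
print(out)   # attenuated output per band
print(dump)  # light sent to the optical dump per band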

In some embodiments, the methods and processes described herein may be tied to a computing system of one or more computing devices. In particular, such methods and processes may be implemented as a computer-application program or service, an application-programming interface (API), a library, and/or other computer-program product.

FIG. 12 schematically shows a non-limiting embodiment of a computing system 1200 that can enact one or more of the methods and processes described above. Computing system 1200 is shown in simplified form. Computing system 1200 may take the form of one or more personal computers, server computers, tablet computers, home-entertainment computers, network computing devices, gaming devices, mobile computing devices, mobile communication devices (e.g., smart phone), and/or other computing devices. Computing system 1200 can represent the computing system of any of the optical devices described herein.

Computing system 1200 includes a logic machine 1202 and a storage machine 1204. Computing system 1200 may optionally include a display subsystem 1206, input subsystem 1208, communication subsystem 1210, and/or other components not shown in FIG. 12.

Logic machine 1202 includes one or more physical devices configured to execute instructions. For example, the logic machine may be configured to execute instructions that are part of one or more applications, services, programs, routines, libraries, objects, components, data structures, or other logical constructs. Such instructions may be implemented to perform a task, implement a data type, transform the state of one or more components, achieve a technical effect, or otherwise arrive at a desired result.

The logic machine may include one or more processors configured to execute software instructions. Additionally or alternatively, the logic machine may include one or more hardware or firmware logic machines configured to execute hardware or firmware instructions. Processors of the logic machine may be single-core or multi-core, and the instructions executed thereon may be configured for sequential, parallel, and/or distributed processing. Individual components of the logic machine optionally may be distributed among two or more separate devices, which may be remotely located and/or configured for coordinated processing. Aspects of the logic machine may be virtualized and executed by remotely accessible, networked computing devices configured in a cloud-computing configuration.

Storage machine 1204 includes one or more physical devices configured to hold instructions executable by the logic machine to implement the methods and processes described herein. When such methods and processes are implemented, the state of storage machine 1204 may be transformed—e.g., to hold different data.

Storage machine 1204 may include removable and/or built-in devices. Storage machine 1204 may include optical memory (e.g., CD, DVD, HD-DVD, Blu-Ray Disc, etc.), semiconductor memory (e.g., RAM, EPROM, EEPROM, etc.), and/or magnetic memory (e.g., hard-disk drive, floppy-disk drive, tape drive, MRAM, etc.), among others. Storage machine 1204 may include volatile, nonvolatile, dynamic, static, read/write, read-only, random-access, sequential-access, location-addressable, file-addressable, and/or content-addressable devices.

It will be appreciated that storage machine 1204 includes one or more physical devices. However, aspects of the instructions described herein alternatively may be propagated by a communication medium (e.g., an electromagnetic signal, an optical signal, etc.) that is not held by a physical device for a finite duration.

Aspects of logic machine 1202 and storage machine 1204 may be integrated together into one or more hardware-logic components. Such hardware-logic components may include field-programmable gate arrays (FPGAs), program- and application-specific integrated circuits (PASIC/ASICs), program- and application-specific standard products (PSSP/ASSPs), system-on-a-chip (SOC), and complex programmable logic devices (CPLDs), for example.

When included, display subsystem 1206 may be used to present a visual representation of data held by storage machine 1204. This visual representation may take the form of a graphical user interface (GUI). As the herein described methods and processes change the data held by the storage machine, and thus transform the state of the storage machine, the state of display subsystem 1206 may likewise be transformed to visually represent changes in the underlying data. Display subsystem 1206 may include one or more display devices utilizing virtually any type of technology. For example, display subsystem 1206 may include a microdisplay 1212 or a laser scanner 1214. Display subsystem 1206 may also include an optical attenuator 1216, which may be controlled by logic machine 1202 to perform wavelength-selective optical attenuation according to the methods described herein. Such display devices may be combined with logic machine 1202 and/or storage machine 1204 in a shared enclosure, or such display devices may be peripheral display devices.

When included, input subsystem 1208 may comprise or interface with one or more user-input devices such as a keyboard, mouse, touch screen, or game controller. In some embodiments, the input subsystem may comprise or interface with selected natural user input (NUI) componentry. Such componentry may be integrated or peripheral, and the transduction and/or processing of input actions may be handled on- or off-board. Example NUI componentry may include a microphone for speech and/or voice recognition; an infrared, color, stereoscopic, and/or depth camera for machine vision and/or gesture recognition; a head tracker, eye tracker, accelerometer, and/or gyroscope for motion detection and/or intent recognition; as well as electric-field sensing componentry for assessing brain activity.

When included, communication subsystem 1210 may be configured to communicatively couple computing system 1200 with one or more other computing devices. Communication subsystem 1210 may include wired and/or wireless communication devices compatible with one or more different communication protocols. As non-limiting examples, the communication subsystem may be configured for communication via a wireless telephone network, or a wired or wireless local- or wide-area network. In some embodiments, the communication subsystem may allow computing system 1200 to send and/or receive messages to and/or from other devices via a network such as the Internet.

Another example provides an optical attenuator, comprising a switchable grating configured to diffract light within a wavelength band at a diffraction angle, an electrode pair configured to apply a voltage across the switchable grating to tune a proportion of incident light diffracted at the diffraction angle, and an optical dump to receive the proportion of incident light diffracted. In some such examples, the switchable grating is configured such that the proportion of incident light diffracted decreases with increasing applied voltage. In some such examples, the switchable grating alternatively or additionally comprises a polymer dispersed liquid crystal grating. In some such examples, the switchable grating is a first switchable grating, the wavelength band is a first wavelength band, and the optical attenuator alternatively or additionally further comprises a second switchable grating configured to diffract light within a second wavelength band, and a second electrode pair configured to apply a voltage across the second switchable grating to tune a second proportion of incident light diffracted toward the optical dump. In some such examples, the first switchable grating and second switchable grating alternatively or additionally are arranged in a stacked arrangement. In some such examples, the proportion of incident light alternatively or additionally is diffracted into a waveguide and propagates via total internal reflection to the optical dump. In some such examples, the proportion of diffracted light alternatively or additionally is tunable within a range of between 1% and 99% of light incident on the optical attenuator. In some such examples, the switchable grating is a first switchable grating configured to diffract light within a red wavelength band at a first diffraction angle, the proportion of incident light is a proportion of incident red light, and the optical attenuator alternatively or additionally further comprises a second switchable grating configured to diffract light within a green wavelength band at a second diffraction angle, and a third switchable grating configured to diffract light within a blue wavelength band at a third diffraction angle.

Another example provides a method for attenuating light, the method comprising receiving incident light at a switchable grating configured to diffract light within a wavelength band at a diffraction angle with a voltage-dependent diffraction efficiency, applying a voltage across the switchable grating to tune a proportion of incident light within the wavelength band diffracted at the diffraction angle, and diffracting a proportion of the incident light at the diffraction angle to an optical dump. In some examples, diffracting the proportion of incident light alternatively or additionally comprises diffracting the proportion of incident light into a waveguide and propagating the light via total internal reflection to the optical dump. In some such examples, applying a voltage across the switchable grating alternatively or additionally comprises tuning a remaining proportion of incident light not diffracted to between 1% and 99%. In some such examples, the switchable grating is a first switchable grating and the wavelength band is a first wavelength band, and the method alternatively or additionally further comprises applying a voltage across a second switchable grating to tune a proportion of incident light within a second wavelength band diffracted at a second diffraction angle, and diffracting a second proportion of the incident light within the second wavelength band to the optical dump.

Another example provides an optical device, comprising one or more optical attenuators, each comprising one or more switchable gratings configured to diffract incident light at a diffraction angle, and a controller configured to, for each switchable grating, control the switchable grating to, based on a voltage applied across the switchable grating, tune a proportion of the incident light that is diffracted. In some such examples, the optical device comprises one or more light sources configured to output light in one or more wavelength bands, the light in each wavelength band being directed to a corresponding switchable grating configured for that wavelength band. In some such examples, the one or more optical attenuators alternatively or additionally comprise a plurality of switchable gratings in a stacked arrangement. In some such examples, the one or more optical attenuators comprise a plurality of optical attenuators that are spatially demultiplexed. In some such examples, the proportion of incident light is diffracted into a waveguide and propagates via total internal reflection to an optical dump. In some such examples, the optical device alternatively or additionally comprises a viewing window comprising the one or more switchable gratings. In some such examples, the optical device alternatively or additionally comprises a laser scanner configured to output a scanned beam image. In some such examples, the optical device alternatively or additionally comprises a camera, and the one or more optical attenuators are positioned upstream of an image sensor to attenuate light received by the image sensor.

It will be understood that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered in a limiting sense, because numerous variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated and/or described may be performed in the sequence illustrated and/or described, in other sequences, in parallel, or omitted. Likewise, the order of the above-described processes may be changed.

The subject matter of the present disclosure includes all novel and non-obvious combinations and sub-combinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as any and all equivalents thereof.

The article "Microsoft Patent | Optical attenuation via switchable grating" was first published on Nweon Patent.

Patent: Systems and methods for power efficient image acquisition using single photon avalanche diodes (SPADs)

Patent PDF: Available to Nweon members

Publication Number: 20220382056

Publication Date: 2022-12-01

Assignee: Microsoft Technology Licensing, LLC (Redmond, WA, US)

Abstract

A system for power efficient image acquisition is configurable to capture, using an image sensor, a plurality of partial image frames including at least a first partial image frame and a second partial image frame. The first partial image frame is captured at a first timepoint using a first subset of image sensing pixels of the plurality of image sensing pixels of the image sensor. The second partial image frame is captured at a second timepoint using a second subset of image sensing pixels of the plurality of image sensing pixels of the image sensor. The second subset of image sensing pixels includes different image sensing pixels than the first subset of image sensing pixels, and the second timepoint is temporally subsequent to the first timepoint. The system is configurable to generate a composite image frame based on the plurality of partial image frames.

Claims

We claim:

Description

BACKGROUND

Mixed-reality (MR) systems, including virtual-reality and augmented-reality systems, have received significant attention because of their ability to create truly unique experiences for their users. For reference, conventional virtual-reality (VR) systems create a completely immersive experience by restricting their users’ views to only a virtual environment. This is often achieved, in VR systems, through the use of a head-mounted device (HMD) that completely blocks any view of the real world. As a result, a user is entirely immersed within the virtual environment. In contrast, conventional augmented-reality (AR) systems create an augmented-reality experience by visually presenting virtual objects that are placed in or that interact with the real world.

As used herein, VR and AR systems are described and referenced interchangeably. Unless stated otherwise, the descriptions herein apply equally to all types of mixed-reality systems, which (as detailed above) include AR systems, VR systems, and/or any other similar system capable of displaying virtual objects.

Some MR systems include one or more cameras for facilitating image capture, video capture, and/or other functions. For instance, an MR system may utilize images and/or depth information obtained using its camera(s) to provide pass-through views of a user's environment to the user. An MR system may provide pass-through views in various ways. For example, an MR system may present raw images captured by the camera(s) of the MR system to a user. In other instances, an MR system may modify and/or reproject captured image data to correspond to the perspective of a user's eye to generate pass-through views. An MR system may modify and/or reproject captured image data to generate a pass-through view using depth information for the captured environment obtained by the MR system (e.g., using a depth system of the MR system, such as a time-of-flight camera, a rangefinder, stereoscopic depth cameras, etc.). In some instances, an MR system utilizes one or more predefined depth values to generate pass-through views (e.g., by performing planar reprojection).

In some instances, pass-through views generated by modifying and/or reprojecting captured image data may at least partially correct for differences in perspective brought about by the physical separation between a user’s eyes and the camera(s) of the MR system (known as the “parallax problem,” “parallax error,” or, simply “parallax”). Such pass-through views/images may be referred to as “parallax-corrected pass-through” views/images. By way of illustration, parallax-corrected pass-through images may appear to a user as though they were captured by cameras that are co-located with the user’s eyes.

A pass-through view can aid users in avoiding disorientation and/or safety hazards when transitioning into and/or navigating within a mixed-reality environment. Pass-through views may also enhance user views in low visibility environments. For example, mixed-reality systems configured with long wavelength thermal imaging cameras may facilitate visibility in smoke, haze, fog, and/or dust. Likewise, mixed-reality systems configured with low light imaging cameras facilitate visibility in dark environments where the ambient light level is below the level required for human vision.

To facilitate imaging of an environment for generating a pass-through view, some MR systems include image sensors that utilize complementary metal-oxide-semiconductor (CMOS) and/or charge-coupled device (CCD) technology. For example, such technologies may include image sensing pixel arrays where each pixel is configured to generate electron-hole pairs in response to detected photons. The electrons may become stored in per-pixel capacitors, and the charge stored in the capacitors may be read out to provide image data (e.g., by converting the stored charge to a voltage).

However, such image sensors suffer from a number of shortcomings. For example, the signal to noise ratio for a conventional image sensor may be highly affected by read noise, especially when imaging under low visibility conditions. For instance, under low light imaging conditions (e.g., where ambient light is below about 10 lux, such as within a range of about 1 millilux or below), a CMOS or CCD imaging pixel may detect only a small number of photons, which may cause the read noise to approach or exceed the signal detected by the imaging pixel and decrease the signal-to-noise ratio.

The dominance of read noise in a signal detected by a CMOS or CCD image sensor is often exacerbated when imaging at a high frame rate under low light conditions. Although a lower framerate may be used to allow a CMOS or CCD sensor to detect enough photons to allow the signal to avoid being dominated by read noise, utilizing a low framerate often leads to motion blur in captured images. Motion blur is especially problematic when imaging is performed on an HMD or other device that undergoes regular motion during use.

In addition to affecting pass-through imaging, the read noise and/or motion blur associated with conventional image sensors may also affect other operations performed by HMDs, such as late stage reprojection, rolling shutter corrections, object tracking (e.g., hand tracking), surface reconstruction, semantic labeling, 3D reconstruction of objects, and/or others.

To address shortcomings associated with CMOS and/or CCD image sensors, devices have emerged that utilize single photon avalanche diode (SPAD) image sensors. In contrast with conventional CMOS or CCD sensors, a SPAD is operated at a bias voltage that enables the SPAD to detect a single photon. Upon detecting a single photon, an electron-hole pair is formed, and the electron is accelerated across a high electric field, causing avalanche multiplication (e.g., generating additional electron-hole pairs). Thus, each detected photon may trigger an avalanche event. A SPAD may operate in a gated manner (each gate corresponding to a separate shutter operation), where each gated shutter operation may be configured to result in a binary output. The binary output may comprise a “1” where an avalanche event was detected during an exposure (e.g., where a photon was detected), or a “0” where no avalanche event was detected.

Separate shutter operations may be performed consecutively and integrated over a frame capture time period. The binary output of the consecutive shutter operations over a frame capture time period may be counted, and an intensity value may be calculated based on the counted binary output.

An array of SPADs may form an image sensor, with each SPAD forming a separate pixel in the SPAD array. To capture an image of an environment, each SPAD pixel may detect avalanche events and provide binary output for consecutive shutter operations in the manner described herein. The per-pixel binary output of consecutive shutter operations over a frame capture time period may be counted, and per-pixel intensity values may be calculated based on the counted per-pixel binary output. The per-pixel intensity values may be used to form an intensity image of an environment.
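The counting scheme described above can be illustrated with a short simulation. The Python sketch below (using NumPy) is a simplified model, not the sensor's actual readout pipeline; the array size, gate count, and per-gate detection probability are arbitrary values chosen for the example.

import numpy as np

rng = np.random.default_rng(0)

# Simulate 1000 gated shutter operations for a 4x4 SPAD array: each gate
# yields a binary value per pixel (1 = avalanche event detected, 0 = none).
n_gates = 1000
detection_probability = np.full((4, 4), 0.05)  # per-gate chance of an event
binary_outputs = rng.random((n_gates, 4, 4)) < detection_probability

# Count the per-pixel binary outputs over the frame capture time period and
# form a normalized intensity image from the counts.
counts = binary_outputs.sum(axis=0)
intensity_image = counts / n_gates
print(counts)
print(intensity_image)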

SPAD sensors show promise for overcoming various shortcomings associated with CMOS or CCD sensors, particularly for image acquisition under low light conditions. However, implementing SPAD sensors for image and/or video capture is still associated with many challenges. For example, each avalanche event of a SPAD pixel of a SPAD sensor consumes power. Thus, SPAD sensors imaging under low light conditions (where the SPADs detect fewer photons and therefore experience fewer avalanche events) consume less power than SPAD sensors imaging under illuminated conditions. For instance, a SPAD sensor operating in a low light environment may consume about 150-200 milliwatts of power, whereas a SPAD sensor operating in an illuminated environment may consume about 700-800 milliwatts of power.

Furthermore, SPAD sensors are often affected by dark current. Dark current can induce an avalanche event without photon detection, thereby adding noise to SPAD imagery. The amount of dark current experienced by SPAD sensors increases with temperature. In addition, high-power operation of a SPAD sensor in a lighted environment may contribute to increased operational temperature of the SPAD sensor, thereby increasing dark current and resulting signal noise.

Accordingly, there is an ongoing need and desire for improvements to image acquisition using SPADs, particularly in illuminated and/or high temperature environments.

The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.

BRIEF SUMMARY

Disclosed embodiments include systems, methods, and devices for power efficient image acquisition.

Some embodiments include a system that has an image sensor comprising a plurality of image sensing pixels, one or more processors, and one or more hardware storage devices storing instructions that are executable by the one or more processors to configure the system to perform various acts. The acts include capturing, using the image sensor, a plurality of partial image frames including at least a first partial image frame and a second partial image frame. The first partial image frame is captured at a first timepoint using a first subset of image sensing pixels of the plurality of image sensing pixels of the image sensor. The second partial image frame is captured at a second timepoint using a second subset of image sensing pixels of the plurality of image sensing pixels of the image sensor. The second subset of image sensing pixels includes different image sensing pixels than the first subset of image sensing pixels, and the second timepoint is temporally subsequent to the first timepoint. The acts also include generating a composite image frame based on the plurality of partial image frames.

Some embodiments include a system that has an image sensor comprising a plurality of image sensing pixels, one or more processors, and one or more hardware storage devices storing instructions that are executable by the one or more processors to configure the system to perform various acts. The acts include obtaining a runtime conditions measurement comprising (i) runtime light or (ii) runtime temperature. The acts also include, in response to determining that the runtime conditions measurement satisfies one or more thresholds, selectively activating a sampling mode for image acquisition. The sampling mode configures the system to utilize a subset of image sensing pixels of the image sensor to capture image frames. The subset of image sensing pixels comprising fewer than all image sensing pixels of the image sensor.

Some embodiments include an image sensor that includes a plurality of image sensing pixels and one or more integrated circuits configured to, in response to detecting activation of a sampling mode, selectively activate a first subset of image sensing pixels of the plurality of image sensing pixels to configure the first subset of image sensing pixels for photon detection while selectively refraining from activating a second subset of image sensing pixels of the plurality of image sensing pixels.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

Additional features and advantages will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the teachings herein. Features and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. Features of the present invention will become more fully apparent from the following description and appended claims or may be learned by the practice of the invention as set forth hereinafter.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to describe the manner in which the above-recited and other advantages and features can be obtained, a more particular description of the subject matter briefly described above will be rendered by reference to specific embodiments which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments and are not therefore to be considered to be limiting in scope, embodiments will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:

FIG. 1 illustrates example components of an example system that may include or be used to implement one or more disclosed embodiments;

FIG. 2A illustrates an example of capturing an image frame of an object in a low light environment using a single photon avalanche diode (SPAD) array of a head-mounted display (HMD);

FIG. 2B illustrates a conceptual representation of activating a sampling mode based on one or more runtime conditions measurements;

FIGS. 2C-2E illustrate examples of capturing image frames of an object in a lighted environment using a SPAD array of an HMD in a sampling mode;

FIGS. 3A and 3B illustrate an example of capturing consecutive partial image frames of an object in a lighted environment using a SPAD array of an HMD in a sampling mode;

FIGS. 3C and 3D illustrate examples of combining consecutively captured partial image frames to generate composite images;

FIGS. 4A-4D illustrate an additional example of capturing consecutive partial image frames of an object in a lighted environment using a SPAD array in a sampling mode;

FIG. 4E illustrates an additional example of temporally filtering consecutively captured partial image frames to generate composite images;

FIG. 5 illustrates example SPAD pixels of a SPAD array that include color filters;

FIG. 6A illustrates an example of capturing consecutive partial image frames using subsets of SPAD pixels associated with different color channels;

FIG. 6B illustrates an example of generating a composite image by temporally filtering consecutively captured partial image frames and generating a color image by demosaicing the composite image; and

FIGS. 7 and 8 illustrate example flow diagrams depicting acts associated with power efficient image acquisition using SPADs.

DETAILED DESCRIPTION

Disclosed embodiments are generally directed to systems, methods, and devices for power efficient image acquisition using single photon avalanche diodes (SPADs).

Examples of Technical Benefits, Improvements, and Practical Applications

Those skilled in the art will recognize, in view of the present disclosure, that at least some of the disclosed embodiments may be implemented to address various shortcomings associated with at least some conventional image acquisition techniques. The following section outlines some example improvements and/or practical applications provided by the disclosed embodiments. It will be appreciated, however, that the following are examples only and that the embodiments described herein are in no way limited to the example improvements discussed herein.

The techniques described herein may facilitate a number of advantages over conventional systems, devices, and/or methods for SPAD image acquisition (including color image acquisition), particularly for imaging under illuminated conditions and/or high-temperature conditions.

For example, techniques of the present disclosure include operating an image sensor in a sampling mode for image acquisition. The sampling mode may be selectively activated in response to runtime conditions (e.g., temperature conditions and/or light conditions). The sampling mode causes systems to utilize one or more subsets of image sensing pixels of an image sensor to capture image frames of an environment. In some instances, consecutive image frames are captured using different subsets of image sensing pixels of the image sensor, and the consecutive image frames are combined or temporally filtered to generate a composite image.

Accordingly, techniques of the present disclosure may reduce the number of image sensing pixels used for image acquisition when runtime illumination and/or temperature conditions are high. By reducing the number of image sensing pixels used for image acquisition, systems employing techniques of the present disclosure may operate with reduced power consumption (e.g., fewer SPAD pixels may detect avalanche events, thereby reducing sensor power consumption) and/or may reduce temperature increases brought about by power consumption (e.g., thereby reducing the effects of dark current). Reduced power consumption may facilitate increased device battery life, reduced overall device heat, and/or other benefits.

Many of the examples described herein focus on image sensors embodied as SPAD arrays with a plurality of SPAD pixels. SPAD arrays may provide various benefits over conventional CMOS and/or CCD sensors, particularly when image acquisition functionality is desired for both low light environments and illuminated environments.

Initially, the binarization of the SPAD signal effectively eliminates read noise, thereby improving signal-to-noise ratio for SPAD image sensor arrays as compared with conventional CMOS and/or CCD sensors. Accordingly, because of the binarization of the SPAD signal, a SPAD signal may be read out at a high framerate (e.g., 90 Hz or greater, such as 120 Hz or even 240 Hz) without causing the signal to be dominated by read noise, even for signals capturing a low number of photons in low light environments.

In view of the foregoing, multiple exposure (and readout) operations may be performed at a high framerate using a SPAD array to generate separate partial image frames, and these image frames may be temporally filtered with one another. The separate partial image frames may be aligned using motion data and combined (e.g., by averaging or other filtering) to form a single composite image. In this regard, SPAD images may be obtained in a temporally filtered manner (e.g., with persistence), using prior-timepoint image data to improve the quality of current-timepoint image data.
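The temporal filtering described here can be sketched as follows in Python (NumPy assumed). Real systems align frames by reprojection using motion data, so the integer-pixel np.roll alignment and the shift values below are simplifying assumptions made only to show the combining step.

import numpy as np

def temporally_filter(frames, shifts):
    # frames: list of 2-D partial image frames; shifts: per-frame (dy, dx)
    # offsets estimated elsewhere (e.g., from motion data). Each frame is
    # motion-compensated and the aligned frames are averaged into a single
    # composite image.
    aligned = [np.roll(f, shift=(dy, dx), axis=(0, 1))
               for f, (dy, dx) in zip(frames, shifts)]
    return np.mean(aligned, axis=0)

frames = [np.random.default_rng(i).random((8, 8)) for i in range(3)]
composite = temporally_filter(frames, shifts=[(0, 0), (0, -1), (1, -1)])
print(composite.shape)  # (8, 8)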

Although the present disclosure focuses, in at least some respects, on SPAD sensors that include a SPAD array with a plurality of SPAD pixels, it will be appreciated, in view of the present disclosure, that the principles described herein may apply to CMOS, CCD, and/or other types of image sensors. For example, image sensing pixels of any type of image sensor may be selectively activated and/or deactivated to facilitate image acquisition according to a sampling mode as discussed herein.

Having just described some of the various high-level features and benefits of the disclosed embodiments, attention will now be directed to FIGS. 1 through 8. These Figures illustrate various conceptual representations, architectures, methods, and supporting illustrations related to the disclosed embodiments.

Example Systems and Techniques for Power Efficient Image Acquisition Using SPADs

Attention is now directed to FIG. 1, which illustrates an example system 100 that may include or be used to implement one or more disclosed embodiments. FIG. 1 depicts the system 100 as a head-mounted display (HMD) configured for placement over a head of a user to display virtual content for viewing by the user’s eyes. Such an HMD may comprise an augmented reality (AR) system, a virtual reality (VR) system, and/or any other type of HMD. Although the present disclosure focuses, in at least some respects, on a system 100 implemented as an HMD, it should be noted that the techniques described herein may be implemented using other types of systems/devices, without limitation.

FIG. 1 illustrates various example components of the system 100. For example, FIG. 1 illustrates an implementation in which the system includes processor(s) 102, storage 104, sensor(s) 110, I/O system(s) 116, and communication system(s) 118. Although FIG. 1 illustrates a system 100 as including particular components, one will appreciate, in view of the present disclosure, that a system 100 may comprise any number of additional or alternative components.

The processor(s) 102 may comprise one or more sets of electronic circuitries that include any number of logic units, registers, and/or control units to facilitate the execution of computer-readable instructions (e.g., instructions that form a computer program). Such computer-readable instructions may be stored within storage 104. The storage 104 may comprise physical system memory and may be volatile, non-volatile, or some combination thereof. Furthermore, storage 104 may comprise local storage, remote storage (e.g., accessible via communication system(s) 118 or otherwise), or some combination thereof. Additional details related to processors (e.g., processor(s) 102) and computer storage media (e.g., storage 104) will be provided hereinafter.

In some implementations, the processor(s) 102 may comprise or be configurable to execute any combination of software and/or hardware components that are operable to facilitate processing using machine learning models or other artificial intelligence-based structures/architectures. For example, processor(s) 102 may comprise and/or utilize hardware components or computer-executable instructions operable to carry out function blocks and/or processing layers configured in the form of, by way of non-limiting example, single-layer neural networks, feed forward neural networks, radial basis function networks, deep feed-forward networks, recurrent neural networks, long-short term memory (LSTM) networks, gated recurrent units, autoencoder neural networks, variational autoencoders, denoising autoencoders, sparse autoencoders, Markov chains, Hopfield neural networks, Boltzmann machine networks, restricted Boltzmann machine networks, deep belief networks, deep convolutional networks (or convolutional neural networks), deconvolutional neural networks, deep convolutional inverse graphics networks, generative adversarial networks, liquid state machines, extreme learning machines, echo state networks, deep residual networks, Kohonen networks, support vector machines, neural Turing machines, and/or others.

As will be described in more detail, the processor(s) 102 may be configured to execute instructions 106 stored within storage 104 to perform certain actions associated with image acquisition. The actions may rely at least in part on data 108 (e.g., avalanche event counting or tracking, etc.) stored on storage 104 in a volatile or non-volatile manner.

In some instances, the actions may rely at least in part on communication system(s) 118 for receiving data from remote system(s) 120, which may include, for example, separate systems or computing devices, sensors, and/or others. The communication system(s) 118 may comprise any combination of software or hardware components that are operable to facilitate communication between on-system components/devices and/or with off-system components/devices. For example, the communication system(s) 118 may comprise ports, buses, or other physical connection apparatuses for communicating with other devices/components. Additionally, or alternatively, the communication system(s) 118 may comprise systems/components operable to communicate wirelessly with external systems and/or devices through any suitable communication channel(s), such as, by way of non-limiting example, Bluetooth, ultra-wideband, WLAN, infrared communication, and/or others.

FIG. 1 illustrates that a system 100 may comprise or be in communication with sensor(s) 110. Sensor(s) 110 may comprise any device for capturing or measuring data representative of perceivable phenomenon. By way of non-limiting example, the sensor(s) 110 may comprise one or more image sensors, microphones, thermometers, barometers, magnetometers, accelerometers, gyroscopes, and/or others.

FIG. 1 also illustrates that the sensor(s) 110 include SPAD array(s) 112. As depicted in FIG. 1, a SPAD array 112 comprises an arrangement of SPAD pixels 122 that are each configured to facilitate avalanche events in response to sensing a photon, as described hereinabove. SPAD array(s) 112 may be implemented on a system 100 (e.g., an MR HMD) to facilitate image capture for various purposes (e.g., to facilitate computer vision tasks, pass-through imagery, and/or others).

FIG. 1 also illustrates the SPAD pixels 122 of the SPAD array(s) 112 as being connected to and/or controllable by integrated circuit(s) 124. Integrated circuit(s) 124 may comprise one or more analog, digital, and/or mixed signal integrated circuits that include one or more logic circuitries for controlling operation of the SPAD pixels 122 of the SPAD array(s) 112. For example, integrated circuit(s) 124 may comprise one or more field-programmable gate arrays (FPGAs), microprocessors, digital memory chips, application-specific integrated circuits (ASICs), and/or others. As will be described in more detail hereinafter, the integrated circuit(s) 124 may be used to selectively activate or deactivate certain subsets of SPAD pixels 122 of the SPAD array(s) 112 (or other image sensing pixels of any image sensor) to facilitate image capture in accordance with a sampling mode.

FIG. 1 also illustrates that the sensor(s) 110 include inertial measurement unit(s) 114 (IMU(s) 114). IMU(s) 114 may comprise any number of accelerometers, gyroscopes, and/or magnetometers to capture motion data associated with the system 100 as the system moves within physical space. The motion data may comprise or be used to generate pose data, which may describe the position and/or orientation (e.g., 6 degrees of freedom pose) and/or change of position (e.g., velocity and/or acceleration) and/or change of orientation (e.g., angular velocity and/or angular acceleration) of the system 100.

Furthermore, FIG. 1 illustrates that a system 100 may comprise or be in communication with I/O system(s) 116. I/O system(s) 116 may include any type of input or output device such as, by way of non-limiting example, a touch screen, a mouse, a keyboard, a controller, and/or others. For example, the I/O system(s) 116 may include a display system that may comprise any number of display panels, optics, laser scanning display assemblies, and/or other components.

Attention is now directed to FIG. 2A, which illustrates an example of capturing an image frame 210 of an object 206 (e.g., a table) in a low light environment 208 using a single photon avalanche diode (SPAD) array of a head-mounted display 202 (HMD 202). The HMD 202 corresponds, in at least some respects, to the system 100 disclosed hereinabove. For example, the HMD 202 includes a SPAD array (e.g., SPAD array(s) 112) that includes SPAD pixels (e.g., SPAD pixels 122) configured for photon detection to capture images (even in low light environments).

FIG. 2A illustrates image data 212 of the image frame 210 depicting the object 206. The image data 212 may comprise intensity values determined based on the per-pixel quantity of avalanche events detected by the SPAD pixels of the SPAD array(s) of the HMD 202. When imaging under low light conditions, as depicted in FIG. 2A, SPAD pixels of a SPAD array may consume relatively little power (e.g., as a result of detecting relatively few photons). However, when runtime lighting conditions increase, SPAD pixels may consume more power (e.g., as a result of detecting more photons and therefore triggering more avalanche events). Accordingly, at least some techniques of the present disclosure include operating SPAD sensors in a power-efficient manner that accommodates for high-light environments (and/or high-temperature environments).

FIG. 2B illustrates a conceptual representation of activating a sampling mode based on one or more runtime conditions measurements. In particular, FIG. 2B shows the object 206 within a lighted environment 214, in contrast with the low light environment 208 of FIG. 2A. The lighted environment 214 may cause SPADs of the HMD 202 to detect additional avalanche events, thereby increasing system power consumption. Accordingly, FIG. 2B illustrates runtime conditions measurement(s) 216 determined based on conditions associated with the captured environment (e.g., lighted environment 214) and/or conditions associated with the HMD 202 and/or components thereof. The runtime conditions measurement(s) 216 may comprise runtime light 218, runtime temperature 220, and/or other metrics.

Runtime light 218 may be determined in various ways, without limitation. One example technique for determining runtime light is based on gray level counts detected by an image sensor of the HMD 202 (e.g., the SPAD array(s) 112 of the HMD 202, or another sensor thereof). For example, a number of counts may be measured over an exposure time associated with the image sensor of the HMD to estimate runtime light 218.

Runtime temperature 220 may comprise various components, such as environment temperature 222 and/or device temperature 224. Environment temperature 222 may be determined based on one or more temperature sensors (e.g., sensor(s) 110) of the HMD 202 for measuring the ambient temperature of the environment surrounding the HMD 202 at runtime. Device temperature 224 may be determined based on one or more temperature sensors (e.g., sensor(s) 110) of the HMD 202 for measuring temperature of one or more devices of the HMD 202, such as SPAD array(s) 112 of the HMD 202 (or other image sensors thereof), display systems of the HMD 202, processing units of the HMD 202, and/or others.

Any combination of runtime conditions measurement(s) 216 may be obtained in accordance with the present disclosure. In some implementations, a system compares the runtime conditions measurement(s) 216 to one or more thresholds to determine whether the runtime conditions measurement(s) 216 satisfy the one or more thresholds (as indicated in FIG. 2B by decision block 226 stating “Threshold(s) Satisfied?”). By way of non-limiting example, a threshold device temperature 224 may be 40° C., a threshold environment temperature 222 may be 30° C., and a threshold runtime light 218 may be 300 lux.

FIG. 2B illustrates that if the runtime conditions measurement(s) 216 are determined to satisfy the threshold(s), a system may activate a sampling mode (as indicated in FIG. 2B by block 228). In contrast, if the runtime conditions measurement(s) 216 are determined to fail to satisfy the threshold(s), a system may refrain from activating the sampling mode and instead remain in a normal image acquisition mode (as indicated in FIG. 2B by block 230). A "sampling mode" as used herein refers to an image acquisition mode wherein only one or more subsets of image sensing pixels (e.g., SPAD pixels 122) of an image sensor (e.g., SPAD array 112) are used to capture images of environments. For example, the integrated circuit(s) 124 of a SPAD array 112 may selectively activate one subset of SPAD pixels of the SPAD array 112 (thereby configuring the subset of SPAD pixels for photon detection to detect avalanche events) while selectively refraining from activating another subset of SPAD pixels of the SPAD array 112 (thereby refraining from configuring the other subset of SPAD pixels for photon detection).
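Using the example threshold values given above, the activation decision might be expressed as in the Python sketch below. Combining the measurements with a logical OR is an assumption made for illustration, and the function name is invented.

def should_activate_sampling_mode(runtime_light_lux, device_temp_c, env_temp_c,
                                  light_threshold=300.0, device_threshold=40.0,
                                  env_threshold=30.0):
    # Activate the sampling mode if any runtime conditions measurement
    # satisfies its threshold; otherwise remain in the normal mode.
    return (runtime_light_lux >= light_threshold
            or device_temp_c >= device_threshold
            or env_temp_c >= env_threshold)

print(should_activate_sampling_mode(450.0, 35.0, 25.0))  # True: bright scene
print(should_activate_sampling_mode(50.0, 38.0, 22.0))   # False: stay in normal mode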

Accordingly, fewer than all image sensing pixels of an image sensor may be selectively used to capture images of an environment, thereby allowing the image sensor to advantageously operate in a reduced power mode or in a power saving mode.

FIGS. 2C-2E illustrate examples of capturing image frames of an object in a lighted environment using a SPAD array of an HMD in a sampling mode. In particular, FIG. 2C illustrates the HMD 202 capturing the object 206 in the lighted environment 214 as discussed above. FIG. 2C illustrates a SPAD array 232 of the HMD 202, which may generally correspond to the SPAD array(s) 112 discussed hereinabove. The SPAD array 232 of the HMD 202 of FIG. 2C is reconfigured (e.g., utilizing integrated circuit(s) 124) according to a sampling mode, wherein only a subset of the SPAD pixels of the SPAD array 232 is activated for photon detection to capture the object 206 in the lighted environment 214. The SPAD array 232 may become reconfigured according to the sampling mode in response to the runtime conditions measurement(s) 216 satisfying a threshold (e.g., the illumination of the lighted environment 214 may cause the runtime light 218 to satisfy a threshold).

FIG. 2C illustrates a configuration of a sampling mode in which every other column of SPAD pixels of the SPAD array 232 is activated to detect photons for generating an image. Inactive SPAD pixels 260 of the SPAD array 232 are illustrated in FIG. 2C with black squares (arranged to form black bars), whereas active SPAD pixels 262 of the SPAD array 232 are illustrated in FIG. 2C with white squares. In the example shown in FIG. 2C, using active SPAD pixels 262 (and refraining from using the inactive SPAD pixels 260), the HMD 202 may utilize the SPAD array 232 to capture an image frame 234 of the object 206 in the lighted environment 214. As is evident from FIG. 2C, the image frame 234 comprises a reduced image resolution in the horizontal dimension (e.g., relative to the image resolution of the image frame 210 of FIG. 2A captured under a normal image acquisition mode) in view of the inactive columns of SPAD pixels of the SPAD array 232.
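A column-sampling pattern of the kind shown in FIG. 2C can be sketched as a boolean activation mask. The NumPy code below is illustrative only and does not model the integrated circuit(s) that actually gate the SPAD pixels; the helper names are invented.

import numpy as np

def column_sampling_mask(height, width, period=2):
    # True marks a SPAD pixel that is activated for photon detection;
    # every other column is active in the FIG. 2C-style pattern.
    mask = np.zeros((height, width), dtype=bool)
    mask[:, ::period] = True
    return mask

def capture_partial_frame(full_frame, mask):
    # Keep only the active columns, halving the horizontal resolution.
    return full_frame[:, mask[0]]

full = np.arange(8 * 8).reshape(8, 8)
mask = column_sampling_mask(8, 8)
print(capture_partial_frame(full, mask).shape)  # (8, 4)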

Although the image frame 234 may be used for any desired purpose (e.g., passthrough imaging, depth imaging, simultaneous localization and mapping, object tracking, and/or other functions), in some instances, super-resolution processing 236 is performed on the image frame 234 to generate an upscaled image frame 238. The upscaled image frame 238 comprises an image resolution that is greater than the image resolution of the image frame 234 (e.g., at least in the horizontal dimension, to compensate for the inactive columns of SPAD pixels 260 of the SPAD array 232). In the example shown in FIG. 2C, the upscaled image frame 238 comprises an image resolution that matches the image resolution of image frame 210 captured under a normal image acquisition mode (however, other image resolutions for the upscaled image frame 238 are within the scope of the present disclosure).

Super-resolution processing 236 may include one or more upsampling algorithms configured to generate a high-resolution image from one or more low-resolution images. For example, super-resolution processing 236 to generate a high-resolution image from one or more low-resolution images may employ techniques such as spatial domain approaches (e.g., sample transformation using the sampling theorem and the Nyquist theorem), frequency domain approaches (e.g., registering images using properties of the discrete Fourier transform), learning based techniques (e.g., adaptive regularization, pair matching, etc.), iterative reconstruction and interpolation based techniques (e.g., iterative back projection, pixel replication, nearest-neighbor interpolation, bilinear or bicubic interpolation, etc.), dynamic tree and wavelet based resolution techniques (e.g., mean field approaches), filtering techniques (e.g., edge-preserving filtering operations such as joint bilateral filter, guided filter, bilateral solver, etc.) and/or others.
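
As a minimal illustration of one of the interpolation-based options listed above, the sketch below upscales a half-horizontal-resolution frame back to the nominal sensor resolution with bicubic interpolation via OpenCV. The resolutions shown are assumed values, and a production super-resolution pipeline would typically be more sophisticated than a single interpolation step.

```python
import cv2
import numpy as np

def upscale_bicubic(partial_frame: np.ndarray, full_width: int, full_height: int) -> np.ndarray:
    """Upscale a reduced-resolution frame (e.g., captured with every other
    column of SPAD pixels inactive) to the full sensor resolution using
    bicubic interpolation, one of the interpolation-based techniques above."""
    return cv2.resize(partial_frame, (full_width, full_height),
                      interpolation=cv2.INTER_CUBIC)

# Example: a frame captured at half horizontal resolution (assumed 320x480)
# upscaled to an assumed nominal 640x480 sensor resolution.
half_res = np.random.randint(0, 256, (480, 320), dtype=np.uint8)
upscaled = upscale_bicubic(half_res, full_width=640, full_height=480)
```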

Although FIG. 2C illustrates an example configuration where a sampling mode causes columns of SPAD pixels of the SPAD array 232 to be deactivated or remain inactive for the capturing of an image frame (e.g., image frame 234), other configurations may be used. For example, FIG. 2D illustrates the SPAD array 232 of the HMD 202 capturing the object 206 in the lighted environment 214 in a sampling mode where every other row of the SPAD array is activated for detecting photons to facilitate image acquisition. Accordingly, the image frame 240 comprises a reduced image resolution in the vertical dimension (relative to the image resolution of the image frame 210 of FIG. 2A captured under a normal image acquisition mode) in view of the inactive rows of SPAD pixels of the SPAD array 232. Super-resolution processing 242 may similarly be performed to generate an upscaled image frame 244 based on the image frame 240.

Furthermore, FIG. 2E illustrates the SPAD array 232 of the HMD 202 capturing the object 206 in the lighted environment 214 in a sampling mode where one quarter of the SPAD pixels of the SPAD array 232 are activated for detecting photons to facilitate image acquisition. For example, for each 2×2 block of SPAD pixels, only one SPAD pixel is active for photon detection during image acquisition. Accordingly, the image frame 246 comprises a reduced resolution in both the horizontal dimension and the vertical dimension (e.g., relative to the image resolution of the image frame 210 of FIG. 2A captured under a normal image acquisition mode). Super-resolution processing 248 may be performed to generate an upscaled image frame 250 based on the image frame 246.

As is evident from FIGS. 2C-2E, different configurations of active SPAD pixels for image acquisition under a sampling mode can result in different image resolutions for captured image frames (e.g., compare image frames 234, 240, and 246). Furthermore, the configuration of active SPAD pixels used to capture image frame 246 includes fewer SPAD pixels than the configurations of active SPAD pixels used to capture image frames 234 and 240. Thus, the configuration of active SPAD pixels used to capture image frame 246 (with 25% of the SPAD pixels active) may facilitate a greater reduction in power consumption than the configuration of active SPAD pixels used to capture image frames 234 and 240. In this regard, in some instances, the quantity or ratio of active SPAD pixels used in a sampling mode is dynamically determined based on the runtime conditions measurement(s) 216. For instance, multiple thresholds may be used to trigger different configurations of active SPAD pixels for a sampling mode. By way of example, a runtime light 218 of about 300 lux may trigger a sampling mode where 50% of the SPAD pixels are activated for image acquisition (e.g., as shown by example in FIGS. 2C and 2D), and a runtime light 218 of about 1000 lux may trigger a sampling mode where 25% of the SPAD pixels are activated for image acquisition (e.g., as shown by example in FIG. 2E). Other quantities and/or ratios for other light levels and/or runtime temperatures may be used.
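
A hypothetical mapping from the measured light level to the active-pixel ratio might look like the following; the lux breakpoints simply restate the example values above, and real thresholds (including temperature-based ones) would be implementation specific.

```python
def active_pixel_ratio(runtime_light_lux: float) -> float:
    """Map a runtime light measurement to the fraction of SPAD pixels
    activated in the sampling mode (illustrative thresholds only)."""
    if runtime_light_lux >= 1000.0:
        return 0.25   # one SPAD pixel per 2x2 section (as in FIG. 2E)
    if runtime_light_lux >= 300.0:
        return 0.50   # every other row or column (as in FIGS. 2C and 2D)
    return 1.0        # normal image acquisition mode: all pixels active
```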

In some implementations, a sampling mode causes a system to utilize different subsets of image sensing pixels of an image sensor to capture different temporally consecutive image frames. The different temporally consecutive image frames may be combined to form composite images. In some instances, such composite images captured under the sampling mode may advantageously comprise an image resolution that matches an image resolution defined for images captured under a normal image acquisition mode (without having to perform super-resolution processing).

Accordingly, FIGS. 3A and 3B illustrate an example of capturing consecutive partial image frames of the object 206 in the lighted environment 214 using a SPAD array 306 of an HMD 302 in a sampling mode. The HMD 302 and the SPAD array 306 may generally correspond to other HMDs and SPAD arrays described herein. FIG. 3A illustrates the SPAD array 306 of the HMD 302 capturing the object 206 in the lighted environment 214 while the HMD 302 is positioned according to pose 304A. The pose 304A may be tracked or measured utilizing sensors (e.g., IMU(s) 114, camera(s) to facilitate simultaneous localization and mapping, etc.) of the HMD 302.

FIG. 3A illustrates the SPAD array 306 with certain columns of SPAD pixels activated for image acquisition (e.g., the odd columns being activated). For example, the SPAD array 306 may be divided into sections that each comprise two adjacent SPAD pixels (e.g., section 308A comprising SPAD pixels 308A-1 and 308A-2, and section 308B comprising SPAD pixels 308B-1 and 308B-2, and so forth). A first subset of SPAD pixels used to capture the partial image frame 310 of FIG. 3A may include a first SPAD pixel from each of the different sections of SPAD pixels (e.g., SPAD pixel 308A-1 of section 308A and SPAD pixel 308B-1 of section 308B, and so forth).

As is evident from FIG. 3A, the representation of the partial image frame 310 captured using the first subset of SPAD pixels (SPAD pixels 308A-1, 308B-1, and so forth) includes placeholder pixels (shown as black vertical bars) that indicate portions of the captured environment that would have been detected by the SPAD pixels of the SPAD array 306 that were inactive for the capturing of the partial image frame 310 (the size of the image pixels relative to the partial image frame 310 is exaggerated for clarity). Thus, partial image frame 310 may appear as though it is missing image data when conceptually expanded to include the placeholder pixels as shown in FIG. 3A. However, as will be discussed hereafter, the apparently missing image data may be obtained by utilizing a different subset of SPAD pixels (e.g., SPAD pixels not included in the first subset of SPAD pixels discussed above) to capture a subsequent partial image frame for combination with the partial image frame 310.

Accordingly, FIG. 3B illustrates the HMD 302 positioned according to pose 304B while capturing the object 206 in the lighted environment 214 (pose 304A is illustrated in FIG. 3B for reference). FIG. 3B shows the SPAD array 306 with a different configuration of active columns of SPAD pixels (e.g., with the even columns being activated) relative to the configuration shown in FIG. 3A. For instance, a second subset of SPAD pixels used to capture the partial image frame 312 of FIG. 3B may include a respective second SPAD pixel from each of the different sections of SPAD pixels (e.g., SPAD pixel 308A-2 of section 308A and SPAD pixel 308B-2 of section 308B, and so forth). Partial image frame 312 is captured at a timepoint (e.g., when the HMD is positioned according to pose 304B) that is temporally subsequent to a timepoint associated with the capturing of partial image frame 310 (e.g., the timepoint when the HMD was positioned according to pose 304A).

Because different subsets of SPAD pixels of the SPAD array 306 are used to capture the different partial image frames 310 and 312, the image data of the partial image frames 310 and 312 may complement one another in capturing the subject environment (e.g., lighted environment 214 including object 206). For example, the previously mentioned apparently missing image data of partial image frame 310 may be supplemented with the image data of the partial image frame 312 to complete the representation of the captured scene. Similarly, the apparently missing image data of the partial image frame 312 as shown in FIG. 3B may be supplemented with the image data of the partial image frame 310 to complete the representation of the captured scene.

FIG. 3C illustrates the partial image frames 310 and 312 discussed hereinabove with reference to FIGS. 3A and 3B being composited or combined with one another via temporal filtering 314 to generate a composite image 316. Temporal filtering 314 may include using image pixels of the different image frames (e.g., partial image frames 310 and 312) to generate pixel values for an output image (i.e., composite image 316).

Image pixels of the different image frames may be combined or composited in various ways, such as by summing, averaging (e.g., weighted averaging), alpha blending, and/or others, and the manner/parameters of combining corresponding image pixels may differ for different pixel regions and/or may be dynamically determined based on various factors (e.g., signal strength, amount of motion, motion detected in a captured scene, etc.).
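
For instance, a straightforward per-pixel weighted average of already-aligned frames could be sketched as follows; uniform weights, 8-bit intensities, and the function and variable names are assumptions, and the disclosure contemplates other combinations (summing, alpha blending, region-dependent weights, and so on).

```python
import numpy as np

def composite_weighted(frames, weights):
    """Composite spatially aligned image frames into one output image using
    a per-pixel weighted average (assumes 8-bit intensity frames)."""
    acc = np.zeros(frames[0].shape, dtype=np.float32)
    for frame, weight in zip(frames, weights):
        acc += weight * frame.astype(np.float32)
    acc /= float(sum(weights))
    return np.clip(acc, 0, 255).astype(frames[0].dtype)

# Example usage with two aligned partial frames (placeholder names):
# composite = composite_weighted([aligned_310, aligned_312], [1.0, 1.0])
```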

In some instances, the partial image frames 310 and 312 capture the object 206 from poses that are at least slightly different from one another. For example, the HMD 302 may capture the partial image frames 310 and 312 from poses 304A and 304B, respectively, which may at least slightly differ from one another. Accordingly, in some instances, temporal filtering 314 may include utilizing motion data 318 to align the partial image frames 310 and 312 with one another. Motion data 318 may comprise or be used to generate pose data that describes the position and/or orientation (e.g., 6 degrees of freedom pose) and/or change of position (e.g., velocity and/or acceleration) and/or change of orientation (e.g., angular velocity and/or angular acceleration) of the HMD 302 (and/or the SPAD array 306) during the capturing of the partial image frames 310 and 312.

As noted above, the motion data 318 may be used to align the partial image frames 310 and 312 with one another. For example, a system may use the motion data 318 to align partial image frame 310 with pose 304B of partial image frame 312, thereby generating aligned image frames that are spatially aligned with one another (e.g., appearing as though they were captured from pose 304B with the same capture perspective). In this regard, the temporal filtering 314 may comprise motion compensated temporal filtering.
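
One simplified way to perform such motion-compensated alignment is to warp one frame into the reference pose using a 3x3 homography derived from the pose change. The planar-scene approximation (a full reprojection would also use scene depth) and the OpenCV-based sketch below are assumptions rather than the disclosed implementation.

```python
import cv2
import numpy as np

def align_to_reference(frame: np.ndarray, homography: np.ndarray) -> np.ndarray:
    """Warp a partial image frame into the capture perspective of a
    reference pose, given a 3x3 homography derived from motion/pose data
    (planar-scene simplification)."""
    height, width = frame.shape[:2]
    return cv2.warpPerspective(frame, homography, (width, height),
                               flags=cv2.INTER_LINEAR)
```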

In some instances, temporal filtering 314 additionally or alternatively utilizes optical flow estimations to align the partial image frames 310 and 312 to facilitate image compositing to generate a composite image 316. For example, in some instances, a system upsamples the consecutively captured partial image frames and performs optical flow analysis to obtain vectors for aligning the pixels of the consecutively captured image frames. Furthermore, although the present disclosure focuses, in at least some respects, on temporal filtering operations that utilize image frames that temporally precede an image frame associated with a target timepoint to generate a composite image associated with the target timepoint, temporal filtering operations may additionally or alternatively utilize at least some image frames that are temporally subsequent to an image frame associated with a target timepoint to generate a composite image associated with the target timepoint.

As is depicted in FIG. 3C, the composite image 316 comprises an image resolution that corresponds to the image resolution of the image frame 210 captured under a normal image acquisition mode (see FIG. 2A). Accordingly, because the partial image frames 310 and 312 are both captured using subsets of SPAD pixels, both of the partial image frames 310 and 312 may be separately captured with reduced power consumption. Furthermore, because the partial image frames 310 and 312 are captured using complementary subsets of SPAD pixels, the partial image frames 310 and 312 may comprise complementary image data that can be combined by temporal filtering/motion compensation to provide a composite image (which may comprise the same image resolution as would be available when using all SPAD pixels of a SPAD array to capture an image).

The example discussed with reference to FIG. 3C focuses on a particular configuration for first and second subsets of SPAD pixels (using alternating columns of SPAD pixels) for capturing first and second temporally consecutive partial image frames to generate a composite image. Other configurations for different subsets of SPAD pixels for capturing temporally consecutive partial image frames are within the scope of the present disclosure.

For example, FIG. 3D illustrates example partial image frames 320 and 322 captured using different subsets of SPAD pixels including respective sets of rows of active SPAD pixels. Similar to the partial image frames 310 and 312, the partial image frames 320 and 322 may be combined via temporal filtering 324 to generate a composite image 326. Temporal filtering 324 may similarly utilize motion data 328 to align the image data of the partial image frames 320 and 322 to generate the composite image 326 (e.g., where the partial image frames 320 and 322 are captured from at least partially different poses or capture perspectives).

FIGS. 4A-4E illustrate an additional example of these principles of capturing consecutive partial image frames using different subsets of SPAD pixels of a SPAD array in a sampling mode to generate a composite image. In particular, FIG. 4A illustrates a SPAD array 402 (which may generally correspond to the other SPAD arrays described herein) and illustrates a first subset of SPAD pixels used to capture a first partial image frame 406. For example, the SPAD array 402 may be divided into sections that each comprise 2×2 SPAD pixels (e.g., sections 404A, 404B, and so forth). The first subset of SPAD pixels used to capture the first partial image frame 406 may comprise a respective first SPAD pixel from each different section of SPAD pixels (e.g., SPAD pixel 404A-1 from section 404A, SPAD pixel 404B-1 from section 404B, and so forth). As before, the partial image frame 406 is conceptually depicted with placeholder pixels to indicate that the portions of the environment not captured by the first subset of SPAD pixels at the desired resolution may be obtained by other subsets of SPAD pixels in subsequent partial image frames.

Similarly, FIG. 4B illustrates the SPAD array 402 and a second subset of SPAD pixels used to capture a second partial image frame 408. As depicted in FIG. 4B, the second subset of SPAD pixels used to capture the second partial image frame 408 may comprise a respective second SPAD pixel from each different section of SPAD pixels (e.g., SPAD pixel 404A-2 from section 404A, SPAD pixel 404B-2 from section 404B, and so forth). Furthermore, FIG. 4C illustrates the SPAD array 402 and a third subset of SPAD pixels used to capture a third partial image frame 410. As depicted in FIG. 4C, the third subset of SPAD pixels used to capture the third partial image frame 410 may comprise a respective third SPAD pixel from each different section of SPAD pixels (e.g., SPAD pixel 404A-3 from section 404A, SPAD pixel 404B-3 from section 404B, and so forth). Still furthermore, FIG. 4D illustrates the SPAD array 402 and a fourth subset of SPAD pixels used to capture a fourth partial image frame 412. As depicted in FIG. 4D, the fourth subset of SPAD pixels used to capture the fourth partial image frame 412 may comprise a respective fourth SPAD pixel from each different section of SPAD pixels (e.g., SPAD pixel 404A-4 from section 404A, SPAD pixel 404B-4 from section 404B, and so forth).

The different partial image frames 406, 408, 410, and 412 of FIGS. 4A-4D are captured in temporal sequence (e.g., each partial image frame is associated with a respective timepoint), and the different partial image frames 406, 408, 410, and 412 may have at least partially different poses associated therewith.

FIG. 4E illustrates the different partial image frames 406, 408, 410, and 412 of FIGS. 4A-4D being composited via temporal filtering 414 to generate a composite image 416 (utilizing motion data 418 as needed). As depicted in FIG. 4E, the composite image comprises an image resolution that corresponds to the image resolution of the image frame 210 captured under a normal image acquisition mode (see FIG. 2A). Thus, utilizing techniques of the present disclosure, full-resolution SPAD imagery may be acquired while operating a SPAD sensor at 25% power (or at another partial power level).
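
Conceptually, once the four partial frames are aligned, the full-resolution composite can be assembled by writing each partial frame back into its pixel positions within the 2x2 sections. The sketch below assumes a fixed ordering of the four SPAD pixels within each section and ignores residual motion, so it is an idealized illustration rather than the temporal filtering itself.

```python
import numpy as np

def assemble_from_quarters(f1, f2, f3, f4):
    """Reassemble a full-resolution frame from four aligned partial frames,
    each captured by one SPAD pixel per 2x2 section (assumed ordering:
    top-left, top-right, bottom-left, bottom-right)."""
    height, width = f1.shape
    full = np.zeros((2 * height, 2 * width), dtype=f1.dtype)
    full[0::2, 0::2] = f1   # respective first SPAD pixel of each section
    full[0::2, 1::2] = f2   # respective second SPAD pixel
    full[1::2, 0::2] = f3   # respective third SPAD pixel
    full[1::2, 1::2] = f4   # respective fourth SPAD pixel
    return full
```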

In some instances, the quantity or ratio of active SPAD pixels used in each separate subset of SPAD pixels for capturing consecutive partial image frames in a sampling mode is dynamically determined based on the runtime conditions measurement(s) 216. For instance, multiple thresholds may be used to trigger different configurations of active SPAD pixels for subsets of SPAD pixels of a sampling mode. By way of example, a runtime light 218 of about 300 lux may trigger a sampling mode where 50% of the SPAD pixels are activated for each subset of SPAD pixels to facilitate image acquisition (e.g., as shown by example in FIGS. 3A-3D), and a runtime light 218 of about 1000 lux may trigger a sampling mode where 25% of the SPAD pixels are activated for each subset of SPAD pixels to facilitate image acquisition (e.g., as shown by example in FIGS. 4A-4E). Other quantities and/or ratios for other light levels and/or runtime temperatures may be used. In this regard, a quantity of temporally consecutive partial image frames used to generate each composite image may be modified based on the runtime conditions measurement(s) 216 (e.g., two partial image frames may be used where 50% of SPAD pixels are activated for each subset of SPAD pixels, four partial image frames may be used where 25% of SPAD pixels are activated for each subset of SPAD pixels, three partial image frames may be used where one third of SPAD pixels are activated for each subset of SPAD pixels, and so forth).

As noted herein, the principles described herein may additionally or alternatively be implemented utilizing any type of image sensor (e.g., SPAD, CMOS, CCD and/or other image sensors). Furthermore, the principles described herein may additionally or alternatively be implemented utilizing image sensors that include color filters. FIG. 5 illustrates an example section of SPAD pixels 502 of a SPAD array (e.g., SPAD array 112, or any other SPAD array described herein) that includes respective color filters positioned over the SPAD pixels 502 thereof. FIG. 5 illustrates the color filters positioned over the SPADs 502 in a Bayer pattern, in particular with diagonally disposed green filters 506 and 508 and with a diagonally disposed red filter 504 and blue filter 510. This pattern may be repeated over a SPAD array to form a mosaic of color filtered SPAD pixels (as indicated by the ellipses). Although at least some of the examples disclosed herein focus, in at least some respects, on color-filtered SPADs 502 of a SPAD array arranged in a Bayer pattern, other patterns are within the scope of the present disclosure, such as by way of non-limiting example, CYGM (cyan, yellow, green, magenta), RGBE (red, green, blue, emerald), Foveon X3 (e.g., a vertically arranged red, green, blue pattern), panchromatic cell patterns (e.g., RGBW (red, green, blue, white), CMYW (cyan, magenta, yellow, white), Fujifilm EXR, Fujifilm X-Trans, Quad Bayer), and/or others. Furthermore, combinations of filtered and non-filtered SPADs (or other image sensing pixels) are within the scope of the present disclosure (e.g., arrangements of unfiltered SPADs and infrared filtered SPADs, arrangements of visible light filtered SPADs and infrared filtered SPADs, etc.).
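
To make the color-filter arrangement concrete, the following sketch builds boolean masks selecting the red-, green-, and blue-filtered pixel positions of a Bayer mosaic. The specific RGGB ordering is an assumption; FIG. 5 only fixes the red and blue filters to one diagonal and the two green filters to the other.

```python
import numpy as np

def bayer_subset_masks(height: int, width: int):
    """Boolean masks for the red, green, and blue filtered pixel positions
    of an assumed RGGB Bayer mosaic."""
    rows = np.arange(height)[:, None]
    cols = np.arange(width)[None, :]
    red = (rows % 2 == 0) & (cols % 2 == 0)
    green = (rows % 2) != (cols % 2)          # both green positions
    blue = (rows % 2 == 1) & (cols % 2 == 1)
    return red, green, blue
```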

FIG. 6A illustrates that the subsets of SPAD pixels used to acquire partial image frames for generating composite image frames may coincide with the colors of the different color filters positioned over the SPAD pixels. For example, FIG. 6A illustrates an example HMD 602 capturing a lighted environment 604 that includes a red object 606, a green object 608, and a blue object 610. The HMD 602 includes a SPAD array 612 to facilitate the capturing, and the SPAD array 612 includes SPAD pixels in a Bayer pattern as shown in FIG. 5.

In the example shown in FIG. 6A, the first subset of SPAD pixels of the SPAD array 612 used to acquire the red partial image frame 614 includes the SPAD pixels of the SPAD array that include a red color filter (e.g., red filter 504) positioned thereover. For example, for each 2×2 section of SPAD pixels of the SPAD array 612 that includes a red color filter 504, green filters 506 and 508, and blue filter 510, the first subset of SPAD pixels may comprise the SPAD pixel from each section that includes a red color filter 504. Accordingly, the red partial image frame 614 includes image data depicting the red object 606 based on photons reflected or scattered by the red object 606 toward the SPAD array 612 and that transmit through the red color filters to be detected by the first subset of SPAD pixels (placeholder pixels are not shown in the partial image frames of FIG. 6A for simplicity).

Furthermore, in FIG. 6A, the second subset of SPAD pixels of the SPAD array 612 used to acquire the green partial image frame 616 includes the SPAD pixels of the SPAD array that include a green color filter (e.g., green filters 506, 508) positioned thereover. Accordingly, the green partial image frame 616 includes image data depicting the green object 608 based on photons reflected or scattered by the green object 608 toward the SPAD array 612 and that transmit through the green color filters to be detected by the second subset of SPAD pixels. Similarly, the third subset of SPAD pixels of the SPAD array 612 used to acquire the blue partial image frame 618 includes the SPAD pixels of the SPAD array that include a blue color filter (e.g., blue filter 510) positioned thereover. Accordingly, the blue partial image frame 618 includes image data depicting the blue object 610 based on photons reflected or scattered by the blue object 610 toward the SPAD array 612 and that transmit through the blue color filters to be detected by the third subset of SPAD pixels. The color-specific partial image frames 614, 616, and 618 may be captured in (any) temporal sequence.

FIG. 6B illustrates the different color-specific partial image frames 614, 616, and 618 of FIG. 6A being composited via temporal filtering 620 to generate a composite image 622 (utilizing motion data 624 as needed). The composite image 622 combines the intensity data from the red partial image frame 614, the green partial image frame 616, and the blue partial image frame 618 into a single image. FIG. 6B further illustrates performing demosaicing 626 on the composite image 622 to generate a color image 628. Demosaicing may comprise interpolating or extrapolating a color value (e.g., an RGB value) for each image pixel (or SPAD pixel) of an image frame (or a SPAD array that captures an image frame). In contrast with generating a single color value for each block of Bayer pixels (e.g., each 2×2 set of RGB pixels) to generate a color image (thereby causing an image resolution loss), demosaicing may provide RGB color imagery without loss of image resolution. Accordingly, techniques of the present disclosure may facilitate low power acquisition of color imagery.
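
As a minimal example of the demosaicing step, OpenCV's Bayer conversion interpolates a full color value for every pixel of a raw mosaic frame, so no image resolution is lost. The RGGB layout assumed below, along with the random placeholder data, is illustrative rather than the disclosed processing chain.

```python
import cv2
import numpy as np

# Placeholder raw mosaic frame: each pixel holds the intensity detected
# through its own color filter (assumed RGGB Bayer layout, assumed 640x480).
raw_mosaic = np.random.randint(0, 256, (480, 640), dtype=np.uint8)

# Interpolate a full BGR color value for every pixel of the mosaic,
# preserving the original image resolution.
color_image = cv2.cvtColor(raw_mosaic, cv2.COLOR_BayerRG2BGR)
```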

The specific quantities, ratios, and/or arrangements of SPAD pixels discussed in the examples above (e.g., quantity of SPAD pixels in a section of SPAD pixels, subset of SPAD pixels, etc.) are provided by way of example only and are not limiting of the present disclosure.

Example Method(s) for Power Efficient Image Acquisition Using SPADs

The following discussion now refers to a number of methods and method acts that may be performed by the disclosed systems. Although the method acts are discussed in a certain order and illustrated in a flow chart as occurring in a particular order, no particular ordering is required unless specifically stated, or required because an act is dependent on another act being completed prior to the act being performed. One will appreciate that certain embodiments of the present disclosure may omit one or more of the acts described herein.

FIGS. 7 and 8 illustrate example flow diagrams 700 and 800, respectively, depicting acts associated with power efficient image acquisition using SPADs. The discussion of the various acts represented in the flow diagrams includes references to various hardware components described in more detail with reference to FIG. 1.

Act 702 of flow diagram 700 of FIG. 7 includes capturing, using an image sensor, a plurality of partial image frames including at least a first partial image frame and a second partial image frame. Act 702 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110 (e.g., SPAD array 112), input/output system(s) 116, communication system(s) 118, and/or other components.

In some implementations, the first partial image frame is captured at a first timepoint using a first subset of image sensing pixels of the plurality of image sensing pixels of the image sensor, and the second partial image frame is captured at a second timepoint using a second subset of image sensing pixels of the plurality of image sensing pixels of the image sensor. The second subset of image sensing pixels includes different image sensing pixels than the first subset of image sensing pixels, and the second timepoint is temporally subsequent to the first timepoint.

Furthermore, in some instances, the image sensor includes a single photon avalanche diode (SPAD) array, such that the plurality of image sensing pixels includes a plurality of SPAD pixels, the first subset of image sensing pixels includes a first subset of SPAD pixels of the plurality of SPAD pixels, and the second subset of image sensing pixels includes a second subset of SPAD pixels of the plurality of SPAD pixels.

In some implementations, the plurality of SPAD pixels comprises a plurality of sections of SPAD pixels, the first subset of SPAD pixels comprises at least one SPAD pixel from each of the plurality of sections of SPAD pixels, and the second subset of SPAD pixels comprises at least one different SPAD pixel from each of the plurality of sections of SPAD pixels.

In some implementations, each of the plurality of sections of SPAD pixels includes a respective first SPAD pixel and a respective second SPAD pixel, the first subset of SPAD pixels includes the respective first SPAD pixel of each of the plurality of sections of SPAD pixels, and the second subset of SPAD pixels includes the respective second SPAD pixel of each of the plurality of sections of SPAD pixels.

In some implementations, the plurality of partial image frames further includes, in addition to first and second partial image frames, a third partial image frame and a fourth partial image frame. The third partial image frame is captured at a third timepoint using a third subset of SPAD pixels of the plurality of SPAD pixels of the SPAD array. The third timepoint is temporally subsequent to the second timepoint. The third subset of SPAD pixels includes different SPAD pixels than the first subset of SPAD pixels and the second subset of SPAD pixels. The fourth partial image frame is captured at a fourth timepoint using a fourth subset of SPAD pixels of the plurality of SPAD pixels of the SPAD array. The fourth timepoint is temporally subsequent to the third timepoint. The fourth subset of SPAD pixels includes different SPAD pixels than the first subset of SPAD pixels and the second subset of SPAD pixels and the third subset of SPAD pixels.

In some implementations, each of the plurality of sections of SPAD pixels includes a respective first SPAD pixel, a respective second SPAD pixel, a respective third SPAD pixel, and a respective fourth SPAD pixel. Furthermore, in some instances, the first subset of SPAD pixels comprises the respective first SPAD pixel of each of the plurality of sections of SPAD pixels, the second subset of SPAD pixels comprises the respective second SPAD pixel of each of the plurality of sections of SPAD pixels, the third subset of SPAD pixels comprises the respective third SPAD pixel of each of the plurality of sections of SPAD pixels, and the fourth subset of SPAD pixels comprises the respective fourth SPAD pixel of each of the plurality of sections of SPAD pixels.

In some implementations, each of the plurality of SPAD pixels comprises a respective color filter positioned thereover. Each of the plurality of sections of SPAD pixels includes at least one respective first SPAD pixel associated with a first color, at least one respective second SPAD pixel associated with a second color, and at least one respective third SPAD pixel associated with a third color. The plurality of partial image frames further comprises, in addition to a first and second partial image frame, a third partial image frame. The third partial image frame is captured at a third timepoint using a third subset of SPAD pixels of the plurality of SPAD pixels of the SPAD array. The third timepoint is temporally subsequent to the second timepoint, and the third subset of SPAD pixels includes different SPAD pixels than the first subset of SPAD pixels and the second subset of SPAD pixels. The first subset of SPAD pixels comprises the at least one respective first SPAD pixel of each of the plurality of sections of SPAD pixels associated with the first color, the second subset of SPAD pixels comprises the at least one respective second SPAD pixel of each of the plurality of sections of SPAD pixels associated with the second color, and the third subset of SPAD pixels comprises the at least one respective third SPAD pixel of each of the plurality of sections of SPAD pixels associated with the third color. In some implementations, the first color comprises red, the second color comprises green, and the third color comprises blue. Each of the plurality of sections of SPAD pixels may be arranged in a Bayer pattern.

In some implementations, the first subset of SPAD pixels comprises a first set of rows of SPAD pixels of the plurality of SPAD pixels, and the second subset of SPAD pixels comprises a second set of rows of SPAD pixels of the plurality of SPAD pixels. In some implementations, the first subset of SPAD pixels comprises a first set of columns of SPAD pixels of the plurality of SPAD pixels, and the second subset of SPAD pixels comprises a second set of columns of SPAD pixels of the plurality of SPAD pixels.

Act 704 of flow diagram 700 includes generating a composite image frame based on the plurality of partial image frames. Act 704 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110, input/output system(s) 116, communication system(s) 118, and/or other components. In some implementations, act 704 includes various sub-acts that may be performed. For example, act 704A of flow diagram 700 includes generating aligned partial image frames by using motion data associated with the SPAD array to spatially align each of the plurality of partial image frames with one another. Furthermore, Act 704B of flow diagram 700 includes compositing each of the aligned partial image frames with one another.

In some instances, one or more of the acts of flow diagram 700 is/are performed in response to detecting activation of a power saving mode based on a runtime conditions measurement. Furthermore, in some instances, a quantity of partial image frames in the plurality of partial image frames is based on the runtime conditions measurement.

Act 802 of flow diagram 800 of FIG. 8 includes obtaining a runtime conditions measurement comprising (i) runtime light or (ii) runtime temperature. Act 802 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110 (e.g., SPAD array(s) 112), input/output system(s) 116, communication system(s) 118, and/or other components. In some implementations, the runtime conditions measurement comprises runtime light. In other instances, the runtime conditions measurement comprises runtime temperature.

Act 804 of flow diagram 800 includes in response to determining that the runtime conditions measurement satisfies one or more thresholds, selectively activating a sampling mode for image acquisition, wherein the sampling mode configures the system to utilize a subset of image sensing pixels of the image sensor to capture image frames, the subset of image sensing pixels comprising fewer than all image sensing pixels of the image sensor. Act 804 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110, input/output system(s) 116, communication system(s) 118, and/or other components.

As is evident from FIG. 8, different actions may be performed in response to act 804. For example, act 806 may be performed in response to act 804. Act 806 of flow diagram 800 includes performing super-resolution processing on each of the captured image frames. Act 806 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110, input/output system(s) 116, communication system(s) 118, and/or other components. In some implementations, the image frames output by the super-resolution processing comprise a higher image resolution than the captured image frames.

Furthermore, act 808 may be performed in response to act 804. Act 808 of flow diagram 800 includes utilizing different subsets of image sensing pixels of the image sensor to capture temporally consecutive image frames. Act 808 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110, input/output system(s) 116, communication system(s) 118, and/or other components.

Act 810 of flow diagram 800 (stemming from act 808) includes generating composite images using respective sets of temporally consecutive image frames. Act 810 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110, input/output system(s) 116, communication system(s) 118, and/or other components. In some implementations, a quantity of temporally consecutive image frames in each set of temporally consecutive image frames is based on the runtime conditions measurement.

Disclosed embodiments may comprise or utilize a special purpose or general-purpose computer including computer hardware, as discussed in greater detail below. Disclosed embodiments also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general-purpose or special-purpose computer system. Computer-readable media that store computer-executable instructions in the form of data are one or more “physical computer storage media” or “hardware storage device(s).” Computer-readable media that merely carry computer-executable instructions without storing the computer-executable instructions are “transmission media.” Thus, by way of example and not limitation, the current embodiments can comprise at least two distinctly different kinds of computer-readable media: computer storage media and transmission media.

Computer storage media (aka “hardware storage device”) are computer-readable hardware storage devices, such as RAM, ROM, EEPROM, CD-ROM, solid state drives (“SSD”) that are based on RAM, Flash memory, phase-change memory (“PCM”), or other types of memory, or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired program code means in hardware in the form of computer-executable instructions, data, or data structures and that can be accessed by a general-purpose or special-purpose computer.

A “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer, the computer properly views the connection as a transmission medium. Transmission media can include a network and/or data links which can be used to carry program code in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above are also included within the scope of computer-readable media.

Further, upon reaching various computer system components, program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission computer-readable media to physical computer-readable storage media (or vice versa). For example, computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a “NIC”), and then eventually transferred to computer system RAM and/or to less volatile computer-readable physical storage media at a computer system. Thus, computer-readable physical storage media can be included in computer system components that also (or even primarily) utilize transmission media.

Computer-executable instructions comprise, for example, instructions and data which cause a general-purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. The computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the described features or acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.

Disclosed embodiments may comprise or utilize cloud computing. A cloud model can be composed of various characteristics (e.g., on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, etc.), service models (e.g., Software as a Service (“SaaS”), Platform as a Service (“PaaS”), Infrastructure as a Service (“IaaS”)), and deployment models (e.g., private cloud, community cloud, public cloud, hybrid cloud, etc.).

Those skilled in the art will appreciate that the invention may be practiced in network computing environments with many types of computer system configurations, including, personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, pagers, routers, switches, wearable devices, and the like. The invention may also be practiced in distributed system environments where multiple computer systems (e.g., local and remote systems), which are linked through a network (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links), perform tasks. In a distributed system environment, program modules may be located in local and/or remote memory storage devices.

Alternatively, or in addition, the functionality described herein can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that can be used include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), central processing units (CPUs), graphics processing units (GPUs), and/or others.

As used herein, the terms “executable module,” “executable component,” “component,” “module,” or “engine” can refer to hardware processing units or to software objects, routines, or methods that may be executed on one or more computer systems. The different components, modules, engines, and services described herein may be implemented as objects or processors that execute on one or more computer systems (e.g., as separate threads).

One will also appreciate how any feature or operation disclosed herein may be combined with any one or combination of the other features and operations disclosed herein. Additionally, the content or feature in any one of the figures may be combined or used in connection with any content or feature used in any of the other figures. In this regard, the content disclosed in any one figure is not mutually exclusive and instead may be combinable with the content from any of the other figures.

The present invention may be embodied in other specific forms without departing from its spirit or characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Patent: Metalens for use in an eye-tracking system of a mixed-reality display device

Patent PDF: 加入映维网会员获取

Publication Number: 20220382064

Publication Date: 2022-12-01

Assignee: Microsoft Technology Licensing, LLC (Redmond, WA, US)

Abstract

A head-mounted display device wearable by a user and supporting a mixed-reality experience includes a see-through display system through which the user can view a physical world and on which virtual images are renderable. At least one light source is configured to emit near infrared (IR) light that illuminates an eye of the user of the near-eye mixed reality display device. An imaging sensor is configured to capture reflections of the near IR light reflected from the eye of the user. A metalens is configured to receive the reflections of the IR light reflected from the eye of the user and direct the reflections onto the image sensor.

Claims

1.A method for operating a near-eye display system, comprising: illuminating an eye of a user of the near-eye display system with light from at least one light source emitting light in a prescribed waveband, wherein illuminating the eye of the user includes selectively activating a first set of light sources that includes at least one light emitting diode (LED) for performing user gaze detection and selectively activating a second set of light sources different from the first set of light sources and which includes at least one vertical-cavity surface-emitting laser (VCSEL) for performing iris recognition; and capturing reflections of the light from the eye of the user using an image sensor arrangement that includes a metalens that receives the reflections of light and directs the reflections of light onto an image sensor.

Description

BACKGROUND

Mixed-reality display devices, such as wearable head mounted mixed-reality (MR) display devices, may be configured to display information to a user about virtual and/or real objects in a field of view of the user and/or a field of view of a camera of the device. For example, an MR display device may be configured to display, using a see-through display system, virtual environments with real-world objects mixed in, or real-world environments with virtual objects mixed in.

In such MR display devices, tracking the positions of the eyes of a user can enable estimation of the direction of the user’s gaze. Gaze direction can be used as an input to various programs and applications that control the display of images on the MR display devices, among other functions. To determine the position and gaze of the user’s eyes, an eye tracker may be incorporated into the MR display device.

SUMMARY

In an embodiment, an eye-tracking system is disposed in a near-eye mixed reality display device. The eye-tracking system includes one or more light sources configured to emit light in a specified waveband (e.g., the near-infrared) that illuminates an eye of a user of the near-eye mixed reality display device. An imaging sensor is configured to capture reflections of the light reflected from the eye of the user. A metalens is configured to receive the reflections of light from the eye of the user and direct the reflections onto the image sensor.

The implementation of an eye-tracking system that uses a metalens to receive the reflected light from the user’s eye and direct it onto the image sensor provides significant technical advantages. In general, the use of a metalens allows for a higher performing eye-tracking system to be implemented in a smaller, potentially more energy efficient form factor. For example, metalenses are particularly well-suited for use in an eye-tracking system because such a system employs an illumination source with a predetermined and relatively narrow bandwidth which can be selected in advance as part of the system design process. In this way the metalens can be specifically tailored and optimized to operate at those wavelengths. As yet other examples of the advantages arising from the use of a metalens, a metalens can be thinner and lighter and have a greater sensitivity than its refractive counterpart. Additionally, the image quality provided by a metalens can be much better than that provided by a refractive lens when the metalens is matched with a suitable illumination source.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure. These and various other features will be apparent from a reading of the following Detailed Description and a review of the associated drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an example of a mixed reality (MR) display device.

FIG. 2 illustrates a block diagram of the MR display device illustrated in FIG. 1.

FIG. 3 illustratively shows holographic virtual images that are overlayed onto real-world images within a field of view (FOV) of a mixed reality device.

FIG. 4 shows one example of a sensor package which may be used in the eye tracking system of a mixed reality display device.

FIG. 5 shows a detail of an illustrative pattern of structures that collectively form the metasurface of the metalens shown in FIG. 4.

FIG. 6 illustrates another example of the mixed reality (MR) display device shown in FIG. 1 which employs both LEDs and VCSELs for performing both eye-tracking and iris recognition.

FIG. 7 is a flowchart showing one example of a method for operating an eye-tracking system in a near-eye display system.

FIG. 8 is a flowchart showing an example of a method for operating an eye-tracking system in a near-eye display system that employs both LED and VCSEL near IR light sources.

DETAILED DESCRIPTION

In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant teachings. However, it should be apparent that the present teachings may be practiced without such details. In other instances, well known methods, procedures, components, and/or circuitry have been described at a relatively high-level, without detail, in order to avoid unnecessarily obscuring aspects of the present teachings.

FIG. 1 illustrates an example of a mixed reality (MR) display device 100, and FIG. 2 illustrates a block diagram of the MR display device 100 illustrated in FIG. 1. In the example illustrated in FIGS. 1 and 2, the MR display device 100 is a head mounted MR device, intended to be worn on a user’s head during ordinary use, including a head mounted display (HMD) device. However, it is noted that this disclosure is expressly not limited to head mounted MR devices or other near-eye display devices. Mixed reality refers to an experience allowing virtual imagery to be mixed with a real-world physical environment in a display. For example, real-world objects and/or real-world spaces may be identified and augmented with corresponding virtual objects. Mixed reality may be implemented with, for example, virtual reality or augmented reality technologies.

The MR display device 100 includes a display subsystem 120 for displaying images to a user of the MR display device 100. In the example illustrated in FIG. 1, the display subsystem 120 is intended to be close to a user’s eyes and includes a see-through MR display device including one or more transparent or semi-transparent see-through lenses 122 arranged such that images may be projected onto the see-through lenses 122, or produced by image-producing elements (for example, see-through OLED displays) located within the see-through lenses 122. A user wearing the MR display device 100 has an actual direct view of a real-world space (instead of image representations of the real-world space) through the see-through lenses 122, and can at the same time view virtual objects (which may be referred to as virtual images or holograms) that augment the user’s direct view of the real-world space.

The MR display device 100 further includes one or more outward facing image sensors 130 configured to acquire image data for a real-world scene around and/or in front of the MR display device 100. The outward facing image sensors 130 may include one or more digital imaging camera(s) 132 arranged to capture two-dimensional visual images. In some implementations, two imaging camera(s) 132 may be used to capture stereoscopic images. The outward facing imaging sensors 130 may also include one or more depth camera(s) 134, such as, but not limited to, time of flight depth cameras, arranged to capture depth image data, such as a depth map providing estimated and/or measured distances from the MR display device 100 to various portions of a field of view (FOV) of the depth camera(s) 134. Depth image data obtained via the depth camera(s) 134 may be registered to other image data, such as images concurrently captured via imaging camera(s) 132. The outward facing image sensors 130 may be configured to capture individual images and/or sequences of images (for example, at a configurable frame rate or frame rates). In some implementations, the outward facing image sensors 130 or other sensors associated with the MR display device 100 can be configured to assess and/or identify external conditions, including but not limited to time of day, direction of lighting, ambiance, temperature, and other conditions. The external conditions can provide the MR display device 100 with additional factor(s) to determine types of virtual graphical elements to display to a user.

The MR display device 100 may further include a gaze detection subsystem 140 configured to detect, or provide sensor data for detecting, a direction of gaze of each eye of a user, as illustrated in FIGS. 1 and 2. The gaze detection subsystem 140 may be arranged to determine gaze directions of each of a user’s eyes in any suitable manner. For instance, in the example illustrated in FIGS. 1 and 2, the gaze detection subsystem 140 includes one or more glint sources 142, such as infrared (IR) light sources, arranged to cause a glint of light to reflect from each eyeball of a user, and one or more image sensor(s) 144 arranged to capture an image of each eyeball of the user. Changes in the glints from the user’s eyeballs as determined from image data gathered via image sensor(s) 144 may be used to determine a direction of gaze. Further, a location at which gaze lines projected from the user’s eyes intersect the external display may be used to determine an object or position at which the user is gazing (for example, a virtual object displayed by the display subsystem 120). The gaze detection subsystem 140 may have any suitable number and arrangement of glint sources and image sensors. In one non-limiting example embodiment, four glint sources and one image sensor are used for each eye. Furthermore, in some implementations, the gaze detection subsystem 140 can be configured to assist the MR display device 100 in more accurately identifying real-world objects of interest and associating such objects with virtual applications.
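
A heavily simplified illustration of how glint imagery can be turned into a gaze estimate is the pupil-center/corneal-reflection offset sketched below. The patent does not specify this particular computation; a practical system would calibrate against known gaze targets and typically use several glints per eye.

```python
import numpy as np

def gaze_offset(pupil_center: np.ndarray, glint_centroid: np.ndarray) -> np.ndarray:
    """Return the normalized 2D offset between the detected pupil center and
    the centroid of the glints, a simple proxy for gaze direction in the
    image plane (simplified pupil-center/corneal-reflection approach)."""
    offset = pupil_center.astype(np.float64) - glint_centroid.astype(np.float64)
    norm = np.linalg.norm(offset)
    return offset / norm if norm > 0 else offset
```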

The MR display device 100 may include a location subsystem 150 arranged to provide a location of the MR display device 100. Location subsystem 150 may be arranged to determine a current location based on signals received from a navigation satellite system, such as, but not limited to, GPS (United States), GLONASS (Russia), Galileo (Europe), and CNSS (China), and technologies augmenting such signals, such as, but not limited to, augmented GPS (A-GPS). The location subsystem 150 may be arranged to determine a location based on radio frequency (RF) signals identifying transmitting devices and locations determined for such devices. By way of example, Wi-Fi, Bluetooth, Zigbee, RFID, NFC, and cellular communications include device identifiers that may be used for location determination. MR display device 100 may be arranged to use a location provided by the location subsystem 150 as an approximate location, which is refined based on data collected by other sensors. The MR display device 100 may include audio hardware, including one or more microphones 170 arranged to detect sounds, such as verbal commands from a user of the MR display device 100, and/or one or more speaker(s) 180 arranged to output sounds to the user, such as verbal queries, responses, instructions, and/or information.

The MR display device 100 may include one or more motion sensor(s) 160 arranged to measure and report motion of the MR display device 100 as motion data. In some implementations, the motion sensor(s) 160 may include an inertial measurement unit (IMU) including accelerometers (such as a 3-axis accelerometer), gyroscopes (such as a 3-axis gyroscope), and/or magnetometers (such as a 3-axis magnetometer). The MR display device 100 may be arranged to use this motion data to determine changes in position and/or orientation of MR display device 100, and/or respective changes in position and/or orientation of objects in a scene relative to MR display device 100. The outward facing image sensor(s) 130, image sensor(s) 144, sensors included in the location subsystem 150, motion sensor(s) 160, and microphone(s) 170, which are included in or are coupled to the head mounted MR display device 100, may be, individually or collectively, referred to as head mounted sensors. Data collected via such head mounted sensors reflect the position and orientations of a user’s head.

The MR display device 100 further includes a controller 110 including a logic subsystem 112, a data holding subsystem 114, and a communications subsystem 116. The logic subsystem 112 may include, for example, one or more processors configured to execute instructions and communicate with the other elements of the MR display device 100 illustrated in FIGS. 1 and 2 according to such instructions to realize various aspects of this disclosure involving the MR display device 100. Such aspects include, but are not limited to, configuring and controlling devices, processing sensor input, communicating with other computer systems, and/or displaying virtual objects via display subsystem 120. The data holding subsystem 114 includes one or more memory devices (such as, but not limited to, DRAM devices) and/or one or more storage devices (such as, but not limited to, flash memory devices). The data holding subsystem 114 includes one or more media having instructions stored thereon which are executable by the logic subsystem 112, which cause the logic subsystem 112 to realize various aspects of this disclosure involving the MR display device 100. Such instructions may be included as part of an operating system, application programs, or other executable programs. The communications subsystem 116 is arranged to allow the MR display device 100 to communicate with other computer systems. Such communication may be performed via, for example, Wi-Fi, cellular data communications, and/or Bluetooth.

It will be appreciated that the MR display device 100 is provided by way of example, and thus is not meant to be limiting. Therefore, it is to be understood that the MR display device 100 may include additional and/or alternative sensors, cameras, microphones, input devices, output devices, etc. than those shown without departing from the scope of this disclosure. Further, the physical configuration of an MR device and its various sensors and subcomponents may take a variety of different forms without departing from the scope of this disclosure.

FIG. 3 illustrates an example of a user 115 making use of an MR display device 100 in a physical space. As noted above, an imager (not shown) generates holographic virtual images that are guided by the waveguide(s) in the display device to the user. Being see-through, the waveguide in the display device enables the user to perceive light from the real world.

The display subsystem 120 of the MR display device 100 can render holographic images of various virtual objects that are superimposed over the real-world images that are collectively viewed to thereby create a mixed-reality environment 200 within the MR display device’s FOV (field of view) 220. It is noted that the FOV of the real world and the FOV of the holographic images in the virtual world are not necessarily identical, as the virtual FOV provided by the display device is typically a subset of the real FOV. FOV is typically described as an angular parameter in horizontal, vertical, or diagonal dimensions.

It is noted that FOV is just one of many parameters that are typically considered and balanced by MR display device designers to meet the requirements of a particular implementation. For example, such parameters may include eyebox size, brightness, transparency and duty time, contrast, resolution, color fidelity, depth perception, size, weight, form-factor, and user comfort (i.e., wearable, visual, and social), among others.

In the illustrative example shown in FIG. 3, the user 115 is physically walking in a real-world urban area that includes city streets with various buildings, stores, etc., with a countryside in the distance. The FOV of the cityscape viewed on MR display device 100 changes as the user moves through the real-world environment and the device can render static and/or dynamic virtual images over the real-world view. In this illustrative example, the holographic virtual images include a tag 225 that identifies a restaurant business and directions 230 to a place of interest in the city. The mixed-reality environment 200 seen visually on the waveguide-based display device may also be supplemented by audio and/or tactile/haptic sensations produced by the MR display device in some implementations.

In a wearable device such as MR display device 100, estimating the position of a user’s eye can allow the MR display device to display images according to where the user’s eye is located and in which direction the user is looking. The user may also interact with the MR display device by using their gaze as input to command the MR display device. For this purpose, gaze detection subsystem 310 is used to determine the position and gaze of the user’s eye.

As previously mentioned, gaze detection may be accomplished using one or more IR light sources that cause a glint of light to be reflected from each of the user’s eyes. The glint of light is then detected by an image sensor (e.g., image sensor 134 shown in FIG. 1). The IR light sources (e.g., glint sources 132 in FIG. 1) are typically light emitting diode (LED) sources that operate at near infrared (IR) wavelengths, e.g., wavelengths between about 750 nm and 2500 nm. A lens or lens system is generally incorporated in or otherwise associated with the image sensor to focus the light onto the sensor. In some cases the lens may form a telecentric image; that is, the lens may be telecentric in image space.

In a conventional gaze detection system the sensor lens is typically a refractive lens in which control of light characteristics such as amplitude, direction, and polarization is determined by the lens geometry and the intrinsic material properties of the lens. For example, the focusing behavior of a conventional lens is governed by its refractive index, which is based at least in part on the lens material. In the embodiments described herein, the sensor lens is instead implemented from one or more elements formed of metamaterials. In general, an optical metamaterial (also referred to as a photonic metamaterial) can be defined as any composition of sub-wavelength structures arranged to modify the optical response of an interface. That is, in an optical metamaterial, the optical response depends on the arrangement of the sub-wavelength structures. Accordingly, metamaterials can be engineered to exhibit optical properties not otherwise available in naturally occurring materials. An element having a meta surface structure for controlling the optical response of light is sometimes referred to as a metasurface lens, or simply a metalens.

FIG. 4 shows one example of a sensor package 300 which may be used in the eye tracking system of a mixed reality device. The sensor package 300 includes a sensor array 305 such as a CMOS sensor and a metalens 307. An aperture 309 in the sensor housing 311 allows NIR light reflected from the user’s eye to be incident on the meta surface 313 of the metalens 307, which directs the light onto the sensor array 305.

The meta surface 313 can include a dense arrangement of sub-wavelength structures arranged to introduce a phase shift in an incident wavefront, thereby allowing for precise control of the deflection of light rays. For example, FIG. 5 shows a detail of a pattern of structures 322 that collectively form the meta surface 313 of the metalens 307. The example structures depicted in the detail are shown with exaggerated features for illustrative purposes, and are not intended to impart limitations with respect to the number, shape, arrangement, orientation, or dimensions of the features of the corresponding optical element; nor does the example arrangement of structures 322 necessarily represent an arrangement suited for a particular image sensor. In other embodiments, meta surface 313 may have different structure patterns. The metalens 307 can be designed to deflect the NIR wavelengths of an incident light ray onto the sensor array 305, as indicated in FIG. 4 by extreme ray 315.
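
For context, the focusing behavior described above is commonly obtained by imposing a spatially varying phase delay across the meta surface. The patent does not specify a particular phase function; the sketch below simply illustrates the widely used hyperbolic focusing phase for a lens of focal length f at a single design wavelength. The function name, aperture, wavelength, and focal length are illustrative assumptions, and in practice each phase value would be realized by selecting a sub-wavelength structure geometry from a precomputed library.

```python
import numpy as np

def focusing_phase(x, y, wavelength_m, focal_length_m):
    """Hyperbolic phase profile (radians) that focuses a normally incident plane
    wave of the design wavelength to a point at distance focal_length_m."""
    r2 = x ** 2 + y ** 2
    return -(2 * np.pi / wavelength_m) * (np.sqrt(r2 + focal_length_m ** 2) - focal_length_m)

# Illustrative values only: a 2 mm aperture sampled on a coarse grid,
# an 850 nm near-IR design wavelength, and a 3 mm focal length.
coords = np.linspace(-1e-3, 1e-3, 401)
xx, yy = np.meshgrid(coords, coords)
phase = focusing_phase(xx, yy, wavelength_m=850e-9, focal_length_m=3e-3)

# Wrap to [0, 2*pi); each grid point's wrapped phase would then be mapped
# to a nanostructure (e.g., pillar diameter) that produces that phase delay.
wrapped = np.mod(phase, 2 * np.pi)
```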

The design and manufacture of a metalens for a particular wavelength is known in the art, and any of those known design methods for forming nanostructures on a metalens for a particular wavelength may be utilized in conjunction with the image sensor described herein for use with a gaze detection system such as described above. For example, the reference Amir Arbabi, et al., Miniature optical planar camera based on a wide-angle metasurface doublet corrected for monochromatic aberrations, Nature Communications 7, Article number: 13682 (2016), sets forth design principles and manufacturing techniques suitable for use with the present technology.

It is well-known that metalenses generally exhibit poor performance across broad wavelength bands and perform well for a single wavelength or narrow band of wavelengths, with the performance degrading quickly as the bandwidth increases. That is, metalenses suffer from relatively large chromatic aberrations. This characteristic of metalenses can make them problematic when used with relatively broadband light sources. For instance, the image quality provided by a camera having a metalens may be poor when the camera is used to capture an image of an object or scene illuminated with ambient light (e.g., sunlight, interior lighting). The present inventors have recognized, however, that metalenses are particularly well-suited for use in cameras or other imaging devices that capture an image using an active light source that has a narrow bandwidth which can be selected in advance as part of the system design process so that the metalens can be specifically tailored to operate at those wavelengths.

Moreover, in addition to their overall compatibility with an imaging device employing active illumination, a number of significant advantages arise from the use of a metalens in the gaze-detection system of an MR device such as a head-mounted MR device. For example, a metalens can be thinner and lighter than its refractive counterpart, which is particularly important in a device designed for portability such as a head-mounted MR device. Also, despite its susceptibility to high chromatic dispersion, the image quality provided by a metalens can be much better than that provided by a refractive lens when the metalens is matched with a suitable illumination source.

Yet another advantage of a metalens is that it can be designed with a lower f-number than its refractive counterpart, which increases its sensitivity at low light levels, thereby reducing the power requirements of the illumination source, which once again is particularly important in a portable device such as a head-mounted MR device. Other advantages of a metalens include its thermal stability and various manufacturing advantages such as the ability to relatively easily apply an anti-reflective coating to the flat surface opposite the metasurface of the lens.

While the head-mounted MR device described above has been described as having a gaze detection system that employs an image sensor with a metalens, more generally the head-mounted MR device may be equipped with any type of eye-tracking system that employs an image sensor having a metalens. Such eye-tracking systems may be used for gaze detection and/or pupil position tracking and imaging, e.g., for iris recognition for biometric identification or authentication. One particular embodiment of an eye-tracking system that can be used for both gaze detection and pupil position tracking and imaging will be discussed below.

In some embodiments the light source is a light emitting diode (LED) operating at near IR wavelengths. In an alternative embodiment the light source may be a vertical-cavity surface-emitting laser (VCSEL), which may be advantageous because it can be designed to be suitably compact and energy efficient, while emitting a narrower band of wavelengths than an LED operating at near IR wavelengths. For example, while an LED operating at near IR wavelengths may have a bandwidth of about 50 nm, a VCSEL may have a bandwidth of about 5 nm. In addition to the aforementioned advantages, a VCSEL also may be advantageous because the use of a narrower bandwidth can produce a higher quality image, since the metalens will suffer less chromatic dispersion. In addition, the use of a narrower bandwidth can improve IR ambient light coexistence, since interference may be reduced from reflections of ambient light and stray ambient light from the eye, which can compete with the light from the narrowband, near IR light source.

When LEDs are used, they serve as glint sources, which, as explained above, cause a glint of light to reflect from each eye of a user, allowing the user’s direction of gaze to be determined. However, the resolution or sharpness of the image produced using LEDs is generally not sufficient for performing iris recognition. In contrast, because of the improved image quality that can be produced when VCSELs are used, an eye tracking system using a VCSEL as the light source and a metalens that is used with the image sensor can be collectively used to perform iris recognition in addition to gaze detection.

In yet another embodiment, a hybrid approach may be employed in which both one or more LEDs and one or more VCSELs are provided as light sources for the eye tracking system. The LEDs may be used when the user’s direction of gaze is to be determined. And the VCSELs may be used when a high-resolution image is required (e.g., for iris recognition). Hence, only the LEDs or the VCSELs may need to be supplied with power at any one time.

FIG. 6 shows an alternative example of the MR display device shown in FIG. 1. In FIGS. 1 and 6 like elements are denoted by like reference numbers. The device in FIG. 6 employs LED sources 142 and 144 as shown in FIG. 1 as well as VCSEL sources 146 and 148. It should be noted that the number of LED and VCSEL light sources, as well as their placement on the frame of the device, may vary, and that the number of light sources and their location in FIG. 6 are shown for illustrative purposes only. Moreover, the number of LED sources and VCSEL sources need not necessarily be the same.

FIG. 7 is a flowchart showing one example of a method for operating an eye-tracking system in a near-eye display system that employs a single type of near IR light source (e.g., LED or VCSEL). At step 405 one or more light sources in the near-eye display system are activated so that near IR light is emitted. At step 410 the light is directed to an eye of the user of the near-eye display system. A metalens is arranged at step 415 to receive near IR light reflected from the user’s eye. The metalens has a meta surface with sub-wavelength structures having a configuration and arrangement for directing the reflections onto the image sensor, which is determined based at least in part on the prescribed waveband. An image sensor is arranged in the near-eye display system so that the reflected near IR light received by the metalens is directed by the metalens onto the image sensor.

FIG. 8 is a flowchart showing an example of a method for operating an eye-tracking system in a near-eye display system that employs both LED and VCSEL near IR light sources. In this example the eye-tracking system is first used for eye tracking and then for iris tracking, although more generally the system may be used in any sequence to perform eye tracking and iris tracking or it may be used to perform only one of eye tracking or iris tracking. At step 505 the one or more LEDs in the near-eye display system are activated so that near IR light is emitted and the VCSELs are powered off. At step 510 the near IR light is directed to an eye of the user of the near-eye display system. A metalens is arranged at step 515 to receive near IR light reflected from the user’s eye. The metalens directs the reflections onto an image sensor at step 520. This results in a low modulation transfer function (MTF) image that is sufficient for eye tracking. After the image is obtained the LEDs may be powered off at step 525.

Next, at step 530, when it is desired to perform iris recognition, the one or more VCSELs in the near-eye display system are activated so that relatively narrowband near IR light is emitted. The LEDs remain off. At step 535 the near IR light is directed to the eye of the user. The metalens receives the near IR light reflected from the user’s eye at step 540. The metalens directs the reflections onto the image sensor at step 545 to form an image. Since the output from the VCSELs is relatively narrowband, a high modulation transfer function (MTF) image is formed that is generally sufficient for iris recognition.
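
To make the sequencing of FIGS. 7 and 8 concrete, the sketch below expresses the hybrid LED/VCSEL flow in code: broadband LEDs are used for the lower-MTF gaze-tracking capture, and the narrowband VCSELs are enabled only when a higher-MTF image is needed for iris recognition, with only one set of sources powered at a time. The hardware interface (`set_leds`, `set_vcsels`, `capture_frame`) is a hypothetical placeholder, not an API from the patent.

```python
class EyeTrackerHardware:
    """Hypothetical wrapper around the illumination sources and image sensor."""
    def set_leds(self, on: bool) -> None: ...
    def set_vcsels(self, on: bool) -> None: ...
    def capture_frame(self):
        """Return one frame captured through the metalens onto the image sensor."""
        return None

def run_gaze_tracking(hw: EyeTrackerHardware):
    # Steps 505-525: LED illumination, VCSELs off; the resulting lower-MTF
    # image is sufficient for glint-based gaze tracking.
    hw.set_vcsels(False)
    hw.set_leds(True)
    frame = hw.capture_frame()
    hw.set_leds(False)          # power the LEDs down once the image is obtained
    return frame

def run_iris_recognition(hw: EyeTrackerHardware):
    # Steps 530-545: narrowband VCSEL illumination, LEDs off; less chromatic
    # dispersion through the metalens yields a higher-MTF image for iris recognition.
    hw.set_leds(False)
    hw.set_vcsels(True)
    frame = hw.capture_frame()
    hw.set_vcsels(False)
    return frame
```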

Various exemplary embodiments of the present display system are now presented by way of illustration and not as an exhaustive list of all embodiments. An example includes a method for operating a near-eye display system, comprising: illuminating an eye of a user of the near-eye display system with light from at least one light source emitting light in a prescribed waveband; and capturing reflections of the light from the eye of the user using an image sensor arrangement that includes a metalens that receives the reflections of light and directs the reflections of light onto an image sensor.

In another example the metalens has a meta surface with sub-wavelength structures having a configuration and arrangement for directing the reflections onto the image sensor which is determined based at least in part on the prescribed waveband. In another example illuminating the eye of the user includes activating at least one LED to illuminate the eye of the user. In another example illuminating the eye of the user includes activating at least one VCSEL to illuminate the eye of the user. In another example illuminating the eye of the user includes selectively activating a first set of light sources for performing user gaze detection and selectively activating a second set of light sources different from the first set of light sources for performing iris recognition. In another example selectively activating the first set of light sources and selectively activating the second set of light sources includes only activating one of the first set of light sources and the second set of light sources at any given time. In another example the light sources in the first set of light sources are configured to emit a narrower bandwidth of light than the light sources in the second set of light sources. In another example the first set of light sources includes at least one LED and the second set of light sources includes at least one VCSEL. In another example the specified waveband is a near Infrared (IR) waveband. In another example the near-eye display system includes a mixed-reality (MR) display device. In another example the metalens is configured to operate as a telecentric lens.

A further example includes an eye-tracking system disposed in a near-eye mixed reality display device, comprising: at least one light source configured to emit light in a specified waveband that illuminates an eye of a user of the near-eye mixed reality display device; an imaging sensor configured to capture reflections of the light reflected from the eye of the user; and a metalens configured to receive the reflections of the light reflected from the eye of the user and direct the reflections onto the image sensor, the metalens having a meta surface with sub-wavelength structures having a configuration and arrangement for directing the reflections onto the image sensor which is determined based at least in part on the specified waveband.

In another example the at least one light source includes a light emitting diode (LED) or a vertical-cavity surface-emitting laser (VCSEL). In another example the at least one light source includes at least one LED and at least one VCSEL. In another example the at least one light source includes at least first and second light sources, the first light source being configured to emit a narrower bandwidth of light than the second light source. In another example the specified waveband is a near Infrared (IR) waveband. In another example the metalens is configured to operate as a telecentric lens.

A further example includes a head-mounted display device wearable by a user and supporting a mixed-reality experience, comprising: a see-through display system through which the user can view a physical world and on which virtual images are renderable; at least one light source configured to emit near IR light that illuminates an eye of the user of the near-eye mixed reality display device; an imaging sensor configured to capture reflections of the near IR light reflected from the eye of the user; and a metalens configured to receive the reflections of the IR light reflected from the eye of the user and direct the reflections onto the image sensor.

In another example the metalens has a meta surface with sub-wavelength structures being configured and arranged for operation at near IR wavelengths. In another example the at least one light source includes at least first and second light sources, the first light source being configured to emit a narrower bandwidth of light than the second light source.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

The article "Microsoft Patent | Metalens for use in an eye-tracking system of a mixed-reality display device" was first published on Nweon Patent

]]>
Microsoft Patent | Providing haptic feedback through touch-sensitive input devices https://patent.nweon.com/25226 Wed, 30 Nov 2022 20:27:22 +0000 https://patent.nweon.com/?p=25226 ...

The article "Microsoft Patent | Providing haptic feedback through touch-sensitive input devices" was first published on Nweon Patent

]]>
Patent: Providing haptic feedback through touch-sensitive input devices

Patent PDF: Join Nweon (映维网) membership to access

Publication Number: 20220382373

Publication Date: 2022-12-01

Assignee: Microsoft Technology Licensing, LLC (Redmond, WA, US)

Abstract

A method for providing haptic feedback. Haptic feedback may be provided to a user through a touch-sensitive input device configured to provide input to a touch-sensitive computing device. The method includes determining a haptic perception factor based at least in part on one or more of a set of input device inputs received from sensors of the touch-sensitive input device and a set of computing device inputs received from sensors of the touch-sensitive computing device. A haptic response profile is determined based at least in part on the haptic perception factor. Haptic devices of the touch-sensitive input device are then actuated based at least in part on the determined haptic response profile.

Claims

1.A method for providing haptic feedback through a touch-sensitive input device configured to provide input to a touch-sensitive computing device, comprising: determining a haptic perception factor based at least in part on a set of input device inputs received from sensors of the touch-sensitive input device and a set of computing device inputs received from sensors of the touch-sensitive computing device, the computing device inputs including at least a contact area between a hand of a user and the touch-sensitive computing device; determining a haptic response profile based at least in part on the haptic perception factor; actuating haptic devices of the touch-sensitive input device based at least in part on the determined haptic response profile; determining a change in the haptic perception factor based at least in part on a change in the contact area between the hand of the user and the touch-sensitive computing device; adjusting the haptic response profile based at least in part on the changed haptic perception factor; and actuating the haptic devices of the touch-sensitive input device based at least in part on the adjusted haptic response profile.

Description

BACKGROUND

Smartphones, tablets, and other computing devices with touch-sensitive displays allow for input using fingers, an electronic stylus, etc. A touch-sensitive input device may include haptic actuators configured to provide feedback to the user as a means of enhancing the user experience. As one example, a stylus may provide haptic output in the form of vibration applied to a body of the stylus via an internal motor.

SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.

A method for providing haptic feedback is disclosed. Haptic feedback may be provided to a user through a touch-sensitive input device configured to provide input to a touch-sensitive computing device. The method includes determining a haptic perception factor based at least in part on one or more of a set of input device inputs received from sensors of the touch-sensitive input device and a set of computing device inputs received from sensors of the touch-sensitive computing device. A haptic response profile is determined based at least in part on the haptic perception factor. Haptic devices of the touch-sensitive input device are then actuated based at least in part on the determined haptic response profile.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts an example of touch-sensitive input in the form of a user interacting with a touch-sensitive display via a touch-sensitive input device.

FIG. 2A depicts a user providing input with a stylus in an example automatic feedback mode.

FIG. 2B depicts a user providing input with a stylus in another example automatic feedback mode.

FIG. 3A depicts a user providing input with a stylus in an example interactive feedback mode.

FIG. 3B depicts a user providing input with a stylus in another example interactive feedback mode.

FIG. 4A depicts a user holding a stylus with an example hand grip.

FIG. 4B depicts a user holding a stylus with another example hand grip.

FIG. 5A depicts a touch-sensitive computing device positioned on a solid surface.

FIG. 5B depicts a touch-sensitive computing device being held in a user’s hand.

FIG. 6A depicts a heat map for a touch-sensitive computing device featuring a contact point for a stylus.

FIG. 6B depicts a heat map for a touch-sensitive computing device featuring contact points for a stylus and a palm of a user’s hand.

FIG. 6C depicts a heat map for a touch-sensitive computing device featuring contact points for a stylus, a palm of a user’s hand, and a portion of the user’s second hand.

FIG. 7A depicts a foldable computing device in a flat configuration positioned on a flat surface.

FIG. 7B depicts a foldable computing device in a back-to-back pose positioned on a flat surface.

FIG. 7C depicts a foldable computing device in a back-to-back pose held in a user’s hand.

FIG. 7D depicts a foldable computing device in a tented pose positioned on a flat surface.

FIG. 8 depicts an example method for providing haptic feedback through a touch-sensitive input device being used to provide input to a touch-sensitive computing device.

FIG. 9 depicts an example method for providing haptic feedback through a touch-sensitive input device being used to provide input to a touch-sensitive computing device.

FIG. 10 shows a schematic depiction of an example computing environment in which the systems of FIG. 1 may be enacted.

DETAILED DESCRIPTION

A variety of input devices have been developed that provide haptic output. As one example, a haptic stylus may provide haptic output in the form of vibration applied to a body of the stylus via an internal motor. Styli and other input devices may provide haptic output for a variety of purposes, including but not limited to simulating a tactile sensation (e.g., resulting from the traversal of a virtual surface such as gravel, or from touching a virtual object), simulating ink-on-surface feel sensations, confirming a user input (e.g., in response to user selection of a graphical user interface element), and/or providing another type of feedback (e.g., an indication of the state of an input device such as a battery level, the state of an application).

To achieve haptic output, a haptic feedback mechanism such as a motor may be arranged within the body of the stylus, such as near the tip of the stylus. This localized positioning of the motor, however, may be such that users perceive noticeably different haptic outputs as their grip and finger positions on the stylus change, which tends to occur in typical stylus use scenarios. Further, the haptic output may be dampened by pressure between the stylus and the touch-sensitive computing device it is being used with. This dampening may be exacerbated if the user is also pressing on the touch-sensitive computing device with their hand while using the stylus, if the computing device is lying on a surface, etc. Conversely, if haptic feedback is provided to a user holding the stylus loosely while hovering over the touch-sensitive computing device, the force, if too high, may be distracting or cause the user to drop or lose grip on the stylus. As such, there are numerous challenges with providing a consistent and favorable user experience with a haptic stylus and corresponding computing device.

Accordingly, systems and methods are presented herein that may be used to generate a consistent and favorable experience for a user operating a touch-sensitive input device with a touch-sensitive computing device. Sensors, both within and on the touch-sensitive input device and within and on the touch-sensitive computing device, provide inputs to a controller which uses the sensory information to determine a haptic perception factor. Based at least in part on this haptic perception factor, the controller may determine how much and/or what type of haptic actuation is necessary to generate a consistent haptic sensation profile, and adjust the intensity, frequency, or other characteristics of the haptic actuation accordingly. In this way, the user may experience consistent haptic feedback, whether holding the stylus in the air or pressing down upon the touch-sensitive computing device with the stylus when the computing device is resting between the palm of the user’s hand and a solid table. In some cases, a user may specify a preferred amount of haptic sensation (e.g., via a user profile) which they would like to feel when haptic feedback is provided.

FIG. 1 depicts an example touch-sensitive input device in the form of a stylus 100. While described primarily using the stylus form as an example, any touch-sensitive input device configured to sense the touch of a user, provide input to a touch-sensitive computing device, and deliver haptic feedback may be utilized. Stylus 100 includes an elongated body 101 in the form factor of a pen, though the body may assume any suitable form. As shown in the depicted example, stylus 100 is operable to provide user input to a computing device 104. Computing device 104 is shown in the form of a mobile computing device (e.g., tablet) having a touch-sensitive display 106, but may assume any suitable form. Any suitable type of user input may be provided to computing device 104 using stylus 100. As examples, stylus 100 may be used to draw graphical content on touch-sensitive display 106 of computing device 104, modify graphical content (e.g., resize, reposition, rotate), erase graphical content, select graphical user interface (GUI) elements, and/or provide gestural input.

To enable the provision of user input from stylus 100 to computing device 104, the stylus may include a communication subsystem with which data may be transmitted from the stylus to the computing device. For example, the communication subsystem may include a radio transmitter for wirelessly transmitting data to and from computing device 104 along a radio link. As another example, the communication subsystem alternatively or additionally may include a capacitive transmitter for wirelessly transmitting data to and from computing device 104 along a capacitive link. The capacitive link may be established between the capacitive transmitter and a touch-sensitive display 106 having a capacitive touch sensor, for example.

Any suitable data may be transmitted to computing device 104 via the communication subsystem, including but not limited to indications of actuation at stylus 100 (e.g., depression of a stylus tip 108 or a stylus end 110), data regarding the position of the stylus relative to the computing device (e.g., one or more coordinates), a power state or battery level of the stylus, and data from a motion sensor on-board the stylus (e.g., accelerometer data with which stylus gestures may be identified). Moreover, in some examples, data regarding the locations of contact points between a user hand and stylus 100, which may be sensed by the stylus as described below, may be transmitted to computing device 104 via the communication subsystem. It will be understood that any suitable mechanism may be used to transmit information from stylus 100 to computing device 104. Additional examples include optical, resistive, and wired mechanisms. Example hardware including a processor and communication subsystem that may be incorporated by stylus 100 to implement the disclosed approaches is described below with reference to FIG. 10.

Stylus 100 is configured to provide haptic feedback to users. To this end, stylus 100 includes a haptic feedback mechanism 102 configured to apply haptic output to body 101. As shown in FIG. 1, haptic feedback mechanism 102 is arranged within body 101 toward stylus tip 108, but may be provided at any suitable location at stylus 100. Haptic feedback mechanism 102 may employ any suitable component(s) to provide haptic feedback as described herein. As one example, haptic feedback mechanism 102 may include a motor that applies haptic output to body 101 in the form of vibration induced in the body. Haptic feedback mechanism 102 may additionally or alternatively include electrostatic, ultrasonic, auditory, or other haptic mechanisms. In some examples, multiple haptic feedback mechanisms are provided at different locations within a stylus.

Stylus 100 further includes a sensor subsystem schematically depicted at 112. Sensor subsystem 112 may be configured to output sensor data indicating locations and local pressures along body 101 of the contact points formed between a user hand 114 and body 101 as detected by multiple grip sensing elements (not shown) such as a capacitive sleeve. Sensor subsystem 112 may be further configured to indicate a pressure between user hand 114 and stylus body 101 at one or more of the contact points. Additional sensing elements may include one or more tip pressure sensors positioned at a tip and/or an opposite end of stylus 100. One or more electrostatic sensors may be included in the tip and/or opposite end of stylus 100. Sensor subsystem 112 may further include one or more accelerometers, gyrometers, proximity sensors, etc. configured to provide information regarding the pose, velocity, and orientation of stylus 100. Data received from sensor subsystem 112 and/or from computing device 104 may be stored at memory 120.
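
The sensor outputs enumerated above can be thought of as a structured snapshot that the controller consumes when determining haptic actuation. The sketch below is one possible representation of such a snapshot; the type and field names are illustrative assumptions, not identifiers from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class GripContact:
    position_mm: float   # location of the contact point along the stylus body
    pressure: float      # local grip pressure at that contact point

@dataclass
class StylusSensorSnapshot:
    grip_contacts: List[GripContact] = field(default_factory=list)
    tip_pressure: float = 0.0                              # force at the stylus tip
    end_pressure: float = 0.0                              # force at the opposite (eraser) end
    acceleration: Tuple[float, float, float] = (0.0, 0.0, 0.0)
    angular_rate: Tuple[float, float, float] = (0.0, 0.0, 0.0)
    hovering: bool = True                                  # True when the tip is not contacting the display
```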

Computing device 104 may also include a sensor subsystem (not shown). For example, computing device 104 may include capacitive touch sensors, peripheral grip sensors, accelerometers, gyrometers, proximity sensors, Hall sensors, etc. In some examples, computing device 104 may include two or more touch-sensitive displays coupled by one or more hinges, and may thus include one or more hinge angle sensors. Touch-sensitive display 106 may be configured to output capacitance values for each touch-sensing pixel or capacitive grid point in the form of heat maps in order to determine which, if any, areas of the capacitive touch sensor are being touched.
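
As one illustration of how such a capacitance heat map might be reduced to an input for later processing, the sketch below thresholds a per-grid-point capacitance map to estimate the total contacted area. The threshold, grid pitch, and example values are assumptions chosen only for illustration.

```python
import numpy as np

def contact_area_mm2(heat_map, pixel_pitch_mm=4.0, touch_threshold=0.5):
    """Estimate the contacted area of the touch sensor from a capacitance heat map.

    heat_map:        2-D array of normalized capacitance values, one per grid point.
    pixel_pitch_mm:  spacing of the capacitive grid (assumed value).
    touch_threshold: normalized capacitance above which a grid point counts as touched.
    """
    touched = heat_map > touch_threshold
    return float(touched.sum()) * pixel_pitch_mm ** 2

# Example: a mostly untouched panel with a stylus tip and a palm-sized contact region.
heat_map = np.zeros((30, 40))
heat_map[18:24, 25:32] = 0.8          # palm resting on the display
heat_map[10, 12] = 0.9                # stylus tip contact
area = contact_area_mm2(heat_map)     # larger contact areas generally mean more dampening
```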

Computing device 104 may be configured to communicate with stylus 100 via electrostatic circuitry, radio circuitry, other wireless communication circuitry, etc. In this way, sensor information may be shared between computing device 104 and stylus 100, and common inputs, such as the velocity of stylus tip 108 across touch-sensitive display 106, may be coordinated. Further, ambiguous information may be resolved. As an example, stylus 100 may not be able to discern whether stylus tip 108 is being pressed against touch-sensitive display 106 or pressed by a thumb of the user. Other components of example computing systems are described herein and with regard to FIG. 10.

A stylus may be used for both specific object selection, akin to a user’s finger, and for providing less structured input, such as writing, drawing, circling objects, etc. As such, different types of haptic feedback may be provided for automatic feedback modes, such as inking, and for interactive feedback modes, such as display object selection.

Haptic feedback may be used to generate a pleasing and reproducible writing experience, such as a perception of a pen or pencil writing on real paper. Some stylus tip compositions glide on glass surfaces with minimal friction. This decreases performance, accuracy, and control for a user attempting to write on the surface of a touch-sensitive display. Previous solutions include exchangeable stylus tips with different friction coefficients, and surface treatments or film overlays for touch-sensitive displays. However, these solutions may not be compatible with all applications of the touch-sensitive display, especially if the user also uses the computing device without a stylus.

FIGS. 2A and 2B show examples of a user providing input to computing device 104 with stylus 100 in automatic feedback modes. In an automatic feedback mode, subtle haptic feedback may be provided via the stylus to mimic the feeling of writing on a frictive surface. In FIG. 2A, at 200, the user is pressing stylus tip 108 onto touch-sensitive display 106, which is presenting inking canvas 205. In FIG. 2B, at 220, the user is pressing stylus end 110 onto touch-sensitive display 106 within inking canvas 205. At 200, the user may be inking content onto inking canvas 205, while at 220, the user may be erasing content from inking canvas 205.

In such an automatic feedback mode, the haptic feedback provided via stylus 100 may mimic the feel of pen-on-paper (FIG. 2A) or eraser-on-paper (FIG. 2B), or any other desired combination of writing implement and textured surface. For example, haptic feedback may mimic chalk on a chalkboard, crayons on cardboard, or other combinations. The haptic feedback may also generate a small amount of movement of stylus tip 108 or stylus end 110.

In some examples, the haptic feedback may be initiated automatically when stylus tip 108 or stylus end 110 contacts touch-sensitive display 106 within inking canvas 205, and may be applied continuously, then ended when stylus tip 108 or stylus end 110 is removed from touch-sensitive display 106 or leaves inking canvas 205. In other examples, the haptic feedback may be initiated automatically responsive to movement of stylus tip 108 or stylus end 110 within inking canvas 205.

Accordingly, computing device 104 and stylus 100 may share information regarding the type of application being executed on computing device 104, so that stylus 100 is aware that an inking canvas is being presented and that an automatic feedback mode is likely to be invoked. Sensor data, such as capacitive sensor data from touch-sensitive display 106 and tip pressure sensor data from stylus 100 may be exchanged to determine the position, velocity, and pressure of the stylus tip 108 on touch-sensitive display 106.

Haptic feedback may be adjusted based at least in part on determined pressure between stylus tip 108 or stylus end 110 and touch-sensitive display 106, angle of incidence between stylus tip 108 or stylus end 110 and touch-sensitive display 106, as well as other factors, as described further herein. Haptic feedback may be further adjusted with velocity, mimicking a pen or eraser traversing a rough surface. In some examples, haptic feedback may be increased or decreased in pulses while operating in an automatic feedback mode so as to provide interactive feedback (e.g., battery level decreasing below a threshold, new message received).
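
A simplified sketch of the kind of adjustment described for this automatic feedback mode appears below: the drive amplitude is scaled up as tip pressure (which dampens the felt sensation) increases, and a velocity-dependent texture frequency mimics a pen or eraser traversing a rough surface. The scaling constants and function name are illustrative assumptions, not values from the patent.

```python
def inking_haptic_drive(tip_pressure_n, tip_speed_mm_s, base_amplitude=0.3):
    """Return (amplitude, frequency_hz) for the haptic actuator while inking.

    tip_pressure_n: force between the stylus tip and the display, in newtons.
    tip_speed_mm_s: speed of the tip across the display, in mm/s.
    """
    # Higher tip pressure dampens the felt vibration, so drive harder (up to a cap).
    amplitude = min(1.0, base_amplitude * (1.0 + 0.5 * tip_pressure_n))
    # Faster strokes produce a higher "texture" frequency, like dragging over grain.
    frequency_hz = 40.0 + 2.0 * tip_speed_mm_s
    return amplitude, frequency_hz

# Example: a light, slow stroke vs. a firm, fast stroke.
print(inking_haptic_drive(tip_pressure_n=0.2, tip_speed_mm_s=10))
print(inking_haptic_drive(tip_pressure_n=1.5, tip_speed_mm_s=120))
```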

Scenarios where an inking canvas is not being utilized may use an interactive feedback mode to govern haptic feedback. Specific events may generate haptic triggers, which may in turn result in haptic feedback being applied through the stylus. In general, manual triggers may be generated when the user performs a specific task related to material presented on the touch-sensitive display and receives action-driven feedback in return.

FIGS. 3A and 3B depict a user providing input to touch-sensitive display 106 with stylus 100 in example interactive feedback modes. In FIG. 3A, at 300, a user is using stylus 100 to draw a lasso 305 over a portion of displayed content presented on touch-sensitive display 106. When lasso 305 is closed, haptic feedback may be provided via stylus 100 while stylus tip 108 is in contact with touch-sensitive display 106.

In FIG. 3B, at 320, the user is selecting an object 325, e.g., a button, from a number of display objects 330 presented on touch-sensitive display 106. In this scenario, as shown at 335, the haptic feedback may not be provided until stylus tip 108 has been removed from touch-sensitive display 106, completing the depression-and-release of object 325. Similar workflow may be used for lassoing objects or other actions where completion of the task includes removing stylus 100 from touch-sensitive display 106. Haptic feedback may be provided in other hovering scenarios, such as when a user selects an object by hovering over it for a threshold duration. Hovering may be determined based at least in part on capacitive sensors in touch-sensitive display 106 and tip pressure sensors in stylus 100.

When stylus 100 is hovering over touch-sensitive display 106, it is only contacted by hand 114, and thus the only dampening of the feedback is provided by hand 114, whereas when stylus 100 is contacting touch-sensitive display 106, the display itself also dampens the haptic feedback. As such, the user may be more sensitive to haptic feedback when stylus 100 is hovering. Accordingly, to maintain a consistent level of haptic feedback, haptic actuation levels may be reduced if it is determined that stylus 100 is hovering.

A user’s grip on a stylus may impact the amount of haptic sensation perceived at the fingers of the user. FIGS. 4A and 4B depict a user holding a stylus with example hand grips. At 400, FIG. 4A shows a user’s hand 114 holding stylus 100 near the middle of body 101. At 405, FIG. 4B depicts user’s hand 114 holding stylus 100 near the stylus tip 108. In other scenarios, a user may hold the stylus near the stylus end, may hold the stylus with a fist, etc. Some hand grips may only include contact points for two or three fingertips, while others may also include contact points for inter-finger webbing. The position, grip style, contact points, and grip pressure may all contribute to the amount of haptic sensation dampening. The location of grip contact points relative to the positioning of haptic actuators within the stylus body may also contribute to the user’s perception of haptic sensations.

As such, grip-determining inputs may include grip sensors positioned axially and circumferentially around stylus 100 as well as capacitive touch sensors on a touch-sensitive display. The contact points of the user’s grip may further inform the angle of contact of stylus tip 108, particularly when analyzed along with the tip pressure and tip angle.

The posture of the computing device, and in particular, whether the face of the computing device opposite the touch-sensitive display is contacting another surface may also contribute to dampening haptic sensations generated at the stylus. As examples, FIGS. 5A and 5B depict different postures for a computing device. At 500, FIG. 5A shows computing device 104 resting on a solid surface (e.g., table 505) while user’s hand 114 engages touch-sensitive display 106 with stylus 100. At 510, FIG. 5B shows user’s opposite hand 520 holding computing device 104 while user’s hand 114 engages touch-sensitive display 106 with stylus 100. In other examples, computing device 104 may be propped upright via a kickstand or other mechanism. While FIG. 5A shows computing device 104 resting on a solid surface, other scenarios may have computing device 104 resting on a more pliable surface. Each of these scenarios may impact the dampening of haptic sensations, as the haptic sensation is dampened by the computing device and by the surface supporting the computing device.

Postures of the computing device may be determined by motion sensors, accelerometers, gyroscopes, proximity sensors, etc. For example, if the computing device is not moving, it is unlikely that the device is being held by the user. It may not be possible to discern the type of static surface the device is placed on. However, this may be informed if the user manually adjusts the level of desired haptic feedback. Further, machine learning may be used to determine if the user consistently performs certain tasks in certain environments (e.g., spreadsheet applications on an office table vs. drawing applications on a pillow in bed).

Dampening of haptic sensation may also be impacted by how and whether the user’s hand is contacting the touch-sensitive display while using the stylus. In general, a greater contact area between the user’s hand and the computing device increases the amount of dampening. The position of the user’s hand relative to the point of stylus contact, and to the edges of the touch-sensitive display may also impact the user’s perception of haptic sensations, particularly in an automatic feedback mode such as inking.

FIGS. 6A-6C show example heat maps on a touch-sensitive display corresponding to different hand postures and thus to different amounts of dampening. As described with regard to FIG. 1, heat maps may be generated based at least in part on capacitive data for each pixel or grid point on the touch-sensitive display. FIG. 6A shows an example heat map 600, whereby touch-sensitive display 106 is only contacted by a stylus (asterisk). FIG. 6B shows an example heat map 605 whereby a user is contacting the touch-sensitive display with both a stylus and the palm of the hand holding the stylus. FIG. 6C shows an example heat map 610 for a large-scale computing device 612 having a touch-sensitive display 614. The size of the computing device itself may impact dampening, with larger devices comprising more material and a larger surface area with which to dissipate haptic sensations. In the example of FIG. 6C, the user is contacting touch-sensitive display 614 with both hands and the stylus, further dampening haptic sensations from the stylus.

Foldable computing devices may be configured to adopt numerous poses which may impact dampening of haptic sensations. As described with regard to FIG. 1, the addition of a hinge angle sensor and presence of multiple touch-sensitive displays may generate additional sensor data that may be used to discern the posture of the device and to inform haptic actuation at the stylus.

FIGS. 7A-7D depict a foldable computing device 700 having a first touch-sensitive display 702 coupled to a second touch-sensitive display 704 via a hinge 706. At 710, FIG. 7A depicts foldable computing device 700 in a flat configuration with a hinge angle of 180°, computing device 700 positioned on a flat, solid surface (e.g., table 715), while user’s hand 114 provides input from stylus 100 to first touch-sensitive display 702.

At 720, FIG. 7B depicts foldable computing device 700 in a back-to-back pose with a hinge angle of 360°, computing device 700 positioned with second touch-sensitive display 704 facing a flat, solid surface (e.g., table 715), while user’s hand 114 provides input from stylus 100 to first touch-sensitive display 702. In this configuration, with the touch-sensitive displays collapsed back-to-back, more haptic sensation may be absorbed by computing device 700 than for the scenario shown at 710, where the underside of first touch-sensitive display 702 directly contacts table 715.

At 730, FIG. 7C depicts foldable computing device 700 in a back-to-back pose with a hinge angle of 360°, computing device 700 positioned with second touch-sensitive display 704 facing a second hand 735 of a user, while user’s hand 114 provides input from stylus 100 to first touch-sensitive display 702.

At 740, FIG. 7D depicts foldable computing device 700 in a tented pose with a hinge angle of 270°, computing device 700 positioned with edges opposite hinge 706 contacting a flat, solid surface 745, while user’s hand 114 provides input from stylus 100 to first touch-sensitive display 702. In some scenarios, the user may hold computing device 700 in a tent pose with one or more fingers of their opposite hand positioned between screens, providing a different dampening profile than for the example shown at 740. As such, the hinge angle and heat maps obtained from both first touch-sensitive display 702 and second touch-sensitive display 704 may be provided as computing device inputs for determining an amount of haptic dampening.
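
As a rough illustration of how hinge angle and posture data might be folded into a dampening estimate for a foldable device, the sketch below maps a hinge angle and a resting-on-surface flag to a posture label and a dampening weight. The labels and numeric weights are assumptions chosen only to mirror the relative behavior of the scenarios in FIGS. 7A-7D.

```python
def foldable_dampening(hinge_angle_deg, resting_on_surface):
    """Return (posture, dampening_weight) for a dual-screen device.

    dampening_weight is a unitless factor; larger values mean more of the
    haptic sensation is expected to be absorbed before reaching the user.
    """
    if hinge_angle_deg >= 350:
        posture = "back-to-back"
        weight = 1.4 if resting_on_surface else 1.1   # stacked displays absorb more
    elif 200 <= hinge_angle_deg < 350:
        posture = "tented"
        weight = 1.2 if resting_on_surface else 1.0
    else:
        posture = "flat"
        weight = 1.3 if resting_on_surface else 1.0
    return posture, weight

print(foldable_dampening(360, resting_on_surface=True))   # cf. FIG. 7B
print(foldable_dampening(270, resting_on_surface=True))   # cf. FIG. 7D
```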

FIG. 8 depicts an example method 800 for providing haptic feedback through a touch-sensitive input device being used to provide input to a touch-sensitive computing device. Method 800 may be executed by a touch-sensitive input device, such as stylus 100, when used in concert with a touch-sensitive computing device, such as computing device 104. In some examples, all or part of method 800 may be executed by a touch-sensitive computing device.

At 810, method 800 includes determining a haptic perception factor based at least in part on one or more of a set of input device inputs received from sensors of the touch-sensitive input device and a set of computing device inputs received from sensors of the touch-sensitive computing device. As described, the haptic perception factor may be based at least in part on inputs from touch-sensitive input device sensors such as grip sensors, tip pressure sensors, accelerometers, gyrometers, proximity sensors, etc. and/or inputs from touch-sensitive computing device sensors such as capacitive touch sensors, peripheral grip sensors, accelerometers, gyrometers, proximity sensors, hall sensors, hinge angle sensors, etc. Computing device inputs may further include information about the nature of the device, such as size, thickness, materials, etc. that may influence dampening of haptic feedback, as well as indications of active applications, content currently presented on a device display, and other information that may cause a change in user perception of haptic feedback. The haptic perception factor may be determined via a lookup table, calculated as a stored function of inputs, etc.

Inputs may be weighted based at least in part on their influence on haptic perception, and may not be weighted at all if not applicable. For example, if the touch-sensitive input device is hovering, the set of computing device inputs from computing device sensors may be weighted as 0. If the user places the touch-sensitive input device on a table, the set of input device inputs may be weighted as 0, or may be weighted heavily, so as to prevent unnecessary haptic actuation when the touch-sensitive input device is not being held by the user. Further, by weighting the inputs, the haptic perception factor may be more accurately determined. For example, contact area between the user’s hand and the touch-sensitive computing device may be weighted differently depending on whether the user’s hand is over the center of the touch-sensitive display vs. at an edge of the touch-sensitive display. If the user is gripping the touch-sensitive input device with a certain grip location, pressure, and angle that makes the input device haptics more susceptible to dampened haptic sensations, the weights of the corresponding computing device inputs may be increased. As such, situational weighting of inputs may be used to generate a haptic perception factor that more closely approximates the user’s current use scenario.
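
One way such a weighted combination could look in code is sketched below, with situational weights that drop an input's contribution to zero when it is not applicable (as in the hovering case). The input names, weight values, and the simple weighted sum are assumptions for illustration; the patent leaves the exact relationship (lookup table, stored function, etc.) open.

```python
def haptic_perception_factor(inputs, weights):
    """Combine normalized sensor inputs into a single haptic perception factor.

    inputs:  dict of normalized (0..1) values, e.g. grip pressure, tip pressure,
             palm contact area, device-on-surface likelihood.
    weights: dict of situational weights; a weight of 0 removes that input.
    The returned value grows with the expected dampening of the haptic sensation.
    """
    return sum(weights.get(name, 0.0) * value for name, value in inputs.items())

inputs = {"grip_pressure": 0.7, "tip_pressure": 0.4,
          "palm_contact_area": 0.6, "device_on_surface": 1.0}

# Hovering: computing-device inputs weighted to zero, per the discussion above.
hover_weights = {"grip_pressure": 1.0, "tip_pressure": 0.0,
                 "palm_contact_area": 0.0, "device_on_surface": 0.0}

# Tip down with the palm resting on the display: all inputs contribute.
contact_weights = {"grip_pressure": 1.0, "tip_pressure": 1.5,
                   "palm_contact_area": 1.2, "device_on_surface": 0.8}

print(haptic_perception_factor(inputs, hover_weights))
print(haptic_perception_factor(inputs, contact_weights))
```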

Inputs may be interactive and/or additive. For example, if the inputs indicate the user is gripping the touch-sensitive input device tightly and placing their palm on the touch-sensitive display, the associated inputs may be used to generate a greater haptic perception factor than either group of inputs alone. In some examples, the inputs may be recorded as a change or difference from a previous input, and may be updated when the magnitude of a change rises above a threshold, indicating a significant difference.

At 820, method 800 includes determining a haptic response profile based at least in part on the haptic perception factor. In other words, based at least in part on the haptic perception factor, a haptic response profile may be determined to approximate the perception of a consistent amount of haptic feedback, accounting at least for any dampening of haptic sensation or otherwise reduction in haptic perception as a function of the set of input device inputs and the set of computing device inputs.

In some examples, the haptic response profile may be further based at least in part on a baseline haptic profile stored in memory, which may be retrieved responsive to recognizing a haptic trigger. A haptic trigger may be any action, generated by the touch-sensitive input device and/or the touch-sensitive computing device that is associated with and cues a haptic response at the touch-sensitive input device. As described with regard to FIGS. 2A and 2B, this may include a haptic trigger for automatic feedback, such as inking or erasing. As described with regard to FIGS. 3A and 3B, this may include a haptic trigger for interactive feedback, such as selecting a display object presented on a touch-sensitive display. Additionally or alternatively, haptic triggers may include device feedback and/or software feedback that is not driven by use of the touch-sensitive input device with the touch-sensitive computing device, such as a low battery warning or a calendar reminder.

The baseline haptic profile may be stored at the touch-sensitive input device, stored at the touch-sensitive computing device, or at another networked device. If stored at the touch-sensitive computing device, the baseline haptic profile may be pushed to the touch-sensitive input device. If stored at a networked storage device, the baseline haptic profile may be downloaded by either the touch-sensitive input device or the touch-sensitive computing device. In some examples, the baseline haptic profile may be loaded in memory at the touch-sensitive input device upon powering up and/or active use so it may be rapidly retrieved upon recognizing a haptic trigger.

The baseline haptic profile may be specific to the user, and may be stored as a user preference for the touch-sensitive input device. The baseline haptic profile may indicate a preferred amount of haptic sensation transferred to the user’s fingertips for any haptic feedback. The baseline haptic profile may be predetermined, such as a default setting, may be directly selected by the user, and/or may be iteratively determined over the course of stylus usage by the user.

Based at least in part on the inputs, the haptic perception factor may represent the state of the user, and haptic intensity settings may be adjusted accordingly. This may enable consistent haptic feedback across a range of scenarios. Applying the haptic perception factor to the baseline haptic profile may be based at least in part on predetermined outcomes, such as those based at least in part on empirical data or simulations. Interpolation and inference may be used to generate a plurality of settings to achieve consistency in haptic feedback. A generic relationship may be provided that may be further adapted for each user over time. For example, some users may not press with as much force with their palm on the touch-sensitive computing device and thus not dampen the signal as much as others.

The haptic response profile may be further adjusted based at least in part on application-specific events, and/or user habits that correlate to either reduced perception or heightened perception (e.g., a stoic application vs. a very busy application). In some examples, there may be a range of haptic response levels within an application (e.g., selecting a photo vs. cropping a photo vs. confirming deletion of a photo). The haptic response profile may include both amplitude and frequency components, such as when applied through a haptic motor.
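
A possible representation of such a profile, with amplitude and frequency components and hypothetical per-event levels inside a photo application, is sketched below. The event names and values are assumptions for illustration only.

```python
# Illustrative sketch: a haptic response profile carrying both amplitude and
# frequency, with per-event levels inside an application. Values are assumptions.

from dataclasses import dataclass


@dataclass
class HapticResponseProfile:
    amplitude: float      # normalized drive amplitude, 0..1
    frequency_hz: float   # actuation frequency for a haptic motor


# Hypothetical per-event levels within a photo application.
EVENT_PROFILES = {
    "select_photo": HapticResponseProfile(amplitude=0.3, frequency_hz=170),
    "crop_photo": HapticResponseProfile(amplitude=0.5, frequency_hz=170),
    "confirm_delete": HapticResponseProfile(amplitude=0.8, frequency_hz=90),
}


def profile_for_event(event, perception_factor):
    base = EVENT_PROFILES[event]
    # Compensate amplitude for dampening; frequency is left unchanged here.
    amplitude = min(1.0, base.amplitude * (1.0 + perception_factor))
    return HapticResponseProfile(amplitude=amplitude, frequency_hz=base.frequency_hz)


print(profile_for_event("confirm_delete", perception_factor=0.5))
```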

At 830, method 800 includes actuating haptic devices of the touch-sensitive input device based at least in part on the determined haptic response profile. As described with regard to FIG. 1, the haptic devices may include haptic motors, electrostatic devices, ultrasonic devices, auditory devices, etc. In some examples, there may be concurrent feedback from the touch-sensitive computing device, be it haptic, visual, audible, or other feedback. In some examples, the haptic feedback may replace feedback from the touch-sensitive computing device. For some applications, such as inking and other automatic feedback modes, the haptic feedback may be inverted, with a reduction or loss of persistent haptic feedback as a signaling tool.

Optionally, at 840, method 800 may include determining a change in the haptic perception factor based at least in part on one or more of the set of input device inputs and the set of computing device inputs. For example, a change in any sensor output value contributing to the haptic perception factor is likely to produce a change in haptic perception. For instance, a change in the haptic perception factor may be based at least in part on a change in contact area between a hand of the user and the touch-sensitive computing device. An increase in contact area may correspond to an increased haptic perception factor, while a decrease in contact area may correspond to a decreased haptic perception factor. However, the change in contact area may have a lower impact on the haptic perception factor as compared to other changed input values, such as contact pressure. In some examples, the change in haptic perception factor may be based at least in part on a change in posture of the touch-sensitive computing device, a change in the user's grip at the touch-sensitive input device body, a change in contact pressure between the touch-sensitive input device and the touch-sensitive computing device, etc.
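
A minimal sketch of weighting per-input changes differently (contact pressure counting more than contact area) follows; the weights and input names are assumptions, not taken from the disclosure.

```python
# Illustrative sketch: different inputs contribute to the change in the haptic
# perception factor with different weights. Weights are assumptions.

CHANGE_WEIGHTS = {
    "contact_pressure": 0.5,
    "grip_pressure": 0.3,
    "contact_area": 0.15,
    "device_posture": 0.05,
}


def perception_factor_delta(previous, current):
    """Weighted sum of per-input changes; positive values mean more dampening."""
    delta = 0.0
    for name, weight in CHANGE_WEIGHTS.items():
        delta += weight * (current.get(name, 0.0) - previous.get(name, 0.0))
    return delta


prev = {"contact_area": 0.2, "contact_pressure": 0.4}
curr = {"contact_area": 0.6, "contact_pressure": 0.4}
print(perception_factor_delta(prev, curr))  # area grew, so a modest increase
```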

Optionally, at 850, method 800 may include adjusting the haptic response profile based at least in part on the changed haptic perception factor and a baseline haptic profile, and, continuing at 860, method 800 may optionally include actuating the haptic devices of the touch-sensitive input device based at least in part on the adjusted haptic response profile. In this way, the user’s perceived haptic feedback may remain consistent even as the haptic perception factor fluctuates.

FIG. 9 depicts an example method 900 for providing haptic feedback through a touch-sensitive input device being used to provide input to a touch-sensitive display. Method 900 may be executed by a touch-sensitive input device, such as stylus 100 when used in concert with a touch-sensitive computing device, such as computing device 104. In some examples, all or part of method 900 may be executed by a touch-sensitive computing device.

At 910, method 900 includes determining a current mode of operation based at least in part on one or more of a first set of input device inputs received from touch-sensitive input device sensors and a first set of computing device inputs received from touch-sensitive computing device sensors. For example, the current mode of operation may be an automatic feedback mode, such as inking, or an interactive feedback mode, such as a mode wherein display objects are selected based at least on a position of the touch-sensitive input device relative to a touch-sensitive display of the touch-sensitive computing device. The first set of input device inputs may include inputs regarding the orientation of the touch-sensitive input device, such as whether the tip of the touch-sensitive input device or the end of the touch-sensitive input device is oriented towards the touch-sensitive display. The first set of input device inputs may further include pressure values output by pressure sensors at the tip of the touch-sensitive input device, such as whether the touch-sensitive input device is contacting the touch-sensitive display or hovering over the touch-sensitive display. The first set of computing device inputs may include a status of an application executing on the touch-sensitive computing device; for example, an automatic feedback mode may be determined based at least in part on an inking canvas being presented on the touch-sensitive display.

At 920, method 900 includes retrieving a baseline haptic profile for the current mode of operation. The baseline haptic profile may be retrieved as described with regard to FIG. 8, except that in some examples separate baseline haptic profiles may be maintained for each mode of operation. Such a retrieval may be performed in response to recognizing a haptic trigger. In examples where the current mode is an automatic feedback mode, the haptic trigger may be based at least in part on a tip of the touch-sensitive input device contacting the touch-sensitive computing device when the touch-sensitive computing device is presenting an inking canvas. In examples where the current mode is an interactive feedback mode, the haptic trigger may be based at least in part on a position of a tip of the touch-sensitive input device relative to content displayed on the touch-sensitive computing device, as described with regard to FIGS. 3A and 3B.
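
The sketch below keeps one baseline profile per mode and retrieves it when a trigger is recognized; the mode names, profile values, and fallback are illustrative assumptions.

```python
# Illustrative sketch: maintaining a separate baseline haptic profile per mode
# and retrieving one when a haptic trigger is recognized. Names are assumptions.

BASELINE_PROFILES = {
    "automatic": {"amplitude": 0.35, "frequency_hz": 170},   # e.g., inking
    "interactive": {"amplitude": 0.6, "frequency_hz": 120},  # e.g., object selection
}


def retrieve_baseline(mode):
    """Return the stored baseline for the current mode, falling back to a default."""
    return BASELINE_PROFILES.get(mode, {"amplitude": 0.5, "frequency_hz": 150})


def on_haptic_trigger(mode, trigger):
    baseline = retrieve_baseline(mode)
    # Downstream steps would combine this baseline with the haptic perception
    # factor to produce the final response profile.
    return {"trigger": trigger, **baseline}


print(on_haptic_trigger("automatic", "tip_contact_on_inking_canvas"))
```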

At 930, method 900 includes determining a haptic perception factor based at least in part on one or more of a second set of input device inputs received from the touch-sensitive input device sensors and a second set of computing device inputs received from the touch-sensitive computing device sensors. At 940, method 900 includes determining a haptic response profile based at least in part on the retrieved baseline haptic profile and the haptic perception factor. The haptic perception factor and haptic response profile may be determined as described with regard to FIG. 8, though the current mode of operation may influence both values.

At 950, method 900 includes actuating haptic devices of the touch-sensitive input device based at least in part on the determined haptic response profile. In some examples, the haptic devices may be actuated based at least in part on a first haptic response profile responsive to the haptic trigger being recognized when the tip of the touch-sensitive input device is contacting the display of the touch-sensitive computing device. Further, in some examples, the haptic devices may be actuated based at least in part on a second haptic response profile, different than the first haptic response profile, responsive to the haptic trigger being recognized when the tip of the touch-sensitive input device is hovering over the display of the touch-sensitive computing device.
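
A simple way to picture the contact-versus-hover distinction is shown below; the profile values and the tip-pressure threshold are assumptions used only to make the branching concrete.

```python
# Illustrative sketch: choosing between two response profiles depending on
# whether the stylus tip is contacting or hovering over the display.
# Profile values and the pressure threshold are assumptions.

CONTACT_PROFILE = {"amplitude": 0.7, "frequency_hz": 150}
HOVER_PROFILE = {"amplitude": 0.35, "frequency_hz": 220}

TIP_CONTACT_THRESHOLD = 0.02  # normalized tip-pressure reading


def select_response_profile(tip_pressure):
    if tip_pressure > TIP_CONTACT_THRESHOLD:
        return CONTACT_PROFILE   # first profile: tip is contacting the display
    return HOVER_PROFILE         # second profile: tip is hovering


print(select_response_profile(0.3))
print(select_response_profile(0.0))
```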

Optionally, at 960, method 900 includes determining an updated baseline haptic profile for a user based at least in part on one or more of a third set of input device inputs received from the touch-sensitive input device sensors and a third set of computing device inputs received from the touch-sensitive computing device sensors as the user inputs a preferred amount of haptic sensation transferred to the user's fingertips. In other words, when a user is selecting a haptic profile, the input device inputs and computing device inputs may indicate the current state of the devices, and help explain why the user is currently deciding to make an adjustment to their baseline haptic profile. For example, if the user's palm is contacting the touch-sensitive display and they indicate to increase haptic intensity, it may be deduced that their palm is dampening haptic sensations more than is currently being accounted for. The coefficient for palm contact may thus be increased with regard to the haptic perception factor, and this adjustment may be stored in the user's baseline haptic profile. Continuing at 970, method 900 may optionally include, responsive to recognizing a subsequent haptic trigger, actuating haptic devices of the touch-sensitive input device based at least in part on the updated baseline haptic profile.
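
A minimal sketch of folding such a manual adjustment back into the stored baseline follows; the field names and the coefficient step size are assumptions.

```python
# Illustrative sketch: when the user raises haptic intensity while their palm is
# on the display, increase the palm-contact coefficient used by the perception
# factor and persist it in the baseline profile. Step size is an assumption.

COEFFICIENT_STEP = 0.05


def update_baseline(baseline, user_delta, palm_on_display):
    """Fold a manual intensity adjustment back into the stored baseline."""
    updated = dict(baseline)
    updated["preferred_intensity"] = baseline.get("preferred_intensity", 0.5) + user_delta
    if palm_on_display and user_delta > 0:
        # Palm contact was dampening more than accounted for; weight it higher.
        updated["palm_contact_coefficient"] = (
            baseline.get("palm_contact_coefficient", 0.3) + COEFFICIENT_STEP
        )
    return updated


baseline = {"preferred_intensity": 0.5, "palm_contact_coefficient": 0.3}
print(update_baseline(baseline, user_delta=0.1, palm_on_display=True))
```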

In some embodiments, the methods and processes described herein may be tied to a computing system of one or more computing devices. In particular, such methods and processes may be implemented as a computer-application program or service, an application-programming interface (API), a library, and/or other computer-program product.

FIG. 10 schematically shows a non-limiting embodiment of a computing system 1000 that can enact one or more of the methods and processes described above. Computing system 1000 is shown in simplified form. Computing system 1000 may embody the computing device 104 described above and illustrated in FIG. 1. Computing system 1000 may take the form of one or more personal computers, server computers, tablet computers, home-entertainment computers, network computing devices, gaming devices, mobile computing devices, mobile communication devices (e.g., smart phone), wearable computing devices such as smart wristwatches and head-mounted augmented reality devices, and/or other computing devices.

Computing system 1000 includes a logic processor 1002, volatile memory 1004, and a non-volatile storage device 1006. Computing system 1000 may optionally include a display subsystem 1008, input subsystem 1010, communication subsystem 1012, and/or other components not shown in FIG. 10.

Logic processor 1002 includes one or more physical devices configured to execute instructions. For example, the logic processor may be configured to execute instructions that are part of one or more applications, programs, routines, libraries, objects, components, data structures, or other logical constructs. Such instructions may be implemented to perform a task, implement a data type, transform the state of one or more components, achieve a technical effect, or otherwise arrive at a desired result.

The logic processor may include one or more physical processors (hardware) configured to execute software instructions. Additionally or alternatively, the logic processor may include one or more hardware logic circuits or firmware devices configured to execute hardware-implemented logic or firmware instructions. Processors of the logic processor 1002 may be single-core or multi-core, and the instructions executed thereon may be configured for sequential, parallel, and/or distributed processing. Individual components of the logic processor optionally may be distributed among two or more separate devices, which may be remotely located and/or configured for coordinated processing. Aspects of the logic processor may be virtualized and executed by remotely accessible, networked computing devices configured in a cloud-computing configuration. It will be understood that, in such a case, these virtualized aspects may be run on different physical logic processors of various different machines.

Non-volatile storage device 1006 includes one or more physical devices configured to hold instructions executable by the logic processors to implement the methods and processes described herein. When such methods and processes are implemented, the state of non-volatile storage device 1006 may be transformed—e.g., to hold different data.

Non-volatile storage device 1006 may include physical devices that are removable and/or built-in. Non-volatile storage device 1006 may include optical memory (e.g., CD, DVD, HD-DVD, Blu-Ray Disc, etc.), semiconductor memory (e.g., ROM, EPROM, EEPROM, FLASH memory, etc.), and/or magnetic memory (e.g., hard-disk drive, floppy-disk drive, tape drive, MRAM, etc.), or other mass storage device technology. Non-volatile storage device 1006 may include nonvolatile, dynamic, static, read/write, read-only, sequential-access, location-addressable, file-addressable, and/or content-addressable devices. It will be appreciated that non-volatile storage device 1006 is configured to hold instructions even when power is cut to the non-volatile storage device 1006.

Volatile memory 1004 may include physical devices that include random access memory. Volatile memory 1004 is typically utilized by logic processor 1002 to temporarily store information during processing of software instructions. It will be appreciated that volatile memory 1004 typically does not continue to store instructions when power is cut to the volatile memory 1004.

Aspects of logic processor 1002, volatile memory 1004, and non-volatile storage device 1006 may be integrated together into one or more hardware-logic components. Such hardware-logic components may include field-programmable gate arrays (FPGAs), program- and application-specific integrated circuits (PASIC/ASICs), program- and application-specific standard products (PSSP/ASSPs), system-on-a-chip (SOC), and complex programmable logic devices (CPLDs), for example.

The terms “module,” “program,” and “engine” may be used to describe an aspect of computing system 1000 typically implemented in software by a processor to perform a particular function using portions of volatile memory, which function involves transformative processing that specially configures the processor to perform the function. Thus, a module, program, or engine may be instantiated via logic processor 1002 executing instructions held by non-volatile storage device 1006, using portions of volatile memory 1004. It will be understood that different modules, programs, and/or engines may be instantiated from the same application, service, code block, object, library, routine, API, function, etc. Likewise, the same module, program, and/or engine may be instantiated by different applications, services, code blocks, objects, routines, APIs, functions, etc. The terms “module,” “program,” and “engine” may encompass individual or groups of executable files, data files, libraries, drivers, scripts, database records, etc.

When included, display subsystem 1008 may be used to present a visual representation of data held by non-volatile storage device 1006. The visual representation may take the form of a graphical user interface (GUI). As the herein described methods and processes change the data held by the non-volatile storage device, and thus transform the state of the non-volatile storage device, the state of display subsystem 1008 may likewise be transformed to visually represent changes in the underlying data. Display subsystem 1008 may include one or more display devices utilizing virtually any type of technology. Such display devices may be combined with logic processor 1002, volatile memory 1004, and/or non-volatile storage device 1006 in a shared enclosure, or such display devices may be peripheral display devices.

When included, input subsystem 1010 may comprise or interface with one or more user-input devices such as a keyboard, mouse, touch screen, or game controller. In some embodiments, the input subsystem may comprise or interface with selected natural user input (NUI) componentry. Such componentry may be integrated or peripheral, and the transduction and/or processing of input actions may be handled on- or off-board. Example NUI componentry may include a microphone for speech and/or voice recognition; an infrared, color, stereoscopic, and/or depth camera for machine vision and/or gesture recognition; a head tracker, eye tracker, accelerometer, and/or gyroscope for motion detection and/or intent recognition; as well as electric-field sensing componentry for assessing brain activity; and/or any other suitable sensor.

When included, communication subsystem 1012 may be configured to communicatively couple various computing devices described herein with each other, and with other devices. Communication subsystem 1012 may include wired and/or wireless communication devices compatible with one or more different communication protocols. As non-limiting examples, the communication subsystem may be configured for communication via a wireless telephone network, or a wired or wireless local- or wide-area network, such as an HDMI over Wi-Fi connection. In some embodiments, the communication subsystem may allow computing system 1000 to send and/or receive messages to and/or from other devices via a network such as the Internet.

In one example, a method for providing haptic feedback through a touch-sensitive input device configured to provide input to a touch-sensitive computing device comprises determining a haptic perception factor based at least in part on one or more of a set of input device inputs received from sensors of the touch-sensitive input device and a set of computing device inputs received from sensors of the touch-sensitive computing device; determining a haptic response profile based at least in part on the haptic perception factor; and actuating haptic devices of the touch-sensitive input device based at least in part on the determined haptic response profile. In such an example, or any other example, the haptic response profile is further additionally or alternatively based at least in part on a baseline haptic profile stored in memory. In any of the preceding examples, or any other example, the baseline haptic profile is additionally or alternatively specific to a user. In any of the preceding examples, or any other example, the method additionally or alternatively comprises determining a change in the haptic perception factor based at least in part on one or more of the set of input device inputs and the set of computing device inputs; adjusting the haptic response profile based at least in part on the changed haptic perception factor and the baseline haptic profile; and actuating the haptic devices of the touch-sensitive input device based at least in part on the adjusted haptic response profile. In any of the preceding examples, or any other example, the change in the haptic perception factor is additionally or alternatively based at least in part on computing device inputs indicating a change in contact area between a hand of a user and the touch-sensitive computing device. In any of the preceding examples, or any other example, the change in haptic perception factor is additionally or alternatively based at least on computing device inputs indicating a change in posture of the touch-sensitive computing device. In any of the preceding examples, or any other example, the change in haptic perception factor is additionally or alternatively based at least on input device inputs indicating a change in a user’s grip at the touch-sensitive input device. In any of the preceding examples, or any other example, the change in haptic perception factor is additionally or alternatively based at least on one or more of computing device inputs and input device inputs indicating a change in contact pressure between the touch-sensitive input device and the touch-sensitive computing device.

In another example, a method for providing haptic feedback through a touch-sensitive input device configured to provide input to a touch-sensitive computing device comprises determining a current mode of operation based at least in part on one or more of a first set of input device inputs received from touch-sensitive input device sensors and a first set of computing device inputs received from touch-sensitive computing device sensors; retrieving a baseline haptic profile for the current mode of operation; determining a haptic perception factor based at least in part on one or more of a second set of input device inputs received from the touch-sensitive input device sensors and a second set of computing device inputs received from the touch-sensitive computing device sensors; determining a haptic response profile based at least in part on the retrieved baseline haptic profile and the haptic perception factor; and actuating haptic devices of the touch-sensitive input device based at least in part on the determined haptic response profile.

In such an example, or any other example, the haptic perception factor is additionally or alternatively determined in response to recognizing a haptic trigger. In any of the preceding examples, or any other example, the current mode is additionally or alternatively an interactive feedback mode, and wherein the haptic trigger is based at least in part on a position of a tip of the touch-sensitive input device relative to content displayed on the touch-sensitive computing device. In any of the preceding examples, or any other example, the haptic devices are additionally or alternatively actuated based at least in part on a first haptic response profile responsive to the haptic trigger being recognized when the tip of the touch-sensitive input device is contacting a display of the touch-sensitive computing device. In any of the preceding examples, or any other example, the haptic devices are additionally or alternatively actuated based at least in part on a second haptic response profile responsive to the haptic trigger being recognized when the tip of the touch-sensitive input device is hovering over the display of the touch-sensitive computing device. In any of the preceding examples, or any other example, the current mode is additionally or alternatively an automatic feedback mode, and wherein the haptic trigger is based at least in part on a tip of the touch-sensitive input device contacting the touch-sensitive computing device when the touch-sensitive computing device is presenting an inking canvas. In any of the preceding examples, or any other example, the method additionally or alternatively comprises determining an updated baseline haptic profile for a user based at least in part on a third set of input device inputs received from the touch-sensitive input device sensors and a third set of computing device inputs received from the touch-sensitive computing device sensors as the user inputs a preferred amount of haptic sensation transferred to the user's fingertips; and responsive to recognizing a subsequent haptic trigger, actuating haptic devices of the touch-sensitive input device based at least in part on the updated baseline haptic profile.

In yet another example, a haptic stylus comprises a body; one or more haptic devices within the body; a communications subsystem; a sensor subsystem; and a controller configured to: determine a haptic perception factor based at least in part on one or more of a set of input device inputs received from the sensor subsystem and a set of computing device inputs received from sensors of a touch-sensitive computing device via the communications subsystem; determine a haptic response profile based at least in part on the determined haptic perception factor; and actuate the haptic devices at the determined haptic response profile. In such an example, or any other example, the sensor subsystem additionally or alternatively includes one or more pressure sensors coupled to a stylus tip. In any of the preceding examples, or any other example, the sensor subsystem additionally or alternatively includes one or more pressure sensors coupled to a stylus end. In any of the preceding examples, or any other example, the sensor subsystem additionally or alternatively includes one or more grip sensors positioned around a circumference of the body. In any of the preceding examples, or any other example, the controller is additionally or alternatively configured to determine a change in the haptic perception factor based at least in part on one or more of the set of input device inputs and the set of computing device inputs; adjust the haptic response profile based at least in part on the changed haptic perception factor; and actuate the haptic devices of the stylus based at least in part on the adjusted haptic response profile.

It will be understood that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered in a limiting sense, because numerous variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated and/or described may be performed in the sequence illustrated and/or described, in other sequences, in parallel, or omitted. Likewise, the order of the above-described processes may be changed.

The subject matter of the present disclosure includes all novel and non-obvious combinations and sub-combinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as any and all equivalents thereof.

Microsoft Patent | Spatial attention model enhanced voice engagement system https://patent.nweon.com/25218 Wed, 30 Nov 2022 20:19:14 +0000 https://patent.nweon.com/?p=25218 ...

Patent: Spatial attention model enhanced voice engagement system

Patent PDF: Join Nweon membership to access

Publication Number: 20220382510

Publication Date: 2022-12-01

Assignee: Microsoft Technology Licensing, LLC (Redmond, WA, US)

Abstract

A computer implemented method includes detecting user interaction with mixed reality displayed content in a mixed reality system. User focus is determined as a function of the user interaction using a spatial intent model. A length of time for extending voice engagement with the mixed reality system is modified based on the determined user focus. Detecting user interaction with the displayed content may include tracking eye movements to determine objects in the displayed content at which the user is looking and determining a context of a user dialog during the voice engagement.

Claims

1.A computer implemented method comprising: initiating voice engagement to interact with an application; detecting user interaction with mixed reality displayed content; determining user focus as a function of the user interaction with the displayed content using a spatial intent model; and modifying a length of time for extending the voice engagement with the application based on the determined user focus.

Description

BACKGROUND

Voice-enabled systems listen for an arbitrary length of time to ensure users can continuously use multiple voice commands before the systems stop listening. The length of time may be selected to avoid having the user repeatedly say a voice invocation wake word every single time for multiple voice inputs. After the length of time from the last voice command expires, the system stops listening and returns to an idle state. If the user desires to enter another voice command, the wake word must first be repeated.

In one prior method, user interaction with a virtual assistant may be used to maintain voice interaction with the virtual assistant. The user interaction is tracked to identify gaze at the virtual assistant and gesture/voice commands interacting with the virtual assistant to maintain engagement with the virtual assistant. As long as the user makes eye contact with the virtual assistant, the ability to interact with the virtual assistant is maintained. User eye gaze in mixed reality environments is constantly moving. Such eye movement can result in false negatives or even false positives regarding the user's desire to interact with the virtual assistant, causing commands to be missed by the virtual assistant.

SUMMARY

A computer implemented method includes detecting user interaction with mixed reality displayed content in a mixed reality system. User focus is determined as a function of the user interaction using a spatial intent model. A length of time for extending voice engagement with the mixed reality system is modified based on the determined user focus. Detecting user interaction with the displayed content may include tracking eye movements to determine objects in the displayed content at which the user is looking and determining a context of a user dialog during the voice engagement.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram view of a system for managing voice engagement via a voice engagement system according to an example embodiment.

FIG. 2 is a flowchart illustrating a computer implemented method of modifying a timeout for voice engagement according to an example embodiment.

FIG. 3 is a flowchart of a computer implemented method for detecting user interaction with the displayed content according to an example embodiment.

FIG. 4 is a flowchart of a computer implemented method for extending the length of voice engagement based on detected user gestures.

FIG. 5 is a block schematic diagram of a computer system to implement one or more example embodiments.

DETAILED DESCRIPTION

In the following description, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration specific embodiments which may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that structural, logical and electrical changes may be made without departing from the scope of the present invention. The following description of example embodiments is, therefore, not to be taken in a limited sense, and the scope of the present invention is defined by the appended claims.

The functions or algorithms described herein may be implemented in software in one embodiment. The software may consist of computer executable instructions stored on computer readable media or a computer readable storage device such as one or more non-transitory memories or other type of hardware based storage devices, either local or networked. Further, such functions correspond to modules, which may be software, hardware, firmware or any combination thereof. Multiple functions may be performed in one or more modules as desired, and the embodiments described are merely examples. The software may be executed on a digital signal processor, ASIC, microprocessor, or other type of processor operating on a computer system, such as a personal computer, server or other computer system, turning such computer system into a specifically programmed machine.

The functionality can be configured to perform an operation using, for instance, software, hardware, firmware, or the like. For example, the phrase “configured to” can refer to a logic circuit structure of a hardware element that is to implement the associated functionality. The phrase “configured to” can also refer to a logic circuit structure of a hardware element that is to implement the coding design of associated functionality of firmware or software. The term “module” refers to a structural element that can be implemented using any suitable hardware (e.g., a processor, among others), software (e.g., an application, among others), firmware, or any combination of hardware, software, and firmware. The term, “logic” encompasses any functionality for performing a task. For instance, each operation illustrated in the flowcharts corresponds to logic for performing that operation. An operation can be performed using software, hardware, firmware, or the like. The terms, “component,” “system,” and the like may refer to computer-related entities, hardware, and software in execution, firmware, or combination thereof. A component may be a process running on a processor, an object, an executable, a program, a function, a subroutine, a computer, or a combination of software and hardware. The term, “processor,” may refer to a hardware component, such as a processing unit of a computer system.

Furthermore, the claimed subject matter may be implemented as a method, apparatus, or article of manufacture using standard programming and engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computing device to implement the disclosed subject matter. The term, “article of manufacture,” as used herein is intended to encompass a computer program accessible from any computer-readable storage device or media. Computer-readable storage media can include, but are not limited to, magnetic storage devices, e.g., hard disk, floppy disk, magnetic strips, optical disk, compact disk (CD), digital versatile disk (DVD), smart cards, flash memory devices, among others. In contrast, computer-readable media, i.e., not storage media, may additionally include communication media such as transmission media for wireless signals and the like.

A spatial attention model enhanced voice engagement system enables a user to maintain contextual conversation with a voice-enabled system taking into account what the user is visually observing. Maintaining the contextual conversation with the voice-enabled system may also be based on a context of the user’s dialog with the voice engagement system.

In one example, a user can gaze at a voice-enabled signifier to wake the voice engagement system and begin a voice engagement dialog. The signifier may be one or more of an embodied virtual agent in mixed reality, a voice-enabled smart speaker, or a user interface object in mixed reality. The signifier may also display various states of the voice engagement system during the voice experience, such as ready/listening, voice input, acknowledgment, processing, task completion, return to ready state, and voice time out. The signifier can show when the voice engagement system is about to stop listening and return to the idle state, which may allow the user to explicitly extend the length of time before the voice engagement times out, enabling further voice commands to be provided without having to wake the voice engagement system again.

Multiple signals, such as contextual utterances and focus target areas, are used by a spatial intent model to extend the duration of the voice engagement. To deal with gaze drifting away from the focus area, the spatial intent model may be used to predict a ranking of the user interface objects or target areas most focused on by the user inside the user's field of view. For example, frequency or duration of the gaze can be the signals used in the spatial intent model. The voice engagement system can implicitly extend the continuous voice engagement when a user is using contextual voice commands while staying focused on the target content/area.
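
As a minimal sketch of such a ranking, the snippet below scores each in-view target by gaze frequency and accumulated gaze duration; the weights and sample data are assumptions, not values from the disclosure.

```python
# Illustrative sketch: ranking in-view targets by gaze frequency and duration,
# as a spatial intent model might. Weights and sample data are assumptions.

from collections import defaultdict

FREQUENCY_WEIGHT = 0.4
DURATION_WEIGHT = 0.6


def rank_targets(gaze_events):
    """gaze_events: iterable of (target_id, gaze_duration_seconds) tuples."""
    counts = defaultdict(int)
    durations = defaultdict(float)
    for target, duration in gaze_events:
        counts[target] += 1
        durations[target] += duration
    scores = {
        t: FREQUENCY_WEIGHT * counts[t] + DURATION_WEIGHT * durations[t]
        for t in counts
    }
    return sorted(scores, key=scores.get, reverse=True)


events = [("form", 1.2), ("assistant", 0.3), ("form", 0.8), ("window", 0.2)]
print(rank_targets(events))  # 'form' ranks first
```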

FIG. 1 is a functional block diagram view of a system 100 for managing voice engagement via a voice engagement system 110 with a user 115. User 115 may be using a head mounted display 120 such as smart goggles or smart glasses that provide a mixed reality view 125. The mixed reality view may be provided by a mixed reality application 130. In one example, the voice engagement system 110 shares the same processing resources as the mixed reality application 130. The display 120 and system 110 wirelessly communicate with each other as indicated by wireless symbols 135.

The display 120 includes a microphone, speaker, and eye tracking cameras. Data collected by the display 120 are communicated wirelessly to system 110. Data generated by application 130 is also communicated wirelessly to the display 120 for generating the mixed reality view 125. The mixed reality view 125 may include several different objects indicated at 140, 141, 142, and 143. Each of these objects has a known location in the view and can be correlated to locations that eye tracking data indicates the user is looking at, as indicated by gaze lines 145, 146, 147, and 148 respectively.

The objects may be holograms, which are virtual objects generated by application 130, or real objects that actually exist. Object 143 for example is a person, which may be real, or may even be an avatar, which is referred to as a hologram. Object 143 includes a bounding box 150 which encloses the object 143. Gaze line 148 being directed at the bounding box 150 may be interpreted as a gaze at object 143.

Object 143 for example may be a document that is being modified or viewed by the user 115. The document can also be real or virtual. In one example, a further gaze line 153 is directed to an area in the view in which no real or virtual object is located.

In one example, object 140 may be a voice-enabled signifier, such as a virtual assistant. Gazing at object 140 may automatically wake voice engagement 170 to begin a dialog. In some examples, gazing at object 140 may also signify a desire to continue the dialog.

System 110 includes several functions that keep track of user 115 interaction with the virtual environment, and by extension, application 130. Eye tracking 160 is a function that receives the gaze data indicative of where the user is gazing. Eye tracking 160 may keep track of actual objects or areas where the user is looking by identifiers of the objects or areas. Eye tracking 160 may also keep a history of actual times at which the gaze is directed at such objects or areas. The history allows identification of a ranked list of the objects to which the user is most likely paying the most attention.

Context tracking 165 is a function used to generate a context for conversation or dialog occurring between the user 115 and a voice engagement 170 function. The context may include a name of the object to which voice commands and optionally gestures are being directed by the user 115, as well as the commands and gestures themselves. The context may also include information identifying a state of the application 130 to which the commands are being applied. Note that the application 130 may include word processing functions, browsers, spreadsheets, shopping applications, and many other types of applications that may be used to interact with the mixed reality view 125.

Data from eye tracking 160 and context tracking 165 functions is provided to a spatial intent model 175. Model 175 processes the received data to determine a focus of the user 115 in terms of continuing the dialog with the voice engagement 170 function. The model 175 may indicate that the user focus is directed to the current dialog by simply using the top ranked object and comparing it to the context. A positive comparison results in the spatial intent model indicating that the time should be extended via a modify timeout length function 180, which provides an extension time to voice engagement 170 to extend the active voice engagement. The timeout length is a time used to continue active voice engagement. At the end of the timeout, active voice engagement will return to an idle state and wait for engagement with the signifier to wake for further active voice engagement.
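
A simple sketch of that comparison and the resulting timeout extension follows; the extension amount and field names are assumptions used only to make the flow concrete.

```python
# Illustrative sketch: extend the listening timeout when the top-ranked gaze
# target matches the object referenced in the current dialog context.
# The extension amount and field names are assumptions.

EXTENSION_SECONDS = 8.0


def updated_timeout(current_timeout, ranked_targets, context):
    """Return a new timeout length based on gaze ranking and dialog context."""
    if not ranked_targets:
        return current_timeout
    top_target = ranked_targets[0]
    if top_target == context.get("active_object"):
        # User focus matches the ongoing dialog: keep the engagement alive.
        return current_timeout + EXTENSION_SECONDS
    return current_timeout


context = {"active_object": "form"}
print(updated_timeout(5.0, ["form", "window"], context))   # extended
print(updated_timeout(5.0, ["person"], context))           # unchanged
```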

In one example, the frequency of gazing at each object is tracked, along with lengths of gazes, and most recent time of gaze. That data, along with timing of dialog corresponding to objects being gazed at may be used to determine that time should be extended.

In the case of a person other than the user beginning to talk, the gaze data may indicate that the user is looking at the person, in which case the voice engagement may be stopped by modifying the time to be zero or near zero. However, if the user is looking back and forth between the person and an object related to the context, the voice engagement timeout value may be extended.

Further examples may be application specific. If a form is being filled out as reflected in the context, and the user is gazing at different objects as well as the form, the voice engagement timeout length value may also be extended. However, if the user opens a new application, or begins looking longer at an object not related to the current context, the voice engagement may be stopped, or at least the timeout length value may not be increased.

Past history with application interaction may also be included in the context data and used by the model to determine whether or not to modify the length of the timeout. If a user frequently has long dialogs with the application with periods of inaction, the length of time may also be extended.

If a movie delivery application has been opened, and the user is gazing at multiple different objects, such as objects representative of different movies or other shows, voice engagement may also be extended, as the context indicates that a voice command is likely to be an interaction with the movie delivery application.

If a shopping application is open, the context will reflect that. If the user is gazing at different objects to order, voice engagement may also be extended. Similarly, if a user returns to an application where voice commands were being used, voice engagement may be activated automatically as well as extended. The same may occur for opening a new application where voice commands were previously commonly used by the user.

In one example, model 175 may be a machine learning model, such as a neural network model, that is trained based on labeled examples of the data generated by the eye tracking 160 and context tracking 165 functions. The labels may indicate whether or not to extend the time by one or more amounts, or even whether or not to disengage voice engagement immediately. The examples may include the above listed examples with the labels manually generated, or automatically generated by noting that the user performed an express action to reengage voice engagement for a particular context with corresponding eye tracking data.

FIG. 2 is a flowchart illustrating a computer implemented method 200 of modifying a timeout for voice engagement according to an example embodiment. Method 200 begins at operation 210 by detecting user interaction with mixed reality displayed content.

User focus on the mixed reality displayed content is determined at operation 220 by using a spatial intent model. The spatial intent model ranks objects and areas in the displayed content as a function of frequency and duration of gaze. At operation 230, a length of time for extending voice engagement is modified based on the determined user focus.

Voice engagement may initially be enabled in response to the user gazing at a voice engagement initiation object in the mixed reality environment or by speaking a voice engagement wake phrase. The voice engagement initiation object may be a hologram or physical object.

In one example, the context comprises interacting with an application. The length of time is modified as a function of past user interaction with the application. The application may be in a state where more information from the user is being collected, indicating the length of the timeout should be extended. The tracked eye movements may be indicative of the user looking around at the displayed content demonstrating an intent to obtain more information, also indicating the length of the timeout should be extended.

Method 200 may also detect at operation 240 that a person other than the user is talking. The length of time may be modified at operation 250 to discontinue voice engagement in response to the object at which the user is looking being the person that is talking.

FIG. 3 is a flowchart of a computer implemented method 300 for detecting user interaction with the displayed content according to an example embodiment. At operation 310, eye movements are tracked to determine objects in the displayed content at which the user is looking. Operation 320 determines a context of a user dialog during the voice engagement. The user dialog may include voice commands and gesture commands. The length of time is modified at operation 230 as a function of the ranks and a determined context.

FIG. 4 is a flowchart of a computer implemented method 400 for extending the length of voice engagement based on detected user gestures. Method 400 may begin at operation 410 by detecting a user gesture representative of intent to continue the user dialog during voice engagement. The user gesture, for example, may be the user holding up an index finger, which is commonly used in human-to-human interaction to signify a desire to continue a conversation after a short pause. At operation 420, the length of time for voice engagement is extended in response to detecting the user gesture. The length of time may continuously be extended as long as the gesture is maintained in one example.
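
A minimal sketch of continuously extending the timeout while such a gesture is held is shown below; the gesture name, polling scheme, and extension amount are assumptions.

```python
# Illustrative sketch: keep extending the voice-engagement timeout while a
# "hold on" gesture (e.g., a raised index finger) is detected. The gesture
# name and extension amount are assumptions.

GESTURE_EXTENSION_S = 2.0


def extend_while_gesturing(timeout_s, gesture_stream):
    """gesture_stream: iterable of detected gesture names, one per poll."""
    for gesture in gesture_stream:
        if gesture == "raised_index_finger":
            timeout_s += GESTURE_EXTENSION_S  # keep listening while held
        else:
            break
    return timeout_s


print(extend_while_gesturing(5.0, ["raised_index_finger", "raised_index_finger", None]))  # 9.0
```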

FIG. 5 is a block schematic diagram of a computer system 500 for executing applications for a mixed reality experience, performing voice engagement, tracking user interactions with the mixed reality experience, extending the length of time for voice engagement, and performing methods and algorithms according to example embodiments. All components need not be used in various embodiments, such as, for example, in head mounted display devices.

One example computing device in the form of a computer 500 may include a processing unit 502, memory 503, removable storage 510, and non-removable storage 512. Although the example computing device is illustrated and described as computer 500, the computing device may be in different forms in different embodiments. For example, the computing device may instead be a smartphone, a tablet, smartwatch, smart storage device (SSD), or other computing device including the same or similar elements as illustrated and described with regard to FIG. 5. Devices, such as smartphones, tablets, and smartwatches, are generally collectively referred to as mobile devices or user equipment.

Although the various data storage elements are illustrated as part of the computer 500, the storage may also or alternatively include cloud-based storage accessible via a network, such as the Internet or server-based storage. Note also that an SSD may include a processor on which the parser may be run, allowing transfer of parsed, filtered data through I/O channels between the SSD and main memory.

Memory 503 may include volatile memory 514 and non-volatile memory 508. Computer 500 may include—or have access to a computing environment that includes—a variety of computer-readable media, such as volatile memory 514 and non-volatile memory 508, removable storage 510 and non-removable storage 512. Computer storage includes random access memory (RAM), read only memory (ROM), erasable programmable read-only memory (EPROM) or electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD ROM), Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium capable of storing computer-readable instructions.

Computer 500 may include or have access to a computing environment that includes input interface 506, output interface 504, and a communication interface 516. Output interface 504 may include a display device, such as a touchscreen, that also may serve as an input device. The input interface 506 may include one or more of a touchscreen, touchpad, mouse, keyboard, camera, one or more device-specific buttons, one or more sensors integrated within or coupled via wired or wireless data connections to the computer 500, and other input devices. The computer may operate in a networked environment using a communication connection to connect to one or more remote computers, such as database servers. The remote computer may include a personal computer (PC), server, router, network PC, a peer device or other common network node, or the like. The communication connection may include a Local Area Network (LAN), a Wide Area Network (WAN), cellular, Wi-Fi, Bluetooth, or other networks. According to one embodiment, the various components of computer 500 are connected with a system bus 520.

Computer-readable instructions stored on a computer-readable medium are executable by the processing unit 502 of the computer 500, such as a program 518. The program 518 in some embodiments comprises software to implement one or more methods described herein. A hard drive, CD-ROM, and RAM are some examples of articles including a non-transitory computer-readable medium such as a storage device. The terms computer-readable medium, machine readable medium, and storage device do not include carrier waves or signals to the extent carrier waves and signals are deemed too transitory. Storage can also include networked storage, such as a storage area network (SAN). Computer program 518 along with the workspace manager 522 may be used to cause processing unit 502 to perform one or more methods or algorithms described herein.

Examples

1. A computer implemented method includes detecting user interaction with mixed reality displayed content, determining user focus as a function of the user interaction using a spatial intent model, and modifying a length of time for extending voice engagement based on the determined user focus.

2. The method of example 1 wherein detecting user interaction with the displayed content includes tracking eye movements to determine objects in the displayed content at which the user is looking and determining a context of a user dialog during the voice engagement.

3. The method of example 2 wherein the spatial intent model ranks objects and areas in the displayed content as a function of frequency and duration of gaze.

4. The method of example 3 wherein the length of time is modified as a function of the ranks and determined context.

5. The method of any of examples 2-4 and further including detecting that a person other than the user is talking and wherein the length of time is modified to discontinue voice engagement in response to the object at which the user is looking being the person that is talking.

6. The method of any of examples 2-5 wherein the user dialog comprises voice commands and gesture commands.

7. The method of any of examples 2-6 wherein the context comprises interacting with an application, and wherein the length of time is modified as a function of past user interaction with the application.

8. The method of example 7 wherein the application is in a state where more information from the user is being collected and the tracked eye movements are indicative of the user looking around at the displayed content demonstrating an intent to obtain more information.

9. The method of any of examples 1-8 wherein voice engagement is enabled in response to the user gazing at a voice engagement initiation object in a mixed reality environment or speaking a voice engagement wake phrase.

10. The method of example 9 wherein the voice engagement initiation object comprises a hologram or physical object.

11. The method of any of examples 1-10 and further including detecting a user gesture representative of intent to continue the user dialog during voice engagement and extending the length of time for voice engagement in response to detecting the user gesture.

12. A machine-readable storage device has instructions for execution by a processor of a machine to cause the processor to perform operations to perform a method. The operations include detecting user interaction with mixed reality displayed content, determining user focus as a function of the user interaction using a spatial intent model, and modifying a length of time for extending voice engagement based on the determined user focus.

13. The device of example 12 wherein detecting user interaction with the displayed content includes tracking eye movements to determine objects in the displayed content at which the user is looking and determining a context of a user dialog during the voice engagement.

14. The device of example 13 wherein the spatial intent model ranks objects and areas in the displayed content as a function of frequency and duration of gaze and wherein the length of time is modified as a function of the ranks and determined context.

15. The device of any of examples 13-14 wherein the operations further include detecting that a person other than the user is talking and wherein the length of time is modified to discontinue voice engagement in response to the object at which the user is looking being the person that is talking.

16. The device of any of examples 13-15 wherein the context includes interacting with an application, and wherein the length of time is modified as a function of past user interaction with the application and wherein the application is in a state where more information from the user is being collected and the tracked eye movements are indicative of the user looking around at the displayed content demonstrating an intent to obtain more information.

17. The device of any of examples 12-16 wherein voice engagement is enabled in response to the user gazing at a voice engagement initiation object in a mixed reality environment or speaking a voice engagement wake phrase.

18. The device of example 17 wherein the voice engagement initiation object comprises a hologram or physical object.

19. The device of any of examples 12-18 wherein the operations further include detecting a user gesture representative of intent to continue the user dialog during voice engagement and extending the length of time for voice engagement in response to detecting the user gesture.

20. A device includes a processor and a memory device coupled to the processor and having a program stored thereon for execution by the processor to perform operations. The operations include detecting user interaction with mixed reality displayed content, determining user focus as a function of the user interaction using a spatial intent model, and modifying a length of time for extending voice engagement based on the determined user focus.

Although a few embodiments have been described in detail above, other modifications are possible. For example, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. Other steps may be provided, or steps may be eliminated, from the described flows, and other components may be added to, or removed from, the described systems. Other embodiments may be within the scope of the following claims.

Microsoft Patent | Generating user interface containers https://patent.nweon.com/25213 Wed, 30 Nov 2022 20:15:55 +0000 https://patent.nweon.com/?p=25213 ...

Patent: Generating user interface containers

Patent PDF: Join Nweon membership to access

Publication Number: 20220382566

Publication Date: 2022-12-01

Assignee: Microsoft Technology Licensing, LLC (Redmond, WA, US)

Abstract

A system for generating a user interface described herein can include a processor to detect a plurality of display characteristics from a user interface manager, wherein the plurality of display characteristics correspond to a type of a device. The processor can also detect a list of applications being executed by the system and generate a user interface container by applying the plurality of display characteristics to each of the applications from the list of applications.

Claims

1. 115. (canceled)

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a Continuation of U.S. patent application Ser. No. 16/613,617 filed Nov. 14, 2019, which is a National Stage of International Application No. PCT/2017/038027 filed Jun. 16, 2017, and which applications are incorporated herein by reference in their entireties. To the extent appropriate, a claim of priority is made to each of the above disclosed applications.

BACKGROUND

Desktop computers provide a user interface that enables users to view and interact with various applications. Since the introduction of mobile devices, users can also view and interact with applications on augmented reality devices, mobile devices, tablet devices, and gaming consoles, among others. Each device can separately generate a user interface based on fixed application functions. For example, each device can separately generate a user interface by hard coding or using a fixed format for displaying applications.

SUMMARY

The following presents a simplified summary in order to provide a basic understanding of some aspects described herein. This summary is not an extensive overview of the claimed subject matter. This summary is not intended to identify key or critical elements of the claimed subject matter nor delineate the scope of the claimed subject matter. This summary’s sole purpose is to present some concepts of the claimed subject matter in a simplified form as a prelude to the more detailed description that is presented later.

An embodiment described herein includes a system for generating user interface containers that includes a processor to detect a plurality of display characteristics from a user interface manager, wherein the plurality of display characteristics correspond to a type of a device. The processor can also detect a list of applications being executed by the system and generate a user interface container by applying the plurality of display characteristics to each of the applications from the list of applications.

In another embodiment described herein, a method for generating user interface containers can include detecting a plurality of display characteristics from a user interface manager, wherein the plurality of display characteristics correspond to a type of a device. The method can also include detecting a list of applications being executed by the system. Furthermore, the method can include generating a user interface container by applying the plurality of display characteristics to each of the applications from the list of applications.

In yet another embodiment described herein, one or more computer-readable storage devices for generating user interface containers can include a plurality of instructions that, based at least on execution by a processor, cause the processor to detect a plurality of display characteristics from a user interface manager, wherein the plurality of display characteristics correspond to a type of a device. The plurality of instructions can also cause the processor to detect a list of applications being executed by the system and generate a user interface container by applying the plurality of display characteristics to each of the applications from the list of applications.

The following description and the annexed drawings set forth in detail certain illustrative aspects of the claimed subject matter. These aspects are indicative, however, of a few of the various ways in which the principles of the innovation may be employed and the claimed subject matter is intended to include all such aspects and their equivalents. Other advantages and novel features of the claimed subject matter will become apparent from the following detailed description of the innovation when considered in conjunction with the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The following detailed description may be better understood by referencing the accompanying drawings, which contain specific examples of numerous features of the disclosed subject matter.

FIG. 1 is a block diagram of an example of a computing system that can generate a user interface container;

FIG. 2 is a block diagram of an example user interface manager that can generate a user interface container;

FIG. 3 is a process flow diagram of an example method for generating a user interface container; and

FIG. 4 is a block diagram of an example computer-readable storage media that can generate a user interface container.

DETAILED DESCRIPTION

User interfaces can be generated using various static, non-reusable techniques. For example, user interfaces for different devices can be generated using different sets of functions, different data paths, and different visual compositions. Accordingly, applications can include different code to generate a user interface for each type of device. The applications can also have deep context about the device on which they are running and map user interface controls directly to pixel coordinates on a display device. For example, the applications may specify pixel coordinates to display a user control element such as a text field, among others.

Techniques described herein provide a system for generating a user interface container that is re-usable by various user interface managers. A user interface container, as referred to herein, can include display characteristics, such as layout rules, among others, indicating how to generate a user interface for a particular type of device. In some embodiments, a system for generating the user interface containers can include detecting a plurality of display characteristics from a user interface manager, wherein the plurality of display characteristics correspond to a type of a device. For example, the plurality of display characteristics can indicate whether windows can be overlapped, whether a full screen mode is supported, window frame properties, and the like. Additionally, a system can detect a list of applications being executed by the system. Furthermore, the system can generate a user interface container by applying the plurality of display characteristics to each of the applications from the list of applications. Accordingly, the user interface container can indicate, for each application being executed, the frame and adornments around the window such as a close control feature, a maximize control feature, a resize control feature, and the like. Additionally, the user interface container can provide layout rules for sizing and arranging the application windows within the user interface container. For example, a desktop user interface or shell may configure the layout to use overlapped windows, while a tablet user interface or shell may configure the arrangement to show one application at a time. In some examples, the display characteristics can correspond to a particular type of device. For example, the display characteristics for a desktop computing device can differ from the display characteristics for a mobile device, tablet device, augmented reality device, or gaming console device.
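
As a loose sketch of this idea, the Python below models device-specific display characteristics and applies them to a list of running applications to build a container. All type names, fields, and presets are simplified assumptions for illustration and do not reflect the actual data model of the disclosure.

```python
# Simplified sketch (hypothetical names, fields, and presets) of applying
# device-specific display characteristics to a list of running applications
# in order to build a user interface container.
from dataclasses import dataclass, field

@dataclass
class DisplayCharacteristics:
    allow_overlap: bool                  # may application windows overlap?
    full_screen_only: bool               # show one application at a time?
    window_frame: str                    # e.g. "glass", "hidden", "custom"
    adornments: tuple = ("close", "maximize", "resize")

# Per-device presets standing in for what a user interface manager would report.
PRESETS = {
    "desktop": DisplayCharacteristics(True, False, "glass"),
    "tablet":  DisplayCharacteristics(False, True, "hidden"),
    "console": DisplayCharacteristics(False, True, "hidden", adornments=()),
}

@dataclass
class UIContainer:
    device_type: str
    windows: list = field(default_factory=list)

def generate_container(device_type, running_apps):
    """Apply the device's display characteristics to every running application."""
    chars = PRESETS[device_type]
    container = UIContainer(device_type)
    for app in running_apps:
        container.windows.append({
            "app": app,
            "frame": chars.window_frame,
            "adornments": chars.adornments,
            "layout": "full_screen" if chars.full_screen_only else "overlapped",
        })
    return container

print(generate_container("tablet", ["Mail", "Browser"]))
```

Only the preset changes between device types in this sketch; the container-generation code itself is shared, which mirrors the code-sharing point made below.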

The techniques described herein enable code for generating user interface containers to be shared for any number of different devices. For example, the techniques described herein can enable generation of shared code that can be used when generating a user interface container for a desktop device, a tablet device, a mobile device, a phone device, a gaming console device, and an augmented reality device, among others. The shared code can be incorporated by different user interface containers based on display characteristics corresponding to each type of device. For example, code for displaying a user application with certain display characteristics can be shared between any number of different user interface managers corresponding to different types of devices.

As a preliminary matter, some of the figures describe concepts in the context of one or more structural components, referred to as functionalities, modules, features, elements, etc. The various components shown in the figures can be implemented in any manner, for example, by software, hardware (e.g., discrete logic components, etc.), firmware, and so on, or any combination of these implementations. In one embodiment, the various components may reflect the use of corresponding components in an actual implementation. In other embodiments, any single component illustrated in the figures may be implemented by a number of actual components. The depiction of any two or more separate components in the figures may reflect different functions performed by a single actual component. FIG. 1, discussed below, provides details regarding different systems that may be used to implement the functions shown in the figures.

Other figures describe the concepts in flowchart form. In this form, certain operations are described as constituting distinct blocks performed in a certain order. Such implementations are exemplary and non-limiting. Certain blocks described herein can be grouped together and performed in a single operation, certain blocks can be broken apart into plural component blocks, and certain blocks can be performed in an order that differs from that which is illustrated herein, including a parallel manner of performing the blocks. The blocks shown in the flowcharts can be implemented by software, hardware, firmware, and the like, or any combination of these implementations. As used herein, hardware may include computer systems, discrete logic components, such as application specific integrated circuits (ASICs), and the like, as well as any combinations thereof.

As for terminology, the phrase “configured to” encompasses any way that any kind of structural component can be constructed to perform an identified operation. The structural component can be configured to perform an operation using software, hardware, firmware and the like, or any combinations thereof. For example, the phrase “configured to” can refer to a logic circuit structure of a hardware element that is to implement the associated functionality. The phrase “configured to” can also refer to a logic circuit structure of a hardware element that is to implement the coding design of associated functionality of firmware or software. The term “module” refers to a structural element that can be implemented using any suitable hardware (e.g., a processor, among others), software (e.g., an application, among others), firmware, or any combination of hardware, software, and firmware.

The term “logic” encompasses any functionality for performing a task. For instance, each operation illustrated in the flowcharts corresponds to logic for performing that operation. An operation can be performed using software, hardware, firmware, etc., or any combinations thereof.

As utilized herein, terms “component,” “system,” “client” and the like are intended to refer to a computer-related entity, either hardware, software (e.g., in execution), and/or firmware, or a combination thereof. For example, a component can be a process running on a processor, an object, an executable, a program, a function, a library, a subroutine, and/or a computer or a combination of software and hardware. By way of illustration, both an application running on a server and the server can be a component. One or more components can reside within a process and a component can be localized on one computer and/or distributed between two or more computers.

Furthermore, the claimed subject matter may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to implement the disclosed subject matter. The term “article of manufacture” as used herein is intended to encompass a computer program accessible from any tangible, computer-readable device, or media.

Computer-readable storage media can include but are not limited to magnetic storage devices (e.g., hard disk, floppy disk, and magnetic strips, among others), optical disks (e.g., compact disk (CD), and digital versatile disk (DVD), among others), smart cards, and flash memory devices (e.g., card, stick, and key drive, among others). In contrast, computer-readable media generally (i.e., not storage media) may additionally include communication media such as transmission media for wireless signals and the like.

FIG. 1 is a block diagram of an example of a computing system that can generate a user interface container. The example system 100 includes a computing device 102. The computing device 102 includes a processing unit 104, a system memory 106, and a system bus 108. In some examples, the computing device 102 can be a gaming console, a personal computer (PC), an accessory console, a gaming controller, among other computing devices. In some examples, the computing device 102 can be a node in a cloud network.

The system bus 108 couples system components including, but not limited to, the system memory 106 to the processing unit 104. The processing unit 104 can be any of various available processors. Dual microprocessors and other multiprocessor architectures also can be employed as the processing unit 104.

The system bus 108 can be any of several types of bus structure, including the memory bus or memory controller, a peripheral bus or external bus, and a local bus using any variety of available bus architectures known to those of ordinary skill in the art. The system memory 106 includes computer-readable storage media that includes volatile memory 110 and nonvolatile memory 112.

The basic input/output system (BIOS), containing the basic routines to transfer information between elements within the computer 102, such as during start-up, is stored in nonvolatile memory 112. By way of illustration, and not limitation, nonvolatile memory 112 can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.

Volatile memory 110 includes random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), SynchLink™ DRAM (SLDRAM), Rambus® direct RAM (RDRAM), direct Rambus® dynamic RAM (DRDRAM), and Rambus® dynamic RAM (RDRAM).

The computer 102 also includes other computer-readable media, such as removable/non-removable, volatile/non-volatile computer storage media. FIG. 1 shows, for example, a disk storage 114. Disk storage 114 includes, but is not limited to, devices like a magnetic disk drive, floppy disk drive, tape drive, Jaz drive, Zip drive, LS-210 drive, flash memory card, or memory stick.

In addition, disk storage 114 can include storage media separately or in combination with other storage media including, but not limited to, an optical disk drive such as a compact disk ROM device (CD-ROM), CD recordable drive (CD-R Drive), CD rewritable drive (CD-RW Drive) or a digital versatile disk ROM drive (DVD-ROM). To facilitate connection of the disk storage devices 114 to the system bus 108, a removable or non-removable interface is typically used such as interface 116.

It is to be appreciated that FIG. 1 describes software that acts as an intermediary between users and the basic computer resources described in the suitable operating environment 100. Such software includes an operating system 118. Operating system 118, which can be stored on disk storage 114, acts to control and allocate resources of the computer 102.

System applications 120 take advantage of the management of resources by operating system 118 through program modules 122 and program data 124 stored either in system memory 106 or on disk storage 114. It is to be appreciated that the disclosed subject matter can be implemented with various operating systems or combinations of operating systems.

A user enters commands or information into the computer 102 through input devices 126. Input devices 126 include, but are not limited to, a pointing device, such as, a mouse, trackball, stylus, and the like, a keyboard, a microphone, a joystick, a satellite dish, a scanner, a TV tuner card, a digital camera, a digital video camera, a web camera, any suitable dial accessory (physical or virtual), and the like. In some examples, an input device can include Natural User Interface (NUI) devices. NUI refers to any interface technology that enables a user to interact with a device in a “natural” manner, free from artificial constraints imposed by input devices such as mice, keyboards, remote controls, and the like. In some examples, NUI devices include devices relying on speech recognition, touch and stylus recognition, gesture recognition both on screen and adjacent to the screen, air gestures, head and eye tracking, voice and speech, vision, touch, gestures, and machine intelligence. For example, NUI devices can include touch sensitive displays, voice and speech recognition, intention and goal understanding, and motion gesture detection using depth cameras such as stereoscopic camera systems, infrared camera systems, RGB camera systems and combinations of these. NUI devices can also include motion gesture detection using accelerometers or gyroscopes, facial recognition, three-dimensional (3D) displays, head, eye, and gaze tracking, immersive augmented reality and virtual reality systems, all of which provide a more natural interface. NUI devices can also include technologies for sensing brain activity using electric field sensing electrodes. For example, a NUI device may use Electroencephalography (EEG) and related methods to detect electrical activity of the brain. The input devices 126 connect to the processing unit 104 through the system bus 108 via interface ports 128. Interface ports 128 include, for example, a serial port, a parallel port, a game port, and a universal serial bus (USB).

Output devices 130 use some of the same type of ports as input devices 126. Thus, for example, a USB port may be used to provide input to the computer 102 and to output information from computer 102 to an output device 130.

Output adapter 132 is provided to illustrate that there are some output devices 130 like monitors, speakers, and printers, among other output devices 130, which are accessible via adapters. The output adapters 132 include, by way of illustration and not limitation, video and sound cards that provide a means of connection between the output device 130 and the system bus 108. It can be noted that other devices and systems of devices provide both input and output capabilities such as remote computing devices 134.

The computer 102 can be a server hosting various software applications in a networked environment using logical connections to one or more remote computers, such as remote computing devices 134. The remote computing devices 134 may be client systems configured with web browsers, PC applications, mobile phone applications, and the like. The remote computing devices 134 can be a personal computer, a server, a router, a network PC, a workstation, a microprocessor based appliance, a mobile phone, a peer device or other common network node and the like, and typically includes many or all of the elements described relative to the computer 102.

Remote computing devices 134 can be logically connected to the computer 102 through a network interface 136 and then connected via a communication connection 138, which may be wireless. Network interface 136 encompasses wireless communication networks such as local-area networks (LAN) and wide-area networks (WAN). LAN technologies include Fiber Distributed Data Interface (FDDI), Copper Distributed Data Interface (CDDI), Ethernet, Token Ring and the like. WAN technologies include, but are not limited to, point-to-point links, circuit switching networks like Integrated Services Digital Networks (ISDN) and variations thereon, packet switching networks, and Digital Subscriber Lines (DSL).

Communication connection 138 refers to the hardware/software employed to connect the network interface 136 to the bus 108. While communication connection 138 is shown for illustrative clarity inside computer 102, it can also be external to the computer 102. The hardware/software for connection to the network interface 136 may include, for exemplary purposes, internal and external technologies such as, mobile phone switches, modems including regular telephone grade modems, cable modems and DSL modems, ISDN adapters, and Ethernet cards.

The computer 102 can further include a radio 140. For example, the radio 140 can be a wireless local area network radio that may operate on one or more wireless bands. For example, the radio 140 can operate on the industrial, scientific, and medical (ISM) radio band at 2.4 GHz or 5 GHz. In some examples, the radio 140 can operate on any suitable radio band at any radio frequency.

The computer 102 includes one or more modules 122, such as a display detector 142, an application detector 144, and a user interface container generator 146. In some embodiments, the display detector 142 can detect a plurality of display characteristics from a user interface manager, wherein the plurality of display characteristics correspond to a type of a device. A user interface manager, as referred to herein, can include any suitable application that can generate a visual appearance for applications being executed on a particular type of device. For example, a user interface manager can generate a two dimensional or three dimensional image indicating the various user applications that are visible. In some embodiments, the application detector 144 can detect a list of applications being executed by the system. The list of applications can indicate a number of application windows that may be visible in a user interface. In some embodiments, the user interface container generator 146 can generate a user interface container by applying the plurality of display characteristics to each of the applications from the list of applications. The plurality of display characteristics can indicate a layout of a user interface, which can include a two dimensional representation of visible applications and system controls. The display characteristics can include whether application windows can be overlapped, whether applications can be visible in a full screen mode, or a location on a display screen corresponding to various operating system menus and functions, among others. The display characteristics can also indicate preferences for window chrome or user application windows. For example, the display characteristics can indicate a type of frame to include with an application displayed in the user interface, and a title bar to include with the application, among others.

It is to be understood that the block diagram of FIG. 1 is not intended to indicate that the computing system 102 is to include all of the components shown in FIG. 1. Rather, the computing system 102 can include fewer or additional components not illustrated in FIG. 1 (e.g., additional applications, additional modules, additional memory devices, additional network interfaces, etc.). Furthermore, any of the functionalities of the display detector 142, application detector 144, and user interface container generator 146 may be partially, or entirely, implemented in hardware and/or in the processing unit (also referred to herein as a processor) 104. For example, the functionality may be implemented with an application specific integrated circuit, in logic implemented in the processing unit 104, or in any other device.

FIG. 2 is a block diagram of a user interface manager that can generate a user interface container. The user interface manager 200 can be executed with any suitable number of applications and software components. For example, the user interface manager 200 can include a user interface host visual tree (also referred to as a user interface host) 202 that can include any number of user interface controls for different types of devices 204 such as gaming console devices, tablet devices, mobile devices, phone devices, and augmented reality devices, among others. In some embodiments, the user interface host visual tree 202 can include a user interface control root visual 206 that can include Windows.UI.Composition and root Extensible Application Markup Language (XAML) code to declaratively or imperatively generate various elements of a user interface or shell. The user interface manager 200 can also incorporate XAML code for generating visuals that are running outside of a process such as component applications and top level applications.

In some embodiments, the user interface control root visual 206 can also include other chrome 208, which can include a desktop background 210 and a taskbar 212. A taskbar, as referred to herein, can include a link to a digital assistant, a task view illustrating open applications, a set of icons corresponding to applications being executed, and various icons corresponding to applications and hardware features that are enabled each time a device receives power. The desktop background can include any suitable image, any number of links or shortcuts to locally stored files, links to directories, and the like.

In one example, a user interface manager 200 can configure a view set container (also referred to herein as a user interface container) 214 based on a list of applications 216 being executed on a system. The list of applications 216 can indicate a number of applications for which application windows may be generated. The view set container 214 can be configured based in part on a set of layout rules indicating how to arrange the application windows and display characteristics indicating how each application window is to be viewed. In some embodiments, the view set container 214 can also be configured based on a state of the system. In FIG. 2, there are four user application windows 218, 220, 222, and 224. Each application window 218, 220, 222, and 224 in the list can include window chrome 226, 228, 230, 232, and window content 234, 236, 238, and 240. Window chrome, as referred to herein, can include any suitable frame settings or adornment settings. A frame setting can indicate a window to display proximate a user application, a title bar, and an icon to identify the user application. The frame setting can be the same for each application that is visible, or each application can have different frame settings. An adornment is a control or status area that is attached to an edge of a pane or window such as a toolbar or ruler. In some embodiments, the adornments can include a drop shadow on a top level visual of a user application. In some embodiments, the window chrome 226, 228, 230, 232 can include border and shadow settings for an application window, a title bar, and the like. The window content frames can include an application bar to support browser plug-ins such as Silverlight©, a loading or resuming splash function to indicate an application is loading or resuming, and a wait cursor function that modifies a cursor when an application is executing an instruction.
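
The chrome-plus-content composition and the layout step can be pictured with the short sketch below; the helper names, layout rule strings, and screen dimensions are illustrative assumptions rather than elements of the disclosure.

```python
# Illustrative sketch (assumed names and values, not the disclosed API) of
# combining window chrome with window content and arranging the results
# according to a layout rule.

def make_window(title, content, chrome):
    """Wrap application content with frame/adornment settings ("chrome")."""
    return {"title": title, "content": content, **chrome}

def arrange(windows, rule, screen_w=1920, screen_h=1080):
    """Return (x, y, w, h) placements for each window under a simple layout rule."""
    if rule == "one_at_a_time":          # e.g. a tablet-style shell
        return [(0, 0, screen_w, screen_h)] * len(windows)
    if rule == "side_by_side":           # e.g. two resized applications on a console
        width = screen_w // max(1, len(windows))
        return [(i * width, 0, width, screen_h) for i in range(len(windows))]
    # default "overlapped": cascade the windows as a desktop shell might
    return [(40 * i, 40 * i, 1280, 720) for i in range(len(windows))]

chrome = {"frame": "glass", "adornments": ("close", "maximize", "resize")}
windows = [make_window(t, f"<{t} content>", chrome) for t in ("Mail", "Browser")]
print(arrange(windows, "side_by_side"))   # [(0, 0, 960, 1080), (960, 0, 960, 1080)]
```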

In some embodiments, the window chrome can be combined with window content to produce the user application windows 218, 220, 222, and 224. The user application windows 218, 220, 222, and 224 can be arranged in a layout according to display characteristics indicated by the user interface manager 200. For example, the user interface manager 200 can indicate whether user application 218 and user application 220, among others, can be overlapped, viewed side by side, resized, and the like. In some embodiments, the user interface container 214 can be built using frameworks such as Extensible Application Markup Language (XAML), Silverlight, Splash, or legacy Win32 frameworks.

In some embodiments, the user interface manager 200 can implement a specific user interface (also referred to herein as a shell) experience by composing different components of the shell together in a way that reflects the shell experience. In some embodiments, the user interface manager 200 can communicate with an underlying shell or user interface through a set of private application programming interfaces (APIs). These private APIs can allow the user interface manager to execute instructions corresponding to shell tasks associated with a shell. The shell task can include application activation and lifetime management, application or process locks, application resizing, application visibility changes, shutdown, and other shell features. In some embodiments, the user interface manager 200 can also reflect a current state of a shell. The state of a shell can include the active view, running applications, current screen resolution and other shell state information.

In some embodiments, the user interface manager 200 can present the visuals of user applications to the user. These applications can paint into a buffer (represented by a DCOMP visual) using the framework of the application’s choice. In some examples, the buffer is not painted to the screen directly and, by default, does not appear on screen by itself. Rather, the user interface manager 200 can select the visuals for an application and display the application. In some embodiments, the user interface manager 200 can also indicate any per-application window content such as Splash screens, among others.

It is to be understood that the block diagram of FIG. 2 is not intended to indicate that the user interface manager 200 is to include all of the components shown in FIG. 2. Rather, the user interface manager 200 can include fewer or additional components not illustrated in FIG. 2 (e.g., additional applications, additional modules, etc.).

FIG. 3 is a process flow diagram of an example method for generating a user interface container. The method 300 can be implemented with any suitable computing device, such as the computing system 102 of FIG. 1.

At block 302, a display detector 142 can detect a plurality of display characteristics from a user interface manager, wherein the plurality of display characteristics correspond to a type of a device. In some examples, the plurality of display characteristics can include layout rules and window chrome or application window rules. The layout rules can indicate how to arrange applications within a user interface. For example, the layout rules can indicate if application windows can be displayed side by side, in a full screen setting, or using an overlapping technique. The window chrome, as discussed above, can indicate frame settings and adornment settings that can include a full size icon, a minimize option, and a close option, among others. In some embodiments, the frame and adornment settings can indicate if a frame is to be a glass window frame or transparent frame, a hidden frame, or a custom frame, among others. The window chrome can also include any suitable title of the user application and icons corresponding to the user application, as well as indicators for security such as enterprise data protection (EDP). The window chrome can also include a drop shadow. In some embodiments, a glass pane or grab handle can be displayed in user interface containers for augmented reality devices. In some embodiments, a gaming console device may not have a frame or adornment around a user application window. In some embodiments, a gaming console device can display a gripper control that enables simultaneously resizing two user applications to be viewable side by side. The display characteristics can also indicate occluded elements that may be occluding an application’s content and a frame position between each application window and an edge of a screen.

At block 304, an application detector 144 can detect a list of applications being executed by the system. The list of applications can include potential application windows to be included in a user interface.

At block 306, a user interface container generator 146 can generate a user interface container by applying the plurality of display characteristics to each of the applications from the list of applications. In some embodiments, the display characteristics can include layout rules that can indicate how to display applications being executed, an application launcher, a task bar, and a window list, among others. An application launcher, as referred to herein, can include a list of executable applications installed on a system, a list of recently accessed applications installed on the system, recommended applications to be installed on the system, and the like. In some examples, the application launcher can include commands that can access programs, documents, and settings. These commands can include a search function based on locally stored applications and files, a list of documents available locally on a device or on a remote server, a control panel to configure components of a device, power function commands to alter the power state of the device, and the like.

In some embodiments, the layout rules can indicate an area of a screen that is to be occupied by the application launcher, task bar, and windows corresponding to applications that are being executed. The layout rules may not rely on pixel to pixel mappings by the applications being executed. For example, user interface controls can be displayed in regions of a display screen based on the layout rules. In some examples, a text field may be displayed in the center of an application and the location of the text field can be determined by the user interface container generator 146 based on the layout rules. For example, the location of the text field may depend upon whether application windows are overlapping one another, if more than one application window is visible, and the like. The location of user interface controls can also be adjusted based on a size and location of the task bar. For example, the task bar can be displayed along the top, bottom, left side, or right side of a display screen. Each type of user interface manager can determine a location of application windows in relation to the location of the task bar. In some embodiments, the user interface container generator 146 can display the user interface based on at least one display characteristic corresponding to the user interface manager. For example, a user interface manager for gaming console devices may display applications in a full screen mode with no frame or adornments. In some embodiments, the user interface container can also display a taskbar and desktop background. A taskbar can include any number of icons corresponding to hardware control applications, executing applications, digital assistants, and the like.
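
As one concrete illustration of such a layout decision, the sketch below (assumed names and pixel values) computes the region left for application windows once a taskbar has been placed along one edge of the screen.

```python
# Tiny illustration (assumed names and pixel values) of one layout decision
# described above: the region available to application windows depends on
# which screen edge the taskbar occupies.

def work_area(screen_w, screen_h, taskbar_edge="bottom", taskbar_px=48):
    """Return (x, y, w, h) left for application windows after placing the taskbar."""
    if taskbar_edge == "bottom":
        return (0, 0, screen_w, screen_h - taskbar_px)
    if taskbar_edge == "top":
        return (0, taskbar_px, screen_w, screen_h - taskbar_px)
    if taskbar_edge == "left":
        return (taskbar_px, 0, screen_w - taskbar_px, screen_h)
    return (0, 0, screen_w - taskbar_px, screen_h)       # "right" edge

print(work_area(1920, 1080, "left"))   # (48, 0, 1872, 1080)
```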

Still at block 306, in some embodiments, the user application windows can be organized into various depth layers based on a locked application in the foreground and applications in the background. In some examples, each depth layer of a user interface can correspond to a different user interface container with different display characteristics. In some embodiments, a user interface container corresponding to application windows above or on top of a locked application may be limited to a single application for execution, and any number of applications may be included in user interface containers below or in the background of a locked application. In some embodiments, an additional user interface container can be created for an application being executed in full screen mode above or on top of other user interface containers.
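
A minimal sketch of that layering, using hypothetical names, might look as follows: the container above the locked application is limited to a single application, while background containers are unrestricted.

```python
# Minimal sketch (hypothetical names) of depth-layer containers around a locked
# application: the layer above the lock holds exactly one application, while the
# background layer may hold any number of applications.

def layer_containers(apps, locked_app=None):
    """Return an ordered list of (layer_name, apps) containers, front to back."""
    if locked_app is None:
        return [("foreground", list(apps))]
    background = [app for app in apps if app != locked_app]
    return [
        ("above_lock", [locked_app]),   # limited to a single application
        ("background", background),     # unrestricted
    ]

print(layer_containers(["Mail", "Browser", "Kiosk"], locked_app="Kiosk"))
# [('above_lock', ['Kiosk']), ('background', ['Mail', 'Browser'])]
```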

In some embodiments, window chrome can be copied from a first user interface container to a second user interface container. Additionally, a layout rule or policy of each user interface container can be configured independently. For example, a layout policy indicating how many applications can be executed may differ between two user interface containers. The window chrome can also specify if application windows are docked elements that may be docked around an application’s content.

In one embodiment, the process flow diagram of FIG. 3 is intended to indicate that the blocks of the method 300 are to be executed in a particular order. Alternatively, in other embodiments, the blocks of the method 300 can be executed in any suitable order and any suitable number of the blocks of the method 300 can be included. Further, any number of additional blocks may be included within the method 300, depending on the specific application. In some embodiments, the plurality of display characteristics can include a three dimensional layout for the list of applications for the augmented reality device. In some embodiments, the method 300 can include detecting a user interface host and detecting the user interface manager from the user interface host based on the type of the device.

In some embodiments, the method 300 can include detecting a second user interface manager and transferring at least one of the applications comprising the plurality of display characteristics to the second user interface manager. In some embodiments, the method 300 can include detecting that the user interface manager corresponds to a first of the applications from the list of applications that resides in a background and detecting that a second user interface manager corresponds to a second of the applications from the list of applications that resides in a foreground. Additionally, the method 300 can also include transitioning between the user interface manager and the second user interface manager in response to detecting a selection of the first of the applications or the second of the applications, wherein the transition comprises generating the user interface container with a second set of display characteristics.

In some embodiments, the user interface container comprises a two dimensional image corresponding to a user interface to be displayed based on the list of the applications and the plurality of display characteristics from the user interface manager. In some embodiments, the plurality of display characteristics for the gaming console type of device indicate no adornment for a full screen display of one of the applications or a resizing of at least two of the applications to enable a side by side display.

FIG. 4 is a block diagram of an example computer-readable storage media that can generate a user interface container. The tangible, computer-readable storage media 400 may be accessed by a processor 402 over a computer bus 404.

Furthermore, the tangible, computer-readable storage media 400 may include code to direct the processor 402 to perform the steps of the current method.

The various software components discussed herein may be stored on the tangible, computer-readable storage media 400, as indicated in FIG. 4. For example, the tangible computer-readable storage media 400 can include a display detector 406 that can detect a plurality of display characteristics from a user interface manager, wherein the plurality of display characteristics correspond to a type of a device. In some embodiments, an application detector 408 can detect a list of applications being executed by the system. In some embodiments, a user interface container generator 410 can generate a user interface container by applying the plurality of display characteristics to each of the applications from the list of applications.

It is to be understood that any number of additional software components not shown in FIG. 4 may be included within the tangible, computer-readable storage media 400, depending on the specific application.

Example 1

In one embodiment, a system for generating user interface containers includes a processor to detect a plurality of display characteristics from a user interface manager, wherein the plurality of display characteristics correspond to a type of a device. The processor can also detect a list of applications being executed by the system and generate a user interface container by applying the plurality of display characteristics to each of the applications from the list of applications.

Alternatively, or in addition, the type of the device comprises a tablet device, a gaming console, a desktop device, a mobile device, an augmented reality device, or a phone device. Alternatively, or in addition, the plurality of display characteristics comprises a three dimensional layout for the applications for the augmented reality device. Alternatively, or in addition, the plurality of display characteristics comprise a frame and an adornment for each of the applications. Alternatively, or in addition, the plurality of display characteristics comprise a title bar corresponding to each of the applications. Alternatively, or in addition, the processor can detect a user interface host and detect the user interface manager from the user interface host based on the type of the device. Alternatively, or in addition, the processor can detect a second user interface manager and transfer at least one of the applications comprising the plurality of display characteristics to the second user interface manager. Alternatively, or in addition, the processor can detect that the user interface manager corresponds to a first of the applications from the list of applications that resides in a background. The processor can also detect that a second user interface manager corresponds to a second of the applications from the list of applications that resides in a foreground and transition between the user interface manager and the second user interface manager in response to detecting a selection of the first of the applications or the second of the applications, wherein the transition comprises generating the user interface container with a second set of display characteristics. Alternatively, or in addition, the user interface container comprises a two dimensional image corresponding to a user interface to be displayed based on the list of the applications and the plurality of display characteristics from the user interface manager. Alternatively, or in addition, the plurality of display characteristics for the gaming console type of device indicate no adornment for a full screen display of one of the applications. Alternatively, or in addition, the plurality of display characteristics for the gaming console type of device indicate a gripper control that is to enable simultaneously resizing two of the applications to be viewable side by side.

Example 2

In another embodiment described herein, a method for generating user interface containers can include detecting a plurality of display characteristics from a user interface manager, wherein the plurality of display characteristics correspond to a type of a device. The method can also include detecting a list of applications being executed by the system. Furthermore, the method can include generating a user interface container by applying the plurality of display characteristics to each of the applications from the list of applications.

Alternatively, or in addition, the type of the device comprises a tablet device, a gaming console, a desktop device, a mobile device, an augmented reality device, or a phone device. Alternatively, or in addition, the plurality of display characteristics comprises a three dimensional layout for the applications for the augmented reality device. Alternatively, or in addition, the plurality of display characteristics comprise a frame and an adornment for each of the applications. Alternatively, or in addition, the plurality of display characteristics comprise a title bar corresponding to each of the applications. Alternatively, or in addition, the method can include detecting a user interface host and detecting the user interface manager from the user interface host based on the type of the device. Alternatively, or in addition, the method can include detecting a second user interface manager and transferring at least one of the applications comprising the plurality of display characteristics to the second user interface manager. Alternatively, or in addition, the method can include detecting that the user interface manager corresponds to a first of the applications from the list of applications that resides in a background. The method can also include detecting that a second user interface manager corresponds to a second of the applications from the list of applications that resides in a foreground and transitioning between the user interface manager and the second user interface manager in response to detecting a selection of the first of the applications or the second of the applications, wherein the transitioning comprises generating the user interface container with a second set of display characteristics. Alternatively, or in addition, the user interface container comprises a two dimensional image corresponding to a user interface to be displayed based on the list of the applications and the plurality of display characteristics from the user interface manager. Alternatively, or in addition, the plurality of display characteristics for the gaming console type of device indicate no adornment for a full screen display of one of the applications. Alternatively, or in addition, the plurality of display characteristics for the gaming console type of device indicate a gripper control that is to enable simultaneously resizing two of the applications to be viewable side by side.

Example 3

In yet another embodiment described herein, one or more computer-readable storage devices for generating user interface containers can include a plurality of instructions that, based at least on execution by a processor, cause the processor to detect a plurality of display characteristics from a user interface manager, wherein the plurality of display characteristics correspond to a type of a device. The plurality of instructions can also cause the processor to detect a list of applications being executed by the system and generate a user interface container by applying the plurality of display characteristics to each of the applications from the list of applications.

Alternatively, or in addition, the type of the device comprises a tablet device, a gaming console, a desktop device, a mobile device, an augmented reality device, or a phone device. Alternatively, or in addition, the plurality of display characteristics comprises a three dimensional layout for the applications for the augmented reality device. Alternatively, or in addition, the plurality of display characteristics comprise a frame and an adornment for each of the applications. Alternatively, or in addition, the plurality of display characteristics comprise a title bar corresponding to each of the applications. Alternatively, or in addition, the plurality of instructions can cause the processor to detect a user interface host and detect the user interface manager from the user interface host based on the type of the device. Alternatively, or in addition, the plurality of instructions can cause the processor to detect a second user interface manager and transfer at least one of the applications comprising the plurality of display characteristics to the second user interface manager. Alternatively, or in addition, the plurality of instructions can cause the processor to detect that the user interface manager corresponds to a first of the applications from the list of applications that resides in a background. The plurality of instructions can cause the processor to detect that a second user interface manager corresponds to a second of the applications from the list of applications that resides in a foreground and transition between the user interface manager and the second user interface manager in response to detecting a selection of the first of the applications or the second of the applications, wherein the transition comprises generating the user interface container with a second set of display characteristics. Alternatively, or in addition, the user interface container comprises a two dimensional image corresponding to a user interface to be displayed based on the list of the applications and the plurality of display characteristics from the user interface manager. Alternatively, or in addition, the plurality of display characteristics for the gaming console type of device indicate no adornment for a full screen display of one of the applications. Alternatively, or in addition, the plurality of display characteristics for the gaming console type of device indicate a gripper control that is to enable simultaneously resizing two of the applications to be viewable side by side.

In particular and in regard to the various functions performed by the above described components, devices, circuits, systems and the like, the terms (including a reference to a “means”) used to describe such components are intended to correspond, unless otherwise indicated, to any component which performs the specified function of the described component, e.g., a functional equivalent, even though not structurally equivalent to the disclosed structure, which performs the function in the herein illustrated exemplary aspects of the claimed subject matter. In this regard, it will also be recognized that the innovation includes a system as well as a computer-readable storage media having computer-executable instructions for performing the acts and events of the various methods of the claimed subject matter.

There are multiple ways of implementing the claimed subject matter, e.g., an appropriate API, tool kit, driver code, operating system, control, standalone or downloadable software object, etc., which enables applications and services to use the techniques described herein. The claimed subject matter contemplates the use from the standpoint of an API (or other software object), as well as from a software or hardware object that operates according to the techniques set forth herein. Thus, various implementations of the claimed subject matter described herein may have aspects that are wholly in hardware, partly in hardware and partly in software, as well as in software.

The aforementioned systems have been described with respect to interaction between several components. It can be appreciated that such systems and components can include those components or specified sub-components, some of the specified components or sub-components, and additional components, and according to various permutations and combinations of the foregoing. Sub-components can also be implemented as components communicatively coupled to other components rather than included within parent components (hierarchical).

Additionally, it can be noted that one or more components may be combined into a single component providing aggregate functionality or divided into several separate sub-components, and any one or more middle layers, such as a management layer, may be provided to communicatively couple to such sub-components in order to provide integrated functionality. Any components described herein may also interact with one or more other components not specifically described herein but generally known by those of skill in the art.

In addition, while a particular feature of the claimed subject matter may have been disclosed with respect to one of several implementations, such feature may be combined with one or more other features of the other implementations as may be desired and advantageous for any given or particular application. Furthermore, to the extent that the terms “includes,” “including,” “has,” “contains,” variants thereof, and other similar words are used in either the detailed description or the claims, these terms are intended to be inclusive in a manner similar to the term “comprising” as an open transition word without precluding any additional or other elements.

Microsoft Patent | Distributed depth data processing https://patent.nweon.com/25203 Wed, 30 Nov 2022 19:23:09 +0000 https://patent.nweon.com/?p=25203 ...

Patent: Distributed depth data processing

Patent PDF: Join the Nweon (映维网) membership to obtain

Publication Number: 20220383455

Publication Date: 2022-12-01

Assignee: Microsoft Technology Licensing, LLC (Redmond, WA, US)

Abstract

Examples are provided that relate to processing depth camera data over a distributed computing system, where phase unwrapping is performed prior to denoising. One example provides a time-of-flight camera comprising a time-of-flight depth image sensor, a logic machine, a communication subsystem, and a storage machine holding instructions executable by the logic machine to process time-of-flight image data acquired by the time-of-flight depth image sensor by, prior to denoising, performing phase unwrapping pixel-wise on the time-of-flight image data to obtain coarse depth image data comprising depth values, and sending the coarse depth image data and the active brightness image data to a remote computing system via the communication subsystem for denoising.

Claims

1.A time-of-flight camera, comprising: a time-of-flight depth image sensor; a logic machine; a communication subsystem; and a storage machine holding instructions executable by the logic machine to: process time-of-flight image data acquired by the time-of-flight depth image sensor by, prior to denoising, performing phase unwrapping pixel-wise on the image data to obtain coarse depth image data comprising depth values; and send the coarse depth image data and the active brightness image data to a remote computing system via the communication subsystem for denoising.

Description

BACKGROUND

Depth sensing systems, such as time-of-flight (ToF) cameras, may be used to produce a depth image of an environment, with each pixel of the depth image representing a distance to a corresponding point in the environment. In ToF imaging, a distance to a point on an imaged surface in the environment is determined based on a length of a time interval in which light emitted by the ToF camera travels out to that point and then returns back to a sensor of the ToF camera. The raw data collected at the depth sensor is processed to produce a depth image.

SUMMARY

Examples are provided that relate to processing depth image data over a distributed computing system, where phase unwrapping is performed prior to denoising. One example provides a time-of-flight camera comprising a time-of-flight depth image sensor, a logic machine, a communication subsystem, and a storage machine holding instructions executable by the logic machine to process time-of-flight image data acquired by the time-of-flight depth image sensor. The instructions are executable to, prior to denoising, perform phase unwrapping pixel-wise on the time-of-flight image data to obtain coarse depth image data comprising depth values. The instructions are further executable to send the coarse depth image data and the active brightness image data to a remote computing system via the communication subsystem for denoising.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A-1B show example electronic devices comprising time-of-flight (ToF) cameras.

FIG. 2 shows aspects of an example ToF camera system.

FIG. 3 schematically illustrates example ToF image data for a plurality K of modulation frequencies.

FIG. 4 shows an example pipeline for processing depth images that includes pixel-wise operations on a depth camera and convolutional operations at a computing device remote from the depth camera.

FIG. 5 schematically shows another example distributed depth engine pipeline for processing segmented depth data.

FIG. 6 shows an example segmentation of a coarse depth image.

FIG. 7 shows a flow diagram of an example method for processing depth sensor data using pixel-wise operations on a depth camera to generate coarse depth image data and active brightness image data.

FIG. 8 shows a flow diagram of an example method for denoising coarse depth image data and active brightness image data.

FIG. 9 shows a block diagram of an example computing system.

DETAILED DESCRIPTION

As mentioned above, time-of-flight (ToF) depth cameras measure, for each sensor pixel of a depth image sensor, a length of a time interval for light emitted by the depth camera to return back to the sensor pixel. As reflectivity may vary across objects in a scene, some pixels may sense signals with low signal to noise ratios in some instances. Further, depth image sensor pixels may be sensitive to crosstalk errors, where photoelectrons captured at one pixel diffuse toward and are collected at neighboring pixels.

In view of such noise issues, denoising is commonly performed by a ToF depth camera on raw depth image data prior to performing other data processing, such as phase unwrapping that is used in phase-based ToF imaging. Phase-based ToF imaging is a variant of ToF imaging in which depth is computed based on the phase shift of amplitude modulated light reflected back from a subject. In phase-based ToF imaging, a light source on the ToF camera illuminates a scene with amplitude modulated light. The phase shift in the light reflected back from the subject is proportional to the subject’s distance modulo the wavelength of the modulation frequency. However, due to the periodic nature of the modulated light, the measured total phase repeats (or wraps) every 2π. Since the number of wrappings cannot be directly measured via a phase based ToF pixel, the total phase, and thus the actual distance related to the measurement, is ambiguous. To address this issue, two or more different modulation frequencies can be used to increase the range of unambiguity, allowing the phase information to be “unwrapped” for the accurate determination of distance. Phase unwrapping is a way to disambiguate the phase shift data by illuminating the scene with amplitude-modulated light of a plurality of different frequencies, as the distance ambiguities are different for each frequency of illumination light.
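
For concreteness, the wrapped phase measured at modulation frequency f for a surface at distance d is (4πfd/c) mod 2π, so a single frequency is unambiguous only out to c/(2f). The sketch below uses a brute-force search, purely as an illustration rather than a practical unwrapping algorithm, to show how measurements at two assumed modulation frequencies single out one consistent distance over a much larger range.

```python
# Numeric illustration of phase wrapping and two-frequency unwrapping. The
# brute-force search is for illustration only and is not a practical algorithm;
# frequencies and ranges are assumed example values.
import numpy as np

C = 3e8                        # speed of light, m/s
FREQS = (80e6, 100e6)          # two example modulation frequencies

def wrapped_phase(distance, f):
    """Total phase 4*pi*f*d/c, wrapped into [0, 2*pi)."""
    return (4 * np.pi * f * distance / C) % (2 * np.pi)

def unwrap(measured_phases, freqs, max_range=10.0):
    """Pick the candidate distance whose predicted phases best match the measurements."""
    candidates = np.linspace(0.0, max_range, 20001)
    error = sum(np.angle(np.exp(1j * (wrapped_phase(candidates, f) - p))) ** 2
                for p, f in zip(measured_phases, freqs))
    return candidates[np.argmin(error)]

true_d = 4.3                                   # metres; beyond c/(2f) for either frequency
measured = [wrapped_phase(true_d, f) for f in FREQS]
print(round(unwrap(measured, FREQS), 3))       # ~4.3, recovered despite wrapping
```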

Accurate phase unwrapping may be difficult due to noise in the collected phase information. Noise, particularly near a 2π wrapping boundary, can lead to incorrect unwrapping, and thus relatively large errors in a determined distance at a pixel. As such, depth engine pipelines (processing pipelines used to process depth image data) commonly include procedures to denoise the data prior to performing phase unwrapping. For example, a depth sensor may perform multi-frequency phase collection to obtain noisy data for a plurality of modulation frequencies. Then, the noisy data is processed via signal calibration correction and denoising. Denoising processes generally utilize convolutions that apply an m×n kernel of pixels around a pixel being denoised, and thus are computationally expensive compared to pixel-wise operations. After denoising, the total phase can be calculated from the complex signal, followed by phase unwrapping and crosstalk correction. Additionally, in some examples, an intensity image may be obtained from the denoised data via active brightness averaging. The final depth and, in some examples, intensity images are then output, e.g., for use in gesture identification, AR applications, and/or other uses.

Depth engine pipelines are commonly implemented locally on ToF cameras. However, it may be difficult to support compute-intensive depth imaging on some low-power computing devices, such as those depicted in FIGS. 1A-1B (described below). One possible solution is to send raw sensor data to a remote device having greater compute resources and/or more available power. As mentioned above, denoising may be a compute intensive procedure that may utilize relatively large convolution kernels (e.g., N×N Gaussian filters where N≥5 in some examples). Larger denoising kernels may provide better precision at high computational cost, while smaller denoising kernels that have lower computational cost may result in lower precision. Thus, sending depth image data to a remote device for denoising may allow the use of a larger denoising kernel. However, the depth image data used to generate a single depth image can be large. For example, an example depth sensor may collect N*K images per frame, where K is the number of modulation frequencies (e.g. 2 or 3 in some examples) and N is a number of samples acquired at different phase angles (e.g. 2 or 3 in some examples) for each modulation frequency. Transmitting such a large amount of depth data to a remote computing system for denoising prior to phase unwrapping may be difficult at a reasonable camera frame rate (e.g. 30-60 Hz in some examples), as high bandwidth communication may have high power costs while low bandwidth communication may be insufficient for the quantity of data.
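
As a rough, hedged illustration of this data-volume argument, the Python sketch below compares raw multi-frequency phase data against a single coarse depth image plus active brightness image per frame; the sensor resolution, bit depth, and frame rate are assumptions chosen for the example rather than values from this disclosure.

# Hypothetical sensor parameters (assumptions for illustration only).
M, N = 480, 640            # image height and width in pixels
K = 3                      # number of modulation frequencies
S = 3                      # phase samples per modulation frequency
bytes_per_sample = 2       # 16-bit samples
fps = 30                   # camera frame rate

raw_bytes = M * N * K * S * bytes_per_sample      # all phase samples for one depth frame
coarse_bytes = M * N * 2 * bytes_per_sample       # one coarse depth image + one AB image

print(raw_bytes * fps / 1e6, "MB/s for raw phase data")      # ~165.9 MB/s
print(coarse_bytes * fps / 1e6, "MB/s for coarse images")    # ~36.9 MB/s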

Accordingly, examples are disclosed relating to a distributed depth engine pipeline for processing depth images in which pixel-wise operations, including phase unwrapping, are performed prior to denoising, and coarse images are transmitted to a remote machine for denoising using more compute-intensive convolutional spatial and/or temporal filtering processes. Processing depth sensor data over a distributed computing architecture according to the present disclosure may provide various advantages. For example, all operations performed at the ToF camera may be pixel-wise, which may reduce power consumption and thus provide a ToF camera better suited to lower-power devices, such as mobile devices. Further, as the phase data is unwrapped prior to transmitting, less data is compressed/transmitted compared to transmitting raw depth image data (e.g. a single coarse depth image per frame, or a coarse depth image and coarse intensity image per frame), offering additional power savings and allowing for lower bandwidth connections. Additionally, performing denoising on a more powerful remote system may allow for use of larger denoising kernels and thereby provide improved depth precision. The disclosed examples also provide the ability to segment coarse depth data. As such, the distributed depth processing pipeline further may perform heavy compute on low-signal data while using fewer computing resources on high-signal data, which may provide increased speed and efficiency. Likewise, in various examples, the distributed depth processing system may selectively transmit high-signal data for remote processing, or may ignore low-signal data.

Prior to discussing these examples in detail, FIGS. 1A-1B illustrate various different example electronic devices 100A-E that may employ phase-based ToF depth cameras. Referring first to FIG. 1A, device 100A is a smartphone that includes a ToF camera 102A. Device 100B is a personal computer that includes a ToF web camera 102B. Device 100C is a video game system that includes a peripheral camera system comprising a ToF camera 102C. Device 100D is a virtual-reality headset that includes a camera system comprising a ToF camera 102D. Each device may communicate with a remote computing system 104 to implement a distributed depth pipeline according to the disclosed examples. In combination with remote computing system 104, electronic devices 100A-D may process depth image data utilizing a distributed depth engine pipeline. Remote computing system 104 may comprise any suitable computing system, such as a cloud computing system, a PC, a laptop, a phone, a tablet, etc.

FIG. 1B shows an example use environment 110 including a security camera 100E comprising a ToF camera. Security camera 100E sends data to an IoT (“Internet of Things”) endpoint computing device 120 via a communication hub 116 that also connects to other IoT devices, such as a thermostat 114. In combination with communication hub 116 and/or IoT endpoint computing device 120, security camera 100E may process depth image data utilizing a distributed depth engine pipeline. IoT endpoint computing device 120 may comprise any suitable computing system, e.g., cloud computing system, enterprise system, networked PC, or a virtual machine implemented on a cloud computing system.

FIG. 2 shows a schematic depiction of an example phase-based ToF depth imaging system 200 including a ToF camera 202. ToF camera 202 includes a sensor array 204 comprising a plurality of ToF pixels 206 each configured to acquire light samples that capture phase data, a controller 208, and an objective lens system 210. In some examples, objective lens system 210 may be omitted. Objective lens system 210 is configured to focus an image of at least one surface 220 of a subject 222 onto sensor array 204. Controller 208 is configured to gather and process data from ToF pixels 206 of sensor array 204 and thereby construct a depth image. Controller 208 may comprise executable instructions (e.g. software, firmware and/or hardware) to perform phase unwrapping, as described below. In some examples, controller 208 may be implemented across one or more computing devices. Controller 208 may communicate with a remote computing system 212 to perform depth image processing in accordance with the distributed depth image processing pipeline examples disclosed herein. Examples of hardware implementations of computing devices configured to perform phase unwrapping are described in more detail below with reference to FIG. 9.

Depth imaging system 200 also includes a modulated light emitter 230, and an analog and/or digitally modulated electronic shutter 232 for sensor array 204 to control the integration of light by the sensor array 204. Modulated light emitter 230 and sensor array 204 may be controlled via controller 208. Modulated light emitter 230 may be configured to emit electromagnetic radiation having any frequency detectable by ToF pixels 206. For example, modulated light emitter 230 may include an infrared (IR) light-emitting diode (LED), laser diode (LD), or any other suitable light source. The amplitude modulated light may be modulated at different frequencies sequentially or simultaneously, e.g., the modulation waveform may comprise a manifold of frequencies.

Sensor array 204 is configured to sample light from modulated light emitter 230 as reflected off surface 220 and back to the camera. Each ToF sensing pixel 206 of sensor array 204 may comprise one or more pixel taps operable to integrate the reflected light signal at different time intervals, from which the phase shift can be determined. Sensor array 204 is controlled, for each modulation frequency, to sample light at plural phase angles of the amplitude-modulated light from the light source, and determine a phase sample for each modulation frequency from the plurality of light samples for the modulation frequency. The phase samples can then be unwrapped to obtain a depth value for each pixel.

As mentioned above, due to the periodic nature of the modulated light, the measured total phase repeats (or wraps) every 2π. For example, given a measured phase $\tilde{\phi}_k$, the total phase is $\tilde{\phi}_k + 2\pi n_k$, where $n_k$ is an integer. Since $n_k$ cannot be directly measured via a phase based ToF pixel, the total phase, and thus the actual distance related to the measurement, is ambiguous. Thus, in phase-based ToF imaging, there is a limitation on the distance that can be measured (referred to as the unambiguity range) imposed by the modulation frequency. Two or more different modulation frequencies can be used to increase the range of unambiguity, and the collected phase shift data is then unwrapped for the accurate determination of distance.

FIG. 3 schematically illustrates example ToF image data 300 for a plurality K of modulation frequencies. Data 300 represents data that can be acquired by depth imaging system 200 during multi-frequency frame collection. In the example shown, the depth data comprises an M×N array of data for each of K modulation frequencies, resulting in M×N grids 302a-c of data, wherein each pixel 304 in each grid represents a measurement acquired at a corresponding illumination light modulation frequency k of K modulation frequencies. For example, the experimental signal $\tilde{S}_k$ collected by pixel 304 at (m, n) is represented by

$\tilde{S}_k(m, n) = \tilde{A}_k(m, n)\, e^{i \tilde{\phi}_k(m, n)}$

where $\tilde{\phi}_k$ is the phase, $m \in \{1, 2, \ldots, M\}$, $n \in \{1, 2, \ldots, N\}$, and $k \in \{1, 2, \ldots, K\}$. A tilde accent over a variable indicates that the variable is obtained and/or calculated experimentally, while the absence of a tilde accent indicates variables that correspond to a noise-free situation. While the example depicted in FIG. 3 shows three grids 302a-c, any number of frequencies K ≥ 2 can be used.

The phase of the complex signal, $\tilde{\phi}_k$, may be computed as

$\tilde{\phi}_k = \operatorname{arctan2}(\tilde{S}_k^{\,i}, \tilde{S}_k^{\,r})$

where $\tilde{S}_k^{\,i}$ is the imaginary part of the signal collected for frequency k and $\tilde{S}_k^{\,r}$ is the real part of the signal collected. The measured phase is used to compute the depth value associated with the pixel. However, as mentioned above, in phase-based ToF imaging, there is a limitation on the distance that can be measured (referred to as the unambiguity range) imposed by the modulation frequency. Accordingly, a set of K ≥ 2 modulation frequencies k can be used to increase the range of unambiguity, allowing the phase information to be unwrapped for the accurate determination of distance. Phase unwrapping is a way to disambiguate the phase shift data and identify a correct distance value by illuminating the scene with amplitude-modulated light of a plurality of different frequencies, as the distance ambiguities are different for each frequency of illumination light. For example, in a multifrequency method, the amplitude modulated light may comprise a waveform comprising a plurality of frequencies $\vec{f} = \{f_1, f_2, \ldots, f_K\}$. The collection of frequencies comprises frequencies that are chosen to wrap at different locations in the unambiguity range, which extends from distance zero to a point where all of the frequencies wrap at a common distance.
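
One simple pixel-wise unwrapping strategy is sketched below in Python; it is illustrative only and is not asserted to be the method used in the disclosed pipeline. It scans a grid of candidate distances within the combined unambiguity range and keeps the distance whose predicted wrapped phases best agree with the measured phases; the two modulation frequencies and the 5 m range are assumptions.

import numpy as np

C = 3e8  # speed of light, m/s

def unwrap_pixel(measured_phases, freqs, d_max, steps=2048):
    # Predicted wrapped phase for every candidate distance and every frequency.
    candidates = np.linspace(0.0, d_max, steps)
    pred = (4.0 * np.pi * candidates[:, None] * np.asarray(freqs)[None, :] / C) % (2.0 * np.pi)
    # Circular (wrapped) phase error between prediction and measurement.
    err = np.angle(np.exp(1j * (pred - np.asarray(measured_phases)[None, :])))
    return candidates[np.argmin(np.sum(err ** 2, axis=1))]

# Example: a subject at 3.2 m observed with two assumed modulation frequencies.
freqs = [180e6, 150e6]                     # combined unambiguity range is 5.0 m
true_d = 3.2
measured = [(4.0 * np.pi * true_d * f / C) % (2.0 * np.pi) for f in freqs]
print(unwrap_pixel(measured, freqs, d_max=5.0))   # approximately 3.2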

As mentioned above, current depth image data processing methods perform denoising prior to phase unwrapping, which often involves the application of a convolutional spatial filter comprising a kernel of pixels surrounding a pixel being denoised. However, the application of the spatial filter for each pixel of depth data may be computationally intensive and consume significant computing resources. Thus, the disclosed examples utilize a distributed depth engine pipeline to move more compute-intensive operations to a remote device with more available power and/or compute resources, thereby preserving resources local to the depth imaging system. In this manner, a larger denoising kernel can be applied by the remote system to correct errors in a coarse depth image that is output by the depth imaging system.

FIG. 4 schematically shows an example distributed pipeline 400 for processing time-of-flight image data to obtain a denoised depth image. In this example, the procedures above dashed line 401 are performed within the ToF camera and/or a device incorporating the depth camera, while procedures below the line are performed by processing remote to the depth camera. At 402, the pipeline includes multi-frequency frame collection, where a plurality of phase samples (each comprising a frame of image data) is collected for each of a plurality of amplitude modulation frequencies. The phase data is collected by a depth image sensor of the ToF camera. At 404, pixel-wise signal calibration correction is performed. In the example depicted, a 1×1 kernel indicates pixel-wise operations local to the depth camera.

In current depth engine pipelines, denoising is performed prior to phase and active brightness calculation. However, in the disclosed examples, phase and active brightness calculations are performed without first performing denoising (and thus without using spatial or temporal filters). In the depicted example, at 406, distributed pipeline 400 calculates phase information from the time-of-flight image data, and then performs phase unwrapping pixel-wise at 408. The phase unwrapping operations provide a phase number (i.e. a number of wrappings of each modulation frequency) for each pixel, which is then used to compute a depth value for each pixel. As a result of the phase unwrapping, a coarse depth image is produced. The coarse depth image may have more unwrapping errors than a depth image produced using denoised data, as noise can cause a phase measurement to appear in a different phase wrapping than the wrapping corresponding to the actual distance. However, such phase errors may be corrected by remote denoising. The calibrated image data also may be used to produce an active brightness (AB) image, at 412. Then, at 414, pixel-wise AB averaging operations are performed to generate the active brightness image.
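
The pixel-wise character of operations 406-414 can be illustrated with a short Python sketch; the assumption that the phase samples for one modulation frequency are equally spaced in phase offset, and the (S, H, W) array layout, are illustrative choices rather than details taken from this disclosure.

import numpy as np

def phase_and_active_brightness(samples):
    # samples: array of shape (S, H, W) holding S correlation samples captured at
    # equally spaced phase offsets for a single modulation frequency.
    S = samples.shape[0]
    offsets = 2.0 * np.pi * np.arange(S) / S
    # Project the samples onto a complex exponential; every operation is per-pixel (1x1).
    complex_signal = np.tensordot(np.exp(-1j * offsets), samples, axes=(0, 0))
    phase = np.arctan2(complex_signal.imag, complex_signal.real)
    active_brightness = np.abs(complex_signal)
    return phase, active_brightness

# The per-frequency active brightness values can then be averaged across the K
# modulation frequencies, pixel-wise, to produce the active brightness image at 412-414.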

Continuing with FIG. 4, the coarse depth image and active brightness image are transmitted to a remote computing system at 416. In some examples, the coarse depth image and active brightness image can be compressed to conserve bandwidth. Further, in some examples, the coarse depth image can be segmented based upon a metric such as signal to noise ratio, and pixels above or below a threshold may be sent for remote processing to the exclusion of other pixels, as described in more detail below.

At 418, the remote computing system uses the depth values from the coarse depth image to reconstruct noisy phase data. For example, M×N×K phase data can be reconstructed from a coarse M×N depth image by

$\tilde{S}(m, n, k) = \tilde{S}^{\,r}(m, n, k) + i\,\tilde{S}^{\,i}(m, n, k) = \widetilde{AB}(m, n)\, e^{i \tilde{\phi}(m, n, k)}$

where $\tilde{S}$ is the reconstructed signal, $\tilde{S}^{\,r}$ and $\tilde{S}^{\,i}$ are the real and imaginary parts of the signal, $\widetilde{AB}$ is the active brightness transmitted by the device, and $\tilde{\phi}$ is the phase. Here, the tilde accent indicates a noisy signal or noisy value. The phase may be determined from the coarse depth by

$\tilde{\phi}(m, n, k) = \dfrac{4\pi\, \tilde{d}(m, n)\, f_k}{c}$

where $\tilde{d}$ is the coarse depth, $f_k$ is the k-th of the K modulation frequencies, and $c$ is the speed of light.

In some examples, the frequencies used in reconstruction can be different from the frequencies used by the camera during frame collection. For example, a set of virtual frequencies can be introduced and used to reconstruct phase data using the above equations. Further, any suitable plurality K of frequencies may be used. Different frequencies and/or a different number of frequencies may be chosen to produce a more noise resilient solution by maximizing the area, volume, or hypervolume of the Voronoi cell determined by the frequencies.
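
A minimal sketch of the reconstruction at 418 is shown below, assuming the coarse depth and averaged active brightness arrive as floating-point arrays; the frequencies passed in may be the camera's actual modulation frequencies or a set of virtual frequencies, as described above.

import numpy as np

C = 3e8  # speed of light, m/s

def reconstruct_noisy_signal(coarse_depth, active_brightness, freqs):
    # coarse_depth:      (M, N) depth in meters
    # active_brightness: (M, N) averaged active brightness
    # freqs:             K modulation frequencies in Hz (real or virtual)
    freqs = np.asarray(freqs, dtype=np.float64)
    phase = 4.0 * np.pi * coarse_depth[..., None] * freqs[None, None, :] / C
    return active_brightness[..., None] * np.exp(1j * phase)   # shape (M, N, K)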

At 420, the distributed pipeline performs signal correction on the coarse depth image and coarse intensity image. As described in more detail below, signal correction may comprise various denoising processes, such as jitter reduction, smoothing, and/or edge enhancement, some of which can include convolutional operations, as shown by the depicted N×N kernel. Further, the signal correction can include segmentation of the image to process different pixels differently in some examples. After signal correction, crosstalk correction is performed at 422 as shown by the depicted N′×N′ kernel to generate a final denoised depth image and a final coarse intensity (active brightness) image at 424. The final images may be output, for example, to software applications on the remote computing system, to the device incorporating the ToF camera, or to a cloud computing system.

Using distributed pipeline 400, more compute-intensive processes can be performed remotely rather than on the depth imaging system. For example, remote denoising at 420 may use large kernel sizes (N×N Gaussian filters, N≥5), thus improving efficiency of the distributed pipeline. In some examples, the denoising kernel may have a size of between 5×5 and 19×19 pixels. Remote processing of denoising may allow for larger kernel sizes to be employed, compared to other pipelines where denoising is performed on the ToF camera. The use of such larger denoising kernels remotely after phase unwrapping may allow the recovery of depth data that has a higher accuracy compared to the use of a smaller denoising kernel used on the depth camera prior to phase unwrapping.
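
As one hedged example of how a larger remote denoising kernel might be applied, the Python sketch below reconstructs a complex signal for a single frequency, smooths its real and imaginary parts with an off-the-shelf Gaussian filter, and recomputes depth; the disclosed pipeline may combine multiple frequencies and handle unwrapping errors differently.

import numpy as np
from scipy.ndimage import gaussian_filter

C = 3e8  # speed of light, m/s

def denoise_depth_single_frequency(coarse_depth, active_brightness, freq, sigma=3.0):
    # Rebuild a noisy complex signal for one modulation frequency.
    noisy_phase = 4.0 * np.pi * coarse_depth * freq / C
    signal = active_brightness * np.exp(1j * noisy_phase)
    # Smooth real and imaginary parts with a large Gaussian kernel (remote compute).
    smoothed = gaussian_filter(signal.real, sigma) + 1j * gaussian_filter(signal.imag, sigma)
    # Recover the smoothed wrapped phase and re-apply the coarse wrap count.
    wrapped = np.angle(smoothed)
    wraps = np.round((noisy_phase - wrapped) / (2.0 * np.pi))
    return (wrapped + 2.0 * np.pi * wraps) * C / (4.0 * np.pi * freq)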

In some examples, the kernel size may be tuned to provide a desired level of accuracy. As discussed in more detail below, a relatively larger or smaller kernel size may be used depending on a signal to noise ratio, and kernel sizes may be varied on a pixel-by-pixel basis. Further, in some examples, the remote system alternatively or additionally can perform temporal filtering, which may comprise compute-intensive convolutions over T stored coarse depth image frames (e.g., using N×N×T kernels). The use of a remote system to perform temporal filtering after phase unwrapping may provide advantages over performing temporal filtering on a depth camera prior to phase unwrapping. For example, temporal filtering involves storing a number T of prior image frames. As such, performing temporal filtering prior to phase unwrapping involves the storage of a greater number of individual image frames of depth image data for each depth image, due to having to store phase samples at each modulation frequency, than performing temporal filtering using coarse depth (and coarse intensity) data. Further, a remote computing system may have more available storage than a depth camera, allowing the remote computing system to store a greater number of prior depth images.

As mentioned above, in some examples, a coarse depth image (and potentially an active brightness image corresponding to the depth image) may be segmented such that some depth pixels (as well as some intensity pixels of an AB image) are processed locally on a device comprising a depth camera, while other pixels are processed remote from the device comprising the depth camera. FIG. 5 shows a block diagram of an example distributed depth engine pipeline 500 that illustrates examples of such processing pathways. ToF image sensor 502 of depth camera 503 generates a coarse depth image and an active brightness image at 504, as described above with regard to FIG. 4. The ToF camera 503 also segments the images to direct some pixels of depth data to cloud-based computing system 518 for more compute-intensive processing and other pixels to a local processor (e.g. a processor of a phone, wearable device, or other device with which the depth camera is integrated or for which the depth camera is a peripheral) for less compute-intensive processing. In some instances, a coarse depth image may not be segmented, and thus may be processed fully locally or fully remotely, depending upon conditions applied when determining whether to segment. Example conditions are described below.

For a segmented image, a first subset of pixels is transmitted at 505 to a processor local to a device on which the depth camera 503 is located for local denoising 506 utilizing a smaller denoising kernel. The denoised pixels may optionally be compressed at 508, provided to services at 510, and/or provided to a consuming application 512. Example services include machine-learning processes and/or high level algorithms, such as face identification, object recognition, surface reconstruction, and simultaneous localization and mapping algorithms. Other pixels of depth data from the coarse depth image can be compressed at 514 and transmitted at 516 to a cloud-based computing system 518 for remote denoising using a larger denoising kernel. The cloud-based computing system denoises those pixels of the coarse depth image (and potentially pixels of an active brightness image) to produce denoised pixels, and then provides the denoised pixels to the consuming application 512.

FIG. 6 shows an example segmentation of a coarse depth image 600 to generate a segmented image 602. In some examples, the segmenting process may be based upon which image regions comprise relatively higher signal to noise (i.e., high signal), which regions comprise relatively lower signal to noise (i.e., low signal), and which regions are edge regions. Any suitable image metrics may be used to segment a coarse depth image, including variance, standard deviation, average, and/or coefficient of dispersion for intensity and/or depth. The coefficient of variation is the standard deviation of the kernel over the average value of the population, and is a non-dimensional quantity that provides the variability in relation to the mean of the population. When the data in the kernel is highly variable compared to the mean signal, it can indicate an edge in the case of active brightness, or unwrapping errors in the case of depth. The coefficient of dispersion, defined as the variance of the population over the average, is a dimensional quantity, and therefore not scale invariant, that provides an indication of clustering in the data; for example, a value over 1 indicates edges in the case of active brightness, or unwrapping errors in the case of depth.
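
A Python sketch of these per-neighborhood statistics, computed with a box filter, is shown below; the neighborhood size and the small epsilon guarding against division by zero are implementation assumptions, while the threshold of 1 for the coefficient of dispersion follows the description above.

import numpy as np
from scipy.ndimage import uniform_filter

def local_variation_and_dispersion(image, size=5, eps=1e-9):
    img = image.astype(np.float64)
    mean = uniform_filter(img, size)
    mean_sq = uniform_filter(img ** 2, size)
    var = np.maximum(mean_sq - mean ** 2, 0.0)
    coeff_variation = np.sqrt(var) / (mean + eps)   # dimensionless
    coeff_dispersion = var / (mean + eps)           # carries the image's units
    return coeff_variation, coeff_dispersion

# Example: flag likely edges (active brightness) or unwrapping errors (depth).
# _, dispersion = local_variation_and_dispersion(active_brightness)
# edge_or_error_mask = dispersion > 1.0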

As mentioned above, in some examples, larger denoising kernels are used on lower signal to noise regions, and smaller kernels are used on higher signal to noise regions. Further, in some examples, edge regions are treated with other filters, such as Gaussian blurring. In some examples, Gaussian blurring generates coefficients radially distributed and spatially dependent according to:

$e^{-\lambda(\rho)\,(i^2 + j^2)}$ with $i \in \{-I, -I+1, \ldots, I\}$ and $j \in \{-J, -J+1, \ldots, J\}$

where λ is a parameter responsible for the smoothing. In some examples, the precision, or “jitter,” may be controlled and stabilized by making the smoothing coefficient dependent on the ratio ρ:

$\rho(\Delta) = \dfrac{\Delta_T^{\,\zeta}}{\Delta_0^{\,\zeta}}$ with $\zeta = \tfrac{1}{2}, 1$

where $\rho$ is the ratio between the noise target $\Delta_T^{\,\zeta}$ and the variability of the depth without filtering, $\Delta_0^{\,\zeta}$, within the kernel. Here, $\zeta$ denotes either the standard deviation ($\zeta = \tfrac{1}{2}$) or the variance ($\zeta = 1$).
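
The kernel construction can be sketched as follows in Python; because the mapping from ρ to λ is not fully specified above, the monotone mapping used here (and the negative exponent giving a decaying Gaussian profile) are assumptions made for illustration.

import numpy as np

def adaptive_gaussian_kernel(noise_target, unfiltered_variability, half_size=3, zeta=0.5):
    # rho compares the jitter target with the observed variability within the kernel.
    rho = (noise_target ** zeta) / (unfiltered_variability ** zeta + 1e-12)
    lam = max(float(rho), 1e-3)   # assumed: noisier data (small rho) -> wider kernel -> heavier smoothing
    idx = np.arange(-half_size, half_size + 1)
    ii, jj = np.meshgrid(idx, idx, indexing="ij")
    kernel = np.exp(-lam * (ii ** 2 + jj ** 2))
    return kernel / kernel.sum()  # normalized so filtering preserves the local mean level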

FIG. 7 shows a flow diagram of an example method 700 for processing depth data acquired from a ToF depth image sensor. Method 700 may be enacted on a ToF camera comprising the ToF depth image sensor, including those depicted in FIGS. 1A-1B above. At 702, the method comprises receiving time-of-flight depth image data from a ToF depth image sensor. In some examples, at 704, the method comprises, prior to denoising, performing signal calibration correction pixel-wise on the time-of-flight image data. At 706, the method comprises, prior to denoising, performing active brightness averaging pixel-wise on the time-of-flight image data to obtain active brightness image data. At 708, the method comprises, prior to denoising, performing phase unwrapping pixel-wise on the time-of-flight image data to obtain coarse depth image data. In some examples, at 710, the method comprises compressing the coarse depth image data and active brightness image data.

Method 700 further comprises, at 712, sending the coarse depth image data and the active brightness image data to a remote computing system via a communication subsystem for denoising. As mentioned above, in some examples, compressed images are sent to conserve bandwidth. In some examples, at 714, the remote computing system is local to a device incorporating the ToF camera. In other examples, at 716, the remote computing system is remote to a device incorporating the ToF camera. Further, as mentioned above, in some examples, coarse depth image data and active brightness image data can be segmented, and subsets of pixels of coarse depth image data can be sent to each of a local processor and a remote computing system based upon the segmenting.

In some examples, at 718, the method further comprises receiving, at a device incorporating the ToF camera, denoised depth image data and denoised active brightness image data from the remote system. Such a device may comprise any suitable computing device, such as a head-mounted display device, a phone, a laptop, an IoT sensor, an automobile, any of devices 100A-100E, or other device. Where a subset of coarse depth pixels was sent for remote processing, receiving the denoised depth image data and denoised active brightness image data may comprise receiving denoised image data corresponding to the coarse image data sent to the remote device.

FIG. 8 shows a flow diagram for an example method 800 enacted on a computing system for denoising a coarse depth image. Method 800 may be enacted on any suitable computing system, including a cloud computing system, enterprise system, networked PC, etc. At 802, the method comprises receiving coarse depth image data from a remote device comprising a ToF camera. Further, in some examples, at 804, the method comprises receiving active brightness image data. In some examples, active brightness image data comprises an average active brightness image. The coarse depth image data and active brightness image data may be received from a ToF camera or a device comprising a ToF camera that is remote to the computing system enacting method 800.

At 806, the method comprises applying a spatial denoising filter to the coarse depth image data to form a denoised depth image, the spatial denoising filter comprising a convolution kernel. In some examples, at 808, the convolution kernel comprises a size of 3×3 or greater. For example, the convolution kernel can have a size of between 3×3 and 19×19 in more specific examples.

In some examples, method 800 comprises, at 810, denoising the coarse depth image data based at least on the active brightness image data. For example, the AB image can be used with the coarse depth image to reconstruct phase data, and denoising can be performed based on the reconstructed phase data.

Further, in some examples, at 812, the method comprises segmenting the coarse depth image data into regions such as lower signal regions, higher signal regions, and/or edge regions. Also, in some such examples, at 814, the method comprises performing Gaussian blurring on edge regions. Additionally, in some examples, at 816, the method comprises denoising image regions with lower signal to noise ratios using a relatively larger convolution kernel, and denoising image regions with higher signal to noise ratios using a relatively smaller convolution kernel.

In some examples, coarse depth image data can be stored and used in denoising by temporal filtering. In such examples, at 820, the method comprises denoising the coarse depth image data using temporal filtering based on prior stored coarse depth image data. In some such examples, at 822, temporal filtering is performed based upon 3-7 previously received coarse depth images. In some such examples, when there is high relative movement between images, temporal filtering is performed based upon a greater number of images. In other examples, any other suitable number of images may be used for temporal filtering. More generally, any suitable temporal and/or spatio-temporal filters may be used in the denoising process.
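
A minimal temporal filter over the last T stored coarse depth frames might look like the following Python sketch; the simple per-pixel mean is an assumption for illustration, and any suitable temporal or spatio-temporal filter could be substituted.

import numpy as np
from collections import deque

class TemporalDepthFilter:
    def __init__(self, window=5):            # e.g., 3-7 previously received coarse depth images
        self.frames = deque(maxlen=window)

    def __call__(self, coarse_depth):
        self.frames.append(np.asarray(coarse_depth, dtype=np.float64))
        # Per-pixel average over the stored coarse depth frames.
        return np.mean(np.stack(list(self.frames), axis=0), axis=0)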

Method 800 further comprises, at 824, outputting the denoised depth image data. In some examples, the method also outputs denoised active brightness image data. In some examples, at 826, the denoised depth image data and active brightness image data are output to the remote device comprising the ToF camera.

In some embodiments, the methods and processes described herein may be tied to a computing system of one or more computing devices. In particular, such methods and processes may be implemented as a computer-application program or service, an application-programming interface (API), a library, and/or other computer-program product.

FIG. 9 schematically shows a non-limiting embodiment of a computing system 900 that can enact one or more of the methods and processes described above. Computing system 900 is shown in simplified form. Computing system 900 may take the form of one or more personal computers, server computers, tablet computers, home-entertainment computers, network computing devices, gaming devices, mobile computing devices, mobile communication devices (e.g., smart phone), and/or other computing devices.

Computing system 900 includes a logic machine 902 and a storage machine 904. Computing system 900 may optionally include a display subsystem 906, input subsystem 908, communication subsystem 910, and/or other components not shown in FIG. 9.

Logic machine 902 includes one or more physical devices configured to execute instructions. For example, the logic machine may be configured to execute instructions that are part of one or more applications, services, programs, routines, libraries, objects, components, data structures, or other logical constructs. Such instructions may be implemented to perform a task, implement a data type, transform the state of one or more components, achieve a technical effect, or otherwise arrive at a desired result.

The logic machine may include one or more processors configured to execute software instructions. Additionally or alternatively, the logic machine may include one or more hardware or firmware logic machines configured to execute hardware or firmware instructions. Processors of the logic machine may be single-core or multi-core, and the instructions executed thereon may be configured for sequential, parallel, and/or distributed processing. Individual components of the logic machine optionally may be distributed among two or more separate devices, which may be remotely located and/or configured for coordinated processing. Aspects of the logic machine may be virtualized and executed by remotely accessible, networked computing devices configured in a cloud-computing configuration.

Storage machine 904 includes one or more physical devices configured to hold instructions executable by the logic machine to implement the methods and processes described herein. When such methods and processes are implemented, the state of storage machine 904 may be transformed—e.g., to hold different data.

Storage machine 904 may include removable and/or built-in devices. Storage machine 904 may include optical memory (e.g., CD, DVD, HD-DVD, Blu-Ray Disc, etc.), semiconductor memory (e.g., RAM, EPROM, EEPROM, etc.), and/or magnetic memory (e.g., hard-disk drive, floppy-disk drive, tape drive, MRAM, etc.), among others. Storage machine 904 may include volatile, nonvolatile, dynamic, static, read/write, read-only, random-access, sequential-access, location-addressable, file-addressable, and/or content-addressable devices.

It will be appreciated that storage machine 904 includes one or more physical devices. However, aspects of the instructions described herein alternatively may be propagated by a communication medium (e.g., an electromagnetic signal, an optical signal, etc.) that is not held by a physical device for a finite duration.

Aspects of logic machine 902 and storage machine 904 may be integrated together into one or more hardware-logic components. Such hardware-logic components may include field-programmable gate arrays (FPGAs), program- and application-specific integrated circuits (PASIC/ASICs), program- and application-specific standard products (PSSP/ASSPs), system-on-a-chip (SOC), and complex programmable logic devices (CPLDs), for example.

The terms “module,” “program,” and “engine” may be used to describe an aspect of computing system 900 implemented to perform a particular function. In some cases, a module, program, or engine may be instantiated via logic machine 902 executing instructions held by storage machine 904. It will be understood that different modules, programs, and/or engines may be instantiated from the same application, service, code block, object, library, routine, API, function, etc. Likewise, the same module, program, and/or engine may be instantiated by different applications, services, code blocks, objects, routines, APIs, functions, etc. The terms “module,” “program,” and “engine” may encompass individual or groups of executable files, data files, libraries, drivers, scripts, database records, etc.

It will be appreciated that a “service”, as used herein, is an application program executable across multiple user sessions. A service may be available to one or more system components, programs, and/or other services. In some implementations, a service may run on one or more server-computing devices.

When included, display subsystem 906 may be used to present a visual representation of data held by storage machine 904. This visual representation may take the form of a graphical user interface (GUI). As the herein described methods and processes change the data held by the storage machine, and thus transform the state of the storage machine, the state of display subsystem 906 may likewise be transformed to visually represent changes in the underlying data. Display subsystem 906 may include one or more display devices utilizing virtually any type of technology. Such display devices may be combined with logic machine 902 and/or storage machine 904 in a shared enclosure, or such display devices may be peripheral display devices.

When included, input subsystem 908 may comprise or interface with one or more user-input devices such as a keyboard, mouse, touch screen, or game controller. In some embodiments, the input subsystem may comprise or interface with selected natural user input (NUI) componentry. Such componentry may be integrated or peripheral, and the transduction and/or processing of input actions may be handled on- or off-board. Example NUI componentry may include a microphone for speech and/or voice recognition; an infrared, color, stereoscopic, and/or depth camera (e.g., depth imaging system 200) for machine vision and/or gesture recognition; a head tracker, eye tracker, accelerometer, and/or gyroscope for motion detection and/or intent recognition; as well as electric-field sensing componentry for assessing brain activity.

When included, communication subsystem 910 may be configured to communicatively couple computing system 900 with one or more other computing devices, such as remote computing system 914. Remote computing system 914 may comprise, e.g., a cloud computing system, an enterprise system, or a networked PC, as examples. Communication subsystem 910 may include wired and/or wireless communication devices compatible with one or more different communication protocols. As non-limiting examples, the communication subsystem may be configured for communication via a wireless telephone network, or a wired or wireless local- or wide-area network. In some embodiments, the communication subsystem may allow computing system 900 to send and/or receive messages to and/or from other devices via a network such as the Internet.

Another example provides a time-of-flight camera, comprising a time-of-flight depth image sensor, a logic machine, a communication subsystem, and a storage machine holding instructions executable by the logic machine to process time-of-flight image data acquired by the time-of-flight depth image sensor by, prior to denoising, performing phase unwrapping pixel-wise on the image data to obtain coarse depth image data comprising depth values, and sending the coarse depth image data and the active brightness image data to a remote computing system via the communication subsystem for denoising. In some such examples, the instructions are further executable to compress the coarse depth image and the active brightness image before sending the coarse depth image and the active brightness image to the remote computing system. Additionally or alternatively, in some examples the instructions are further executable to, prior to denoising, perform a pixel-wise signal calibration correction. Additionally or alternatively, in some examples the time-of-flight camera further comprises a device incorporating the time-of-flight camera, and the remote computing system is remote from the device incorporating the time-of-flight camera. Additionally or alternatively, in some examples the instructions are further executable to segment a coarse depth image including the coarse depth image data, and send the coarse depth image data to the remote computing system after segmenting. Additionally or alternatively, in some examples the instructions are further executable to process the time-of-flight image data by, prior to denoising, performing active brightness averaging pixel-wise on the time-of-flight image data to obtain active brightness image data.

Another example provides a computing system comprising a logic machine, a communication subsystem, and a storage machine holding instructions executable by the logic machine to receive coarse depth image data from a remote device comprising a time-of-flight camera, apply a spatial denoising filter comprising a convolution kernel to form denoised depth image data, and output the denoised depth image data. In some such examples, the instructions are further executable to receive active brightness image data from the remote device, and denoise the coarse depth image data based at least on the active brightness image data. Additionally or alternatively, in some examples the instructions are further executable to segment the coarse depth image data based upon a threshold signal to noise ratio. Additionally or alternatively, in some examples the instructions are executable to perform denoising on coarse depth image data with lower signal to noise ratios using a relatively larger convolution kernel, and perform denoising on coarse depth image data with higher signal to noise ratios using a relatively smaller convolution kernel. Additionally or alternatively, in some examples the convolution kernel comprises a size in a range of 3×3 and 19×19. Additionally or alternatively, in some examples the denoised depth image data is output to the remote device. Additionally or alternatively, in some examples the instructions are further executable to denoise the coarse depth image data using temporal filtering based on prior stored coarse depth image data. Additionally or alternatively, in some examples the temporal filtering is performed based upon 3-7 previously-received coarse depth images.

Another example provides a method enacted on a time-of-flight camera, the method comprising receiving time-of-flight image data from a time-of-flight depth image sensor of the time-of-flight camera, prior to denoising, performing active brightness averaging pixel-wise on the time-of-flight image data to obtain active brightness image data, prior to denoising, performing phase unwrapping pixel-wise on the time-of-flight image data to obtain coarse depth image data, and sending the coarse depth image data and the active brightness image data to a remote computing system via a communication subsystem for denoising. In some such examples, the method further comprises compressing the coarse depth image data and the active brightness image data before sending the coarse depth image data and the active brightness image data to the remote system. Additionally or alternatively, in some examples the method further comprises, prior to denoising, performing a signal calibration correction pixel-wise on the time-of-flight image data. Additionally or alternatively, in some examples the remote computing system is remote from a device incorporating the time-of-flight camera. Additionally or alternatively, in some examples the method further comprises segmenting the coarse depth image data. Additionally or alternatively, in some examples the method further comprises receiving, at a device incorporating the time-of-flight camera, denoised depth image data and denoised active brightness image data from the remote system.

It will be understood that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered in a limiting sense, because numerous variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated and/or described may be performed in the sequence illustrated and/or described, in other sequences, in parallel, or omitted. Likewise, the order of the above-described processes may be changed.

The subject matter of the present disclosure includes all novel and non-obvious combinations and sub-combinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as any and all equivalents thereof.

Microsoft Patent | Mixed-reality device positioning based on shared location https://patent.nweon.com/25193 Wed, 30 Nov 2022 19:18:12 +0000 https://patent.nweon.com/?p=25193 ...

Patent: Mixed-reality device positioning based on shared location

Patent PDF: 加入映维网会员获取

Publication Number: 20220383539

Publication Date: 2022-12-01

Assignee: Microsoft Technology Licensing, LLC (Redmond, WA, US)

Abstract

Techniques and systems are provided for positioning mixed-reality devices within mixed-reality environments. The devices, which are configured to perform inside out tracking, transition between position tracking states in mixed-reality environments and utilize positional information from other inside out tracking devices that share the mixed-reality environments to identify/update positioning of the devices when they become disoriented within the environments and without requiring an extensive or full scan and comparison/matching of feature points that are detectable by the devices with mapped feature points of the maps associated with the mixed-reality environments. Such techniques can conserve processing and power consumption that would be required when performing a full or extensive scan and comparison of matching feature points. Such techniques can also enhance the accuracy and speed of positioning mixed-reality devices.

Claims

What is claimed is:

Description

BACKGROUND

Mixed-reality (MR) systems, including virtual-reality (VR) and augmented-reality (AR) systems, have received significant attention because of their ability to create truly unique experiences for their users.

For reference, conventional VR systems create completely immersive experiences by restricting their users’ views to only virtual environments. This is often achieved through the use of a head mounted device (HMD) that completely blocks any view of the real world. As a result, a user is entirely immersed within the virtual environment, such that the user can only see virtual imagery rendered by their VR device. Some VR devices, however, are also configured to render actual or replicated passthrough images of the real world to their users, concurrently with their generated virtual imagery, such that the users may feel as though they are viewing the real world through their VR devices, along with the VR generated imagery.

In contrast, conventional AR systems create an augmented-reality experience by visually presenting virtual objects, referred to as holograms, to users within the users’ actual view of the real world. The AR holograms can be projected to the users, for example, on specialized lenses that render the hologram imagery while the users concurrently look through the lenses to see the real world.

As used herein, VR and AR systems are described and referenced interchangeably. Unless stated otherwise, the descriptions herein apply equally to all types of MR systems, which (as detailed above) include AR systems, VR systems, and/or any other similar system capable of displaying virtual content.

Sometimes, a plurality of HMDs and/or other mixed-reality devices are used concurrently and cooperatively within a shared mixed-reality environment to facilitate collaborative work, entertainment, and other joint activities. Whether these devices are used alone or in combination, it is critically important for the mixed-reality devices to continually track their relative locations within the mixed-reality environments, so that the holograms and other virtual imagery are positioned properly for the users within the mixed-reality environment. Unfortunately, it can sometimes be difficult for mixed-reality devices to properly identify their positions within the environments where they are being used. This may occur, for example, due to interference, processing glitches, poor visibility, motion irregularities, and so on.

Some devices, referred to as inside out tracking devices, use internal camera sensors to capture images of the real world to identify the real-world environment where they are located, as well as the relative location of the devices within that real-world environment. These cameras may include, for example, traditional cameras, low light cameras, thermal imaging cameras, UV cameras, and other cameras that are capable of detecting different features within the environment.

Even more particularly, camera sensors are used by the systems to capture images of the environment that can be used to generate depth maps of the environment and to assess the relative position of the devices within the environment by correlating calculated depths of the device from detected feature points with the known markers, anchors, feature points and other location landmarks of the known and mapped environment. However, due to poor visibility or other imaging conditions, such as a lack of textured surfaces or edges having unique feature points, it can sometimes be difficult for inside out tracking devices to map new environments and/or to identify their relative locations within known mapped environments.

Some devices also rely on other sensors, such as GPS (Global Positioning System) sensors to obtain location information from dedicated positioning systems in communication with the devices, to determine the devices’ locations within the real world. However, poor communications with the dedicated positioning systems, due to network connectivity problems and/or interference, can sometimes prevent these types of devices from calculating or updating their positions based on GPS data.

Some devices may also use motion sensors, such as gravitometers, accelerometers and gyroscopes to estimate relative movement from a first known position to a new estimated position based on movement of the devices. However, jarring movements of the devices can prevent the devices from estimating their positions accurately, as the devices sometimes have difficulty assessing relative changes in position in response to such extreme changes in momentum. Additionally, even without extreme movements, some devices can still have difficulty determining and updating their positions accurately due to irregularities and inconsistencies in the monitored sensor data, particularly since even the smallest errors in estimation can become magnified during the iterative interpolations required to estimate positioning with such sensor data.

It will be appreciated that when mixed-reality devices are incapable of properly identifying their positioning within the real world, the resulting experience for the user can be very unsatisfactory, as the holograms and other virtual imagery of the generated virtual environment will not be properly aligned with the real-world environment. These problems are made even worse when the user is using multiple devices, such as peripheral devices to interact with virtual objects, particularly when these peripheral devices have estimated locations and orientations that are not co-aligned or positioned with the relative positioning that is determined for the HMD and/or other mixed-reality devices that image the mixed-reality environment. In these circumstances, the resulting interactions of the peripheral devices with the virtual objects will be inconsistent with the intended and expected results for the users that are immersed within the mixed-reality environments.

The problems associated with inconsistencies and inaccuracies in positioning mixed-reality devices are particularly evident when multiple different users are each using different devices within shared mixed-reality environments. Without proper positioning, it can be difficult to facilitate the desired collaborative work, entertainment, and other activities that rely on coordinated positioning.

When devices become disoriented or otherwise lose track of their specific positioning within a particular environment, the devices expend significant computational and power resources to reposition themselves within the environments. This expense may include, for example, generating and/or accessing the depth maps for the environments and identifying and matching imaged feature points with all potential matching sets of feature points in the associated map. When the map is very large, e.g., multiple Gigabytes, there may be several potential matching locations where the device could be positioned within the identified map(s). It takes significant processing to narrow the feature sets down to a particular location that is most certain. This processing is also a significant drain on battery power.

Accordingly, there is an ongoing need and desire for improving positioning of devices within the real world, particularly for mixed-reality devices. The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.

BRIEF SUMMARY

Systems, devices, and methods are configured for positioning mixed-reality devices in shared mixed-reality environments.

Some mixed-reality devices are configured to perform inside out tracking and to perform positional tracking of the devices within the mixed-reality environments while transitioning between different position tracking states associated with different probabilities of positional certainty or accuracy. The devices are configured to utilize positional information from other inside out tracking devices that share the mixed-reality environments and to identify/update the positioning of the devices when they become disoriented within the environments and without requiring an extensive or full scan and comparison/matching of feature points that are detectable by the devices with mapped feature points of the maps associated with the mixed-reality environments. While not required, such techniques can conserve processing and power consumption that would otherwise be required when performing a full or extensive scan and comparison of matching feature points. Such techniques, while also not required, can also be used to enhance the accuracy and speed of positioning mixed-reality devices in shared mixed-reality environments.

Some disclosed devices include mixed-reality devices that are configured to determine positioning based on sensor data (e.g., image data) captured by the devices. Such devices include one or more processors and one or more camera sensors configured to capture image data within the environments where the devices are located. Such devices also include one or more computer-readable hardware storage devices that store instructions that are executable by the one or more processors to configure the devices to determine positioning of the device within the environment based at least in part on positioning information obtained from a separate inside out tracking device within the same environments and that share common mixed-reality maps with the device(s).

The methods implemented by the devices include a method for positioning the devices that includes identifying a mixed-reality map corresponding with the environment, performing position tracking of the device within the environment, while in a first tracking state, to identify a relative position of the device within the mixed-reality map as the device moves within the environment, and detecting an event associated with an interruption of the position tracking of the device during which the device transitions from the first tracking state to a second tracking state that is less certain than the first tracking state and that causes a reduced certainty of the relative position of the device within the environment and corresponding mixed-reality map.

Some disclosed methods also include obtaining positioning information from the separate inside out tracking device in the environment, the positioning information from the separate inside out tracking device identifying a relative position of the separate inside out tracking device inside of a sub-region of the mixed-reality map and also indicating that the device is within the same sub-region of the mixed-reality map as the separate inside out tracking device. Some methods also include obtaining one or more images with the one or more camera sensors and identifying one or more imaged features in the environment from the one or more images, as well as searching a particular sub-region of the mixed-reality map for a matching set of one or more matching features that match the one or more imaged features, while refraining from searching other sub-regions of the mixed-reality map for the matching set of one or more matching features, in a manner that conserves computational expense that would otherwise be associated with searching the other sub-regions of the mixed-reality map for the one or more matching features.

Finally, these methods also include determining a new position of the device within the sub-region of the mixed-reality map based on finding the matching set of one or more matching features in the sub-region of the mixed-reality map and based on correlating a relative position of the device from the one or more imaged features and corresponding one or more matching features in the sub-region, and resuming position tracking of the device based on the determined new position of the device.
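
A minimal Python sketch of such sub-region-restricted matching is shown below; the descriptor representation, the map keyed by sub-region identifier, and the match threshold are assumptions made for illustration rather than the claimed data structures.

import numpy as np

def relocalize_in_subregion(imaged_descriptors, map_by_subregion, subregion_id, max_dist=0.7):
    # Only the indicated sub-region of the mixed-reality map is searched.
    map_descriptors, map_positions = map_by_subregion[subregion_id]
    matched_positions = []
    for d in imaged_descriptors:
        dist = np.linalg.norm(map_descriptors - d, axis=1)
        j = int(np.argmin(dist))
        if dist[j] < max_dist:
            matched_positions.append(map_positions[j])
    # With enough correspondences, a pose solver (e.g., PnP) can recover the
    # device's position within the sub-region, after which tracking resumes.
    return np.asarray(matched_positions)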

Other methods additionally, or alternatively, include using the position information from the separate inside out tracking device and a known or estimated relative position of the device relative to the separate inside out tracking device to determine a new position of the device in the second tracking state, while conserving resources by refraining from analyzing different portions of the mixed-reality map to identify a most likely location of the device within the mixed-reality map based on sensor data obtained by the device independently of the separate inside out tracking device.

Yet other methods additionally, or alternatively, include determining a probability valuation associated with a probability that the device is within the sub-region of the mixed-reality map based on the searching, receiving position information from a second device comprising a separate probability valuation that the second device, which is a separate inside out tracking device, is within a particular location of the mixed-reality map, and determining a new position of the device within the mixed-reality map based on the position information from the second device and the probability valuation of the device.
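
One way such probability valuations might be combined is sketched below in Python, treating the device's own valuation and the second device's shared valuation as independent evidence over the same set of sub-regions; this particular fusion rule is an assumption made for illustration.

import numpy as np

def fuse_subregion_beliefs(own_probs, peer_probs):
    own = np.asarray(own_probs, dtype=np.float64)
    peer = np.asarray(peer_probs, dtype=np.float64)
    fused = own * peer
    fused /= fused.sum() + 1e-12
    return int(np.argmax(fused)), fused   # most likely sub-region and the fused belief

# Example: the device alone cannot distinguish two sub-regions, but the peer's
# shared location information makes one of them far more likely.
# best, belief = fuse_subregion_beliefs([0.5, 0.5, 0.0], [0.9, 0.05, 0.05])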

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

Additional features and advantages will be set forth in the description which follows, and in part will be obvious from the description or may be learned by the practice of the teachings herein. Features and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. Features of the present invention will become more fully apparent from the following description and appended claims or may be learned by the practice of the invention as set forth hereinafter.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to describe the manner in which the above-recited and other advantages and features can be obtained, a more particular description of the subject matter briefly described above will be rendered by reference to specific embodiments which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments and are not therefore to be considered to be limiting in scope, embodiments will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:

FIG. 1 illustrates an example mixed-reality device, which is illustrated as a Head Mounted Device (HMD).

FIG. 2 illustrates various example use cases in which a mixed-reality device may be used to navigate through an environment and a corresponding mixed-reality environment.

FIG. 3 illustrates a use scenario in which a mixed-reality device identifies feature points in the mixed-reality environment where it is being used.

FIG. 4 illustrates a gaming environment in which mixed-reality devices are being used.

FIG. 5 illustrates another gaming environment in which mixed-reality devices are being used.

FIG. 6 illustrates how separate mixed-reality devices can be used in a cooperative manner with a shared mixed-reality environment.

FIG. 7 illustrates various example use cases in which a mixed-reality device may be positioned relative to feature points that are identified by the mixed-reality device and that may match corresponding feature points in different sub-regions of a mixed-reality map.

FIG. 8 illustrates a flow diagram with various acts associated with positioning mixed-reality devices.

FIG. 9 illustrates another gaming environment in which a plurality of mixed-reality devices are being used and that share a common mixed-reality environment.

FIG. 10 illustrates various example use cases in which a mixed-reality device may be positioned relative to other mixed-reality devices in a shared mixed-reality environment and/or based on detected feature points associated with the mixed-reality environment and corresponding mixed-reality map.

FIG. 11 illustrates a flow diagram with various acts associated with positioning mixed-reality devices.

FIG. 12 illustrates additional example computer systems and components that may include and/or be used to perform aspects of the disclosed invention.

DETAILED DESCRIPTION

As mentioned above, disclosed embodiments include systems, devices and methods configured for positioning mixed-reality devices in shared mixed-reality environments.

The mixed-reality devices are configured to perform inside out tracking and to perform positional tracking of the devices within mixed-reality environments while transitioning between different position tracking states associated with different probabilities of positional certainty or accuracy, such as, for example, due to various environmental conditions that affect the ability of the devices to obtain, verify or process sensor and location data.

As disclosed herein, when a device becomes disoriented and transitions from a state of positional certainty to a state of positional uncertainty, the device is configured to utilize positional information from other inside out tracking devices that share the same mixed-reality environment with the disoriented device and to identify/update its positioning based on this information without requiring an extensive or full scan of a mapped environment and comparison of feature points that are detectable by the devices with mapped feature points of the maps associated with the mixed-reality environment.

It will be appreciated that the technical benefits associated with the disclosed embodiments include the ability to conserve processing and power consumption that would otherwise be required when performing a full or extensive scan and comparison of matching feature points. Disclosed techniques can also be used to enhance the accuracy and speed of positioning mixed-reality devices in shared mixed-reality environments.

Example MR Systems and HMDs

Attention will now be directed to FIG. 1, which illustrates an example of a mixed-reality (MR) system/device 100A comprising a head-mounted device (HMD) 100. It will be appreciated that HMD 100 can be any type of MR system/device 100A, including a VR system 100B or an AR system 100C.

The mixed-reality system(s) 100A, as described herein, include primary devices that render the mixed-reality environment to the users, as well as peripheral devices that comprise controllers for interacting with the virtual objects in the shared/common mixed-reality environment and application instances.

In some scenarios, such as when multiple HMDs are used in a shared mixed-reality environment and application instance, one HMD may be referred to as a first or primary device and the other HMDs may be referred to as secondary or peripheral devices.

It should be noted that while a substantial portion of this disclosure is focused on the use of an HMD and corresponding peripheral devices (e.g., controllers) used in coordination with an HMD, the embodiments are not limited to being practiced using only HMD systems. That is, any type of scanning and imaging system can be used, even systems entirely removed or separate from an HMD, to perform the functionality described herein. Accordingly, the disclosed principles should be interpreted broadly to encompass any type of mixed-reality device. Some embodiments may even refrain from actively using a scanning/imaging device themselves and may simply use the data generated by a shared scanning/imaging device. For instance, some embodiments may at least be partially practiced in a cloud computing environment where resources and components are shared.

HMD 100 is currently shown as including scanning sensor(s) 110 (i.e., a type of scanning or camera system, such as one or more visible light camera(s), low light camera(s), thermal imaging camera(s), potentially ultraviolet (UV) camera(s), and dot illuminator(s) or other cameras), which include corresponding processors for processing the captured images.

The HMD 100 is configured to use the scanning sensor(s) 110 and corresponding processor(s) 120 to scan environments, map environments, capture environmental data, detect features in the environment, determine depth from detected features in the environment, generate pose data, and/or generate any kind of images of the environment (e.g., by generating a 3D representation of the environment). Scanning sensor(s) 110 may comprise any number or any type of scanning devices, without limit.

Accordingly, the disclosed embodiments may be structured to utilize numerous different camera types. The different camera types include, but are not limited to, visible light cameras, low light cameras, thermal imaging cameras, and UV cameras. Stereo depth matching may be performed using images generated from any one type or combination of types of the above listed camera types. Images or image content generated by the scanning sensor(s) 110 may then be displayed on the display(s) 130 of the HMD 100 for the user to view and interact with, along with one or more virtual objects rendered by the mixed-reality device(s) within the same shared environment(s) on the display(s) 130 of the device(s).

Motion sensor(s) 140, such as accelerometers, gravitometers, gyroscopes, and other motion sensors (e.g., IMU (inertial measurement unit) devices), together with corresponding processor(s) 120, detect and measure sensor data (e.g., IMU data) reflecting detected motion of the device and are used to estimate and interpolate positioning of the device based on the measured motion relative to the previously known position(s) of the device.

Other sensors 150, such as global positioning system (GPS) sensors, magnetometers, acoustic sensors, and other sensors, are also provided with corresponding processor(s) 120 for enabling the mixed-reality devices to determine positioning of the devices. This positioning may include measured and estimated location and/or orientation positioning information relative to measured sensor data, relative positioning to other objects and features in a known/shared environment, and/or based on previously known positioning information of the device.
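
For illustration only, the following Python sketch shows one simple way motion-sensor data of the kind described above could be integrated to interpolate a device's position from a previously known position. The class and variable names are hypothetical, and the yaw-only motion model is a deliberate simplification, not the patent's implementation.

    import numpy as np

    class DeadReckoner:
        """Minimal dead-reckoning sketch: integrates linear acceleration and
        angular rate from an IMU to propagate a last known pose. Drift grows
        over time, which is why absolute fixes (GPS, feature matching against
        the map) are still needed."""

        def __init__(self, position, yaw):
            self.position = np.asarray(position, dtype=float)  # meters, map frame
            self.velocity = np.zeros(3)
            self.yaw = float(yaw)                              # radians

        def update(self, accel, yaw_rate, dt):
            # Rotate body-frame acceleration into the map frame (yaw only,
            # for brevity), then integrate twice to advance the position.
            c, s = np.cos(self.yaw), np.sin(self.yaw)
            world_accel = np.array([c * accel[0] - s * accel[1],
                                    s * accel[0] + c * accel[1],
                                    accel[2]])
            self.velocity += world_accel * dt
            self.position += self.velocity * dt
            self.yaw += yaw_rate * dt
            return self.position, self.yaw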

The illustrated mixed-reality device(s) 100A also include storage 160, which stores executable instructions (e.g., code 170) that are executable by the hardware processor(s) 120 to implement the disclosed functionality. The storage also stores maps 180 of the mixed-reality environment that are described herein, as well as any of the other data that is referenced herein, such as sensor data, applications, interfaces, and objects used to render and utilize the disclosed mixed-reality environment.

Although not explicitly shown, the mixed-reality devices also include various communication interfaces and components for interfacing with and sharing information (e.g., maps and location information) between different mixed-reality devices and remote systems.

Attention will now be directed to FIG. 2, which illustrates a 2D map 210 and a 3D map 220 through which a user is navigating a corresponding path, respectively, 215 and 225. The maps 210/220 are mixed-reality maps that are used during the execution of a mixed-reality application to render holograms and other virtual content to a user wearing a mixed-reality device (e.g., HMD 200A) in a mixed-reality environment. The term mixed-reality environment refers to any combination of virtual content with real-world content and a real-world environment. In some instances, the term mixed-reality environment corresponds with a separate mixed-reality map of a real or virtual environment and features in the environment that can be virtualized relative to corresponding or different features in a real-world environment. In some instances, the term mixed-reality environment is used interchangeably with the corresponding mixed-reality map that contains any combination of real and/or virtual objects and features that are mapped with relative positions to other mapped/known objects within the mixed-reality environment/map.

With regard to the 2D and 3D maps 210 and 220, it will be appreciated that the devices disclosed herein may generate the maps by capturing and stitching together images from the sensors/cameras of the devices as the devices navigate a path (e.g., 215/225) through an environment. These maps may be supplemented with virtual content and/or they may be virtualized to render the mixed-reality environment corresponding with the maps to the users of the MR devices. Alternatively, or additionally, the devices may access and download maps of an environment, which are used in the mixed-reality applications. The mixed-reality maps may render borders/walls that exist or that do not really exist in the real world, but which are rendered in the mixed-reality environment along with other virtual objects/holograms.

As mentioned previously, it is critical that the mixed-reality devices are positioned properly within the mixed-reality environment where they operate, particularly as they are moved around in a mixed-reality environment, irrespective of whether the real world has corresponding borders/walls or other features. Otherwise, the virtual objects of the mixed-reality environment will not be properly aligned with their intended positioning relative to the user and/or real-world environment, and this can result in unexpected and undesired consequences (e.g., interactions with virtual objects are not executed or are executed in unexpected ways, users can become disoriented within a virtual map and/or collide with objects in the real world while navigating/traversing a path through the mixed-reality environment, and so forth).

During use, the mixed-reality devices and/or any remote servers they communicate with may continuously monitor and update the location of the MR devices relative to the map and mapped features of the mixed-reality environment (and corresponding real world), to ensure all virtual objects are properly positioned relative to the user/devices and real-world objects within the mixed-reality environments in the intended manner.

To enhance the user experience, the positioning of the device may occur multiple times a second so that updated positioning appears smooth and as expected while the user moves within a mixed-reality environment. However, sometimes the sensor devices used to perform positioning of the devices become unavailable. For instance, a GPS sensor may become unusable when the device moves into a satellite-obstructed or other GPS-denied environment where the sensor is unable to communicate with the GPS satellites/systems. Likewise, imaging sensors may become unusable for identifying environmental features to position the device within the environment when lighting becomes too dark or objects in the environment are obscured. Additionally, certain surfaces and environments may not include many edges, objects, or other unique features that are easy to detect with the imaging sensors.

FIG. 3, for example, shows an environment 310 in which a device is scanning a room that is part of a mapped mixed-reality environment and a corresponding mixed-reality map 320. During use, the device 300A uses cameras to scan/image the environment to identify features or feature points that are detectable with the device cameras/sensors to position the device. In the present illustration, various feature points 330, shown as dark dots, are detected by the device. Many of these feature points 330, such as the feature points 330 positioned between the walls and the floor, however, are not unique and could correspond to almost any room (sub-region) of the map 320, as well as to different locations in each room. In this regard, these feature points 330 may not be usable to position the device within the map unless the device was already generally aware of where it was.

When a device becomes disoriented, due to various positioning process glitches and/or interruptions in the processing routines, existing devices will attempt to correlate the detected feature set with all matching feature sets in the map to determine the relative position (e.g., location of the device within the map 320, as well as the relative orientation/pose 360 of the device). This exhaustive processing is computationally expensive and can undesirably consume scarce battery power.

When the detected set of features includes unique feature points, such as the feature points 335 of the shelf 340, it may make the resulting correlation/matching of detected feature points with the mapped feature points of the map 320 more certain. However, it does not always make it more efficient, particularly if the system still performs a full comparison of the feature sets against all possible options in the entire map 320.

The disclosed systems and devices can mitigate such consequences by relying on supplemental information from another device in the same shared mixed-reality environment, as described herein.

Attention is now directed to FIG. 4, which illustrates a mixed-reality environment 400 in which a user is wearing an HMD 410 and carrying a separate peripheral device 420 comprising a controller that operates as a painting device (e.g., for painting holograms) or as a capture device (e.g., for capturing holograms) in a virtual game. Both of the HMD and peripheral devices are separately scanning the environment to position and update the positioning of the devices within the environment properly. In particular, the HMD is using camera sensors (not shown) to make one or more camera scans 460 of the environment and to identify features in the environment that can be correlated with corresponding features in a mixed-reality map to position the HMD 410 within the mixed reality environment. Likewise, the peripheral mixed-reality device 420 is making external camera scans 450 with its external camera(s) 430 to identify its relative location within the environment 400.

Sometimes, as mentioned, one of the devices may lose its bearing and become disoriented within the mixed-reality environment for any number of reasons. In such circumstances, either one of the mixed-reality devices (which share the common mixed-reality environment) may utilize information from the other device (e.g., the HMD 410 or the Peripheral 420) to help ascertain its position within the mixed-reality environment and to help limit the range (e.g., sub-regions) of the mixed-reality map that must be evaluated when considering where the disoriented device is actually positioned within the mixed-reality map/environment.

Attention is now directed to FIG. 5, which illustrates another mixed-reality environment 500 in which a user is wearing an HMD 510 and carrying a separate peripheral device 520, comprising a controller for interacting with a hologram 530 in the mixed-reality environment 500. Both of the HMD and peripheral devices are separately scanning the environment to position and update the positioning of the devices within the environment properly, as previously described. In such instances, it is critical that the devices are both properly positioned within the environment 500. Otherwise, it may prevent the user from seeing and/or interacting with the hologram 530 in a desired and expected manner.

This is even more evident from the illustrations shown in FIG. 6. In this example, an HMD 600 is projecting a hologram target 610 to a user within the user’s/HMD field of view 650 of the mixed-reality environment. The target 610 may be rendered on a display of the HMD, for example, corresponding directly with the determined positioning of the HMD within the environment. This target 610 may be an isolated hologram that is untethered to a real-world object, such as the dragon hologram 530. This target 610 may also be displayed on one or more real world objects, such as a wall or a user (e.g., such as in the multiplayer scenario of FIG. 9). Accordingly, it is important that the HMD 600 is properly positioned within a corresponding mixed-reality environment and corresponding map 660, which may correspond to and be aligned with either fixed or moving real world objects in the mixed reality environment.

Likewise, the user’s peripheral controller (Peripheral MR Device 630), comprising a controller for controlling or interacting with the target 610 (e.g., the hologram 530 of FIG. 5), must also be properly positioned within the environment that the HMD is located within. Otherwise, the peripheral controller will not be aligned with the target, which is rendered within the field of view 650 at its current position based on the HMD positioning (location in the mapped environment and orientation/pose 670), and it may not operate as intended when interacting with the hologram 530/target 610.

As described herein, if either of the devices loses its positioning within the environment, such as if the peripheral MR device camera 640 is not working or imaging properly, the peripheral MR device 630 may rely on information from the HMD to help position the peripheral MR device 630 within the mixed-reality environment by evaluating only a sub-region of the mapped environment and without requiring the imaging/scanning of an entire map of the mixed-reality environment to ascertain its position based on matching feature points or other corresponding features.

Attention will now be directed to FIG. 7, which illustrates a mixed-reality environment/map 700 comprising a 2D map in which a user is wearing an HMD 710 and holding a peripheral 720. The user and user’s MR devices are positioned in a particular sub-region of the map, namely, a particular room of the multi-room map.

Both of the MR devices (HMD 710 and peripheral 720) are inside out tracking/positioning devices, meaning that they both have independent sensors (e.g., cameras) for scanning the environment and are independently capable of finding feature points or other features within the scanned imagery and of correlating the scanned/detected feature points 750 with one or more sets of matching mapped feature points 760 of the mapped environment (as shown in the upper right corner of the image), which align directly with the scanned/detected feature points 750.

If the peripheral device loses its positioning in the mixed-reality environment/map 700, it may scan the scanned/detected feature points 750 in its current location and try to determine where it is in the mixed-reality environment/map 700. To do this, it may compare the scanned/detected feature points 750 to all sets of matching mapped feature points 760 throughout the mapped environment that correspond to possible locations & orientations of the peripheral 720 (namely possible location and orientations A, B, C, D, as well as actual location & orientation X). It may track all these possible locations until it receives/detects additional information that narrows the scope of possible locations. The processing to evaluate and track each of these possible locations is computationally expensive and can be made more efficient by relying on positioning information from the HMD 710 that is sharing the same mixed-reality environment, and which has a known proximity to the peripheral 720.

By way of example, the HMD can be known to be within a fixed radius/distance from the peripheral, based on known use patterns and settings associated with the devices. The HMD may also have uninterrupted tracking and/or have more certainty about its positioning based on additional feature points that it is able to scan and that are unique. If the HMD knows its general location (e.g., a particular sub-region of the mixed-reality environment/map 700), it can notify the peripheral device in either a push or pull scheme so that the peripheral device may be aware it is in a generally similar portion of the mixed-reality environment/map 700 (e.g., a particular sub-region of the map). In this instance, the sub-region may be a particular room, wing, branch, or other identifiable region of a map. Then, the peripheral need not compare its scanned/detected feature points against all matching mapped feature points in the mapped environment. Instead, it may limit its analysis to only the sub-region where the HMD 710 is located, based on the shared position information from the HMD 710, so as to refrain from considering the matching mapped feature points 760 in all of the other sub-regions of the mapped environment, thus saving computational processing and power resources. In such embodiments, a device may evaluate a sub-region of a map containing relatively little data (e.g., less than 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 MB of data), for example, without having to evaluate an entire map or multiple sub-regions of a map that contain relatively more data (e.g., more than 100 MB of data, or even many GB of data).
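
As a rough sketch of the kind of search restriction described above (using hypothetical data structures; the patent does not prescribe an implementation), the following Python fragment matches detected feature descriptors only against the mapped features of the sub-region reported by the companion device, rather than against every sub-region of the map.

    def match_in_subregion(detected, mixed_reality_map, subregion_id, max_dist=0.5):
        """Compare detected feature descriptors only against the mapped
        features of one sub-region, skipping the rest of the map.
        `detected` and each sub-region are assumed to be dicts mapping a
        feature id to a numeric descriptor vector."""
        candidates = mixed_reality_map[subregion_id]
        matches = []
        for d_id, d_desc in detected.items():
            best_id, best_dist = None, float("inf")
            for m_id, m_desc in candidates.items():
                dist = sum((a - b) ** 2 for a, b in zip(d_desc, m_desc)) ** 0.5
                if dist < best_dist:
                    best_id, best_dist = m_id, dist
            if best_dist < max_dist:
                matches.append((d_id, best_id))
        return matches

    # A full-map search would instead loop over every sub-region:
    #   for subregion_id in mixed_reality_map: match_in_subregion(...)
    # which is exactly the work the shared position information lets a device avoid.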

These principles are further reflected in the flow diagram 800 of FIG. 8, which illustrates various acts associated with positioning a mixed-reality device within a mixed-reality environment based on positioning information from another mixed-reality device that shares the same mixed-reality environment/map, and which may be implemented by the disclosed systems and devices described herein. As described below, the various acts can be performed by a single MR device (also referred to as a system). Additionally, or alternatively, the acts can be performed by and/or controlled by a server that causes one or more of the acts to be performed by instructions sent from the server to the referenced first and/or second devices.

As illustrated, the first act is an act of identifying a mixed-reality map corresponding with the environment (act 810). This act may be performed by a first MR device generating a map from scanned images, by updating a stored map with newly scanned images, or appending a map with scanned images, wherein the images are obtained by the first MR device or from a remote device. This act may also be performed by accessing and downloading a map from a remote device, such as a remote server or third-party device and/or from a device that is sharing the mixed-reality environment/map with the first MR device.

Next, the first device performs position tracking of its location and/or orientation within the mixed-reality environment/map, based on detected sensor data of the first device, while performing the position tracking in a first position tracking state (act 820). This first state is a state of high confidence or probability of accuracy. This first state may be based on supplemental information from third-party sensors, such as GPS sensors, and may alternatively, or additionally, be based on motion sensor data detected by the first device. The positioning or position tracking is performed while the device exists and/or moves within the mixed-reality environment in the first tracking state.

Then, at some point, a triggering event is detected (act 830) that is associated with an interruption of the position tracking of the device and/or a transition of the device from the first tracking state (with high confidence of probable accuracy) to a second tracking state that has a lower probability of accuracy than the first tracking state and that causes a reduced certainty of the relative position of the device within the environment and corresponding mixed-reality map than when the device operated in the first tracking state.

In response, the device obtains positioning information from a separate/second MR tracking device (e.g., inside out tracking MR device) that is sharing the same mixed-reality environment and/or mixed-reality application instance as the first MR device (act 840). Notably, the positioning information from the separate/second device identifies a relative position of the separate/second device inside of a sub-region of the mixed-reality environment/map.

In some instances, the shared position information also indicates the first device is within a same sub-region of the mixed-reality environment/map as the separate/second device. Other position information can also be used to make this determination, such as previously obtained position information that reflects that the first device is used/present within a predetermined position and/or maintains a relatively similar and close position to the second device during use (act 835). This other position information may specifically identify the relative location and/or orientation of the first device relative to the second device during normal use, which may be a predetermined relative position and/or orientation and/or historically tracked use that is stored and reviewed by the device.

In some instances, the first device also obtains feature information for the environment (act 850), specifically for the sub-region that has been identified. This may occur, for instance, by identifying features from a last known location in the mixed-reality map/environment and/or by scanning new images in the environment with camera sensors and identifying features in the images, such as feature points or known objects, using object recognition.

The device also searches the particular sub-region of the mixed-reality map for a matching set of one or more matching features (e.g., feature points 335, 750, 330, others) or objects (e.g., shelf 340) that match the one or more imaged features or objects, while refraining from searching other sub-regions of the mixed-reality map for the matching set of one or more matching features or objects, in a manner that conserves computational expense that would otherwise be associated with searching the other sub-regions of the mixed-reality map for the one or more matching features or objects (act 860).

Then, the device determines a new position of the device within the sub-region of the mixed-reality map based on finding the matching set of one or more matching features in the sub-region of the mixed-reality map and based on correlating a relative position of the device from the one or more imaged features and corresponding one or more matching features in the sub-region (act 870). At this point, the probable location and certainty of position of the device within the mixed-reality environment/map may be greater than the second tracking state probability, and even the first tracking state probability, while conserving processing resources by not requiring (and actually refraining from performing) a full analysis of the entire mapped environment for matching features/objects that are identified by the device.

Using the new positioning of the device, the device may then resume position tracking of the device in the mixed-reality environment based on detecting new sensor data. It will be appreciated, in this regard, that the determined new positioning and resumed position tracking may comprise any combination of location and/or orientation positioning of the device within the mixed-reality environment.
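
A loose Python sketch of acts 840 through 870, reusing the match_in_subregion helper from the earlier sketch, might look like the following. The solve_pose callable stands in for whatever pose solver a given device actually uses (e.g., a perspective-n-point solver) and is an assumption of this illustration, not part of the disclosure.

    def relocalize(imaged_features, companion_subregion_id, mixed_reality_map, solve_pose):
        """Acts 840-870 in miniature: bound the search to the sub-region
        reported by the companion device, match features there only, then
        recover a pose and return it so tracking can resume from it."""
        matches = match_in_subregion(imaged_features, mixed_reality_map,
                                     companion_subregion_id)              # act 860
        if not matches:
            return None  # not enough evidence; widen the search or wait for more data
        return solve_pose(matches, mixed_reality_map[companion_subregion_id])  # act 870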

In some instances, the first device is a peripheral MR device that is a controller that shares the mixed-reality map and a corresponding mixed-reality application instance with an HMD (the second device), the controller being operable to interact with one or more interactive virtual objects rendered to a user through the HMD.

In other instances, the first device is a first HMD worn by a first user and the separate/second device comprises a second HMD worn by a second user, the first and second HMDs rendering one or more common virtual objects in a shared application.

In some instances, the triggering event for transitioning from the first tracking state to the second tracking state is an occurrence of the first device entering a global positioning system (GPS) denied environment or sub-region (such that the GPS positioning is used in the first tracking state but not the second tracking state), or an instance in which the first device loses the ability to image the environment due to environmental or processing conditions affecting the first device.

Although not required, the shared mixed-reality map may contain at least one matching set of the one or more matching features in the other sub-regions of the mixed-reality map that are omitted from the search performed by the device based on the positioning information from the separate/second device. In particular, the positioning information from the second device is used, in some instances, to filter the search of the mixed-reality map to only the sub-region that excludes the other sub-regions and to cause the device to refrain from searching the other sub-regions for the matching features (which may be feature points, landmarks, objects, or other identifiable elements of the mapped environment that are detectable in images taken by the first and/or second devices).

It will be appreciated that aspects of the foregoing methods can also be performed independently from analyzing an actual mixed-reality map to identify a location of a device that has become disoriented.

The methods and systems of the invention, for example, are configured to help a device become re-oriented and to identify its position based on shared information from another device that it has a known positional relationship with. In these instances, a first device will perform position tracking within a particular environment to identify its relative position within the environment as the device moves throughout the environment. The device performs the initial position tracking, in a first state, using any combination of the positioning data described herein (e.g., scanning sensor data, motion sensor data, and other sensor data, such as, but not limited to, GPS and IMU sensor data). The device also identifies a relative position of the device relative to a separate device that shares the environment with the device. This relative position can be a fixed and known relative position based on tracked historical usage, a most recently measured/identified positional relationship, or user input that specifies the relative relationship/positioning.

Then, at some point, the device will detect an event associated with an interruption of the position tracking of the device, during which the device transitions from the first tracking state to a second tracking state that is less certain than the first tracking state and that causes a reduced certainty of the relative position of the device within the environment. This may occur, for example, in response to a loss of GPS signal or another interruption that affects the ability of the device to determine its location within the environment.

In this second state, the device will obtain positioning information from the separate/second device which is certain about where it is located/positioned in the environment. Then, the device can infer its own position (e.g., location and/or orientation) within the environment based on the positioning information obtained about/from the second device.

In particular, the device will use the position information from the separate device and the relative position of the device relative to the separate device to determine a new position of the device in the second tracking state, while conserving resources by refraining from analyzing different portions of the environment to identify a most likely location of the device within the environment and by refraining from trying to obtain GPS or other sensor data that it does not have access to.
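
For example, a minimal sketch of this inference, assuming a known (or estimated) body-frame offset between the two devices and a yaw-only orientation model, could look like the following; the numeric offset in the usage comment is purely illustrative.

    import numpy as np

    def infer_position(companion_position, companion_yaw, relative_offset):
        """If the companion device is confidently positioned and the device is
        known or assumed to sit at `relative_offset` in the companion's body
        frame (e.g., a hand-held controller roughly half a meter in front of
        an HMD), the device's map-frame position can be inferred directly,
        with no map search at all."""
        c, s = np.cos(companion_yaw), np.sin(companion_yaw)
        rotation = np.array([[c, -s, 0.0],
                             [s,  c, 0.0],
                             [0.0, 0.0, 1.0]])
        return np.asarray(companion_position, dtype=float) + rotation @ np.asarray(relative_offset, dtype=float)

    # Illustrative usage: controller assumed 0.5 m in front of, and 0.3 m below, the HMD.
    # position = infer_position(hmd_position, hmd_yaw, [0.5, 0.0, -0.3])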

In some instances, the device may also use newly obtained sensor data of the device (e.g., IMU or other motion sensor data or image data that is obtained in the second state) to refine/verify its positioning in the environment (sub-region of the environment). The device can then continue to monitor new sensor data (e.g., IMU data and other sensor data) to update its positioning based on relative movements from the newly determined position, which was determined based on shared data from the second device.

Attention will now be directed to FIGS. 9 and 10, which illustrate additional mixed-reality environments in which multiple users are using multiple corresponding mixed-reality devices and in which the device(s) share positioning information with other devices in the same shared mixed-reality environment. This sharing causes at least one of the devices to refrain from analyzing an entire shared mixed-reality environment/map when attempting to identify new positioning within the shared mixed-reality environment/map, and the search for possible positions within the environment/map is restricted to one or more sub-regions of the map identified by or based on shared positioning information from one or more of the other devices that share the mixed-reality environment.

By way of example, in the mixed-reality environment 900 of FIG. 9, two users are visible, with a first user wearing a first HMD 910 and holding a peripheral device 920 (e.g., a hologram painting or capture controller). A second user is also wearing an HMD 930 and is holding a peripheral device 940 (e.g., another hologram painting or capture controller). The first peripheral device 920 may be known to be used within a predetermined distance and within a range of orientations relative to the HMD 910. Likewise, the second peripheral device 940 may be known to be used within a predetermined distance and within a range of orientations relative to the second HMD 930. This is important, as each peripheral device (e.g., 920 or 940), and each MR device, may selectively choose which HMD or other mixed-reality device(s) to obtain shared positioning information from to selectively determine which sub-region(s) of a shared mixed-reality map/environment to analyze when trying to position itself within the shared mixed-reality map/environment after transitioning from a first tracking state where it is very certain where it is located (with a high degree of probability based on a first set of sensor data) to a second tracking state where it is less certain where it is located.

In the current scenario, peripheral device 920 may rely on supplemental positioning information from HMD 910, rather than 930 to filter the shared map to the selective/filtered set of sub-regions to search for possible positioning of the peripheral device 920 if and/or when it transitions to a second position tracking state and/or in response to another triggering event. Likewise, peripheral device 940 may rely on supplemental positioning information from HMD 930, rather than 910, to filter the shared map to the selective/filtered set of sub-regions to search for possible positioning of the peripheral device 940 when transitioning to the second position tracking state or in response to another triggering event.

In contrast, HMD 910 may rely on supplemental positioning information from peripheral device 920, rather than peripheral device 940 to filter the shared map to the selective/filtered set of sub-regions to search for possible positioning of the HMD 910 if and/or when it transitions to the second position tracking state and/or in response to another triggering event. However, in these situations, if it is determined that the first and second user are on a same team and are commonly located in the same regions/sub-regions of a shared map, then the HMD 910 may additionally or alternatively rely on supplemental information from the HMD 930, peripheral device 940 and/or the peripheral device 920 when trying to re-position itself within the shared map when/if it transitions to the second position tracking state and/or in response to another triggering event.

In some instances, the triggering event is a determination that a period of time has passed, such as a few seconds, to trigger the verification of a probable positioning within a particular map/environment with a more certain verification/probability of positioning that is achievable in an efficient manner according to this disclosure by relying on supplemental information from the secondary/separate MR device(s) to scan/analyze selective and partial sub-regions of the map/environment. Another triggering event is a loss or lack of sensor data from a scanning, motion, or other location sensor. Yet another triggering event is a determination that a probability valuation for the device is below a predetermined threshold of probability or confidence/certainty, such as below a 95%, 90%, 85%, 80%, 75%, 70% or another confidence/probability of positioning within a shared mixed-reality environment.
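
A simple, hypothetical check that combines these triggering events might look like the following Python sketch; the confidence floor and time interval are illustrative placeholders rather than values taken from the disclosure.

    import time

    def should_reposition(confidence, last_check_time, sensors_ok,
                          confidence_floor=0.85, max_interval_s=5.0):
        """Sketch of the triggering logic described above: re-verify positioning
        when a timer elapses, when a location sensor drops out, or when the
        device's own probability valuation falls below a floor."""
        if time.monotonic() - last_check_time > max_interval_s:
            return True          # periodic verification
        if not sensors_ok:
            return True          # loss or lack of scanning/motion/location sensor data
        return confidence < confidence_floor  # valuation below the chosen threshold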

Attention will now be directed to FIG. 10, which illustrates a mixed-reality environment/map 1000 in which a first HMD device 1010, a second HMD device 1020 and a third HMD device 1030 are located. The third HMD device 1030 is also associated with a peripheral device 1040 that is known to be within a predetermined position of the HMD device 1030 based on tracked and/or predetermined use attributes of the peripheral device 1040 relative to the HMD device 1030.

In this scenario, each of the HMD devices may be uncertain as to their exact locations within the mixed-reality environment/map 1000. Such an occurrence may result, for example, from a triggering event in which a game instance loads a new map for each of the different users to begin playing from. When the map is first loaded, they are not certain where they are in the shared map, as they have not scanned enough of the map to be certain. Other triggering events for situations in which multiple devices are uncertain about their locations can result from global application failures or glitches in location services that communicate with each of the devices.

Regardless of the triggering event, aspects of the disclosed invention can be used to facilitate each of the devices newly positioning themselves, repositioning themselves, and/or increasing confidence in their estimated positioning by relying on positioning information from the other devices and, thereby, restricting subsequent searching in the mixed-reality map/environment to one or more limited sub-regions of the shared environment/map for matching features/objects that correspond to features/objects that each of the devices detects with its own sensors, without requiring a full exhaustive search through the entire map/environment for matching features/objects.

In this scenario, for example, the HMD 1010 may develop a probability that it is in a certain sub-region of the shared map (a very low probability), as well as a probability that it is within a certain proximity to the HMD 1020 (a very high probability), based on detected/scanned HMD feature points 1080. The low probability of being in any particular sub-region of the map may be based on only identifying a few feature points 1080 that are likely to exist in many other regions of the shared map.

In contrast, HMD 1020 may develop a very high probability that it is in a particular sub-region of the map, knowing from its scanned HMD feature points 1070 that it has identified two door entrances, and since there is only a single room (sub-region), excluding a hall, that has two perpendicularly facing door entrances. In fact, all occurrences of two perpendicular door openings occur only within the sub-region 1090 of the mixed-reality environment/map 1000. Accordingly, the HMD 1020 and HMD 1010 can limit their search for matching features/objects to the sub-region 1090, without having to search the entire mixed-reality map for features/objects matching the scanned feature points.
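
One way to sketch this kind of narrowing in Python, assuming a precomputed (hypothetical) index of distinctive landmarks per sub-region, is shown below; a sufficiently distinctive observation, such as two perpendicular door openings, can reduce the candidate set to a single sub-region.

    def candidate_subregions(subregion_landmark_index, observed_signature):
        """Keep only sub-regions whose mapped landmark signature contains
        everything the device has observed. A distinctive observation can
        narrow the candidates to one sub-region, as in the FIG. 10 example."""
        observed = set(observed_signature)
        return [sub_id for sub_id, landmarks in subregion_landmark_index.items()
                if observed.issubset(set(landmarks))]

    # Illustrative index (hypothetical names):
    # index = {"room_1005": {"door_east"},
    #          "subregion_1090": {"door_east", "door_north"}}
    # candidate_subregions(index, {"door_east", "door_north"})  ->  ["subregion_1090"]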

Likewise, if HMD 1030 can develop certainty or a certain probability valuation as to its relative location within the environment/map 1000 based on its scanned HMD feature points 1050, that information can be conveyed to peripheral 1040 when peripheral 1040 loses its bearings and needs to verify or update its positioning based on its scanned peripheral feature points 1060. This information can cause peripheral 1040 to restrict its search for matching objects/features in the map that correspond to the scanned peripheral feature points 1060 to only the selected sub-regions of the map 1000 that are visible from locations that are a predetermined distance from the HMD 1030 that the peripheral device 1040 is associated with.

If/when the HMD 1030 needs to update its positioning based on shared positioning information, it can also omit the room/sub-region 1005 in which HMD 1010 and HMD 1020 are located, since those devices can share positioning information with HMD 1030 indicating that they are in a particular room/sub-region 1005, which is also a portion of the map where HMDs 1010 and 1020 are highly certain that HMD 1030 is not present (due to their observations about the room 1005). As such, HMD 1030 can omit the analysis and attempted matching of features/objects in room/sub-region 1005 from the other portions/sub-regions of the map 1000 that are analyzed while updating the positioning of HMD 1030. This shared information will save processing resources when positioning the HMD 1030 in the environment.
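
A minimal sketch of this pruning, assuming hypothetical sub-region centers and a plausible radius around the companion HMD, is shown below; it combines the radius-based filtering described for peripheral 1040 with the exclusion of sub-regions (such as room 1005) that other devices have ruled out.

    import math

    def select_subregions(subregion_centers, companion_position, companion_radius,
                          ruled_out=()):
        """Include only sub-regions whose centers lie within a plausible radius
        of a confidently positioned companion device, and drop sub-regions that
        other devices report the searching device cannot be in."""
        excluded = set(ruled_out)
        selected = []
        for sub_id, center in subregion_centers.items():
            if sub_id in excluded:
                continue
            if math.dist(center, companion_position) <= companion_radius:
                selected.append(sub_id)
        return selected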

The foregoing examples are even more evident in view of the acts and methods referenced in the flow diagram 1100 of FIG. 11, in which MR devices are configured to share position information to assist other MR devices in identifying their positioning while refraining from analyzing unnecessary portions of a shared mixed-reality environment/map based on the shared position information. As described below, the various acts can be performed by a single MR device (also referred to as a system). Additionally, or alternatively, the acts can be performed by and/or controlled by a server that causes one or more of the acts to be performed by instructions sent from the server to the referenced first and/or second devices.

The first illustrated act in the flow diagram 1100 includes an act of identifying a mixed-reality map corresponding with the shared environment in which a first device and one or more second devices are located (act 1110). As noted before, this can be a scanned and generated map, or a shared map obtained from a remote and/or third-party system. In this example, the first device using the shared mixed-reality environment/map also identifies features within that environment (act 1120) using scanned information obtained by its sensors while in the environment.

Next, the first device determines a probability valuation associated with a probability that the first device is within a particular sub-region of the shared mixed-reality environment (act 1130). There are various techniques and processes that can be used to identify a probability valuation (value) associated with a probability that a device is in a certain position relative to a known map. This valuation can be based on various factors, such as uniqueness of detected features, quantity of detected features, image quality, consistency in scanned images, as well as many other factors. The valuation can also be based on shared position information from one or more other devices. The valuation can also be based on a size and complexity of the shared environment/map.
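
Purely as an illustration of how such factors might be folded into a single value, the following Python sketch scores a candidate position on a 0-to-1 scale; the weights, scaling constants, and factor names are arbitrary assumptions, since the disclosure does not prescribe a formula.

    def probability_valuation(num_features, num_unique, image_quality,
                              consistency, map_complexity):
        """Illustrative confidence score for a candidate position. More
        features, more unique features, better image quality, and more
        frame-to-frame consistency raise confidence; a large, complex map
        (many look-alike sub-regions) lowers it. All inputs except the two
        counts are assumed to be normalized to the range 0..1."""
        feature_term = min(num_features / 50.0, 1.0)
        unique_term = min(num_unique / 5.0, 1.0)
        raw = (0.25 * feature_term + 0.35 * unique_term +
               0.2 * image_quality + 0.2 * consistency)
        return raw / (1.0 + 0.1 * map_complexity)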

Once the valuation is determined (which may comprise a value according to any preferred scale and valuation scheme), that value/valuation information and corresponding position information used to form the valuation may be shared with one or more second devices that share the mixed-reality environment/map (act 1140). Likewise, the position information and/or probability valuation(s) formed by the second devices, relative to their positions in the shared environment/map, may also be shared with the first device (act 1150). This sharing of information may occur through a push or pull scheme, in response to a request from the first device and/or without a request from the first device.

The illustrated flow diagram 1100 also includes an act of determining positioning of the first device within a limited sub-region of the mixed-reality map/environment that is selectively identified and based on the position information from the second device(s) (act 1160). This act may include, for example, examining and searching the limited sub-region of the map while affirmatively refraining from evaluating other portions/sub-regions of the map for features/objects that match the features (e.g., feature points, objects, landmarks) identified by the first device in act 1120.

In some instances, the first device determines which portions of the map to refrain from searching based on an analysis of the relative probability valuations of the first device and the second device(s). This may include ignoring some position information from some second devices and/or preferentially treating position information from devices that have higher probability valuations. For instance, if one second device (device A) has a low probability valuation for its determined location, while another device (device B) has a higher probability valuation for its determined location, then the first device may consider the position information of device B, while ignoring the position information from device A, when determining which sub-regions of the map to include and exclude from a search for features/objects that match the features/objects it has identified in its limited scan of the environment.

A known or determined proximity or lack of proximity between the first device and one or more of the secondary devices can also be considered when determining which position information to use when identifying portions of the map to include or exclude from a search for matching features/objects.
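
A small Python sketch of this selection logic is shown below; the report fields, the minimum valuation, and the maximum range are hypothetical placeholders used only to show how valuation and proximity could be combined.

    def pick_reference_device(reports, min_valuation=0.7, max_range_m=10.0):
        """Choose which companion report to trust when bounding the map search:
        prefer the highest probability valuation among devices that are close
        enough to matter, and ignore low-confidence or distant reports.
        `reports` is assumed to be a list of dicts with 'valuation' and
        'distance_m' keys."""
        usable = [r for r in reports
                  if r["valuation"] >= min_valuation and r["distance_m"] <= max_range_m]
        if not usable:
            return None  # fall back to searching without a reference device
        return max(usable, key=lambda r: r["valuation"])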

Once the first device is able to re-determine or verify its position within the shared map, and particularly within the searched sub-region of the map, based on the limited/qualified search of the map, the device will continue/resume tracking of the first device positioning based on the newly determined/verified position of the device within the mixed-reality environment (act 1170). The continued/resumed tracking may include the performance of updated position tracking of the device based on the determined new position of the device as well as newly identified/obtained sensor data based on new motion of the device in the environment.

Example Computer/Computer Systems

Attention will now be directed to FIG. 12, which illustrates another example of a computer system 1200 that may include and/or be used to perform any of the operations described herein. Computer system 1200 may take various different forms. For example, computer system 1200 may be embodied as a tablet 1200A, a desktop or laptop 1200B, a wearable HMD 1200C, a peripheral controller 1200D (shown abstractly as a box, but which may take any form), a mobile device, or any other type of standalone device, as represented by the ellipsis 1200E. Computer system 1200 may also be a distributed system that includes one or more connected computing components/devices that are in communication with computer system 1200. In some instances, the computer system 1200 is an MR inside out tracking device.

In its most basic configuration, computer system 1200 includes various different components. FIG. 12 shows that computer system 1200 includes one or more processor(s) 1210 (aka a “hardware processing unit”) and storage 1240. As discussed previously, the computer system 1200 may also include any number or type of cameras or other sensor(s) 1220.

Regarding the processor(s) 1210, it will be appreciated that the functionality described herein can be performed, at least in part, by one or more hardware logic components (e.g., the processor(s) 1210). For example, and without limitation, illustrative types of hardware logic components/processors that can be used include Field-Programmable Gate Arrays (“FPGA”), Program-Specific or Application-Specific Integrated Circuits (“ASIC”), Program-Specific Standard Products (“ASSP”), System-On-A-Chip Systems (“SOC”), Complex Programmable Logic Devices (“CPLD”), Central Processing Units (“CPU”), Graphical Processing Units (“GPU”), or any other type of programmable hardware.

Storage 1240 may be physical system memory, which may be volatile, non-volatile, or some combination of the two. The term “memory” may also be used herein to refer to non-volatile mass storage such as physical storage media. If computer system 1200 is distributed, the processing, memory, and/or storage capability may be distributed as well.

Storage 1240 is shown as including executable instructions (i.e., code 1250). The executable instructions represent instructions that are executable by the processor(s) 1210 of computer system 1200 to perform the disclosed operations, such as those described in the various methods.

The disclosed embodiments may comprise or utilize a special-purpose or general-purpose computer including computer hardware, such as, for example, one or more processors (such as processor(s) 1210) and system memory (such as storage 1240), as discussed in greater detail below. Embodiments also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general-purpose or special-purpose computer system. Computer-readable media that store computer-executable instructions in the form of data are “physical computer storage media” or a “hardware storage device.” Computer-readable media that carry computer-executable instructions are “transmission media.” Thus, by way of example and not limitation, the current embodiments can comprise at least two distinctly different kinds of computer-readable media: computer storage media and transmission media.

Computer storage media (aka “hardware storage device”) are computer-readable hardware storage devices, such as RAM, ROM, EEPROM, CD-ROM, solid state drives (“SSD”) that are based on RAM, Flash memory, phase-change memory (“PCM”), or other types of memory, or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired program code means in the form of computer-executable instructions, data, or data structures and that can be accessed by a general-purpose or special-purpose computer.

Computer system 1200 may also be connected (via a wired or wireless connection) to external sensors (e.g., one or more remote cameras) or devices and third-party systems and/or other remote systems 1280 via a network 1260. For example, computer system 1200 can communicate with any number devices (e.g., remote system(s) 1280) and other MR devices 1200E or cloud services to obtain or process data. In some cases, network 1260 may itself be a cloud network.

A “network,” like network 1260, is defined as one or more data links and/or data switches that enable the transport of electronic data between computer systems, modules, and/or other electronic devices. When information is transferred, or provided, over a network (either hardwired, wireless, or a combination of hardwired and wireless) to a computer, the computer properly views the connection as a transmission medium. Computer system 1200 will include one or more communication channels that are used to communicate with the network 1260. Transmission media include a network that can be used to carry data or desired program code means in the form of computer-executable instructions or in the form of data structures. Further, these computer-executable instructions can be accessed by a general-purpose or special-purpose computer. Combinations of the above should also be included within the scope of computer-readable media.

Upon reaching various computer system components, program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to computer storage media (or vice versa). For example, computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a network interface card or “NIC”) and then eventually transferred to computer system RAM and/or to less volatile computer storage media at a computer system. Thus, it should be understood that computer storage media can be included in computer system components that also (or even primarily) utilize transmission media.

Computer-executable (or computer-interpretable) instructions comprise, for example, instructions that cause a general-purpose computer, special-purpose computer, or special-purpose processing device to perform a certain function or group of functions. The computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the described features or acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.

Those skilled in the art will appreciate that the embodiments may be practiced in network computing environments with many types of computer system configurations, including personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, pagers, routers, switches, and the like. The embodiments may also be practiced in distributed system environments where local and remote computer systems that are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network each perform tasks (e.g., cloud computing, cloud services and the like). In a distributed system environment, program modules may be located in both local and remote memory storage devices.

The present invention may be embodied in other specific forms without departing from its characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

The article "Microsoft Patent | Mixed-reality device positioning based on shared location" was first published on Nweon Patent.

Microsoft Patent | Systems and methods for dark current compensation in single photon avalanche diode imagery https://patent.nweon.com/25142 Wed, 30 Nov 2022 17:01:48 +0000 https://patent.nweon.com/?p=25142 ...


Patent: Systems and methods for dark current compensation in single photon avalanche diode imagery

Patent PDF: Join the Nweon (映维网) membership to obtain

Publication Number: 20220385842

Publication Date: 2022-12-01

Assignee: Microsoft Technology Licensing, LLC (Redmond, WA, US)

Abstract

A system for dark current compensation in SPAD imagery is configurable to capture an image frame with the SPAD array and generate a temporally filtered image by performing a temporal filtering operation using the image frame and at least one preceding image frame. The at least one preceding image frame is captured by the SPAD array at a timepoint that temporally precedes a timepoint associated with the image frame. The system is also configurable to obtain a dark current image frame. The dark current image frame includes data indicating one or more SPAD pixels of the plurality of SPAD pixels that detect an avalanche event without detecting a corresponding photon. The system is also configurable to generate a dark current compensated image by performing a subtraction operation on the temporally filtered image or the image frame based on the dark current image frame.

Claims

We claim:

Description

BACKGROUND

Mixed-reality (MR) systems, including virtual-reality and augmented-reality systems, have received significant attention because of their ability to create truly unique experiences for their users. For reference, conventional virtual-reality (VR) systems create a completely immersive experience by restricting their users’ views to only a virtual environment. This is often achieved, in VR systems, through the use of a head-mounted device (HMD) that completely blocks any view of the real world. As a result, a user is entirely immersed within the virtual environment. In contrast, conventional augmented-reality (AR) systems create an augmented-reality experience by visually presenting virtual objects that are placed in or that interact with the real world.

As used herein, VR and AR systems are described and referenced interchangeably. Unless stated otherwise, the descriptions herein apply equally to all types of mixed-reality systems, which (as detailed above) include AR systems, VR systems, and/or any other similar system capable of displaying virtual objects.

Some MR systems include one or more cameras for facilitating image capture, video capture, and/or other functions. For instance, an MR system may utilize images and/or depth information obtained using its camera(s) to provide pass-through views of a user’s environment to the user. An MR system may provide pass-through views in various ways. For example, an MR system may present raw images captured by the camera(s) of the MR system to a user. In other instances, an MR system may modify and/or reproject captured image data to correspond to the perspective of a user’s eye to generate pass-through views. An MR system may modify and/or reproject captured image data to generate a pass-through view using depth information for the captured environment obtained by the MR system (e.g., using a depth system of the MR system, such as a time-of-flight camera, a rangefinder, stereoscopic depth cameras, etc.). In some instances, an MR system utilizes one or more predefined depth values to generate pass-through views (e.g., by performing planar reprojection).

In some instances, pass-through views generated by modifying and/or reprojecting captured image data may at least partially correct for differences in perspective brought about by the physical separation between a user’s eyes and the camera(s) of the MR system (known as the “parallax problem,” “parallax error,” or, simply “parallax”). Such pass-through views/images may be referred to as “parallax-corrected pass-through” views/images. By way of illustration, parallax-corrected pass-through images may appear to a user as though they were captured by cameras that are co-located with the user’s eyes.

A pass-through view can aid users in avoiding disorientation and/or safety hazards when transitioning into and/or navigating within a mixed-reality environment. Pass-through views may also enhance user views in low visibility environments. For example, mixed-reality systems configured with long wavelength thermal imaging cameras may facilitate visibility in smoke, haze, fog, and/or dust. Likewise, mixed-reality systems configured with low light imaging cameras facilitate visibility in dark environments where the ambient light level is below the level required for human vision.

To facilitate imaging of an environment for generating a pass-through view, some MR systems include image sensors that utilize complementary metal-oxide-semiconductor (CMOS) and/or charge-coupled device (CCD) technology. For example, such technologies may include image sensing pixel arrays where each pixel is configured to generate electron-hole pairs in response to detected photons. The electrons may become stored in per-pixel capacitors, and the charge stored in the capacitors may be read out to provide image data (e.g., by converting the stored charge to a voltage).

However, such image sensors suffer from a number of shortcomings. For example, the signal-to-noise ratio for a conventional image sensor may be highly affected by read noise, especially when imaging under low visibility conditions. For instance, under low light imaging conditions (e.g., where ambient light is below about 10 lux, such as about 1 millilux or below), a CMOS or CCD imaging pixel may detect only a small number of photons, which may cause the read noise to approach or exceed the signal detected by the imaging pixel and decrease the signal-to-noise ratio.

The dominance of read noise in a signal detected by a CMOS or CCD image sensor is often exacerbated when imaging at a high frame rate under low light conditions. Although a lower framerate may be used to allow a CMOS or CCD sensor to detect enough photons that the signal is not dominated by read noise, utilizing a low framerate often leads to motion blur in captured images. Motion blur is especially problematic when imaging is performed on an HMD or other device that undergoes regular motion during use.

In addition to affecting pass-through imaging, the read noise and/or motion blur associated with conventional image sensors may also affect other operations performed by HMDs, such as late stage reprojection, rolling shutter corrections, object tracking (e.g., hand tracking), surface reconstruction, semantic labeling, 3D reconstruction of objects, and/or others.

To address shortcomings associated with CMOS and/or CCD image sensors, devices have emerged that utilize single photon avalanche diode (SPAD) image sensors. In contrast with conventional CMOS or CCD sensors, a SPAD is operated at a bias voltage that enables the SPAD to detect a single photon. Upon detecting a single photon, an electron-hole pair is formed, and the electron is accelerated across a high electric field, causing avalanche multiplication (e.g., generating additional electron-hole pairs). Thus, each detected photon may trigger an avalanche event. A SPAD may operate in a gated manner (each gate corresponding to a separate shutter operation), where each gated shutter operation may be configured to result in a binary output. The binary output may comprise a “1” where an avalanche event was detected during an exposure (e.g., where a photon was detected), or a “0” where no avalanche event was detected. Separate shutter operations may be performed consecutively and integrated over a frame capture time period. The binary output of the consecutive shutter operations over a frame capture time period may be counted, and an intensity value may be calculated based on the counted binary output.

An array of SPADs may form an image sensor, with each SPAD forming a separate pixel in the SPAD array. To capture an image of an environment, each SPAD pixel may detect avalanche events and provide binary output for consecutive shutter operations in the manner described herein. The per-pixel binary output of consecutive shutter operations over a frame capture time period may be counted, and per-pixel intensity values may be calculated based on the counted per-pixel binary output. The per-pixel intensity values may be used to form an intensity image of an environment.
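
To make the counting scheme described above concrete, the following is a minimal Python/NumPy sketch of integrating a stack of binary gated-shutter readouts into a per-pixel intensity image. The array shapes, the function name, and the normalization by the number of gates are illustrative assumptions rather than details taken from this disclosure.

```python
import numpy as np

def integrate_spad_readouts(binary_readouts):
    """Integrate gated SPAD shutter readouts into a per-pixel intensity image.

    binary_readouts: array of shape (num_gates, height, width) holding 1 where
    an avalanche event was detected during a gated shutter operation and 0
    otherwise. Returns a float intensity image normalized by the gate count.
    """
    counts = binary_readouts.sum(axis=0)                 # per-pixel avalanche event counts
    return counts.astype(np.float32) / binary_readouts.shape[0]

# Example: 64 gated shutter operations over a 120x160 SPAD array with sparse detections.
rng = np.random.default_rng(0)
readouts = (rng.random((64, 120, 160)) < 0.05).astype(np.uint8)
intensity_image = integrate_spad_readouts(readouts)
```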

Although SPAD sensors show promise for overcoming various shortcomings associated with CMOS or CCD sensors, implementing SPAD sensors for image and/or video capture is still associated with many challenges. For example, there is an ongoing need and desire for improvements to the image quality of SPAD imagery, particularly for SPAD imagery captured under low light conditions (including color imagery captured using SPADs under low light conditions).

The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.

BRIEF SUMMARY

Disclosed embodiments include systems, methods, and devices for dark current compensation in SPAD imagery.

Some disclosed systems include a SPAD array with a plurality of SPAD pixels. The systems also include one or more processors and one or more hardware storage devices storing instructions that are executable by the one or more processors to configure the system to perform various acts associated with performing dark current compensation in SPAD imagery. These acts include capturing an image frame with the SPAD array and generating a temporally filtered image by performing a temporal filtering operation using the image frame and a preceding image frame captured by the SPAD array at a timepoint that temporally precedes a timepoint associated with the image frame.

The disclosed acts for performing dark current compensation in SPAD imagery also include obtaining a dark current image frame that includes data indicating that one or more SPAD pixels of the plurality of SPAD pixels have detected an avalanche event without detecting a corresponding photon.

The disclosed acts for performing dark current compensation in SPAD imagery also include generating a dark current compensated image by performing a subtraction operation on the temporally filtered image based on the dark current image frame.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

Additional features and advantages will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the teachings herein. Features and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. Features of the present invention will become more fully apparent from the following description and appended claims or may be learned by the practice of the invention as set forth hereinafter.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to describe the manner in which the above-recited and other advantages and features can be obtained, a more particular description of the subject matter briefly described above will be rendered by reference to specific embodiments which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments and are not therefore to be considered to be limiting in scope, embodiments will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:

FIG. 1 illustrates example components of an example system that may include or be used to implement one or more disclosed embodiments;

FIG. 2 illustrates an example of capturing an image frame of an object in a low light environment using a single photon avalanche diode (SPAD) array of a head-mounted display (HMD);

FIG. 3 illustrates an example of generating a temporally filtered image using consecutively captured image frames captured by a SPAD array of an HMD;

FIG. 4 illustrates an example of capturing a dark current image frame using a SPAD sensor;

FIG. 5 illustrates an example of generating a dark current compensated image using a dark current image frame;

FIG. 6 illustrates an example of capturing multiple dark current image frames using a SPAD sensor under different temperature conditions;

FIG. 7 illustrates an example of generating a dark current compensated image using a dark current image selected based on temperature conditions at runtime;

FIG. 8 illustrates an example dark current image frame that includes a region of unexposed SPAD pixels that will be unexposed while capturing image frames at runtime;

FIG. 9 illustrates an example of generating a dark current compensated image using a dark current image frame that is scaled using a dark current factor, where the dark current factor is determined based on runtime sensor data of unexposed SPAD pixels;

FIG. 10 illustrates example additional operations that may be performed on a dark current compensated image;

FIG. 11 illustrates an example of generating a dark current compensated image by subtracting a dark current value from pixels of an image frame captured at runtime, where the dark current value is determined based on runtime sensor data of unexposed SPAD pixels;

FIGS. 12 and 13 illustrate example flow diagrams depicting acts associated with compensating for dark current in SPAD imagery;

FIG. 14 illustrates example SPAD pixels of a SPAD array that include color filters;

FIG. 15 illustrates an example of capturing an image frame of colored objects in a low light environment using a color filtered SPAD array of an HMD;

FIG. 16 illustrates an example processing pipeline for obtaining color images using a SPAD array;

FIG. 17 illustrates an example of generating a filtered image by performing temporal filtering on an image frame prior to demosaicing;

FIG. 18A illustrates an example of demultiplexing consecutively captured image frames to generate color-specific image frames at multiple timepoints;

FIG. 18B illustrates an example of generating filtered color-specific image frames by performing temporal filtering using color-specific image frames associated with different timepoints;

FIG. 18C illustrates an example of multiplexing filtered color-specific image frames to generate a filtered image;

FIG. 19 illustrates an example of generating a filtered image by performing bilateral filtering on an image frame prior to demosaicing;

FIG. 20 illustrates an example of demultiplexing an image frame to generate color-specific image frames, generating filtered color-specific image frames by performing spatial filtering on the color-specific image frames, and multiplexing the filtered color-specific image frames to generate a filtered image; and

FIGS. 21 and 22 illustrate example flow diagrams depicting acts associated with obtaining color imagery using SPADs.

DETAILED DESCRIPTION

Disclosed embodiments are generally directed to systems, methods, and devices for compensating for dark current in single photon avalanche diode (SPAD) imagery.

Examples of Technical Benefits, Improvements, and Practical Applications

Those skilled in the art will recognize, in view of the present disclosure, that at least some of the disclosed embodiments may be implemented to address various shortcomings associated with at least some conventional image acquisition techniques. The following section outlines some example improvements and/or practical applications provided by the disclosed embodiments. It will be appreciated, however, that the following are examples only and that the embodiments described herein are in no way limited to the example improvements discussed herein.

The techniques described herein may facilitate a number of advantages over conventional systems, devices, and/or methods for SPAD image acquisition (including color image acquisition), particularly for imaging under low light conditions and/or for imaging from devices that undergo motion during image capture (e.g., HMDs).

Initially, the binarization of the SPAD signal effectively eliminates read noise, thereby improving signal-to-noise ratio for SPAD image sensor arrays as compared with conventional CMOS and/or CCD sensors. Accordingly, because of the binarization of the SPAD signal, a SPAD signal may be read out at a high framerate (e.g., 90 Hz or greater, such as 120 Hz or even 240 Hz) without causing the signal to be dominated by read noise, even for signals capturing a low number of photons in low light environments.

In view of the foregoing, multiple exposure (and readout) operations may be performed at a high framerate using a SPAD array to generate separate partial image frames, and these image frames may be temporally filtered with one another. The separate partial image frames may be aligned using motion data and combined (e.g., by averaging or other filtering) to form a single composite image. In this regard, SPAD images may be obtained in a temporally filtered manner (e.g., with persistence), using prior-timepoint image data to improve the quality of current-timepoint image data. In contrast, attempting to utilize multiple image frames captured at a high framerate to form a single composite image using a conventional CMOS or CCD camera would result in signals dominated by read noise, particularly under low light imaging conditions.

An additional challenge associated with image acquisition using SPADs is signal noise brought about by dark current. Dark current (sometimes referred to as reverse bias leakage current) refers to a small electric current that flows through photosensitive devices (e.g., SPADs) even when no photons are entering the device. Dark current can be thermally induced or brought about by crystallographic and/or manufacturing irregularities and/or defects.

In SPADs, dark current can cause an electron-hole pair to be generated in the depletion region and can trigger avalanche events, even when the SPAD is not detecting a photon. Avalanche events brought about by dark current are typically counted as detected photons, which can cause the binary output of a SPAD to include false counts (or “dark counts”). In SPAD imagery, dark counts can cause the intensity values assigned to at least some SPAD pixels to be inaccurately high, which can add noise to SPAD imagery. In some instances, the effects of dark counts are prominent when imaging under low light conditions, resulting in high-frequency noise that degrades user experiences.

Accordingly, disclosed techniques may facilitate dark current compensation by subtracting a dark current SPAD image from a SPAD image captured at runtime. A dark current image may be obtained as part of a calibration step by capturing one or more SPAD images while blocking or occluding the SPAD sensor. The dark current image may indicate which SPAD pixels generate dark counts and/or the quantity of dark counts generated by different SPAD pixels. The dark current image may therefore be used to subtract dark counts from SPAD imagery captured at runtime to compensate for the effects of dark current in the SPAD sensor. Such subtraction may reduce the amount of noise present in SPAD imagery, thereby improving user experiences.

Where temporal filtering is included in a processing pipeline for generating SPAD imagery, it has been found to be advantageous to perform temporal filtering prior to performing dark current compensation. Stated differently, it has been found to be advantageous to perform dark current compensation on temporally filtered images, rather than to perform temporal filtering on dark current-compensated images. For example, because intensity values stored for images truncate at zero (e.g., negative intensity values are not stored), performing dark current subtraction before performing temporal filtering can generate a bias toward larger intensity values. Such biasing toward higher intensity values may occur, for example, where a dark current image stores a higher intensity value for a SPAD pixel than an intensity value detected by the SPAD pixel at runtime. In such cases, a subtraction operation will bring the intensity value for the SPAD pixel to zero, but the further difference between the higher dark current intensity value and the lower runtime intensity value will be lost. Accordingly, if temporal filtering is subsequently performed, the intensity value for the SPAD pixel may be higher than desired for effectively removing the dark counts.

Thus, in some instances, temporal filtering and dark current compensation may be performed in a particular order (with temporal filtering occurring first) to facilitate improved image acquisition using SPADs (e.g., with reduced noise), particularly for imaging under low light conditions. That said, in some instances, dark current compensation is performed prior to temporal filtering.
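
The zero-truncation bias motivating this ordering can be illustrated with a small numeric sketch in Python/NumPy; the specific pixel values below are hypothetical and chosen only to show the effect.

```python
import numpy as np

dark = 3.0                              # dark counts recorded for one pixel in a dark current image
runtime = np.array([1.0, 2.0, 6.0])     # counts for that pixel in three consecutive frames

# Option A: subtract dark current per frame first, clamping at zero, then temporally filter.
option_a = np.maximum(runtime - dark, 0.0).mean()   # (0 + 0 + 3) / 3 = 1.0

# Option B: temporally filter first, then subtract (clamping only the final result).
option_b = max(runtime.mean() - dark, 0.0)          # max(3 - 3, 0) = 0.0

# Option A is biased upward because the per-frame clamp discards negative residuals,
# which is why dark current compensation after temporal filtering is preferred here.
```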

Additional challenges are associated with acquiring color images using SPADs, particularly under low light imaging conditions. For example, to capture color images, color filters are positioned over SPAD pixels (red, green, blue (RGB) color filters, or other types of filters) in a pattern (e.g., a Bayer pattern or other type of pattern) to collect light of different wavelengths to generate color values for image pixels of a color image. To generate color values, conventional systems obtain raw image data (e.g., per-pixel counts of avalanche events, or per-pixel intensity values based on the counts of avalanche events) and then perform a demosaicing operation on the raw image data. Demosaicing involves generating (e.g., via interpolation) per-pixel color values (e.g., RGB values) for each image sensing pixel of an image sensor (even though each image sensing pixel typically includes a filter for only a single color channel positioned thereover). Demosaicing may allow a color image to match the resolution of the image sensor, which is preferable, in some instances, relative to generating single color values (e.g., RGB values) for each cluster of color filtered image sensing pixels (e.g., each 2×2 set of a red pixel, a green pixel, a green pixel, and a blue pixel in a Bayer pattern). The latter approach results in downsampling or downscaling of the color image relative to the image sensor.
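
For context on what a demosaicing operation does, here is a minimal bilinear-interpolation sketch for an RGGB Bayer mosaic in Python with NumPy and SciPy. The RGGB layout, the kernel weights, and the function name are illustrative assumptions; production demosaicing algorithms are typically more sophisticated than this.

```python
import numpy as np
from scipy.ndimage import convolve

def bilinear_demosaic_rggb(raw):
    """Bilinear demosaic of a single-channel RGGB Bayer mosaic.

    raw: (H, W) float array of per-pixel intensities measured behind the color filters.
    Returns an (H, W, 3) RGB image at the full sensor resolution.
    """
    h, w = raw.shape
    r_mask = np.zeros((h, w)); r_mask[0::2, 0::2] = 1.0
    b_mask = np.zeros((h, w)); b_mask[1::2, 1::2] = 1.0
    g_mask = 1.0 - r_mask - b_mask

    # Standard bilinear interpolation kernels for Bayer demosaicing.
    k_g = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]], dtype=float) / 4.0
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float) / 4.0

    r = convolve(raw * r_mask, k_rb, mode='mirror')
    g = convolve(raw * g_mask, k_g, mode='mirror')
    b = convolve(raw * b_mask, k_rb, mode='mirror')
    return np.dstack([r, g, b])
```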

Under low light imaging conditions, raw image data captured by color filtered SPAD sensors often includes significant noise. Furthermore, demosaicing operations are associated with adding noise to processed images, which further compounds the noise problem when using SPADs to perform color imaging under low light conditions.

Although temporal and spatial filtering operations may ordinarily reduce noise in SPAD imagery, such techniques are, in many instances, not well-suited for reducing noise in demosaiced SPAD imagery. For instance, noise added to the image data via demosaicing often reduces the effectiveness of such filtering operations for improving color imagery captured using SPADs.

Accordingly, disclosed techniques may facilitate reduced noise in color images acquired using SPADs by performing filtering operations (e.g., temporal filtering and/or spatial filtering) on raw image data captured using SPADs with color filters positioned thereover. In accordance with the present disclosure, raw image data is filtered to generate a filtered image, and demosaicing is subsequently performed on the filtered image (rather than on the raw image data). By performing demosaicing after one or more filtering operations, embodiments of the present disclosure refrain from further adding noise to the raw image data (e.g., as a result of demosaicing) prior to performing the filtering operations, thereby improving the benefits facilitated by the filtering operations and improving color image acquisition using SPADs.

Disclosed embodiments also extend to performing filtering operations (e.g., temporal and/or spatial filtering) in a manner that accounts for different color filters associated with different SPAD pixels of a SPAD sensor. For example, as noted above, temporal filtering can include aligning consecutively acquired image frames and filtering aligned image pixels together to generate a final image. However, for demosaiced images, different image pixels that become filtered together may be associated with different color channels (e.g., where large amounts of motion are associated with the consecutively acquired images), which can distort colors and/or intensities within the output image. Furthermore, spatial filtering can cause neighboring pixels of different color channels to become filtered together, which may distort colors and/or intensities within the output image.

To combat these issues, raw image data captured using color filtered SPADs may be demultiplexed to separate the image data into separate images associated with the different color channels represented by the color filters of the SPADs. Separate filtering operations (e.g., temporal and/or spatial filtering operations) may then be performed on the separate images associated with the different color channels, and, after filtering, the separate images may be recombined (e.g., multiplexed) into a single image and subsequently demosaiced to provide a final color image.
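
The demultiplex/filter/multiplex flow described above might look like the following Python/NumPy sketch for an RGGB mosaic with even dimensions. The channel layout, the placeholder Gaussian spatial filter, and the function names are assumptions for illustration, not the disclosed implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def demultiplex_rggb(raw):
    """Split an (H, W) RGGB mosaic (H and W even) into four (H/2, W/2) color planes."""
    return {
        "r":  raw[0::2, 0::2],
        "g0": raw[0::2, 1::2],
        "g1": raw[1::2, 0::2],
        "b":  raw[1::2, 1::2],
    }

def multiplex_rggb(planes, shape):
    """Recombine per-channel planes back into a single RGGB mosaic."""
    mosaic = np.empty(shape, dtype=np.float32)
    mosaic[0::2, 0::2] = planes["r"]
    mosaic[0::2, 1::2] = planes["g0"]
    mosaic[1::2, 0::2] = planes["g1"]
    mosaic[1::2, 1::2] = planes["b"]
    return mosaic

def filter_then_multiplex(raw):
    """Filter each color channel separately, then recombine; demosaicing happens afterwards."""
    planes = demultiplex_rggb(raw)
    # Placeholder per-channel spatial filter; temporal filtering would be applied per
    # channel in the same way, using aligned planes from prior frames.
    filtered = {name: gaussian_filter(plane.astype(np.float32), sigma=1.0)
                for name, plane in planes.items()}
    return multiplex_rggb(filtered, raw.shape)
```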

Having just described some of the various high-level features and benefits of the disclosed embodiments, attention will now be directed to FIGS. 1 through 22. These Figures illustrate various conceptual representations, architectures, methods, and supporting illustrations related to the disclosed embodiments.

Example Systems and Techniques for Dark Current Compensation in SPAD Imagery

Attention is now directed to FIG. 1, which illustrates an example system 100 that may include or be used to implement one or more disclosed embodiments. FIG. 1 depicts the system 100 as a head-mounted display (HMD) configured for placement over a head of a user to display virtual content for viewing by the user’s eyes. Such an HMD may comprise an augmented reality (AR) system, a virtual reality (VR) system, and/or any other type of HMD. Although the present disclosure focuses, in at least some respects, on a system 100 implemented as an HMD, it should be noted that the techniques described herein may be implemented using other types of systems/devices, without limitation.

FIG. 1 illustrates various example components of the system 100. For example, FIG. 1 illustrates an implementation in which the system includes processor(s) 102, storage 104, sensor(s) 110, I/O system(s) 116, and communication system(s) 118. Although FIG. 1 illustrates a system 100 as including particular components, one will appreciate, in view of the present disclosure, that a system 100 may comprise any number of additional or alternative components.

The processor(s) 102 may comprise one or more sets of electronic circuitries that include any number of logic units, registers, and/or control units to facilitate the execution of computer-readable instructions (e.g., instructions that form a computer program). Such computer-readable instructions may be stored within storage 104. The storage 104 may comprise physical system memory and may be volatile, non-volatile, or some combination thereof. Furthermore, storage 104 may comprise local storage, remote storage (e.g., accessible via communication system(s) 118 or otherwise), or some combination thereof. Additional details related to processors (e.g., processor(s) 102) and computer storage media (e.g., storage 104) will be provided hereinafter.

In some implementations, the processor(s) 102 may comprise or be configurable to execute any combination of software and/or hardware components that are operable to facilitate processing using machine learning models or other artificial intelligence-based structures/architectures. For example, processor(s) 102 may comprise and/or utilize hardware components or computer-executable instructions operable to carry out function blocks and/or processing layers configured in the form of, by way of non-limiting example, single-layer neural networks, feed forward neural networks, radial basis function networks, deep feed-forward networks, recurrent neural networks, long-short term memory (LSTM) networks, gated recurrent units, autoencoder neural networks, variational autoencoders, denoising autoencoders, sparse autoencoders, Markov chains, Hopfield neural networks, Boltzmann machine networks, restricted Boltzmann machine networks, deep belief networks, deep convolutional networks (or convolutional neural networks), deconvolutional neural networks, deep convolutional inverse graphics networks, generative adversarial networks, liquid state machines, extreme learning machines, echo state networks, deep residual networks, Kohonen networks, support vector machines, neural Turing machines, and/or others.

As will be described in more detail, the processor(s) 102 may be configured to execute instructions 106 stored within storage 104 to perform certain actions associated with imaging using SPAD arrays. The actions may rely at least in part on data 108 (e.g., avalanche event counting or tracking, etc.) stored on storage 104 in a volatile or non-volatile manner.

In some instances, the actions may rely at least in part on communication system(s) 118 for receiving data from remote system(s) 120, which may include, for example, separate systems or computing devices, sensors, and/or others. The communication system(s) 118 may comprise any combination of software or hardware components that are operable to facilitate communication between on-system components/devices and/or with off-system components/devices. For example, the communication system(s) 118 may comprise ports, buses, or other physical connection apparatuses for communicating with other devices/components. Additionally, or alternatively, the communication system(s) 118 may comprise systems/components operable to communicate wirelessly with external systems and/or devices through any suitable communication channel(s), such as, by way of non-limiting example, Bluetooth, ultra-wideband, WLAN, infrared communication, and/or others.

FIG. 1 illustrates that a system 100 may comprise or be in communication with sensor(s) 110. Sensor(s) 110 may comprise any device for capturing or measuring data representative of perceivable phenomenon. By way of non-limiting example, the sensor(s) 110 may comprise one or more image sensors, microphones, thermometers, barometers, magnetometers, accelerometers, gyroscopes, and/or others.

FIG. 1 also illustrates that the sensor(s) 110 include SPAD array(s) 112. As depicted in FIG. 1, a SPAD array 112 comprises an arrangement of SPAD pixels 122 that are each configured to facilitate avalanche events in response to sensing a photon, as described hereinabove. SPAD array(s) 112 may be implemented on a system 100 (e.g., an MR HMD) to facilitate image capture for various purposes (e.g., to facilitate computer vision tasks, pass-through imagery, and/or others).

FIG. 1 also illustrates that the sensor(s) 110 include inertial measurement unit(s) 114 (IMU(s) 114). IMU(s) 114 may comprise any number of accelerometers, gyroscopes, and/or magnetometers to capture motion data associated with the system 100 as the system moves within physical space. The motion data may comprise or be used to generate pose data, which may describe the position and/or orientation (e.g., 6 degrees of freedom pose) and/or change of position (e.g., velocity and/or acceleration) and/or change of orientation (e.g., angular velocity and/or angular acceleration) of the system 100.

Furthermore, FIG. 1 illustrates that a system 100 may comprise or be in communication with I/O system(s) 116. I/O system(s) 116 may include any type of input or output device such as, by way of non-limiting example, a touch screen, a mouse, a keyboard, a controller, and/or others, without limitation. For example, the I/O system(s) 116 may include a display system that may comprise any number of display panels, optics, laser scanning display assemblies, and/or other components.

Attention is now directed to FIG. 2, which illustrates an example of capturing an image frame 210 of an object 206 (e.g., a table) in a low light environment 208 using a single photon avalanche diode (SPAD) array of a head-mounted display 202 (HMD 202). The HMD 202 corresponds, in at least some respects, to the system 100 disclosed hereinabove. For example, the HMD 202 includes a SPAD array (e.g., SPAD array(s) 112) that includes SPAD pixels configured for photon detection to capture images. In the example shown in FIG. 2, the HMD 202 is positioned according to pose 204 while capturing the image frame 210 of the object 206 in the low light environment 208. The pose 204 may be tracked or measured utilizing sensors (e.g., IMU(s) 114, camera(s) to facilitate simultaneous localization and mapping (SLAM), etc.) of the HMD 202.

FIG. 2 illustrates that the image frame 210 includes image data 212 depicting a noisy representation of the object 206. In some instances, this occurs when imaging under low light conditions (e.g., about 1 millilux or below) due to the low number of photons detected by SPAD pixels over the frame capture time period for capturing the image frame 210. FIG. 2 also illustrates the image frame 210 as including dark counts 214, which are depicted as high-frequency noise interspersed throughout the image frame 210. As discussed above, dark counts 214 may result from dark current occurring in SPAD pixels. The following discussion refers to various techniques that may be employed to provide an improved representation of the object 206 in SPAD imagery (e.g., by reducing the noise in the image data 212 depicting the object 206 and by compensating for the dark counts 214).

FIG. 3 illustrates an example of generating a temporally filtered image using consecutively captured image frames captured by a SPAD array of an HMD. In particular, FIG. 3 shows the image frame 210 (and its image data 212 and dark counts 214), as well as additional image frames 302 and 306 (e.g., captured by the HMD 202). Each of the additional image frames 302 and 306 includes image data depicting the object 206 (i.e., image data 304 and 308, respectively) and includes dark counts. FIG. 3 also indicates that the different image frames 210, 302, and 306 are captured at different timepoints. In particular, FIG. 3 indicates that image frame 302 was captured at timepoint 310, image frame 306 was captured at timepoint 312, and image frame 210 was captured at timepoint 314. In the present example, timepoint 310 temporally precedes timepoints 312 and 314, and timepoint 312 temporally precedes timepoint 314.

As indicated above, image data of consecutively captured image frames may be combined to form a composite image to facilitate adequate exposure of objects captured within the image frames (e.g., particularly under low light conditions). Accordingly, FIG. 3 illustrates temporal filtering 316 performed on the image frames 302, 306, and 210. Temporal filtering 316 includes combining corresponding image pixels of the different image frames 302, 306, and 210 to generate pixel values for an output image (i.e., temporally filtered image 318). “Corresponding image pixels” in different image frames are image pixels of different image frames that capture the same portion of a captured environment.

Corresponding image pixels of the different image frames 302, 306, and 210 may be combined or composited in various ways, such as by summing, averaging (e.g., weighted averaging), alpha blending, and/or others, and the manner/parameters of combining corresponding image pixels may differ for different pixel regions and/or may be dynamically determined based on various factors (e.g., signal strength, amount of motion, motion detected in a captured scene, etc.).

In some instances, image frames 302, 306, and 210 capture the object 206 from poses that are at least slightly different from one another. For example, the HMD 202 may capture image frames 302 and 306 from poses that at least slightly differ from pose 204 and/or from one another. Accordingly, in some instances, to align corresponding pixels of different image frames 302, 306, 210, temporal filtering 316 may utilize motion data 324, which may comprise or be used to generate pose data that describes the position and/or orientation (e.g., 6 degrees of freedom pose) and/or change of position (e.g., velocity and/or acceleration) and/or change of orientation (e.g., angular velocity and/or angular acceleration) of the HMD 202 during the capturing of the image frames 302, 306, and 210.

The motion data 324 may be used to align the image frames 302, 306, and 210 with one another. For example, a system may use the motion data 324 to align image frames 302 and 306 with pose 204 of image frame 210, thereby generating aligned image frames that are spatially aligned with one another (e.g., appearing as though they were all captured from pose 204 with the same capture perspective). In this regard, the temporal filtering 316 may comprise motion compensated temporal filtering.

In some instances, temporal filtering 316 additionally or alternatively utilizes optical flow estimations to align the image frames 302, 306, and 210 to facilitate image compositing to generate a composite image (i.e., temporally filtered image 318). For example, in some instances, a system downsamples the consecutively captured image frames and performs optical flow analysis to obtain vectors for aligning the pixels of the image frames. Furthermore, although the present disclosure focuses, in at least some respects, on temporal filtering operations that utilize image frames that temporally precede an image frame associated with a target timepoint to generate a composite image associated with the target timepoint, temporal filtering operations may additionally or alternatively utilize at least some image frames that are temporally subsequent to an image frame associated with a target timepoint to generate a composite image associated with the target timepoint.

FIG. 3 illustrates that the temporal filtering 316 generates a temporally filtered image 318 based on the composited image data 304, 308, and 212 of the image frames 302, 306, and 210 (e.g., after motion compensation). In the example depicted in FIG. 3, the temporally filtered image 318 includes image data 320 that represents the object 206 with reduced noise and improved signal (e.g., relative to the individual representations of the object 206 in the image data 304, 308, 212 of the image frames 302, 306, and 210, respectively). However, FIG. 3 illustrates that the temporally filtered image 318 still includes dark counts 322, which negatively affect image quality.
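
A heavily simplified sketch of motion compensated temporal filtering is shown below in Python/NumPy. Real alignment would reproject each frame using the full 6-degrees-of-freedom pose derived from the motion data; here alignment is reduced to whole-pixel shifts, and the blending weight alpha is an illustrative choice rather than a disclosed parameter.

```python
import numpy as np

def align(frame, shift_yx):
    """Crude stand-in for motion compensation: shift a frame by whole pixels."""
    return np.roll(frame, shift=shift_yx, axis=(0, 1))

def temporal_filter(frames, shifts, alpha=0.6):
    """Blend older frames into the newest one after aligning them to its perspective.

    frames: list of (H, W) arrays ordered oldest -> newest.
    shifts: per-frame (dy, dx) offsets that bring each frame into the newest frame's view.
    alpha:  weight given to the newer data at each blending step.
    """
    filtered = align(frames[0], shifts[0]).astype(np.float32)
    for frame, shift in zip(frames[1:], shifts[1:]):
        aligned = align(frame, shift).astype(np.float32)
        filtered = alpha * aligned + (1.0 - alpha) * filtered   # running weighted average
    return filtered
```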

Accordingly, embodiments of the present disclosure provide dark count compensation techniques for facilitating improved SPAD imagery. FIG. 4 illustrates an example of capturing a dark current image frame 406 using a SPAD sensor 402. In the present example, the SPAD sensor 402 is part of the HMD 202 and comprises a SPAD array with a plurality of SPAD pixels. FIG. 4 illustrates a cover 404 occluding or obscuring the SPAD pixels of the SPAD sensor 402. The cover 404 may comprise any material or device that blocks light in any desired wavelength range (e.g., the visible spectrum, the near-IR spectrum, the IR spectrum, and/or others).

FIG. 4 illustrates an example in which the dark current image frame 406 is captured with the cover 404 positioned to prevent photons from reaching the SPAD pixels of the SPAD array of the SPAD sensor 402. The dark current image frame 406 may be obtained as a part of a calibration step performed in preparation for use of the HMD 202 in user applications (e.g., prior to the capturing of the image frames 302, 306, and/or 210). The dark current image frame 406 may comprise a single image frame captured by the SPAD sensor 402 while obscured by the cover 404, or the dark current image frame 406 may be generated based on any number of image frames captured by the SPAD sensor 402 while obscured by the cover 404. For example, the dark current image frame 406 may be generated by temporally averaging per-pixel intensity values of any number of image frames captured by the SPAD sensor 402 while blocked by the cover 404.

As is evident from FIG. 4, the dark current image frame 406 includes dark counts and therefore includes data indicating which SPAD pixels of the SPAD sensor 402 are associated with detecting avalanche events without being exposed to photons. This information may be used to compensate for dark current in image frames captured at runtime.

FIG. 5 illustrates an example of generating a dark current compensated image using a dark current image frame. In particular, FIG. 5 shows the temporally filtered image 318 (discussed above with reference to FIG. 3) and the dark current image frame 406 being provided as inputs to subtraction 502. Subtraction 502 may comprise subtracting intensity values of the dark current image frame 406 from intensity values of the temporally filtered image 318 on a per-pixel basis. FIG. 5 illustrates a dark current compensated image 504 provided as output of the subtraction 502. As is evident from FIG. 5, the dark current compensated image 504 substantially omits the dark counts that were present in the temporally filtered image 318, in view of the subtraction 502 based on the dark current image frame 406. Accordingly, the effects of dark current in SPAD imagery, particularly SPAD imagery captured under low light conditions, may be ameliorated.
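
A minimal sketch of the per-pixel subtraction is given below; the clamp at zero reflects that negative intensity values are not stored, and the function and array names are assumptions for illustration.

```python
import numpy as np

def subtract_dark_current(temporally_filtered, dark_current_frame):
    """Per-pixel subtraction of a calibration dark current image, clamped at zero."""
    compensated = temporally_filtered.astype(np.float32) - dark_current_frame.astype(np.float32)
    return np.clip(compensated, 0.0, None)
```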

FIGS. 2-5 have focused on a simple example of dark current compensation that utilizes a dark current image frame captured under controlled conditions (e.g., with a stop filter occluding the SPAD pixels of the SPAD sensor). However, in some instances, ambient conditions present while capturing a dark current image frame differ from ambient conditions present while capturing SPAD imagery at runtime. Because the severity of image noise brought about by dark current can vary with ambient conditions, such as temperature, discrepancies between dark current image frame capture conditions and runtime image frame capture conditions can cause systems to undercompensate or overcompensate for dark counts in SPAD imagery.

Accordingly, at least some implementations of the present disclosure account for differences between dark current image frame capture conditions and runtime image frame capture conditions. FIG. 6 illustrates a SPAD sensor 602, which corresponds to the SPAD sensor 402 of FIG. 4. The SPAD sensor 602 is similarly obscured by a cover 604 that prevents photons from reaching the SPAD pixels of the SPAD sensor 602. FIG. 6 illustrates a plurality of different dark current image frames 606, 608, and 610 captured using the SPAD sensor 602 while blocked by the cover 604 (e.g., during calibration). FIG. 6 illustrates that each of the different dark current image frames 606, 608, and 610 is associated with a different temperature value (or range of temperature values). For example, different dark current image frames may be captured under different temperature conditions, such that different dark current images are available to facilitate dark current compensation under different runtime temperature conditions. In the example shown in FIG. 6, dark current image frame 606 is associated with temperature 612, dark current image frame 608 is associated with temperature 614, and dark current image frame 610 is associated with temperature 616.

FIG. 7 illustrates an example of generating a dark current compensated image using a dark current image selected based on temperature conditions at runtime. In particular, FIG. 7 illustrates a temporally filtered image 702, which, according to the present example, captures the object 206 in the low light environment 208 discussed above and is captured by the HMD 202 in accordance with the principles discussed hereinabove with reference to FIGS. 2-3 (e.g., utilizing temporal filtering based on consecutively captured image frames). Temporally filtered image 702 is acquired at runtime (e.g., after the capturing of the dark current image frames of FIG. 6).

In the example of FIG. 7, the temporally filtered image 702 is associated with a temperature 704, which may correspond to an environment temperature and/or device temperature present for the capturing of one or more of the image frames used to generate the temporally filtered image 702. In some instances, the temperature 704 is captured using sensors (e.g., sensor(s) 110) of the HMD 202.

FIG. 7 also illustrates the dark current image frames 606, 608, and 610 discussed above with reference to FIG. 6, along with their respective temperatures 612, 614, and 616. FIG. 7 conceptually depicts that a system may select a dark current image frame to use for dark current compensation based on the temperature 704 associated with the temporally filtered image 702 obtained at runtime. For example, FIG. 7 illustrates a dashed line extending between the temperature 704 of the temporally filtered image 702 and the temperature 616 of the dark current image frame 610, indicating that a system may determine that the temperature 704 of the temporally filtered image 702 is most similar to the temperature 616 of dark current image frame 610 (relative to the temperatures 612 and 614 of the other available dark current image frames 606 and 608, respectively).

Based on this selection, FIG. 7 illustrates the temporally filtered image 702 and the dark current image frame 610 being provided as inputs to subtraction 706 (which corresponds in function to subtraction 502 discussed above with reference to FIG. 5). As shown in FIG. 7, the subtraction 706 provides a dark current compensated image 708, which substantially omits dark counts that were present in the temporally filtered image (e.g., they are subtracted out using the dark current image frame 610). Accordingly, a system may intelligently select from among available dark current image frames (each associated with a respective temperature or range of temperatures) based on a measured runtime temperature.

Although the foregoing example focuses on using temperature as a basis for selecting a dark current image frame to use to subtract dark counts from a temporally filtered image, temperature may, in some instances, be used to generate a scaled or interpolated dark current image for facilitating dark current compensation. For example, in some instances, a runtime temperature does not exactly match or is not within a particular range of a temperature value associated with a previously captured dark current image frame. To accommodate such circumstances, the runtime temperature and one or more of the temperatures associated with dark current image frames may be used to generate a dark current factor, which may comprise a ratio of the runtime temperature and a dark current image frame temperature (e.g., where the runtime temperature is 30° C. and the nearest temperature associated with a dark current image frame is 25° C., a dark current factor may be 1.2). A system may then use the dark current factor to generate a scaled dark current image (e.g., by applying the dark current factor to the per-pixel intensity values of a nearest-temperature dark current image frame) and use the scaled dark current image frame to facilitate dark current compensation (e.g., via subtraction as discussed above). In this same vein, temperature values associated with a runtime image and one or more dark current image frames may be used to generate an interpolated or extrapolated dark current image frame to be used for dark current compensation (e.g., where a runtime temperature lies between two temperature values associated with different dark current images).

Thus, temperature may be used as a factor (e.g., a “dark current factor”) for selecting or generating a dark current image frame to use for facilitating dark current compensation. Additional or alternative dark current factors are within the scope of the present disclosure for selecting or generating a dark current image frame to facilitate dark current compensation. In some implementations, a dark current image frame may be selected or generated in a manner that is agnostic toward explicit temperature measurements, which may advantageously eliminate the need for runtime temperature measurements/sensors.
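
One possible implementation of the temperature-based selection and scaling described above is sketched below in Python/NumPy; the dictionary layout for the calibration frames and the ratio-based scaling rule (e.g., 30 / 25 = 1.2) are illustrative assumptions.

```python
import numpy as np

def scaled_dark_current_frame(runtime_temp_c, calibration_frames):
    """Select the nearest-temperature calibration frame and scale it by a dark current factor.

    calibration_frames: dict mapping a calibration temperature (deg C) to its dark current frame.
    """
    nearest_temp = min(calibration_frames, key=lambda t: abs(t - runtime_temp_c))
    dark_current_factor = runtime_temp_c / nearest_temp      # e.g., 30 / 25 = 1.2
    return calibration_frames[nearest_temp].astype(np.float32) * dark_current_factor
```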

FIG. 8 illustrates an example SPAD sensor 802, which may be part of the HMD 202 and may correspond, in at least some respects, to the SPAD sensors 602, 402 discussed hereinabove. For example, the SPAD sensor 802 is similarly obscured by a cover 804 to prevent photons from reaching the SPAD pixels of the SPAD sensor 802 while capturing the dark current image frame 806 (e.g., during a calibration step). The dark current image frame 806 includes dark counts, similar to the dark current image frames discussed above.

The example shown in FIG. 8 conceptually depicts an unexposed region 808 of SPAD pixels of the SPAD sensor 802 that capture a portion of the dark current image frame 806 (indicated by the dashed line that defines an outer boundary portion of the dark current image frame 806). Although all SPAD pixels of the SPAD sensor 802 are covered by the cover 804 during the capturing of the dark current image frame 806, the unexposed region 808 of SPAD pixels are also obscured while capturing image frames at runtime. For example, FIG. 8 illustrates an example representation of a cover 812 that may be used at runtime to prevent photons from reaching the unexposed region 808 of SPAD pixels of the SPAD sensor 802. The particular structure and/or configuration of the cover 812 is provided as an example only, and a cover may take on any form and be positioned at any desirable portion of the SPAD sensor 802 to prevent photons from reaching the unexposed region 808 of SPAD pixels.

As is evident from FIG. 8, at least some of the SPAD pixels within the unexposed region 808 detect dark counts for the dark current image frame 806 without detecting photon counts from any captured environment. In some instances, where a cover 812 is used at runtime, the SPAD pixels within the unexposed region 808 continue to detect dark counts for runtime images without detecting photon counts from any captured environment. As will be described in more detail hereinafter, dark counts detected within the unexposed region 808 of SPAD pixels during calibration (e.g., while capturing dark current image frames 806) and during runtime may be leveraged to facilitate dark current compensation (without relying on explicit temperature measurements).

FIG. 9 illustrates the SPAD sensor 802 at runtime with the cover 812 positioned on the SPAD sensor 802 to prevent photons from reaching the SPAD pixels of the unexposed region 808. FIG. 9 illustrates an example temporally filtered image 902 generated from image frames captured by the SPAD sensor 802 at runtime. As depicted in FIG. 9, the temporally filtered image 902 includes image data acquired using SPAD pixels within the unexposed region 808 (also marked via a dashed box associated with temporally filtered image 902). In the example shown in FIG. 9, the quantity of avalanche events (e.g., dark counts) detected at runtime by the SPAD pixels within the unexposed region 808 is used to determine dark current value(s) 904. The dark current value(s) 904 may comprise any quantification of the dark counts detected by the SPAD pixels within the unexposed region 808 at runtime. For example, the dark current value(s) 904 may comprise a sum, average, or other measure of the dark counts detected by the SPAD pixels within the unexposed region 808 at runtime.

FIG. 9 similarly illustrates dark current value(s) 906 detected based on dark counts detected by SPAD pixels within the unexposed region 808 while capturing the dark current image frame 806 (e.g., a sum, average, or other measure of the dark counts detected within the unexposed region 808 during calibration). A system may utilize the dark current value(s) 904 obtained based on runtime imagery (e.g., temporally filtered image) and the dark current value(s) 906 obtained based on calibration imagery (e.g., dark current image frame 806) to generate dark current factor(s) 908. The dark current factor(s) 908 may comprise any representation of a difference or similarity between the dark current value(s) 904 and the dark current value(s) 906, such as a ratio of the dark current value(s) 904 and the dark current value(s) 906. As a simplified example, where the runtime dark current value 904 comprises an average intensity of 2 and the dark current value(s) 906 comprises an average intensity of 1, the dark current factor(s) 908 may be 2.

As noted above, the quantity of dark current (and noise that results from dark counts) present in an image captured at runtime may depend on ambient conditions at runtime (e.g., temperature). Accordingly, the dark current value(s) 904 and the dark current factor(s) 908 may depend on ambient conditions at runtime. Thus, the dark current factor(s) 908 may be used to scale a previously captured dark current image frame to account for ambient conditions at runtime (e.g., ambient conditions that existed for capturing the temporally filtered image 902). FIG. 9 depicts a scaled dark current image frame 910, which may be generated by applying the dark current factor(s) 908 to a previously captured dark current image frame (e.g., dark current image frame 806). For example, the dark current factor(s) 908 may be multiplied by the per-pixel intensity values associated with the dark current image frame to generate pixel intensity values for the scaled dark current image frame 910.

As depicted in FIG. 9, the scaled dark current image frame 910 and the temporally filtered image 902 may be used as input to subtraction 912 to generate a dark current compensated image 914. Subtraction 912 generally corresponds to the subtraction operations 502 and 706 discussed hereinabove, and, in the example shown in FIG. 9, the dark current compensated image 914 substantially omits the dark count noise associated with the temporally filtered image 902.

Dark current values and/or dark current factors may be determined for entire unexposed regions (e.g., unexposed region 808) or multiple dark current values and/or dark current factors may be determined for different subsets of pixels of unexposed regions (e.g., rows, columns, and/or blocks of SPAD pixels of unexposed regions). Thus, different dark current factors may be used to scale different regions of a dark current image frame to generate a scaled dark current image frame. In this regard, dark current values, dark current factors, and/or dark current image frames may comprise or be associated with multiple constituent components or regions.
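
The unexposed-region approach of FIG. 9 might be implemented along the following lines; using a mean over the covered pixels and guarding against a zero calibration value are illustrative choices, and a per-row or per-block variant would compute separate factors over subsets of the mask.

```python
import numpy as np

def dark_current_factor(runtime_image, calibration_dark_frame, unexposed_mask, eps=1e-6):
    """Ratio of runtime dark counts to calibration dark counts over the covered pixels."""
    runtime_value = runtime_image[unexposed_mask].mean()
    calibration_value = calibration_dark_frame[unexposed_mask].mean()
    return runtime_value / max(calibration_value, eps)

def compensate_with_scaled_frame(runtime_image, calibration_dark_frame, unexposed_mask):
    """Scale the calibration dark current frame by the factor, then subtract it."""
    factor = dark_current_factor(runtime_image, calibration_dark_frame, unexposed_mask)
    scaled = calibration_dark_frame.astype(np.float32) * factor
    return np.clip(runtime_image.astype(np.float32) - scaled, 0.0, None)
```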

Although the foregoing examples focus, in at least some respects, on an unexposed region 808 of SPAD pixels that forms an outer boundary portion of the SPAD pixels of a SPAD sensor, other configurations are within the scope of the present disclosure. An unexposed region may comprise any suitable size and/or shape. Furthermore, one will appreciate, in view of the present disclosure, that an unexposed region may be omitted or excluded from output imagery (e.g., imagery that becomes displayed to one or more users).

Although dark current compensation may, as described herein, be effective in removing dark count noise from SPAD imagery, at least some noise resulting from dark counts may remain in dark current compensated imagery. FIG. 10 illustrates example additional operations that may be performed on a dark current compensated image to further ameliorate dark count noise. In particular, FIG. 10 illustrates pixel binning 1002 and median filtering 1004, which may be performed on a dark current compensated image 914.

Pixel binning 1002 may include reducing sections of pixels in an original image (e.g., dark current compensated image 914) to a single pixel in the output image. For example, in some instances, each pixel in an output image is defined by a pixel of an original image:

pd(m,n)=p(Km,Kn)

where pd is the pixel in the downsampled image, p is the pixel in the original image, K is a scaling factor, m is the pixel coordinate in the horizontal axis, and n is the pixel coordinate in the vertical axis. In some instances, the pixel binning 1002 also includes prefiltering functions for defining the pixels of the output image, such as anti-aliasing prefiltering to prevent aliasing artifacts.

In some implementations, pixel binning 1002 utilizes an averaging filter for defining the pixels of the output image based on the average of a section of pixels in the original image. In one example of pixel binning by a factor of 2 along each axis, each pixel in the output image is defined by an average of a 2×2 section of pixels in the original image:

pd(m,n) = [p(2m,2n) + p(2m,2n+1) + p(2m+1,2n) + p(2m+1,2n+1)] / 4

where pd is the pixel in the downsampled image, p is the pixel in the original image, m is the pixel coordinate in the horizontal axis, and n is the pixel coordinate in the vertical axis. Pixel binning 1002 may comprise downsampling operations that are performed iteratively to arrive at an output image of a desired final image resolution.
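
As a rough illustration of both binning variants described above (plain decimation and 2x2 averaging), assuming a two-dimensional intensity image:

```python
import numpy as np

def decimate(image: np.ndarray, k: int) -> np.ndarray:
    """pd(m, n) = p(Km, Kn): keep every K-th pixel along each axis."""
    return image[::k, ::k]

def bin_2x2(image: np.ndarray) -> np.ndarray:
    """Average each 2x2 block of pixels into a single output pixel."""
    h, w = image.shape
    h, w = h - h % 2, w - w % 2                  # drop a trailing odd row/column, if any
    img = image[:h, :w].astype(np.float32)
    return (img[0::2, 0::2] + img[0::2, 1::2] +
            img[1::2, 0::2] + img[1::2, 1::2]) / 4.0
```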

Median filtering 1004 may comprise modifying each pixel value with the median pixel value of neighboring pixels (e.g., within a 3×3 pixel window centered about each pixel being modified). Because dark counts typically result in high frequency noise, median filtering 1004 may smooth out or remove dark counts that remain after dark current compensation operations discussed herein.
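
A simple 3x3 median filter of the kind described above might be sketched as follows; a library routine such as scipy.ndimage.median_filter could equally be used.

```python
import numpy as np

def median_filter_3x3(image: np.ndarray) -> np.ndarray:
    """Replace each pixel with the median of its 3x3 neighborhood.

    Isolated high-frequency spikes (e.g., residual dark counts) are suppressed
    while edges are largely preserved."""
    h, w = image.shape
    padded = np.pad(image, 1, mode='edge')
    windows = np.stack([padded[r:r + h, c:c + w]
                        for r in range(3) for c in range(3)], axis=-1)
    return np.median(windows, axis=-1)
```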

In some implementations, dark current compensation is performed without using a dark current image frame. For example, FIG. 11 illustrates the temporally filtered image 902 discussed above with reference to FIG. 9, which includes image data captured by an unexposed region 808 of SPAD pixels. FIG. 11 also illustrates the dark current value(s) 904 obtained based on avalanche events detected by the SPAD pixels in the unexposed region 808 at runtime. FIG. 11 illustrates the dark current value(s) 904 and the temporally filtered image 902 provided as input to subtraction 1102 to generate a dark current compensated image 1108, indicating that the dark current value(s) 904 may be subtracted directly from intensity values of the temporally filtered image 902 without first being used to modify a dark current image frame to facilitate the subtraction.

Dark current value(s) 904 may be subtracted from all intensity values of the temporally filtered image 902, or the subtraction may, in some instances, be targeted to intensity values obtained by SPAD pixels known to generate dark current (e.g., based on a prior calibration operation). For example, FIG. 11 illustrates a dark current image frame 1104, which may be used to determine dark current pixel coordinates 1106 of SPAD pixels determined to generate dark current. These dark current pixel coordinates 1106 may be used to subtract the dark current value(s) 904 from only the intensity values obtained by SPAD pixels known to generate dark current, while refraining from subtracting the dark current value(s) 904 from other intensity values.
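
A sketch of this targeted subtraction, in which only calibration-flagged pixel coordinates (e.g., dark current pixel coordinates 1106) are adjusted, might look as follows; the names and coordinate format are assumptions.

```python
import numpy as np

def targeted_dark_subtraction(image: np.ndarray,
                              dark_value: float,
                              dark_pixel_coords: np.ndarray) -> np.ndarray:
    """Subtract the runtime dark current value only at pixels previously
    flagged as generating dark current.

    `dark_pixel_coords` is an (N, 2) array of (row, column) indices."""
    out = image.astype(np.float32).copy()
    rows, cols = dark_pixel_coords[:, 0], dark_pixel_coords[:, 1]
    out[rows, cols] = np.clip(out[rows, cols] - dark_value, 0.0, None)
    return out
```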

The dark current compensated image may be used for various purposes. For example, in some implementations, the HMD 202 (or another system) utilizes the dark current compensated image 1108 to facilitate stereo depth computations, simultaneous localization and mapping, object tracking, and/or others. For example, an HMD may generate a parallax-corrected dark current compensated image (e.g., by performing parallax reprojections using depth information, which may itself be generated using dark current compensated images) and display the parallax-corrected dark current compensated image to facilitate pass-through imaging.

Example Method(s) for Dark Current Compensation in SPAD Imagery

The following discussion now refers to a number of methods and method acts that may be performed by the disclosed systems. Although the method acts are discussed in a certain order and illustrated in a flow chart as occurring in a particular order, no particular ordering is required unless specifically stated, or required because an act is dependent on another act being completed prior to the act being performed. One will appreciate that certain embodiments of the present disclosure may omit one or more of the acts described herein.

FIGS. 12 and 13 illustrate example flow diagrams 1200 and 1300, respectively, depicting acts associated with compensating for dark current in SPAD imagery. The discussion of the various acts represented in the flow diagrams includes references to various hardware components described in more detail with reference to FIG. 1.

Act 1202 of flow diagram 1200 of FIG. 12 includes capturing an image frame with a SPAD array. Act 1202 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110 (e.g., SPAD array 112), input/output system(s) 116, communication system(s) 118, and/or other components. In some implementations, the plurality of SPAD pixels of the SPAD array comprises a plurality of unexposed SPAD pixels that are covered during the capturing of the dark current image frame and during the capturing of the image frame.

Act 1204 of flow diagram 1200 includes generating a temporally filtered image by performing a temporal filtering operation using the image frame and a preceding image frame. Act 1204 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110, input/output system(s) 116, communication system(s) 118, and/or other components. In some implementations, the preceding image frame is captured by the SPAD array at a timepoint that temporally precedes a timepoint associated with the image frame of act 1202. Furthermore, in some instances, the temporal filtering operation includes (i) generating an aligned preceding image frame by using motion data associated with the SPAD array to spatially align the preceding image frame with the image frame and (ii) compositing the image frame with the aligned preceding image frame. In some implementations, the temporal filtering operation is based on an optical flow estimation generated based on the preceding image frame and the image frame.

Act 1206 of flow diagram 1200 includes obtaining a dark current factor indicating an amount of dark current associated with the image frame. Act 1206 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110, input/output system(s) 116, communication system(s) 118, and/or other components. In some implementations, the dark current factor is based on a runtime temperature detected in association with the capturing of the image frame, such that the selection of the dark current image frame from the plurality of dark current image frames is based on the runtime temperature. In some implementations, the dark current factor is based on (i) a quantity of avalanche events detected by the plurality of unexposed SPAD pixels during the capturing of the dark current image frame and (ii) a quantity of avalanche events detected by the plurality of unexposed SPAD pixels during the capturing of the image frame. For example, the dark current factor may be determined by comparing (i) an average intensity detected by the plurality of unexposed SPAD pixels during the capturing of the dark current image frame and (ii) an average intensity detected by the plurality of unexposed SPAD pixels during the capturing of the image frame. In some instances, the dark current factor comprises a plurality of dark current factor components. For example, in some instances, each dark current factor component is associated with a respective subset of unexposed SPAD pixels of the plurality of unexposed SPAD pixels.

Act 1208 of flow diagram 1200 includes obtaining a dark current image frame. Act 1208 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110, input/output system(s) 116, communication system(s) 118, and/or other components. In some instances, the dark current image frame includes data indicating one or more SPAD pixels of the plurality of SPAD pixels that detect an avalanche event without detecting a corresponding photon. In some implementations, the dark current image frame is one of a plurality of dark current image frames captured while the SPAD array is covered to prevent photons from reaching the SPAD array. In some implementations, the dark current image frame is selected from the plurality of dark current image frames based on the dark current factor of act 1206. In some implementations, each of the plurality of dark current image frames is associated with a different temperature or range of temperatures. In some instances, the dark current image frame is obtained by generating an interpolated dark current image frame based on the runtime temperature and at least two dark current image frames of the plurality of dark current image frames.
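
One way the temperature-based interpolation mentioned above might be sketched, assuming the calibration dark current image frames are stored alongside the temperatures at which they were captured (the names and interpolation scheme are illustrative):

```python
import numpy as np

def interpolate_dark_frame(runtime_temp: float,
                           calibration_temps: list,
                           dark_frames: list) -> np.ndarray:
    """Linearly interpolate between the two calibration dark frames whose
    temperatures bracket the runtime temperature."""
    temps = np.asarray(calibration_temps, dtype=np.float32)
    order = np.argsort(temps)
    temps = temps[order]
    frames = [np.asarray(dark_frames[i], dtype=np.float32) for i in order]
    idx = int(np.searchsorted(temps, runtime_temp))
    if idx == 0:
        return frames[0]                  # colder than all calibration points
    if idx >= len(temps):
        return frames[-1]                 # hotter than all calibration points
    t0, t1 = temps[idx - 1], temps[idx]
    w = (runtime_temp - t0) / (t1 - t0)
    return (1.0 - w) * frames[idx - 1] + w * frames[idx]
```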

Act 1210 of flow diagram 1200 includes generating a dark current compensated image by performing a subtraction operation on the temporally filtered image based on the dark current image frame. Act 1210 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110, input/output system(s) 116, communication system(s) 118, and/or other components. In some implementations, the subtraction operation includes (i) generating a scaled dark current image frame by modifying the dark current image frame using the dark current factor (in some instances, by using each of the plurality of dark current factor components to modify corresponding regions of the dark current image frame) and (ii) subtracting the scaled dark current image frame from the temporally filtered image.

Act 1212 of flow diagram 1200 includes performing a pixel binning operation on the dark current compensated image. Act 1212 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110, input/output system(s) 116, communication system(s) 118, and/or other components.

Act 1214 of flow diagram 1200 includes applying a median filter to the dark current compensated image. Act 1214 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110, input/output system(s) 116, communication system(s) 118, and/or other components.

Act 1216 of flow diagram 1200 includes generating a parallax corrected dark current compensated image. Act 1216 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110, input/output system(s) 116, communication system(s) 118, and/or other components.

Act 1218 of flow diagram 1200 includes displaying the parallax corrected dark current compensated image on a display. Act 1218 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110, input/output system(s) 116, communication system(s) 118, and/or other components.

Act 1302 of flow diagram 1300 of FIG. 13 includes capturing an image frame with the SPAD array while obscuring a plurality of unexposed SPAD pixels of a plurality of SPAD pixels. Act 1302 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110 (e.g., SPAD array 112), input/output system(s) 116, communication system(s) 118, and/or other components.

Act 1304 of flow diagram 1300 includes identifying a region of the image frame that corresponds to the plurality of unexposed SPAD pixels which are obscured during the capturing of the image frame to prevent photons from reaching the plurality of unexposed SPAD pixels. Act 1304 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110, input/output system(s) 116, communication system(s) 118, and/or other components.

Act 1306 of flow diagram 1300 includes determining a dark current value based on a quantity of avalanche events detected by the plurality of unexposed SPAD pixels during the capturing of the image frame. Act 1306 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110, input/output system(s) 116, communication system(s) 118, and/or other components. In some instances, the dark current value is determined as an average intensity based on the quantity of avalanche events detected by the plurality of unexposed SPAD pixels during the capturing of the image frame.

Act 1308 of flow diagram 1300 includes generating a dark current compensated image by performing a subtraction operation on the image frame based on the dark current value. Act 1308 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110, input/output system(s) 116, communication system(s) 118, and/or other components. In some implementations, the subtraction operation includes subtracting the dark current value from one or more SPAD pixels of the plurality of SPAD pixels determined to generate dark current based on a previously performed calibration operation, while refraining from subtracting the dark current value from other SPAD pixels of the plurality of SPAD pixels that are separate from the one or more SPAD pixels.

Example Systems and Techniques for Obtaining Color Imagery Using SPADs

Attention is now directed to FIG. 14, which illustrates an example implementation of SPADs 1402 of a SPAD array (e.g., SPAD array 112). A SPAD array may comprise any number of SPADs 1402, as indicated by the ellipses in FIG. 14. In the example shown in FIG. 14, the SPADs each comprise respective color filters positioned thereover (e.g., over the photon sensing portion of the SPADs 1402). FIG. 14 illustrates the color filters positioned over the SPADs 1402 in a Bayer pattern, in particular with diagonally disposed green filters 1406 and 1408 and with a diagonally disposed red filter 1404 and blue filter 1410. This pattern may be repeated over a SPAD array to form a mosaic of color-filtered SPAD pixels. Although the examples disclosed herein focus, in at least some respects, on color-filtered SPADs 1402 of a SPAD array arranged in a Bayer pattern, other patterns are within the scope of the present disclosure, such as, by way of non-limiting example, CYGM (cyan, yellow, green, magenta), RGBE (red, green, blue, emerald), Foveon X3 (e.g., a vertically arranged red, green, blue pattern), panchromatic cell patterns (e.g., RGBW (red, green, blue, white), CMYW (cyan, magenta, yellow, white), Fujifilm EXR, Fujifilm X-Trans, Quad Bayer), and/or others.

FIG. 15 illustrates an example of capturing an image frame of colored objects in a low light environment using a color-filtered SPAD array of an HMD. In particular, FIG. 15 illustrates an HMD 1502, which corresponds, in at least some respects, to the system 100 described hereinabove. For example, the HMD may include a SPAD array (e.g., SPAD array 112) comprising SPADs (e.g., SPADs 1402) with color filters arranged in a Bayer pattern (e.g., as illustrated in FIG. 14). In the example shown in FIG. 15, the HMD 1502 is positioned according to pose 1504 while capturing an image frame 1514 of a red object 1506 (e.g., a table), a green object 1508 (e.g., a box), and a blue object 1510 (e.g., a can) in a low light environment 1512. The pose 1504 may be tracked or measured utilizing sensors (e.g., IMU(s) 114, camera(s) to facilitate SLAM, etc.) of the HMD 1502.

FIG. 15 illustrates that the image frame 1514 includes image data 1516 captured by the color filtered SPADs of the HMD 1502. The image data 1516 of the image frame 1514 may, under conventional systems, be demosaiced (e.g., based on the pattern of the color filters on the SPADs) to interpolate RGB color values for each SPAD pixel (or image pixel of the image frame 1514 captured by a SPAD pixel) to generate a color image. However, as is evident from FIG. 15, the image data 1516 of the image frame 1514 provides noisy representations of the captured objects, which may occur, in some instances, when imaging under low light conditions (e.g., about 1 millilux or below) due to the low number of photons detected by SPAD pixels over the frame capture time period for capturing the image frame 1514. As noted above, performing demosaicing on image data that includes significant noise may result in poor quality color images. The following discussion refers to various techniques that may be employed to reduce noise in SPAD imagery captured using color-filtered SPADs in preparation for performing demosaicing to obtain color imagery.

FIG. 16 illustrates an example processing pipeline for obtaining color images using a SPAD array. In particular, FIG. 16 illustrates an example pipeline in which image capture 1602 is performed to generate a raw image 1604 (e.g., similar to image frame 1514). FIG. 16 also illustrates filtering 1606 performed to generate a filtered image 1608. The filtering 1606 may comprise performing temporal filtering and/or spatial filtering. Additional details concerning filtering for generating a filtered image in preparation for performing demosaicing will be provided hereinafter. In some instances, the pipeline for obtaining color imagery includes dark current compensation 1610 to generate a dark current compensated image 1612 (e.g., as discussed hereinabove with reference to FIGS. 2-13).

The pipeline may also include gamma correction 1614 to generate a gamma corrected image 1616. Gamma correction 1614 may comprise raising input values (e.g., intensity values) to the power of a gamma value (and multiplying the input values by a constant) to provide output values. Gamma correction 1614 may optimize imagery for the non-linear manner in which humans perceive light and color. Gamma correction 1614 may have the effect of causing dark pixels to appear darker and causing bright pixels to appear brighter. Because gamma correction, if performed prematurely, can degrade the benefits provided by other operations such as temporal filtering, gamma correction 1614 may be performed after filtering 1606 but prior to demosaicing.
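
A minimal sketch of such a gamma correction, assuming intensities are first normalized to [0, 1] (the gamma and gain values are placeholders, not values from the disclosure):

```python
import numpy as np

def gamma_correct(image: np.ndarray,
                  gamma: float = 1.0 / 2.2,
                  gain: float = 1.0) -> np.ndarray:
    """Raise normalized intensity values to the power of a gamma value,
    multiplied by a constant gain."""
    normalized = image.astype(np.float32) / max(float(image.max()), 1e-6)
    return gain * np.power(normalized, gamma)
```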

FIG. 16 also illustrates demosaicing 1618 performed subsequent to the other operations of the pipeline for generating a color image 1620. As noted above, demosaicing may comprise interpolating or extrapolating a color value (e.g., an RGB value) for each image pixel (or SPAD pixel) of an image frame (or a SPAD array that captures an image frame). In contrast with generating a single color value for each block of Bayer pixels (e.g., each 2×2 set of RGB pixels) to generate a color image (thereby causing an image resolution loss), demosaicing may provide RGB color imagery without loss of image resolution. However, as noted above, demosaicing may add or boost image noise, particularly when performed on image data that already includes noise. Accordingly, the pipeline of FIG. 16 illustrates performing demosaicing 1618 subsequent to filtering 1606 and other image frame preparation operations.

It should be noted that, in some embodiments, fewer than all of the operations of the pipeline illustrated in FIG. 16 are performed to generate color imagery. For example, in some instances, dark current compensation 1610 and/or gamma correction 1614 are not performed for generating a color image 1620.

The discussion that attends FIGS. 17-20 is related to filtering 1606 that may be performed on raw imagery in preparation for demosaicing 1618 to generate a color image 1620. As noted above, filtering 1606 may comprise temporal filtering and/or spatial filtering. The discussion that attends FIGS. 17-18C relates to temporal filtering, whereas the discussion that attends FIGS. 19 and 20 relates to spatial filtering.

FIG. 17 illustrates an example of generating a filtered image by performing temporal filtering on image frames that are consecutively captured using a color filtered SPAD array (e.g., a color filtered SPAD array of an HMD 1502). In particular, FIG. 17 shows the image frame 1514 (and its image data 1516 captured using color filtered SPADs), as well as an additional image frame 1702 (e.g., captured by the HMD 1502). The additional image frame 1702 includes image data 1704 depicting the red object 1506, the green object 1508, and the blue object 1510. Similar to image frame 1514, image frame 1702 may comprise a raw image captured by color filtered SPADs upon which no demosaicing has been performed.

FIG. 17 also indicates that the different image frames 1514 and 1702 are captured at different timepoints. In particular, FIG. 17 indicates that image frame 1702 was captured at timepoint 1706 and that image frame 1514 was captured at timepoint 1708. In the present example, timepoint 1706 temporally precedes timepoint 1708.

As indicated above, image data of consecutively captured image frames may be combined to form a composite image to facilitate adequate exposure of objects captured within the image frames (e.g., particularly under low light conditions). Accordingly, FIG. 17 illustrates temporal filtering 1710 performed on the image frames 1702 and 1514 to generate a filtered image 1714. Temporal filtering 1710 includes combining corresponding image pixels of the different image frames 1702 and 1514 to generate pixel values for an output image (i.e., filtered image 1714).

As noted above, corresponding image pixels of the different image frames 1702, 1514 may be combined or composited in various ways, such as by summing, averaging (e.g., weighted averaging), alpha blending, and/or others, and the manner/parameters of combining corresponding image pixels may differ for different pixel regions and/or may be dynamically determined based on various factors (e.g., signal strength, amount of motion, motion detected in a captured scene, etc.).
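
For instance, a simple global alpha blend of the current frame with the aligned preceding frame might look like the sketch below; in practice the weight could vary per pixel or per region, as noted above, and the names are illustrative.

```python
import numpy as np

def composite_frames(current: np.ndarray,
                     aligned_previous: np.ndarray,
                     alpha: float = 0.5) -> np.ndarray:
    """Blend the current image frame with a spatially aligned preceding frame."""
    return (alpha * current.astype(np.float32) +
            (1.0 - alpha) * aligned_previous.astype(np.float32))
```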

In some instances, image frames 1702 and 1514 capture environmental objects from poses that are at least slightly different from one another. For example, the HMD 1502 may capture image frame 1702 from a pose that at least slightly differs from pose 1504. Accordingly, in some instances, to align corresponding pixels of different image frames 1702 and 1514, temporal filtering 1710 may utilize motion data 1712, which may comprise or be used to generate pose data that describes the position and/or orientation (e.g., 6 degrees of freedom pose) and/or change of position (e.g., velocity and/or acceleration) and/or change of orientation (e.g., angular velocity and/or angular acceleration) of the HMD 1502 during the capturing of the image frames 1702 and 1514.

The motion data 1712 may be used to align the image frames 1702 and 1514 with one another. For example, a system may use the motion data 1712 to align image frame 1702 with pose 1504 of image frame 1514, thereby generating aligned image frames that are spatially aligned with one another (e.g., appearing as though they were all captured from pose 1504 with the same capture perspective). In this regard, the temporal filtering 1710 may comprise motion compensated temporal filtering. In some instances, temporal filtering 1710 additionally or alternatively utilizes optical flow estimations to align the image frames 1702 and 1514.

Additionally, or alternatively, temporal filtering 1710 may utilize a joint bilateral filter 1718 (JBLF 1718) using the image frame 1514 and the image frame 1702 (i.e., the preceding image frame). For instance, the JBLF 1718 may utilize the image frame 1514 as a guidance image to cause the output filtered image 1714 to be in the geometry of the image frame 1514. By way of example, a JBLF 1718 may utilize a three-dimensional kernel (e.g., a 3×3×2 kernel) to define each pixel value for the output image (e.g., filtered image 1714) based on neighboring pixel values in both image frames 1514 and 1702. Neighboring pixels within a pixel window that are near the pixel value of the center pixel (e.g., the current pixel under analysis for generating an output pixel according to the kernel) may be given additional weight in the calculation of the pixel value of the center pixel for the output image (e.g., filtered image 1714). For example, when the center pixel is one captured by a red filtered SPAD pixel, the JBLF 1718 may give greater weight to the intensity values of neighboring red filtered SPAD pixels (in both image frames 1702 and 1514) and give lesser weight to the intensity values of neighboring green or blue filtered SPAD pixels when calculating the output value for the center pixel. In this way, a JBLF 1718 may be used to composite the image frames 1702 and 1514 in a manner that accounts for the different color channels of the SPADs. Although not illustrated in FIG. 17, a JBLF 1718 may also utilize subsequent timepoint image frames to generate a filtered image 1714 (e.g., utilizing a 3×3×3 kernel).
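
A simplified sketch of such a joint bilateral temporal filter with a 3x3x2 kernel is shown below; it combines a spatial falloff with the intensity-similarity weighting described above (so neighbors behind the same color filter as the center pixel tend to dominate). It is intended only to illustrate the idea, not to represent the disclosed implementation.

```python
import numpy as np

def joint_bilateral_temporal_filter(current: np.ndarray,
                                    previous: np.ndarray,
                                    sigma_spatial: float = 1.0,
                                    sigma_range: float = 16.0) -> np.ndarray:
    """3x3x2 joint bilateral filter using the current frame as guidance.

    Neighbors (in both frames) whose intensities are close to the center
    pixel's intensity receive most of the weight."""
    h, w = current.shape
    cur = np.pad(current.astype(np.float32), 1, mode='edge')
    prev = np.pad(previous.astype(np.float32), 1, mode='edge')
    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = np.zeros((h, w), dtype=np.float32)
    for y in range(h):
        for x in range(w):
            center = cur[y + 1, x + 1]          # guidance value
            num = den = 0.0
            for frame in (cur, prev):
                for dy, dx in offsets:
                    v = frame[y + 1 + dy, x + 1 + dx]
                    weight = (np.exp(-(dy * dy + dx * dx) / (2 * sigma_spatial ** 2)) *
                              np.exp(-((v - center) ** 2) / (2 * sigma_range ** 2)))
                    num += weight * v
                    den += weight
            out[y, x] = num / den
    return out
```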

FIG. 17 conceptually depicts that the image data 1716 of the filtered image 1714 comprises reduced noise (e.g., relative to the image frames 1702 and 1514). Thus, the filtered image 1714 may provide a superior basis for generating a color image via demosaicing (e.g., relative to the image frames 1702 and/or 1514). FIG. 17 thus illustrates demosaicing 1720 performed on the filtered image 1714 to generate a color image (as noted above, additional operations may be performed on the filtered image 1714 prior to demosaicing 1720, such as dark current compensation, gamma correction, etc.). Demosaicing 1720 may utilize the color filter pattern of the SPADs of the system (e.g., the HMD 1502) and image data 1716 (i.e., intensity values) obtained using the color filtered SPAD pixels to determine RGB data 1724 of the color image 1722. As depicted in FIG. 17, the color image 1722 depicts the various objects of the environment with their respective colors (and without resolution loss relative to the filtered image 1714 used as input for the demosaicing 1720).

In some implementations, where motion compensated temporal filtering is implemented, mixed pixels may occur. That is, when motion data (e.g., motion data 1712) is used to align image frames, some image pixels captured by SPADs associated with different color channels (e.g., a red pixel and a blue pixel) may be caused to overlap with one another. Pixel mixing can cause image data associated with different color channels to be composited to form a filtered image, which can degrade the output of the temporal filtering 1710.

Accordingly, in some implementations, separate temporal filtering operations are performed for image data captured by SPADs associated with different color filters. To illustrate, FIG. 18A shows an example of demultiplexing consecutively captured image frames to generate color-specific image frames at multiple timepoints. In particular, FIG. 18A illustrates the image frames 1702 and 1514 captured at the consecutive timepoints 1706 and 1708 discussed above. FIG. 18A further illustrates demultiplexing 1810 and 1802 performed on the image frames 1702 and 1514, respectively. Demultiplexing 1810 and 1802 may comprise separating the image frames 1702 and 1514 according to the separate channels present in the image frames 1702 and 1514. For instance, according to the present example, the SPADs 1402 of the HMD 1502 used to capture the image frames 1702 and 1514 include red filters 1404, green filters 1406, and blue filters 1410 (e.g., arranged in a Bayer pattern). Thus, demultiplexing 1810 and 1802 may include separating the image frames 1702 and 1514 into respective red image frames, green image frames, and blue image frames.

FIG. 18A illustrates a red image frame 1812, a green image frame 1814, and a blue image frame 1816 (e.g., “demultiplexed image frames”) resulting from performing demultiplexing 1810 on the image frame 1702. Similarly, FIG. 18A illustrates a red image frame 1804, a green image frame 1806, and a blue image frame 1808 resulting from performing demultiplexing 1802 on image frame 1514. The separate, color-specific image frames resulting from the demultiplexing 1810 and 1802 include image data representing intensity values detected through the separate color filters. For example, the red image frames 1812 and 1804 include image data depicting the red object 1506 and omitting the green object 1508 and the blue object 1510, the green image frames 1814 and 1806 include image data depicting the green object 1508 and omitting the red object 1506 and the blue object 1510, and the blue image frames 1816 and 1808 include image data depicting the blue object 1510 and omitting the red object 1506 and the green object 1508. By generating separate demultiplexed image frames for separate color channels, temporal filtering may be performed in a manner that substantially avoids pixel mixing.
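
A sketch of such demultiplexing, under an assumed RGGB Bayer layout (red at even rows/even columns, greens on the other diagonal, blue at odd rows/odd columns), is shown below; pixels belonging to other channels are left at zero so that each demultiplexed frame keeps the geometry of the original frame.

```python
import numpy as np

def demultiplex_bayer(image: np.ndarray) -> dict:
    """Split a Bayer-patterned raw frame into red, green, and blue frames."""
    red = np.zeros_like(image)
    green = np.zeros_like(image)
    blue = np.zeros_like(image)
    red[0::2, 0::2] = image[0::2, 0::2]
    green[0::2, 1::2] = image[0::2, 1::2]
    green[1::2, 0::2] = image[1::2, 0::2]
    blue[1::2, 1::2] = image[1::2, 1::2]
    return {"red": red, "green": green, "blue": blue}
```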

FIG. 18B illustrates an example of generating filtered color-specific image frames by performing temporal filtering using color-specific image frames associated with different timepoints. In particular, FIG. 18B illustrates temporal filtering 1818 performed to composite the red image frame 1812 (e.g., derived from image frame 1702 and associated with timepoint 1706) and the red image frame 1804 (e.g., derived from image frame 1514 and associated with timepoint 1708) to generate a filtered red image frame 1820. FIG. 18B also shows temporal filtering 1822 performed to composite the green image frame 1814 (e.g., derived from image frame 1702 and associated with timepoint 1706) and the green image frame 1806 (e.g., derived from image frame 1514 and associated with timepoint 1708) to generate a filtered green image frame 1824. Similarly, FIG. 18B illustrates temporal filtering 1826 performed to composite the blue image frame 1816 (e.g., derived from image frame 1702 and associated with timepoint 1706) and the blue image frame 1808 (e.g., derived from image frame 1514 and associated with timepoint 1708) to generate a filtered blue image frame 1828.

As is evident from FIG. 18B, the filtered image frames (e.g., filtered red image frame 1820, filtered green image frame 1824, and filtered blue image frame 1828) comprise representations of the captured objects with reduced noise (e.g., relative to the individual representations of the captured objects in the red image frames 1812, 1804, green image frames 1814, 1806, or blue image frames 1816, 1808). This reduction in noise can be achieved in a manner that mitigates the risk of pixel mixing during temporal filtering (e.g., by demultiplexing the captured image frames and performing per-color channel temporal filtering). The filtered image frames may be combined to form a combined or composite filtered image, which may then be used for demosaicing to form a color image.

FIG. 18C illustrates an example of multiplexing filtered color-specific image frames to generate a filtered image. In particular, FIG. 18C illustrates multiplexing 1830 used to combine the filtered red image frame 1820, the filtered green image frame 1824, and the filtered blue image frame 1828. FIG. 18C illustrates a filtered image 1832 as the output of the multiplexing 1830. The multiplexing may comprise arranging the image data (e.g., pixel values) of the filtered red image frame 1820, the filtered green image frame 1824, and the filtered blue image frame 1828 in a single image (e.g., according to the Bayer pattern or other pattern of the SPADs that detected the image frames used to form the filtered color-specific image frames).
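The inverse operation, recombining filtered per-channel frames into a single Bayer-patterned frame, might be sketched as follows, assuming the same RGGB layout as the demultiplexing sketch above.

```python
import numpy as np

def multiplex_bayer(red: np.ndarray, green: np.ndarray, blue: np.ndarray) -> np.ndarray:
    """Reassemble filtered per-channel frames into one Bayer-patterned frame."""
    out = np.zeros_like(red)
    out[0::2, 0::2] = red[0::2, 0::2]
    out[0::2, 1::2] = green[0::2, 1::2]
    out[1::2, 0::2] = green[1::2, 0::2]
    out[1::2, 1::2] = blue[1::2, 1::2]
    return out
```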

Because the filtered image 1832 is generated using the filtered color-specific image frames (e.g., the filtered red image frame 1820, the filtered green image frame 1824, and the filtered blue image frame 1828), the filtered image 1832 also depicts the captured objects with reduced noise (e.g., relative to the individual representations of the captured objects in the red image frames 1812, 1804, green image frames 1814, 1806, or blue image frames 1816, 1808). Accordingly, the filtered image 1832 may provide an improved basis for performing demosaicing to form a color image (e.g., relative to raw images captured by the SPADs).

FIG. 18C illustrates demosaicing 1834 performed using the filtered image 1832 to form a color image 1836. As noted above, demosaicing comprises interpolating or extrapolating RGB values for the image pixels of an input image (e.g., the filtered image 1832). Accordingly, FIG. 18C illustrates RGB data 1838 of the color image, which depicts the colored objects (e.g., red object 1506, green object 1508, and blue object 1510) using RGB values to allow users to perceive the colors of the captured objects when viewing the color image 1836. Thus, implementations of the present disclosure may facilitate color image acquisition using SPADs in a manner that reduces noise, thereby allowing for color image capture in low light environments.

As noted above with reference to FIG. 16, the filtering 1606 performed to form a filtered image 1608 prior to demosaicing 1618 may additionally or alternatively include spatial filtering. However, many conventional spatial filters may cause neighboring image data captured by SPADs that have different color filters to be blended or filtered together (a simple averaging filter, for example, is thus not well-suited for such implementations). Thus, techniques of the present disclosure include spatial filtering processes that account for multiplexed pixels associated with different colors.

FIG. 19 illustrates an example of generating a filtered image by performing bilateral filtering on an image frame prior to demosaicing. In particular, FIG. 19 illustrates the image frame 1514 and the image data thereof 1516 (e.g., captured using color filtered SPADs 1402 of an HMD 1502). FIG. 19 shows bilateral filtering 1902 performed on the image frame 1514 to generate a filtered image 1904 with corresponding image data 1906. Bilateral filtering 1902 may comprise utilizing a two-dimensional kernel (e.g., a 3×3 kernel) to define each pixel value for the output image (e.g., filtered image 1904) based on neighboring pixel values in the image frame 1514. Similar to the JBLF discussed above, bilateral filtering 1902 may give additional weight to neighboring pixel values that are near the value of the pixel for which an output value is currently being determined (e.g., the center pixel of the kernel). Thus, when determining an output value for a particular image pixel captured by a red filtered SPAD, bilateral filtering 1902 may give additional weight for determining the output value to pixels within the pixel window surrounding the particular image pixel that have a similar intensity to the particular image pixel (e.g., pixels that are also captured by red filtered SPADs).

Bilateral filtering 1902 may preserve edges while reducing noise. Thus, bilateral filtering 1902 may provide various benefits while mitigating the effects of image pixels in a filtered image being influenced by image pixels associated with different colors. Thus, performing bilateral filtering 1902 prior to demosaicing may improve the quality of the filtered image 1904 for generating a color image.

Additional techniques are within the scope of the present disclosure to facilitate spatial filtering in a manner that accommodates the different color filters associated with neighboring SPAD pixels. For example, FIG. 20 illustrates an example of demultiplexing the image frame 1514 to generate color-specific image frames (i.e., red image frame 2004, green image frame 2006, and blue image frame 2008). Demultiplexing 2002 corresponds generally to demultiplexing 1810 and 1802 discussed hereinabove with reference to FIG. 18A. By demultiplexing the image frame 1514 to generate color-specific image frames associated with different color channels (e.g., “demultiplexed image frames”), spatial filtering operations may be performed separately for different color channels to prevent the spatial filtering operations from causing neighboring pixels associated with different colors to blend together.

Accordingly, FIG. 20 illustrates spatial filtering 2010 performed on red image frame 2004 to generate filtered red image frame 2012, spatial filtering 2014 performed on green image frame 2006 to generate filtered green image frame 2016, and spatial filtering 2018 performed on blue image frame 2008 to generate filtered blue image frame 2020. The spatial filtering 2010, 2014, and/or 2018 may comprise various operations for smoothing noise, preserving edges, and/or otherwise improving the input image frames (e.g., red image frame 2004, green image frame 2006, and blue image frame 2008). For example, one or more of spatial filtering 2010, 2014 and/or 2018 may comprise a mean filter, a Gaussian filter, an order statistics filter, a median filter, a Laplacian filter, a gradient filter, and/or others. Furthermore, the spatial filtering operations applied to the different color-specific input images may at least partially differ from one another.
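
As one hedged example of such per-channel spatial filtering, each demultiplexed frame could be Gaussian-filtered on its own Bayer sub-grid so that the zero-filled positions of the other channels never bleed into the result. The sketch below reuses the RGGB layout assumed in the demultiplexing sketch above and a SciPy Gaussian filter; the names are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Assumed RGGB layout: populated sub-grid offsets for each channel.
BAYER_OFFSETS = {"red": [(0, 0)], "green": [(0, 1), (1, 0)], "blue": [(1, 1)]}

def spatially_filter_channels(channels: dict, sigma: float = 1.0) -> dict:
    """Gaussian-filter each demultiplexed color channel independently."""
    filtered = {}
    for name, frame in channels.items():
        out = np.zeros_like(frame, dtype=np.float32)
        for r0, c0 in BAYER_OFFSETS[name]:
            sub = frame[r0::2, c0::2].astype(np.float32)
            out[r0::2, c0::2] = gaussian_filter(sub, sigma)
        filtered[name] = out
    return filtered
```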

FIG. 20 also illustrates multiplexing 2022 performed using the filtered red image frame 2012, the filtered green image frame 2016, and the filtered blue image frame 2020 to generate a filtered image 2024. Multiplexing 2022 may generally correspond to the multiplexing 1830 discussed hereinabove with reference to FIG. 18C. The filtered image 2024 may be used as a basis for performing demosaicing or other operations prior to demosaicing (e.g., temporal filtering, dark current compensation, gamma correction, etc.). Furthermore, in some implementations, the principles related to spatial filtering discussed herein are performed on images that have been temporally filtered.

Additional operations may be performed on a color image generated via demosaicing, such as reprojection to generate a parallax-corrected color image and/or presentation on a display of an HMD.

Example Method(s) for Obtaining Color Imagery Using SPADs

FIGS. 21 and 22 illustrate example flow diagrams 2100 and 2200, respectively, depicting acts associated with obtaining color imagery using SPADs.

Act 2102 of flow diagram 2100 of FIG. 21 includes capturing an image frame using a SPAD array. Act 2102 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110 (e.g., SPAD array 112 comprising color filtered SPAD pixels 1402), input/output system(s) 116, communication system(s) 118, and/or other components. In some implementations, the respective color filters covering the plurality of SPAD pixels are arranged in a Bayer pattern.

Act 2104 of flow diagram 2100 includes generating a filtered image by performing a temporal filtering operation using the image frame and a preceding image frame. Act 2104 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110, input/output system(s) 116, communication system(s) 118, and/or other components. The preceding image frame may be captured by the SPAD array at a timepoint that temporally precedes a timepoint associated with the image frame. In some instances, the preceding image frame includes a raw image frame on which no demosaicing has been performed. Furthermore, in some instances, performing the temporal filtering operation includes (i) generating an aligned preceding image frame by using motion data associated with the SPAD array to spatially align the preceding image frame with the image frame and (ii) compositing the image frame with the aligned preceding image frame.

In some implementations, performing the temporal filtering operation comprises applying a joint bilateral filter to the image frame and the preceding image frame. In some implementations, performing the temporal filtering operation comprises applying the joint bilateral filter to the image frame and a subsequent image frame. The subsequent image frame is captured at a timepoint that is temporally subsequent to the timepoint associated with the image frame.

In some instances, performing the temporal filtering operation includes (i) generating a plurality of demultiplexed image frames by demultiplexing the image frame where each of the plurality of demultiplexed image frames is associated with a respective color channel, (ii) accessing a plurality of demultiplexed preceding image frames, and (iii) generating a plurality of temporally filtered demultiplexed image frames by, for each particular demultiplexed image frame of the plurality of demultiplexed image frames: (a) generating a corresponding aligned demultiplexed preceding image frame by using motion data associated with the SPAD array to align the corresponding demultiplexed preceding image frame with the particular demultiplexed image frame, and (b) compositing the particular demultiplexed image frame with the corresponding aligned demultiplexed preceding image frame. The plurality of demultiplexed preceding image frames may be generated by demultiplexing the preceding image frame, and the plurality of demultiplexed preceding image frames may include, for each particular demultiplexed image frame of the plurality of demultiplexed image frames, a corresponding demultiplexed preceding image frame associated with a same color channel as the particular demultiplexed image frame.

In some implementations, generating the filtered image comprises multiplexing the plurality of temporally filtered demultiplexed image frames.

Act 2106 of flow diagram 2100 includes performing a gamma correction operation on the filtered image prior to demosaicing the filtered image. Act 2106 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110, input/output system(s) 116, communication system(s) 118, and/or other components.

Act 2108 of flow diagram 2100 includes after performing the temporal filtering operation, generating a color image by demosaicing the filtered image. Act 2108 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110, input/output system(s) 116, communication system(s) 118, and/or other components.

Act 2110 of flow diagram 2100 includes generating a parallax corrected color image. Act 2110 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110, input/output system(s) 116, communication system(s) 118, and/or other components.

Act 2112 of flow diagram 2100 includes displaying the parallax corrected color image on the display. Act 2112 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110, input/output system(s) 116, communication system(s) 118, and/or other components.

Act 2202 of flow diagram 2200 of FIG. 22 includes capturing an image frame using a SPAD array. Act 2202 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110 (e.g., SPAD array 112 comprising color filtered SPAD pixels 1402), input/output system(s) 116, communication system(s) 118, and/or other components. In some instances, the respective color filters covering the plurality of SPAD pixels are arranged in a Bayer pattern.

Act 2204 of flow diagram 2200 includes generating a filtered image by performing a spatial filtering operation on the image frame. Act 2204 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110, input/output system(s) 116, communication system(s) 118, and/or other components. In some instances, performing the spatial filtering operation comprises applying a bilateral filter to the image frame. In some instances, performing the spatial filtering operation includes (i) generating a plurality of demultiplexed image frames by demultiplexing the image frame, where each of the plurality of demultiplexed image frames is associated with a respective color channel, and (ii) generating a plurality of spatially filtered demultiplexed image frames by applying a respective spatial filtering operation to each of the plurality of demultiplexed image frames. The respective spatial filtering operation may include one or more of a mean filter, a Gaussian filter, an order statistics filter, a median filter, a Laplacian filter, or a gradient filter. In some instances, generating the filtered image comprises multiplexing the plurality of spatially filtered demultiplexed image frames.

Act 2206 of flow diagram 2200 includes after performing the spatial filtering operation, generating a color image by demosaicing the filtered image. Act 2206 is performed, in some instances, by a system utilizing processor(s) 102, storage 104, sensor(s) 110, input/output system(s) 116, communication system(s) 118, and/or other components.

Disclosed embodiments may comprise or utilize a special purpose or general-purpose computer including computer hardware, as discussed in greater detail below. Disclosed embodiments also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general-purpose or special-purpose computer system. Computer-readable media that store computer-executable instructions in the form of data are one or more “physical computer storage media” or “hardware storage device(s).” Computer-readable media that merely carry computer-executable instructions without storing the computer-executable instructions are “transmission media.” Thus, by way of example and not limitation, the current embodiments can comprise at least two distinctly different kinds of computer-readable media: computer storage media and transmission media.

Computer storage media (aka “hardware storage device”) are computer-readable hardware storage devices, such as RAM, ROM, EEPROM, CD-ROM, solid state drives (“SSD”) that are based on RAM, Flash memory, phase-change memory (“PCM”), or other types of memory, or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired program code means in hardware in the form of computer-executable instructions, data, or data structures and that can be accessed by a general-purpose or special-purpose computer.

A “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer, the computer properly views the connection as a transmission medium. Transmission media can include a network and/or data links which can be used to carry program code in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above are also included within the scope of computer-readable media.

Further, upon reaching various computer system components, program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission computer-readable media to physical computer-readable storage media (or vice versa). For example, computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a “NIC”), and then eventually transferred to computer system RAM and/or to less volatile computer-readable physical storage media at a computer system. Thus, computer-readable physical storage media can be included in computer system components that also (or even primarily) utilize transmission media.

Computer-executable instructions comprise, for example, instructions and data which cause a general-purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. The computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the described features or acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.

Disclosed embodiments may comprise or utilize cloud computing. A cloud model can be composed of various characteristics (e.g., on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, etc.), service models (e.g., Software as a Service (“SaaS”), Platform as a Service (“PaaS”), Infrastructure as a Service (“IaaS”)), and deployment models (e.g., private cloud, community cloud, public cloud, hybrid cloud, etc.).

Those skilled in the art will appreciate that the invention may be practiced in network computing environments with many types of computer system configurations, including, personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, pagers, routers, switches, wearable devices, and the like. The invention may also be practiced in distributed system environments where multiple computer systems (e.g., local and remote systems), which are linked through a network (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links), perform tasks. In a distributed system environment, program modules may be located in local and/or remote memory storage devices.

Alternatively, or in addition, the functionality described herein can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that can be used include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), central processing units (CPUs), graphics processing units (GPUs), and/or others.

As used herein, the terms “executable module,” “executable component,” “component,” “module,” or “engine” can refer to hardware processing units or to software objects, routines, or methods that may be executed on one or more computer systems. The different components, modules, engines, and services described herein may be implemented as objects or processors that execute on one or more computer systems (e.g., as separate threads).

One will also appreciate how any feature or operation disclosed herein may be combined with any one or combination of the other features and operations disclosed herein. Additionally, the content or feature in any one of the figures may be combined or used in connection with any content or feature used in any of the other figures. In this regard, the content disclosed in any one figure is not mutually exclusive and instead may be combinable with the content from any of the other figures.

The present invention may be embodied in other specific forms without departing from its spirit or characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

The article "Microsoft Patent | Systems and methods for dark current compensation in single photon avalanche diode imagery" was first published on Nweon Patent.

Microsoft Patent | Systems and methods for obtaining color imagery using single photon avalanche diodes https://patent.nweon.com/25140 Wed, 30 Nov 2022 17:01:45 +0000 https://patent.nweon.com/?p=25140 ...

The article "Microsoft Patent | Systems and methods for obtaining color imagery using single photon avalanche diodes" was first published on Nweon Patent.

Patent: Systems and methods for obtaining color imagery using single photon avalanche diodes

Patent PDF: Available to Nweon (映维网) members

Publication Number: 20220385843

Publication Date: 2022-12-01

Assignee: Microsoft Technology Licensing, LLC (Redmond, WA, US)

Abstract

A system for obtaining color imagery using SPADs includes a SPAD array that has a plurality of SPAD pixels. Each of the plurality of SPAD pixels includes a respective color filter positioned thereover. The system is configurable to capture an image frame using the SPAD array and generate a filtered image by performing a temporal filtering operation using the image frame and at least one preceding image frame. The at least one preceding image frame is captured by the SPAD array at a timepoint that temporally precedes a timepoint associated with the image frame. The system is also configurable to, after performing the temporal filtering operation, generate a color image by demosaicing the filtered image.

Claims

We claim:

Description

BACKGROUND

Mixed-reality (MR) systems, including virtual-reality and augmented-reality systems, have received significant attention because of their ability to create truly unique experiences for their users. For reference, conventional virtual-reality (VR) systems create a completely immersive experience by restricting their users’ views to only a virtual environment. This is often achieved, in VR systems, through the use of a head-mounted device (HMD) that completely blocks any view of the real world. As a result, a user is entirely immersed within the virtual environment. In contrast, conventional augmented-reality (AR) systems create an augmented-reality experience by visually presenting virtual objects that are placed in or that interact with the real world.

As used herein, VR and AR systems are described and referenced interchangeably. Unless stated otherwise, the descriptions herein apply equally to all types of mixed-reality systems, which (as detailed above) include AR systems, VR systems, and/or any other similar system capable of displaying virtual objects.

Some MR systems include one or more cameras for facilitating image capture, video capture, and/or other functions. For instance, an MR system may utilize images and/or depth information obtained using its camera(s) to provide pass-through views of a user’s environment to the user. An MR system may provide pass-through views in various ways. For example, an MR system may present raw images captured by the camera(s) of the MR system to a user. In other instances, an MR system may modify and/or reproject captured image data to correspond to the perspective of a user’s eye to generate pass-through views. An MR system may modify and/or reproject captured image data to generate a pass-through view using depth information for the captured environment obtained by the MR system (e.g., using a depth system of the MR system, such as a time-of-flight camera, a rangefinder, stereoscopic depth cameras, etc.). In some instances, an MR system utilizes one or more predefined depth values to generate pass-through views (e.g., by performing planar reprojection).

In some instances, pass-through views generated by modifying and/or reprojecting captured image data may at least partially correct for differences in perspective brought about by the physical separation between a user’s eyes and the camera(s) of the MR system (known as the “parallax problem,” “parallax error,” or, simply “parallax”). Such pass-through views/images may be referred to as “parallax-corrected pass-through” views/images. By way of illustration, parallax-corrected pass-through images may appear to a user as though they were captured by cameras that are co-located with the user’s eyes.

A pass-through view can aid users in avoiding disorientation and/or safety hazards when transitioning into and/or navigating within a mixed-reality environment. Pass-through views may also enhance user views in low visibility environments. For example, mixed-reality systems configured with long wavelength thermal imaging cameras may facilitate visibility in smoke, haze, fog, and/or dust. Likewise, mixed-reality systems configured with low light imaging cameras facilitate visibility in dark environments where the ambient light level is below the level required for human vision.

To facilitate imaging of an environment for generating a pass-through view, some MR systems include image sensors that utilize complementary metal-oxide-semiconductor (CMOS) and/or charge-coupled device (CCD) technology. For example, such technologies may include image sensing pixel arrays where each pixel is configured to generate electron-hole pairs in response to detected photons. The electrons may become stored in per-pixel capacitors, and the charge stored in the capacitors may be read out to provide image data (e.g., by converting the stored charge to a voltage).

However, such image sensors suffer from a number of shortcomings. For example, the signal to noise ratio for a conventional image sensor may be highly affected by read noise, especially when imaging under low visibility conditions. For instance, under low light imaging conditions (e.g., where ambient light is below about 10 lux, such as about 1 millilux or below), a CMOS or CCD imaging pixel may detect only a small number of photons, which may cause the read noise to approach or exceed the signal detected by the imaging pixel and decrease the signal-to-noise ratio.

The dominance of read noise in a signal detected by a CMOS or CCD image sensor is often exacerbated when imaging at a high frame rate under low light conditions. Although a lower framerate may be used to allow a CMOS or CCD sensor to detect enough photons to allow the signal to avoid being dominated by read noise, utilizing a low framerate often leads to motion blur in captured images. Motion blur is especially problematic when imaging is performed on an HMD or other device that undergoes regular motion during use.

In addition to affecting pass-through imaging, the read noise and/or motion blur associated with conventional image sensors may also affect other operations performed by HMDs, such as late stage reprojection, rolling shutter corrections, object tracking (e.g., hand tracking), surface reconstruction, semantic labeling, 3D reconstruction of objects, and/or others.

To address shortcomings associated with CMOS and/or CCD image sensors, devices have emerged that utilize single photon avalanche diode (SPAD) image sensors. In contrast with conventional CMOS or CCD sensors, a SPAD is operated at a bias voltage that enables the SPAD to detect a single photon. Upon detecting a single photon, an electron-hole pair is formed, and the electron is accelerated across a high electric field, causing avalanche multiplication (e.g., generating additional electron-hole pairs). Thus, each detected photon may trigger an avalanche event. A SPAD may operate in a gated manner (each gate corresponding to a separate shutter operation), where each gated shutter operation may be configured to result in a binary output. The binary output may comprise a “1” where an avalanche event was detected during an exposure (e.g., where a photon was detected), or a “0” where no avalanche event was detected. Separate shutter operations may be performed consecutively and integrated over a frame capture time period. The binary output of the consecutive shutter operations over a frame capture time period may be counted, and an intensity value may be calculated based on the counted binary output.

An array of SPADs may form an image sensor, with each SPAD forming a separate pixel in the SPAD array. To capture an image of an environment, each SPAD pixel may detect avalanche events and provide binary output for consecutive shutter operations in the manner described herein. The per-pixel binary output of consecutive shutter operations over a frame capture time period may be counted, and per-pixel intensity values may be calculated based on the counted per-pixel binary output. The per-pixel intensity values may be used to form an intensity image of an environment.
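
By way of illustration only, the following minimal sketch (Python/NumPy, not part of the patent disclosure) shows how per-pixel binary outputs accumulated over a frame capture time period could be counted and normalized into an intensity image; the array layout, function name, and normalization are illustrative assumptions.

```python
import numpy as np

def intensity_image(binary_readouts: np.ndarray) -> np.ndarray:
    """binary_readouts: (num_gates, height, width) array of 0/1 values, one
    slice per gated shutter operation within a single frame capture period."""
    counts = binary_readouts.sum(axis=0)             # per-pixel avalanche-event counts
    num_gates = binary_readouts.shape[0]
    return counts.astype(np.float32) / num_gates     # normalized per-pixel intensity
```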

Although SPAD sensors show promise for overcoming various shortcomings associated with CMOS or CCD sensors, implementing SPAD sensors for image and/or video capture is still associated with many challenges. For example, there is an ongoing need and desire for improvements to the image quality of SPAD imagery, particularly for SPAD imagery captured under low light conditions (including color imagery captured using SPADs under low light conditions).

The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.

BRIEF SUMMARY

Disclosed embodiments include systems, methods, and devices for obtaining color imagery using single photon avalanche diodes.

Some disclosed systems include a SPAD array that has a plurality of SPAD pixels, wherein each of the plurality of SPAD pixels includes a respective color filter positioned thereover. The systems are configured with one or more processors and one or more hardware storage devices storing instructions that are executable by the one or more processors to configure the system to perform various acts associated with obtaining color imagery using single photon avalanche diodes.

The disclosed acts for obtaining color imagery using single photon avalanche diodes include capturing an image frame using the SPAD array and generating a filtered image by performing a temporal filtering operation using the image frame and a preceding image frame captured by the SPAD array at a timepoint that temporally precedes a timepoint associated with the image frame.

The disclosed acts for obtaining color imagery using single photon avalanche diodes include, after performing the temporal filtering operation, generating a color image by demosaicing the filtered image.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

Additional features and advantages will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the teachings herein. Features and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. Features of the present invention will become more fully apparent from the following description and appended claims or may be learned by the practice of the invention as set forth hereinafter.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to describe the manner in which the above-recited and other advantages and features can be obtained, a more particular description of the subject matter briefly described above will be rendered by reference to specific embodiments which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments and are not therefore to be considered to be limiting in scope, embodiments will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:

FIG. 1 illustrates example components of an example system that may include or be used to implement one or more disclosed embodiments;

FIG. 2 illustrates an example of capturing an image frame of an object in a low light environment using a single photon avalanche diode (SPAD) array of a head-mounted display (HMD);

FIG. 3 illustrates an example of generating a temporally filtered image using consecutively captured image frames captured by a SPAD array of an HMD;

FIG. 4 illustrates an example of capturing a dark current image frame using a SPAD sensor;

FIG. 5 illustrates an example of generating a dark current compensated image using a dark current image frame;

FIG. 6 illustrates an example of capturing multiple dark current image frames using a SPAD sensor under different temperature conditions;

FIG. 7 illustrates an example of generating a dark current compensated image using a dark current image selected based on temperature conditions at runtime;

FIG. 8 illustrates an example dark current image frame that includes a region of unexposed SPAD pixels that will be unexposed while capturing image frames at runtime;

FIG. 9 illustrates an example of generating a dark current compensated image using a dark current image frame that is scaled using a dark current factor, where the dark current factor is determined based on runtime sensor data of unexposed SPAD pixels;

FIG. 10 illustrates example additional operations that may be performed on a dark current compensated image;

FIG. 11 illustrates an example of generating a dark current compensated image by subtracting a dark current value from pixels of an image frame captured at runtime, where the dark current value is determined based on runtime sensor data of unexposed SPAD pixels;

FIGS. 12 and 13 illustrate example flow diagrams depicting acts associated with compensating for dark current in SPAD imagery;

FIG. 14 illustrates example SPAD pixels of a SPAD array that include color filters;

FIG. 15 illustrates an example of capturing an image frame of colored objects in a low light environment using a color filtered SPAD array of an HMD;

FIG. 16 illustrates an example processing pipeline for obtaining color images using a SPAD array;

FIG. 17 illustrates an example of generating a filtered image by performing temporal filtering on an image frame prior to demosaicing;

FIG. 18A illustrates an example of demultiplexing consecutively captured image frames to generate color-specific image frames at multiple timepoints;

FIG. 18B illustrates an example of generating filtered color-specific image frames by performing temporal filtering using color-specific image frames associated with different timepoints;

FIG. 18C illustrates an example of multiplexing filtered color-specific image frames to generate a filtered image;

FIG. 19 illustrates an example of generating a filtered image by performing bilateral filtering on an image frame prior to demosaicing;

FIG. 20 illustrates an example of demultiplexing an image frame to generate color-specific image frames, generating filtered color-specific image frames by performing spatial filtering on the color-specific image frames, and multiplexing the filtered color-specific image frames to generate a filtered image; and

FIGS. 21 and 22 illustrate example flow diagrams depicting acts associated with obtaining color imagery using SPADs.

DETAILED DESCRIPTION

Disclosed embodiments are generally directed to systems, methods, and devices for obtaining color imagery using single photon avalanche diodes (SPADs).

Examples of Technical Benefits, Improvements, and Practical Applications

Those skilled in the art will recognize, in view of the present disclosure, that at least some of the disclosed embodiments may be implemented to address various shortcomings associated with at least some conventional image acquisition techniques. The following section outlines some example improvements and/or practical applications provided by the disclosed embodiments. It will be appreciated, however, that the following are examples only and that the embodiments described herein are in no way limited to the example improvements discussed herein.

The techniques described herein may facilitate a number of advantages over conventional systems, devices, and/or methods for SPAD image acquisition (including color image acquisition), particularly for imaging under low light conditions and/or for imaging from devices that undergo motion during image capture (e.g., HMDs).

Initially, the binarization of the SPAD signal effectively eliminates read noise, thereby improving signal-to-noise ratio for SPAD image sensor arrays as compared with conventional CMOS and/or CCD sensors. Accordingly, because of the binarization of SPAD signal, a SPAD signal may be read out at a high framerate (e.g., 90 Hz or greater, such as 120 Hz or even 240 Hz) without causing the signal to be dominated by read noise, even for signals capturing a low number of photons under low light environments.

In view of the foregoing, multiple exposure (and readout) operations may be performed at a high framerate using a SPAD array to generate separate partial image frames, and these image frames may be temporally filtered with one another. The separate partial image frames may be aligned using motion data and combined (e.g., by averaging or other filtering) to form a single composite image. In this regard, SPAD images may be obtained in a temporally filtered manner (e.g., with persistence), using prior-timepoint image data to improve the quality of current-timepoint image data. In contrast, attempting to utilize multiple image frames captured at a high framerate to form a single composite image using a conventional CMOS or CCD camera would result in signals dominated by read noise, particularly under low light imaging conditions.

An additional challenge associated with image acquisition using SPADs is signal noise brought about by dark current. Dark current (sometimes referred to as reverse bias leakage current) refers to a small electric current that flows through photosensitive devices (e.g., SPADs) even when no photons are entering the device. Dark current can be thermally induced or brought about by crystallographic and/or manufacturing irregularities and/or defects.

In SPADs, dark current can cause an electron-hole pair to be generated in the depletion region and can trigger avalanche events, even when the SPAD is not detecting a photon. Avalanche events brought about by dark current are typically counted as detected photons, which can cause the binary output of a SPAD to include false counts (or “dark counts”). In SPAD imagery, dark counts can cause the intensity values assigned to at least some SPAD pixels to be inaccurately high, which can add noise to SPAD imagery. In some instances, the effects of dark counts are prominent when imaging under low light conditions, resulting in high-frequency noise that degrades user experiences.

Accordingly, disclosed techniques may facilitate dark current compensation by subtracting a dark current SPAD image from a SPAD image captured at runtime. A dark current image may be obtained as part of a calibration step by capturing one or more SPAD images while blocking or occluding the SPAD sensor. The dark current image may indicate which SPAD pixels generate dark counts and/or the quantity of dark counts generated by different SPAD pixels. The dark current image may therefore be used to subtract dark counts from SPAD imagery captured at runtime to compensate for the effects of dark current in the SPAD sensor. Such subtraction may reduce the amount of noise present in SPAD imagery, thereby improving user experiences.

Where temporal filtering is included in a processing pipeline for generating SPAD imagery, it has been found to be advantageous to perform temporal filtering prior to performing dark current compensation. Stated differently, it has been found to be advantageous to perform dark current compensation on temporally filtered images, rather than to perform temporal filtering on dark current-compensated images. For example, because intensity values stored for images truncate at zero (e.g., negative intensity values are not stored), performing dark current subtraction before performing temporal filtering can generate a bias toward larger intensity values. Such biasing toward higher intensity values may occur, for example, where a dark current image stores a higher intensity value for a SPAD pixel than an intensity value detected by the SPAD pixel at runtime. In such cases, a subtraction operation will bring the intensity value for the SPAD pixel to zero, but the further difference between the higher dark current intensity value and the lower runtime intensity value will be lost. Accordingly, if temporal filtering is subsequently performed, the intensity value for the SPAD pixel may be higher than desired for effectively removing the dark counts.
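
A toy numeric sketch of this truncation bias (illustrative only, with made-up values and a single pixel) is shown below.

```python
import numpy as np

dark = 3.0                               # calibrated dark-count level for one pixel
frames = np.array([1.0, 5.0])            # two runtime readings for the same pixel

# Subtract-then-filter: each subtraction clips at zero, so the deficit is lost.
early = np.clip(frames - dark, 0.0, None).mean()   # (0 + 2) / 2 = 1.0

# Filter-then-subtract: temporal averaging happens first, then one clipped subtraction.
late = max(frames.mean() - dark, 0.0)              # 3 - 3 = 0.0

# 'early' ends up biased high relative to 'late', matching the rationale above.
```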

Thus, in some instances, temporal filtering and dark current compensation may be performed in a particular order (with temporal filtering occurring first) to facilitate improved image acquisition using SPADs (e.g., with reduced noise), particularly for imaging under low light conditions. That said, in some instances, dark current compensation is performed prior to temporal filtering.

Additional challenges are associated with acquiring color images using SPADs, particularly under low light imaging conditions. For example, to capture color images, color filters are positioned over SPAD pixels (red, green, blue (RGB) color filters, or other types of filters) in a pattern (e.g., a Bayer pattern or other type of pattern) to collect light of different wavelengths to generate color values for image pixels of a color image. To generate color values, conventional systems obtain raw image data (e.g., per-pixel counts of avalanche events, or per-pixel intensity values based on the counts of avalanche events) and then perform a demosaicing operation on the raw image data. Demosaicing involves generating (e.g., via interpolation) per-pixel color values (e.g., RGB values) for each image sensing pixel of an image sensor (even though each image sensing pixel typically includes a filter for only a single color channel positioned thereover). Demosaicing may allow a color image to match the resolution of the image sensor, which is preferable, in some instances, relative to generating single color values (e.g., RGB values) for each cluster of color filtered image sensing pixels (e.g., each 2×2 set of a red pixel, a green pixel, a green pixel, and a blue pixel in a Bayer pattern). The latter approach results in downsampling or downscaling of the color image relative to the image sensor.
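
As an aside, the downscaling alternative mentioned above can be sketched as follows (illustrative Python/NumPy only; the RGGB layout and function name are assumptions, and real implementations may differ).

```python
import numpy as np

def downscale_rggb(mosaic: np.ndarray) -> np.ndarray:
    """Collapse each 2x2 RGGB cluster into a single RGB value, yielding a color
    image at half the sensor resolution in each dimension."""
    m = mosaic.astype(np.float32)
    r = m[0::2, 0::2]
    g = (m[0::2, 1::2] + m[1::2, 0::2]) / 2.0   # average the two green pixels
    b = m[1::2, 1::2]
    return np.stack([r, g, b], axis=-1)
```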

Under low light imaging conditions, raw image data captured by color filtered SPAD sensors often includes significant noise. Furthermore, demosaicing operations are associated with adding noise to processed images, which further compounds the noise problem when using SPADs to perform color imaging under low light conditions.

Although temporal and spatial filtering operations may ordinarily reduce noise in SPAD imagery, such techniques are, in many instances, not well-suited for reducing noise in demosaiced SPAD imagery. For instance, noise added to the image data via demosaicing often reduces the effectiveness of such filtering operations for improving color imagery captured using SPADs.

Accordingly, disclosed techniques may facilitate reduced noise in color images acquired using SPADs by performing filtering operations (e.g., temporal filtering and/or spatial filtering) on raw image data captured using SPADs with color filters positioned thereover. In accordance with the present disclosure, raw image data is filtered to generate a filtered image, and demosaicing is subsequently performed on the filtered image (rather than on the raw image data). By performing demosaicing after one or more filtering operations, embodiments of the present disclosure refrain from further adding noise to the raw image data (e.g., as a result of demosaicing) prior to performing the filtering operations, thereby improving the benefits facilitated by the filtering operations and improving color image acquisition using SPADs.
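
The ordering described above (filter the raw mosaiced data first, demosaic last) might be sketched as follows. The exponential temporal filter and OpenCV's Bayer conversion are stand-ins chosen for brevity, not the specific filtering or demosaicing of the disclosed embodiments, and the names and parameters are illustrative assumptions.

```python
import numpy as np
import cv2

def filter_then_demosaic(raw_frames, alpha=0.2):
    """raw_frames: iterable of (H, W) uint8 Bayer-mosaiced SPAD intensity frames.
    Temporal filtering is applied to the raw mosaic; demosaicing runs only on the
    filtered result."""
    filtered = None
    for frame in raw_frames:
        f = frame.astype(np.float32)
        filtered = f if filtered is None else alpha * f + (1.0 - alpha) * filtered
    mosaic = np.clip(filtered, 0, 255).astype(np.uint8)
    return cv2.cvtColor(mosaic, cv2.COLOR_BayerBG2BGR)   # demosaic after filtering
```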

Disclosed embodiments also extend to performing filtering operations (e.g., temporal and/or spatial filtering) in a manner that accounts for different color filters associated with different SPAD pixels of a SPAD sensor. For example, as noted above, temporal filtering can include aligning consecutively acquired image frames and filtering aligned image pixels together to generate a final image. However, for demosaiced images, different image pixels that become filtered together may be associated with different color channels (e.g., where large amounts of motion are associated with the consecutively acquired images), which can distort colors and/or intensities within the output image. Furthermore, spatial filtering can cause neighboring pixels of different color channels to become filtered together, which may distort colors and/or intensities within the output image.

To combat these issues, raw image data captured using color filtered SPADs may be demultiplexed to separate the image data into separate images associated with the different color channels represented by the color filters of the SPADs. Separate filtering operations (e.g., temporal and/or spatial filtering operations) may then be performed on the separate images associated with the different color channels, and, after filtering, the separate images may be recombined (e.g., multiplexed) into a single image and subsequently demosaiced to provide a final color image.
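
One possible sketch of this demultiplex/filter/multiplex flow is shown below (illustrative only; the RGGB offsets, the names, and the pluggable filter_fn are assumptions).

```python
import numpy as np

BAYER_OFFSETS = {"R": (0, 0), "G1": (0, 1), "G2": (1, 0), "B": (1, 1)}  # assumed RGGB layout

def demultiplex(mosaic):
    """Split a mosaiced frame into per-color-channel planes."""
    return {ch: mosaic[r::2, c::2].copy() for ch, (r, c) in BAYER_OFFSETS.items()}

def multiplex(planes, shape, dtype):
    """Reassemble per-channel planes into a single mosaiced frame."""
    mosaic = np.zeros(shape, dtype=dtype)
    for ch, (r, c) in BAYER_OFFSETS.items():
        mosaic[r::2, c::2] = planes[ch]
    return mosaic

def filter_per_channel(mosaic, filter_fn):
    """Apply a temporal and/or spatial filter to each color plane separately,
    then recombine the filtered planes before demosaicing."""
    planes = demultiplex(mosaic)
    filtered = {ch: filter_fn(plane) for ch, plane in planes.items()}
    return multiplex(filtered, mosaic.shape, mosaic.dtype)
```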

Having just described some of the various high-level features and benefits of the disclosed embodiments, attention will now be directed to FIGS. 1 through 22. These Figures illustrate various conceptual representations, architectures, methods, and supporting illustrations related to the disclosed embodiments.

Example Systems and Techniques for Dark Current Compensation in SPAD Imagery

Attention is now directed to FIG. 1, which illustrates an example system 100 that may include or be used to implement one or more disclosed embodiments. FIG. 1 depicts the system 100 as a head-mounted display (HMD) configured for placement over a head of a user to display virtual content for viewing by the user’s eyes. Such an HMD may comprise an augmented reality (AR) system, a virtual reality (VR) system, and/or any other type of HMD. Although the present disclosure focuses, in at least some respects, on a system 100 implemented as an HMD, it should be noted that the techniques described herein may be implemented using other types of systems/devices, without limitation.

FIG. 1 illustrates various example components of the system 100. For example, FIG. 1 illustrates an implementation in which the system includes processor(s) 102, storage 104, sensor(s) 110, I/O system(s) 116, and communication system(s) 118. Although FIG. 1 illustrates a system 100 as including particular components, one will appreciate, in view of the present disclosure, that a system 100 may comprise any number of additional or alternative components.

The processor(s) 102 may comprise one or more sets of electronic circuitries that include any number of logic units, registers, and/or control units to facilitate the execution of computer-readable instructions (e.g., instructions that form a computer program). Such computer-readable instructions may be stored within storage 104. The storage 104 may comprise physical system memory and may be volatile, non-volatile, or some combination thereof. Furthermore, storage 104 may comprise local storage, remote storage (e.g., accessible via communication system(s) 118 or otherwise), or some combination thereof. Additional details related to processors (e.g., processor(s) 102) and computer storage media (e.g., storage 104) will be provided hereinafter.

In some implementations, the processor(s) 102 may comprise or be configurable to execute any combination of software and/or hardware components that are operable to facilitate processing using machine learning models or other artificial intelligence-based structures/architectures. For example, processor(s) 102 may comprise and/or utilize hardware components or computer-executable instructions operable to carry out function blocks and/or processing layers configured in the form of, by way of non-limiting example, single-layer neural networks, feed forward neural networks, radial basis function networks, deep feed-forward networks, recurrent neural networks, long-short term memory (LSTM) networks, gated recurrent units, autoencoder neural networks, variational autoencoders, denoising autoencoders, sparse autoencoders, Markov chains, Hopfield neural networks, Boltzmann machine networks, restricted Boltzmann machine networks, deep belief networks, deep convolutional networks (or convolutional neural networks), deconvolutional neural networks, deep convolutional inverse graphics networks, generative adversarial networks, liquid state machines, extreme learning machines, echo state networks, deep residual networks, Kohonen networks, support vector machines, neural Turing machines, and/or others.

As will be described in more detail, the processor(s) 102 may be configured to execute instructions 106 stored within storage 104 to perform certain actions associated with imaging using SPAD arrays. The actions may rely at least in part on data 108 (e.g., avalanche event counting or tracking, etc.) stored on storage 104 in a volatile or non-volatile manner.

In some instances, the actions may rely at least in part on communication system(s) 118 for receiving data from remote system(s) 120, which may include, for example, separate systems or computing devices, sensors, and/or others. The communication system(s) 118 may comprise any combination of software or hardware components that are operable to facilitate communication between on-system components/devices and/or with off-system components/devices. For example, the communication system(s) 118 may comprise ports, buses, or other physical connection apparatuses for communicating with other devices/components. Additionally, or alternatively, the communication system(s) 118 may comprise systems/components operable to communicate wirelessly with external systems and/or devices through any suitable communication channel(s), such as, by way of non-limiting example, Bluetooth, ultra-wideband, WLAN, infrared communication, and/or others.

FIG. 1 illustrates that a system 100 may comprise or be in communication with sensor(s) 110. Sensor(s) 110 may comprise any device for capturing or measuring data representative of perceivable phenomenon. By way of non-limiting example, the sensor(s) 110 may comprise one or more image sensors, microphones, thermometers, barometers, magnetometers, accelerometers, gyroscopes, and/or others.

FIG. 1 also illustrates that the sensor(s) 110 include SPAD array(s) 112. As depicted in FIG. 1, a SPAD array 112 comprises an arrangement of SPAD pixels 122 that are each configured to facilitate avalanche events in response to sensing a photon, as described hereinabove. SPAD array(s) 112 may be implemented on a system 100 (e.g., an MR HMD) to facilitate image capture for various purposes (e.g., to facilitate computer vision tasks, pass-through imagery, and/or others).

FIG. 1 also illustrates that the sensor(s) 110 include inertial measurement unit(s) 114 (IMU(s) 114). IMU(s) 114 may comprise any number of accelerometers, gyroscopes, and/or magnetometers to capture motion data associated with the system 100 as the system moves within physical space. The motion data may comprise or be used to generate pose data, which may describe the position and/or orientation (e.g., 6 degrees of freedom pose) and/or change of position (e.g., velocity and/or acceleration) and/or change of orientation (e.g., angular velocity and/or angular acceleration) of the system 100.

Furthermore, FIG. 1 illustrates that a system 100 may comprise or be in communication with I/O system(s) 116. I/O system(s) 116 may include any type of input or output device such as, by way of non-limiting example, a touch screen, a mouse, a keyboard, a controller, and/or others, without limitation. For example, the I/O system(s) 116 may include a display system that may comprise any number of display panels, optics, laser scanning display assemblies, and/or other components.

Attention is now directed to FIG. 2, which illustrates an example of capturing an image frame 210 of an object 206 (e.g., a table) in a low light environment 208 using a single photon avalanche diode (SPAD) array of a head-mounted display 202 (HMD 202). The HMD 202 corresponds, in at least some respects, to the system 100 disclosed hereinabove. For example, the HMD 202 includes a SPAD array (e.g., SPAD array(s) 112) that includes SPAD pixels configured for photon detection to capture images. In the example shown in FIG. 2, the HMD 202 is positioned according to pose 204 while capturing the image frame 210 of the object 206 in the low light environment 208. The pose 204 may be tracked or measured utilizing sensors (e.g., IMU(s) 114, camera(s) to facilitate simultaneous localization and mapping (SLAM), etc.) of the HMD 202.

FIG. 2 illustrates that the image frame 210 includes image data 212 depicting a noisy representation of the object 206. In some instances, this occurs when imaging under low light conditions (e.g., about 1 millilux or below) due to the low number of photons detected by SPAD pixels over the frame capture time period for capturing the image frame 210. FIG. 2 also illustrates the image frame 210 as including dark counts 214, which are depicted as high-frequency noise interspersed throughout the image frame 210. As discussed above, dark counts 214 may result from dark current occurring in SPAD pixels. The following discussion refers to various techniques that may be employed to provide an improved representation of the object 206 in SPAD imagery (e.g., by reducing the noise in the image data 212 depicting the object 206 and by compensating for the dark counts 214).

FIG. 3 illustrates an example of generating a temporally filtered image using consecutively captured image frames captured by a SPAD array of an HMD. In particular, FIG. 3 shows the image frame 210 (and its image data 212 and dark counts 214), as well as additional image frames 302 and 306 (e.g., captured by the HMD 202). Each of the additional image frames 302 and 306 includes image data depicting the object 206 (i.e., image data 304 and 308, respectively) and includes dark counts. FIG. 3 also indicates that the different image frames 210, 302, and 306 are captured at different timepoints. In particular, FIG. 3 indicates that image frame 302 was captured at timepoint 310, image frame 306 was captured at timepoint 312, and image frame 210 was captured at timepoint 314. In the present example, timepoint 310 temporally precedes timepoints 312 and 314, and timepoint 312 temporally precedes timepoint 314.

As indicated above, image data of consecutively captured image frames may be combined to form a composite image to facilitate adequate exposure of objects captured within the image frames (e.g., particularly under low light conditions). Accordingly, FIG. 3 illustrates temporal filtering 316 performed on the image frames 302, 306, and 210. Temporal filtering 316 includes combining corresponding image pixels of the different image frames 302, 306, and 210 to generate pixel values for an output image (i.e., temporally filtered image 318). “Corresponding image pixels” in different image frames are image pixels of different image frames that capture the same portion of a captured environment.

Corresponding image pixels of the different image frames 302, 306, and 210 may be combined or composited in various ways, such as by summing, averaging (e.g., weighted averaging), alpha blending, and/or others, and the manner/parameters of combining corresponding image pixels may differ for different pixel regions and/or may be dynamically determined based on various factors (e.g., signal strength, amount of motion, motion detected in a captured scene, etc.).

In some instances, image frames 302, 306, and 210 capture the object 206 from poses that are at least slightly different from one another. For example, the HMD 202 may capture image frames 302 and 306 from poses that at least slightly differ from pose 204 and/or from one another. Accordingly, in some instances, to align corresponding pixels of different image frames 302, 306, 210, temporal filtering 316 may utilize motion data 324, which may comprise or be used to generate pose data that describes the position and/or orientation (e.g., 6 degrees of freedom pose) and/or change of position (e.g., velocity and/or acceleration) and/or change of orientation (e.g., angular velocity and/or angular acceleration) of the HMD 202 during the capturing of the image frames 302, 306, and 210.

The motion data 324 may be used to align the image frames 302, 306, and 210 with one another. For example, a system may use the motion data 324 to align image frames 302 and 306 with pose 204 of image frame 210, thereby generating aligned image frames that are spatially aligned with one another (e.g., appearing as though they were all captured from pose 204 with the same capture perspective). In this regard, the temporal filtering 316 may comprise motion compensated temporal filtering.

In some instances, temporal filtering 316 additionally or alternatively utilizes optical flow estimations to align the image frames 302, 306, and 210 to facilitate image compositing to generate a composite image (i.e., temporally filtered image 318). For example, in some instances, a system downsamples the consecutively captured image frames and performs optical flow analysis to obtain vectors for aligning the pixels of the image frames. Furthermore, although the present disclosure focuses, in at least some respects, on temporal filtering operations that utilize image frames that temporally precede an image frame associated with a target timepoint to generate a composite image associated with the target timepoint, temporal filtering operations may additionally or alternatively utilize at least some image frames that are temporally subsequent to an image frame associated with a target timepoint to generate a composite image associated with the target timepoint.
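
A minimal sketch of motion-compensated temporal filtering of this kind is shown below (illustrative only): per-frame 3x3 homographies stand in for the pose-based or optical-flow alignment, and the running blend weight is an arbitrary choice rather than a parameter of the disclosed embodiments.

```python
import numpy as np
import cv2

def temporal_filter(current, prior_frames, homographies, weight=0.5):
    """current: (H, W) frame for the target timepoint; prior_frames: earlier frames;
    homographies: one 3x3 warp per prior frame aligning it to the current pose."""
    h, w = current.shape[:2]
    acc = current.astype(np.float32)
    for frame, warp in zip(prior_frames, homographies):
        aligned = cv2.warpPerspective(frame, warp, (w, h))   # align to current perspective
        acc = weight * acc + (1.0 - weight) * aligned.astype(np.float32)
    return np.clip(acc, 0, 255).astype(np.uint8)
```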

FIG. 3 illustrates that the temporal filtering 316 generates a temporally filtered image 318 based on the composited image data 304, 308, and 212 of the image frames 302, 306, and 210 (e.g., after motion compensation). In the example depicted in FIG. 3, the temporally filtered image 318 includes image data 320 that represents the object 206 with reduced noise and improved signal (e.g., relative to the individual representations of the object 206 in the image data 304, 308, 212 of the image frames 302, 306, and 210, respectively). However, FIG. 3 illustrates that the temporally filtered image 318 still includes dark counts 322, which negatively affect image quality.

Accordingly, embodiments of the present disclosure provide dark count compensation techniques for facilitating improved SPAD imagery. FIG. 4 illustrates an example of capturing a dark current image frame 406 using a SPAD sensor 402. In the present example, the SPAD sensor 402 is part of the HMD 202 and comprises a SPAD array with a plurality of SPAD pixels. FIG. 4 illustrates a cover 404 occluding or obscuring the SPAD pixels of the SPAD sensor 402. The cover 404 may comprise any material or device that blocks light in any desired wavelength range (e.g., the visible spectrum, the near-IR spectrum, the IR spectrum, and/or others).

FIG. 4 illustrates an example in which the dark current image frame 406 is captured with the cover 404 positioned to prevent photons from reaching the SPAD pixels of the SPAD array of the SPAD sensor 402. The dark current image frame 406 may be obtained as a part of a calibration step performed in preparation for use of the HMD 202 in user applications (e.g., prior to the capturing of the image frames 302, 306, and/or 210). The dark current image frame 406 may comprise a single image frame captured by the SPAD sensor 402 while obscured by the cover 404, or the dark current image frame 406 may be generated based on any number of image frames captured by the SPAD sensor 402 while obscured by the cover 404. For example, the dark current image frame 406 may be generated by temporally averaging per-pixel intensity values of any number of image frames captured by the SPAD sensor 402 while blocked by the cover 404.
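
As a rough sketch of this calibration step (illustrative only; the names and the simple per-pixel mean are assumptions), the dark current image frame might be computed as follows.

```python
import numpy as np

def build_dark_current_frame(covered_frames) -> np.ndarray:
    """covered_frames: (N, H, W) stack of intensity frames captured while the
    SPAD array is blocked by the cover; returns the per-pixel temporal average
    of the observed dark counts."""
    return np.mean(np.asarray(covered_frames, dtype=np.float32), axis=0)
```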

As is evident from FIG. 4, the dark current image frame 406 includes dark counts and therefore includes data indicating which SPAD pixels of the SPAD sensor 402 are associated with detecting avalanche events without being exposed to photons. This information may be used to compensate for dark current in image frames captured at runtime.

FIG. 5 illustrates an example of generating a dark current compensated image using a dark current image frame. In particular, FIG. 5 shows the temporally filtered image 318 (discussed above with reference to FIG. 3) and the dark current image frame 406 being provided as inputs to subtraction 502. Subtraction 502 may comprise subtracting intensity values of the dark current image frame 406 from intensity values of the temporally filtered image 318 on a per-pixel basis. FIG. 5 illustrates a dark current compensated image 504 provided as output of the subtraction 502. As is evident from FIG. 5, the dark current compensated image 504 substantially omits the dark counts that were present in the temporally filtered image 318, in view of the subtraction 502 based on the dark current image frame 406. Accordingly, the effects of dark current in SPAD imagery, particularly SPAD imagery captured under low light conditions, may be ameliorated.
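
The per-pixel subtraction might be sketched as follows (illustrative only; negative results are clipped to zero, consistent with the truncation behavior discussed earlier, and the names are assumptions).

```python
import numpy as np

def dark_current_compensate(filtered_image, dark_frame):
    """Subtract the dark current reference frame from a temporally filtered image,
    per pixel, clipping negative values to zero."""
    out = filtered_image.astype(np.float32) - dark_frame.astype(np.float32)
    return np.clip(out, 0.0, None)
```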

FIGS. 2-5 have focused on a simple example of dark current compensation that utilizes a dark current image frame captured under controlled conditions (e.g., with a stop filter occluding the SPAD pixels of the SPAD sensor). However, in some instances, ambient conditions present while capturing a dark current image frame differ from ambient conditions present while capturing SPAD imagery at runtime. Because the severity of image noise brought about by dark current can vary with ambient conditions, such as temperature, discrepancies between dark current image frame capture conditions and runtime image frame capture conditions can cause systems to undercompensate or overcompensate for dark counts in SPAD imagery.

Accordingly, at least some implementations of the present disclosure account for differences between dark current image frame capture conditions and runtime image frame capture conditions. FIG. 6 illustrates a SPAD sensor 602, which corresponds to the SPAD sensor 402 of FIG. 4. The SPAD sensor 602 is similarly obscured by a cover 604 that prevents photons from reaching the SPAD pixels of the SPAD sensor 602. FIG. 6 illustrates a plurality of different dark current image frames 606, 608, and 610 captured using the SPAD sensor 602 while blocked by the cover 604 (e.g., during calibration). FIG. 6 illustrates that each of the different dark current image frames 606, 608, and 610 is associated with a different temperature value (or range of temperature values). For example, different dark current image frames may be captured under different temperature conditions, such that different dark current images are available to facilitate dark current compensation under different runtime temperature conditions. In the example shown in FIG. 6, dark current image frame 606 is associated with temperature 612, dark current image frame 608 is associated with temperature 614, and dark current image frame 610 is associated with temperature 616.

FIG. 7 illustrates an example of generating a dark current compensated image using a dark current image selected based on temperature conditions at runtime. In particular, FIG. 7 illustrates a temporally filtered image 702, which, according to the present example, captures the object 206 in the low light environment 208 discussed above and is captured by the HMD 202 in accordance with the principles discussed hereinabove with reference to FIGS. 2-3 (e.g., utilizing temporal filtering based on consecutively captured image frames). Temporally filtered image 702 is acquired at runtime (e.g., after the capturing of the dark current image frames of FIG. 6).

In the example of FIG. 7, the temporally filtered image 702 is associated with a temperature 704, which may correspond to an environment temperature and/or device temperature present for the capturing of one or more of the image frames used to generate the temporally filtered image 702. In some instances, the temperature 704 is captured using sensors (e.g., sensor(s) 110) of the HMD 202.

FIG. 7 also illustrates the dark current image frames 606, 608, and 610 discussed above with reference to FIG. 6, along with their respective temperatures 612, 614, and 616. FIG. 7 conceptually depicts that a system may select a dark current image frame to use for dark current compensation based on the temperature 704 associated with the temporally filtered image 702 obtained at runtime. For example, FIG. 7 illustrates a dashed line extending between the temperature 704 of the temporally filtered image 702 and the temperature 616 of the dark current image frame 610, indicating that a system may determine that the temperature 704 of the temporally filtered image 702 is most similar to the temperature 616 of dark current image frame 610 (relative to the temperatures 612 and 614 of the other available dark current image frames 606 and 608, respectively).

Based on this selection, FIG. 7 illustrates the temporally filtered image 702 and the dark current image frame 610 being provided as inputs to subtraction 706 (which corresponds in function to subtraction 502 discussed above with reference to FIG. 5). As shown in FIG. 7, the subtraction 706 provides a dark current compensated image 708, which substantially omits dark counts that were present in the temporally filtered image (e.g., they are subtracted out using the dark current image frame 610). Accordingly, a system may intelligently select from among available dark current image frames (each associated with a respective temperature or range of temperatures) based on a measured runtime temperature.

Although the foregoing example focuses on using temperature as a basis for selecting a dark current image frame to use to subtract dark counts from a temporally filtered image, temperature may, in some instances, be used to generate a scaled or interpolated dark current image for facilitating dark current compensation. For example, in some instances, a runtime temperature does not exactly match or is not within a particular range of a temperature value associated with a previously captured dark current image frame. To accommodate such circumstances, the runtime temperature and one or more of the temperatures associated with dark current image frames may be used to generate a dark current factor, which may comprise a ratio of the runtime temperature and a dark current image frame temperature (e.g., where the runtime temperature is 30° C. and the nearest temperature associated with a dark current image frame is 25° C., a dark current factor may be 1.2). A system may then use the dark current factor to generate a scaled dark current image (e.g., by applying the dark current factor to the per-pixel intensity values of a nearest-temperature dark current image frame) and use the scaled dark current image frame to facilitate dark current compensation (e.g., via subtraction as discussed above). In this same vein, temperature values associated with a runtime image and one or more dark current image frames may be used to generate an interpolated or extrapolated dark current image frame to be used for dark current compensation (e.g., where a runtime temperature lies between two temperature values associated with different dark current images).
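
By way of illustration, the nearest-temperature selection and ratio-based scaling described above might look like the sketch below. The linear temperature ratio simply mirrors the 30/25 example given above and is not a claim about actual dark-current physics; the names and data layout are assumptions.

```python
import numpy as np

def scaled_dark_frame(runtime_temp, calibration):
    """calibration: list of (temperature, dark_frame) pairs captured during calibration.
    Selects the nearest-temperature dark current frame and scales it by the ratio of
    runtime temperature to calibration temperature (the 'dark current factor')."""
    calib_temp, frame = min(calibration, key=lambda tf: abs(tf[0] - runtime_temp))
    dark_current_factor = runtime_temp / calib_temp      # e.g., 30 / 25 = 1.2
    return dark_current_factor * frame.astype(np.float32)
```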

Thus, temperature may be used as a factor (e.g., a “dark current factor”) for selecting or generating a dark current image frame to use for facilitating dark current compensation. Additional or alternative dark current factors are within the scope of the present disclosure for selecting or generating a dark current image frame to facilitate dark current compensation. In some implementations, a dark current image frame may be selected or generated in a manner that is agnostic toward explicit temperature measurements, which may advantageously eliminate the need for runtime temperature measurements/sensors.

FIG. 8 illustrates an example SPAD sensor 802, which may be part of the HMD 202 and may correspond, in at least some respects, to the SPAD sensors 602, 402 discussed hereinabove. For example, the SPAD sensor 802 is similarly obscured by a cover 804 to prevent photons from reaching the SPAD pixels of the SPAD sensor 802 while capturing the dark current image frame 806 (e.g., during a calibration step). The dark current image frame 806 includes dark counts, similar to the dark current image frames discussed above.

The example shown in FIG. 8 conceptually depicts an unexposed region 808 of SPAD pixels of the SPAD sensor 802 that captures a portion of the dark current image frame 806 (indicated by the dashed line that defines an outer boundary portion of the dark current image frame 806). Although all SPAD pixels of the SPAD sensor 802 are covered by the cover 804 during the capturing of the dark current image frame 806, the SPAD pixels of the unexposed region 808 are also obscured while capturing image frames at runtime. For example, FIG. 8 illustrates an example representation of a cover 812 that may be used at runtime to prevent photons from reaching the unexposed region 808 of SPAD pixels of the SPAD sensor 802. The particular structure and/or configuration of the cover 812 is provided as an example only, and a cover may take on any form and be positioned at any desirable portion of the SPAD sensor 802 to prevent photons from reaching the unexposed region 808 of SPAD pixels.

As is evident from FIG. 8, at least some of the SPAD pixels within the unexposed region 808 detect dark counts for the dark current image frame 806 without detecting photon counts from any captured environment. In some instances, where a cover 812 is used at runtime, the SPAD pixels within the unexposed region 808 continue to detect dark counts for runtime images without detecting photon counts from any captured environment. As will be described in more detail hereinafter, dark counts detected within the unexposed region 808 of SPAD pixels during calibration (e.g., while capturing dark current image frames 806) and during runtime may be leveraged to facilitate dark current compensation (without relying on explicit temperature measurements).
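
One way such leveraging might look is sketched below (illustrative only; the mask-based slicing, the names, and the simple count ratio are assumptions, not the specific approach of the disclosed embodiments).

```python
import numpy as np

def scale_dark_frame_from_unexposed(runtime_image, calib_dark_frame, unexposed_mask):
    """Compare dark counts observed in the always-covered pixel region at runtime
    with the same region of the calibration frame, and scale the calibration frame
    by the resulting dark current factor."""
    runtime_level = runtime_image[unexposed_mask].mean()
    calib_level = calib_dark_frame[unexposed_mask].mean()
    factor = runtime_level / max(calib_level, 1e-6)      # avoid division by zero
    return factor * calib_dark_frame.astype(np.float32)
```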

FIG. 9 illustrates the SPAD sensor 802 at runtime with the cover 812 positioned on the SPAD sensor 802 to prevent photons from reaching the SPAD pixels of the unexposed region 808. FIG. 9 illustrates an example temporally filtered image 902 generated from image frames captured by the SPAD sensor 802 at runtime. As depicted in FIG. 9, the temporally filtered image 902 includes image data acquired using SPAD pixels within the unexposed region 808 (also marked via a dashed box associated with temporally filtered image 902). In the example shown in FIG. 9, the quantity of avalanche events (e.g., dark counts) detected at runtime by the SPAD pixels wi