Microsoft Patent | Optically transparent antennas on transparent substrates (https://patent.nweon.com/27649)

Patent: Optically transparent antennas on transparent substrates

Patent PDF: Available with Nweon (映维网) membership

Publication Number: 20230099937

Publication Date: 2023-03-30

Assignee: Microsoft Technology Licensing

Abstract

Examples are disclosed related to optically transparent antennas. One example provides a device, comprising an electrically insulating substrate that is at least partially optically transparent, one or more antennas disposed on the electrically insulating substrate, each antenna comprising a film of a conductive material that is at least partially optically transparent, the one or more antennas comprising a communication antenna, and processing circuitry electrically coupled to the communication antenna, the processing circuitry configured to one or more of send or receive signals via the communication antenna.

Claims

1.A device, comprising: an electrically insulating substrate that is at least partially optically transparent; one or more antennas disposed on the electrically insulating substrate, each antenna comprising a film of a conductive material that is at least partially optically transparent, the one or more antennas comprising a communication antenna; and processing circuitry electrically coupled to the communication antenna, the processing circuitry configured to one or more of send or receive signals via the communication antenna.

2.The device of claim 1, wherein the electrically insulating substrate comprises one or more of glass, polymethyl methacrylate, polystyrene, polyethylene terephthalate, cyclic olefin polymer, or polycarbonate.

3.The device of claim 1, wherein the conductive material comprises one or more of indium tin oxide, silver nanowires, or carbon nanotubes.

4.The device of claim 1, wherein the device comprises a head-mounted display device.

5.The device of claim 1, wherein the one or more antennas further comprises a proximity sensing antenna, and wherein the processing circuitry includes a resonant circuit coupled to the proximity sensing antenna.

6.The device of claim 5, wherein the communication antenna is configured to utilize a first frequency band for communication and the proximity sensing antenna is configured to utilize a second frequency band for facial motion tracking, the second frequency band being different from the first frequency band.

7.The device of claim 1, wherein the one or more antennas includes a plurality of thin film segments, and further comprising a conductive trace formed in a trench region between a first thin film segment and a second thin film segment.

8.The device of claim 1, further comprising an insulating layer located over the film.

9.The device of claim 1, wherein the processing circuitry is electrically coupled to the communication antenna via one or more of a gold plate, a silver plate, a copper plate, or a metalized bonding pad.

10.A proximity sensor comprising: an electrically insulating substrate that is at least partially optically transparent; one or more antennas disposed on the electrically insulating substrate, each antenna comprising a film of a conductive material that is at least partially optically transparent, the one or more antennas comprising a proximity sensing antenna; and a resonant circuit electrically coupled to the proximity sensing antenna, the resonant circuit configured to output a signal responsive to a position of a surface relative to the proximity sensing antenna.

11.The proximity sensor of claim 10, wherein the one or more antennas comprises a communication antenna, and wherein the proximity sensor further comprises processing circuitry configured to one or more of send or receive signals via the communication antenna.

12.The proximity sensor of claim 10, wherein the proximity sensor comprises a lens system incorporated in a head-mounted display device, and wherein the proximity sensing antenna is configured for facial motion tracking.

13.The proximity sensor of claim 10, wherein the electrically insulating substrate comprises one or more of glass, polymethyl methacrylate, polystyrene, polyethylene terephthalate, cyclic olefin polymer, or polycarbonate.

14.The proximity sensor of claim 10, wherein the conductive material comprises one or more of indium tin oxide, silver nanowires, or carbon nanotubes.

15.The proximity sensor of claim 10, wherein the one or more antennas includes a plurality of thin film segments, and further comprising a conductive trace formed in a trench region between a first thin film segment and a second thin film segment.

16.The proximity sensor of claim 10, wherein the one or more antennas comprise a plurality of thin film segments.

17.A head-mounted computing device comprising: a see-through display comprising an electrically insulating substrate that is at least partially optically transparent, and a plurality of antennas formed on the electrically insulating substrate, each antenna being formed from an electrically conductive material that is at least partially optically transparent, the plurality of antennas comprising a communication antenna; and a controller configured to send and receive signals via the communication antenna.

18.The head-mounted computing device of claim 17, wherein the plurality of antennas further comprises a proximity sensing antenna.

19.The head-mounted computing device of claim 18, wherein the communication antenna is configured to utilize a first frequency band and the proximity sensing antenna is configured to utilize a second frequency band for facial motion tracking, the second frequency band being different from the first frequency band.

20.The head-mounted computing device of claim 17, wherein the plurality of antennas includes a plurality of antenna segments, and further comprising a conductive trace formed in a trench region between a first thin film segment and a second thin film segment.

Description

BACKGROUND

Mobile computing devices commonly include one or more antennas configured for wireless communication with other devices. Communication antennas are often incorporated into the chassis of computing devices.

SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.

Examples are disclosed that relate to antennas formed on optically transparent structures. One example provides a device comprising an electrically insulating substrate that is at least partially optically transparent, one or more antennas disposed on the electrically insulating substrate, each antenna comprising a film of a conductive material that is at least partially optically transparent, the one or more antennas comprising a communication antenna, and processing circuitry electrically coupled to the communication antenna, the processing circuitry configured to one or more of send or receive signals via the communication antenna.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an example wearable display device that includes a transparent antenna.

FIG. 2 shows a block diagram of an example display device comprising an antenna.

FIG. 3 shows an example antenna layout for a lens system.

FIG. 4 is an enlarged view of the antenna layout of FIG. 3, showing conductive traces formed within trench regions between antenna segments.

FIG. 5 shows a sectional view of an example antenna formed on an optically transparent substrate.

FIG. 6 shows a flow diagram for an example method for forming an antenna on an optically transparent substrate.

FIG. 7 shows a block diagram of an example computing system.

DETAILED DESCRIPTION

Antenna placement on mobile devices having relatively small form-factors, such as head-mounted display (HMD) devices, can be challenging. For example, placing an antenna above a near-eye display system of an HMD device may result in a relatively bulky device. As another option, an antenna may be formed in the device frame near a temple or above an ear of the user. However, wearable computing devices may have constraints on heat generation and electromagnetic (EM) radiation emission proximate to a wearer’s body. As such, placing an antenna relatively close to the head of a user, or close to other device components, may impose limits on antenna power and functionality. Further, forming cutouts in the device frame to define the antenna can weaken the device frame.

Accordingly, examples are disclosed that relate to optically transparent antennas formed on optically transparent substrates, such as on a see-through display of an HMD device. For example, one or more communication antennas may be formed on a transparent lens of an augmented reality HMD device, such that a user may view real-world imagery through the communication antenna(s). In another example, one or more proximity sensing antennas configured to detect facial movements and poses of a user can be formed on a transparent lens of an HMD device. Placing transparent antennas on lenses may take advantage of the under-utilized space of the lens surface on smaller form factor devices while avoiding any integrity and/or weight issues that may be associated with antennas formed in a device chassis. Transparent antennas formed from thin films also may be relatively lower cost and lower weight than other antenna options. Additionally, an antenna incorporated into a lens system of a wearable computing device allows for an EM radiation-emitting element to be placed further from the human head compared to antenna placements within the temple pieces of the frame, and thereby may avoid constraints on antenna transmission power. Further, locating motion sensing antennas on a lens system may provide a relatively clear coupling path to sensing targets for facial expression recognition (e.g., eyes, brows, eyelids, cheeks, lips, chin, etc.) compared to locating such antennas on a device frame. The term “transparent antenna” and the like as used herein indicates an antenna that is at least partially transparent to visible wavelengths of light, thereby allowing a user to see through the antenna.

FIG. 1 shows an example wearable display device 100 that includes a lens system comprising transparent antennas 102L, 102R respectively disposed on a left lens 104L and a right lens 104R, which are supported by a frame 106. Frame 106 is connected to side frames 106L, 106R via optional hinge joints 110. Each of 102L, 102R schematically represents one or more antennas. In some examples, one or more antennas may be located on one of, but not both of, left lens 104L or right lens 104R. The term “lens” is used herein to represent one or more optical components through which a real-world environment can be viewed (e.g. an optical combiner that combines virtual and real imagery, and/or one or more transparent optical components other than a combiner, such as a separate lens with or without optical power). The term “lens system” includes one or more lenses as well as antennas disposed on the one or more lenses.

Each lens 104L, 104R comprises an electrically insulating substrate that is at least partially optically transparent. For example, the substrate may comprise a glass, or an optically transparent plastic such as polycarbonate, polymethyl methacrylate (PMMA), polystyrene, polyethylene terephthalate (PET), cyclic olefin polymer, or other suitable material.

Transparent antennas 102L, 102R are formed from electrically conductive films that are at least partially optically transparent. The films may comprise one or more electrically conductive materials, such as indium tin oxide (ITO), silver nanowires, silver nanoparticles, carbon nanotubes, graphene, a mixture of two or more such materials (e.g., silver nanoparticle-ITO hybrid), and/or other suitable material(s). The film(s) may be formed via any suitable process, such as chemical vapor deposition, sputtering, atomic layer deposition, or evaporation. Further, the film may be patterned to form a plurality of individual antenna segments on the lens system. Such a pattern may be formed via any suitable method, examples of which are described below. Trenches between antenna segments may be utilized for placement of conductive traces, also described in more detail below.

As the conductive film may not be fully optically transparent in some examples, the use of relatively thinner films for antennas may provide for greater transparency compared to relatively thicker coatings. However, RF loss may be increased for relatively thinner coatings. As such, the thickness of the conductive film can be selected based on a balance between RF loss and transparency.
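
As a rough illustration of that balance, the following sketch computes sheet resistance and approximate transmittance across a range of film thicknesses. The material constants (an ITO-like resistivity and visible-light absorption coefficient) are assumed, illustrative values, not figures from the patent.

```python
# Illustrative sketch (not from the patent): how film thickness trades off
# sheet resistance (RF loss) against optical transmittance for a transparent
# conductor such as ITO. Material constants are rough, order-of-magnitude
# assumptions chosen only to show the shape of the trade-off.
import math

RESISTIVITY_OHM_M = 2e-6   # assumed ITO-like bulk resistivity (~1e-6 to 1e-5 ohm*m)
ABSORPTION_PER_M = 2e5     # assumed visible-light absorption coefficient (1/m)

def sheet_resistance(thickness_m: float) -> float:
    """Sheet resistance in ohms per square: R_s = rho / t."""
    return RESISTIVITY_OHM_M / thickness_m

def transmittance(thickness_m: float) -> float:
    """Approximate transmittance, ignoring surface reflections: T = exp(-alpha * t)."""
    return math.exp(-ABSORPTION_PER_M * thickness_m)

for t_nm in (50, 100, 200, 400):
    t = t_nm * 1e-9
    print(f"{t_nm:4d} nm: R_s ≈ {sheet_resistance(t):6.1f} ohm/sq, T ≈ {transmittance(t):.2f}")
```

With these assumed values, halving the thickness roughly doubles the sheet resistance while gaining only a few percent of transmittance, which is the trade-off the thickness selection must balance.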

In some examples, one or more of antennas 102L, 102R comprises a communication antenna. As such, wearable display device 100 may communicate with a remote computing system 116 via the communication antenna. Remote computing system 116 may comprise any suitable computing system, such as a cloud computing system, a PC, a laptop, a phone, a tablet, etc. The communication antenna may utilize any suitable frequency band for communication (e.g., bands for 4G, 5G, WiFi, WiFi 7, Bluetooth, Bluetooth 5.1, etc.). Further, the communication antenna may comprise switches operable to change the radiation pattern and thus change to a different frequency band.

Alternatively or additionally, in some examples one or more of antennas 102L, 102R comprises a proximity sensing antenna. In combination with circuitry, the proximity sensing antenna is configured to output a signal responsive to a position of a surface proximate to the proximity sensing antenna. For example, the proximity sensing antenna may be connected to a resonant LC circuit that is responsive to changes in capacitance and/or inductance based upon a proximity of the proximity sensing antenna to a human body surface. In other examples, any circuitry capable of measuring S21 or S11 scattering parameters may be used to detect proximity via a proximity sensing antenna signal. In some examples, one or more proximity sensing antennas are configured for facial motion detection. For example, signal data from a plurality of proximity sensing antennas can be input into a trained machine learning function configured to output a most likely facial expression. The proximity sensing antenna may comprise any suitable quality factor (Q factor). In some examples, the proximity sensing antenna comprises a Q factor that is between 150 and 2000. Further, in some examples, a resonant LC circuit may be configured to have a resonant frequency in a range of 100 kHz to 1 MHz. In other examples, any other suitable frequencies and/or Q factors may be used.
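
For a concrete sense of the LC sensing mechanism, the sketch below shows how a small capacitance change from an approaching surface shifts the resonant frequency of a tank tuned near 500 kHz. The inductance, baseline capacitance, and capacitance deltas are assumed values for illustration, not figures from the patent.

```python
# Minimal sketch (illustrative values): a proximity sensing antenna coupled to
# a resonant LC circuit. A nearby surface (e.g., a cheek moving toward the
# lens) adds a small capacitance delta_C, which shifts the resonant frequency
# f = 1 / (2 * pi * sqrt(L * C)).
import math

L_HENRY = 200e-6    # assumed tank inductance
C_FARAD = 507e-12   # assumed baseline capacitance (tunes f0 near 500 kHz)

def resonant_frequency(l_h: float, c_f: float) -> float:
    return 1.0 / (2.0 * math.pi * math.sqrt(l_h * c_f))

f0 = resonant_frequency(L_HENRY, C_FARAD)
for delta_pf in (0.0, 0.5, 1.0, 2.0):   # capacitance added as skin approaches
    f = resonant_frequency(L_HENRY, C_FARAD + delta_pf * 1e-12)
    print(f"dC = {delta_pf:3.1f} pF -> f ≈ {f/1e3:7.2f} kHz (shift {f - f0:+8.1f} Hz)")
```

A readout with a relatively high Q factor, such as the 150 to 2000 range mentioned above, makes sub-kilohertz shifts of this kind easier to resolve.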

In examples where transparent antennas 102L, 102R comprise both a communication antenna and a proximity sensing antenna, the antennas may be configured to utilize different frequency bands. In one example, the communication antenna uses a 2.4 GHz band while the proximity sensing antenna uses a 500 kHz band. In another example, the communication antenna is configured as a WiFi 7 antenna while the proximity sensing antenna senses at 1 MHz. These examples are intended to be illustrative and not limiting.

Wearable display device 100 further may include an image producing system (for example a laser scanner, a liquid crystal on silicon (LCoS) microdisplay, a transmissive liquid crystal microdisplay, an organic light emitting device (OLED) microdisplay, or digital micromirror device (DMD)) to produce images for display. Images displayed via left-eye and right-eye transparent combiners may comprise stereo images of virtual objects overlaid on the real-world scene such that the virtual objects appear to be present in the real-world scene.

Wearable display device 100 further comprises a controller 120. Controller 120 comprises, among other components, a logic subsystem and a storage subsystem that stores instructions executable by the logic subsystem to control the various functions of wearable display device 100, including but not limited to the communication and proximity sensing functions described herein. In various examples, controller 120 may comprise instructions to send and/or receive signals via a communication antenna, to change a communication frequency band, to receive signal data from the proximity sensing antenna, and to determine (or obtain a determination of) a most likely facial expression using data from one or more proximity sensing antennas, among other functions.
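
The facial-expression step could look roughly like the following sketch, in which per-antenna signal features are fed to a trained classifier that outputs a most likely expression. The feature layout, expression labels, and classifier choice are hypothetical; the patent only specifies that a trained machine learning function maps proximity-antenna signal data to a most likely expression.

```python
# Hypothetical sketch of the facial-expression step: features derived from
# several proximity sensing antennas are passed to a trained classifier.
# Training data here is random stand-in data purely to make the example run.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
N_ANTENNAS = 7                                   # e.g., one feature per antenna segment
EXPRESSIONS = ["neutral", "smile", "brow_raise"]

# Stand-in training set: rows = per-antenna resonance-shift features (Hz).
X_train = rng.normal(size=(300, N_ANTENNAS))
y_train = rng.integers(0, len(EXPRESSIONS), size=300)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

live_sample = rng.normal(size=(1, N_ANTENNAS))   # one frame of antenna readings
print("most likely expression:", EXPRESSIONS[model.predict(live_sample)[0]])
```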

The one or more antennas 102L, 102R may be electrically coupled to processing circuitry, such as one or more resonant LC circuits and/or controller 120. Further, flex cables, printed traces, and/or other suitable electrical connections may be used to connect the antenna on the front frame 106 to processing circuitry on a side frame 106R, 106L via hinges 110. Electrical connections between antennas 102L, 102R and conductors at the device frame may be made via any suitable interconnect. Examples include a metallic plate (e.g., gold, silver, or copper) and relatively large bonding pads having metalized edges (e.g., metalized with copper) to reduce ohmic loss at copper/coating mating points, as the sheet resistivity (ohm/sq) for metals may be 6-7 orders of magnitude smaller than that of the antenna coatings.
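
A back-of-envelope comparison, using assumed but typical resistivity and thickness values, shows why metalized interconnects contribute negligible sheet resistance relative to the transparent antenna coating:

```python
# Rough check (illustrative values) of the sheet-resistance gap between a
# metal interconnect plate and a transparent antenna coating. R_s = rho / t,
# in ohms per square.
COPPER_RESISTIVITY = 1.7e-8   # ohm*m
ITO_RESISTIVITY = 2e-6        # ohm*m, assumed coating value

copper_plate_rs = COPPER_RESISTIVITY / 0.5e-3   # assumed 0.5 mm copper plate
ito_film_rs = ITO_RESISTIVITY / 100e-9          # assumed 100 nm transparent coating

print(f"copper plate : {copper_plate_rs:.1e} ohm/sq")
print(f"ITO coating  : {ito_film_rs:.1e} ohm/sq")
print(f"ratio        : {ito_film_rs / copper_plate_rs:.1e}")
```

With these assumptions the ratio comes out near six orders of magnitude, consistent with the range quoted above.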

Conductors that carry signals from antennas 102L, 102R to other circuitry may be routed through or otherwise along frame 106, as indicated by pathway 122. The signals may comprise communication signals to/from the communication antenna and/or sensor signals from the proximity sensing antenna. In some examples, signal and shield conductors for these connections are routed alongside signals from other components, such as a microphone or a camera.

FIG. 2 shows a block diagram of an example display device 200 comprising an antenna system. Wearable display device 100 is an example of display device 200. Display device 200 comprises a lens system 202 with a transparent combiner 204. Transparent combiner 204 is configured to provide an optical path between an eyebox of display device 200 and an image source 212 to thereby allow a user to view images produced by image source 212 mixed with a real-world background viewable through transparent combiner 204.

Lens system 202 further comprises an antenna system 206. Antenna system 206 may comprise one or more communication antennas 208 and/or one or more proximity sensing antennas 210 formed from a material that is at least partially optically transparent. As such, a user can view imagery delivered from image source 212 via transparent combiner 204 together with real-world imagery visible through the combiner and antenna. The optical transparency may be due to a thickness of an electrically conductive film used to form the antennas, and/or due to a visible light absorption spectrum of the material allowing a substantial proportion of visible light to pass through the film. In some examples, a communication antenna 208 and/or proximity sensing antenna 210 can be formed directly on transparent combiner 204 of the lens system, while in other examples a communication antenna 208 and/or proximity sensing antenna 210 can be formed on another transparent component of the lens system that is configured to be located between the user’s eye and real-world environment (e.g., a lens that is separate from the combiner) when display device 200 is worn.

Display device 200 further comprises processing circuitry 220 configured to send and/or receive signals via communication antenna(s) 208, and/or to process data from signals received from communication antenna(s) 208 and/or proximity sensing antenna(s) 210. Display device 200 further comprises a resonant LC circuit 222 for each proximity sensing antenna 210. Each resonant LC circuit 222 is configured to be responsive to changes in capacitance and/or inductance caused by changes in proximity of a surface of a human body to a proximity sensing antenna to which the resonant LC circuit is connected. In some examples, a resonant LC circuit may be configured to have a resonant frequency in a range of 100 kHz to 1 MHz. In some examples, display device 200 may comprise a plurality of resonant LC circuits coupled to a corresponding plurality of proximity sensing antennas. Display device 200 further comprises a computing system 230 comprising a processor and stored instructions executable by the processor to operate the various functions of display device 200. Examples of hardware implementations of computing devices are described below in more detail with regard to FIG. 7.

FIG. 3 shows a front view of an example wearable device 300 illustrating an example antenna layout 301. Wearable device 300 is an example implementation of display devices 100 and 200. Wearable device 300 includes a lens system comprising lenses 302a and 302b for right and left eyes, respectively. The antenna layout on each lens in this example comprises seven antenna segments formed on a transparent substrate, as described above. While the example depicted comprises seven antenna segments per lens, in other examples, any suitable antenna layout with any suitable number of antenna segments may be used.

As described above, the transparent antennas may comprise one or more communication antennas and/or one or more proximity sensing antennas. Wearable device 300 further may include one or more switches, indicated schematically at 308, to selectively connect antenna segments together. Switches can be used to change the radiation pattern emitted by communication antennas, and thereby change a frequency band used for communication. This may help to support a wider variety of communications bands and protocols.
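
As a simplified illustration of how segment switching can retune the antenna, the sketch below treats the connected segments as a quarter-wave radiator. The segment length and the quarter-wave model are assumptions made for illustration; the patent does not specify an antenna topology.

```python
# Illustrative sketch (quarter-wave approximation assumed): switching more
# thin-film segments together lengthens the effective radiator and lowers its
# resonant frequency, letting one layout cover different bands.
C_LIGHT = 3e8                # speed of light, m/s
SEGMENT_LENGTH_M = 0.02      # assumed 20 mm per antenna segment

def quarter_wave_resonance(n_segments: int) -> float:
    effective_length = n_segments * SEGMENT_LENGTH_M
    return C_LIGHT / (4.0 * effective_length)

for n in range(1, 5):
    print(f"{n} segment(s) connected -> f ≈ {quarter_wave_resonance(n)/1e9:.2f} GHz")
```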

As shown on lens 302a, antenna segments 304a-304g are separated by trench regions 306, indicated by thick dark lines. Trench regions 306 are regions between antenna segments that lack the electrically conductive film(s) that form antenna segments 304a-304g. As described in more detail below, in some examples, trench regions 306 may comprise electrically conductive traces to carry signals to and/or from antenna segments 304a-304g to other circuitry. Trench regions 306 may be formed by masking followed by deposition of the conductive film for the antenna segments, or by etching after forming the conductive film, in various examples. In some examples, trench regions are etched into the lens or other substrate.

As a conductive film from which the antenna segments 304a-304g are formed may not be fully transparent, the antenna layout may be visible to a user in some examples. However, when incorporated into a device configured to be worn on the head, the antenna layout may be positioned closer than a focal length of the human eye during most normal use of wearable device 300. As such, the layout may be out of focus to a user during ordinary device use, and thus may not obstruct the user’s view or distract the user.

As mentioned above, in some examples, signals may be carried to and/or from the antenna segments on electrically conductive traces. FIG. 4 shows an enlarged view of region 400 of antenna layout 301, schematically showing conductive traces in a trench region 402. FIG. 4 depicts three antenna segments 404a-404c, as indicated by hashed areas, and five conductive traces which connect the antenna regions to processing circuitry. As shown, the traces connect to processing circuitry at relatively closely spaced locations, as indicated at 406, which may provide for simpler routing than where the traces connect to other conductors at different locations around a device frame.

The conductive traces are utilized to form electrical connections between various antenna segments and other circuitry. In the depicted example, two conductive traces connect respectively to antenna segments 404a and 404c, while three other conductive traces run through trench region 402. Conductive trace 408 is coupled to antenna segment 404b, while conductive traces 410 connect to other antenna segments not shown in FIG. 4.

The conductive traces may comprise any suitable electrically conductive material, such as a conductive metal. Examples include gold, silver, and copper. In some examples, the conductive traces may comprise a sheet-like aspect ratio. In one illustrative example, the conductive traces comprise a width within a range of 25 microns (μm) to 75 microns and a thickness within a range of 5 to 500 microns. As such, the conductive trace may take advantage of sheet resistance behavior in thin metals. In other examples, conductive traces may have any other suitable dimensions.
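
Using the trace width range given above together with assumed length and thickness values, a quick estimate of end-to-end trace resistance follows from R = ρL/(wt):

```python
# Rough resistance estimate for a conductive trace routed along a trench.
# Only the width comes from the illustrative range above; length and
# thickness are assumed values for the sake of the example.
COPPER_RESISTIVITY = 1.7e-8   # ohm*m

def trace_resistance(length_m: float, width_m: float, thickness_m: float) -> float:
    """R = rho * L / (w * t) for a rectangular trace cross-section."""
    return COPPER_RESISTIVITY * length_m / (width_m * thickness_m)

r = trace_resistance(length_m=0.03,      # assumed 3 cm run across the lens
                     width_m=50e-6,      # 50 microns, within the 25-75 micron range
                     thickness_m=5e-6)   # assumed 5 micron thickness
print(f"trace resistance ≈ {r:.2f} ohm")
```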

FIG. 5 is a sectional view of an example device 500 showing material layers of an example antenna 501 formed on a transparent substrate 502. It will be understood that the elements in FIG. 5 are depicted schematically and may not be drawn to scale. Transparent substrate 502 is supported by device frame 504 at an edge. As discussed above, transparent substrate 502 is electrically insulating and may comprise an optically transparent glass, plastic, or other suitable material.

Antenna 501 comprises a conductive film 506 formed on top of transparent substrate 502, the conductive film being at least partially transparent. While substrate 502 is depicted as being relatively flat, the antenna may be formed on a curved surface in some examples. Conductive film 506 may comprise any suitable electrically conductive material, such as ITO, silver nanowires, or carbon nanotubes. Conductive film 506 may comprise a plurality of antenna segments with any suitable antenna layout, as mentioned above. An insulating layer 508 is formed on top of conductive film 506. In some examples, insulating layer 508 comprises an anti-reflective material. Further, in some examples, insulating layer 508 comprises a lossy coating material configured to suppress thermal noise.

The device further comprises one or more conductive traces 510 formed on the insulating layer 508 and/or in a trench region 512. Conductive traces 510 may comprise a conductive metal (e.g., Au, Ag, Cu) in some examples. To connect traces 510 with other circuitry, an anisotropic conductive film (ACF) 514 may be applied over conductive traces 510, and a flex printed circuit 516 may be coupled to ACF 514. In other examples, any suitable method may be used to electrically connect a trace with a conductive element. Traces can be used to carry signals for components other than antennas. For example, device 500 may further comprise additional metal traces coupled to components other than antenna segments.

FIG. 6 shows a flow diagram of an example method 600 for forming an antenna on a transparent substrate. At 602, the method comprises forming an at least partially transparent film of conductive material on an at least partially transparent, electrically insulating substrate. As discussed above, for some materials that are not fully optically transparent, a relatively thicker film may provide better RF characteristics at the expense of loss of transparency, compared to thinner films. The thickness of the film can be adjusted based on a trade-off between translucency and RF loss. In some examples, at 604, the substrate comprises a glass or optical plastic (e.g., polycarbonate or acrylate). In some examples, at 606, the film comprises one or more of ITO, silver nanowires, or carbon nanotubes.

In some examples, at 608, the film comprises a pattern that defines a layout of antenna segments. In some examples, as indicated at 610, method 600 comprises forming the pattern by first masking trench regions followed by depositing the conductive film. Any suitable deposition method may be employed, such as chemical vapor deposition, sputtering, atomic layer deposition, or evaporation. In other examples, as indicated at 612, the conductive film is first deposited without any masking pattern, and then etched to create the pattern. In some such examples, at 614, a waterjet is used to etch the conductive film and create the pattern. In other examples, lithographic techniques may be used to form a pattern after deposition of the conductive material, and then a suitable wet or dry etching process may be used to form the trenches.

Method 600 further comprises, at 616, forming one or more conductive traces in a trench region of the substrate. In some examples, at 618, the conductive trace comprises a metal such as Au, Ag, or Cu.

Continuing, method 600 further comprises, at 620, forming an insulating layer on top of the conductive film. In some examples, at 622, the insulating layer comprises an anti-reflective coating. In some examples, at 624, the method further comprises forming a conductive trace on top of the insulating layer. Further, in some examples, at 626, the method comprises applying a layer of ACF on top of the conductive trace to form a connection to a flex circuit. In such examples, at 628, the method further comprises connecting a printed circuit to the ACF layer.

While disclosed above in the context of a see-through display device, in other examples an optically transparent antenna according to the disclosed examples may be used in any other suitable context. For example, a transparent antenna can be formed on a window in a building, vehicle, or other space. Such a transparent antenna can be coupled to a resonant LC circuit and configured for proximity sensing. The proximity sensing antenna may detect proximate objects at relatively greater distances (e.g., 0-5 feet) compared to capacitive touch sensors (e.g., distances less than 5 mm). Further, a proximity sensing antenna system may be able to distinguish between objects of different sizes.

As another example, a mobile computing device may comprise an antenna formed on the display (e.g., smartphone screen). As the antenna comprises a conductive film that is at least partially transparent, a user may still view the display through the antenna.

In some embodiments, the methods and processes described herein may be tied to a computing system of one or more computing devices. In particular, such methods and processes may be implemented as a computer-application program or service, an application-programming interface (API), a library, and/or other computer-program product.

FIG. 7 schematically shows a non-limiting embodiment of a computing system 700 that can enact one or more of the methods and processes described above. Computing system 700 is shown in simplified form. Computing system 700 may take the form of one or more personal computers, server computers, tablet computers, home-entertainment computers, network computing devices, gaming devices, mobile computing devices, mobile communication devices (e.g., smart phone), and/or other computing devices.

Computing system 700 includes a logic machine 702 and a storage machine 704. Computing system 700 may optionally include a display subsystem 706, input subsystem 708, communication subsystem 710, and/or other components not shown in FIG. 7.

Logic machine 702 includes one or more physical devices configured to execute instructions. For example, the logic machine may be configured to execute instructions that are part of one or more applications, services, programs, routines, libraries, objects, components, data structures, or other logical constructs. Such instructions may be implemented to perform a task, implement a data type, transform the state of one or more components, achieve a technical effect, or otherwise arrive at a desired result.

The logic machine may include one or more processors configured to execute software instructions. Additionally or alternatively, the logic machine may include one or more hardware or firmware logic machines configured to execute hardware or firmware instructions. Processors of the logic machine may be single-core or multi-core, and the instructions executed thereon may be configured for sequential, parallel, and/or distributed processing. Individual components of the logic machine optionally may be distributed among two or more separate devices, which may be remotely located and/or configured for coordinated processing. Aspects of the logic machine may be virtualized and executed by remotely accessible, networked computing devices configured in a cloud-computing configuration.

Storage machine 704 includes one or more physical devices configured to hold instructions executable by the logic machine to implement the methods and processes described herein. When such methods and processes are implemented, the state of storage machine 704 may be transformed—e.g., to hold different data.

Storage machine 704 may include removable and/or built-in devices. Storage machine 704 may include optical memory (e.g., CD, DVD, HD-DVD, Blu-Ray Disc, etc.), semiconductor memory (e.g., RAM, EPROM, EEPROM, etc.), and/or magnetic memory (e.g., hard-disk drive, floppy-disk drive, tape drive, MRAM, etc.), among others. Storage machine 704 may include volatile, nonvolatile, dynamic, static, read/write, read-only, random-access, sequential-access, location-addressable, file-addressable, and/or content-addressable devices.

It will be appreciated that storage machine 704 includes one or more physical devices. However, aspects of the instructions described herein alternatively may be propagated by a communication medium (e.g., an electromagnetic signal, an optical signal, etc.) that is not held by a physical device for a finite duration.

Aspects of logic machine 702 and storage machine 704 may be integrated together into one or more hardware-logic components. Such hardware-logic components may include field-programmable gate arrays (FPGAs), program- and application-specific integrated circuits (PASIC/ASICs), program- and application-specific standard products (PSSP/ASSPs), system-on-a-chip (SOC), and complex programmable logic devices (CPLDs), for example.

It will be appreciated that a “service”, as used herein, is an application program executable across multiple user sessions. A service may be available to one or more system components, programs, and/or other services. In some implementations, a service may run on one or more server-computing devices.

When included, display subsystem 706 may be used to present a visual representation of data held by storage machine 704. This visual representation may take the form of a graphical user interface (GUI). As the herein described methods and processes change the data held by the storage machine, and thus transform the state of the storage machine, the state of display subsystem 706 may likewise be transformed to visually represent changes in the underlying data. Display subsystem 706 may include one or more display devices utilizing virtually any type of technology. Such display devices may be combined with logic machine 702 and/or storage machine 704 in a shared enclosure, or such display devices may be peripheral display devices.

When included, input subsystem 708 may comprise or interface with one or more user-input devices such as a keyboard, mouse, touch screen, or game controller. In some embodiments, the input subsystem may comprise or interface with selected natural user input (NUI) componentry. Such componentry may be integrated or peripheral, and the transduction and/or processing of input actions may be handled on- or off-board. Example NUI componentry may include a microphone for speech and/or voice recognition; an infrared, color, stereoscopic, and/or depth camera for machine vision and/or gesture recognition; a head tracker, eye tracker, accelerometer, and/or gyroscope for motion detection and/or intent recognition; as well as electric-field sensing componentry for assessing brain activity.

When included, communication subsystem 710 may be configured to communicatively couple computing system 700 with one or more other computing devices. Communication subsystem 710 may include wired and/or wireless communication devices compatible with one or more different communication protocols. As non-limiting examples, the communication subsystem may be configured for communication via a wireless telephone network, or a wired or wireless local- or wide-area network. In some embodiments, the communication subsystem may allow computing system 700 to send and/or receive messages to and/or from other devices via a network such as the Internet.

Another example provides a device comprising an electrically insulating substrate that is at least partially optically transparent, one or more antennas disposed on the electrically insulating substrate, each antenna comprising a film of a conductive material that is at least partially optically transparent, the one or more antennas comprising a communication antenna, and processing circuitry electrically coupled to the communication antenna, the processing circuitry configured to one or more of send or receive signals via the communication antenna. In some such examples, the electrically insulating substrate comprises one or more of glass, polymethyl methacrylate, polystyrene, polyethylene terephthalate, cyclic olefin polymer, or polycarbonate. In some examples the conductive material additionally or alternatively comprises one or more of indium tin oxide, silver nanowires, or carbon nanotubes. In some examples the device additionally or alternatively comprises a head-mounted display device. In some such examples, the one or more antennas additionally or alternatively comprises a proximity sensing antenna and the processing circuitry includes a resonant circuit coupled to the proximity sensing antenna. In some examples the communication antenna additionally or alternatively is configured to utilize a first frequency band for communication and the proximity sensing antenna is configured to utilize a second frequency band for facial motion tracking, the second frequency band being different from the first frequency band. In some examples the one or more antennas additionally or alternatively includes a plurality of thin film segments, and the device further comprises a conductive trace formed in a trench region between a first thin film segment and a second thin film segment. In some examples the device additionally or alternatively comprises an insulating layer located over the film. In some examples the processing circuitry additionally or alternatively is electrically coupled to the communication antenna via one or more of a gold plate, a silver plate, a copper plate, or a metalized bonding pad.

Another example provides a proximity sensor comprising an electrically insulating substrate that is at least partially optically transparent, one or more antennas disposed on the electrically insulating substrate, each antenna comprising a film of a conductive material that is at least partially optically transparent, the one or more antennas comprising a proximity sensing antenna, and a resonant circuit electrically coupled to the proximity sensing antenna, the resonant circuit configured to output a signal responsive to a position of a surface relative to the proximity sensing antenna. In some such examples, the one or more antennas comprises a communication antenna, and wherein the proximity sensor further comprises processing circuitry configured to one or more of send or receive signals via the communication antenna. In some examples the proximity sensor additionally or alternatively comprises a lens system incorporated in a head-mounted display device, and the proximity sensing antenna is configured for facial motion tracking. In some such examples, the electrically insulating substrate additionally or alternatively comprises one or more of glass, polymethyl methacrylate, polystyrene, polyethylene terephthalate, cyclic olefin polymer, or polycarbonate. In some such examples, the conductive material additionally or alternatively comprises one or more of indium tin oxide, silver nanowires, or carbon nanotubes. In some such examples, the one or more antennas additionally or alternatively includes a plurality of thin film segments, and further comprising a conductive trace formed in a trench region between a first thin film segment and a second thin film segment. In some such examples, the one or more antennas additionally or alternatively comprise a plurality of thin film segments.

Another example provides a head-mounted computing device comprising a see-through display comprising an electrically insulating substrate that is at least partially optically transparent, and a plurality of antennas formed on the electrically insulating substrate, each antenna being formed from an electrically conductive material that is at least partially optically transparent, the plurality of antennas comprising a communication antenna, and a controller configured to send and receive signals via the communication antenna. In some such examples, the plurality of antennas further comprises a proximity sensing antenna. In some such examples, the communication antenna additionally or alternatively is configured to utilize a first frequency band and the proximity sensing antenna is configured to utilize a second frequency band for facial motion tracking, the second frequency band being different from the first frequency band. In some such examples, the plurality of antennas additionally or alternatively includes a plurality of antenna segments, and further comprising a conductive trace formed in a trench region between a first thin film segment and a second thin film segment.

It will be understood that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered in a limiting sense, because numerous variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated and/or described may be performed in the sequence illustrated and/or described, in other sequences, in parallel, or omitted. Likewise, the order of the above-described processes may be changed.

The subject matter of the present disclosure includes all novel and non-obvious combinations and sub-combinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as any and all equivalents thereof.

Microsoft Patent | Eye tracking head mounted display device (https://patent.nweon.com/27637)

Patent: Eye tracking head mounted display device

Patent PDF: Available with Nweon (映维网) membership

Publication Number: 20230100656

Publication Date: 2023-03-30

Assignee: Microsoft Technology Licensing

Abstract

This document relates to head mounted display devices. One example can include a housing configured to be positioned relative to a head and eye of a user and a transparent visual assembly positioned by the housing in front of the user’s eye and comprising multiple eye tracking illuminators distributed across the transparent visual assembly and configured to emit non-visible light and multiple eye tracking detectors distributed across the transparent visual assembly and configured to detect the non-visible light reflected back from the eye of the user.

Claims

1.A head mounted display device, comprising: a housing configured to be positioned relative to a head and eye of a user; and, a visual assembly positioned by the housing in front of the user’s eye, the visual assembly comprising: an electrical layer comprising side-by-side electronic components, individual electronic components configured to emit or detect light; and, an optical layer comprising side-by-side optical components, individual optical components configured to refract or reflect or diffract light relative to individual electronic components.

2.The head mounted display device of claim 1, wherein the electrical layer and the optical layer are formed on a single substrate or wherein the electrical layer comprises a first substrate and the optical layer comprises a second substrate, and wherein the first and second substrates are positioned against one another or wherein the first and second substrates are spaced apart from one another.

3.The head mounted display device of claim 2, wherein the optical layer is transparent to visible light.

4.The head mounted display device of claim 1, wherein at least some of the electronic components and optical components contribute to eye tracking of the eye of the user.

5.The head mounted display device of claim 1, wherein the electrical layer is positioned proximate to the user relative to the optical layer.

6.The head mounted display device of claim 1, wherein individual electronic components are paired with individual optical components as modules to achieve specific functionalities.

7.The head mounted display device of claim 6, wherein the specific functionalities include eye tracking illumination, eye tracking detection, image generation, 3D illumination, and/or 3D detection.

8.The head mounted display device of claim 7, wherein an individual eye tracking illumination pair comprises an individual electronic component that emits non-visible light away from the user’s eye and an individual optical component that redirects the non-visible light back towards the user’s eye.

9.The head mounted display device of claim 8, wherein an individual eye tracking detection pair further comprises a lens that receives the non-visible light reflected from the user’s eye and focuses the non-visible light toward another individual electronic component that senses the non-visible light reflected back from the user’s eye.

10.The head mounted display device of claim 9, wherein the another electronic component faces the user’s eye or wherein the another electronic component is positioned behind the electronic component.

11.The head mounted display device of claim 10, wherein eye tracking illumination pairs and eye tracking detection pairs are distributed across the visual assembly.

12.A head mounted display device, comprising: a housing configured to be positioned relative to a head and eye of a user; and, a transparent visual assembly positioned by the housing in front of the user’s eye and comprising multiple eye tracking illuminators distributed across the transparent visual assembly and configured to emit non-visible light and multiple eye tracking detectors distributed across the transparent visual assembly and configured to detect the non-visible light reflected back from the eye of the user.

13.The head mounted display device of claim 12, wherein the eye tracking illuminators are configured to emit the non-visible light in a direction away from the eye of the user.

14.The head mounted display device of claim 13, wherein the transparent visual assembly further comprises optical components that include non-visible selective reflectors that are configured to collimate the non-visible light in an eye box defined by the head mounted display device.

15.The head mounted display device of claim 14, wherein the optical components are configured to operate cooperatively to illuminate an entire eye box for the user.

16.The head mounted display device of claim 15, further comprising other optical components distributed across the transparent visual assembly and configured to cooperatively generate a visual image in the eye box.

17.The head mounted display device of claim 16, wherein the other optical components are configured to generate the visual image simultaneously to the optical components illuminating the entire eye box with the non-visible light.

18.The head mounted display device of claim 17, further comprising additional optical components that are configured to three-dimension (3D) map a region in front of the user simultaneously to the other optical components generating the visual image and the optical components illuminating the entire eye box with the non-visible light.

19.The head mounted display device of claim 18, wherein the optical components, the other optical components, and the additional optical components are interspersed across a field of view of the transparent visual assembly.

20.A system, comprising: a visual assembly configured to be positioned in front of an eye of a user and comprising multiple eye tracking illuminators distributed across the visual assembly and configured to emit non-visible light and multiple eye tracking detectors distributed across the visual assembly and configured to detect the non-visible light reflected back from the eye of the user; and, a controller configured to process the detected non-visible light from multiple eye tracking detectors to identify information relating to the eye.

Description

BACKGROUND

Head mounted display devices can enable users to experience immersive virtual reality scenarios and/or augmented reality scenarios. Such technology may be incorporated into a device in the form of eyeglasses, goggles, a helmet, a visor, or some other type of head-mounted display (HMD) device or eyewear. In order for the HMD device to be comfortable for any length of time, the head mounted display should be positioned relatively close to the user’s face (e.g., eyes) and should be relatively lightweight. Despite these constraints, the HMD device should be able to perform multiple functionalities, such as image generation, eye tracking, and/or 3D sensing of the environment. The present concepts can address these and/or other issues.

BRIEF DESCRIPTION OF THE DRAWINGS

The Detailed Description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of similar reference numbers in different instances in the description and the figures may indicate similar or identical items. In some figures where multiple instances of an element are illustrated, not all instances are designated to avoid clutter on the drawing page.

FIG. 1A illustrates a perspective view of an example HMD device that is consistent with some implementations of the present concepts.

FIGS. 1B, 2, 3, 4, 5, 6, 7A-7D, and 8-16 illustrate elevational views of example HMD devices that are consistent with some implementations of the present concepts.

FIG. 17 illustrates example methods or techniques that are consistent with some implementations of the present concepts.

DETAILED DESCRIPTION

Overview

Head-mounted display (HMD) devices can present virtual content to a user in a virtual reality scenario and/or an augmented reality scenario. A primary function of the HMD device is to display images at an ‘eye box’ for perception by the user. While the display function is a central function of the HMD device, other functions, such as sensing the environment via depth sensing (e.g., 3D sensing) and eye tracking to understand the user’s interaction within the environment can be valuable functions that contribute to the overall quality of the user experience. Traditionally, 3D sensing and eye tracking have been accomplished with dedicated components positioned outside of the user’s field of view (FoV).

The present concepts can accomplish the eye tracking and/or 3D sensing within the FoV of the HMD device. The concepts can include multiple ways that 3D sensing, eye tracking, and/or image generation can be enhanced, simplified, and/or reduced in cost by employing a distributed and dispersed arrangement of electronic components and/or optical components on a visual assembly. The electronic components can be small enough that they are imperceptible to the user. The visual assembly can be transparent to visible light despite the distributed and dispersed arrangement of electronic components and/or optical components on the visual assembly. Utilizing multiple electronic components dispersed and distributed across the FoV can offer several advantages over traditional designs. These and other aspects are discussed below.

Introductory FIGS. 1A and 1B collectively depict an example HMD device 100 which can implement the present concepts. HMD device 100 can include a housing 102 that can orient a visual assembly 104 relative to a user 106. In some cases, the visual assembly 104 can include an electrical layer 108. In some implementations, the visual assembly 104 can be transparent in that it can allow ambient light 110 to pass through and reach an eye box 112 associated with the user’s eye 114. The transparent visual assembly 104 can also include side-by-side electronic components 116 distributed on the electrical layer 108. The term side-by-side is used to indicate that the electronic components are positioned adjacent to one another on the electrical layer 108 either abutting or with gaps in between.

The electronic components 116 can perform various light generation and light detection functions. For instance, electronic components 116(1) and 116(7) can generate non-visible light (shown as dotted lines), such as infra-red (IR) light that can be directed toward the eye box 112 to gain information about the user’s eye 114. Electronic component 116(4) can detect the non-visible light reflected from the user’s eye to gain information about the user’s eye. Electronic component 116(3) can generate non-visible light (shown as dashed lines), such as infra-red (IR) light that can be directed toward the environment to gain information about the environment. Electronic component 116(6) can detect the non-visible light returned from the environment to gain information about the environment, such as by 3D sensing/mapping. Electronic components 116(2) and 116(5) can generate visible light (shown as solid lines) that can be directed toward the eye box 112 to collectively generate a virtual image. These are just some of the types of example electronic component types that can occur on the electrical layer 108. Other examples are described below relative to FIG. 2.

As mentioned above, in some implementations ambient light 110 can pass through the visual assembly 104 so that the user can see both the actual physical environment and virtual content (e.g., augmented reality) generated by a subset of the electronic components 116. Each type of electronic component 116 can be distributed and dispersed across the electrical layer (e.g., can have neighbors of different electronic component function). This aspect will be described in greater detail below relative to FIGS. 2 and 3. This configuration can be contrasted with traditional technologies that employ eye tracking and depth sensing components around a periphery of the HMD device, but not in the device’s FoV.
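
One way to picture the distributed arrangement is as an interleaved grid of component roles, as in the toy sketch below. The grid size and round-robin assignment are purely illustrative assumptions, not a layout from the patent.

```python
# Toy sketch (layout entirely hypothetical) of the "distributed and dispersed"
# idea: component roles are interleaved across the electrical layer so that
# neighboring sites tend to serve different functions, rather than grouping
# eye tracking or 3D sensing hardware at the periphery.
from itertools import cycle

ROLES = ["RGB_emit", "ET_illuminate", "ET_detect", "3D_illuminate", "3D_detect"]

def build_layer(rows: int, cols: int):
    """Assign roles round-robin so no role is clustered in one region."""
    role_cycle = cycle(ROLES)
    return [[next(role_cycle) for _ in range(cols)] for _ in range(rows)]

for row in build_layer(rows=3, cols=4):
    print("  ".join(f"{role:13s}" for role in row))
```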

In some virtual reality scenarios, the visual assembly 104 may not be transparent, but the electrical layer can be transparent. For instance, eye tracking electronic components on the electrical layer would not degrade visual images from a display that is positioned, within the visual assembly, farther from the user than the electrical layer 108.

Note also that for ease of illustration and for sake of brevity, FIG. 1B as well as some of the subsequent FIGURES show only one of the user’s eyes and part of the visual assembly 104 in front of the eye. However, the described concepts can be applied to both the left and right eyes by the HMD device 100.

FIG. 2 shows another example HMD device 100A. (The suffix, such as ‘A’ is used relative to HMD device 100A for purposes of distinguishing this HMD device from HMD device examples above and below. The various HMD device examples may have different and/or additional elements and/or some elements may be different in one implementation compared to other implementations.) In this case, the visual assembly 104 can include an optical layer 202. In this configuration, the optical layer 202 is positioned away from the eye 114 relative to the electrical layer 108. In other configurations, the optical layer 202 could be positioned on the opposite side of the electrical layer 108.

The optical layer 202 can include multiple optical components 204 that can be positioned side-by-side to one another on the optical layer. The optical components 204 can be configured to affect a path of some or all wavelengths of light that encounter an individual optical component. For instance, the optical components 204 can be manifest as mirrors and/or lenses. The optical components 204 can work cooperatively with the electronic components 116 to achieve various functionalities, such as eye tracking, image generation (e.g., RGB display), and/or 3D mapping, among others. Note that the optical components 204 and the electronic components 116 tend to be very small and as such are not drawn to scale and/or in the numbers that would likely be present on the visual assembly 104, but the illustrated optical components 204 and the electronic components 116 serve to convey the present concepts.

In this example, electronic component 116(1) and optical component 204(1) operate cooperatively to contribute to RGB image generation and thus can be viewed as an RGB display module 206(1). The electronic component 116(1) can entail a red, green, blue (RGB) display (e.g., pixel cluster), such as one or more light emitting diodes (LEDs) configured to emit light in a direction away from the eye 114. In this case, the optical component 204(1) can be manifest as a partially reflective mirror or a notch filter. A partially reflective mirror can reflect certain wavelengths of light while being transmissive to other wavelengths of light. Alternatively or additionally, a partially reflective mirror can reflect light received at certain angles while being transmissive to other angles. For instance, ambient light 110 traveling generally along the optical axis may pass through the partially reflective mirror of optical component 204(1). In contrast, the partially reflective mirror of optical component 204(1) can reflect the RGB light from the electronic component 116(1) back toward the eye 114. While only one RGB or single-color display module is shown, multiple dispersed and distributed RGB display modules 206 can contribute to the overall image perceived by the eye 114.

In the illustrated configuration, electronic components 116(3) and 116(4) can emit non-visible light for ET purposes. For instance, the electronic component 116(4) can be an IR LED or array of LEDs. This non-visible light can be emitted in a direction away from the eye and can be redirected back toward the eye by optical components 204(5) and 204(6), respectively, which are manifest as partially reflective mirrors (e.g., hot mirrors), for instance. A hot mirror can transmit visible light while reflecting non-visible wavelengths, such as IR. Electronic component 116(3) and optical component 204(5) can function as an eye tracking illumination module 208(1) and electronic component 116(4) and optical component 204(6) can function as eye tracking illumination module 208(2). Note that electronic components 116(4) and 116(5) may emit the same wavelengths of non-visible light. In other configurations, these electronic components may emit different wavelengths of light from one another. Potential advantages of this latter configuration are described below relative to FIG. 15. Electronic component 116(5) can emit non-visible light for 3D mapping purposes and can function as a 3D mapping or depth map module 212.

Electronic component 116(2) can include a sensor that is sensitive to the non-visible light. The non-visible light can be emitted by ET illumination modules 208 and reflected back from the user’s eye. The non-visible light can be received at optical component 204(3), which redirects the light toward the electronic component 116(2). Thus, electronic component 116(2) and optical component 204(3) can function as an ET camera/sensing/detection module 210(1).

Other electronic components can entail multiple components that collectively can both emit non-visible light, such as IR, and sense non-visible light that is reflected back from objects in the environment. For instance, the emitting component can entail an IR LED or LED array and the detector can entail an IR CMOS sensor. The IR light can be structured light and/or can be sensed stereoscopically (e.g., by multiple detectors) to convey 3D information. These configurations can enable 3D mapping of the environment in front of the user. In some cases, the electronic component is not paired with an optical component in the optical layer 202 (e.g., does not need focusing). For instance, the non-visible light can be emitted evenly in a flood pattern that can be effective without redirecting of the non-visible light that could be provided by an optical component. However, in other implementations, an optical component, such as various types of mirrors and/or lenses, can be employed to affect the light emitted from the electronic component. In either configuration (e.g., with or without an optical component) the electronic component can be viewed as contributing to a module configured to achieve a functionality.

Two of the depth sensing techniques that can be accomplished with the present implementations can include time of flight (ToF) techniques and stereo techniques. Time of flight can rely on measuring the time light needs to travel from the source (e.g., the IR emitter of electronic component 116(5)) to the object and then back to the IR detector/sensor (e.g., camera) of electronic component 116(5). The sensor can measure the time the light has taken to travel, and a value of the distance can be established. ToF techniques tend to utilize an optical pulse or a train of pulses. In addition, there is often a desire for the emitted beam to have a certain profile (this reduces “multipath” issues with the camera).
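
For reference, the time-of-flight relationship described above can be expressed in a few lines. The following Python sketch is illustrative only and is not taken from the patent; the constant name, the function name, and the example timing value are assumptions.

    SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

    def tof_distance_m(round_trip_time_s: float) -> float:
        # The pulse covers the emitter-to-object path twice (out and back),
        # so the one-way distance is half of the round-trip path length.
        return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0

    # Example: a round trip of about 6.67 nanoseconds corresponds to roughly 1 meter.
    print(tof_distance_m(6.67e-9))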

Using a multi-module architecture, it is possible to place the depth map LED or LEDs using the same arrangement as the LEDs for eye tracking but facing the real world. The same techniques used in eye tracking can be used for illuminating the real world. However, if a more “structured illumination” is desired, it is possible to have an array of LEDs that are partially collimated by a reflector. In that case, each LED can illuminate part of the real world and depending on the pattern desired, different LEDs can be activated. Structured illumination can be achieved by means of a partially reflective optical surface that combines a collimating component and a diffractive optical element (DOE) that creates the structured illumination pattern.
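
As a rough illustration of selecting which LEDs to activate for a desired structured illumination pattern, the sketch below builds simple on/off masks for a hypothetical LED array; the pattern names and array layout are assumptions, not the patent’s design.

    import numpy as np

    def structured_pattern(rows: int, cols: int, pattern: str = "stripes") -> np.ndarray:
        """Return a boolean on/off mask for a hypothetical depth-illumination LED array."""
        mask = np.zeros((rows, cols), dtype=bool)
        if pattern == "stripes":
            mask[:, ::2] = True      # every other column on -> vertical stripes
        elif pattern == "dots":
            mask[::2, ::2] = True    # sparse dot grid
        elif pattern == "flood":
            mask[:, :] = True        # all LEDs on -> flood illumination
        return mask

    # Example: an 8x8 array driven with vertical stripes.
    print(structured_pattern(8, 8, "stripes").astype(int))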

In the illustrated configuration, the ET illumination is accomplished with ET illumination module 208 and ET detection is accomplished with ET detection module 210. In contrast, depth map module 212 provides both illumination and detection functionalities in a single module. In a similar fashion a single ET module could combine the components of ET illumination module 208 and ET detection module 210 into a single module. Such a configuration is described below relative to FIG. 3 and FIG. 11.

The description above explains that the present concepts allow electrical and optical components to be picked and matched as modules to achieve desired functionalities, such as RGB display modules, depth sensing modules, and/or eye tracking modules, among others. These modules can be distributed and dispersed across the visual assembly 104 so that each functionality is achieved without compromising other functionalities. For instance, the eye tracking modules do not (perceptibly) compromise the quality of the RGB display perceived by the user. This distributed and dispersed module placement is described in more detail below relative to FIG. 3.

From another perspective, the present concepts offer a palette of different components that can be non-obstructive or minimally obstructive to the user so that the user can still see the environment (e.g., receive ambient visible light from the environment without noticeable interference). For instance, the electronic components 116 can have dimensions in the x and y reference directions less than 200 microns and in some implementations less than 100 microns, and in some implementations less than 10 microns. Electronic components of this size are so small that they are not visible to the user and are small enough that the user tends not to perceive any visual degradation of real-world images formed from ambient light 110 passing through the visual assembly 104 as long as the components are dispersed rather than clumped together.

Depending on the HMD design parameters, different electronic and/or optical components can be placed in front of the user across (e.g., interspersed throughout) the visual assembly 104. These components can achieve various functionalities including: ET detection, ET illumination, monochrome display, RGB/multicolor display, and/or IR depth sensing, among others, while permitting ambient light to pass through to the user’s eye. The electronic components, given their diminutive size, may not individually have the emitting or detecting capabilities of larger (e.g., traditional macroscopic) components. However, the components can be operated collectively. For instance, individual electronic devices can contribute to a portion of the eye box rather than the entire eye box. When analyzed collectively, the distributed arrangement of the electronic components can provide high quality RGB images, eye tracking, and/or 3D mapping, consistent with specified design parameters.

The visual assembly 104 can be manufactured utilizing various techniques. For instance, the electrical layer 108 and the optical layer 202 can each be formed individually and then associated with one another. The electrical layer 108 can be made on a plastic (e.g., first) substrate with transparent wires (e.g., Indium Tin Oxide (ITO) lines). Using pick and place, different electronic components can be soldered on this substrate. ITO wires could be used in a “bus arrangement” so that the number of electrodes is reduced/minimized.
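
To illustrate why a bus arrangement reduces the electrode count, the following sketch compares dedicated traces with shared row/column buses; the function name and grid size are hypothetical.

    def electrode_counts(rows: int, cols: int) -> dict:
        """Compare direct wiring with row/column ("bus") matrix addressing.

        One dedicated trace per component needs rows * cols lines, whereas shared
        row and column buses need only rows + cols lines, at the cost of addressing
        components one row (or column) at a time.
        """
        return {"direct_traces": rows * cols, "bus_traces": rows + cols}

    # Example: a 100 x 100 grid of electronic components.
    print(electrode_counts(100, 100))  # {'direct_traces': 10000, 'bus_traces': 200}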

The optical layer 202 can be used to collimate light, focus, defocus and/or diffuse light. The optical layer can include multiple lenses, mirrors, and/or diffraction elements/components that can be positioned on, and/or formed from, a substrate (e.g., second substrate). For example, light from the ET IR LEDs could be partially collimated by mirrors and/or lenses so it more effectively covers the eye box. Alternatively, light from an RGB display could be collimated so it acts as a near eye display. Once completed, an adhesive (not specifically shown in FIG. 2) can be applied to one or both of the electrical layer 108 and the optical layer 202 and they can be secured together. This configuration lends itself to planar visual assemblies (in the xy reference directions), curved visual assemblies, and visual assembly implementations that include both planar regions and curved regions, as illustrated in FIG. 2.

FIG. 3 shows another example HMD device 100B that includes a major surface (generally along the xy reference plane) of the visual assembly 104. This view shows how the various modules introduced relative to FIG. 2 can be distributed and dispersed on the visual assembly 104. In this implementation, the ET illumination module 208 and ET detection module 210 of FIG. 2 are replaced by a single ET module 302. However, the description is equally applicable to the separate and distinct modules 208 and 210 described relative to FIG. 2.

In this configuration, the various modules are placed side-by-side (e.g., adjacent to one another). A majority of the modules can be dedicated to generating an RGB image for the user (e.g., RGB display modules 206). Other module types can be interspersed with the RGB display modules 206. This interspersing of module types can occur across the entire visual assembly 104 rather than just on the periphery because the size of the modules can be small enough that not all modules are required to contribute to RGB image generation and the modules do not interfere perceptibly with RGB light and/or ambient light.

In the illustrated case, modules can be arranged and managed in groups of seven that approximate a circle as indicated at 302. In this case, five of the seven positions in the circle are occupied by RGB display modules 206(1)-206(5). One position is allocated to eye tracking module 302(2) and the last position is allocated to depth mapping module 212(3). Because of the small size of the modules, this configuration can provide the same visual experience as if all seven positions were occupied by RGB display modules 206. Note that this illustrated configuration is provided for purposes of example and many other ratios of modules can be employed beyond the illustrated 5:1:1 ratio. For instance, another implementation can manage a 10×10 array of modules and employ 98 RGB display modules to one eye tracking module and one depth mapping module, for example.
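
As a minimal sketch of dispersing module types at a chosen ratio, the following Python helper scatters the 98:1:1 mix from the 10×10 example above across a tile; the function and type names are hypothetical, and the randomized placement is only one possible dispersal strategy.

    import random

    def assign_module_types(rows: int, cols: int,
                            ratio=(98, 1, 1),
                            types=("RGB", "ET", "DEPTH"),
                            seed: int = 0):
        """Scatter module types across a rows x cols tile at the given ratio."""
        assert sum(ratio) == rows * cols, "ratio must fill the tile exactly"
        tile = [t for t, count in zip(types, ratio) for _ in range(count)]
        random.Random(seed).shuffle(tile)  # disperse rather than clump module types
        return [tile[r * cols:(r + 1) * cols] for r in range(rows)]

    # Example: 10 x 10 tile with 98 RGB display modules, 1 ET module, 1 depth module.
    layout = assign_module_types(10, 10)
    print(sum(row.count("ET") for row in layout), sum(row.count("DEPTH") for row in layout))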

One aspect of the inventive concepts is the use of an array of mini-lenses and/or mini-mirrors. Each lens can be used as a mini projector or a mini camera. This means that traditional eye tracking cameras and traditional eye tracking illuminators can be replaced by a group of ET modules that are interspersed across the visual assembly, such as among the RGB display modules (e.g., dispersed and distributed among the RGB display modules) and collectively contribute to the eye tracking functionality. Similarly, a traditional infrared (IR) illuminator for the environment can be replaced by a group of depth map modules that are interspersed among the RGB display modules (e.g., dispersed and distributed among the RGB display modules) and collectively contribute to a depth mapping functionality.

As mentioned above, one difference between solutions based on the present concepts and traditional solutions is the small size (e.g., visually imperceptible) and the “distributed” nature of the modules. This allows the visual assembly to have more flexibility and significantly smaller thickness (e.g., thinner).

FIGS. 4-6 show more details of example HMD devices relating to eye tracking. Eye tracking can be essential in many HMD devices. It can be used to understand the user’s interaction with the environment and can be used as an input device. Many existing HMD devices can use eye tracking to improve image quality as the image is optimized for the specific location of the user’s eye.

There are many existing eye tracking techniques. One of the most common existing techniques uses a ring of IR LEDs along the periphery of the visual assembly. The IR LEDs behave like point sources and emit light towards the user’s cornea. Light from the cornea is reflected towards a camera. By imaging the reflection of the LEDs, a ring is formed into the camera and the position of the cornea (and thus of the eye) can be determined.

Reflecting LEDs on the cornea works well. However, there is a major drawback of this traditional technique. The traditional system performs better when both the camera and the LEDs are in front of the user. This is of course challenging for a VR or AR display where the user should not have any occlusions between their eye and the HMD device’s optics. The traditional approach is to bring the ET camera as close to the nose as possible while attaching the LEDs in the rim of the display optics (waveguide or refractive optics). These traditional implementations work well; however, as the display optics increase in size (for covering a larger FoV) and the display becomes thinner (for industrial design (ID) purposes) the LEDs move too close to the eyebrows and cheeks while the camera sees the reflections at a very oblique angle.

The present concepts offer improved performance. As introduced above relative to FIGS. 2 and 3, a potentially key aspect of the inventive concepts is the use of many and smaller (e.g., microscopic) light sources and detectors distributed and dispersed across the visual assembly 104. By using multiple distributed pairs of components to create the illumination and detection of the glint, the LEDs and detectors can be sufficiently small (e.g., less than 100 um) to become invisible to the human eye.

In FIG. 4 the visual assembly 104 of HMD device 100C includes electrical layer 108. A portion of the electrical layer 108 is shown with one electronic component 116 positioned in front of the eye 114 in the user’s field of view (FoV). In this case, the electronic component 116 is an IR LED 402 that is oriented to emit IR light directly toward the user’s eye 114. This configuration can achieve high efficiency because all of the IR light is directed towards the eye box (112, FIG. 1B).

FIG. 5 shows an alternative configuration on HMD device 100D where the electronic component 116 is manifest as IR LED 402 that is positioned in the user’s FoV. IR LED 402 is oriented to emit IR light away from the user’s eye 114. In this case, optical layer 202 includes optical component 204 in the form of a partially reflective mirror (e.g., hot mirror) 502. The partially reflective mirror 502 can reflect the IR light back toward the user’s eye 114. The partially reflective mirror 502 can have an optical shape that reflects the IR light back toward the user’s eye in a pattern that mimics the IR light being emitted from a virtual point source 504 that is farther from the eye than the visual assembly 104. Thus, the use of the partially reflective mirror 502 allows the HMD device 100D to be positioned closer to the user’s eye while still generating the desired eye tracking IR patterns on the user’s eye 114.

The illustrated configuration directs IR light away from the eye and reflects the IR light from the partially reflective mirror (e.g., hot mirror) toward the eye. While this indirect route may reduce efficiency (as the reflector may be less than 100% efficient), it allows for creating a virtual source that may be more convenient for ET purposes. In addition, multiple lenses can be used to create the same virtual source but formed by multiple emitters. This aspect is shown in FIG. 6.

FIG. 6 shows an alternative configuration on HMD device 100E that builds upon the concepts discussed relative to FIG. 5. This configuration shows two IR LEDs 402(1) and 402(2) associated with electronic components 116(1) and 116(2), respectively. Note that a discontinuity is shown in the visual assembly 104 to indicate that there can be intervening electronic components and optical components that are discussed above relative to FIGS. 2 and 3, but are not shown to avoid clutter on the drawing page.

In this case, the partially reflective (e.g., hot) mirrors 502(1) and 502(2) are configured to operate with their respective IR LEDs 402(1) and 402(2) to collectively create an IR image extending toward the user’s eye. For instance, each IR LED and hot mirror pair (e.g., ET illumination module 208) can illuminate a portion of the eye box (112, FIG. 1). Stated another way, the partially reflective mirrors 502(1) and 502(2) collectively create an IR image that appears to emanate from a single point source (e.g., virtual point source 504). This single image can provide more complete reflection and hence more information about a larger portion of the eye (e.g., eye box) than can be achieved with a single IR LED 402. Alternatively, both IR illumination modules could be directed to the same portion of the eye box to create a higher intensity IR image at that portion than could be achieved with either IR illumination module alone. In either case, a single ET illumination module 208 is not required to solely illuminate the entire eye box. Higher light intensity can be achieved by focusing individual illumination modules 208 on individual areas of the eye box so that collectively the entire eye box is covered with IR light of a desired intensity, even though none of the individual modules in isolation have such capability.

The implementations described above include a single electronic component 116 of a given type, such as LEDs, per optical component 204. Other implementations can have multiple electronic components 116, such as LEDs associated with individual optical components 204, such as partially reflective lenses. These LEDs can be controlled in various ways to achieve various functionalities. For instance, all of the LEDs could be powered on and off simultaneously for eye tracking illumination to achieve higher IR intensity.

In other cases, the LEDs could be controlled separately. For instance, the LEDs could be powered on and off sequentially. These LEDs can be used: (a) to form part of a sensing ring of IR LEDs along the periphery of the visual assembly; and/or (b) to be wobbulated so that the performance of the device increases (e.g., an increase in resolution or determination of other optical properties, like the position illuminated on the cornea). Such a configuration is described below relative to FIGS. 7A-7D.

FIGS. 7A-7D collectively show details relating to the inventive concepts introduced above. FIG. 7A shows another example HMD device 100F. FIGS. 7B-7D show representations of emitted and sensed IR light from the HMD device 100F. In this implementation, HMD device 100F can be viewed as a hybrid device that has IR LEDs distributed and dispersed on the visual assembly. IR reflections from the user’s cornea 704 can be captured by one or more IR sensors (e.g., cameras) 702 that are positioned around the periphery of the visual assembly 104, such as on the housing 102.

In this configuration, multiple (e.g., three) IR LEDs 402 are positioned in eye tracking module 302. The IR LEDs 402 can have dimensions D in the x and y reference directions of anywhere from 10 microns to 200 microns and thus are not visible to the user. The IR LEDs 402 can be positioned close together as indicated by gap G, such as tens to hundreds of microns apart. The space between the IR LEDs can be occlusive if their separation is on the smaller end or transparent if their separation is on the larger end.

The multiple IR LEDs 402(1)-402(3) can be switched on sequentially or simultaneously. When switched on sequentially there is less demand on the spatial response of the IR sensor (e.g., camera) 702 and/or the IR LEDs. When switched on simultaneously there is more demand on the temporal response of the IR sensor and IR LEDs. In some configurations, such as the wobbulation configuration mentioned above, during a sampling period or cycle, each IR LED is activated for a subset of the cycle (e.g., in this example one-third of the cycle). The sensed IR reflections can be analyzed collectively to provide more accurate eye information than can otherwise be obtained.
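
A minimal sketch of the sequential (wobbulated) drive scheme described above is shown below: each LED is active for one slot of the sampling cycle so that detected reflections can be attributed to a known emitter. The timing values and function name are assumptions for illustration.

    def led_schedule(num_leds: int, cycle_s: float):
        """Yield (led_index, on_time_s, off_time_s) slots for one sampling cycle."""
        slot = cycle_s / num_leds
        for i in range(num_leds):
            yield i, i * slot, (i + 1) * slot

    # Example: three IR LEDs sharing a 1 ms sampling cycle (each active for one-third of it).
    for led, t_on, t_off in led_schedule(3, 1e-3):
        print(f"LED {led}: on at {t_on * 1e6:.0f} us, off at {t_off * 1e6:.0f} us")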

The three IR LEDs 402 in this example form a simple triangle. By detecting the shape of the triangle at the IR sensor 702, other parameters of the HMD device 100F can be determined. For instance, these parameters can include the distance between corneal surface 704 and the ET module 302 (e.g., between the eye and the electronic components 116). This distance information can also provide information about a local slope of the eye/cornea. While one ET illumination module 208 may, by itself, not allow the IR sensor 702 to provide accurate distance, position, and/or slope information, multiple ET illumination modules 208 distributed and dispersed with multiple ET sensing modules can provide information sensed by the IR sensor 702 that, when analyzed collectively, is accurate.

FIG. 7B shows a representation of sequential IR emissions 706 from IR LEDs 402(1), 402(2), and 402(3). FIG. 7C shows a representation of the IR detections 708 of the IR emissions 706 as captured by IR sensor 702. FIG. 7D shows a representation of the IR detections 708 superimposed on the IR emissions 706. The differences or deltas 710 show changes in shape, location, and angular orientation. These changes can be caused by the user’s eye and can provide useful information about the eye location, shape, etc. at a resolution greater than would otherwise be achieved.

One example technique for obtaining this higher accuracy eye information can utilize the three sequential IR detections 708. The detected images can be deconvolved to produce a high-resolution image, even though the individual images are relatively low resolution. Deconvolution can be used to improve the modulation transfer function (MTF)/point spread function (PSF) of a low-quality optical system. One such technique can employ multiple IR detectors rather than a single detector. The combination of multiple LEDs being controlled and sensed by multiple detectors will provide more accurate information about the eye.
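
As a rough illustration of the deconvolution idea, the sketch below runs a minimal Richardson-Lucy style iteration with a known point spread function; the PSF handling, iteration count, and initialization are assumptions and not the patent’s algorithm.

    import numpy as np
    from scipy.signal import fftconvolve

    def richardson_lucy(blurred: np.ndarray, psf: np.ndarray, iters: int = 30) -> np.ndarray:
        """Minimal Richardson-Lucy deconvolution of a low-resolution glint image."""
        estimate = np.full(blurred.shape, 0.5, dtype=float)
        psf_mirror = psf[::-1, ::-1]  # spatially flipped PSF used in the update step
        for _ in range(iters):
            reblurred = fftconvolve(estimate, psf, mode="same")
            ratio = blurred / np.maximum(reblurred, 1e-12)
            estimate = estimate * fftconvolve(ratio, psf_mirror, mode="same")
        return estimate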

One such example multi-detector is a quadrant detector. Quadrant detectors have four active photodiode areas defining four quadrants. The four active photodiode areas can sense the centroid of an object (e.g., blob) in the four quadrants. Quadrant detectors operate at high frequencies, such as mega Hertz frequencies. As such, quadrant detectors can be used to detect fast eye movement, such as saccades. Some implementations may employ charge coupled devices (CCDs) or complementary metal oxide semiconductors (CMOS) sensors for general IR imaging purposes and quadrant detectors for detecting rapid eye movements.
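
The centroid computation a quadrant detector performs can be summarized with the standard normalized-difference formulas, as in the hedged sketch below; the quadrant labeling and scaling are illustrative conventions, not values from the patent.

    def quadrant_centroid(q_a: float, q_b: float, q_c: float, q_d: float):
        """Estimate a spot centroid from four photodiode signals.

        Assumed labeling: A = top-right, B = top-left, C = bottom-left, D = bottom-right.
        Returns normalized (x, y) offsets in roughly the range [-1, 1].
        """
        total = q_a + q_b + q_c + q_d
        if total <= 0.0:
            return 0.0, 0.0
        x = ((q_a + q_d) - (q_b + q_c)) / total  # right minus left
        y = ((q_a + q_b) - (q_c + q_d)) / total  # top minus bottom
        return x, y

    # Example: more light on the right-hand quadrants pushes x positive.
    print(quadrant_centroid(0.4, 0.1, 0.1, 0.4))  # (0.6, 0.0)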

The same or similar approach described above can be used to reduce the requirements for the IR sensor 702. For example, by using an IR sensor with 10×10 pixels and an IR LED array of 12×12 pixels, the resolution could be enhanced to approximately 120×120 pixels. This effectively provides N×M super-resolution, where N is the number of IR detectors and M is the number of IR LEDs, to obtain increased resolution in eye position.
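
One simple way to picture this N×M super-resolution idea is shift-and-add: each LED activation yields a low-resolution frame with a known sub-pixel offset, and the frames are placed onto a finer grid. The sketch below is a coarse illustration under that assumption; a practical system would need calibration and a more careful reconstruction.

    import numpy as np

    def shift_and_add(frames, shifts, upscale: int) -> np.ndarray:
        """Combine low-resolution frames with known sub-pixel shifts onto a finer grid.

        frames: list of (H, W) arrays, one per LED activation.
        shifts: list of (dy, dx) sub-pixel shifts, in low-resolution pixel units, each in [0, 1).
        """
        h, w = frames[0].shape
        acc = np.zeros((h * upscale, w * upscale))
        hits = np.zeros_like(acc)
        for frame, (dy, dx) in zip(frames, shifts):
            oy = int(round(dy * upscale)) % upscale  # offset on the fine grid
            ox = int(round(dx * upscale)) % upscale
            ys = np.arange(h)[:, None] * upscale + oy
            xs = np.arange(w)[None, :] * upscale + ox
            acc[ys, xs] += frame
            hits[ys, xs] += 1.0
        return acc / np.maximum(hits, 1.0)  # average where multiple frames land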

The present concepts also provide enhanced pupil imaging for both “bright pupil” (retinal retroreflection) imaging and “dark pupil” imaging. Retinal retroreflection relates to the IR light that reflects off the retina straight back toward the source. When the IR sensor is close to the IR source and both are close to the optical axis, retinal retroreflection is more effective. Due to demographic differences, some pupils are easier to image with dark pupil while some are easier to image with bright pupil imaging. Bright pupil methods tend to work better for some demographics than others. However, dark pupil imaging tends to work better for other demographics. The present concepts can position IR emitters and IR sensors throughout the optical assembly including proximate to the optical axis. Thus, the present concepts can enable employment of both of these techniques via multiple distributed IR LEDs 402 and multiple IR sensors 702 to achieve accurate eye tracking regardless of the user demographics.

FIG. 8 shows another example HMD device 100G that illustrates that an IR sensor 702 can sense a portion of the eye box via partially reflective mirror 502. The partially reflective mirror 502 can function as the IR sensor’s lens in the illustrated configuration. The IR sensor 702 can be a single IR sensor, or multiple IR sensors. For instance, the detector could be an IR photodetector array. The use of multiple IR sensors operating cooperatively can provide higher resolution data than a single sensor as described above and below.

The illustrated IR sensor 702 can sense an individual portion of the eye box while other IR sensors sense other portions of the eye box. FIG. 9 illustrates this aspect. In FIG. 9, HMD device 100H is similar to HMD device 100G except that two IR sensors 702 are illustrated with two partially reflective mirrors 502. The orientation of individual partially reflective mirrors 502 can be adjusted so that each IR sensor and partially reflective mirror pair senses a different part of the eye box. This difference in orientation causes IR sensor 702(1) to receive IR light at angles one and two and IR sensor 702(2) to receive IR light at angles three and four. The entire eye box can be sensed by integrating the data from the various IR sensors. While only two IR sensors and partially reflective mirror pairs are illustrated, hundreds or thousands of pairs may be employed.

FIG. 10 shows another example HMD device 100I that illustrates how the present implementations can enable both bright and dark pupil imaging simultaneously using distributed and dispersed IR LEDs 402 and IR sensors 702. In this configuration multiple IR LEDs can emit light that upon reflection can be sensed by one or more of the IR sensors 702. At the illustrated point in time, IR LED 402(3) emitted light that reflected back from the user’s retina and is sensed by IR sensor 702(3) and can be processed consistent with bright pupil techniques. Meanwhile, IR light from IR LED 402(1) could be sensed by IR sensor 702(1) and processed consistent with dark pupil techniques. Finally, the IR light from IR LED 402(5) is sensed by IR sensor 702(5) and can be processed collectively with information from IR sensor 702(1). This combination of emission and detection from IR LEDs and sensors interspersed across the visual assembly 104 can ensure accurate eye tracking throughout the user experience. This can occur even if some of them are blocked or do not provide determinative data because of eye color issues, eye lid position, etc.

Note that for ease of explanation, the electronic components of the electrical layer 108 have generally been illustrated in a single layer, such as an IR sensor 702 adjacent to an IR LED 402 along the xy reference plane. However, other implementations can stack electronic components in the z direction. One such example is described below in relation to FIG. 11.

FIG. 11 shows another example HMD device 100J. In this case, IR LED 402 and IR sensor 702 are stacked in the z reference direction (e.g., parallel to the optical axis) on the electrical layer 108. In this configuration, the IR LED 402(1) is emitting light toward the eye 114. IR light reflected back from the eye is focused by partially reflective mirror 502(1) onto IR sensor 702(1). Similarly, the IR LED 402(2) is emitting light toward the eye 114. IR light reflected back from the eye is focused by partially reflective mirror 502(2) onto IR sensor 702(2). The electronic components may tend to obstruct ambient visible light from the environment more than those areas of the electrical layer without electronic components. Thus, stacking electronic components tends to increase the ratio of unobstructed areas to potentially or somewhat obstructed areas as indicated on FIG. 11.

In the same way that IR LEDs 402 can direct IR illumination towards the user, the IR sensor 702 may be configured to image a particular area of the eye box. Because of the simplicity of optics (a single reflector vs multiple refractive elements in an ET camera) the FoV of the IR sensor can be relatively small to reduce aberrations.

As mentioned, the field of view of each IR sensor 702 can be less than a traditional sensor positioned on the housing. This is not an issue because data from multiple sensors can be used to collectively capture the entire eye box. Note that in practice the FoV of the two (or more) lenses may require some overlap. This is because the lenses are not at infinity compared to the position of the eye, and thus there can be a need to capture a somewhat wider FoV per lens.

It is also possible to combine the use of the IR LED 402 and IR sensor 702 in a single lenslet. This configuration can minimize the occlusions caused as the LED and sensor occupy the same space. It may also bring some advantages in terms of geometry as the source and detector will be at the same point.

Note also that the present concepts offer many implementations. For instance, in HMD device 100J of FIG. 11, the IR LEDs 402 face toward the eye and the IR sensors 702 face the opposite way and receive IR light that is reflected from the user’s eye and reflected again by the partially reflective mirror 502. Alternatively, the components could be swapped so that the IR LEDs 402 could emit toward the partially reflective mirrors 502. IR light reflected by the partially reflective mirrors 502 and again off of the user’s eye could be detected by IR sensors 702 (potentially with the aid of a small lens, which is not specifically shown).

The same or similar arrangements can work with a transmissive or a combination of transmissive and reflective optical components. In addition, other optical components (diffractive, holographic, meta-optics) could be employed.

Consistent with the present implementations various coatings can be employed on the partially reflective mirrors 502 when ET and depth sensing IR illumination is used. For instance, the coatings can be dielectrics and tuned to a particular wavelength. That can improve the transparency of the combiner when used in an AR system.

It is also possible to combine the functions of ET, depth sensing and RGB display in a single element. This aspect is discussed in more detail below relative to FIG. 15.

FIG. 12 shows an alternative arrangement to FIG. 11. In this case, the example HMD device 100K positions the IR LEDs 402 away from the eye 114. IR light emitted by the IR LEDs 402 is reflected back toward the eye by the partially reflective mirrors 502. IR light that reflects from the eye is focused by lenses 1002 onto the IR sensors 702.

The discussion above relative to FIGS. 2 and 3 explains that the present distributed and dispersed module concepts can be applied to eye tracking and depth sensing among other functionalities. FIGS. 4-12 explain detailed configurations of multiple implementations relating to eye tracking. Those details are also applicable to depth sensing. One such example is shown relative to FIG. 13.

FIG. 13 shows another example HMD device 100L that can provide depth sensing on visual assembly 104. In this case the IR LEDs 402 are facing toward the eye 114. Partially reflective mirrors 502 on the optical layer 202 are oriented to reflect the IR light back toward the environment (e.g., away from the eye 114) as if the IR light was emitted from virtual point source 504 on the eye side. IR light reflected back from the environment can be detected by IR sensors 702, such as CMOS stereoscopic depth sensors, among others.
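
For the stereo variant, depth follows the classic pinhole relation depth = focal length × baseline / disparity. The short sketch below applies that relation; the numeric values are illustrative assumptions and are not tied to any particular sensor in the patent.

    def stereo_depth_m(focal_length_px: float, baseline_m: float, disparity_px: float) -> float:
        """Depth from disparity for a rectified stereo pair of IR sensors."""
        if disparity_px <= 0.0:
            return float("inf")  # zero disparity corresponds to a very distant object
        return focal_length_px * baseline_m / disparity_px

    # Example: 600 px focal length, 6 cm baseline, 12 px disparity -> 3 m.
    print(stereo_depth_m(600.0, 0.06, 12.0))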

FIG. 14 shows another example HMD device 100M that can provide eye tracking and can generate color images to the user’s eye 114. In this case, multiple LEDs 1402 are distributed across the electrical layer 108. In this configuration, all of the LEDs are the same in that they emit light with the same properties. A determinative layer 1404, such as a quantum dot matrix, is positioned relative to the LEDs. The determinative layer can have localized differences that affect the light that passes through the determinative layer from the individual LEDs 1402. For instance, the determinative layer 1404 can cause emitted light from LEDs 1402(1) and 1402(11) to be IR light (as denoted on the FIGURE), while light from the remaining LEDs can be dedicated to visible RGB light. Stated another way, the electronic components (e.g., the LEDs 1402) can be generic for multiple modules of the electrical layer 108. The determinative layer 1404 positioned over individual modules can define a functionality of the module, such as the wavelength(s) of light emitted by the module.

In some cases, the IR light can be uniformly emitted across the visual assembly 104 (e.g., a ratio of IR emitters to RGB emitters can be uniform across the visual assembly). In other cases, the ratios of visible light and IR light may be different for different regions of the visual assembly 104.

In one such example of the latter configuration, visible light may be produced in higher concentrations proximate to the optical axis (e.g., less IR light) for enhanced image quality. Further from the optical axis, the percentage of IR light relative to RGB light can increase. Stated another way, the ratio of RGB emitters to IR emitters can be higher proximate to the optical axis and lower farther from the optical axis. The user tends to look along the optical axis, and foveal regions along the user’s line of sight can have a higher concentration of RGB light output to provide the higher image quality offered by high RGB density. Further from the optical axis, the user’s visual acuity tends to be less and more resources can be dedicated to eye tracking without affecting the perceived image quality. In some device configurations, the IR/RGB ratios can be static (e.g., unchangeable). Other device configurations can offer dynamically adjustable ratios. For instance, the initial ratios can be dynamically changed in some configurations, such as when eye tracking indicates the user is looking to the side rather than straight ahead. Such an example device configuration is described relative to FIG. 15.
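
A minimal sketch of such a radially varying allocation is shown below: the fraction of emitters devoted to IR grows with distance from the optical axis. The linear ramp and all numeric values are assumptions chosen only to illustrate the idea.

    def ir_fraction(radius_from_axis_mm: float,
                    near_axis_fraction: float = 0.01,
                    edge_fraction: float = 0.10,
                    edge_radius_mm: float = 25.0) -> float:
        """Fraction of emitters allocated to IR as a function of distance from the optical axis."""
        t = min(max(radius_from_axis_mm / edge_radius_mm, 0.0), 1.0)  # 0 at axis, 1 at edge
        return near_axis_fraction + t * (edge_fraction - near_axis_fraction)

    # Example: about 1% IR near the axis, about 10% IR at 25 mm from the axis.
    print(ir_fraction(0.0), ir_fraction(25.0))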

FIG. 15 shows another example HMD device 100N that can provide eye tracking and can generate color images to the user’s eye 114. This example includes multiple adjacent (but potentially spaced apart) modules 1502 on the visual assembly 104. In this case, each module 1502 includes an LED light source that can produce IR and RGB light. Each module 1502 also includes a light detector that can detect IR and/or visible light. Each of these modules can be powered and/or controlled via one or more conductors (not specifically designated) in the visual assembly. Individual modules 1502 can be dynamically controlled to contribute to RGB images, eye tracking, or powered off, depending upon various parameters, such as eye gaze direction and/or foveation, among others. In some cases, the module may contribute to image generation for an entire cycle of image generation (e.g., frame). In other cases, the module may contribute to image generation for a sub-cycle of image duration and contribute to another function, such as eye tracking during another sub-cycle. Alternatively, the functionality may change if the user looks toward or away from the individual module.

The discussion above emphasizes emitting visible light or IR light; however, the LEDs 1402 can be controlled to selectively emit one or more of several IR wavelengths. This can allow different properties of each wavelength to be leveraged depending on the conditions and/or function. For instance, some wavelengths can provide better directional sensitivity than others to determine where the light is coming from. Further, different wavelengths can help with imaging the eye. For example, retinal images can be enhanced by using different wavelengths. Utilizing multiple IR wavelengths can facilitate distinguishing retinal reflections from corneal reflections. Conditions can also influence which IR wavelengths to utilize. For instance, some IR wavelengths are more affected by environmental factors. For example, 940 nm wavelength IR light is less affected by sunlight than shorter-wavelength IR light. Thus, 940 nm wavelength IR light could be employed outside in bright conditions and 830 nm wavelength IR light could be employed in lower light conditions, such as indoor environments.
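
The wavelength selection described above can be reduced to a simple rule keyed to ambient brightness, as in the hedged sketch below; the lux threshold is an assumption for illustration only.

    def pick_et_wavelength_nm(ambient_lux: float, outdoor_threshold_lux: float = 10_000.0) -> int:
        """Choose an ET illumination wavelength: 940 nm in bright sunlight, 830 nm otherwise."""
        return 940 if ambient_lux >= outdoor_threshold_lux else 830

    print(pick_et_wavelength_nm(50_000.0))  # 940 (bright outdoor conditions)
    print(pick_et_wavelength_nm(300.0))     # 830 (indoor lighting)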

FIG. 16 shows a system 1600 that includes HMD device 100P that is similar to HMD device 100 described above relative to FIGS. 1A and 1B. As introduced above, the HMD device 100P can include housing (e.g., frame) 102 that positions the visual assembly 104 in line with the user’s eye 114 along the optical axis. The electrical layer 108 and/or the optical layer 202 can include multiple microscopic (e.g., invisible to the user) components that are distributed across the visual assembly in the user’s and/or the device’s FoV including along the optical axis. The components can operate as modules that achieve specific functions, such as eye tracking, 3D sensing, and/or RGB image generation yet are imperceptible to the user.

The HMD device 100P can also include a controller 1602, a processing unit 1604, storage and/or memory 1606, a communication unit 1608, and/or a power supply 1610. In some implementations controller 1602 may include the processing unit 1604 and the memory 1606. The controller can utilize the memory for storing processor readable instructions and/or data, such as user data, image data, sensor data, etc. The communication unit 1608 can be communicatively coupled to the processing unit 1604 and can act as a network interface for connecting the HMD device to another computer system represented by computer 1612. The computer 1612 may include instances of any of the controller 1602, processing units 1604, memory 1606, communication units 1608, and power supplies 1610. The HMD device 100P may be robust and operate in a stand-alone manner and/or may communicate with the computer 1612, which may perform some of the described functionality.

Controller 1602 may provide commands and instructions, such as driving power to the electronic components 116 to generate visible and/or non-visible light. Similarly, the controller can receive data from sensors, such as IR sensors 702. The controller can use the data to identify information about the eye (e.g., eye tracking) and/or the environment (e.g., 3D mapping).

The controller 1602 can analyze the data from the sensors to identify features of the cornea and/or retina, such as by detecting glints of light and/or other detectable features associated with the user’s eye, to determine the pupil position and gaze direction of the eye.

The storage/memory 1606 can include an optics model 1614 and/or measured performance (e.g., deviation data) 1616. The optics model 1614 can be derived from the design specifications of the HMD device and the distributed and dispersed arrangement of the various modules. Recall that the eye information from any individual eye tracking module or 3D mapping module may not be as robust as traditional designs positioned outside the FoV. The controller can analyze the eye information collectively to identify meaningful eye information.

The controller 1602 can use this eye information to control the modules. For instance, the controller may increase image resolution generated by RGB LEDs in foveated regions and decrease image resolution outside the foveated regions. Similarly, the controller can use eye movement to increase resolution in regions of the visual assembly the eyes are moving toward and decrease resolution in regions the eyes are moving away from.
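
A minimal sketch of gaze-driven resolution control is shown below: regions of the visual assembly near the gaze point render at full resolution and peripheral regions are scaled down. The coordinate convention, falloff curve, and floor value are assumptions, not the controller’s actual policy.

    def region_resolution_scale(region_center, gaze_point, fovea_radius: float = 0.1) -> float:
        """Render-resolution scale for a display region, given gaze in normalized coordinates."""
        dx = region_center[0] - gaze_point[0]
        dy = region_center[1] - gaze_point[1]
        dist = (dx * dx + dy * dy) ** 0.5
        if dist <= fovea_radius:
            return 1.0  # foveated region: full resolution
        return max(0.25, 1.0 - (dist - fovea_radius))  # taper toward a 25% floor

    print(region_resolution_scale((0.52, 0.50), (0.5, 0.5)))  # ~1.0 (near the gaze point)
    print(region_resolution_scale((0.90, 0.10), (0.5, 0.5)))  # reduced peripheral resolution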

In some implementations, the controller 1602 may also employ artificial intelligence algorithms, such as neural networks, for analyzing sensor data from the distributed sensors. The data from any one sensor may be rather rudimentary, yet the artificial intelligence algorithms can collectively analyze data from the available sensors to find meaningful patterns that are not apparent with traditional analytics.

Processing unit 1604 may include one or more processors including a central processing unit (CPU) and/or a graphics processing unit (GPU). Memory 1606 can be a computer-readable storage media that may store instructions for execution by processing unit 1604, to provide various functionality to HMD device 100P. Finally, power supply 1610 can provide power for the components of controller 1602 and the other components of HMD device 100P.

The terms “device,” “computer,” “computing device,” “client device,” “server,” and/or “server device” as used herein can mean any type of device that has some amount of hardware processing capability and/or hardware storage/memory capability. Processing capability can be provided by one or more hardware processing units 1604 and/or other processors (e.g., hardware processing units/cores) that can execute computer-readable instructions to provide functionality. Computer-readable instructions and/or data can be stored on persistent storage or volatile memory. The term “system” as used herein can refer to a single device, multiple devices, etc.

Memory 1606 can be storage resources that are internal or external to any respective devices with which it is associated. Memory 1606 can include any one or more of volatile or non-volatile memory, hard drives, flash storage devices, and/or optical storage devices (e.g., CDs, DVDs, etc.), among others. As used herein, the term “computer-readable media” can include signals. In contrast, the term “computer-readable storage media” excludes signals. Computer-readable storage media includes “computer-readable storage devices.” Examples of computer-readable storage devices include volatile storage media, such as RAM, and non-volatile storage media, such as hard drives, optical discs, and flash memory, among others, which may constitute memory 1606.

In some cases, the HMD devices are configured with a general-purpose hardware processor and storage resources. In other cases, a device can include a system on a chip (SOC) type design. In SOC design implementations, functionality provided by the device can be integrated on a single SOC or multiple coupled SOCs. One or more associated processors can be configured to coordinate with shared resources, such as memory, storage, etc., and/or one or more dedicated resources, such as hardware blocks configured to perform certain specific functionality. Thus, the term “processor,” “hardware processor” or “hardware processing unit” as used herein can also refer to central processing units (CPUs), graphical processing units (GPUs), holographic processing units (HPUs), controllers, microcontrollers, processor cores, or other types of processing devices suitable for implementation both in conventional computing architectures as well as SOC designs.

Alternatively, or in addition, the functionality described herein can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that can be used include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), etc.

In some configurations, any of the code discussed herein can be implemented in software, hardware, and/or firmware. In any case, the code can be provided during manufacture of the device or by an intermediary that prepares the device for sale to the end user. In other instances, the end user may install the code later, such as by downloading executable code and installing the executable code on the corresponding device.

Also note that the components and/or devices described herein can function in a stand-alone or cooperative manner to implement the described techniques. For example, the methods described herein can be performed on a single computing device and/or distributed across multiple computing devices that communicate over one or more network(s). Without limitation, such one or more network(s) can include one or more local area networks (LANs), wide area networks (WANs), the Internet, and the like.

Example Methods

FIG. 17 illustrates an example method 1700, consistent with the present concepts. Method 1700 can be implemented by a single device, e.g., HMD device 100, or various steps can be distributed over one or more servers, client devices, etc. Moreover, method 1700 can be performed by one or more components, such as a controller and/or by other components and/or devices.

At block 1702, the method can operate non-visible light emitters and sensors distributed across a transparent visual assembly of an HMD device with visible light emitters.

At block 1704, the method can identify properties of an eye of a user wearing the HMD device based at least in part from data from the non-visible light sensors.

At block 1706, the method can update operation of at least one of the non-visible light emitters and sensors or the visible light emitters based at least in part upon the properties of the eye of the user identified from the data from the non-visible light sensors.
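
A hypothetical control loop following blocks 1702-1706 might look like the Python sketch below. The driver objects and their methods (pulse, read, set_foveation, steer_toward) and the gaze estimator callback are assumptions introduced for illustration; none of these names come from the patent.

    import time

    def run_eye_tracking_loop(emitters, sensors, display, estimate_gaze,
                              num_frames: int = 1000, frame_period_s: float = 1 / 90):
        """Run method 1700 for a fixed number of frames using assumed driver objects."""
        for _ in range(num_frames):
            # Block 1702: operate the distributed non-visible emitters and sensors.
            emitters.pulse()
            samples = sensors.read()
            # Block 1704: identify properties of the user's eye (e.g., gaze) from the sensor data.
            gaze = estimate_gaze(samples)
            # Block 1706: update emitter/sensor and display operation based on those properties.
            display.set_foveation(gaze)
            emitters.steer_toward(gaze)
            time.sleep(frame_period_s)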

Various examples are described above. Another example relates to an eye tracking system where the illumination is placed on a see-through transparent substrate (e.g., combiner) and directed towards the user’s eye.

Another example includes an eye tracking system where the illumination is placed on a see-through transparent substrate and pointed towards the real world. A reflector (e.g., IR selective reflector or partial mirror) collimates or partially collimates the LED illumination towards an eye box of an HMD device.

Another example taken alone or in combination with any of the above or below examples includes an eye tracking system where multiple LEDs are placed on a see-through transparent substrate and pointed towards the real world. A different type of reflector is used for each LED so that an entire eye box is illuminated by combining the illumination from multiple LEDs.

Another example taken alone or in combination with any of the above or below examples includes an eye tracking system where the IR light detector (camera or a single detector) uses a reflector embedded into the combiner to collimate and focus the beam on the detector.

Another example taken alone or in combination with any of the above or below examples includes an eye tracking system where both bright pupil and dark pupil images are captured simultaneously.

Another example taken alone or in combination with any of the above or below examples includes an eye tracking system that uses multiple wavelengths.

Another example taken alone or in combination with any of the above or below examples includes an eye tracking system where multiple IR light detectors (cameras or single detectors) use different types of reflectors embedded into the combiner to collect light from different parts of the eye box and focus it on the detectors.

Another example taken alone or in combination with any of the above or below examples includes a system where the reflector is combined with other non-reflective optics.

Another example includes an eye tracking system where there is a plurality of LED (or display pixels) and detector (or camera pixel) arrays. Each LED or detector array faces an embedded reflector that collimates the outcoming or incoming light to or from the eye box. By combining multiple LEDs and detectors an improvement in resolution can be achieved.

Another example taken alone or in combination with any of the above or below examples includes an eye tracking system where there is a plurality of LED (or display pixels) and detector (or camera pixel) arrays. Each LED or detector array faces an embedded reflector that collimates the outcoming or incoming light to or from the eye box. By combining multiple LEDs and detectors an improvement in resolution can be achieved. Each LED or detector is activated at a different time so temporal resolution can be used to improve spatial resolution of the ET system.

Another example taken alone or in combination with any of the above or below examples includes an ET system where each LED source is composed of a number of sub-elements/pixels. By imaging these pixels on an ET camera and measuring the distortion of the IR pattern, more information can be obtained about the reflective surface (i.e., cornea).

Another example taken alone or in combination with any of the above or below examples includes a depth sensing system (such as Time of flight) where the “flood illumination” LEDs are attached on the combiner of the display and point directly towards the real world.

Another example taken alone or in combination with any of the above or below examples includes a depth sensing system (Time of flight or stereo) where the “flood illumination” LEDs are attached on the combiner of the display and point directly towards the user and then are reflected to the real world by an IR/partial mirror. This allows the beam to have a specific profile when illuminating the real world.

Another example taken alone or in combination with any of the above or below examples includes a depth sensing system (Time of flight or stereo) where an array of illumination LEDs are attached on the combiner of the display and point directly towards a reflector and then reflected to the real world by an IR/partial mirror. By switching different LEDs/pixels ON/OFF, it is possible to create a structured illumination that can enable or enhance depth sensing.

Another example taken alone or in combination with any of the above or below examples includes a depth sensing system where the camera is embedded into the combiner of the HMD device.

Another example taken alone or in combination with any of the above or below examples includes a depth sensing system where multiple cameras are embedded into the combiner of the HMD device. Each camera can cover part of the environment with different resolution or FoV.

Another example includes an HMD device that uses a plurality of mini-lenses to create the virtual image into the user’s eye. Such a system can contain lenses that (a) form the image into the user’s eye (b) enable ET by the use of emitters and sensors embedded into the mini lenses (c) facilitate or enhance depth sensing by providing lenses that emit light into the environment or sensors that collect light from the environment.

Another example includes a head mounted display device comprising a housing configured to be positioned relative to a head and eye of a user and a visual assembly positioned by the housing in front of the user’s eye, the visual assembly comprising an electrical layer comprising side-by-side electronic components, individual electronic components configured to emit or detect light and an optical layer comprising side-by-side optical components, individual optical components configured to refract or reflect or diffract light relative to individual electronic components.

Another example can include any of the above and/or below examples where the electrical layer and the optical layer are formed on a single substrate or wherein the electrical layer comprises a first substrate and the optical layer comprises a second substrate, and wherein the first and second substrates are positioned against one another or wherein the first and second substrates are spaced apart from one another.

Another example can include any of the above and/or below examples where the optical layer is transparent.

Another example can include any of the above and/or below examples where at least some of the electronic components and optical components contribute to eye tracking of the eye of the user.

Another example can include any of the above and/or below examples where the electrical layer is positioned proximate to the user relative to the optical layer.

Another example can include any of the above and/or below examples where individual electronic components are paired with individual optical components as modules to achieve specific functionalities.

Another example can include any of the above and/or below examples where the specific functionalities include eye tracking illumination, eye tracking detection, image generation, 3D illumination, and/or 3D detection.

Another example can include any of the above and/or below examples where an individual eye tracking illumination pair comprises an individual electronic component that emits non-visible light away from the user’s eye and an individual optical component that redirects the non-visible light back towards the user’s eye.

Another example can include any of the above and/or below examples where an individual eye tracking detection pair further comprises a lens that receives the non-visible light reflected from the user’s eye and focuses the non-visible light toward another individual electronic component that senses the non-visible light reflected back from the user’s eye.

Another example can include any of the above and/or below examples where the another electronic component faces the user’s eye or wherein the another electronic component is positioned behind the electronic component.

Another example can include any of the above and/or below examples where eye tracking illumination pairs and individual eye tracking detection pairs are distributed across the visual assembly.

Another example includes a head mounted display device comprising a housing configured to be positioned relative to a head and eye of a user and a transparent visual assembly positioned by the housing in front of the user’s eye and comprising multiple eye tracking illuminators distributed across the transparent visual assembly and configured to emit non-visible light and multiple eye tracking detectors distributed across the transparent visual assembly and configured to detect the non-visible light reflected back from the eye of the user.

Another example can include any of the above and/or below examples where the eye tracking illuminators are configured to emit the non-visible light in a direction away from the eye of the user.

Another example can include any of the above and/or below examples where the transparent visual assembly further comprises optical components that include non-visible selective reflectors that are configured to collimate the non-visible light in an eye box defined by the head mounted display device.

Another example can include any of the above and/or below examples where the optical components are configured to operate cooperatively to illuminate an entire eye box for the user.

Another example can include any of the above and/or below examples where other optical components are distributed across the transparent visual assembly and configured to cooperatively generate a visual image in the eye box.

Another example can include any of the above and/or below examples where other optical components are configured to generate the visual image simultaneously to the optical components illuminating the entire eye box with the non-visible light.

Another example can include any of the above and/or below examples where the optical components, the other optical components, and the additional optical components are interspersed across a field of view of the transparent visual assembly.

Another example can include any of the above and/or below examples where the eye tracking illuminators are configured to emit the non-visible light in a direction toward the eye of the user.

Another example comprises a system that includes a visual assembly configured to be positioned in front of an eye of a user and comprising multiple eye tracking illuminators distributed across the visual assembly and configured to emit non-visible light and multiple eye tracking detectors distributed across the visual assembly and configured to detect the non-visible light reflected back from the eye of the user and a controller configured to process the detected non-visible light from multiple eye tracking detectors to identify information relating to the eye.

Another example can include any of the above and/or below examples where the controller is located on an HMD device that includes the visual assembly or wherein the controller is located on a computer that is configured to communicate with the HMD device.

CONCLUSION

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims, and other features and acts that would be recognized by one skilled in the art are intended to be within the scope of the claims.

Patent: Combined birefringent material and reflective waveguide for multiple focal planes in a mixed-reality head-mounted display device

Patent PDF: Available to 映维网 (Nweon) members

Publication Number: 20230103091

Publication Date: 2023-03-30

Assignee: Microsoft Technology Licensing

Abstract

An optical combiner in a display system of a mixed-reality head-mounted display (HMD) device comprises a lens of birefringent material and a ferroelectric liquid crystal (FLC) modulator that are adapted for use with a reflective waveguide to provide multiple different focal planes on which holograms of virtual-world objects (i.e., virtual images) are displayed. The birefringent lens has two orthogonal refractive indices, ordinary and extraordinary, depending on the polarization state of the incident light. Depending on the rotation of the polarization axis by the FLC modulator, the incoming light to the birefringent lens is focused either at a distance corresponding to the ordinary refractive index or the extraordinary refractive index. Virtual image light leaving the birefringent lens is in-coupled to a see-through reflective waveguide which is configured to form an exit pupil for the optical combiner to enable an HMD device user to view the virtual images from the source.

Claims

What is claimed:

1.A method for operating an electronic device that includes a mixed-reality see-through optical display system configured for showing mixed-reality scenes comprising virtual images of virtual-world objects that are rendered over views of real-world objects to a user of the electronic device, the method comprising: receiving light for the virtual images, the light being linearly polarized in a first polarization state; operating a ferroelectric liquid crystal (FLC) modulator to switch between the first polarization state for the virtual image light and a second polarization state that is orthogonal to the first polarization state; providing a lens of birefringent material upon which virtual image light is incident in either the first polarization state or second polarization state, in which the lens provides one of two different focal distances for the virtual images depending on polarization state of the incident virtual image light; and in-coupling the virtual image light from the lens into the mixed-reality see-through optical display system which renders the virtual images at the one of two different focal distances to the user.

2.The method of claim 1 further comprising operating the FLC modulator at a rate that is synchronized to a refresh rate of the received virtual image light to provide a temporally multiplexed virtual image display comprising one or more virtual images located at either one or the other different focal distances or located at both of the different focal distances simultaneously.

3.The method of claim 1 further comprising stacking combinations of FLC modulators and lenses of birefringent material that act on the received virtual image light in series, in which each combination in the stack provides two unique focal distances for the rendered virtual images.

4.The method of claim 1 further comprising operating the FLC modulator according to a composition of a mixed-reality scene, in which the composed mixed-reality scene includes virtual-world objects that are located at different focal distances.

5.A head-mounted display (HMD) device wearable by a user and configured for supporting a mixed-reality experience including viewing, by the user, of virtual images that are combined with views of real-world objects in a physical world, comprising: a focal-distance modulation system that is operable to receive virtual images from a virtual image source, the focal-distance modulation system comprising a polarization modulator and a birefringent lens, wherein the polarization modulator is configured to selectively switch polarization of the virtual images between two orthogonal states, and wherein the birefringent lens has two different refractive indices each with sensitivity to a different orthogonal state of polarization of virtual images, wherein virtual images in a first polarization state are focused by the birefringent lens at a first focal distance, and wherein virtual images in a second polarization state are focused by the birefringent lens at a second focal distance; and an optical combiner with which the user can see the real-world objects and the virtual images in a mixed-reality scene, the optical combiner including an input coupler configured to in-couple virtual images from the focal-distance modulation system that are focused at either the first or second focal distance into the optical combiner and further including an output coupler configured to out-couple the virtual images that are focused at either the first or second focal distance from the optical combiner to one or more of the user’s eyes.

6.The HMD device of claim 5 further comprising a linear polarizing filter that is arranged to linearly polarize light from the virtual image source.

7.The HMD device of claim 5 further comprising an eye tracker for tracking vergence of the user’s eyes or tracking a gaze direction of at least one eye of the user to perform one of calibration of alignment between the user’s eye and the optical combiner, dynamic determination of whether alignment changes during use of the HMD device, or composition of a mixed-reality scene at the virtual image source.

8.The HMD device of claim 7 in which the composition of the mixed-reality scene comprises rendering virtual images in a single focal plane that is selected based on operation of the eye tracker to determine a gaze point of the user.

9.The HMD device of claim 8 further comprising a focal plane controller operatively coupled to the polarization modulator and configured to selectively switch the polarization state of the virtual images at a rate that is synchronized with a refresh rate of the virtual image source to generate virtual images at different focal distances in the mixed-reality scene supported by the optical combiner.

10.The HMD device of claim 9 in which the focal plane controller is further operatively coupled to the virtual image source and configured to selectively switch the polarization state of the virtual images based on a composition of a mixed-reality scene generated at the virtual image source.

11.The HMD device of claim 5 in which the focal-distance modulation system further comprises at least an additional polarization modulator and an additional birefringent lens wherein a total of N polarization modulator/birefringent lens pairs are utilized to provide 2^N different focal distances.

12.The HMD device of claim 5 in which the optical combiner comprises a waveguide that is at least partially transparent, the waveguide configured for guiding focused virtual images from the input coupler to the output coupler.

13.The HMD device of claim 12 in which one or more of the input coupler, output coupler, or waveguide include one or more reflective surfaces.

14.The HMD device of claim 5 in which the optical combiner is configured to provide an exit pupil that is expanded in one or more directions relative to an input pupil to the optical combiner.

15.The HMD device of claim 5 in which the polarization modulator comprises one of a ferroelectric liquid crystal (FLC) modulator, a photo-elastic modulator, an electro-optic modulator, a magneto-optic modulator, or a piezoelectric modulator.

16.A mixed-reality optical display system providing a plurality of different focal lengths for planes into which images of virtual-world objects are displayable, comprising: a source configured to generate light for virtual-world images, the virtual-world image light propagating on a light path from the source to an eye of a user of the mixed-reality display system; a ferroelectric liquid crystal (FLC) modulator disposed along the light path, and which is operatively coupled to the source to receive virtual-world image light, and which is switchable between first and second switched states; a linear polarizer disposed along the light path between the source and the FLC modulator and configured to impart a linearly polarized state to the virtual-world image light that is incident on the FLC modulator, wherein the switchable FLC modulator is configured as a half-wave plate that is aligned at zero degrees or 45 degrees with respect to a polarization axis of the linear polarizer depending on the switched state; a birefringent lens disposed along the light path downstream from the FLC modulator, the birefringent lens having an ordinary refractive index that is aligned with the polarization axis of the linear polarizer and an extraordinary refractive index that is orthogonal to the ordinary refractive index, wherein virtual-world image light incident on the birefringent lens having a state of polarization that is aligned with the ordinary refractive index is focused by the birefringent lens at a first focal length and virtual-world image light incident on the birefringent lens having a state of polarization that is aligned with the extraordinary refractive index is focused by the birefringent lens at a second focal length that is different from the first; a focal length controller operatively coupled to the FLC modulator to switch the FLC modulator between the first and the second states, wherein in the first switched state of the FLC modulator, virtual-world image light exiting the FLC modulator and incident on the birefringent lens has a state of polarization that is aligned with the ordinary refractive index of the birefringent lens, and wherein in the second switched state of the FLC modulator, virtual-world image light exiting the FLC modulator has a state of polarization that is aligned with the extraordinary refractive index of the birefringent lens; and a see-through optical combiner through which real-world objects are viewable by the user, the see-through optical combiner disposed on the light path downstream from the birefringent lens, and the see-through optical combiner being adapted to display the virtual-world object images which are superimposed over the views of real-world objects in first or second focal planes that are respectively associated with the first and second focal lengths.

17.The mixed-reality optical display system of claim 16 in which the see-through optical combiner comprises a waveguide.

18.The mixed-reality optical display system of claim 17 in which the waveguide comprises a reflective input coupler or a reflective output coupler.

19.The mixed-reality optical display system of claim 16 in which the optical combiner is adapted to selectively display virtual-world object images in either or both the first and second planes according to operations of the focal length controller.

20.The mixed-reality optical display system of claim 16 as configured for use in a head-mounted display (HMD) device.

Description

BACKGROUND

Mixed-reality computing devices, such as head-mounted display (HMD) devices may be configured to display information to a user about virtual objects, such as holographic images, and/or real objects in a field of view of the user. For example, an HMD device may be configured to display, using a see-through display system, virtual environments with real-world objects mixed in, or real-world environments with virtual objects mixed in.

To view objects clearly, humans must accommodate, or adjust their eyes’ focus, to the distance of the object. At the same time, the rotation of both eyes must converge to the object’s distance to avoid seeing double images. In natural viewing, vergence and accommodation are linked. When something near is viewed, for example, a housefly close to the nose, the eyes cross and accommodate to a near point. Conversely, when something is viewed at optical infinity (roughly starting at 6 m or farther for normal vision), the eyes’ lines of sight become parallel, and the eyes’ lenses accommodate to infinity. In most HMD devices, users will always accommodate to the focal distance of the display to get a sharp image but converge to the distance of the object of interest to get a single image. When users accommodate and converge to different distances, the natural link between the two cues is broken, leading to visual discomfort or fatigue.

SUMMARY

An optical combiner in a display system of a mixed-reality HMD device comprises a lens of birefringent material and a ferroelectric liquid crystal (FLC) modulator that are adapted for use with a reflective waveguide to provide multiple different focal planes on which holograms of virtual-world objects (i.e., virtual images) are displayed. The FLC modulator controls the polarization state of light from a virtual image source that is incident on the birefringent lens. The FLC modulator is configured to function as a half-wave plate having an optical axis that can be rotated through approximately 45 degrees; therefore, the polarization of the optical output from the modulator can be rotated by either zero degrees or ninety degrees.

The birefringent lens has two orthogonal refractive indices, ordinary and extraordinary, depending on the polarization state of the incident light. If the polarization axis is rotated by the FLC modulator to match the ordinary axis, then the incoming light to the birefringent lens is focused at a distance corresponding to the ordinary refractive index. If the axis is rotated to match the extraordinary axis, then the incoming light is focused at a different distance corresponding to the extraordinary refractive index.

Virtual image light leaving the birefringent lens is in-coupled to the reflective waveguide which is configured to form an exit pupil for the optical combiner to enable an HMD device user to view the virtual images from the source. The reflective waveguide is at least partially transparent so that the user can see through the waveguide to view physical real-world objects simultaneously with the virtual images in mixed-reality use scenarios.

The FLC modulator may be operatively synchronized to the virtual image source to dynamically switch polarization states, and the corresponding states of focus for virtual images, to support a given composition of a mixed-reality scene. In such compositions, images of virtual-world objects can appear to the user in focal planes at different distances along with real-world objects. The time response of the FLC modulator enables rapid state switching to construct a temporally multiplexed mixed-reality scene having appropriate focus cues to provide a comfortable visual experience no matter where in the scene the HMD user is accommodating.

When far virtual images in the mixed-reality scene are displayed, the FLC modulator is switched to cause the birefringent lens to focus the virtual images at the far focal plane so that the user’s eyes accommodate far to view the virtual images in sharp focus. When near virtual images are displayed, the FLC modulator is switched to cause the birefringent lens to focus the virtual images at the near focal plane so that the user’s eyes accommodate near to view the virtual images in sharp focus.

Advantageously, utilization of the FLC modulator, birefringent lens, and reflective waveguide enables the focal depth of the virtual images to be adjusted before entering the waveguide without perturbing the HMD device user’s view of the real world through the waveguide. Such combination of elements in the optical combiner can eliminate the need to use a conventional conjugate lens pair in which a negative lens is disposed on an eye side of a waveguide to provide for virtual image focus at a non-infinite distance and a conjugate positive lens is disposed on the opposite real-world side to counteract the effect of the negative lens on incoming real-world light.

The FLC modulator and birefringent lens operate with faster switching compared to conventional variable-focus lenses to enable higher display refresh rates for a more immersive mixed-reality experience. In addition to providing fast-switching speeds, the FLC modulator and birefringent lens typically have solid state properties with no mechanical motion and associated noise or vibration. Utilization of the present principles enables the focus-adjusting components of the optical combiner to be moved away from the front of the HMD device user’s eyes, which can provide flexibility in device packaging while reducing weight and mass moment of inertia which are typically important considerations for HMD device comfort.

In various illustrative embodiments, multiple sets of FLC modulators and birefringent lenses can be utilized. If N sets are utilized, then 2^N different focal planes are provided. An eye tracker may also be implemented in the HMD device to enable the location of the user’s eyes relative to the device to remain suitably calibrated in the event the device shifts on the head during use.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.

DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a pictorial partially cutaway view of an illustrative HMD device that is configured to implement multiple virtual image focal planes using the present combined birefringent material and reflective waveguide;

FIG. 2 illustratively shows virtual images that are overlayed onto real-world images within a field of view (FOV) of a mixed-reality head-mounted display (HMD) device;

FIGS. 3A, 3B, and 3C show illustrative partially spherical wavefronts that are respectively associated with a distant object, an object at infinity, and a nearby object;

FIG. 4 shows an illustrative negative lens that provides for a virtual image that is located at a focal point of the lens;

FIG. 5 shows a side view of an illustrative virtual display system that includes a waveguide-based optical combiner providing for rendering of virtual images that may be used in an HMD device;

FIG. 6 shows a side view of an illustrative virtual display system in which light from real-world objects may be viewed through a see-through waveguide;

FIG. 7 shows a side view of an illustrative virtual display system in which a ferroelectric liquid crystal (FLC) modulator and a lens comprising birefringent material are selectively controlled to enable rendering of virtual images at two different focal planes;

FIG. 8 shows propagation of linearly polarized light through the FLC modulator and birefringent lens to focus light at different focal planes;

FIG. 9 shows an illustrative mixed-reality scene in which the user’s eyes accommodate to a far distance to view a virtual-world object in sharp focus;

FIG. 10 shows an illustrative mixed-reality scene in which the user’s eyes accommodate to a near distance to view a virtual-world object in sharp focus;

FIG. 11 shows an arrangement in which N sets of an FLC modulator and a birefringent lens are utilized to provide 2^N different virtual image focal planes;

FIG. 12 shows a side view of an illustrative virtual display system in operative relationship with HMD device components including an eye tracker system, focal plane controller, and processors;

FIGS. 13 and 14 show an HMD user in a physical environment interacting with illustrative virtual objects;

FIG. 15 depicts an illustrative arrangement in which some virtual objects are relocated to a focal plane containing a virtual object that an HMD device user is currently viewing;

FIG. 16 shows an illustrative mixed-reality scene in which virtual objects are rendered in the same focal plane;

FIG. 17 is a flowchart of an illustrative method for operating an electronic device that includes a mixed-reality see-through optical display system for showing scenes comprising virtual images at multiple different focal planes that are superimposed over views of real-world objects;

FIG. 18 shows a pictorial front view of an illustrative sealed visor that may be used as a component of an HMD device;

FIG. 19 shows a pictorial rear view of an illustrative sealed visor;

FIG. 20 shows a partially disassembled view of an illustrative sealed visor;

FIGS. 21A, 21B, and 21C are front, top, and side views, respectively, of an exemplary reflective waveguide that can be used to replicate a virtual image associated with an input pupil to an expanded exit pupil;

FIG. 22 is a pictorial view of an illustrative example of a virtual-reality or mixed-reality HMD device that may use the present combined birefringent material and reflective waveguide;

FIG. 23 shows a block diagram of an illustrative example of a virtual-reality or mixed-reality HMD device that may use the present combined birefringent material and reflective waveguide; and

FIG. 24 schematically shows an illustrative example of a computing system that can enact one or more of the methods and processes described herein with respect to the present combined birefringent material and reflective waveguide.

Like reference numerals indicate like elements in the drawings. Elements are not drawn to scale unless otherwise indicated.

DETAILED DESCRIPTION

FIG. 1 shows a pictorial partially cutaway view of an illustrative mixed-reality HMD device 100 that is configured to implement multiple virtual image focal planes using the present combined birefringent material and reflective waveguide. In this example, the HMD device includes a display device 105 and a frame 110 that wraps around the head of a user 115 to position the display device near the user’s eyes to provide a mixed-reality experience to the user.

Any suitable technology and configuration may be used to display virtual images, which may also be referred to as holograms or holographic images, using the display device 105. For a mixed-reality experience, the display device may be see-through so that the user of the HMD device 100 can view physical, real-world objects in the physical environment over which pixels for virtual objects are overlayed. For example, the display device may include one or more partially transparent waveguides used in conjunction with a virtual image source such as, for example, a microdisplay comprising RGB (red, green, blue) LEDs (light emitting diodes), an organic LED (OLED) array, a liquid crystal on silicon (LCoS) device, and/or a MEMS device, or any other suitable display or microdisplay operating in transmission, reflection, or emission. The virtual image source may also include electronics such as processors, optical components such as mirrors and/or lenses, and/or mechanical and other components that enable a virtual display to be composed and provide one or more input optical beams to the display system. Virtual image sources may be referred to as light or display engines in some contexts.

In some implementations, outward facing cameras 120 that capture images of the surrounding physical environment may be provided, and these captured images may be rendered on the display device 105 along with computer-generated virtual images that augment the captured images of the physical environment.

The frame 110 may further support additional components of the HMD device 100, including a processor 125, an inertial measurement unit (IMU) 130, and an eye tracker 135. In some implementations the eye tracker can be configured to support one or more of vergence tracking and/or gaze tracking functions. The processor may include logic and associated computer memory configured to receive sensory signals from the IMU and other sensors, to provide display signals to the display device 105, to derive information from collected data, and to enact various control processes described herein.

The display device 105 may be arranged in some implementations as a near-eye display. In a near-eye display, the virtual image source does not actually shine the images on a surface such as a glass lens to create the display for the user. This is not feasible because the human eye cannot focus on something that is that close. Rather than create a visible image on a surface, the near-eye display uses an optical system to form a pupil and the user’s eye acts as the last element in the optical chain and converts the light from the pupil into an image on the eye’s retina as a virtual display. It may be appreciated that the exit pupil is a virtual aperture in an optical system. Only rays which pass through this virtual aperture can exit the system. Thus, the exit pupil describes a minimum diameter of the holographic virtual image light after leaving the display system. The exit pupil defines the eyebox which comprises a spatial range of eye positions of the user in which the holographic virtual images projected by the display device are visible.

FIG. 2 shows the HMD device 100 worn by a user 115 as configured for mixed-reality experiences in which the display device 105 is configured as a near-eye display system having at least a partially transparent, see-through waveguide, among various other components, and may be further adapted to utilize variable-focus lenses in accordance with the principles discussed herein. As noted above, a virtual image source (not shown) generates holographic virtual images that are guided by the waveguide in the display device to the user. Being see-through, the waveguide in the display device enables the user to perceive light from the real world to thereby have an unaltered view of real-world objects.

The see-through waveguide-based display device 105 can render holographic images of various virtual objects that are superimposed over the real-world images that are collectively viewed using the see-through waveguide display to thereby create a mixed-reality environment 200 within the HMD device’s FOV (field of view) 220. It is noted that the FOV of the real world and the FOV of the holographic images in the virtual world are not necessarily identical, as the virtual FOV provided by the display device is typically a subset of the real FOV. FOV is typically described as an angular parameter in horizontal, vertical, or diagonal dimensions. It may be understood that the terms such as “left,” “right,” “up,” “down,” “direction,” “horizontal,” and “vertical” are used primarily to establish relative orientations in the illustrative examples shown and described herein for ease of description. These terms may be intuitive for a usage scenario in which the user of the HMD device is upright and forward facing, but less intuitive for other usage scenarios. The listed terms are not to be construed to limit the scope of the configurations (and usage scenarios therein) of features utilized in the present arrangement.

It is noted that FOV is just one of many parameters that are typically considered and balanced by HMD device designers to meet the requirements of a particular implementation. For example, such parameters may include eyebox size, brightness, transparency and duty time, contrast, resolution, color fidelity, depth perception, size, weight, form factor, and user comfort (i.e., wearable, visual, and social), among others.

In the illustrative example shown in FIG. 2, the user 115 is physically walking in a real-world urban area that includes city streets with various buildings, stores, etc., with a countryside in the distance. The FOV of the cityscape viewed on HMD device 100 changes as the user moves through the real-world environment and the device can render static and/or dynamic virtual images over the real-world view. In this illustrative example, the holographic virtual images include a tag 225 that identifies a restaurant business and directions 230 to a place of interest in the city. The mixed-reality environment 200 seen visually on the waveguide-based display device may also be supplemented by audio and/or tactile/haptic sensations produced by the HMD device in some implementations.

Virtual images and digital content can be located in various positions within the FOV along all three axes of the coordinate system 235. The immersiveness of the content in three dimensions may be enhanced as the reach of the display along the “z” axis extends from the near-field focus plane (i.e., generally within arm’s length of the HMD device user) to the far-field focus plane (i.e., generally beyond arm’s reach) to facilitate arm’s length virtual display interactions. Many mixed-reality HMD device experiences will employ a mix of near-field and far-field visual components. The boundary between near and far fields is not necessarily strictly defined and can vary by implementation. For example, distances beyond 2 m may be considered part of the far field in some mixed-reality HMD device scenarios.

During natural viewing, the human visual system relies on multiple sources of information, or “cues,” to interpret three-dimensional shapes and the relative positions of objects. Some cues rely only on a single eye (monocular cues), including linear perspective, familiar size, occlusion, depth-of-field blur, and accommodation. Other cues rely on both eyes (binocular cues), and include vergence (essentially the relative rotations of the eyes required to look at an object) and binocular disparity (the pattern of differences between the projections of the scene on the backs of the two eyes).

To view objects clearly, humans must accommodate, or adjust their eyes’ focus, to the distance of the object. At the same time, the rotation of both eyes must converge to the object’s distance to avoid seeing double images. The distance at which the lines of sight intersect is the vergence distance. The viewer also adjusts the focal power of the lens in each eye (i.e., accommodates) appropriately for the fixated part of the scene (i.e., where the eyes are looking). The distance to which the eye must be focused to create a sharp retinal image is the focal distance. In natural viewing, vergence and accommodation are linked. When viewing something near (e.g., a housefly close to the nose), the eyes cross and accommodate to a near point. Conversely, when viewing something at optical infinity, the eyes’ lines of sight become parallel, and the eyes’ lenses accommodate to infinity.

In typical HMD devices, users will always accommodate to the focal distance of the display (to get a sharp image) but converge to the distance of the object of interest (to get a single image). When users accommodate and converge to different distances, the natural link between the two cues must be broken and this can lead to visual discomfort or fatigue due to such vergence-accommodation conflict (VAC). Accordingly, to maximize the quality of the user experience and comfort with the HMD device 100, virtual images may be rendered in a plane to appear at a constant distance from the user’s eyes. For example, virtual images, including the images 225 and 230, can be set at a fixed depth (e.g., 2 m) from the user 115. Thus, the user will always accommodate near 2 m to maintain a clear image in the HMD device. It may be appreciated that 2 m is an illustrative distance and is intended to be non-limiting. Other distances may be utilized, and virtual images may typically be optimally placed at distances between 1.5 and 5 m from the HMD device user for many applications of a mixed-reality HMD device while ensuring user comfort; however, in some applications and use cases, virtual images can be rendered closer to the user.

In the real world as shown in FIG. 3A, light rays 305 from distant objects 310 reaching an eye of a user 115 are almost parallel. Real-world objects at optical infinity (roughly around 6 m and farther for normal vision) have light rays 320 that are exactly parallel when reaching the eye, as shown in FIG. 3B. Light rays 325 from a nearby real-world object 330 reach the eye with different, more divergent angles, as shown in FIG. 3C, compared to those for more distant objects.

Various approaches may be utilized to render virtual images with suitable divergent angles to thereby appear at the targeted depth of focus. For example, FIG. 4 shows that a negative (i.e., concave) lens 405 can diverge the collimated/parallel rays 450 that are received from a conventional output coupler element (not shown) in an HMD device to produce a holographic virtual image having a location that is apparent to the user at a focal point, F (as indicated by reference numeral 415), that is determined by the focal length of the lens. For example, in various mixed-reality HMD device scenarios, focal lengths can range from −0.2 to −3.0 diopters (i.e., 5 m to 33 cm) to position virtual objects from the boundary of the far field (near infinity) to slightly more than one foot away. As shown, the rays from the negative lens arriving at the eye of the user 115 are non-parallel and divergent and are converged by the eye’s internal lens to form the image on the retina, as indicated by reference numeral 420.
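
To make the diopter figures above concrete, the short sketch below converts lens powers to apparent virtual image distances using the standard relation distance = 1/|power|. It is an illustrative aid rather than part of the patent; the −0.2 and −3.0 diopter endpoints come from the text, and the −0.5 diopter value is an assumed midpoint.

```python
def power_to_distance_m(power_diopters: float) -> float:
    """Convert a lens power in diopters to the apparent image distance in meters."""
    return 1.0 / abs(power_diopters)

# Endpoints quoted in the text, plus an assumed -0.5 D example in between.
for power in (-0.2, -0.5, -3.0):
    print(f"{power:+.1f} D -> virtual image at ~{power_to_distance_m(power):.2f} m")
# -0.2 D -> ~5.00 m, -0.5 D -> ~2.00 m, -3.0 D -> ~0.33 m
```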

FIG. 5 shows a simplified side view of an illustrative mixed-reality display system 500 that is incorporated into the display device 105 (FIG. 1) and which may be used in the HMD device 100 to render virtual images. It is noted that the side view of FIG. 5 shows display components for a single eye of the user 115. However, it may be appreciated that the components can be extended such that separate displays are provided for each eye of the user in binocular implementations. Such arrangement may facilitate, for example, stereoscopic rendering of virtual images in the FOV of the HMD device 100 and enable other features to be realized on a per-eye basis.

The mixed-reality display system 500 includes at least one partially transparent (i.e., see-through) waveguide 510 that is configured to propagate visible light. The waveguide 510 facilitates light transmission between a virtual image source 520 and the eye of the user 115. One or more waveguides can be utilized in the near-eye display system because they are transparent and because they are generally small and lightweight. This is desirable in applications such as HMD devices where size and weight are generally sought to be minimized for reasons of performance and user comfort. Use of the waveguide can enable the virtual image source to be located out of the way, for example on the side of the user’s head or near the forehead, leaving only a relatively small, light, and transparent waveguide optical element in front of the eyes.

In an illustrative implementation, the waveguide 510 operates using a principle of total internal reflection (TIR) so that light can be coupled among the various optical elements in the HMD device 100. TIR is a phenomenon which occurs when a propagating light wave strikes a medium boundary (e.g., as provided by the optical substrate of a waveguide) at an angle larger than the critical angle with respect to the normal to the surface. In other words, the critical angle (θc) is the angle of incidence above which TIR occurs, which is given by Snell’s Law, as is known in the art. More specifically, Snell’s law states that the critical angle (θc) is specified using the following equation:

θc = sin⁻¹(n2 / n1)

where θc is the critical angle for two optical mediums (e.g., the waveguide substrate and air or some other medium that is adjacent to the substrate) that meet at a medium boundary, n1 is the index of refraction of the optical medium in which light is traveling towards the medium boundary (e.g., the waveguide substrate, once the light is coupled therein), and n2 is the index of refraction of the optical medium beyond the medium boundary (e.g., air or some other medium adjacent to the waveguide substrate).
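
As a worked illustration of the TIR condition above, the following sketch evaluates θc = sin⁻¹(n2/n1) for a few assumed waveguide substrate indices against air. The index values are examples only and are not taken from the patent.

```python
import math

def critical_angle_deg(n_substrate: float, n_outside: float = 1.0) -> float:
    """Critical angle (degrees from the surface normal) above which TIR occurs,
    per Snell's law: theta_c = arcsin(n2 / n1)."""
    if n_outside >= n_substrate:
        raise ValueError("TIR requires the outside medium to be optically less dense")
    return math.degrees(math.asin(n_outside / n_substrate))

# Assumed substrate refractive indices, for illustration only.
for n1 in (1.5, 1.7, 1.9):
    print(f"n1 = {n1:.1f} against air: theta_c ~= {critical_angle_deg(n1):.1f} deg")
# ~41.8, ~36.0, ~31.8 degrees; rays striking the boundary at angles larger than
# theta_c (measured from the normal) remain trapped in the waveguide by TIR.
```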

Virtual image light 515 may be provided by a virtual image source 520 (e.g., a microdisplay or light engine, etc.). A collimating lens 522 may be optionally utilized depending on a particular type and configuration of the virtual image source so that the inputs to the waveguide comprise collimated light rays. The virtual image light is in-coupled to the waveguide by an input coupler 525 over an input pupil 516 and propagated through the waveguide in TIR. The virtual image light is out-coupled from the waveguide by an output coupler 530 over the eyebox 535 of the display system.

The exit pupil for the out-coupled image light 540 provided by the eyebox is typically expanded in size relative to the input pupil, in both vertical and horizontal directions. Typically, in waveguide-based optical combiners, the input pupil needs to be formed over a collimated field, otherwise each waveguide exit pupil will produce an image at a slightly different distance. This results in a mixed visual experience in which images are overlapping with different focal depths in an optical phenomenon known as focus spread. The collimated inputs and outputs in conventional waveguide-based display systems provide holographic virtual images displayed by the display device that are focused at infinity.

The combination of see-through waveguide and coupling elements may be referred to as a mixed-reality optical combiner 545 which functions to combine real-world and virtual-world images into a single display. While the input coupler and output coupler are shown in FIG. 5 as being embodied as discrete elements, it may be possible in some applications to directly incorporate the in-coupling and out-coupling functions either partially or fully into the waveguide and/or components thereof.

The optical combiner functionality provided by the waveguide and couplers may be implemented using a reflective waveguide combiner. For example, partially reflective surfaces may be embedded in a waveguide and/or stacked in a geometric array to implement an optical combiner that uses partial field propagation. The reflectors can be half-tone, dielectric, holographic, polarized thin layer, or be fractured into a Fresnel element. In other embodiments, the principles of the present combined birefringent material and reflective waveguide may be implemented using a reflective waveguide combiner with any suitable in-coupling and/or out-coupling methods.

A plurality of waveguides may be utilized in some applications. As shown in FIG. 5, the combiner 545 includes a single waveguide that is utilized for all colors in the virtual images, which may be desirable in some applications. For example, if the virtual image source 520 is configured using an RGB (red, green, blue) color model, then the waveguide 510 can be adapted to propagate light in each color component, as respectively indicated by reference numerals 524, 526, and 528. By comparison, diffractive combiners typically require multiple waveguides to meet a target FOV in polychromatic applications due to limitations on angular range that are dictated by the waveguide TIR condition.

The present combined birefringent material and reflective waveguide may also be utilized with various other waveguide/coupling configurations beyond reflective. For example, it may be appreciated that the principles of the present invention may be alternatively applied to waveguides that include one or more elements that are refractive, diffractive, polarized, hybrid diffractive/refractive, phase multiplexed holographic, and/or achromatic metasurfaces.

As shown in FIG. 6, the user 115 can look through the waveguide 510 of the mixed-reality display system 500 to see unaltered views of real-world objects 605 on the real-world side of the waveguide that is opposite from the eye side (the eye side is indicated by reference numeral 612 and the real-world side is indicated by reference numeral 614). The optical combiner 545 may superimpose virtual images (not shown for the sake of clarity in exposition) over the user’s view of light 610 reflected from real-world objects to thus form a mixed-reality display. In this particular example, the real-world object is in the distance so the parallel rays of real-world light incident on the display system remain parallel when viewed by the user 115.

FIG. 7 shows a side view of an illustrative mixed-reality display system 700 in which a ferroelectric liquid crystal (FLC) modulator 705 and a lens 710 comprising birefringent material are selectively controlled using a focal plane controller 715 to enable rendering of virtual images using diverging rays 740. The virtual images can appear to the user 115 at focal distances Fe and Fo that define two different focal planes at distances d1 and d2 from the see-through waveguide 745 in an optical combiner 750. The waveguide and/or optical combiner may comprise reflective elements in a similar manner as discussed above in the text accompanying FIG. 5. It is noted that the propagation of light through the see-through waveguide is not shown for the sake of clarity. It may also be appreciated that the user 115 can look through the see-through waveguide to observe real-world objects, which are also not shown in the drawing.

A virtual image source 720 provides virtual image light 725 which may comprise multiple components (not shown) of a color model such as an RGB color model. As with the mixed-reality display system 500 shown in FIG. 5, a collimating lens 730 may be optionally utilized depending on characteristics of the source. A linearly polarizing filter 735 is disposed on the propagation path of the virtual image light between the collimating lens and FLC modulator, as shown. The focal plane controller 715 is operatively coupled to the virtual image source 720 and FLC modulator 705.

FIG. 8 shows propagation of linearly polarized light through the FLC modulator 705 and birefringent lens 710 to focus light at different focal planes. A property of the lens is that it has two focal lengths, Fo and Fe, that correspond to ordinary and extraordinary refractive indices, no and ne, of the birefringent material, as indicated by reference numerals 805 and 810. The birefringent lens may comprise any suitable birefringent material that is transparent and formable. The birefringent lens is not shown with any particular shape in FIG. 8, but it may be appreciated that one or more of its major surfaces can be shaped to provide additional control over the focal lengths to meet particular requirements.

The unpolarized light from the virtual image source 720 passes through the linearly polarizing filter 735 and is incident on the FLC modulator 705. The linearly polarizing filter is aligned with either the ordinary or extraordinary axis of the birefringent lens. The FLC modulator is configured to function as a switchable half-wave plate having a binary state. The FLC modulator has a fast axis 815 and slow axis 820. The fast axis presents the minimum index of refraction, so a linearly polarized wave aligned with it propagates with the maximum phase velocity. When the wave is rotated by 90° and polarized along the slow axis, it propagates with the maximum index of refraction and minimum phase velocity. The FLC modulator is oriented at either zero or 45° to the axis of the linearly polarizing filter depending on its switched state. In alternative implementations, rather than using an FLC modulator, polarization modulation may be performed by an appropriately configured photo-elastic modulator (PEM), a linear electro-optic modulator employing, for example, the Pockels effect, a quadratic electro-optic modulator employing, for example, the Kerr effect, a magneto-optical modulator employing, for example, the Faraday effect, a piezoelectric material, or other suitable device or technology.

If the FLC modulator 705 is in a first state, then virtual image light remains polarized at 0° when it propagates to the birefringent lens 710 and is thus aligned with the ordinary axis and is focused at focal length Fo. If the FLC modulator is in a second state, then the plane of polarization of emergent light is rotated by 90° and is aligned with the extraordinary axis of the birefringent lens and thus focused at focal length Fe. Accordingly, by switching between FLC modulator states, one of two different refractive indices of the birefringent lens may be selected, which thereby selects one of two different focal powers for the lens.
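
The binary focus selection described above can be summarized with a small Jones-calculus sketch: a half-wave plate at 0° leaves the input polarization on the ordinary axis (focus at Fo), while at 45° it rotates the polarization by 90° onto the extraordinary axis (focus at Fe). The matrix form and the two example focal lengths below are illustrative assumptions, not values from the patent.

```python
import numpy as np

def half_wave_plate(theta_rad: float) -> np.ndarray:
    """Jones matrix of an ideal half-wave plate with its fast axis at angle theta."""
    c, s = np.cos(2 * theta_rad), np.sin(2 * theta_rad)
    return np.array([[c, s], [s, -c]])

# Linearly polarized input aligned with the ordinary axis of the birefringent lens.
E_in = np.array([1.0, 0.0])

# Assumed focal lengths in meters for the ordinary (near) and extraordinary (far) indices.
F_o, F_e = 1.0, 2.0

for label, theta in (("first state (0 deg)", 0.0), ("second state (45 deg)", np.pi / 4)):
    E_out = half_wave_plate(theta) @ E_in
    # Output still along the ordinary axis -> focus at F_o; rotated by 90 deg -> F_e.
    focal = F_o if abs(E_out[0]) > abs(E_out[1]) else F_e
    print(f"FLC {label}: output polarization {E_out}, lens focuses at {focal} m")
```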

Referring again to FIG. 7, the presentation of virtual images in a mixed-reality scene by the virtual image source 720 may be synchronized to operations of the FLC modulator 705 using the focal plane controller 715. The synchronization enables construction of temporally multiplexed scenes with correct focus cues so that focal distances in the scene are presented with the birefringent lens 710 in the correct state. Accordingly, when more distant parts of the mixed-reality scene are composed at the virtual image source 720, the focal plane controller signals the FLC modulator to switch the birefringent lens to its longer focal length so that the user’s eyes have to accommodate far to create sharp retinal images. When nearer parts of the mixed-reality scene are composed, the focal plane controller signals the FLC modulator to switch the birefringent lens to its shorter focal length so that the user’s eyes must accommodate to closer distances to create sharp images. It may be appreciated that each focal state in a given composition of a mixed-reality scene will be displayed in every other frame of the virtual image source.
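
A minimal controller sketch of that frame-by-frame synchronization is shown below. The layer names and the simple two-state alternation are assumptions used only to illustrate how each focal state occupies every other frame; a real focal plane controller would also handle timing against the source refresh rate.

```python
import itertools

# Hypothetical scene layers, one per focal state (names are illustrative only).
LAYERS = (("ordinary axis (focus at Fo, near plane)", "near-scene layer"),
          ("extraordinary axis (focus at Fe, far plane)", "far-scene layer"))

def temporally_multiplexed_frames(num_frames: int):
    """Alternate the FLC state and the matching scene layer on every frame."""
    for frame, (flc_state, layer) in zip(range(num_frames), itertools.cycle(LAYERS)):
        yield frame, flc_state, layer

for frame, flc_state, layer in temporally_multiplexed_frames(4):
    print(f"frame {frame}: switch FLC to {flc_state}; display {layer}")
```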

FIG. 9 shows an FOV 905 of an illustrative mixed-reality scene 900 in which the virtual image source displays more distant parts of the scene. In response, the FLC modulator switches the birefringent lens to the Fe focal length. Thus, the mixed-reality display system creates a digital approximation to the light field that the user’s eyes normally encounter when naturally viewing a three-dimensional scene. It is not necessary to know where the eyes of the user 115 are focused to create appropriate focus cues.

If the user accommodates to a far distance at d2 to view a virtual-world object 910, then the far parts of the displayed scene are in sharp focus while the near parts, including virtual-world object 915, are blurred. FIG. 10 shows an FOV 1005 of an illustrative mixed-reality scene 1000 in which the user accommodates to a near distance at d1 to view the virtual-world object 915 in sharp focus while the far parts, including virtual-world object 910, are blurred. The mixed-reality display system thus reproduces correct focus cues, including blur and binocular disparity, to thereby stimulate natural accommodation to converge to an appropriate focal distance to create sharp retinal images.

The values for the distances d1 and d2 may be selected based on application. In typical HMD device applications, virtual images may be sought to be displayed within 2 m (−0.5 diopters) to minimize VAC. As the depth of field for human vision is approximately +/−0.3 diopters, a half-diopter distance between focal planes may be utilized with sufficient focus cues to enable the user to smoothly shift focus between the focal planes. Thus, for example, the near distance may be around 1 m, and the far distance around 2 m. These values are illustrative and are not intended to be limiting.
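
The plane-spacing argument can be checked numerically: planes at 1 m and 2 m are 0.5 diopters apart, which is within twice the ±0.3 diopter depth of field, so their in-focus ranges overlap. The sketch below only restates that arithmetic using the values given in the text.

```python
def to_diopters(distance_m: float) -> float:
    """Optical distance in diopters (reciprocal meters)."""
    return 1.0 / distance_m

DEPTH_OF_FIELD_D = 0.3              # approximate human depth of field, +/- diopters
near_plane_m, far_plane_m = 1.0, 2.0

spacing_d = to_diopters(near_plane_m) - to_diopters(far_plane_m)
print(f"focal plane spacing: {spacing_d:.2f} D")
# In-focus ranges overlap when the spacing is no more than twice the depth of field.
print("continuous focus cues between planes:", spacing_d <= 2 * DEPTH_OF_FIELD_D)
```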

FIG. 11 shows an illustrative mixed-reality display system 1100 in which N sets of an FLC modulator and a birefringent lens are utilized to provide 2^N different virtual image focal planes. In this example N=2, so a first set 1105 comprising an FLC modulator 1110 and birefringent lens 1115 is placed in series with a second set 1120. As shown, the serial arrangement of modulator and lens sets is disposed along the virtual image light path that extends from the virtual image source 720 through the collimating lens 730 and linearly polarizing filter 735 to the see-through waveguide 745 in the optical combiner 750.

The two sets 1105 and 1120 of FLC modulators and birefringent lenses work in combination to provide four different focal lengths F1, F2, F3, and F4 at respective distances d1, d2, d3, and d4 from the waveguide 745. The spatial separation between the focal planes defined by the focal lengths can vary by application. For example, F1 and F4 could be separated by 1.5 diopters in which d1, d2, d3, and d4 are 50 cm, 1 m, 1.5 m, and 2 m, respectively.
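
The scaling from N modulator/lens pairs to 2^N focal states, and the 1.5 diopter span quoted for the four example distances, can be restated in a few lines. This is a quick check of the numbers in the text, not an implementation of the display system.

```python
def focal_states(n_pairs: int) -> int:
    """Each FLC modulator / birefringent lens pair is a binary focus switch,
    so N pairs placed in series select among 2**N focal states."""
    return 2 ** n_pairs

distances_m = [0.5, 1.0, 1.5, 2.0]           # d1..d4 from the N = 2 example
assert focal_states(2) == len(distances_m)

powers_d = [1.0 / d for d in distances_m]    # 2.0, 1.0, 0.67, 0.5 diopters
print("span between F1 and F4:", f"{powers_d[0] - powers_d[-1]:.1f} D")   # 1.5 D
```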

FIG. 12 shows a side view of an illustrative virtual display system 1200 in operative relationship with HMD device components including an eye tracker 1205, focal plane controller 715, and processors 1210. The focal plane controller is operatively coupled to a virtual image source 720 and FLC modulator 705, as discussed above, to provide multiple different focal lengths using the birefringent lens 710 to render virtual images at different distances on the optical combiner 750 in mixed-reality scenarios. The components may be disposed in a frame (not shown) or other suitable structure of the HMD device 100 or the exemplary HMD device 2200 shown in FIGS. 22 and 23 and described in the accompanying text.

The eye tracker 1205 is operatively coupled to one or more illumination sources 1215 and one or more sensors 1220. For example, the illumination sources may comprise IR (infrared) LEDs that are located around the periphery of the virtual display system and/or optical combiner and/or may be disposed in some other suitable HMD device component such as a frame. The eye tracker illumination sources can function as glint sources and/or provide general or structured illumination of the user’s eye features. The eye tracker sensors may comprise inward-facing cameras that have sensitivity, for example, to IR light. Image-based and/or feature-based eye tracking, or other suitable eye-tracking techniques may be utilized to meet requirements of an implementation of the present principles.

In an illustrative example, the IR light from the illumination sources 1215 causes highly visible reflections, and the eye tracker sensors 1220 capture an image of the eye showing these reflections. The images captured by the sensors are used to identify the reflections of the light sources on the cornea (i.e., “glints”) and in the pupil. Typically, a vector formed by the angle between the cornea and pupil reflections may be calculated using real-time image analysis, and the vector direction combined with other geometrical features of the reflections is then used to determine where the user is looking (the gaze point) and to calculate eye movement, location, and orientation.
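
A toy sketch of the glint-to-pupil vector computation is given below. The pixel coordinates, the implied camera model, and the mapping from image-plane vector to gaze direction are all assumptions (a real system would apply a per-user calibration), so this only illustrates the vector described above.

```python
import numpy as np

def glint_pupil_vector(pupil_center_px: np.ndarray, glint_center_px: np.ndarray) -> np.ndarray:
    """Vector from the corneal glint to the pupil center in image coordinates."""
    return pupil_center_px - glint_center_px

# Hypothetical feature locations (pixels) extracted from one IR camera frame.
pupil = np.array([412.0, 305.0])
glint = np.array([420.0, 312.0])

v = glint_pupil_vector(pupil, glint)
angle_deg = np.degrees(np.arctan2(v[1], v[0]))
print(f"glint-pupil vector: {v}, image-plane direction ~= {angle_deg:.1f} deg")
# A calibrated polynomial or geometric model would map this vector to a gaze point.
```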

Eye tracking may be utilized to initially calibrate a location of the user’s eyes with respect to the HMD device 100 and assist in maintaining the calibration state during device use. The eye tracker 1205 can dynamically track eye location relative to the HMD device, which may change if, for example, the device shifts on the user’s head. Proper continuous alignment of the user’s eye with the display system helps ensure that virtual images in the different focal planes are correctly rendered with the appropriate focus cues, including accurate binocular disparity and occlusion of real and virtual objects.

For example, FIG. 13 shows a user 115 interacting with various virtual objects 1305 and 1310 in a mixed-reality scene 1300 that occurs in a real-world office setting. It may be noted that the virtual objects shown in the drawing are ordinarily viewable only by HMD device users. FIG. 14 shows the mixed-reality scene from the perspective of the HMD device user within the FOV 1405 of the device. The panel virtual object 1305 is displayed in the near focal plane while cylindrical virtual object 1310 is displayed in the far focal plane. To maintain the natural appearance of depth, the virtual image source composes the mixed-reality scene, which is displayed at the different focal distances through operations of the FLC modulator and birefringent lens, to maintain the appropriate occlusion relationships between the objects in the scene. As shown in the mixed-reality scene 1400 in FIG. 14, the cylindrical virtual object 1310 is partially occluded by the panel virtual object 1305 in the user’s FOV 1405. The virtual objects partially occlude the walls and contents of the room which are located beyond the far focal plane.

FIG. 15 depicts a top view of an illustrative arrangement in which virtual objects may be located in a focal plane containing a virtual object that the HMD device user is currently viewing. It may be appreciated that rendering all virtual images in a single focal plane may reduce implementation complexity in some cases, for example, by lowering the refresh rate that components in the mixed-reality display system would otherwise need to support distribution of virtual content across multiple focal planes simultaneously. VAC may also be reduced using a single focal plane for all virtual content.

As shown, a gaze point 1505 of the user 115 is determined by the eye tracker 1205 (FIG. 12) which indicates that the user is currently looking at the panel virtual object 1305 at distance d1. The cylindrical virtual object 1310 may then be moved from its current location at distance d2 to the same focal plane as the panel at distance d1, as indicated by line 1510. As images for new virtual objects are introduced into the composition of the mixed-reality scene, they can be rendered in the focal plane corresponding to the current gaze point. For example, a new triangular virtual object 1515 is located in the focal plane at distance d1. It may be appreciated that the focal plane controller 715 (FIG. 12) may be configured to continuously interact with the eye tracker 1205 such that virtual objects in the mixed-reality scene may be located and/or moved to the appropriate focal plane in response to detected shifts in the user’s gaze point.
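
The gaze-driven relocation can be sketched as picking the focal plane nearest (in diopters) to the tracked gaze distance and assigning every virtual object to it. The plane distances, function names, and the nearest-in-diopters rule are illustrative assumptions rather than the patent's prescribed method.

```python
# Assumed focal plane distances in meters (near and far planes from the earlier example).
FOCAL_PLANES_M = [1.0, 2.0]

def plane_for_gaze(gaze_distance_m: float) -> float:
    """Choose the focal plane closest to the gaze distance, compared in diopters."""
    return min(FOCAL_PLANES_M, key=lambda d: abs(1.0 / d - 1.0 / gaze_distance_m))

def relocate_all(object_distances_m: list[float], gaze_distance_m: float) -> list[float]:
    """Render every virtual object in the single plane containing the gazed-at content."""
    target = plane_for_gaze(gaze_distance_m)
    return [target] * len(object_distances_m)

# Panel at 1 m, cylinder at 2 m, new triangle at 1.5 m; gaze resting near the panel.
print(relocate_all([1.0, 2.0, 1.5], gaze_distance_m=1.1))   # -> [1.0, 1.0, 1.0]
```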

FIG. 16 shows an illustrative mixed-reality scene 1600, as discussed above with reference to FIG. 15, from the point of view of the HMD device user 115. As shown, all of the virtual objects 1515, 1305, and 1310 are located in a single focal plane which is selected based on the current gaze point of the user that is determined by operations of the eye tracker 1205 (FIG. 12). This mixed-reality scene 1600 thus differs from scene 1400 shown in FIG. 14 in which the panel and cylindrical virtual object are located at different focal planes at distances d1 and d2, respectively.

FIG. 17 is a flowchart of an illustrative method 1700 for operating an electronic device (e.g., an HMD device) that includes a mixed-reality see-through display system configured for showing mixed-reality scenes comprising virtual images of virtual-world objects that are rendered over views of real-world objects to a user of the electronic device. Unless specifically stated, the methods or steps shown in the flowchart and described in the accompanying text are not constrained to a particular order or sequence. In addition, some of the methods or steps thereof can occur or be performed concurrently. Not all of the methods or steps have to be performed in a given implementation, depending on the requirements of such implementation, and some methods or steps may be optionally utilized.

At block 1705, light for the virtual images is received, in which the light is linearly polarized in a first polarization state. At block 1710, an FLC modulator is operated to switch between the first polarization state for the virtual image light and a second polarization state that is orthogonal to the first polarization state. At block 1715, a lens of birefringent material is provided upon which virtual image light is incident in either the first polarization state or second polarization state, in which the lens provides one of two different focal distances for the virtual images depending on polarization state of the incident virtual image light. At block 1720, the virtual image light from the lens is in-coupled into the mixed-reality see-through optical display system which renders the virtual images at the one of two different focal distances to the user.
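
Purely as an illustrative sketch of the temporal multiplexing implied by method 1700 (the frame and driver interfaces shown are hypothetical and are not part of the disclosed method), the per-frame control could be organized as follows in Python.

# Hedged sketch: the FLC modulator state is switched per subframe, in sync with
# the image source refresh, so that content tagged for the near or far plane is
# emitted while the birefringent lens presents the matching focal distance.
NEAR, FAR = "near", "far"
FOCAL_DISTANCE_M = {NEAR: 1.0, FAR: 2.0}  # example distances only

def subframe_schedule(scene_objects):
    """Group the scene into per-focal-plane subframes for one display frame."""
    planes = sorted({obj["plane"] for obj in scene_objects})
    return [(plane, [o for o in scene_objects if o["plane"] == plane])
            for plane in planes]

def drive_frame(scene_objects, set_flc_state, render_subframe):
    for plane, objs in subframe_schedule(scene_objects):
        set_flc_state(plane)          # selects first or second polarization state
        render_subframe(objs, FOCAL_DISTANCE_M[plane])

# Usage with stub callbacks standing in for the modulator and image source.
drive_frame([{"name": "panel", "plane": NEAR}, {"name": "cylinder", "plane": FAR}],
            set_flc_state=lambda plane: None,
            render_subframe=lambda objs, distance: None)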

FIGS. 18 and 19 show respective front and rear views of an illustrative example of a visor 1800 that incorporates an internal near-eye display device 105 (FIGS. 1 and 2) that is used in the HMD device 100 as worn by a user 115. The visor, in some implementations, may be sealed to protect the internal display device. The visor typically interfaces with other components of the HMD device such as head-mounting/retention systems and other subsystems including sensors, power management, controllers, etc., as illustratively described in conjunction with FIGS. 22 and 23. Suitable interface elements (not shown) including snaps, bosses, screws and other fasteners, etc. may also be incorporated into the visor.

The visor 1800 may include see-through front and rear shields, 1805 and 1810 respectively, that can be molded using transparent or partially transparent materials to facilitate unobstructed vision to the display device and the surrounding real-world environment. Treatments may be applied to the front and rear shields such as tinting, mirroring, anti-reflective, anti-fog, and other coatings, and various colors and finishes may also be utilized. The front and rear shields are affixed to a chassis 2005 shown in the disassembled view in FIG. 20.

The sealed visor 1800 can physically protect sensitive internal components, including a display device 105, when the HMD device is operated and during normal handling for cleaning and the like. The display device in this illustrative example includes left and right optical display systems 2010L and 2010R that respectively provide holographic virtual images to the user’s left and right eyes for mixed-reality scenes. The visor can also protect the display device from environmental elements and damage should the HMD device be dropped or bumped, impacted, etc.

As shown in FIG. 19, the rear shield 1810 is configured in an ergonomically suitable form 1905 to interface with the user’s nose, and nose pads and/or other comfort features can be included (e.g., molded-in and/or added-on as discrete components). In some applications, the sealed visor 1800 can also incorporate some level of optical diopter curvature (i.e., eye prescription) within the molded shields.

FIGS. 21A, 21B, and 21C are front, top, and side views, respectively, of an exemplary optical display system 2010 that can be used to replicate an image associated with an input pupil to an expanded exit pupil. The term input pupil refers to an aperture through which light corresponding to an image is overlaid on an input coupler 2105 that is disposed on a waveguide 2110. The term exit pupil refers to an aperture through which light corresponding to an image exits an output coupler 2115 that is disposed on the waveguide.

The waveguide 2110 can be made of glass or optical plastic but is not limited thereto. The opposite sides may be configured to be parallel. The waveguide may be planar, as illustratively shown, or in alternative embodiments, be curved. The waveguide may utilize a bulk substrate configuration in which the waveguide thickness is at least ten times the wavelengths of light for which the waveguide functions as a propagation medium. The waveguide is at least partially transparent to allow light to pass through it so that a user can look through the waveguide and observe an unaltered view of real-world objects on the other side.

An intermediate component 2120 may be disposed on the waveguide 2110 in some implementations. The intermediate component may be configured to redirect light in a direction of the output coupler 2115. Furthermore, the intermediate component may be configured to perform one of horizontal or vertical pupil expansion, and the output coupler may be configured to perform the other one of horizontal or vertical pupil expansion. For example, the intermediate component may perform pupil expansion in a horizontal direction, and the output coupler may perform pupil expansion in a vertical direction. Alternatively, if the intermediate component were repositioned, for example, to be below the input coupler and to the left of the output coupler 2115 shown in FIG. 21A, then the intermediate component can be configured to perform vertical pupil expansion, and the output coupler can be configured to perform horizontal pupil expansion.

The input coupler 2105, intermediate component 2120, and output coupler 2115 are shown as having rectangular outer peripheral shapes but can have alternative outer peripheral shapes. These elements can also be disposed on the same side of the waveguide, or on opposite sides. Embedded configurations may also be utilized in which one or more of the couplers or the component is immersed within the waveguide between its exterior surfaces. The input coupler, intermediate component, and output coupler may be configured using reflective optical elements each having one or more reflective or partially reflective surfaces. In alternative implementations, one or more diffractive optical elements may also be utilized to perform the input and output coupling and pupil expansion.

FIG. 22 shows one particular illustrative example of a mixed-reality HMD device 2200, and FIG. 23 shows a functional block diagram of the device 2200. The HMD device 2200 provides an alternative form factor to the HMD device 100 shown in the preceding drawings and discussed above. HMD device 2200 comprises one or more lenses 2202 that form a part of a see-through display subsystem 2204, so that images may be displayed using lenses 2202 (e.g., using projection onto lenses 2202, one or more waveguide systems, such as a near-eye display system, incorporated into the lenses 2202, and/or in any other suitable manner).

HMD device 2200 further comprises one or more outward-facing image sensors 2206 configured to acquire images of a background scene and/or physical environment being viewed by a user and may include one or more microphones 2208 configured to detect sounds, such as voice commands from a user. Outward-facing image sensors 2206 may include one or more depth sensors and/or one or more two-dimensional image sensors. In alternative arrangements, as noted above, a mixed-reality or virtual-reality display system, instead of incorporating a see-through display subsystem, may display mixed-reality or virtual-reality images through a viewfinder mode for an outward-facing image sensor.

The HMD device 2200 may further include a gaze detection subsystem 2210 configured for detecting a direction of gaze of each eye of a user or a direction or location of focus, as described above. Gaze detection subsystem 2210 may be configured to determine gaze directions of each of a user’s eyes in any suitable manner. For example, in the illustrative example shown, a gaze detection subsystem 2210 includes one or more glint sources 2212, such as IR or visible light sources as described above, that are configured to cause a glint of light to reflect from each eyeball of a user, and one or more image sensors 2214, such as inward-facing sensors, that are configured to capture an image of each eyeball of the user. Changes in the glints from the user’s eyeballs and/or a location of a user’s pupil, as determined from image data gathered using the image sensor(s) 2214, may be used to determine a direction of gaze.

In addition, a location at which gaze lines projected from the user’s eyes intersect the external display may be used to determine an object at which the user is gazing (e.g., a displayed virtual object and/or real background object). Gaze detection subsystem 2210 may have any suitable number and arrangement of light sources and image sensors. In some implementations, the gaze detection subsystem 2210 may be omitted.
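
By way of illustration only (the geometry, object representation, and names below are hypothetical), one simple way to implement this gaze-line test is to intersect the gaze ray with each object's focal plane and compare the hit point against a bounding region of the object, as in the following Python sketch.

# Hedged sketch of using a projected gaze line to pick the gazed-at object:
# the gaze ray is intersected with each object's plane and tested against a
# simple bounding radius. Geometry and object representation are illustrative.
import numpy as np

def ray_plane_intersection(origin, direction, plane_z):
    """Intersect a gaze ray with a plane at z = plane_z (device coordinates)."""
    direction = np.asarray(direction, float)
    t = (plane_z - origin[2]) / direction[2]
    return np.asarray(origin, float) + t * direction if t > 0 else None

def gazed_object(origin, direction, objects):
    """Return the nearest object whose bounding radius contains the hit point."""
    best, best_t = None, np.inf
    for obj in objects:
        hit = ray_plane_intersection(origin, direction, obj["center"][2])
        if hit is None:
            continue
        if np.linalg.norm(hit[:2] - np.asarray(obj["center"][:2])) <= obj["radius"]:
            t = np.linalg.norm(hit - np.asarray(origin, float))
            if t < best_t:
                best, best_t = obj, t
    return best

objects = [{"name": "panel", "center": (0.0, 0.0, 1.0), "radius": 0.3},
           {"name": "cylinder", "center": (0.4, 0.1, 2.0), "radius": 0.2}]
gazed_object(origin=(0.0, 0.0, 0.0), direction=(0.05, 0.0, 1.0), objects=objects)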

The HMD device 2200 may also include additional sensors. For example, HMD device 2200 may comprise a global positioning system (GPS) subsystem 2216 to allow a location of the HMD device 2200 to be determined. This may help to identify real-world objects, such as buildings, etc., that may be located in the user’s adjoining physical environment.

The HMD device 2200 may further include one or more motion sensors 2218 (e.g., inertial, multi-axis gyroscopic, or acceleration sensors) to detect movement and position/orientation/pose of a user’s head when the user is wearing the system as part of a mixed-reality or virtual-reality HMD device. Motion data may be used, potentially along with eye-tracking glint data and outward-facing image data, for gaze detection, as well as for image stabilization to help correct for blur in images from the outward-facing image sensor(s) 2206. The use of motion data may allow changes in gaze direction to be tracked even if image data from outward-facing image sensor(s) 2206 cannot be resolved.

In addition, motion sensors 2218, as well as microphone(s) 2208 and gaze detection subsystem 2210, also may be employed as user input devices, such that a user may interact with the HMD device 2200 via gestures of the eye, neck and/or head, as well as via verbal commands in some cases. It may be understood that sensors illustrated in FIGS. 22 and 23 and described in the accompanying text are included for the purpose of example and are not intended to be limiting in any manner, as any other suitable sensors and/or combination of sensors may be utilized to meet the needs of a particular implementation. For example, biometric sensors (e.g., for detecting heart and respiration rates, blood pressure, brain activity, body temperature, etc.) or environmental sensors (e.g., for detecting temperature, humidity, elevation, UV (ultraviolet) light levels, etc.) may be utilized in some implementations.

The HMD device 2200 can further include a controller 2220 such as one or more processors having a logic subsystem 2222 and a data storage subsystem 2224 in communication with the sensors, gaze detection subsystem 2210, display subsystem 2204, and/or other components through a communications subsystem 2226. The communications subsystem 2226 can also facilitate the display system being operated in conjunction with remotely located resources, such as processing, storage, power, data, and services. That is, in some implementations, an HMD device can be operated as part of a system that can distribute resources and capabilities among different components and subsystems.

The storage subsystem 2224 may include instructions stored thereon that are executable by logic subsystem 2222, for example, to receive and interpret inputs from the sensors, to identify location and movements of a user, to identify real objects using surface reconstruction and other techniques, and dim/fade the display based on distance to objects so as to enable the objects to be seen by the user, among other tasks.

The HMD device 2200 is configured with one or more audio transducers 2228 (e.g., speakers, earphones, etc.) so that audio can be utilized as part of a mixed-reality or virtual-reality experience. A power management subsystem 2230 may include one or more batteries 2232 and/or protection circuit modules (PCMs) and an associated charger interface 2234 and/or remote power interface for supplying power to components in the HMD device 2200.

It may be appreciated that the HMD device 2200 is described for the purpose of example, and thus is not meant to be limiting. It may be further understood that the display device may include additional and/or alternative sensors, cameras, microphones, input devices, output devices, etc. than those shown without departing from the scope of the present arrangement. Additionally, the physical configuration of an HMD device and its various sensors and subcomponents may take a variety of different forms without departing from the scope of the present arrangement.

FIG. 24 schematically shows an illustrative example of a computing system that can enact one or more of the methods and processes described above for the present combined birefringent material and reflective waveguide. Computing system 2400 is shown in simplified form. Computing system 2400 may take the form of one or more personal computers, server computers, tablet computers, home-entertainment computers, network computing devices, gaming devices, mobile computing devices, mobile communication devices (e.g., smartphone), wearable computers, and/or other computing devices.

Computing system 2400 includes a logic processor 2402, volatile memory 2404, and a non-volatile storage device 2406. Computing system 2400 may optionally include a display subsystem 2408, input subsystem 2410, communication subsystem 2412, and/or other components not shown in FIG. 24.

Logic processor 2402 includes one or more physical devices configured to execute instructions. For example, the logic processor may be configured to execute instructions that are part of one or more applications, services, programs, routines, libraries, objects, components, data structures, or other logical constructs. Such instructions may be implemented to perform a task, implement a data type, transform the state of one or more components, achieve a technical effect, or otherwise arrive at a desired result.

The logic processor may include one or more processors configured to execute software instructions. In addition, or alternatively, the logic processor may include one or more hardware or firmware logic processors configured to execute hardware or firmware instructions. Processors of the logic processor may be single-core or multi-core, and the instructions executed thereon may be configured for sequential, parallel, and/or distributed processing. Individual components of the logic processor optionally may be distributed among two or more separate devices, which may be remotely located and/or configured for coordinated processing. Aspects of the logic processor may be virtualized and executed by remotely accessible, networked computing devices configured in a cloud-computing configuration. In such a case, these virtualized aspects may be run on different physical logic processors of various different machines.

Non-volatile storage device 2406 includes one or more physical devices configured to hold instructions executable by the logic processors to implement the methods and processes described herein. When such methods and processes are implemented, the state of non-volatile storage device 2406 may be transformed—e.g., to hold different data.

Non-volatile storage device 2406 may include physical devices that are removable and/or built-in. Non-volatile storage device 2406 may include optical memory (e.g., CD, DVD, HD-DVD, Blu-Ray Disc, etc.), semiconductor memory (e.g., ROM, EPROM, EEPROM, FLASH memory, etc.), and/or magnetic memory (e.g., hard-disk drive, floppy-disk drive, tape drive, MRAM, etc.), or other mass storage device technology. Non-volatile storage device 2406 may include nonvolatile, dynamic, static, read/write, read-only, sequential-access, location-addressable, file-addressable, and/or content-addressable devices. It will be appreciated that non-volatile storage device 2406 is configured to hold instructions even when power is cut to the non-volatile storage device 2406.

Volatile memory 2404 may include physical devices that include random access memory. Volatile memory 2404 is typically utilized by logic processor 2402 to temporarily store information during processing of software instructions. It will be appreciated that volatile memory 2404 typically does not continue to store instructions when power is cut to the volatile memory 2404.

Aspects of logic processor 2402, volatile memory 2404, and non-volatile storage device 2406 may be integrated together into one or more hardware-logic components. Such hardware-logic components may include field-programmable gate arrays (FPGAs), program- and application-specific integrated circuits (PASIC/ASICs), program- and application-specific standard products (PSSP/ASSPs), system-on-a-chip (SOC), and complex programmable logic devices (CPLDs), for example.

The term “program” may be used to describe an aspect of computing system 2400 typically implemented in software by a processor to perform a particular function using portions of volatile memory, which function involves transformative processing that specially configures the processor to perform the function. Thus, a program may be instantiated via logic processor 2402 executing instructions held by non-volatile storage device 2406, using portions of volatile memory 2404. It will be understood that different programs may be instantiated from the same application, service, code block, object, library, routine, API, function, etc. Likewise, the same program may be instantiated by different applications, services, code blocks, objects, routines, APIs, functions, etc. The term “program” may encompass individual or groups of executable files, data files, libraries, drivers, scripts, database records, etc.

When included, display subsystem 2408 may be used to present a visual representation of data held by non-volatile storage device 2406. This visual representation may take the form of a graphical user interface (GUI). As the herein described methods and processes change the data held by the non-volatile storage device, and thus transform the state of the non-volatile storage device, the state of display subsystem 2408 may likewise be transformed to visually represent changes in the underlying data. Display subsystem 2408 may include one or more display devices utilizing virtually any type of technology; however, one utilizing a MEMS projector to direct laser light may be compatible with the eye-tracking system in a compact manner. Such display devices may be combined with logic processor 2402, volatile memory 2404, and/or non-volatile storage device 2406 in a shared enclosure, or such display devices may be peripheral display devices.

When included, input subsystem 2410 may comprise or interface with one or more user-input devices such as a keyboard, mouse, touch screen, or game controller. In some embodiments, the input subsystem may comprise or interface with selected natural user input (NUI) componentry. Such componentry may be integrated or peripheral, and the transduction and/or processing of input actions may be handled on- or off-board. Example NUI componentry may include a microphone for speech and/or voice recognition; an infrared, color, stereoscopic, and/or depth camera for machine vision and/or gesture recognition; a head tracker, eye tracker, accelerometer, and/or gyroscope for motion detection and/or intent recognition; as well as electric-field sensing componentry for assessing brain activity.

When included, communication subsystem 2412 may be configured to communicatively couple various computing devices described herein with each other, and with other devices. Communication subsystem 2412 may include wired and/or wireless communication devices compatible with one or more different communication protocols. As non-limiting examples, the communication subsystem may be configured for communication via a wireless telephone network, or a wired or wireless local- or wide-area network. In some embodiments, the communication subsystem may allow computing system 2400 to send and/or receive messages to and/or from other devices via a network such as the Internet.

Various exemplary embodiments of the present combined birefringent material and reflective waveguide for multiple focal planes in a mixed-reality head-mounted display device are now presented by way of illustration and not as an exhaustive list of all embodiments. An example includes a method for operating an electronic device that includes a mixed-reality see-through optical display system configured for showing mixed-reality scenes comprising virtual images of virtual-world objects that are rendered over views of real-world objects to a user of the electronic device, the method comprising: receiving light for the virtual images, the light being linearly polarized in a first polarization state; operating a ferroelectric liquid crystal (FLC) modulator to switch between the first polarization state for the virtual image light and a second polarization state that is orthogonal to the first polarization state; providing a lens of birefringent material upon which virtual image light is incident in either the first polarization state or second polarization state, in which the lens provides one of two different focal distances for the virtual images depending on polarization state of the incident virtual image light; and in-coupling the virtual image light from the lens into the mixed-reality see-through optical display system which renders the virtual images at the one of two different focal distances to the user.

In another example, the method further comprises operating the FLC modulator at a rate that is synchronized to a refresh rate of the received virtual image light to provide a temporally multiplexed virtual image display comprising one or more virtual images located at either one or the other different focal distances or located at both of the different focal distances simultaneously. In another example, the method further comprises stacking combinations of FLC modulators and lenses of birefringent material that act on the received virtual image light in series, in which each combination in the stack provides two unique focal distances for the rendered virtual images. In another example, the method further comprises operating the FLC modulator according to a composition of a mixed-reality scene, in which the composed mixed-reality scene includes virtual-world objects that are located at different focal distances.
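
As an illustration of how such a stack can be driven (a sketch only, under the simplifying assumption that the birefringent lenses act as thin lenses in contact whose optical powers add), the focal state of the assembly can be enumerated over every combination of modulator switch settings, as shown in the following Python sketch; the base power and per-pair powers are hypothetical values.

# Hedged sketch of a stacked modulator/lens assembly: each birefringent lens
# contributes one of two optical powers (ordinary or extraordinary state)
# depending on its FLC modulator setting, and the powers are assumed to add.
# This enumerates the effective focal distance for each of the 2**N switch
# combinations of N pairs; it is a simplification, not the patent's design.
from itertools import product

def focal_states(lens_powers_diopters, base_power_diopters=1.0):
    """Enumerate effective focal distances for every FLC switch combination.

    lens_powers_diopters: list of (power_state0, power_state1) per lens pair.
    base_power_diopters: nominal power of the rest of the display optics.
    """
    states = {}
    for switches in product((0, 1), repeat=len(lens_powers_diopters)):
        total = base_power_diopters + sum(pair[s] for pair, s in
                                          zip(lens_powers_diopters, switches))
        states[switches] = 1.0 / total  # effective focal distance in meters
    return states

# Two pairs -> four switch combinations (example powers only):
# {(0,0): 1.0 m, (0,1): 0.8 m, (1,0): ~0.667 m, (1,1): ~0.571 m}
focal_states([(0.0, 0.5), (0.0, 0.25)])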

A further example includes a head-mounted display (HMD) device wearable by a user and configured for supporting a mixed-reality experience including viewing, by the user, of virtual images that are combined with views of real-world objects in a physical world, comprising: a focal-distance modulation system that is operable to receive virtual images from a virtual image source, the focal-distance modulation system comprising a polarization modulator and a birefringent lens, wherein the polarization modulator is configured to selectively switch polarization of the virtual images between two orthogonal states, and wherein the birefringent lens has two different refractive indices each with sensitivity to a different orthogonal state of polarization of virtual images, wherein virtual images in a first polarization state are focused by the birefringent lens at a first focal distance, and wherein virtual images in a second polarization state are focused by the birefringent lens at a second focal distance; and an optical combiner with which the user can see the real-world objects and the virtual images in a mixed-reality scene, the optical combiner including an input coupler configured to in-couple virtual images from the focal-distance modulation system that are focused at either the first or second focal distance into the optical combiner and further including an output coupler configured to out-couple the virtual images that are focused at either the first or second focal distance from the optical combiner to one or more of the user’s eyes.
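
As a point of illustration only, if the birefringent lens is approximated as a thin plano-convex element of curvature radius R (a simplifying assumption; the lens form is not specified above), the two focal lengths follow from the lensmaker's equation applied to the ordinary index n_o and the extraordinary index n_e, written in LaTeX notation as:

f_o = \frac{R}{n_o - 1}, \qquad f_e = \frac{R}{n_e - 1}

For hypothetical values of R = 50 mm, n_o = 1.55, and n_e = 1.70, the two focal lengths would be approximately 91 mm and 71 mm, which illustrates how the polarization-dependent refractive index selects between the first and second focal distances.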

In another example, the HMD device further comprises a linear polarizing filter that is arranged to linearly polarize light from the virtual image source. In another example, the HMD device further comprises an eye tracker for tracking vergence of the user’s eyes or tracking a gaze direction of at least one eye of the user to perform one of calibration of alignment between the user’s eye and the optical combiner, dynamic determination of whether alignment changes during use of the HMD device, or composition of a mixed-reality scene at the virtual image source. In another example, the composition of the mixed-reality scene comprises rendering virtual images in a single focal plane that is selected based on operation of the eye tracker to determine a gaze point of the user. In another example, the HMD device further comprises a focal plane controller operatively coupled to the polarization modulator and configured to selectively switch the polarization state of the virtual images at a rate that is synchronized with a refresh rate of the virtual image source to generate virtual images at different focal distances in the mixed-reality scene supported by the optical combiner. In another example, the focal plane controller is further operatively coupled to the virtual image source and configured to selectively switch the polarization state of the virtual images based on a composition of a mixed-reality scene generated at the virtual image source. In another example, the focal-distance modulation system further comprises at least an additional polarization modulator and an additional birefringent lens wherein a total of N polarization modulator/birefringent lens pairs are utilized to provide 2N different focal distances. In another example, the optical combiner comprises a waveguide that is at least partially transparent, the waveguide configured for guiding focused virtual images from the input coupler to the output coupler. In another example, one or more of the input coupler, output coupler, or waveguide include one or more reflective surfaces. In another example, the optical combiner is configured to provide an exit pupil that is expanded in one or more directions relative to an input pupil to the optical combiner. In another example, the polarization modulator comprises one of ferroelectric liquid crystal (FLC) modulator, photo-elastic modulator, electro-optic modulator, magneto-optic modulator, or piezoelectric modulator.

A further example includes a mixed-reality optical display system providing a plurality of different focal lengths for planes into which images of virtual-world objects are displayable, comprising: a source configured to generate light for virtual-world images, the virtual-world image light propagating on a light path from the source to an eye of a user of the mixed-reality display system; a ferroelectric liquid crystal (FLC) modulator disposed along the light path, and which is operatively coupled to the source to receive virtual-world image light, and which is switchable between first and second switched states; a linear polarizer disposed along the light path between the source and the FLC modulator and configured to impart a linearly polarized state to the virtual-world image light that is incident on the FLC modulator, wherein the switchable FLC modulator is configured as a half-wave plate that is aligned at zero degrees or 45 degrees with respect to a polarization axis of the linear polarizer depending on the switched state; a birefringent lens disposed along the light path downstream from the FLC modulator, the birefringent lens having an ordinary refractive index that is aligned with the polarization axis of the linear polarizer and an extraordinary refractive index that is orthogonal to the ordinary refractive index, wherein virtual-world image light incident on the birefringent lens having a state of polarization that is aligned with the ordinary refractive index is focused by the birefringent lens at a first focal length and virtual-world image light incident on the birefringent lens having a state of polarization that is aligned with the extraordinary refractive index is focused by the birefringent lens at a second focal length that is different from the first; a focal length controller operatively coupled to the FLC modulator to switch the FLC modulator between the first and the second states, wherein in the first switched state of the FLC modulator, virtual-world image light exiting the FLC modulator and incident on the birefringent lens has a state of polarization that is aligned with the ordinary refractive index of the birefringent lens, and wherein in the second switched state of the FLC modulator, virtual-world image light exiting the FLC modulator has a state of polarization that is aligned with the extraordinary refractive index of the birefringent lens; and a see-through optical combiner through which real-world objects are viewable by the user, the see-through optical combiner disposed on the light path downstream from the birefringent lens, and the see-through optical combiner being adapted to display the virtual-world object images which are superimposed over the views of real-world objects in first or second focal planes that are respectively associated with the first and second focal lengths.
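
As an illustrative cross-check of the half-wave-plate behavior described above (and not a description of any particular FLC device), the following Jones-calculus sketch in Python shows that with its fast axis at zero degrees the modulator leaves the linearly polarized input unchanged, so the light addresses the ordinary index of the birefringent lens, whereas at 45 degrees it rotates the polarization by 90 degrees and addresses the extraordinary index.

# Minimal Jones-calculus check of the half-wave-plate behavior described above.
# Angles and vectors are illustrative only.
import numpy as np

def half_wave_plate(theta_rad):
    """Jones matrix of an ideal half-wave plate with its fast axis at theta
    (a common global phase factor is omitted)."""
    c, s = np.cos(2 * theta_rad), np.sin(2 * theta_rad)
    return np.array([[c, s], [s, -c]])

horizontal = np.array([1.0, 0.0])                       # light from the linear polarizer
state_0deg = half_wave_plate(0.0) @ horizontal          # -> [1, 0], polarization unchanged
state_45deg = half_wave_plate(np.pi / 4) @ horizontal   # -> [0, 1], rotated by 90 degrees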

In another example, the see-through optical combiner comprises a waveguide. In another example, the waveguide comprises a reflective input coupler or a reflective output coupler. In another example, the optical combiner is adapted to selectively display virtual-world object images in either or both the first and second planes according to operations of the focal length controller. In another example, the mixed-reality optical display system is configured for use in a head-mounted display (HMD) device.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

The article “Microsoft Patent | Combined birefringent material and reflective waveguide for multiple focal planes in a mixed-reality head-mounted display device” was first published on Nweon Patent.

Microsoft Patent | Rf retroreflector based controller tracking for vr headsets https://patent.nweon.com/27553 Thu, 23 Mar 2023 12:58:41 +0000 https://patent.nweon.com/?p=27553 ...

The article “Microsoft Patent | Rf retroreflector based controller tracking for vr headsets” was first published on Nweon Patent.

Patent: Rf retroreflector based controller tracking for vr headsets

Patent PDF: Available with Nweon (映维网) membership

Publication Number: 20230089734

Publication Date: 2023-03-23

Assignee: Microsoft Technology Licensing

Abstract

Systems and methods are provided for tracking a passive controller system using an active sensor system within a mixed-reality environment. The passive controller system includes a body configured to be held in a hand of a user, as well as a plurality of retroreflectors that collectively provides at least 180 degrees of reflecting surface for reflecting a radar signal in at least 180 degrees of spherical range when the passive controller system is positioned within a predetermined distance from a source of the radar signal and with an orientation that is within the at least 180 degrees of spherical range relative to the source of the radar signal. Signals transmitted to the passive controller and reflected back from the passive controller are used to calculate the position and orientation of the passive controller system relative to the active sensor system.

Claims

What is claimed is:

1.A passive controller system comprising: a body configured to be held in a hand of a user and to be moved with the hand of the user in six degrees of freedom; and a plurality of retroreflectors attached to the body in a configuration that provides at least 180 degrees of reflecting surface for reflecting a radar signal in at least 180 degrees of spherical range when the passive controller system is positioned within a predetermined distance from a source of the radar signal with an orientation that is within the at least 180 degrees of spherical range relative to the source of the radar signal.

2.The passive controller system of claim 1, wherein the predetermined distance is within a range of about 0.01 meters to about 4 meters.

3.The passive controller system of claim 1, wherein the plurality of retroreflectors is encapsulated within a housing of the body and such that the plurality of retroreflectors is not externally visible from the body.

4.The passive controller system of claim 1, wherein the passive controller system omits any active sensor device and is capable of reflecting the radar signal with the plurality of retroreflectors to a receiver that translates reflected signals from the plurality of retroreflectors to determine a relative position and an orientation of the passive controller system relative to the receiver.

5.The passive controller system of claim 1, wherein the passive controller system omits any inertial measurement unit.

6.The passive controller system of claim 1, wherein the plurality of retroreflectors is attached to the body in a configuration that provides 360 degrees of reflecting surface, for facilitating reflection of the radar signal, irrespective of the orientation of the body relative to the source of the radar signal within the predetermined distance.

7.The passive controller system of claim 6, wherein each side of at least one retroreflector of the plurality of retroreflectors is in direct contact with a different retroreflector of the plurality of retroreflectors.

8.The passive controller system of claim 7, wherein the plurality of retroreflectors comprises a single integrated reflector unit that is detachably connected to the body.

9.The passive controller system of claim 1, wherein the plurality of retroreflectors is distributed throughout a handle base of the body and such that the plurality of retroreflectors includes at least two different retroreflectors that are connected to the body while being separated from each other by at least a space in the body or a physical structure of the body.

10.The passive controller system of claim 9, wherein one or more retroreflectors of the plurality of retroreflectors is attachable to the handle base at different locations of the handle base.

11.The passive controller system of claim 9, wherein a first angle of reflection of a first retroreflector of the plurality of retroreflectors overlaps with a second angle of reflection of a second retroreflector of the plurality of retroreflectors.

12.The passive controller system of claim 9, wherein each retroreflector of the plurality of retroreflectors is attached to the body such that an angle of reflection of each retroreflector is a unique angle of reflection that is non-overlapping with angles of reflection of at least two different retroreflectors in the plurality of retroreflectors.

13.The passive controller system of claim 1, wherein each retroreflector of the plurality of retroreflectors is composed of a same material.

14.The passive controller system of claim 1, wherein each retroreflector comprises at least three orthogonally connected planes.

15.The passive controller system of claim 1, wherein each retroreflector of the plurality of retroreflectors comprises a substantially similar surface area and dimensional size.

16.An active sensor system configured to track a relative orientation and a relative position of a passive controller system within a predetermined distance of the active sensor system, the active sensor system comprising: one or more monostatic transmitters configured to transmit one or more signals within a spherical range; one or more monostatic receivers configured to receive one or more signals reflected from a plurality of retroreflectors attached to the passive controller system, the plurality of retroreflectors being configured to reflect the one or more signals to the one or more monostatic receivers when the passive controller system is positioned within the predetermined distance of the active sensor system; and one or more processors for processing the one or more signals reflected from the plurality of retroreflectors and received by the one or more monostatic receivers to determine the relative position and the relative orientation of the passive controller system relative to the active sensor system.

17.The active sensor system of claim 16, wherein the one or more monostatic transmitters and one or more monostatic receivers are fixedly positioned with respect to each other in a portrait orientation.

18.The active sensor system of claim 16, wherein the active sensor system is contained within a headset that is configured to be worn by a user and such that the active sensor system is configured to track the relative orientation and the relative position of the passive controller system relative to the headset during use of the headset and the passive controller system.

19.A computing system configured for detecting an orientation and a position of a passive controller system relative to an active sensor system that is positioned within a predetermined distance from the passive controller system and that transmits one or more signals to the passive controller system such that the one or more signals are reflected from the passive controller system back to the active sensor system as one or more reflected signals, the computing system comprising: one or more processors; and one or more hardware storage devices storing one or more computer-executable instructions that are executable by the one or more processors to configure the computing system to at least: transmit the one or more signals from a plurality of monostatic transmitters of the active sensor system in a signal transmission area and a direction in which a passive controller system is located when the passive controller system is located within the predetermined distance of the active sensor system within the signal transmission area; receive and detect the one or more reflected signals reflected back from a plurality of retroreflectors attached to the passive controller system, the plurality of retroreflectors being configured on the passive controller system to reflect the one or more signals back to the active sensor system as the one or more reflected signals irrespective of orientation or position of the passive controller system when the passive controller system is positioned within the predetermined distance of the active sensor system within the signal transmission area; and determine the orientation and the position of the passive controller system relative to the active sensor system based on the one or more signals and the one or more reflected signals.

20.The computing system of claim 19, wherein the one or more computer-executable instructions are further executable by the one or more processors to configure the computing system to at least project a virtual object associated with the passive controller system in a pose and relative position within a mixed-reality environment based on the orientation and the position of the passive controller system relative to the active sensor system.

Description

BACKGROUND

Mixed-reality systems, such as virtual reality systems and augmented reality systems, have received significant attention because of their ability to create unique experiences for their users. Virtual reality systems provide experiences in which a user is fully immersed in a virtually represented world, typically through a virtual reality headset or head-mounted device (HMD) that prevents the user from seeing objects located in the user’s real environment. Augmented reality systems provide a user with experiences that allow the user to interact with both virtual content and real objects located in the user’s environment. For example, virtual objects are virtually presented to the user within the user’s own real environment such that the user is able to perceive the virtual objects in relation to physical or real objects.

Typically, users perceive or view the virtual reality or augmented reality through an enclosed visual display (for virtual reality) or transparent lens (for augmented reality). Users can then interact with the perceived reality through different user input controls, as located on a user controller, or set of user controllers. In order for the user to interact well within the mixed-reality environment, the mixed-reality system must be able to track the user inputs, and more specifically, the user controllers, by at least tracking the orientation and the position of one or more user controllers relative to the user’s display (e.g., HMD).

Current methods and systems for tracking the user controllers are expensive because the user controllers must have an active tracking system that is in real-time communication with the mixed-reality system.

Accordingly, there is an on-going need and desire for improved systems, methods, and devices for user controller tracking, and particularly, for improved systems, methods, and devices that can be utilized for detecting the orientation and position of the user controller relative to a mixed-reality system.

The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.

BRIEF SUMMARY

Disclosed embodiments include systems, methods, and devices that include and/or that are configured to facilitate the tracking of a passive controller system using radio frequency retroreflectors.

Some disclosed systems include a passive controller system having a body configured to be held in a hand of a user and that incorporates a plurality of retroreflectors. The retroreflectors are attached to the body in a configuration that provides at least 180 degrees of reflecting surface for reflecting a radar signal in at least 180 degrees of spherical range when the passive controller system is positioned within a predetermined distance from a source of the radar signal, with an orientation that is within the at least 180 degrees of spherical range relative to the source of the radar signal.

Disclosed systems also include active sensor systems configured to track a relative orientation and a relative position of a passive controller system when the passive controller system is positioned within a predetermined distance of the active sensor system. In some instances, the active sensor system includes one or more monostatic transmitters configured to transmit signals within a spherical range and one or more monostatic receivers configured to receive one or more reflecting signals that are reflected from a plurality of retroreflectors attached to a passive controller system that is being tracked by the active sensor system. The plurality of retroreflectors on the passive controller system is configured to reflect the one or more signals to the one or more monostatic receivers when the passive controller system is positioned within the predetermined distance of the active sensor system, regardless of orientation and position within the predetermined distance of the active sensor system.

The active sensor system also includes one or more processors for processing the one or more signals reflected from the plurality of retroreflectors and received by the one or more monostatic receivers to determine the relative position and the relative pose or orientation of the passive controller system relative to the active sensor system.

Some of the disclosed methods include detecting an orientation and position of a passive controller system relative to an active sensor system when the passive controller is positioned within a predetermined distance of the active sensor system and based on one or more signals that originate from the active sensor system and that are reflected back to the active sensor system from the passive controller system.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

Additional features and advantages will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the teachings herein. Features and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. Features of the present invention will become more fully apparent from the following description and appended claims or may be learned by the practice of the invention as set forth hereinafter.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to describe the manner in which the above-recited and other advantages and features can be obtained, a more particular description of the subject matter briefly described above will be rendered by reference to specific embodiments which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments and are not therefore to be considered to be limiting in scope, embodiments will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:

FIG. 1 illustrates an active sensor system and passive controller system that includes and/or that is capable of being utilized to implement the disclosed embodiments.

FIGS. 2A-2B illustrate an example embodiment for using the active sensor system to track the passive controller system within a signal transmission area.

FIG. 3 illustrates an example embodiment for using an active sensor system to track a passive controller system within a predetermined distance of the active sensor system.

FIG. 4 illustrates an example embodiment of a passive controller system that is trackable within six degrees of freedom of movement.

FIG. 5 illustrates an example embodiment of a plurality of retroreflectors.

FIGS. 6A-6B illustrate front views of various example embodiments of a passive controller system.

FIG. 7A illustrates a cross-sectional view of an example embodiment of a passive controller system with a plurality of retroreflectors housed inside a body of the passive controller system.

FIG. 7B illustrates a cross-sectional view of an example embodiment of a passive controller system with a plurality of retroreflectors partially housed inside a body of the passive controller system and partially disposed outside of the body of the passive controller.

FIGS. 8A-8C illustrate various views of different example embodiments of a multi-controller passive controller system that is trackable without line-of-sight with respect to the active sensor system.

FIGS. 9A-9B illustrate various example embodiments of a passive controller system configured to project different virtual objects associated with the passive controller system.

FIG. 10 illustrates an example embodiment of an active sensor system configured as a head-mounted device used to determine a position and orientation of a passive controller system based on a signal generated from and reflected back to the active sensor system by the passive controller system.

FIG. 11 illustrates an example embodiment of an active sensor system having a plurality of transmitters and a plurality of receivers.

FIG. 12A illustrates an example embodiment of a signal being generated from an active sensor system configured as a head-mounted device and being reflected back to the active sensor system by a passive controller system.

FIG. 12B illustrates an example embodiment of a signal being generated from an active sensor system configured as a beacon and being reflected back to the active sensor system by a passive controller system.

FIG. 13 illustrates another example embodiment of a process flow diagram for tracking a passive controller system using an active sensor system.

FIG. 14 illustrates a process flow diagram comprising a plurality of acts associated with a method for building a machine learning model configured to generate multimodal contrastive embeddings.

FIG. 15 illustrates an example architecture that includes a computing system that is capable of being utilized to implement the disclosed embodiments.

DETAILED DESCRIPTION

Embodiments disclosed herein relate to systems, methods, and devices that are configured to facilitate tracking of passive controller systems, and even more particularly, for systems, methods, and devices that can be utilized to track passive controller systems using radio frequency retroreflectors and an active sensor system.

The disclosed embodiments provide many technical advantages over existing systems, methods, and devices. For example, the disclosed passive controllers do not necessarily need line-of-sight to the active sensor system to be tracked, as the signals sent from and reflected back to the active sensor system from the passive controller can sometimes pass through materials that would obscure camera imaging required for line-of-sight tracking.

Furthermore, the active sensor system is able to function on low power while still meeting signal transmission requirements. These lower power requirements, particularly for the passive controller, are an improvement over devices that require power, such as controllers with powered IMU tracking components.

The design of both the passive controller system and the active sensor system is relatively inexpensive and highly customizable in terms of the range of operation, the size of the passive controller system, and the frequency of the signal being transmitted by the active sensor system. Beneficially, in use applications for mixed-reality systems, the passive controller system is designable as a hand-held remote control that is trackable within an arm’s length of the user’s active sensor system (e.g., configured as an HMD), such that the active sensor system is able to track the position and orientation (also referred to herein as pose) of the passive controller system with sub-millimeter and sub-radian accuracy.

FIG. 1 illustrates an active sensor system 100 and passive controller system 120 that includes and/or that is capable of being utilized to implement the disclosed embodiments. The active sensor system 100, depicted in FIG. 1 as a head-mounted device (e.g., HMD 101), includes a plurality of transmitters 104 (e.g., transmitter 104A, transmitter 104B, and transmitter 104C) and a plurality of receivers 106 (e.g., receiver 106A and receiver 106B).

The active sensor system 100 is also illustrated as including one or more processor(s) (such as one or more hardware processor(s) 108) and a storage (i.e., storage device(s) 110) storing computer-readable instructions, wherein one or more of the hardware storage device(s) 110 is able to house any number of data types and any number of computer-readable instructions by which the active sensor system 100 is configured to implement one or more aspects of the disclosed embodiments when the computer-readable instructions are executed by the one or more processor(s) 108. The active sensor system 100 is also shown including optical sensor(s) 112, display(s) 114, input/output (I/O) device(s) 116, and speaker(s) 118.

FIG. 1 also illustrates the passive controller system 120, depicted as a remote control 121, which includes one or more passive retroreflector(s) 122. In some embodiments, the passive controller system 120 optionally includes one or more inertial measurement units (e.g., IMU(s) 124), user input controls 126 (e.g., user control buttons disposed on the remote control 121), and/or other I/O(s) 128. In some instances, other I/O(s) include haptic feedback, microphones, speakers, optical sensors, light emitting devices (e.g., for light indicators), or other input/outputs.

However, it is noted that the passive controller system 120 does not need to include any IMU or other powered tracking unit to track the orientation and/or position of the controller relative to the active sensor system 100. In fact, in most preferred configurations, the passive controller system 120 does not include and/or does not use the IMU 124 to track positioning of the passive controller system relative to the active sensor system 100. Instead, these components (e.g., IMU 124) are merely optional and/or may be selectively used for powered and/or supplementary tracking if and/or when it is determined that there is a significant amount of radio interference that might otherwise interfere with the signal transmissions used for the passive controller tracking that is described herein.

In such alternative embodiments, for example, the system can dynamically detect interference, based on analyzing sensor data received at the system and/or based on user input or third party input, and can responsively activate and/or use IMU sensor data from the IMU 124, if one is provided, to perform active tracking.
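
Purely as an illustrative sketch of this optional fallback (the interference metric, threshold value, and interfaces below are hypothetical and not part of the disclosed system), the selection logic could take a form such as the following.

# Hedged sketch: the passive radar-derived pose is used by default, and
# IMU-derived pose data is used only when detected RF interference exceeds a
# threshold and an IMU is actually provided.
INTERFERENCE_THRESHOLD = 0.6  # normalized 0..1, placeholder value only

def select_pose(radar_pose, interference_level, imu_pose=None):
    """Prefer the passive radar-derived pose; fall back to the IMU pose only
    when interference is high and an IMU pose is available."""
    if interference_level > INTERFERENCE_THRESHOLD and imu_pose is not None:
        return imu_pose    # optional powered/supplementary tracking path
    return radar_pose      # default passive tracking path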

The controller may also include a separate power supply (not shown) to power any powered components in the passive tracking controller.

Attention will now be directed to FIGS. 2A-2B, which illustrate various example embodiments for detecting an orientation and a position of a passive controller system 220 relative to an active sensor system 200 that is positioned within a predetermined distance from the passive controller system. In these embodiments, the active sensor system 200 transmits one or more signals (e.g., within signal transmission area 210A and signal transmission area 210B) to the passive controller system 220 and in such a manner that the one or more signals are reflected from the passive controller system back to the active sensor system as one or more reflected signals 214.

The active sensor system 200 (currently depicted as HMD 201) is configured to transmit the one or more signals from a plurality of monostatic transmitters (e.g., transmitter 202A and/or transmitter 202B) of the active sensor system 200 in a relatively omni-directional signal transmission area (e.g., signal transmission area 210A and/or signal transmission area 210B) with a directionality towards a front of the active sensor system 200, towards a passive controller system 220 that is located within a predetermined distance of the active sensor system and signal transmission area.

The active sensor system 200 is also configured to receive and detect the one or more reflected signals 214 reflected back from a plurality of retroreflectors (e.g., passive retroreflector(s) 122) attached to the passive controller system 220. The plurality of retroreflectors is configured on the passive controller system 220 to reflect the one or more signals back to the active sensor system 200 as the one or more reflected signals 214 irrespective of orientation or position of the passive controller system when the passive controller system is positioned within the predetermined distance of the active sensor system within the signal transmission area.

The one or more reflected signals 214 is/are received and detected by one or more receivers of the active sensor system 200 (e.g., receiver 206A, receiver 206B, and/or receiver 206C). The active sensor system 200 is then able to calculate/determine the orientation and the position of the passive controller system 220 relative to the active sensor system based on the one or more originating signals within the signal transmission area (e.g., 210A and/or 210B) and the one or more reflected signals 214.

As shown in FIG. 2A, the one or more signals transmitted by the transmitter 202A and/or transmitter 202B are transmitted within signal transmission area 210A and/or signal transmission area 210B. As shown in FIG. 2A, signal transmission area 210A and signal transmission area 210B are separate and discrete signal transmission areas. In some embodiments, signal transmission area 210A and signal transmission area 210B at least partially and/or completely overlap. In other embodiments, the signal transmission area 210A and signal transmission area 210B do not overlap.

As shown in FIG. 2B, in some embodiments, the active sensor system 200 includes a plurality of transmitters (e.g., transmitter 204A and transmitter 204B) which are configured to transmit one or more signals within a continuous signal transmission area 212 which covers at least a hemisphere of spherical range for signal transmission to the passive controller system 220, when the passive controller system 220 is located within a particular distance relative to the active sensor system 200. The plurality of retroreflectors 222 of the passive controller system 220 is configured to reflect the one or more signals within the signal transmission area 212 as one or more reflected signals 216 back to the plurality of receivers of the active sensor system 200.

In other embodiments, the signal transmission area surrounds the active sensor system 200 by more than a single hemisphere, and in some instances, by a full spherical coverage surrounding the active sensor system 200, such as by positioning more of the transmitters around different portions of the HMD 201 and/or auxiliary devices that are in communication with the HMD 201.

Attention will now be directed to FIG. 3, which illustrates an example embodiment for using an active sensor system 304 to track a passive controller system 306 within a predetermined distance 308 of the active sensor system 304.

The passive controller system 306 includes a body 312A configured to be held in a hand of a user 302 and to be moved with the hand of the user in six degrees of freedom (6DOF). In this regard, the passive controller system 306 can be viewed as a 6DOF controller.

The passive controller system 306 also includes a plurality of retroreflectors 310 attached to the body 312 in a configuration that provides at least 180 degrees of reflecting surface for reflecting a radar signal in at least 180 degrees of spherical range when the passive controller system is positioned within a predetermined distance 308 from a source of the radar signal with an orientation that is within the at least 180 degrees of spherical range relative to the source of the radar signal. In some embodiments, the passive controller system is positioned within a predetermined distance 308 from a source of the radar signal with an orientation that is within 360 degrees of spherical range relative to the source (e.g., active sensor system 304) of the radar signal.

In some embodiments, the predetermined distance 308 between the active sensor system 304 and the passive controller system 306 is within a range (or having a radius) of about 0.01 meters to about 4 meters. However, in some alternative embodiments, the range of the predetermined distance 308 can also extend beyond 4 meters and/or be within less than 0.01 meters.

It should be appreciated that the signals that are generated and transmitted by the active system as illustrated in FIGS. 2A-2B are tunable depending on the size of the passive controller system and/or the predetermined distance between the passive controller system and the active sensor system. For example, the signal ranges from about 60 GHz to about 100 GHz or, more broadly, from about 24 GHz to about 110 GHz. In particular, 60 GHz is an appropriate radio frequency because it maintains signal power over short ranges, such as an arm’s length for a user (e.g., between 0.1 to 1.1 meters). 60 GHz is also usable for longer ranges, up to approximately 4 meters. Increasing the radio frequency (e.g., to 110 GHz) allows the retroreflectors to be smaller, e.g., roughly ping-pong-ball size for the plurality of retroreflectors. These frequencies and respective retroreflector sizes allow the active sensor system to obtain sub-millimeter and sub-radian tracking accuracy for the passive controller system.
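
For illustration only, the sketch below relates the quoted frequency band to free-space wavelength and to an assumed reflector edge length of roughly eight wavelengths; the 8x-wavelength sizing is an assumption used to show the scaling trend, not a design rule disclosed in the patent.

```python
# Illustrative sketch: wavelength across the quoted band and an assumed corner-reflector
# edge length of ~8 wavelengths (an assumption, used only to show why higher frequency
# allows a physically smaller retroreflector).
C = 299_792_458.0  # speed of light, m/s

def wavelength_m(freq_hz: float) -> float:
    """Free-space wavelength for a given radar frequency."""
    return C / freq_hz

for f_ghz in (24, 60, 100, 110):
    lam = wavelength_m(f_ghz * 1e9)
    edge_mm = 8 * lam * 1e3  # assumed sizing: 8x wavelength
    print(f"{f_ghz:>3} GHz: wavelength {lam * 1e3:.2f} mm, assumed edge ~{edge_mm:.0f} mm")
```

Under this assumed sizing, 60 GHz works out to an edge of roughly 40 mm, which is consistent with the ping-pong-ball scale mentioned above, while 110 GHz shrinks the same reflector to roughly 22 mm.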

The predetermined distance can be increased, especially if the plurality of retroreflectors is placed inside the passive controller system at a length that is further away from the user’s arms and hands. Where the passive controller system is configured as an elongated remote control (or as an object-based remote control such as a lightsaber or sword), when the retroreflectors are placed toward a distal end of the elongated remote control away from the user’s hand, there is decreased interference/interruption within the line-of-sight for the signal reflection, as well as less likelihood of the user touching or shorting out portions of the retroreflector(s).

Attention will now be directed to FIG. 4, which illustrates an example embodiment of a passive controller system 400 that is trackable within six degrees of freedom of movement. As shown in FIG. 4, the passive controller system 400 includes a controller body 402 comprising a handle base 406 and a plurality of user input controls 408 disposed on an outer portion of the controller body. The passive controller system 400 also includes a plurality of retroreflectors 404 which are configured to reflect one or more signals generated by an active sensor system back to the active sensor system. It should be appreciated that the controller body 402 is comprised of a material that is radio transparent, such that it allows the signal being transmitted from the active sensor system to reach one or more retroreflectors housed inside the controller body 402.

The passive controller system 400 is configured so that a user is able to hold the controller body 402 with one or more hands in order to move the passive controller system in at least six degrees of movement. For example, a user is able to move the passive controller system 400 in an x direction 410, a y direction 412, and/or a z direction 414. These cardinal directions are used to represent and/or determine a position of the passive controller system. Furthermore, the user is able to change the orientation (pitch, yaw, and/or roll) of the passive controller system at different positions.

The user is able to move the passive controller system 400 in different pitch orientations 418 about a pitch axis (e.g., the z axis), in different roll orientations 420 about a roll axis (e.g., the x axis), and/or in different yaw orientations 416 about a yaw axis (e.g., the y axis). Thus, the user is able to change the pose (i.e., orientation) of the passive controller system 400 by tilting, turning, and/or rotating the passive controller system 400. The different orientations of the passive controller system 400 determine the orientation and effective signal reflection surface area of the plurality of retroreflectors 404.

While the passive controller system 400 may have a total signal reflection surface area, depending on the orientation of the passive controller system 400, the plurality of retroreflectors may be positioned and/or oriented such that the usable signal reflection surface area is less than the total reflection surface area available.

In some instances, the plurality of retroreflectors 404 are disposed in a configuration on and/or within the passive controller system 400 in order to maximize the usable signal reflection surface area relative to the active sensor system. For example, the plurality of retroreflectors 404 are disposed on and/or within the passive controller system 400 such that at least one signal generated by an active sensor system is reflected back to the active sensor system irrespective of the orientation and/or position of the passive controller system 400 when the passive controller system 400 is located within a predetermined distance of the active sensor system.

The user input controls 408 are disposed so that while the user holds the passive controller system 400, the user is able to use one or more fingers to interact with the user input controls 408 to provide user input to the passive controller system 400 for controlling and/or for interacting with objects in a mixed-reality environment, for example. In particular, the user input generated by the controls is used to represent and/or affect user interactions with a mixed-reality environment in which the active sensor system and passive controller system 400 are being utilized. While the passive controller system 400 may further include one or more processor(s) and/or one or more hardware storage device(s) to facilitate the communication of the user input to the active sensor system and/or other computing system, the tracking functionality (e.g., the plurality of retroreflectors 404) remains a passive tracking component.

FIG. 5 illustrates an example embodiment of a plurality of retroreflectors 500. As shown in FIG. 5, the plurality of retroreflectors 500 comprises retroreflector 502A, retroreflector 502B, retroreflector 502C, retroreflector 502D, retroreflector 502E, retroreflector 502F, retroreflector 502G, retroreflector 502H, retroreflector 502I, retroreflector 502J, and/or one or more other retroreflectors. In some embodiments, each retroreflector (e.g., retroreflector 502J) comprises a plurality of reflective surfaces (e.g., reflective surface 504A, reflective surface 504B, and reflective surface 504C). In such embodiments, one or more retroreflectors are configured as corner retroreflectors, wherein the plurality of reflective surfaces are configured as orthogonally connected planes, partially forming a pyramid structure and/or prism having a center point 506.

In some embodiments, each retroreflector of the plurality of retroreflectors comprises a substantially similar surface area and dimensional size. In some instances, each individual retroreflector is configured as a corner reflector comprising at least three or more reflective planes. At least one point of each plane is attached to a point of another plane, such that the corner reflector has at least one single point or apex. The angle at which the planes connect is tunable based on the frequency of operation and size of the corner reflectors.
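
As a brief illustration of why the orthogonal-plane geometry described above retroreflects, the following minimal sketch (assuming ideal specular reflection off exactly orthogonal faces) shows that a ray bounced off three mutually orthogonal planes leaves antiparallel to how it arrived, i.e., it heads back toward the source regardless of the incoming direction.

```python
# Minimal sketch: reflecting a direction vector off three mutually orthogonal planes
# reverses it, which is the property a corner retroreflector relies on.
import numpy as np

def reflect(direction: np.ndarray, normal: np.ndarray) -> np.ndarray:
    """Specular reflection of a direction vector about a plane with unit normal."""
    return direction - 2.0 * np.dot(direction, normal) * normal

# Three orthogonal faces of an idealized corner reflector (normals along x, y, z).
normals = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]), np.array([0.0, 0.0, 1.0])]

incoming = np.array([0.3, -0.7, 0.648])
incoming /= np.linalg.norm(incoming)

ray = incoming.copy()
for n in normals:          # one bounce off each orthogonal face
    ray = reflect(ray, n)

print(ray, -incoming)      # the outgoing ray equals the reversed incoming ray
```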

In some embodiments, the plurality of retroreflectors comprises a single integrated reflector unit that is detachably connected to the body. Additionally, or alternatively, one or more retroreflectors of the plurality of retroreflectors 500 is individually detachable from one or more other retroreflectors. The plurality of retroreflectors is configurable to provide 360 degrees of signal reflection surface area. Additionally, or alternatively, the plurality of retroreflectors is configurable to provide various ranges of total signal reflection surface area, including at least 90 degrees, at least 180 degrees, and/or at least 270 degrees of signal reflection surface area.

As shown in FIG. 5, each retroreflector of the plurality of retroreflectors is attached to the body such that an angle of reflection of each retroreflector is a unique angle of reflection that is non-overlapping with angles of reflection of at least two different retroreflectors in the plurality of retroreflectors. In some embodiments, a first angle of reflection of a first retroreflector of the plurality of retroreflectors overlaps with a second angle of reflection of a second retroreflector of the plurality of retroreflectors.

As shown in FIG. 5, each retroreflector of the plurality of retroreflectors is composed of the same material. However, it should be appreciated that each individual retroreflector and/or one or more reflective surfaces of an individual retroreflector is customizable, such that different reflective materials are used in different portions of the plurality of retroreflectors.

In either configuration (the same material or different materials), the material(s) used for the plurality of retroreflectors are more reflective than skin (i.e., the user’s arm and/or hand). The material(s) are also more reflective than the material(s) of other objects within the predetermined range. The retroreflectors are more reflective than other human and non-human objects in part because the cross section that the radar “sees” grows rapidly with the side length of an individual retroreflector. Thus, the retroreflector is tunable to overcome different-sized objects within a known environment in which the passive controller system is to be used. Because the effective cross-section of the retroreflector (e.g., as configured as a corner reflector) is larger than that of an object that may appear in the line-of-sight (e.g., a hand or forearm) between the passive controller system and the active sensor system, the active sensor system is still able to track the location of the passive controller, because the signal will still be reflected from the passive controller system at a greater magnitude than from the interfering object.
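
To make the size-versus-reflectivity argument concrete, the sketch below applies a textbook peak radar-cross-section approximation for a triangular trihedral corner reflector, sigma ≈ 4·pi·a^4 / (3·lambda^2). The edge lengths and the 60 GHz operating point are illustrative assumptions, not values taken from the patent.

```python
# Hedged illustration: textbook peak-RCS approximation for a triangular trihedral
# corner reflector, showing how strongly reflectivity grows with edge length a.
import math

C = 299_792_458.0  # speed of light, m/s

def trihedral_peak_rcs_m2(edge_m: float, freq_hz: float) -> float:
    """Peak radar cross section (m^2) of an ideal triangular trihedral reflector."""
    lam = C / freq_hz
    return 4.0 * math.pi * edge_m**4 / (3.0 * lam**2)

freq = 60e9  # 60 GHz, within the band discussed above (assumed operating point)
for edge_cm in (2, 3, 4):
    rcs = trihedral_peak_rcs_m2(edge_cm / 100.0, freq)
    print(f"edge {edge_cm} cm -> peak RCS ~ {rcs:.3f} m^2 ({10 * math.log10(rcs):.1f} dBsm)")
```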

In some embodiments, the material used to either form or coat the retroreflectors has a metal-level conductivity, or other conductivity of at least 10^5 Ω^−1 m^−1. Suitable material(s) for the retroreflector coating include, for example, platinum, gold, silver, and copper. However, other metals and reflective materials can also be used. In some instances, one or more of the retroreflectors are composed of a solid metal. In other instances, the retroreflector(s) are composed of a non-metal base material (e.g., a plastic) and are coated with a tape, laminate, or paint that is reflective or at least more reflective than the underlying base material.
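
As a quick check against the stated 10^5 Ω^−1 m^−1 threshold, the sketch below compares approximate handbook conductivities of the metals named above; the numeric values are standard reference figures, not data from the patent.

```python
# Approximate room-temperature conductivities (S/m, i.e., ohm^-1 m^-1) from standard
# reference tables; all of the named coating metals exceed the ~1e5 threshold by far.
conductivity_s_per_m = {
    "silver": 6.3e7,
    "copper": 6.0e7,
    "gold": 4.1e7,
    "platinum": 9.4e6,
}
THRESHOLD = 1e5
for material, sigma in conductivity_s_per_m.items():
    print(f"{material:<8} {sigma:.1e} S/m -> meets threshold: {sigma >= THRESHOLD}")
```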

In some embodiments, the plurality of connected retroreflectors comprises a retroreflector assembly that includes a plurality of adjacently positioned retroreflectors in direct contact with each adjacent retroreflector, so as not to leave any open gaps or spaces between the adjacent retroreflectors, such that the effective surface area of reflection provided by the connected retroreflectors is relatively continuous within at least a spherical range of reflectivity surrounding the retroreflector assembly.

FIGS. 6A-6B illustrate front views of various example embodiments of a passive controller system. As shown in FIG. 6A, the passive controller system 600 includes a controller body 602 comprising a handle base 604 and a top portion 606, wherein user input controls 608 are disposed on the controller body 602 on and/or near the top portion 606. In such embodiments, the passive controller system 600 includes a plurality of retroreflectors (not visible in FIG. 6A) that is encapsulated within a housing of the body such that the plurality of retroreflectors is not externally visible from the controller body 602. While not being visible, the retroreflectors are still capable of reflecting signals received at the controller (the incoming and reflected signals pass through the relatively non-reflective housing of the body). To enable such a configuration, the controller body is composed of a relatively non-reflective material, such as a non-metallic material like plastic.

As shown in FIG. 6B, passive controller system 601 includes a controller body 603 comprising a handle base 605 and a top portion 607, wherein user input controls 609 are disposed on the controller body 603 on and/or near the top portion 607. In such embodiments, the passive controller system 601 includes a plurality of retroreflectors that is only partially encapsulated within a housing of the body (and/or disposed in a retroreflector assembly 611 on top of the top portion 607 of the body 603) and such that the plurality of retroreflectors in the retroreflector assembly 611 are at least partially and/or totally externally visible from the controller body 603 so as to minimize any possible signal interference from the body, or separate electronic components within the body (e.g., control button processor, power supply, transceiver(s), sensors, etc.).

Attention will now be directed to FIG. 7A, which illustrates a cross-sectional view of an example embodiment of a passive controller system 700 with a plurality of retroreflectors housed inside a body of the passive controller system (e.g., passive controller system 600 of FIG. 6A). As shown in FIG. 7A, the passive controller system 700 includes a controller body 702 comprising a handle base 704 and a top portion 706 which are configured to entirely encapsulate passive retroreflectors.

As shown in this configuration, the passive controller system 700 includes a first plurality of retroreflectors 710 configured as a single discrete and integrally connected retroreflector assembly that provides at least 180 degrees of signal reflection surface area (in some instances, 270 degrees of signal reflection surface area and/or 360 degrees of signal reflection surface area) and a second plurality of retroreflectors that are physically separated from each other by at least a small space, such that they are not integrally connected together into a single integrated assembly. Instead, these retroreflectors are positioned and distributed throughout the handle base 704 to provide a desired coverage of signal reflection surface area based on their collective orientations and positioning within the base. For instance, retroreflector 712A, retroreflector 712B, retroreflector 712C, retroreflector 712D, and/or retroreflector 712E are each individually disposed and distributed within the handle base 704 of the controller body 702. Each retroreflector of the second plurality of retroreflectors is tunable to provide a different angle of reflection with respect to one or more other retroreflectors, as desired.

In other embodiments, not shown, some of the distributed retroreflectors within the base are in direct contact with each other. In some embodiments, the retroreflectors and/or retroreflector assemblies are redundant relative to other retroreflectors and/or retroreflector assemblies incorporated within and/or on the controller to provide overlapping and redundant signal reflection surface areas relative to one or more other retroreflectors of the controller. Such a configuration is beneficial to compensate for and mitigate situations where one or more signals are interfered with or blocked during use of the controller, based on objects being temporarily interposed between the passive controller and the active sensor system (e.g., a metal watch of a user).

In the current configuration, the controller body 702 is shown having empty space 714 which is configured as hollow space inside the controller body 702 in order to house the different retroreflectors. As shown in FIG. 7A, the passive controller system 700 omits any active sensor device and is capable of reflecting the radar signal with the plurality of retroreflectors to a receiver that translates reflected signals from the plurality of retroreflectors to determine a relative position and an orientation of the passive controller system relative to the receiver.

As mentioned earlier, the passive controller system omits any inertial measurement unit (IMU) in some instances. In other alternative embodiments, the passive controller system may optionally include an IMU, wherein the IMU is configured to selectively replace and/or supplement the tracking capability of the active sensor system and provide supporting orientation and position data to the active sensor in order to more accurately and precisely determine the orientation and position of the passive controller system relative to the active sensor system during certain detected conditions, e.g., based on user input, based on bad signal reception at the receivers, based on application requirements, etc.

Beneficially, in most embodiments, the controller is a completely passive tracking controller, such that all of the tracking components (i.e., the retroreflectors) utilized by the controller are unpowered components and do not require any specialized circuitry (even passive circuitry such as a passive RFID circuit), nor do they require specialized and powered printed circuit board (PCB) components or other processing chips within the controller itself.

Beneficially, because the retroreflectors are passive, the passive controller system 700 does not need to be synchronized and/or remain in network communication with the active sensor system for the tracking processes of the passive controller within a predetermined distance of the active sensor system. Additionally, due to the nature of the signals being used for the tracking, the passive controller does not have to remain within line-of-sight of the active sensor system, as is required for image-based tracking.

As described herein, the first and second pluralities of retroreflectors that are attached to and/or within the body of the passive controller are positioned on/in the body of the passive controller to provide 360 degrees of reflecting surface and/or up to 360 degrees of reflecting surface (e.g., <90 degrees of reflecting surface, >90 degrees of reflecting surface, >120 degrees of reflecting surface, >180 degrees of reflecting surface, >270 degrees of reflecting surface, and/or <360 degrees of reflecting surface).

The referenced angular range(s) of reflecting surface facilitate the reflection of the referenced radar signals that are transmitted from a signal source system (e.g., active sensor system and/or a related system) towards the retroreflectors and that are reflected back to a receiver (e.g., the active sensor system and/or related system), irrespective of the orientation of the body relative to the source of the radar signal within the predetermined distance.

Attention will now be directed to FIG. 7B, which illustrates a cross-sectional view of an example embodiment of a passive controller system 701 with a plurality of retroreflectors partially housed inside a body of the passive controller system and partially disposed outside of the body of the passive controller. As shown in FIG. 7B, the passive controller system 701 includes a controller body 703 comprising a handle base 705 and a top portion 707.

The passive controller system 701 includes a first plurality of retroreflectors 711 configured as a connected unit of retroreflectors providing at least 180 degrees of signal reflection surface area (in some instances, 360 degrees of signal reflection surface area) and a second plurality of retroreflectors (e.g., retroreflector 713A, retroreflector 713B, retroreflector 713C, retroreflector 713D, and/or retroreflector 713E) which are individually disposed and distributed within the handle base 705 of the controller body 703.

In some instances, at least one of the second plurality of retroreflectors is not in any direct contact with another retroreflector in the controller. In contrast, in this configuration, the first plurality of retroreflectors 711 comprises a single integrated reflector unit that is detachably connected to the body, in which each retroreflector has at least one side that is in direct planar contact with a different retroreflector of the plurality of retroreflectors.

Each retroreflector of the second plurality of retroreflectors is tunable to provide a different angle of reflection with respect to one or more other retroreflectors. As shown in FIG. 7B, the second plurality of retroreflectors is distributed throughout a handle base of the body and such that the plurality of retroreflectors includes at least two different retroreflectors that are connected to the body while being separated from each other by at least a space in the body or a physical structure of the body. Additionally, or alternatively, one or more retroflectors of the second plurality of retroreflectors is attachable to the handle base 705 at different locations of the handle base 705.

The controller body 703 is further shown having empty space 715A and empty space 715B, which are configured as hollow spaces inside the controller body 703 in order to house the different retroreflectors.

Notwithstanding the specific embodiments just described, it will be appreciated that the scope of the invention includes passive controllers that are configured with any combination of the first and second plurality of retroreflectors and retroreflector assemblies described above, and which may be exposed and visible, externally from the controller body 703, and/or that are encapsulated within the controller body and/or that are not visible externally from the controller body 703.

Attention will now be directed to FIGS. 8A-8C, which illustrate various views of different example embodiments of a multi-controller passive controller system that is trackable without line-of-sight with respect to the active sensor system. As shown in FIG. 8A, a passive controller system comprises a first remote control 802A (i.e., left hand controller) configured with a first plurality of retroreflectors and a second remote control 802B (i.e., right hand controller) configured with a second plurality of retroreflectors. The first remote control 802A and second remote control 802B are both within line of sight of the active sensor system 804 (depicted as an HMD).

As shown in FIG. 8B, the second remote control 802B is partially hidden behind the first remote control 802A, such that its line of sight with respect to the active sensor system 804 is interrupted by the first remote control 802A. However, because the signals that are transmitted from the active sensor system 804 are able to pass through physical objects that are less reflective than the retroreflectors of the passive control system, the active sensor system 804 is still able to detect signals reflected from the second remote control 802B in order to determine the orientation and position of both the first remote control 802A and the second remote control 802B.

As shown in FIG. 8C, the first remote control 802A is crossed over the second remote control 802B. However, the active sensor system 804 is configured to detect signals reflected from the respective remote controls and is able to track which remote control is the right-hand control and which remote control is the left-hand control, even when the positions appear to be switched relative to the active sensor system 804. In some instances, the active sensor system used to detect the signals reflected from the respective remote controls is able to track each remote control and identify it as the respective right or left controller due to a signal filtering technique that assumes that one remote control will not suddenly jump to a new location. Thus, the path of movement for each controller must be a continuous, smooth path without discontinuities in location. In this manner, the active sensor system is able to keep track of which remote control is which even if the controllers meet or overlap, as shown in FIG. 8C.
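
The patent does not disclose the filtering algorithm itself, but the continuity assumption it describes can be sketched as nearest-neighbour association with a maximum per-frame jump. The function names, controller positions, and the 0.15 m threshold below are illustrative assumptions only.

```python
# Hedged sketch: keep left/right identities by assigning each existing track to the
# nearest new detection, rejecting assignments that would imply a discontinuous jump.
import numpy as np

MAX_JUMP_M = 0.15  # assumed per-frame movement limit for a hand-held controller

def associate(tracks: dict, detections: list) -> dict:
    """Update 'left'/'right' track positions from unlabeled detections."""
    updated = dict(tracks)
    remaining = list(detections)
    for name, last_pos in tracks.items():
        if not remaining:
            break
        dists = [np.linalg.norm(np.asarray(d) - np.asarray(last_pos)) for d in remaining]
        i = int(np.argmin(dists))
        if dists[i] <= MAX_JUMP_M:      # continuity check: no sudden jumps allowed
            updated[name] = remaining.pop(i)
        # otherwise keep the previous estimate until a plausible detection appears
    return updated

tracks = {"left": (-0.2, 0.0, 0.5), "right": (0.2, 0.0, 0.5)}
detections = [(0.18, 0.02, 0.52), (-0.17, -0.01, 0.49)]  # controllers crossed in report order
print(associate(tracks, detections))  # identities are preserved despite the crossing
```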

Attention will now be directed to FIGS. 9A-9B, which illustrate various example embodiments of a passive controller system (e.g., remote control 902A and remote control 902B) configured to project different virtual objects (e.g., light saber 906 and/or steering wheel 908) associated with the passive controller system. As shown in FIGS. 9A-9B, the active sensor system 904 is configured to project a virtual object associated with the passive controller system in a pose and relative position within a mixed-reality environment virtually displayed to a user based on the orientation and the position of the passive controller system relative to the active sensor system.

In some embodiments, the plurality of retroreflectors of the passive controller system (and/or discrete retroreflectors) are detachable from the remote control and are attachable to a different remote control structured as a physical object (e.g., an actual light saber replica, or an actual remote control steering wheel), using different means of attachment (e.g., magnets, adhesives, Velcro® or other hook-and-loop fasteners, clips, threaded couplings, etc.).

Attention will now be directed to FIG. 10, which illustrates an example embodiment of an active sensor system 1004 configured as a head-mounted device used to determine a position and orientation of a passive controller system 1002 (e.g., plurality of retroreflectors) based on a signal generated from and reflected back to the active sensor system 1004 by the passive controller system. The active sensor system 1004 is contained within a headset that is configured to be worn by a user and such that the active sensor system 1004 is configured to track the relative orientation and the relative position of the passive controller system 1002 relative to the headset during use of the headset and the passive controller system.

The reflected signal is detected and tracked via computing system 1006 which is in communication with the active sensor system 1004. In some embodiments, the computing system 1006 includes a display showing the change in reflected signals in order to determine the orientation and position of the passive controller system 1002. In other instances, the computing system 1006 is integrated with the active sensor system 1004 within the HMD.

FIG. 11 illustrates an example embodiment of an active sensor system 1100 having a plurality of transmitters and a plurality of receivers. The active sensor system 1100 is configured to track a relative orientation and a relative position of a passive controller system within a predetermined distance of the active sensor system. The active sensor system comprises one or more monostatic transmitters (e.g., transmitter 1102A and/or transmitter 1102B) and is configured to transmit one or more signals within a spherical range.

The active sensor system 1100 also comprises one or more monostatic receivers (e.g., receiver 1104A, receiver 1104B, receiver 1104C, and/or receiver 1104D) configured to receive one or more signals reflected from a plurality of retroreflectors attached to the passive controller system, the plurality of retroreflectors being configured to reflect the one or more signals to the one or more monostatic receivers when the passive controller system is positioned within the predetermined distance of the active sensor system.

It should be appreciated that the active sensor system also comprises one or more processors (e.g., processor(s) 108 of FIG. 1) for processing the one or more signals reflected from the plurality of retroreflectors and received by the one or more monostatic receivers to determine the relative position and the relative orientation of the passive controller system relative to the active sensor system 1100.

As shown in FIG. 11, the one or more monostatic transmitters and one or more monostatic receivers are fixedly positioned with respect to each other in a portrait orientation. In some instances, each receiver comprises a signal reception field of view of about 54 degrees. In this manner, the plurality of receivers is able to cover at least 90 degrees field of view, even more preferably about 180 degrees field of view. In some instances, the plurality of receivers comprises a field of view that is equivalent to the signal transmission area. In some instances, the field of view is larger or greater than the signal transmission area.
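
As a rough tiling check (assuming non-overlapping, edge-to-edge receiver fields of view, which the patent does not require), two 54-degree receivers would cover 90 degrees and four would cover 180 degrees, which is consistent with the four receivers shown in FIG. 11.

```python
# Rough arithmetic sketch: how many ~54-degree receivers are needed to tile a target
# coverage angle, assuming ideal edge-to-edge (non-overlapping) fields of view.
import math

RX_FOV_DEG = 54.0

def receivers_needed(coverage_deg: float) -> int:
    return math.ceil(coverage_deg / RX_FOV_DEG)

for target in (90, 180):
    print(f"{target} deg coverage -> at least {receivers_needed(target)} receivers")
```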

In alternative embodiments, the receivers and transmitters are positioned in different configurations on the active sensor system relative to each other and/or include different quantities of transmitters (e.g., 1, 3, 4, or more than 4) and/or different quantities of receivers (e.g., 1, 2, 3, 5, or more than 5). In yet other alternative embodiments, one or more of the transmitters and/or receivers are distributed between different devices of the active sensor system (e.g., an HMD, wired peripheral, remote beacon, remote transceiver and/or other sensor system).

In the current embodiment, the active sensor system of FIG. 11 is configured as a monostatic radar. In this monostatic radar configuration, each receiver is independent of the other receivers, such that the signals being received are processed incoherently. Furthermore, the receiver and transmitter share an antenna. However, in some instances, the active sensor system is configured as a bistatic or multi-static radar. A multi-static radar comprises multiple spatially diverse monostatic or bistatic radar components with a shared area of coverage or field of view.

In some instances, each transmitter in the active sensor system sweeps frequencies from a low end to a high end within a predetermined range of frequencies that is based on the size of the passive controller system and the predetermined distance between the active sensor system and the passive controller system. These parameters also determine the power consumption of the active sensor system. The disclosed embodiments herein beneficially provide a low-power active sensor system. Furthermore, the active sensor system is also configured to perform the tracking algorithms which process the reflected signals that are received by the one or more receivers.
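
The patent only states that each transmitter sweeps a frequency range; one common realization of such a sweep is an FMCW-style chirp, in which target range follows from the beat frequency as R = c·f_beat·T_sweep / (2·B). The bandwidth, sweep time, and beat frequencies below are assumptions chosen to land in the arm's-length ranges discussed earlier, not values from the patent.

```python
# Hedged sketch: range recovery for an assumed FMCW-style frequency sweep.
C = 299_792_458.0  # speed of light, m/s

def range_from_beat(f_beat_hz: float, sweep_bw_hz: float, sweep_time_s: float) -> float:
    """Range implied by a beat frequency for a linear chirp of bandwidth B over time T."""
    return C * f_beat_hz * sweep_time_s / (2.0 * sweep_bw_hz)

B = 4e9      # assumed 4 GHz sweep bandwidth (e.g., 60-64 GHz)
T = 100e-6   # assumed 100 microsecond sweep
for f_beat in (27e3, 270e3):  # example beat frequencies
    print(f"beat {f_beat / 1e3:.0f} kHz -> range ~ {range_from_beat(f_beat, B, T):.3f} m")
```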

Attention will now be directed to FIG. 12A, which illustrates an example embodiment of a signal being generated from an active sensor system configured as a head-mounted device and being reflected back to the active sensor system by a passive controller system. FIG. 12A shows a user 1202 wearing an active sensor system 1204 configured as an HMD. The active sensor system 1204 is configured to transmit one or more signals 1206 that can be reflected back to the active sensor system 1204 (see reflected signal 1212) by a passive controller system 1208 being maneuvered by the user 1202 within a predetermined range 1210 relative to the active sensor system 1204.

In some embodiments, the predetermined range 1210 is a sphere, or part of a sphere (e.g., a hemisphere), having a predetermined radius. When there are physical limitations to a user’s environment, the predetermined range is automatically truncated to prevent the user from trying to move the passive controller system 1208 into the unavailable spaces. The active sensor system is able to detect the different large objects in the user’s environment (e.g., a wall, another person, a desk, etc.) and track the passive controller system within the portion of the environment that remains after truncation based on the detected objects.

The predetermined range 1210 is customizable based on the size and surface area of the passive controller system. For example, a larger signal reflection surface area on the retroreflectors will allow for a larger predetermined range. Where the passive controller system 1208 is held by a user, the range of motion is typically limited to the length of the user’s arms and the flexibility of the various arm joints. Thus, the passive controller system 1208 is beneficially tuned to be lightweight and inexpensive while being optimized for the motion range of a typical user (e.g., see FIG. 3).

Attention will now be directed to FIG. 12B, which illustrates an example embodiment of a signal being generated from an active sensor system configured as a beacon and being reflected back to the active sensor system by a passive controller system. FIG. 12B shows a user 1202 wearing an active sensor system 1204 configured as an HMD. The active sensor system 1204 is configured to transmit one or more signals 1206A that can be reflected back to the active sensor system 1204 (see reflected signal 1212A) by a passive controller system 1208 being maneuvered by the user 1202 within a predetermined range 1210 relative to the active sensor system 1204.

Additionally, or alternatively, the active sensor system 1204 comprises a beacon 1214 that is configured to transmit one or more signals 1206B which can be reflected back to the beacon 1214 by the passive controller system 1208 (see reflected signal(s) 1212B). The beacon is configurable as a stand-alone active sensor system, or in communication with an HMD active sensor system. In some embodiments, the beacon is attachable to a top portion of a display, such as a television being used to display a virtual reality. In other embodiments, the active sensor system 1204 comprises a plurality of beacons in order to track the passive controller system within a larger predetermined range 1210 or to accommodate for line-of-sight interruptions within the user’s environment.

Attention will now be directed to FIG. 13, which illustrates another example embodiment of a process flow diagram for tracking a passive controller system using an active sensor system. For example, an active sensor 1302 is configured to transmit a radar signal 1306 in a direction that will reach the passive controller 1304 within a particular distance from the active sensor (e.g., transmit radar signal 1306). Irrespective of the position and/or orientation of the passive controller 1304 within the particular distance, the passive controller 1304 is configured to reflect the radar signal 1310 (e.g., reflect radar signal 1308). The active sensor 1302 then receives the reflected radar signal (e.g., receive radar signal 1312). Based on the transmitted radar signal and the received radar signal, the active sensor 1302 is configured to determine the position (e.g., determine position 1314) and determine the orientation (e.g., determine orientation 1316) of the passive controller system relative to the active sensor 1302.

Attention will now be directed to FIG. 14, which illustrates an example embodiment of methods for performing the disclosed embodiments. Although the method acts may be discussed in a certain order or illustrated in a flow chart as occurring in a particular order, no ordering is required unless specifically stated, or required because an act is dependent on another act being completed prior to the act being performed.

FIG. 14, with some reference to FIG. 1, illustrates a flow diagram 1400 that includes various acts (act 1410, act 1420, and act 1430) associated with exemplary methods that can be implemented by computer system 1500 (see FIG. 15) for detecting an orientation and a position of a passive controller system relative to an active sensor system. As illustrated, the computing system first transmits the one or more signals from a plurality of monostatic transmitters of the active sensor system in a signal transmission area and a direction in which a passive controller system is located when the passive controller system is located within the predetermined distance of the active sensor system within the signal transmission area (act 1410).

The system then receives and detects the one or more reflected signals reflected back from a plurality of retroreflectors attached to the passive controller system (act 1420). The plurality of retroreflectors is configured on the passive controller system to reflect the one or more signals back to the active sensor system as the one or more reflected signals irrespective of orientation or position of the passive controller system when the passive controller system is positioned within the predetermined distance of the active sensor system within the signal transmission area.

The system determines the orientation and the position of the passive controller system relative to the active sensor system based on the one or more signals and the one or more reflected signals (act 1430). This determination is made by calculating the relative angles of the transmitted and detected reflected signals, as well as the timing of the signal transmissions and detected reflected signals. When there are multiple receivers being used, and multiple reflected signals are detected, the calculations can also include signal triangulation. When the originating/transmitted signals are transmitted from a first device and the reflected signals are detected by a second device (remote from the first device), the active sensor system (which incorporates both of the first and second device) can still calculate relative positioning of the passive controller based on an analysis of the transmitted signals and received/detected reflected signals.
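
The patent does not spell out the solver, but the triangulation step mentioned above can be sketched as a least-squares multilateration from per-receiver range estimates at known receiver positions. The receiver layout, initial guess, and noise-free measurements below are illustrative assumptions, not disclosed values.

```python
# Hedged sketch: Gauss-Newton multilateration of a single reflector position from
# range estimates measured at several known receiver locations on the headset.
import numpy as np

def multilaterate(rx_positions: np.ndarray, ranges: np.ndarray,
                  guess: np.ndarray, iters: int = 20) -> np.ndarray:
    """Refine the reflector position so predicted ranges match the measured ranges."""
    p = guess.astype(float)
    for _ in range(iters):
        diffs = p - rx_positions                 # (N, 3) vectors from receivers to estimate
        dists = np.linalg.norm(diffs, axis=1)    # predicted ranges at the current estimate
        J = diffs / dists[:, None]               # Jacobian of predicted range w.r.t. position
        residual = dists - ranges
        p = p - np.linalg.lstsq(J, residual, rcond=None)[0]   # Gauss-Newton step
    return p

# Assumed receiver layout on the front of an HMD (metres) and an assumed reflector position.
rx = np.array([[-0.08, 0.03, 0.0], [0.08, 0.03, 0.0],
               [-0.08, -0.03, 0.0], [0.08, -0.03, 0.0]])
true_p = np.array([0.10, -0.20, 0.45])
measured = np.linalg.norm(true_p - rx, axis=1)   # noise-free ranges, for the sketch only
print(multilaterate(rx, measured, guess=np.array([0.0, 0.0, 0.3])))  # ~ [0.10, -0.20, 0.45]
```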

In some instances, the calculation of the orientation and the position of the passive controller system relative to the active sensor system based on the one or more signals and the one or more reflected signals (act 1430) is performed by a single device (e.g., the HMD device or other device incorporating the active sensor system). In other embodiments, the signals are detected and provided to a server or remote system that performs the calculations to determine the relative orientation and position of the passive controller system relative to the active sensor system and/or the remote system (e.g., beacon).

In some embodiments, the system is also configured with and/or executes computer-readable instructions to configure the computing system to project a virtual object associated with the passive controller system based on the detected pose/orientation and/or position of the controller/controller system within a mixed-reality environment and/or based on the orientation and the position of the passive controller/controller system relative to the active sensor system.

In view of the foregoing, it will be appreciated that the disclosed embodiments provide many technical benefits over conventional systems and methods for tracking a passive controller system using retroreflectors configured to reflect a radio frequency signal back to a source of the radio frequency signal.

Example Computer/Computer Systems

Attention will now be directed to FIG. 15 which illustrates an example computer system 1500 that may include and/or be used to perform any of the operations described herein. Computer system 1500 may take various different forms. For example, computer system 1500 may be embodied as a tablet 1500A, a desktop or a laptop 1500B, a wearable device (e.g., head-mounted device 1500C), a drone 1500D, a vehicle or other mobile device (e.g., the active sensor system is able to be moved and guided through a space), a beacon 1500E (e.g., the active sensor system is external to a mixed-reality headset), a mixed-reality system device, and/or any other device, as illustrated by the ellipsis 1500F.

Computer system 1500 may also be configured as a standalone device or, alternatively, as a distributed system that includes one or more connected computing components/devices that are in communication with computer system 1500.

In its most basic configuration, computer system 1500 includes various different components. FIG. 15 shows that computer system 1500 includes one or more processor(s) 1502 (aka a “hardware processing unit”) and storage 1504.

Regarding the processor(s) 1502, it will be appreciated that the functionality described herein can be performed, at least in part, by one or more hardware logic components (e.g., the processor(s) 1502). For example, and without limitation, illustrative types of hardware logic components/processors that can be used include Field-Programmable Gate Arrays (“FPGA”), Program-Specific or Application-Specific Integrated Circuits (“ASIC”), Program-Specific Standard Products (“ASSP”), System-On-A-Chip Systems (“SOC”), Complex Programmable Logic Devices (“CPLD”), Central Processing Units (“CPU”), Graphical Processing Units (“GPU”), or any other type of programmable hardware.

As used herein, the terms “executable module,” “executable component,” “component,” “module,” or “engine” can refer to hardware processing units or to software objects, routines, or methods that may be executed on computer system 1500. The different components, modules, engines, and services described herein may be implemented as objects or processors that execute on computer system 1500 (e.g., as separate threads).

Storage 1504 may be physical system memory, which may be volatile, non-volatile, or some combination of the two. The term “memory” may also be used herein to refer to non-volatile mass storage such as physical storage media. If computer system 1500 is distributed, the processing, memory, and/or storage capability may be distributed as well.

Storage 1504 is shown as including executable instructions 1506. The executable instructions 1506 represent instructions that are executable by the processor(s) 1502 of computer system 1500 to perform the disclosed operations, such as those described in the various methods.

The disclosed embodiments may comprise or utilize a special-purpose or general-purpose computer including computer hardware, such as, for example, one or more processors (such as processor(s) 1502) and system memory (such as storage 1504), as discussed in greater detail below. Embodiments also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general-purpose or special-purpose computer system.

Computer-readable media that store computer-executable instructions in the form of data are physical or hardware computer storage media or device(s). Computer-readable media that merely carry computer-executable instructions are transitory media or transmission media. Thus, by way of example and not limitation, the current embodiments can comprise at least two distinctly different kinds of computer-readable media: (1) computer-readable hardware storage media and (2) transitory transmission media that does not include hardware storage.

The referenced computer storage device(s) (aka “hardware storage device(s)”) comprise hardware storage components/devices, such as RAM, ROM, EEPROM, CD-ROM, solid state drives (“SSD”) that are physical and tangible and that are based on RAM, Flash memory, phase-change memory (“PCM”), or other types of memory, or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired program code means in the form of computer-executable instructions, data, or data structures and that can be accessed by a general-purpose or special-purpose computer and which are distinguished from mere carrier waves and signals.

Computer system 1500 may also be connected (via a wired or wireless connection) to external sensors (e.g., one or more remote cameras) or devices via a network 1508. For example, computer system 1500 can communicate with any number of devices or cloud services to obtain or process data. In some cases, network 1508 may itself be a cloud network. Furthermore, computer system 1500 may also be connected through one or more wired or wireless networks 1508 to remote/separate computer systems(s) that are configured to perform any of the processing described with regard to computer system 1500.

A “network,” like network 1508, is defined as one or more data links and/or data switches that enable the transport of electronic data between computer systems, modules, and/or other electronic devices. When information is transferred, or provided, over a network (either hardwired, wireless, or a combination of hardwired and wireless) to a computer, the computer properly views the connection as a transmission medium. Computer system 1500 will include one or more communication channels that are used to communicate with the network 1508. Transmission media include a network that can be used to carry data or desired program code means in the form of computer-executable instructions or in the form of data structures. Further, these computer-executable instructions can be accessed by a general-purpose or special-purpose computer. Combinations of the above should also be included within the scope of computer-readable media.

Upon reaching various computer system components, program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to computer storage media (or vice versa). For example, computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a network interface card or “NIC”) and then eventually transferred to computer system RAM and/or to less volatile computer storage media at a computer system. Thus, it should be understood that computer storage media can be included in computer system components that also (or even primarily) utilize transmission media.

Computer-executable (or computer-interpretable) instructions comprise, for example, instructions that cause a general-purpose computer, special-purpose computer, or special-purpose processing device to perform a certain function or group of functions. The computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the described features or acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.

Those skilled in the art will appreciate that the embodiments may be practiced in network computing environments with many types of computer system configurations, including personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, pagers, routers, switches, and the like. The embodiments may also be practiced in distributed system environments where local and remote computer systems that are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network each perform tasks (e.g., cloud computing, cloud services and the like). In a distributed system environment, program modules may be located in both local and remote memory storage devices.

The present invention may be embodied in other specific forms without departing from its characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Microsoft Patent | Reprojecting holographic video to enhance streaming bandwidth/quality https://patent.nweon.com/27437 Thu, 16 Mar 2023 12:13:18 +0000 https://patent.nweon.com/?p=27437 ...

Patent: Reprojecting holographic video to enhance streaming bandwidth/quality

Patent PDF: 加入映维网会员获取

Publication Number: 20230082705

Publication Date: 2023-03-16

Assignee: Microsoft Technology Licensing

Abstract

Improved video compression and video streaming systems and methods are disclosed for environments where camera motion is common, such as cameras incorporated into head-mounted displays. This is accomplished by combining a 3D representation of the shape of the user’s environment (walls, floor, ceiling, furniture, etc.), image data, and data representative of changes in the location and orientation (pose) of the camera between successive image frames, thereby reducing data bandwidth needed to send streaming video in the presence of camera motion.
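
A minimal decoder-side sketch of the loop the abstract describes, under the assumptions that a depth-based image warp stands in for reprojecting the previous frame to the new camera pose and that a generic decompress() stands in for the codec; all function and parameter names are illustrative, not the patent's API.

```python
# Hedged decoder-side sketch: predict the current frame by reprojecting the previous
# frame to the current camera pose, then apply the residual differences carried in
# the compressed stream. 'warp' and 'decompress' are assumed, caller-supplied helpers.
import numpy as np

def reconstruct_current_frame(prev_image, depth_map, prev_pose, cur_pose,
                              compressed_diff, warp, decompress):
    """Reproject the previous frame to the current pose, then add the decoded residuals."""
    # Reproject: what the previous frame's content would look like from the new pose,
    # using the 3D shape of the environment (represented here by a depth map).
    predicted = warp(prev_image, depth_map, prev_pose, cur_pose)
    # The compressed stream only carries differences relative to this prediction.
    residual = decompress(compressed_diff)
    return np.clip(predicted + residual, 0, 255)
```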

Claims

1. 1.-20. (canceled)

21.A computer system comprising one or more processors and memory, the computer system being configured to perform operations comprising: generating, using a previous camera pose and a 3D representation of a shape of a user’s environment, a previous frame image as viewed from the previous camera pose; receiving compressed video data and camera pose data that defines a current camera pose different than the previous camera pose; generating, using the current camera pose and the 3D representation of the shape of the user’s environment, a reprojection of the previous frame image as if viewed from the current camera pose instead of the previous camera pose, the reprojection of the previous frame image being usable to predict a current frame image as viewed from the current camera pose; decompressing the compressed video data, the compressed video data defining differences relative to the reprojection of the previous frame image; and applying the differences to the reprojection of the previous frame image to generate the current frame image.

22.The computer system of claim 21, wherein the operations further comprise: causing the previous frame image to be rendered on a head-mounted display device; and causing the current frame image to be rendered on the head-mounted display device.

23.The computer system of claim 22, wherein the head-mounted display device includes see-through holographic lenses onto which the previous frame image and current frame image are projected.

24.The computer system of claim 21, wherein the computer system receives the compressed video data from a compressor that generates the compressed video data by calculating and compressing, for a version of the current frame image captured by a camera at the current camera pose, differences between the reprojection of the previous frame image and the version of the current frame image captured by the camera at the current camera pose.

25.The computer system of claim 21, wherein the compressed video data is received with surface (depth) information associated with the 3D representation of the shape of the user’s environment, the 3D representation of the shape of the user’s environment changing based at least in part on the surface (depth) information.

26.The computer system of claim 21, wherein the previous camera pose has a previous location and/or a previous orientation, and the current camera pose has a current location different than the previous location and/or has a current orientation different than the previous orientation.

27.The computer system of claim 21, wherein the generating the reprojection of the previous frame image includes: updating the 3D representation of the shape of the user’s environment based on the current camera pose; and rendering the updated 3D representation to generate the reprojection of the previous frame image.

28.A method for generating image data with compressed video data, the method being implemented by a computer system, the method comprising: generating, using a previous camera pose and a 3D representation of a shape of a user’s environment, a previous frame image as viewed from the previous camera pose; receiving compressed video data and camera pose data that defines a current camera pose different than the previous camera pose; generating, using the current camera pose and the 3D representation of the shape of the user’s environment, a reprojection of the previous frame image as if viewed from the current camera pose instead of the previous camera pose, the reprojection of the previous frame image being usable to predict a current frame image as viewed from the current camera pose; decompressing the compressed video data, the compressed video data defining differences relative to the reprojection of the previous frame image; and applying the differences to the reprojection of the previous frame image to generate the current frame image.

29.The method of claim 28, further comprising: causing the previous frame image to be rendered on a head-mounted display device; and causing the current frame image to be rendered on the head-mounted display device.

30.The method of claim 29, wherein the head-mounted display device includes see-through holographic lenses onto which the previous frame image and current frame image are projected.

31.The method of claim 28, wherein the computer system receives the compressed video data from a compressor that generates the compressed video data by calculating and compressing, for a version of the current frame image captured by a camera at the current camera pose, differences between the reprojection of the previous frame image and the version of the current frame image captured by the camera at the current camera pose.

32.The method of claim 28, wherein the compressed video data is received with surface (depth) information associated with the 3D representation of the shape of the user’s environment, the 3D representation of the shape of the user’s environment changing based at least in part on the surface (depth) information.

33.The method of claim 28, wherein the previous camera pose has a previous location and/or a previous orientation, and the current camera pose has a current location different than the previous location and/or has a current orientation different than the previous orientation.

34.The method of claim 28, wherein the generating the reprojection of the previous frame image includes: updating the 3D representation of the shape of the user’s environment based on the current camera pose; and rendering the updated 3D representation to generate the reprojection of the previous frame image.

35.One or more physical computer-readable storage media having stored thereon compressed video data and camera pose data that defines a current camera pose, the compressed video data defining differences relative to a reprojection of a previous frame image, the compressed video data and camera pose data being organized to facilitate reconstruction of a current frame image by operations comprising: generating, using a previous camera pose and a 3D representation of a shape of a user’s environment, the previous frame image as viewed from the previous camera pose, the previous camera pose being different than the current camera pose; generating, using the current camera pose and the 3D representation of the shape of the user’s environment, the reprojection of the previous frame image as if viewed from the current camera pose instead of the previous camera pose, the reprojection of the previous frame image being usable to predict the current frame image as viewed from the current camera pose; decompressing the compressed video data, thereby producing the differences relative to the reprojection of the previous frame image; and applying the differences to the reprojection of the previous frame image to generate the current frame image.

36.The one or more physical computer-readable storage media of claim 35, wherein the operations further comprise: causing the previous frame image to be rendered on a head-mounted display device; and causing the current frame image to be rendered on the head-mounted display device.

37.The one or more physical computer-readable storage media of claim 36, wherein the head-mounted display device includes see-through holographic lenses onto which the previous frame image and current frame image are projected.

38.The one or more physical computer-readable storage media of claim 35, wherein the compressed video data is received with surface (depth) information associated with the 3D representation of the shape of the user’s environment, the 3D representation of the shape of the user’s environment changing based at least in part on the surface (depth) information.

39.The one or more physical computer-readable storage media of claim 35, wherein the previous camera pose has a previous location and/or a previous orientation, and the current camera pose has a current location different than the previous location and/or has a current orientation different than the previous orientation.

40.The one or more physical computer-readable storage media of claim 35, wherein the generating the reprojection of the previous frame image includes: updating the 3D representation of the shape of the user’s environment based on the current camera pose; and rendering the updated 3D representation to generate the reprojection of the previous frame image.

Description

BACKGROUND

Background and Relevant Art

Mixed reality is a technology that allows virtual imagery to be mixed with a real world physical environment in a display. Systems for mixed reality may include, for example, see-through head-mounted display (HMD) devices or smart phones with built-in cameras. Such systems typically include processing units which provide the imagery under the control of one or more applications. Full virtual reality environments, in which no real world objects are viewable, can also be supported using HMDs and other devices.

Many HMDs also include one or more forward-facing cameras that capture the environment in front of the user as viewed from the user’s perspective. Such forward-facing cameras may be depth cameras, which not only capture image data, but also capture depth or surface data about the user’s environment. Image data captured from the forward-facing camera may be used by on-board processors located on the HMD to generate mixed reality or virtual reality display data that can be rendered to the user via the user display incorporated into the HMD. In addition, image data captured from the forward-facing camera can be compressed, via video compression algorithms, and transmitted to other devices for viewing by others at other locations. For example, in a training situation, video from the forward-facing camera(s) of a HMD can be transmitted to a trainer at a remote location so that the trainer can observe in almost real time what the trainee is doing, allowing the trainer to guide or instruct, via audio and/or video communications, the trainee on how to perform a specific task (all the while observing what the trainee is actually doing). Video quality is important to making this scenario work, but video quality is limited by the user’s wifi connection and Internet bandwidth. Technologically, Internet bandwidth is growing at a slower rate than the resolution of cameras and displays, so data compression can be very important to video streaming.

HMDs present some unique challenges in relation to video compression and streaming live video. As mentioned in the preceding paragraph, bandwidth limitations can be one challenge. In addition, many existing video compression techniques, such as P-frame and B-frame techniques, rely on the notion that much of the background of successive frames of video remain relatively static from one video frame to the next, so that only data relating to those pixels that actually change between one video frame and the next need to be transmitted. However, in the context of a HMD, head movement can result in much more frequent and greater changes in the image data, including the background, which reduces the effectiveness of known video compression techniques. Typical streaming video compression strategies are not tuned to accommodate a large amount of camera movement.

Thus, in situations where camera movement (i.e., changes in camera location and/or orientation) occurs frequently, such as in the context of HMDs, improved systems, devices and methods are needed for video compression and video streaming to reduce bandwidth requirements and/or improve resolution.

The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.

BRIEF SUMMARY

Improved video compression and video streaming systems and methods are disclosed for environments where camera motion is common, such as cameras incorporated into handheld devices, such as mobile phones, tablets, laptops, etc., or HMDs. This is accomplished by combining a 3D representation of the shape of the user’s environment (walls, floor, ceiling, furniture, etc.), image data, and data representative of changes in the location and orientation (pose) of the camera between successive image frames. The benefit of this is that it reduces data bandwidth needed to send streaming video in the presence of camera motion.

This technology applies to a holographic device with a video camera, depth sensor, and the ability to determine its position in 3D space. A streaming application running on the device can execute an image-warping 3D transformation applied to previous frames before passing them to a video compressor that computes P-frames (or B-frames). This makes use of the known 3D position of the camera and 3D geometry in front of the camera to reproject the previous frame based on the current location of the camera.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

Additional features and advantages will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the teachings herein. Features and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. Features of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth hereinafter.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to describe the manner in which the above-recited and other advantages and features can be obtained, a more particular description of the subject matter briefly described above will be rendered by reference to specific embodiments which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments and are not therefore to be considered to be limiting in scope, embodiments will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:

FIG. 1 is a functional block diagram of one illustrative example of an operating environment for implementing the systems and methods disclosed herein.

FIG. 2 is a schematic representation of one illustrative example of an HMD.

FIG. 3 is a general block diagram of one illustrative example of a teleconferencing system.

FIG. 4 is a schematic representation of a first FOV of a 3D environment when viewed from a first location through an HMD.

FIG. 5 is a schematic representation of a second FOV of the 3D environment when viewed from a second location through the HMD.

FIG. 6 is a functional block diagram illustrating an embodiment of the systems and methods disclosed herein.

FIG. 7 is a functional block diagram illustrating an embodiment of the systems and methods disclosed herein.

FIG. 8 is a functional block diagram illustrating an embodiment of the systems and methods disclosed herein.

DETAILED DESCRIPTION

The following discussion now refers to a number of systems, methods and method acts that may be performed. Although various method acts may be discussed in a certain order or illustrated in a flow chart as occurring in a particular order, no particular ordering is required unless specifically stated, or required because an act is dependent on another act being completed prior to the act being performed.

Technology is disclosed herein for providing improved video compression and video streaming systems and methods that are particularly suited for environments where camera motion is common, such as cameras incorporated into handheld devices, such as mobile phones, tablets, laptops, etc., or HMDs. This is accomplished by combining a 3D representation of the shape of the user’s environment (walls, floor, ceiling, furniture, etc.), image data, and data representative of changes in the location and orientation (pose) of the camera between successive image frames. The benefit of this is that it reduces data bandwidth needed to send streaming video in the presence of camera motion.

One embodiment disclosed herein includes video compression systems and methods that can include: capturing successive frames of video from a camera incorporated into a first device, the successive frames of video comprising a previous frame image and a current frame image and including surface (depth) information; constructing a first 3D representation of the shape of a user’s environment based on the surface (depth) information of the previous frame image and a first camera location and orientation (pose) corresponding to the previous frame image; projecting the previous frame image onto the first 3D representation; detecting a change in the location and orientation (pose) of the camera between the previous frame image and the current frame image and generating current frame camera pose data representative of the location and orientation (pose) of the camera for the current frame image; constructing a second 3D representation of the shape of the user’s environment based on a second camera location and orientation (pose) corresponding to the current frame image, the second 3D representation corresponding to the shape of the user’s environment as if it were viewed from the second location and orientation (pose) of the camera, and rendering the second 3D representation to generate a re-projected previous frame image as viewed from the second location and orientation (pose) of the camera; passing the re-projected previous frame image to a video compressor for computing differences between the re-projected previous frame image and the actual current frame image; and generating compressed video data comprising only the differences between the re-projected previous frame image and the actual current frame image.
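A minimal sketch of the final differencing step is shown below, assuming the re-projected previous frame image has already been rendered from the current camera pose; zlib stands in for a real P-frame/B-frame encoder, and the function name is illustrative rather than part of any actual product API.

import zlib
import numpy as np

def compress_residual(reprojected_prev: np.ndarray, current: np.ndarray) -> bytes:
    """Encode only the differences between the re-projected previous frame
    and the captured current frame (both HxWx3 uint8 arrays). The receiver
    adds the decompressed residual back onto the same reprojection."""
    residual = current.astype(np.int16) - reprojected_prev.astype(np.int16)
    return zlib.compress(residual.tobytes())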

Another embodiment includes video streaming systems and methods that can decode compressed video data by: receiving at a second device, compressed video data and current frame camera pose data; constructing, by the second device, a 3D representation of the shape of the user’s environment based on the received current frame camera pose data and rendering the 3D representation to generate a re-projected previous frame image data; applying the received compressed video data to the re-projected previous frame image data to generate current frame image data.

In yet another embodiment, the compressed video data can be generated by a first device, in the manner described herein, and then compressed video data communicated to a second device for decompression in the manner described herein.

These systems and methods will be described below in the context of a head-mounted augmented or mixed reality (AR) display, such as the Microsoft HoloLens. It should be understood, however, that the systems and methods described and claimed herein are not limited to HMDs, the HoloLens or any other specific device, but may be adapted to any device that is capable of capturing images, generating 3D representations of the shape of the user’s environment (walls, floor, ceiling, furniture, etc.), and tracking the location and orientation (pose) of the device.

Exemplary Operating Environment

FIG. 1 is a block diagram of one embodiment of a networked computing environment 100 in which the disclosed technology may be practiced. Networked computing environment 100 includes a plurality of computing devices interconnected through one or more networks 180. The one or more networks 180 allow a particular computing device to connect to and communicate with another computing device. The depicted computing devices include mobile device 11, mobile device 12, mobile device 19, and server 15. In some embodiments, the plurality of computing devices may include other computing devices not shown. In some embodiments, the plurality of computing devices may include more than or less than the number of computing devices shown in FIG. 1. The one or more networks 180 may include a secure network such as an enterprise private network, an unsecure network such as a wireless open network, a local area network (LAN), a wide area network (WAN), and the Internet. Each network of the one or more networks 180 may include hubs, bridges, routers, switches, and wired transmission media such as a wired network or direct-wired connection.

Server 15, which may comprise a supplemental information server or an application server, may allow a client to download information (e.g., text, audio, image, and video files) from the server or to perform a search query related to particular information stored on the server. In general, a “server” may include a hardware device that acts as the host in a client-server relationship or a software process that shares a resource with or performs work for one or more clients. Communication between computing devices in a client-server relationship may be initiated by a client sending a request to the server asking for access to a particular resource or for particular work to be performed. The server may subsequently perform the actions requested and send a response back to the client.

One embodiment of server 15 includes a network interface 155, processor 156, memory 157, and translator 158, all in communication with each other. Network interface 155 allows server 15 to connect to one or more networks 180. Network interface 155 may include a wireless network interface, a modem, and/or a wired network interface. Processor 156 allows server 15 to execute computer readable instructions stored in memory 157 in order to perform processes discussed herein. Translator 158 may include mapping logic for translating a first file of a first file format into a corresponding second file of a second file format (i.e., the second file may be a translated version of the first file). Translator 158 may be configured using file mapping instructions that provide instructions for mapping files of a first file format (or portions thereof) into corresponding files of a second file format.

One embodiment of mobile device 19 includes a network interface 145, processor 146, memory 147, camera 148, sensors 149, and display 150, all in communication with each other. Network interface 145 allows mobile device 19 to connect to one or more networks 180. Network interface 145 may include a wireless network interface, a modem, and/or a wired network interface. Processor 146 allows mobile device 19 to execute computer readable instructions stored in memory 147 in order to perform processes discussed herein. Camera 148 may capture color images and/or depth images of an environment. The mobile device 19 may include outward facing cameras that capture images of the environment and inward facing cameras that capture images of the end user of the mobile device. Sensors 149 may generate motion and/or orientation information associated with mobile device 19. In some cases, sensors 149 may comprise an inertial measurement unit (IMU). Display 150 may display digital images and/or videos. Display 150 may comprise a see-through display. Display 150 may comprise an LED or OLED display.

In some embodiments, various components of mobile device 19 including the network interface 145, processor 146, memory 147, camera 148, and sensors 149 may be integrated on a single chip substrate. In one example, the network interface 145, processor 146, memory 147, camera 148, and sensors 149 may be integrated as a system on a chip (SOC). In other embodiments, the network interface 145, processor 146, memory 147, camera 148, and sensors 149 may be integrated within a single package.

In some embodiments, mobile device 19 may provide a natural user interface (NUI) by employing camera 148, sensors 149, and gesture recognition software running on processor 146. With a natural user interface, a person’s body parts and movements may be detected, interpreted, and used to control various aspects of a computing application. In one example, a computing device utilizing a natural user interface may infer the intent of a person interacting with the computing device (e.g., that the end user has performed a particular gesture in order to control the computing device).

Networked computing environment 100 may provide a cloud computing environment for one or more computing devices. Cloud computing refers to Internet-based computing, wherein shared resources, software, and/or information are provided to one or more computing devices on-demand via the Internet (or other global network). The term “cloud” is used as a metaphor for the Internet, based on the cloud drawings used in computer networking diagrams to depict the Internet as an abstraction of the underlying infrastructure it represents.

In one example, mobile device 19 comprises a HMD that provides an augmented reality environment or a mixed reality environment to an end user of the HMD. The HMD may comprise a video see-through and/or an optical see-through system. An optical see-through HMD worn by an end user may allow actual direct viewing of a real-world environment (e.g., via transparent lenses) and may, at the same time, project images of a virtual object into the visual field of the end user thereby augmenting the real-world environment perceived by the end user with the virtual object.

Utilizing an HMD, an end user may move around a real-world environment (e.g., a living room) wearing the HMD and perceive views of the real-world overlaid with images of virtual objects. The virtual objects may appear to maintain coherent spatial relationship with the real-world environment (i.e., as the end user turns their head or moves within the real-world environment, the images displayed to the end user will change such that the virtual objects appear to exist within the real-world environment as perceived by the end user). The virtual objects may also appear fixed with respect to the end user’s point of view (e.g., a virtual menu that always appears in the top right corner of the end user’s point of view regardless of how the end user turns their head or moves within the real-world environment). In one embodiment, environmental mapping of the real-world environment may be performed by server 15 (i.e., on the server side) while camera localization may be performed on mobile device 19 (i.e., on the client side). The virtual objects may include a text description associated with a real-world object.

In some embodiments, a mobile device, such as mobile device 19, may be in communication with a server in the cloud, such as server 15, and may provide to the server location information (e.g., the location of the mobile device via GPS coordinates) and/or image information (e.g., information regarding objects detected within a field of view of the mobile device) associated with the mobile device. In response, the server may transmit to the mobile device one or more virtual objects based upon the location information and/or image information provided to the server. In one embodiment, the mobile device 19 may specify a particular file format for receiving the one or more virtual objects and server 15 may transmit to the mobile device 19 the one or more virtual objects embodied within a file of the particular file format.

In some embodiments, an HMD, such as mobile device 19, may use images of an environment captured from an outward facing camera in order to determine a six degree of freedom (6DOF) pose corresponding with the images relative to a 3D model of the environment. The 6DOF pose may comprise information associated with the position and orientation of the HMD within the environment. The 6DOF pose may be used for localizing the HMD and for generating images of virtual objects such that the virtual objects appear to exist at appropriate locations within the environment. More information regarding determining a 6DOF pose can be found in U.S. patent application Ser. No. 13/152,220, “Distributed Asynchronous Localization and Mapping for Augmented Reality,” incorporated herein by reference in its entirety. More information regarding performing pose estimation and/or localization for a mobile device can be found in U.S. patent application Ser. No. 13/017,474, “Mobile Camera Localization Using Depth Maps,” incorporated herein by reference in its entirety.

In some embodiments, an HMD, such as mobile device 19, may display images of virtual objects within an augmented reality (AR) environment at a frame rate that is greater than a rendering frame rate for the core rendering pipeline or rendering graphics processing unit (GPU). The HMD may modify pre-rendered images or forward predicted images that are rendered at the rendering frame rate based on updated pose estimates that are provided at a higher frequency than the rendering frame rate. In some embodiments, the HMD may generate the pre-rendered image based on a predicted pose at the rendering frame rate (e.g., every 16 ms), determine one or more updated poses associated with the HMD subsequent to generating the pre-rendered image (e.g., every 2 ms), generate one or more updated images based on the one or more updated poses and the pre-rendered image, and display the one or more updated images on the HMD. In some cases, the one or more updated images may be generated via homographic transformations and/or a pixel offset adjustments using circuitry within the display, such as display 150.
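For illustration only, the following sketch applies a rotation-only late-stage correction of this kind, assuming a pinhole intrinsic matrix K and a small rotation between the pose used for rendering and the latest pose estimate; translation, per-scan-line application and display-circuitry details are omitted, and the function names are assumptions made for this sketch.

import numpy as np

def rotation_homography(K: np.ndarray, r_delta: np.ndarray) -> np.ndarray:
    """Homography mapping rendered-pose pixels to updated-pose pixels for a
    pure rotation r_delta (updated-from-rendered, in camera coordinates)."""
    return K @ r_delta @ np.linalg.inv(K)

def warp_image(image: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Inverse-map every output pixel through h and sample nearest neighbor.
    Out-of-view pixels are clamped to the border for brevity."""
    height, width = image.shape[:2]
    ys, xs = np.mgrid[0:height, 0:width]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    src = np.linalg.inv(h) @ pix
    sx = np.round(src[0] / src[2]).astype(int).clip(0, width - 1)
    sy = np.round(src[1] / src[2]).astype(int).clip(0, height - 1)
    return image[sy, sx].reshape(image.shape)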

FIG. 2 depicts one embodiment of a portion of an HMD 200, such as mobile device 19 in FIG. 1. Only the right side of HMD 200 is depicted. HMD 200 includes right temple 202, nose bridge 204, eye glass 216, and eye glass frame 214. Right temple 202 includes a capture device 213 (e.g., a front facing camera and/or microphone) in communication with processing unit 236. The capture device 213 may include one or more cameras for recording digital images and/or videos and may transmit the visual recordings to processing unit 236. The one or more cameras may capture color information, IR information, and/or depth information. The capture device 213 may also include one or more microphones for recording sounds and may transmit the audio recordings to processing unit 236.

Right temple 202 also includes biometric sensor 220, eye tracking system 221, ear phones 230, motion and orientation sensor 238, GPS receiver 232, power supply 239, and wireless interface 237, all in communication with processing unit 236. Biometric sensor 220 may include one or more electrodes for determining a pulse or heart rate associated with an end user of HMD 200 and a temperature sensor for determining a body temperature associated with the end user of HMD 200. In one embodiment, biometric sensor 220 includes a pulse rate measuring sensor which presses against the temple of the end user. Motion and orientation sensor 238 may include a three axis magnetometer, a three axis gyro, and/or a three axis accelerometer. In one embodiment, the motion and orientation sensor 238 may comprise an inertial measurement unit (IMU). The GPS receiver may determine a GPS location associated with HMD 200. Processing unit 236 may include one or more processors and a memory for storing computer readable instructions to be executed on the one or more processors. The memory may also store other types of data to be executed on the one or more processors.

In one embodiment, the eye tracking system 221 may include one or more inward facing cameras. In another embodiment, the eye tracking system 221 may comprise an eye tracking illumination source and an associated eye tracking image sensor. In one embodiment, the eye tracking illumination source may include one or more infrared (IR) emitters such as an infrared light emitting diode (LED) or a laser (e.g. VCSEL) emitting about a predetermined IR wavelength or a range of wavelengths. In some embodiments, the eye tracking sensor may include an IR camera or an IR position sensitive detector (PSD) for tracking glint positions. More information about eye tracking systems can be found in U.S. Pat. No. 7,401,920, entitled “Head Mounted Eye Tracking and Display System”, issued Jul. 22, 2008, and U.S. patent application Ser. No. 13/245,700, entitled “Integrated Eye Tracking and Display System,” filed Sep. 26, 2011, both of which are herein incorporated by reference.

In one embodiment, eye glass 216 may comprise a see-through display, whereby images generated by processing unit 236 may be projected and/or displayed on the see-through display. The see-through display may display images of virtual objects by modulating light provided to the display, such as a liquid crystal on silicon (LCOS) display, or by generating light within the display, such as an OLED display. The capture device 213 may be calibrated such that a field of view (FOV) captured by the capture device 213 corresponds with the FOV as seen by an end user of HMD 200. The ear phones 230 may be used to output sounds associated with the projected images of virtual objects. In some embodiments, HMD 200 may include two or more front facing cameras (e.g., one on each temple) in order to obtain depth from stereo information associated with the FOV captured by the front facing cameras. The two or more front facing cameras may also comprise 3D, IR, and/or RGB cameras. Depth information may also be acquired from a single camera utilizing depth from motion techniques. For example, two images may be acquired from the single camera associated with two different points in space at different points in time. Parallax calculations may then be performed given position information regarding the two different points in space.
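The parallax computation referred to above is, in the rectified case, the standard stereo triangulation relation, sketched numerically below with illustrative values rather than actual device calibration.

# Depth from disparity for rectified views: Z = f * B / d, where f is the
# focal length in pixels, B the baseline between viewpoints in meters, and
# d the measured parallax (disparity) in pixels.
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    return focal_px * baseline_m / disparity_px

# Example: a 700 px focal length, 10 cm baseline and 14 px of parallax place
# the point at roughly 5 meters from the camera.
print(depth_from_disparity(700.0, 0.10, 14.0))  # 5.0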

In some embodiments, HMD 200 may perform gaze detection for each eye of an end user’s eyes using gaze detection elements and a three-dimensional coordinate system in relation to one or more human eye elements such as a cornea center, a center of eyeball rotation, or a pupil center. Gaze detection may be used to identify where the end user is focusing within a FOV. Examples of gaze detection elements may include glint generating illuminators and sensors for capturing data representing the generated glints. In some cases, the center of the cornea can be determined based on two glints using planar geometry. The center of the cornea links the pupil center and the center of rotation of the eyeball, which may be treated as a fixed location for determining an optical axis of the end user’s eye at a certain gaze or viewing angle.

Microsoft HoloLens

One example of a HMD is the Microsoft HoloLens, which is a pair of mixed reality head-mounted smartglasses. HoloLens has see-through holographic lenses that use an advanced optical projection system to generate multi-dimensional full-color holograms with very low latency so a user can see holographic objects in a real world setting.

Located at the front of the HoloLens are sensors and related hardware, including cameras and processors. The HoloLens also incorporates an inertial measurement unit (IMU), which includes an accelerometer, gyroscope, and a magnetometer, four “environment understanding” sensors, an energy-efficient depth camera with a 120°×120° angle of view, a forward-facing 2.4-megapixel photographic video camera, a four-microphone array, and an ambient light sensor. HoloLens contains advanced sensors to capture information about what the user is doing and the environment the user is in. The built in cameras also enable a user to record (mixed reality capture (MRC)) HD pictures and video of the holograms in the surrounding world to share with others.

Enclosed within the visor is a pair of transparent combiner lenses, in which the projected images are displayed in the lower half. The HoloLens must be calibrated to the interpupillary distance (IPD), or accustomed vision of the user.

Along the bottom edges of the side, located near the user’s ears, are a pair of small, 3D audio speakers. The speakers do not obstruct external sounds, allowing the user to hear virtual sounds, along with the environment. Using head-related transfer functions, the HoloLens generates binaural audio, which can simulate spatial effects; meaning the user, virtually, can perceive and locate a sound, as though it is coming from a virtual pinpoint or location.

On the top edge are two pairs of buttons: display brightness buttons above the left ear, and volume buttons above the right ear. Adjacent buttons are shaped differently (one concave, one convex) so that the user can distinguish them by touch.

At the end of the left arm is a power button and a row of five small individual LED nodes, used to indicate system status, as well as for power management, indicating battery level and setting power/standby mode. A USB 2.0 micro-B receptacle is located along the bottom edge. A 3.5 mm audio jack is located along the bottom edge of the right arm.

In addition to a central processing unit (CPU) and GPU, HoloLens features a custom-made Microsoft Holographic Processing Unit (HPU), a coprocessor manufactured specifically for the HoloLens. The main purpose of the HPU is processing and integrating data from the sensors, as well as handling tasks such as spatial mapping, gesture recognition, and voice and speech recognition. The HPU processes terabytes of real-time information from the HoloLens’s sensors.

The lenses of the HoloLens use optical waveguides to display blue, green, and red across three different layers, each with diffractive features. A light engine above each combiner lens projects light into the lens; the light then hits a diffractive element and is reflected repeatedly along a waveguide until it is output to the eye. As with many other optical head-mounted displays, the display projection for the HoloLens occupies a limited portion of the user’s FOV, particularly in comparison to virtual reality head-mounted displays, which typically cover a much greater FOV.

The HoloLens contains an internal rechargeable battery, but can be operated while charging. HoloLens also features IEEE 802.11ac Wi-Fi and Bluetooth 4.1 Low Energy (LE) wireless connectivity.

With HoloLens a user can create and shape holograms with gestures, communicate with apps using voice commands, and navigate with a glance, hand gestures, Controllers and/or other pointing devices. HoloLens understands gestures, gaze, and voice, enabling the user to interact in the most natural way possible. With spatial sound, HoloLens synthesizes sound so the user can hear holograms from anywhere in the room, even if they are behind the user.

Additional details about the HoloLens are provided in U.S. Patent Application Ser. No. 62/029,351, filed Jul. 25, 2014, and entitled “Head Mounted Display Apparatus,” which is incorporated herein by reference.

As mentioned above, the HoloLens includes a depth camera, which is capable of detecting the 3D location of objects located within the depth camera’s FOV. Technical details of exactly how the depth camera accomplishes such detection are known to those skilled in the art, but are not necessary for the present disclosure. Suffice it to say that the depth camera is able to accurately detect, on a pixel-by-pixel basis, the exact 3D location of each point on a physical object within the camera’s FOV. While the HoloLens uses a depth camera, stereoscopic optics can also be used to detect the distance of objects from the HMD and the locations of such objects in 3D space via triangulation. In either event, such sensors can detect the 3D location (x, y and z coordinates) of real objects located within the FOV relative to the HMD. In the case of a Controller, the depth camera of the HMD can be used to detect the 3D location of the Controller relative to the HMD.

Referring to FIG. 3, in another embodiment, the augmented reality features of an HMD can be combined with commercially available videoconferencing services, such as Skype, to provide a unique communications experience (“HoloSkype”) that is particularly suited for training and other interactive activities between two or more users at different locations. Utilizing the forward facing depth cameras integrated into an HMD, a person located at a remote location can observe the environment of an HMD user and provide real time feedback, instructions and directions to the HMD user for performing tasks on objects that are located within the HMD user’s field of vision. For example, as schematically illustrated in FIG. 3, HMD 19 may communicate audio and video data via one or more network(s) 180 to another person operating a computing device at another, remote location (“remote device 300”). Remote device 300 can be a conventional desktop computer, laptop, tablet, smart phone, or any other computing device that is capable of receiving and rendering audio and/or video data via network(s) 180. In addition, by using a camera, microphone and other input devices incorporated in remote device 300, audio and/or video of the person located at the remote location can be transmitted from remote device 300 via network(s) 180 back to HMD 19 for rendering and display to the HMD user in substantially real time (which can be layered over a portion of the real world environment within the HMD user’s FOV). In addition, via the HoloSkype interface, the remote device 300 can transmit instructions to HMD 19 that cause HMD 19 to render holographic images, such as written instructions and/or drawings, within the HMD user’s FOV.

In order to stream video between HMD 19 and remote device 300, various data compression methods are commonly used to reduce data bandwidth and/or increase resolution. However, as previously discussed, conventional data compressions systems and methods tend to be less efficient in environments in which camera movement is common, as is frequently the case with HMDs.

Video is commonly compressed using P-frames (or B-frames). This reduces the amount of data that needs to be sent because only differences from a previous frame need to be sent. Areas of the image that are not moving do not require any additional data to be sent. Conventional P-frame or B-frame video compression works best if the camera is stationary. When the camera is moving, every pixel of the image can appear to change from one frame to the next. In order to handle motion, additional information is needed about how pixels are moving. For example, if the camera is moving to the right, one way to process the image could be to shift everything right by a certain number of pixels and then compute the differences based on the shifted image. Of course, it is not that simple, because pixels at different depths move by different amounts. A HoloLens user may rotate their head, move forward (zooming pixels outward), or move sideways (moving pixels by a different parallax amount at different depths).
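The sketch below illustrates such a single global shift, with illustrative names only; it helps when the whole scene sits at roughly one depth, and the residual that remains at other depths is exactly why the depth-aware reprojection described later is preferable.

import numpy as np

def shifted_residual(prev: np.ndarray, current: np.ndarray, dx: int) -> np.ndarray:
    """Shift the previous grayscale frame horizontally by dx pixels (zero-filling
    the newly exposed columns) and return the residual against the current frame."""
    shifted = np.zeros_like(prev)
    if dx > 0:
        shifted[:, dx:] = prev[:, :-dx]
    elif dx < 0:
        shifted[:, :dx] = prev[:, -dx:]
    else:
        shifted[:] = prev
    return current.astype(np.int16) - shifted.astype(np.int16)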

FIG. 4 illustrates an augmented reality configuration of an HMD device 200 worn by a user 400, displaying a virtual cursor, which is a holographic cursor 402 in this example, on an at least partially see-through stereoscopic display of HMD 200 so as to appear at a location 404 in a three dimensional environment 406. In the specific example shown in FIG. 4, the three dimensional environment 406 is a room in the real world, and the holographic cursor 402 is displayed on the at least partially see-through stereoscopic display such that the holographic cursor 402 appears to the user 400 to be hovering in the middle of the room at the location 404.

Additionally, in the example illustrated in FIG. 4, the forward facing depth cameras of the HMD 200 worn by the user 400 captures image data of the three dimensional environment 406, which may include the physical layout of the room, actual tangible objects located in the room, such as objects 412 and 414, and their respective locations relative to one another and relative to HMD 200. As mentioned previously, HMD 200 creates a detailed 3D textured model of three dimensional environment 406, which is continuously updated over time. In addition, for each frame of video captured by the depth cameras, each pixel of image data is mapped to its corresponding location on the 3D model. This mapping of image pixels to the 3D textured model or mesh also makes it possible to perform homographic transformations to reproject an image so that it appears as if it were viewed by the user 400 from a different relative location. The graphics processing unit (GPU) can execute such mappings and transformations with high performance.

For example, FIG. 4 schematically illustrates user 400 at time T1 and at position P1. At that time, a first frame of image data (I1) and depth data (D1) is captured by the forward facing depth cameras of HMD 200. For purposes of this discussion, the image data associated with I1 may be referred to as simply I1, image data 1 or the previous frame image. As discussed elsewhere, HMD 200 can create a 3D textured model or mesh of the environment within the user’s FOV using the depth data D1. In addition, HMD 200 maps the image data I1 to the 3D model or mesh (D1). By virtue of the mapping of the image data I1 to the 3D model or mesh D1, it is possible to reproject, re-draw or re-render I1 from a different viewing location to create a reprojected previous frame image.

Such reprojection can be accomplished via known methods of image transformation or reprojection techniques of varying computational complexity. For example, image reprojection techniques may include texturing the mesh and rendering it from a new camera position, per pixel reprojection (e.g., where each pixel of a rendered image is reprojected based on an updated pose), multi-plane homography (e.g., where multiple rendered images associated with multiple planes within a 3D scene are used to generate the composite updated image), single plane homography (e.g., where a single rendered image associated with a single plane within a 3D scene is used to generate the updated image), affine homography, and pixel offset based adjustments. The 2D plane (or a set of one or more 2D planes) within a 3D scene may be determined based on which virtual objects the end user of an HMD has been focusing on within a particular period of time. In one example, eye tracking may be used to determine the most frequently viewed virtual objects within the particular period of time (e.g., within the previous 50 ms or 500 ms). In the case of a single plane, the single plane may be selected based on a depth of the most frequently viewed virtual object within the particular period of time (i.e., the single plane may be set based on the location of the most frequently viewed virtual object within the augmented reality environment). In the case of multiple planes, virtual objects within an augmented reality environment may be segmented into a plurality of groups based on proximity to the multiple planes; for example, a first virtual object may be mapped to a near plane if the near plane is the closest plane to the first virtual object and a second virtual object may be mapped to a far plane if the far plane is the closest plane to the second virtual object. A first rendered image may then be generated including the first virtual object based on the near plane and a second rendered image may be generated including the second virtual object based on the far plane. Also, different graphical adjustments may be performed on different portions of a pre-rendered image in order to incorporate higher frequency pose estimates. In one example, a first homographic transformation associated with a first pose of an HMD at a first point in time may be applied to a first portion of the pre-rendered image (e.g., a top portion of the pre-rendered image) and a second homographic transformation associated with a second pose of the HMD at a second point in time subsequent to the first point in time may be applied to a second portion of the pre-rendered image different from the first portion (e.g., a bottom portion of the pre-rendered image). In the case of a scanning display or a progressive scanning display, the first homographic transformation may be applied to pixels associated with a first set of scan lines and the second homographic transformation may be applied to pixels associated with a second set of scan lines different from the first set of scan lines. In one embodiment, the first homographic transformation may be applied to a single first scan line and the second homographic transformation may be applied to a single second scan line (i.e., homographic transformations may be applied on a per scan line basis). Another alternative could be to render 3D surface geometry into a Z-buffer with the same projection as the image camera. Then, reprojection could be applied as a screenspace operation. 
Each pixel of the image has color (from the camera) and depth (from the z-buffer). This is used to warp the image to a new perspective based on the 3D position of each pixel. In any event, any suitable method or means of performing such a transformation can be used with the systems and methods disclosed herein.
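As a rough illustration of this screenspace approach, the sketch below warps a color image with per-pixel depth from the previous camera pose to the current one, assuming a pinhole intrinsic matrix and 4x4 camera-to-world pose matrices; forward splatting without hole filling or occlusion resolution keeps it short, and the function name is illustrative only.

import numpy as np

def reproject_depth_image(color, depth, K, cam_to_world_prev, cam_to_world_curr):
    """Warp color (HxWxC) with depth (HxW, meters) from the previous pose to
    the current pose. Later points simply overwrite earlier ones; a real
    implementation would z-test and fill disocclusions."""
    height, width = depth.shape
    ys, xs = np.mgrid[0:height, 0:width]

    # Unproject every previous-frame pixel to a 3D point in world space.
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    rays = np.linalg.inv(K) @ pix                       # 3 x N camera-space rays
    pts_cam = rays * depth.reshape(1, -1)               # scale rays by depth
    pts_hom = np.vstack([pts_cam, np.ones((1, pts_cam.shape[1]))])
    pts_world = cam_to_world_prev @ pts_hom

    # Transform into the current camera and project back to pixel coordinates.
    pts_curr = np.linalg.inv(cam_to_world_curr) @ pts_world
    z = pts_curr[2]
    z_safe = np.where(z > 1e-6, z, 1.0)                 # avoid division by zero
    proj = K @ pts_curr[:3]
    u = np.round(proj[0] / z_safe).astype(int)
    v = np.round(proj[1] / z_safe).astype(int)

    # Splat colors into the new view, skipping points behind or outside it.
    out = np.zeros_like(color)
    ok = (z > 1e-6) & (u >= 0) & (u < width) & (v >= 0) & (v < height)
    out[v[ok], u[ok]] = color.reshape(-1, color.shape[-1])[ok]
    return out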

Similarly, FIG. 5 schematically illustrates user 400 at a subsequent point in time, T2, and at a different position, P2, relative to the 3D environment 406. For example, T1 and T2 may be associated with two successive image frames, I1 (the previous frame image) and I2 (the current frame image). If a conventional frame rate of 16 ms is selected, then T1 and T2 could be separated by 16 ms. At time T2, a second frame of image data (I2) and depth data (D2) is captured by the forward facing depth cameras of HMD 200. For purposes of this discussion, the image data associated with I2 may be referred to as simply I2, image data 2 or the current frame image. As discussed elsewhere, HMD 200 can create a 3D textured model of the environment within the user’s FOV using the depth data D2. In addition, HMD 200 maps the image data I2 to the 3D model (D2).

Comparing FIGS. 4 and 5 graphically illustrates one of the problems with using conventional video compression in situations where camera movement is common. In such cases, large portions of the background do not remain static (between the previous frame image and the current frame image) due to the change in the location and/or orientation of the camera. Therefore, conventional P-frame processing is less efficient and may not significantly reduce the size of the data set needed to render the next frame image.

Improved Video Compression Methods and Systems

HoloSkype currently sends the following in two independent streams: (1) surface data, which provides a 3D representation of the shape of the user’s environment (walls, floor, ceiling, furniture, etc); and (2) video data from the camera on the front of the HoloLens. As described below, the surface data can be combined with the video data, resulting in a lower bandwidth or higher-quality streaming video.

Before compression: (1) project the previous frame image onto 3D geometry at the previous frame position of the camera; (2) detect a change in the pose (position and orientation) of the camera between the previous frame and the current frame; (3) construct 3D geometry using the current frame pose of the camera to produce a reprojected image of the previous frame as if it were viewed from the location of the camera in the current frame; and (4) pass the reprojected previous frame image to the compressor for computing differences between it and the current frame image. This will eliminate differences due to camera movement. Any differences now present are either contents of the scene that have actually moved (e.g., a person waved their hand) or small differences that are revealed by the new position of the camera (e.g., the side of an object that was not visible in the previous frame, but is now in view from the pose of the camera of the current frame). These differences should be small relative to the entire frame and, therefore, the amount of data that needs to be transmitted to the receiving device can be significantly less than with prior, conventional video compression methods.

This process is schematically illustrated in FIG. 6. Referring first to the upper left-hand portion of FIG. 6, at time T1 and location P1, the forward facing cameras of HMD 200 capture image data I1 and depth data D1, all of which are associated with the previous frame image (I1). As discussed above, the HMD 200 constructs a model of the 3D environment located within the FOV of HMD 200. More specifically, the HMD 200 can construct a surface map of each visible surface of each physical object located within the FOV. For example, in the illustrated example, the textured map could include areas corresponding to physical object 412 (in this case a cube), which could consist of polygons corresponding to the three visible surfaces 412A, 412B, and 412C of object 412. Next the HMD 200 will map the pixels of image data I1 to the appropriate corresponding locations on the textured model (as schematically illustrated by the stippling). As discussed above, this mapping allows the previous frame image I1 to be reprojected, via homographic transformation techniques, to appear as it would when viewed from a different camera pose (location and/or orientation).

If a change in location and/or orientation (pose) of HMD 200 is detected between the previous frame and the current frame, then the HMD 200 uses the textured model and the new location (P2) of the HMD/camera to perform the necessary homographic transformation to reproject I1 to generate reprojected previous frame image I1-R. Then I1-R and I2 are passed to a compression engine to compute the differences between I1-R and I2 and to generate compressed image data.

The compressed image data can then be transmitted to a second device along with the updated pose data, P2, for the HMD/camera. Using the previous frame image (which the second device already has) in combination with the updated pose data, P2, the second device performs a similar homographic transformation to generate a reprojection of the previous frame image, I1, to produce reprojected previous frame image I1-R. Then, the second device applies the compressed video data to the reprojected previous frame image I1-R to render the current frame image I2.

In rendering the re-projected previous frame image, the same 3D model should be used by both the first device and the second device. For example, the first device can send a 3D model to the second device, and the two devices agree on the 3D model that will be used to render the re-projected previous frame image. Thus, in one embodiment, the 3D model associated with the previous frame can be communicated between and used by both devices for generating the re-projected previous frame. Similarly, in another embodiment, the 3D model associated with the current frame can be communicated between and used by both devices for generating the re-projected previous frame. In still yet another embodiment, updates to the 3D model (i.e., only differences between a prior 3D model and a subsequent 3D model) could be communicated between and used by the first and second devices. In a situation in which the 3D environment does not change significantly, yet another possibility could be to scan the 3D model, send it at the beginning of each session, and then perform all re-projections relative to that model without sending changes.

The bandwidth savings achieved by using the systems and methods disclosed herein can be best where much of the contents of the 3D environment are static. In such cases, any errors introduced by changes in the 3D model itself will generally be relatively minor. When the 3D model is different between the previous frame and the current frame, this can result in errors (distortion) in the re-projected previous frame image regardless of whether the previous frame model or the current frame model is used. When the 3D environment itself is moving, it can be difficult to understand this 3D motion and use it to re-project in a way that accounts for this motion. However, with the systems and methods disclosed herein these errors are corrected when they are included in the compressed video data that captures differences between the re-projected previous frame and the actual current frame.

In another embodiment, the surface geometry can be continuously updated over time. As the user looks at different areas of the 3D environment, different surfaces may become visible. Over time, the textured model can be updated and refined to include these additional surfaces. All previous frames can be used to refine the textures applied to surfaces as the user looks around the room. If the sender and receiver build matching models, then frame differences can be computed relative to a render of these models based on the current position of the camera. In this way, whenever a new surface comes into view, the most recent view of that object can be used even if it was not visible in the previous frame image.
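A toy sketch of this accumulation idea follows, assuming surfaces can be identified by stable IDs and that each frame supplies texture patches for the surfaces currently visible; the class and method names are hypothetical. Because the sender and receiver run the same update rule, their cached textures stay in agreement and the most recent view of a surface can be reused even when it was absent from the immediately previous frame.

class SurfaceTextureCache:
    """Keeps the most recently observed texture patch for each surface."""

    def __init__(self):
        self._textures = {}  # surface_id -> most recently observed patch

    def update(self, visible_patches):
        """Overwrite cached textures for every surface seen this frame."""
        self._textures.update(visible_patches)

    def texture_for(self, surface_id, default=None):
        """Return the best available texture for a surface, even if it was
        not visible in the immediately previous frame."""
        return self._textures.get(surface_id, default)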

One embodiment of the foregoing video compression system and method is more particularly illustrated in FIG. 7. The system can include a computer system having one or more processors and one or more computer-readable media having stored thereon executable instructions that when executed by the one or more processors configure the computer system to perform the various steps of the compression process, as follows. At step 702, the system or method can capture successive frames of video from a camera incorporated into a first device, the successive frames of video comprising a previous frame image and a current frame image and including surface (depth) information. At step 704, the system or method can construct a first 3D representation of the shape of a user’s environment based on the surface (depth) information of the previous frame image and a first camera location and orientation (pose) corresponding to the previous frame image. At step 706, the system or method can project the previous frame image onto the first 3D representation. At step 708, the system or method can detect a change in the location and orientation (pose) of the camera between the previous frame image and the current frame image and generate current frame camera pose data representative of the location and orientation (pose) of the camera for the current frame image. At step 710, the system or method can construct a second 3D representation of the shape of the user’s environment based on a second camera location and orientation (pose) corresponding to the current frame image, the second 3D representation corresponding to the shape of the user’s environment as if it were viewed from the second location and orientation (pose) of the camera, and render the second 3D representation to generate a re-projected previous frame image as viewed from the second location and orientation (pose) of the camera. At step 712, the system or method can pass the re-projected previous frame image to a video compressor for computing differences between the re-projected previous frame image and the actual current frame image. And, at step 714, the system or method can generate compressed video data comprising only the differences between the re-projected previous frame image and the actual current frame image. The compressed video data can then be saved to a file for additional processing. In addition, the compressed video data can be transmitted to a second, receiving device for further processing as illustrated in FIG. 8.

The system and method illustrated in FIG. 7 can also transmit changes in the surface (depth) data based on the current frame camera pose data. In addition, the system and method illustrated in FIG. 7 can be configured to accumulate, for all previous image frames, a textured model of the 3D environment to refine the textures applied to surfaces as a user encounters the 3D environment from different camera positions over time.

In the context of a HoloSkype or other videoconferencing session, the compressed video data derived in the manner described above can then be transmitted to a second, receiving device, together with the pose (position and rotation) of the device camera, which is a very small amount of data. At the receiving device, decompression performs the same steps before applying the differences to the same reprojection of the previous frame, resulting in a matching image on the other side.

More specifically, referring to FIG. 8, the systems and methods can perform the following additional processing. As indicated at step 802, the system or method can communicate to a second device the compressed video data, the current frame camera pose data, and a 3D representation of the shape of the user’s environment. At step 804, the system or method can receive at the second device the compressed video data, the current frame camera pose data, and the 3D representation of the shape of the user’s environment. At step 806, the system or method can construct, by the second device, a 3D representation of the shape of the user’s environment based on the received current frame camera pose data and render the 3D representation to generate a re-projected previous frame image data. At step 808, the system or method can apply the received compressed video data to the re-projected previous frame image data to generate current frame image data. And, at step 810, the system or method can render the current image data on a display associated with the second device.
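The receiving side of steps 804 through 810 might be organized roughly as sketched below, assuming the second device already holds the shared 3D representation and a renderer callable that can draw it from an arbitrary camera pose; zlib mirrors the stand-in encoder used earlier, and the names are illustrative rather than part of any actual API.

import zlib
import numpy as np

def reconstruct_current_frame(compressed_diffs: bytes,
                              current_pose: np.ndarray,
                              render_model_from_pose) -> np.ndarray:
    # Step 806: re-render the shared 3D representation from the received
    # camera pose to produce the re-projected previous frame image.
    reprojected_prev = render_model_from_pose(current_pose)

    # Step 808: decompress the differences and apply them to the reprojection.
    diffs = np.frombuffer(zlib.decompress(compressed_diffs), dtype=np.int16)
    diffs = diffs.reshape(reprojected_prev.shape)
    current = reprojected_prev.astype(np.int16) + diffs

    # Step 810: the caller hands the reconstructed frame to the display path.
    return np.clip(current, 0, 255).astype(np.uint8)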

Further, the methods may be practiced by a computer system including one or more processors and computer-readable media such as computer memory. In particular, the computer memory may store computer-executable instructions that when executed by one or more processors cause various functions to be performed, such as the acts recited in the embodiments.

Embodiments of the present invention may comprise or utilize a special purpose or general-purpose computer including computer hardware, as discussed in greater detail below. Embodiments within the scope of the present invention also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system. Computer-readable media that store computer-executable instructions are physical storage media. Computer-readable media that carry computer-executable instructions are transmission media. Thus, by way of example, and not limitation, embodiments of the invention can comprise at least two distinctly different kinds of computer-readable media: physical computer-readable storage media and transmission computer-readable media.

Physical computer-readable storage media includes RAM, ROM, EEPROM, CD-ROM or other optical disk storage (such as CDs, DVDs, etc.), magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer.

A “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired and wireless) to a computer, the computer properly views the connection as a transmission medium. Transmission media can include a network and/or data links which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above are also included within the scope of computer-readable media.

Further, upon reaching various computer system components, program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission computer-readable media to physical computer-readable storage media (or vice versa). For example, computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a “NIC”), and then eventually transferred to computer system RAM and/or to less volatile computer-readable physical storage media at a computer system. Thus, computer-readable physical storage media can be included in computer system components that also (or even primarily) utilize transmission media.

Computer-executable instructions comprise, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. The computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the features or acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.

Those skilled in the art will appreciate that the invention may be practiced in network computing environments with many types of computer system configurations, including personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, pagers, routers, switches, and the like. The invention may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks. In a distributed system environment, program modules may be located in both local and remote memory storage devices.

Alternatively, or in addition, the functionality described herein can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that can be used include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), etc.

The present invention may be embodied in other specific forms without departing from its spirit or characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Microsoft Patent | Wireless programmable media processing system https://patent.nweon.com/27495 Thu, 16 Mar 2023 05:03:43 +0000 https://patent.nweon.com/?p=27495 ...

Patent: Wireless programmable media processing system

Patent PDF: Join Nweon (映维网) membership to access

Publication Number: 20230077904

Publication Date: 2023-03-16

Assignee: Microsoft Technology Licensing

Abstract

Embodiments of the subject matter described herein relate to a wireless programmable media processing system. In the media processing system, a processing unit in a computing device generates a frame to be displayed based on a graphics content for an application running on the computing device. The frame to be displayed is then divided into a plurality of block groups which are compressed. The plurality of compressed block groups are sent to a graphics display device over a wireless link. In this manner, both the generation and the compression of the frame to be displayed may be completed at the same processing unit in the computing device, which avoids data copying and simplifies processing operations. Thereby, the data processing speed and efficiency are improved significantly.

Claims

1.A computing device, comprising: a first processing unit; and a memory coupled to the first processing unit and storing instructions which, when executed by the first processing unit, perform compression processing of graphics contents, including acts comprising: generating a frame to be displayed on a graphics display based on a graphics content for an application running on the computing device; dividing the frame to be displayed into a plurality of block groups; compressing the plurality of block groups to generate a plurality of compressed block groups; and in parallel with compressing the plurality of block groups, sending the plurality of compressed block groups to a graphics display device in the media processing system of the graphics display over a wireless link using a network interface card.

2.The computing device according to claim 1, wherein the first processing unit comprises a plurality of cores, and the acts further comprising: generating the frame to be displayed comprises generating the frame to be displayed based on the graphics content by using a first set of cores among the plurality of cores, and compressing the plurality of block groups comprises compressing the plurality of block groups by using a second set of cores among the plurality of cores, the second set of cores being different from the first set of cores.

3.The computing device according to claim 2, wherein the graphics content is a first graphics content and the frame is a first frame, and the acts further comprise: in parallel with compressing the plurality of block groups by using the second set of cores, generating a second frame to be displayed based on a second graphics content for the application by using the first set of cores, the second graphics content being different from the first graphics content, the second frame being different from the first frame.

4.The computing device according to claim 1, wherein the plurality of compressed block groups are sent via a direct link established between memory coupled to the first processing unit and the network interface card bypassing host memory.

5.The computing device according to claim 1, wherein the first processing unit executes a plurality of threads, and the acts further comprising: compressing the plurality of block groups comprises compressing a first block group among the plurality of block groups by using a first thread among the plurality of threads to generate a first compressed block group; and sending the plurality of compressed block groups comprises sending the first compressed block group to the graphics display device over the wireless link by using a second thread among the plurality of threads, the second thread being different from the first thread.

6.The computing device according to claim 5, wherein the acts further comprise: in parallel with sending the first compressed block group by using the second thread, compressing a second block group among the plurality of block groups by using the first thread, the second block group being different from the first block group.

7.The computing device according to claim 1, wherein the acts further comprise: establishing a set of parallel processing pipelines for compression of the plurality of block groups, wherein compressing the plurality of block groups comprises distributing block groups of the plurality of block groups among the set of parallel processing pipelines to compress the block groups in parallel.

8.At least one non-transitory machine-readable medium storing instructions that, when executed by a first processing unit, cause the first processing unit to perform operations to: generate a frame to be displayed on a graphics display based on a graphics content for an application running on a computing device; divide the frame to be displayed into a plurality of block groups; compress the plurality of block groups to generate a plurality of compressed block groups; and in parallel with compression of the plurality of block groups, send the plurality of compressed block groups to a graphics display device in a media processing system of the graphics display over a wireless link using a network interface card.

9.The at least one non-transitory machine-readable medium of claim 8, wherein the first processing unit comprises a plurality of cores and the instructions further comprising instructions that cause the first processing unit to perform operations to: generate the frame to be displayed comprises generating the frame to be displayed based on the graphics content by using a first set of cores among the plurality of cores, and compress the plurality of block groups comprises compressing the plurality of block groups by using a second set of cores among the plurality of cores, the second set of cores being different from the first set of cores.

10.The at least one non-transitory machine-readable medium of claim 9, wherein the graphics content is a first graphics content and the frame is a first frame, and the instructions further comprising instructions that cause the first processing unit to perform operations to: in parallel with compression of the plurality of block groups by using the second set of cores, generate a second frame to be displayed based on a second graphics content for the application by using the first set of cores, the second graphics content being different from the first graphics content, the second frame being different from the first frame.

11.The at least one non-transitory machine-readable medium of claim 8, wherein the plurality of compressed block groups are sent via a direct link established between memory coupled to a first processing unit and the network interface card bypassing host memory.

12.The at least one non-transitory machine-readable medium of claim 8, wherein the first processing unit executes a plurality of threads, and the instructions further comprising instructions that cause the first processing unit to perform operations to: compress the plurality of block groups comprises compressing a first block group among the plurality of block groups by using a first thread among the plurality of threads to generate a first compressed block group; and send the plurality of compressed block groups comprises sending the first compressed block group to the graphics display device over the wireless link by using a second thread among the plurality of threads, the second thread being different from the first thread.

13.The at least one non-transitory machine-readable medium of claim 12, further comprising instructions that cause the first processing unit to perform operations to: in parallel with sending the first compressed block group by using the second thread, compress a second block group among the plurality of block groups by using the first thread, the second block group being different from the first block group.

14.The at least one non-transitory machine-readable medium of claim 8, further comprising instructions that cause the first processing unit to perform operations to: establish a set of parallel processing pipelines for compression of the plurality of block groups, wherein the instructions to compress the plurality of block groups comprises instructions to distribute block groups of the plurality of block groups among the set of parallel processing pipelines to compress the block groups in parallel.

15.A method, comprising: generating a frame to be displayed on a graphics display based on a graphics content for an application running on a computing device; dividing the frame to be displayed into a plurality of block groups; compressing the plurality of block groups by a first processing unit to generate a plurality of compressed block groups; and in parallel with compressing the plurality of block groups, sending the plurality of compressed block groups to a graphics display device in the media processing system of the graphics display over a wireless link using a network interface card.

16.The method of claim 15, wherein the first processing unit comprises a plurality of cores, and further comprising: generating the frame to be displayed comprises generating the frame to be displayed based on the graphics content by using a first set of cores among the plurality of cores, and compressing the plurality of block groups comprises compressing the plurality of block groups by using a second set of cores among the plurality of cores, the second set of cores being different from the first set of cores.

17.The method of claim 16, wherein the graphics content is a first graphics content and the frame is a first frame, and further comprising: in parallel with compressing the plurality of block groups by using the second set of cores, generating a second frame to be displayed based on a second graphics content for the application by using the first set of cores, the second graphics content being different from the first graphics content, the second frame being different from the first frame.

18.The method of claim 15, wherein the plurality of compressed block groups are sent via a direct link established between memory coupled to a first processing unit and the network interface card bypassing host memory.

19.The method of claim 15, wherein the first processing unit executes a plurality of threads, and further comprising: compressing the plurality of block groups comprises compressing a first block group among the plurality of block groups by using a first thread among the plurality of threads to generate a first compressed block group; and sending the plurality of compressed block groups comprises sending the first compressed block group to the graphics display device over the wireless link by using a second thread among the plurality of threads, the second thread being different from the first thread.

20.The method of claim 19, further comprising: in parallel with sending the first compressed block group by using the second thread, compressing a second block group among the plurality of block groups by using the first thread, the second block group being different from the first block group.

Description

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation application of U.S. patent application Ser. No. 16/635,628, filed Jan. 31, 2020, which application is a U.S. National Stage Filing under 35 U.S.C. 371 of International Patent Application Serial No. PCT/US2018/040666, filed Jul. 3, 2018, and published as WO 2019/040187 A1 on Feb. 28, 2019, which claims priority to Chinese Application No. 201710744954.7 filed Aug. 25, 2017, which applications and publication are incorporated herein by reference in their entirety.

BACKGROUND

Virtual reality (VR) can simulate images, sounds and touches of the real world and create immersive virtual environments for users. In the context of the subject matter described herein, the VR may comprise augmented reality (AR). A VR system usually includes a computing device such as a personal computer (PC) and a graphics display device such as a head-mounted display (HMD). The graphics display device can provide high-quality VR experiences to a user by leveraging a computing device to render rich graphics contents at high frame rates and high visual quality.

Conventionally, the computing device and the graphics display device are typically connected via a cable. For example, the graphics display device may be connected to the computing device via a high-definition multimedia interface (HDMI) cable for receiving graphics contents from the computing device. The graphics display device may further send data such as sensor data to the computing device via a universal serial bus (USB) cable. However, those cables not only limit user mobility but also impose hazards to users, for example, might trip a user or wrap around the neck of the user.

SUMMARY

Unlike a conventional wireless media processing system that provides a wireless transmission interface between a computing device and a graphics display device, embodiments of the subject matter described herein provide a novel graphics processing flow to improve the processing efficiency and latency performance of a wireless media processing system.

According to the embodiments of the subject matter described herein, a frame to be displayed is generated at a processing unit in a computing device based on a graphics content for an application running on the computing device. The frame to be displayed is divided into a plurality of block groups which are compressed. Then, the plurality of compressed block groups are sent to a graphics display device over a wireless link. In this manner, rendering and compression associated with the graphics content is implemented at the same processing unit in the computing device, which greatly simplifies the processing flow at the computing device side and improves the efficiency.

It is to be understood that the Summary is not intended to identify key or essential features of implementations of the subject matter described herein, nor is it intended to be used to limit the scope of the subject matter described herein. Other features of the subject matter described herein will become easily comprehensible through the description below.

BRIEF DESCRIPTION OF THE DRAWINGS

Through the more detailed description in the accompanying drawings, the above and other features, advantages and aspects of the subject matter described herein will become more apparent. In the drawings, the same or similar reference numerals refer to the same or similar elements, where:

FIG. 1 shows an architecture of an example wireless programmable media processing system according to some embodiments of the subject matter described herein;

FIG. 2 shows an architecture of an example wireless programmable media processing system according to some other embodiments of the subject matter described herein;

FIG. 3 shows a flowchart of a method according to some embodiments of the subject matter described herein; and

FIG. 4 shows a flow chart of a method according to some other embodiments of the subject matter described herein.

DETAILED DESCRIPTION

Embodiments of the subject matter described herein will be described in more detail with reference to the accompanying drawings, in which some embodiments of the subject matter described herein have been illustrated. However, the subject matter described herein can be implemented in various manners, and thus should not be construed to be limited to the embodiments disclosed herein. On the contrary, those embodiments are provided for the thorough and complete understanding of the subject matter described herein, and completely conveying the scope of the subject matter described herein to those skilled in the art. It should be understood that the accompanying drawings and embodiments of the subject matter described herein are merely for the illustration purpose, rather than limiting the protection scope of the subject matter described herein.

As used herein, the term “media processing system” refers to any suitable system with a high-definition or ultra high-definition media transmission capability. Examples of the media processing system include, but are not limited to, a VR system and an AR system. For the purpose of discussion, some embodiments will be described by taking the VR system as an example of the media processing system.

As used herein, the term “computing device” refers to any suitable device with a computing capability. The computing device may support any suitable application such as a VR or AR application and may process graphics contents used for the application so as to display the graphics contents on a graphics display device. Examples of the computing device include, but are not limited to, a mainframe, a server, a personal computer (PC), a desktop computer, a laptop computer, a notebook computer, a tablet computer, a netbook, a personal digital assistant (PDA), a mobile phone, or a smart phone. For the purpose of discussion, some embodiments will be described by taking the PC as an example of the computing device.

As used herein, the term “graphics display device” refers to any suitable device with a graphics display capability. The graphics display device may display graphics information that has been processed by the computing device, so as to provide VR experiences to users. Examples of the graphics display device include, but are not limited to, a desktop computer, a laptop computer, a notebook computer, a tablet computer, a netbook, a PDA, a mobile phone, a smart phone, smart glasses, a smart watch, a personal communication system (PCS) device, an ebook device, a game device, or a head-mounted display (HMD). For the purpose of discussion, some embodiments will be described by taking the HMD as an example of the graphics display device.

As used herein, the term “processing unit” may be any suitable physical or virtual processor that can perform various processing according to program code instructions. The processing unit may include one or more cores. In case that a plurality of cores are included, the plurality of cores may operate in parallel so that the processing efficiency of the processing unit is enhanced.

Examples of the processing unit include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), an application-specific standard product (ASSP), a system on chip (SoC), a complex programmable logic device (CPLD), and the like. For the purpose of discussion, some embodiments will be described by taking the GPU as an example of the processing unit.

As used herein, the term “include” and its variants used in embodiments of the subject matter described herein are to be read as open terms that mean “include, but is not limited to”. The term “based on” is to be read as “based at least in part on”. The terms “one embodiment” and “an implementation” are to be read as “at least one embodiment”. The term “another embodiment” is to be read as “at least one other embodiment”. Definitions of other terms will be presented in description below.

As described above, the computing device and the graphics display device in the VR system are conventionally connected via a cable, which not only limits user mobility but also might impose hazards to users. The use of wireless transmission instead of cable-based wired transmission has been explored to implement high-quality wireless VR systems.

For example, the proprietary WirelessHD standard has been proposed, which enables wireless high-definition video transmission on frequencies of 60 GHz and above. On the basis of the WirelessHD standard, a wireless HDMI interface over frequencies of 60 GHz and above is implemented between the computing device and the graphics display device. Further, it has been proposed to replace the USB cable between the computing device and the graphics display device with wireless fidelity (Wi-Fi). Thereby, the cables are removed from the VR system, and the above problems resulting from wired transmission can be avoided.

However, the inventors have noticed that the HDMI interface can only enable graphics contents of 2160×1200 pixels at a frame rate of 90 Hz and cannot meet the system performance requirements of high-quality VR applications. In addition, the graphics display device of the wireless VR system only has display functionality but is not programmable, and thus has limited extensibility and flexibility. Further, it is impossible to leverage various software programming-based techniques to improve the performance. To this end, in one aspect of embodiments of the subject matter described herein, the inventors propose a wireless programmable media processing system. According to the media processing system proposed herein, in particular, a programmable device is added at the graphics display device side, so that the graphics display device is programmable. FIG. 1 shows an example wireless programmable media processing system 100 according to some embodiments of the subject matter described herein. In this example, the media processing system 100 is implemented as a VR system. However, it should be understood that this is merely for the purpose of illustration, without suggesting any limitations on the scope of the subject matter described herein.

As shown, in the system 100, a programmable device 105 is arranged and coupled to a graphics display device 110 (HMD in this example) so as to provide programmability to the graphics display device 110. The programmable device 105 may be implemented in any suitable form. As an example, the programmable device 105 may include a portable and low-power system on chip (SoC) at the smart phone level. According to embodiments of the subject matter described herein, the programmable device 105 may include any suitable component(s), and an example in this regard will be described in the following paragraphs with reference to FIG. 2.

The coupling between the graphics display device 110 and the programmable device 105 may be implemented in any suitable manner. As an example, the programmable device 105 may be connected to the graphics display device 110 via an HDMI cable 115 and a USB cable 120, so as to send frames to be displayed for an application (for example, the VR application) to the graphics display device 110 via the HDMI cable 115 and to receive data such as sensor data from the graphics display device 110 via the USB cable 120. It should be understood that other coupling manners are also suitable.

In the system 100, data transmission is performed over a wireless link between the programmable device 105 and a computing device 125. For example, the programmable device 105 may receive graphics contents used for a specific application from the computing device 125 over a wireless link and send sensor data from the graphics display device 110 to the computing device 125. In this example, as shown in FIG. 1, the wireless link between the programmable device 105 and the computing device 125 enables Internet Protocol (IP)-based transmissions on the basis of the Wireless Gigabit (WiGig) Alliance standard. It should be understood that this is merely illustrative and not limiting. Any wireless communication technology and communication protocol currently known or to be developed in the future are applicable. Examples of the communication technology include, but are not limited to, a wireless local area network (WLAN), Worldwide Interoperability for Microwave Access (WiMAX), Bluetooth, Zigbee technology, machine-type communication (MTC), D2D, or M2M, etc. Examples of the communication protocol include, but are not limited to, the Transmission Control Protocol (TCP) or Internet Protocol (IP), the Hypertext Transfer Protocol (HTTP), the User Datagram Protocol (UDP), the Session Description Protocol (SDP), etc.

Since programmability is provided at the graphics display device side, the computing device and the graphics display device may work in collaboration on the basis of software programming, which improves extensibility and flexibility of the wireless media processing system. In addition, various software programming-based techniques such as compression algorithms, content prefetching, pose prediction and collaborative rendering may be used to increase the frame rate and resolution of the wireless media processing system, thereby improving the system performance and user experiences.

However, the system 100 might still face challenges from the transmission rate and processing latency. For example, future VR systems target a very high frame rate (for example, 120 Hz) and resolution. As an example, high-end three-dimensional (3D) VR games impose very high requirements on network throughput and end-to-end system latency. Table 1 below shows example required data throughput at different display resolutions with a frame rate of 90 Hz.

TABLE 1
Display resolution (pixels)    Raw data rate (Gbps)
2048 × 1080 (2K)               4.8
2160 × 1200 (HTC Vive)         5.6
3840 × 2160 (4K UHD)           17.9
7680 × 4320 (8K UHD)           71.7

In this example, it is assumed that the RGB data of each pixel is encoded using three bytes. Without compression, the raw data rate required by a 2160×1200 display resolution is 5.6 Gbps. In the cases of 4K ultra high-definition (UHD) and 8K UHD, the required data rates are even as high as 17.9 Gbps and 71.7 Gbps, respectively.
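
The entries of Table 1 follow directly from the stated assumptions (3 bytes of RGB per pixel, 90 frames per second); the short calculation below reproduces them. The resolution labels mirror the table and are illustrative only.

# Raw data rate = width x height x 3 bytes x 8 bits x 90 frames per second.
resolutions = {
    "2048 × 1080 (2K)": (2048, 1080),
    "2160 × 1200 (HTC Vive)": (2160, 1200),
    "3840 × 2160 (4K UHD)": (3840, 2160),
    "7680 × 4320 (8K UHD)": (7680, 4320),
}
for name, (w, h) in resolutions.items():
    gbps = w * h * 3 * 8 * 90 / 1e9
    print(f"{name}: {gbps:.1f} Gbps")   # 4.8, 5.6, 17.9 and 71.7 Gbps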

As for the system latency, in the case of a frame rate of 90 Hz, the VR system has to be able to render, transmit, and display a high-resolution frame every 11 ms to ensure a smooth user experience. For future VR targeting a frame rate of 120 Hz, the frame period is reduced to only 8.3 ms. Furthermore, high-quality VR also requires a total end-to-end (namely, motion-to-photon) latency of 20-25 ms. That is, once the graphics display device moves, the VR system has to be able to display, within 20 ms to 25 ms, a new frame generated from the new pose of the graphics display device.
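
The per-frame time budgets quoted above follow directly from the target frame rates:

# Frame period implied by each target frame rate.
for fps in (90, 120):
    print(f"{fps} Hz -> {1000 / fps:.1f} ms per frame")   # 11.1 ms and 8.3 ms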

In the case that the wired transmission is employed, a frame to be displayed and generated at the computing device side may be directly sent to the graphics display device via a cable (e.g., HDMI cable). Unlike this, the wireless transmission requires some extra processing, which will be described below with reference to FIG. 2.

FIG. 2 shows an architecture of an example wireless programmable media processing system 200 according to some other embodiments of the subject matter described herein. In this example, a VR system is taken as an example of the media processing system for the purpose of discussion. As shown in FIG. 2, in the system 200, the computing device 125 comprises an application (for example, the VR application) 205 running thereon, for providing corresponding services to the user, for example, displaying a graphics content 210. The computing device 125 further comprises a processing unit (referred to as “a first processing unit”) 215 for performing operations such as rendering of the graphics content of the application 205. In this example, as shown in FIG. 2, the processing unit 215 is implemented by a GPU. However, this is merely illustrative and not limiting. The processing unit 215 may be implemented in any suitable form. For example, the first processing unit may further be implemented as an FPGA or ASIC.

In addition to the first processing unit 215, in some embodiments, the computing device 125 may further comprise one or more other suitable processing units. As an example, in the embodiment where the first processing unit 215 is implemented by a GPU or FPGA, the computing device 125 may further comprise a CPU. At this point, the GPU or FPGA is used for performing functions such as graphics rendering, and the CPU is used for performing a general processing function. A plurality of processing units may execute computer-executable instructions in parallel, so as to increase the parallel processing capability of the computing device 125.

As shown in FIG. 2, the computing device 125 further comprises a wireless network interface unit (referred to as “a first wireless network interface unit”) 220 for providing an interface for wireless communication with the programmable device 105. As an example, as shown in this figure, the first wireless network interface unit 220 is implemented by a network interface card (NIC). Other implementation forms of the first wireless network interface unit 220 are also possible.

In addition, the computing device 125 may further comprise any other suitable communication interface for enabling communication with other external devices via a communication medium. Other external devices include, but are not limited to, a further computing device such as a server, a storage device, a display device, an input device such as a mouse, keyboard, touchscreen, trackball, or voice input device, an output device such as a display, loudspeaker, or printer, or any middleware (for example, a network card or modem) for enabling the computing device 125 to communicate with other external devices.

In the system 200, the computing device 125 further comprises a graphics stack 225 and a network stack 230. The graphics stack 225 may be accessible to the first processing unit 215 to store a graphics content 210 to be processed. The network stack 230 and the first wireless network interface unit 220 cooperate with each other to store data to be transmitted over a wireless link. The graphics stack 225 and the network stack 230 each may be implemented by any suitable storage device such as computer-readable or machine-readable storage media. Such media may be any available media accessible to the computing device 125, including, but not limited to, volatile or nonvolatile media and removable or non-removable media. In addition to this, the computing device 125 may further comprise one or more other storage devices for storing other information and/or data accessible within the computing device 125.

As shown in FIG. 2, the computing device 125 further comprises a sending unit 235 for performing operations such as compression of a frame used for the application 205 to reduce the size of data to be transmitted over the wireless link. Detailed operations of the sending unit 235 will be described in the following paragraphs.

In addition to the components as shown, the computing device 125 may further comprise any other suitable component. For example, the computing device 125 may comprise a memory, which may be a volatile memory such as register, cache and random-access memory (RAM), non-volatile memory such as read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM) and flash memory, or some combination thereof. The memory may comprise one or more program modules, which are configured to execute various functions implemented by the various embodiments of the subject matter described herein. In particular, the program modules may be accessed and run by the first processing unit 215 to perform the corresponding functions.

In this example, at the computing device 125 side, the application 205 traverses the boundary of a user mode and a kernel mode via a VR software development kit (SDK) (not shown) and generates, based on the associated graphics content 210, a frame to be displayed by using the first processing unit 215. The frame may be stored in a memory (not shown) of the first processing unit 215.

In the embodiment where the computing device 125 comprises a GPU as the first processing unit 215 and comprises a CPU for performing a general processing function, the sending unit 235 may traverse the boundary of the kernel mode and the user mode by using the system’s graphics application programming interface (API) and extract the frame to be displayed from the memory of the first processing unit 215 to a memory (not shown) of the CPU. The sending unit 235 may compress the frame to reduce the data size. Subsequently, the sending unit 235 re-traverses the boundary of the user mode and the kernel mode and sends the compressed data to the first wireless network interface unit 220 via the network stack 230. The first wireless network interface unit 220 sends the compressed data over the wireless link.

In the system 200, the data sent by the computing device 125 over the wireless link may be received by the programmable device 105. As shown in FIG. 2, the programmable device 105 comprises a wireless network interface unit (referred to as “a second wireless network interface unit”) 240, a network stack 245 for storing a frame received by the second wireless network interface unit 240, a receiving unit 250 for decompressing the received frame, a processing unit (referred to as “a second processing unit”) 255, and a graphics stack 260 for storing graphics contents which is accessible to the second processing unit 255. Functions and implementations of these components in the programmable device 105 are similar to those of the corresponding components in the computing device 125 described above and thus will not be detailed here. Similar to the computing device 125, the programmable device 105 may comprise any other suitable component in addition to these components mentioned above.

In the programmable device 105, the second wireless network interface unit (for example, NIC) 240 receives data from the computing device 125 over a wireless link and stores the data in the network stack 245. The receiving unit 250 traverses the boundary of the user mode and the kernel mode, obtains from the network stack 245 the data received by the second wireless network interface unit 240 and decompresses the data. Next, the receiving unit delivers the decompressed frame to the second processing unit (for example, GPU) 255 across the above boundary again via the graphics stack 260. As shown in FIG. 2, the second processing unit 255 of the programmable device 105 is connected with the graphics display device 110 via a cable (for example, an HDMI cable) 260, so that the decompressed frame may be delivered to the graphics display device 110 for display on the graphics display device 110.

As described above, in the system 200, all data processing operations of the graphics display device 110 can be executed on the programmable device 105. This makes the graphics display device 110 similar to a thin-client system. In this way, the extensibility and flexibility of the graphics display device 110 are enhanced.

The above data processing procedure involves multiple times of data copying and thereby generates many extra data copies. For example, in the computing device 125, the frame to be displayed possibly needs to be copied from the memory of the first processing unit 215 to a host memory and then copied from the sending unit 235 to the first wireless network interface unit 220. In the programmable device 105, the data needs to be copied from the second wireless network interface unit 240 to the receiver 250 and then to the second processing unit 255. With the data compression and decompression, the amount of data to be transmitted over the wireless link can be reduced significantly, and thereby the efficiency of data transmission is improved.

However, the inventors have noticed that if there is a large amount of data, the data copying, compressing and decompressing operations may increase the processing burden on the system. In order to further simplify the processing operations to increase the processing efficiency and reduce the processing latency, in another aspect of the embodiments of the subject matter described herein, the inventors have further proposed a high-efficiency data compression and decompression scheme. According to embodiments of the subject matter described herein, a first processing unit in a computing device performs such operations as generating, compressing and sending a frame to be displayed. Specifically, the first processing unit generates a frame to be displayed based on a graphics content for an application running on the computing device. Then, the first processing unit divides the frame to be displayed into a plurality of block groups and compresses these block groups. Next, the first processing unit sends the plurality of compressed block groups to a graphics display device over a wireless link. Thereby, these generating, compressing and sending operations may be performed by the same processing unit at the sending end, so that the frequent data copying shown in FIG. 2 may be reduced and the processing latency is further decreased.

Accordingly, at the receiving end, operations regarding receiving and decompressing the graphics content may also be performed by the same processing unit, in particular, by a second processing unit in a programmable device coupled to the graphics display device. Specifically, the second processing unit receives, from the computing device over a wireless link, a plurality of compressed block groups which are generated based on a graphics content used for a specific application running on the computing device. Subsequently, the second processing unit decompresses the plurality of received block groups and generates a frame to be displayed based on the plurality of decompressed block groups so as to display the frame on the graphics display device. In this way, the frequent data copying is also avoided at the receiving end, and the processing latency is further decreased.

With reference to FIGS. 3 and 4, basic principles and several example implementations of the subject matter described herein in this regard will be described below. Referring to FIG. 3 first, there is shown a flowchart of a method 300 implemented at a first processing unit of a computing device according to some embodiments of the subject matter described herein. The method 300 can be implemented by the first processing unit 215 in the computing device 125 shown in FIG. 2. For the purpose of discussion, the method 300 will be described below with reference to FIG. 2.

As shown in FIG. 3, at block 305, the first processing unit 215 generates a frame to be displayed based on the graphics content 210 for the application 205 running on the computing device 125. As an example, the frame to be displayed may be generated by rendering the graphics content 210. Any rendering approach currently known or to be developed in the future is applicable.

After obtaining the frame to be displayed, the first processing unit 215 divides the obtained frame into a plurality of block groups at block 310 and compresses these block groups at block 315. According to embodiments of the subject matter described herein, the compression may be implemented by using any suitable compression algorithm. In particular, in some embodiments, the first processing unit 215 may compress the plurality of block groups in parallel so as to further improve the processing efficiency and reduce the latency. An example in this regard will be presented in the following paragraphs.
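
As an illustration of blocks 310 and 315, the sketch below divides a frame into horizontal strips and compresses each strip independently. The strip height of 120 rows and the use of zlib are assumptions made for the example only; the subject matter described herein does not prescribe a particular block-group shape or codec.

import zlib
import numpy as np

def divide_into_block_groups(frame, rows_per_group=120):
    # Block 310: split an H x W x C frame into horizontal strips of up to rows_per_group rows.
    return [frame[r:r + rows_per_group] for r in range(0, frame.shape[0], rows_per_group)]

def compress_block_groups(block_groups):
    # Block 315: compress each block group independently; zlib is a stand-in codec.
    return [zlib.compress(bg.tobytes()) for bg in block_groups]

frame = np.zeros((1200, 2160, 3), dtype=np.uint8)   # e.g. a rendered 2160 x 1200 RGB frame
compressed = compress_block_groups(divide_into_block_groups(frame))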

In the case where the first processing unit 215 comprises a plurality of cores, operations of the generation and compression of the frame to be displayed may be performed using the plurality of cores in parallel. That is, in some embodiments, blocks 305, 310 and 315 may be executed in parallel. For example, the first processing unit 215 may use one set of cores (referred to as “a first set of cores”) among the plurality of cores to generate the frame to be displayed based on the graphics content 210 (referred to as “a first graphics content”). Concurrently, the first processing unit 215 uses a different set of cores (referred to as “a second set of cores”) to compress the plurality of block groups obtained from the frame to be displayed. As such, while using the second set of cores for graphics compression, the first processing unit may simultaneously use the first set of cores to generate a frame to be displayed based on other graphics content (referred to as “a second graphics content”) used for the application, thereby significantly improving the system processing efficiency.
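
The core-partitioning idea can be illustrated with two worker pools: one stands in for the first set of cores that renders, the other for the second set that compresses, so that frame i is rendered while frame i-1 is still being compressed. Using ProcessPoolExecutor, a dummy renderer, and whole-frame zlib compression is an assumption made purely for illustration; a real implementation would schedule rendering and block-group compression on GPU cores rather than OS processes.

from concurrent.futures import ProcessPoolExecutor
import zlib

def render(frame_index):
    # Placeholder renderer: produce a dummy 2160 x 1200 RGB frame as raw bytes.
    return bytes(2160 * 1200 * 3)

def compress(frame_bytes):
    # Stands in for compressing the frame's block groups.
    return zlib.compress(frame_bytes)

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=1) as render_pool, \
         ProcessPoolExecutor(max_workers=1) as compress_pool:
        compress_future = None
        for i in range(3):
            render_future = render_pool.submit(render, i)        # "first set of cores": frame i
            if compress_future is not None:
                compress_future.result()                          # frame i-1 finishes compressing meanwhile
            compress_future = compress_pool.submit(compress, render_future.result())  # "second set of cores"
        compress_future.result()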

After compressing the plurality of block groups, at block 320, the first processing unit 215 sends the plurality of compressed block groups to the graphics display device 110 over a wireless link. Any wireless transmission scheme currently known or to be developed in the future is applicable here. The first processing unit 215 may enable the transmission of the plurality of block groups to the graphics display device in any suitable manner. In order to further improve the processing efficiency, in some embodiments, the first processing unit 215 may arrange parallel processing pipelines for compression and transmission. That is, blocks 315 and 320 may be executed concurrently. For example, the first processing unit 215 may execute a plurality of threads, use one thread (referred to as “a first thread”) for compression and use another different thread (referred to as “a second thread”) for transmission.

As an example, after dividing the frame to be displayed into the plurality of block groups, the first processing unit 215 may use the first thread to compress block groups one after another. After completing the compression of one block group (referred to as “a first block group”), the first processing unit 215 may immediately use a separate second thread to send the block group over the wireless link. At the same time, the first processing unit may continue to use the first thread to compress another block group (referred to as “a second block group”). Such parallel compression and transmission processing significantly shortens the processing time at the sending end.
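
A minimal sketch of this two-thread pipeline follows, assuming zlib as a stand-in codec and an in-memory callback in place of the wireless transport; both are assumptions for illustration only.

import queue
import threading
import zlib

def compressor(block_groups, out_q):
    # "First thread": compress block groups one after another.
    for index, bg in enumerate(block_groups):
        out_q.put((index, zlib.compress(bg)))
    out_q.put(None)                              # sentinel: no more groups

def sender(out_q, send):
    # "Second thread": transmit each compressed group as soon as it is ready.
    while True:
        item = out_q.get()
        if item is None:
            break
        index, payload = item
        send(index, payload)

block_groups = [bytes(2160 * 3 * 120)] * 10      # ten dummy uncompressed block groups
sent = []
q = queue.Queue()
t1 = threading.Thread(target=compressor, args=(block_groups, q))
t2 = threading.Thread(target=sender, args=(q, lambda i, p: sent.append(i)))
t1.start(); t2.start(); t1.join(); t2.join()
assert sent == list(range(10))                   # groups go out in compression order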

Accordingly, at the receiving end, once a compressed block group is received, the group may be decompressed immediately, instead of waiting for all block groups to be received. This greatly shortens the total time of the end-to-end data processing. Detailed operations at the receiving end will be described in the following paragraphs with reference to FIG. 4.

In order to further shorten the data processing pipelines at the sending end so as to further reduce the latency, in some embodiments, the first processing unit 215 may be coupled to the first wireless network interface unit 220 in the computing device 125, to establish a direct data transmission path between them. The coupling may be implemented in any suitable manner. As an example, the first processing unit 215 and the first wireless network interface unit 220 may be connected to the same peripheral component interconnect express (PCIe) bus and then directly access a memory of each other via the PCIe protocol.

In this manner, the graphics data can be sent to the graphics display device from the computing device with no need to traverse the kernel/user mode boundary with multiple times of data copying. As an example, for the wireless system 200 shown in FIG. 2, after the first processing unit (for example, GPU) 215 divides the frame to be displayed into the plurality of block groups and compresses the respective block groups, the compressed block groups may be stored in the memory of the first processing unit 215, instead of being copied to the host memory. The first wireless network interface unit (for example, NIC) 220 may be allowed to directly access the memory of the first processing unit 215, so that the compressed block groups may be directly sent out over the wireless link, bypassing the host memory. In this manner, the length of the data path at the sending end is further shortened, thereby further improving the system performance and reducing the system latency.

Accordingly, a high-efficiency data compression method may be used to further reduce the processing latency at the receiving end. Detailed operations at the receiving side will be described below with reference to FIG. 4. FIG. 4 shows a flowchart of a method 400 implemented at a second processing unit in a programmable device coupled to the graphics display device according to some embodiments of the subject matter described herein. The method 400 can be implemented by the second processing unit 255 in the programmable device 105 shown in FIG. 2. For the purpose of discussion, the method 400 will be described with reference to FIG. 2.

As shown in FIG. 4, at block 405, the second processing unit 255 receives, from the computing device 125 over the wireless link, the plurality of compressed block groups which are generated based on the graphics content 210 for the application 205 running on the computing device 125. The second processing unit 255 may implement the receiving in any suitable manner. In order to shorten the data transmission path at the receiving end and further simplify the operations at the receiving end, in some embodiments, the second processing unit 255 may be coupled to the second wireless network interface unit 240 in the programmable device 105 so as to establish a direct data transmission path. In this case, the second processing unit 255 may receive the plurality of compressed block groups from the computing device 125 via the second wireless network interface unit 240 over the wireless link. The implementation of the coupling between the second processing unit 255 and the second wireless network interface unit 240 is similar to that of the coupling between the first processing unit 215 and the first wireless network interface unit 220 and thereby will not be detailed here.

At block 410, the second processing unit 255 decompresses the received block groups. Any decompression approach currently known or to be developed in the future is applicable here. In particular, as described above, in the embodiments where the plurality of block groups are compressed and sent in parallel at the sending end, the second processing unit 255 may receive and decompress the plurality of block groups in parallel. That is, blocks 405 and 410 may be executed in parallel. For example, the second processing unit 255 may execute a plurality of threads, use one thread (referred to as “a third thread”) to receive one block group (referred to as “a third block group”), and use another different thread (referred to as “a fourth thread”) to decompress the received block group. Thereby, while decompressing the received block group, the second processing unit 255 may use the third thread to receive another block group (referred to as “a fourth block group”). In this manner, the processing time at the receiving end is further shortened.
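
The receiving-side counterpart can be sketched the same way: one thread stands in for the third thread that receives compressed block groups, another for the fourth thread that decompresses each group as soon as it arrives, and the reassembly at the end corresponds to block 415. The in-memory list standing in for the wireless link and the zlib codec are assumptions for illustration only.

import queue
import threading
import zlib

def receiver(incoming, q):
    # "Third thread": receive compressed block groups from the (simulated) wireless link.
    for index, payload in incoming:
        q.put((index, payload))
    q.put(None)                                   # sentinel: nothing more to receive

def decompressor(q, decoded):
    # "Fourth thread": decompress each group as soon as it arrives.
    while True:
        item = q.get()
        if item is None:
            break
        index, payload = item
        decoded[index] = zlib.decompress(payload)

incoming = [(i, zlib.compress(bytes(2160 * 3 * 120))) for i in range(10)]
decoded = {}
q = queue.Queue()
t3 = threading.Thread(target=receiver, args=(incoming, q))
t4 = threading.Thread(target=decompressor, args=(q, decoded))
t3.start(); t4.start(); t3.join(); t4.join()
frame = b"".join(decoded[i] for i in range(10))   # block 415: reassemble the frame for display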

After decompressing the plurality of block groups, at block 415, the second processing unit 255 generates, based on the plurality of decompressed block groups, a frame for display on the graphics display device 110. Approaches of forming a frame from block groups and displaying the frame are well known in the art and thus will not be detailed here. Thereby, at the receiving end, the decompression and recombination of the plurality of block groups may be performed at the same processing unit, for example, the second processing unit 255. This further simplifies the processing at the receiving side, improves the system efficiency and reduces the system latency. In particular, in the embodiments where there is a direct data transmission path between the second processing unit 255 and the second wireless network interface unit 240, the programmable device 105 no longer needs to traverse the kernel/user mode boundary with multiple times of data copying. For example, the second wireless network interface unit 240 may directly store the plurality of received block groups into the memory of the second processing unit 255, bypassing the host memory. In this manner, the length of the data transmission path at the receiving side is further shortened, and the system performance is further improved.

The functions described herein may be at least partly executed by one or more hardware logic components. Illustrative types of usable hardware logical components include, for example, but are not limited to, field-programmable gate array (FPGA), application-specific integrated circuit (ASIC), application-specific standard product (ASSP), system on chip (SOC), complex programmable logic device (CPLD), and the like.

Program codes for carrying out the methods of the subject matter described herein may be written in any combination of one or more programming languages. These computer program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may be executed entirely on a machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of the subject matter described herein, a machine readable medium may be any tangible medium that may contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine readable medium may be a machine readable signal medium or a machine readable storage medium. A machine readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of the machine readable storage medium would include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are contained in the above discussions, these should not be construed as limitations on the scope of the subject matter described herein. Certain features that are described in the context of separate implementations may also be combined in a single implementation. Conversely, the various features that are described in the context of a single implementation may also be implemented in a plurality of implementations separately or in any suitable sub-combination.

Listed below are some example implementations of the subject matter described herein.

In one aspect, there is provided a method implemented at a first processing unit in a computing device for a media processing system. The method comprises: generating a frame to be displayed based on a graphics content for an application running on the computing device; dividing the frame to be displayed into a plurality of block groups; compressing the plurality of block groups; and sending the plurality of compressed block groups to a graphics display device in the media processing system over a wireless link.

In some implementations, the first processing unit comprises a plurality of cores, generating the frame to be displayed comprises generating the frame to be displayed based on the graphics content by using a first set of cores among the plurality of cores, and compressing the plurality of block groups comprises compressing the plurality of block groups by using a second set of cores among the plurality of cores, the second set of cores being different from the first set of cores.

In some implementations, the graphics content is a first graphics content, the frame to be displayed is a first frame to be displayed, and the method further comprises: in parallel with compressing the plurality of block groups by using the second set of cores, generating a second frame to be displayed based on a second graphics content for the application by using the first set of cores, the second graphics content being different from the first graphics content, the second frame being different from the first frame.

In some implementations, sending the plurality of compressed block groups to the graphics display device comprises: in parallel with compressing the plurality of block groups, sending the plurality of compressed block groups to the graphics display device over the wireless link.

In some implementations, the first processing unit executes a plurality of threads, compressing the plurality of block groups comprises compressing a first block group among the plurality of block groups by using a first thread among the plurality of threads, and sending the plurality of compressed block groups comprises sending the first compressed block group to the graphics display device over the wireless link by using a second thread among the plurality of threads, the second thread being different from the first thread.

In some embodiments, the method further comprises: in parallel with sending the first compressed block group by using the second thread, compressing a second block group among the plurality of block groups by using the first thread, the second block group being different from the first block group.
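As a concrete illustration of the two-thread pipeline described in the preceding implementations, the sketch below compresses block group n on one thread while a second thread transmits the already-compressed block group n-1. The queue, zlib, and the send callback are assumptions used only for illustration; they are not prescribed by the subject matter described herein.

```python
# Minimal sketch of pipelined compression and transmission: a first thread
# compresses block groups while a second thread sends them, so compressing
# one group overlaps sending the previous one. zlib and send() are assumed.
import queue
import threading
import zlib

def compress_worker(block_groups, outbox):
    for index, raw in enumerate(block_groups):   # raw: bytes of one block group
        outbox.put((index, zlib.compress(raw)))
    outbox.put(None)                             # sentinel: no more groups

def send_worker(outbox, send):
    while True:
        item = outbox.get()
        if item is None:
            break
        index, payload = item
        send(index, payload)                     # e.g. hand off to the wireless NIC

def stream_frame(block_groups, send):
    outbox = queue.Queue(maxsize=2)              # small buffer keeps the stages in step
    workers = [
        threading.Thread(target=compress_worker, args=(block_groups, outbox)),
        threading.Thread(target=send_worker, args=(outbox, send)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```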

In some implementations, the first processing unit is coupled to a first wireless network interface unit in the computing device, and sending the plurality of compressed block groups comprises sending the plurality of compressed block groups to the graphics display device via the first wireless network interface unit over the wireless link.

In some implementations, the method further comprises: in parallel with sending one block group by using the second thread, compressing another block group among the plurality of block groups by using the first thread.

In some implementations, the first processing unit comprises at least one of a graphics processing unit, a field-programmable gate array and an application-specific integrated circuit.

In one aspect, there is provided a method implemented at a second processing unit in a programmable device in a media processing system, the programmable device being coupled to a graphics display device. The method comprises: receiving a plurality of compressed block groups from a computing device over a wireless link, the plurality of compressed block groups being generated based on a graphics content for an application running on the computing device; decompressing the plurality of received block groups; and generating, based on the plurality of decompressed block groups, a frame for display on the graphics display device.

In some implementations, decompressing the plurality of block groups comprises: decompressing the plurality of block groups in parallel with receiving the plurality of block groups.

In some implementations, the second processing unit executes a plurality of threads; receiving the plurality of block groups comprises receiving a third block group among the plurality of block groups from the computing device over the wireless link by using a third thread among the plurality of threads; decompressing the plurality of block groups comprises decompressing the third received block group by using a fourth thread among the plurality of threads, the fourth thread being different from the third thread.

In some implementations, the method further comprises: in parallel with decompressing the third received block group by using the fourth thread, receiving a fourth block group among the plurality of block groups from the computing device over the wireless link by using the third thread, the fourth block group being different from the third block group.

In some implementations, the method further comprises: while decompressing a received block group by using the fourth thread, receiving a further block group among the plurality of block groups from the computing device over the wireless link by using the third thread.

In some implementations, the second processing unit is coupled to a second wireless network interface unit in the programmable device, and receiving the plurality of block groups comprises receiving the plurality of block groups from the computing device via the second wireless network interface unit over the wireless link.

In some implementations, the second processing unit comprises at least one of a graphics processing unit, a field-programmable gate array and an application-specific integrated circuit.

In one aspect, there is provided a computing device for a media processing system. The computing device comprises: a processing unit; and a memory coupled to the processing unit and storing instructions which, when executed by the processing unit, perform compression processing of graphics contents, including acts comprising: generating a frame to be displayed based on a graphics content for an application running on the computing device; dividing the frame to be displayed into a plurality of block groups; compressing the plurality of block groups; and sending the plurality of compressed block groups to a graphics display device in the media processing system over a wireless link.

In some implementations, the first processing unit comprises a plurality of cores, generating the frame to be displayed comprises generating the frame to be displayed based on the graphics content by using a first set of cores among the plurality of cores; compressing the plurality of block groups comprises compressing the plurality of block groups by using a second set of cores among the plurality of cores, the second set of cores being different from the first set of cores.

In some implementations, the graphics content is a first graphics content, the frame to be displayed is a first frame to be displayed, and the acts further comprise: in parallel with compressing the plurality of block groups by using the second set of cores, generating a second frame to be displayed based on a second graphics content for the application by using the first set of cores, the second graphics content being different from the first graphics content, the second frame being different from the first frame. In some implementations, sending the plurality of compressed block groups to the graphics display device comprises: in parallel with compressing the plurality of block groups, sending the plurality of compressed block groups to the graphics display device over the wireless link.

In some implementations, the first processing unit executes a plurality of threads; compressing the plurality of block groups comprises compressing a first block group among the plurality of block groups by using a first thread among the plurality of threads; sending the plurality of compressed block groups comprises sending the first compressed block group to the graphics display device over the wireless link by using a second thread among the plurality of threads, the second thread being different from the first thread. In some embodiments, the acts further comprise: in parallel with sending the first compressed block group by using the second thread, compressing a second block group among the plurality of block groups by using the first thread, the second block group being different from the first block group.

In some implementations, the first processing unit is coupled to a first wireless network interface unit in the computing device, and sending the plurality of compressed block groups comprises sending the plurality of compressed block groups to the graphics display device via the first wireless network interface unit over the wireless link. In some implementations, the first processing unit comprises at least one of a graphics processing unit, a field-programmable gate array and an application-specific integrated circuit.

In one aspect, there is provided a programmable device in a media processing system. The programmable device comprises: a processing unit; and a memory coupled to the processing unit and storing instructions which, when executed by the processing unit, perform decompression processing of graphics contents, including acts comprising: receiving a plurality of compressed block groups from a computing device over a wireless link, the plurality of compressed block groups being generated based on a graphics content for an application running on the computing device; decompressing the plurality of received block groups; and generating, based on the plurality of decompressed block groups, a frame for display on a graphics display device.

In some implementations, decompressing the plurality of block groups comprises: decompressing the plurality of block groups in parallel with receiving the plurality of block groups.

In some implementations, the second processing unit executes a plurality of threads; receiving the plurality of block groups comprises receiving a third block group among the plurality of block groups from the computing device over the wireless link by using a third thread among the plurality of threads; decompressing the plurality of block groups comprises decompressing the third received block group by using a fourth thread among the plurality of threads, the fourth thread being different from the third thread.

In some implementations, the acts further comprise: in parallel with decompressing the third received block group by using the fourth thread, receiving a fourth block group among the plurality of block groups from the computing device over the wireless link by using the third thread, the fourth block group being different from the third block group.

In some implementations, the second processing unit is coupled to a second wireless network interface unit in the programmable device, and receiving the plurality of block groups comprises receiving the plurality of block groups from the computing device via the second wireless network interface unit over the wireless link.

In some implementations, the second processing unit comprises at least one of a graphics processing unit, a field-programmable gate array and an application-specific integrated circuit.

In one aspect, there is provided a machine readable storage medium storing machine executable instructions therein, the machine executable instructions, when running on a device, causing the device to perform compression processing of graphics contents, including acts comprising: generating a frame to be displayed based on a graphics content for an application running on the device; dividing the frame to be displayed into a plurality of block groups; compressing the plurality of block groups; and sending the plurality of compressed block groups to a graphics display device in a media processing system over a wireless link.

In some implementations, the first processing unit comprises a plurality of cores; generating the frame to be displayed comprises generating the frame to be displayed based on the graphics content by using a first set of cores among the plurality of cores; compressing the plurality of block groups comprises compressing the plurality of block groups by using a second set of cores among the plurality of cores, the second set of cores being different from the first set of cores.

In some implementations, the graphics content is a first graphics content, the frame to be displayed is a first frame to be displayed, and the acts further comprise: in parallel with compressing the plurality of block groups by using the second set of cores, generating a second frame to be displayed based on a second graphics content for the application by using the first set of cores, the second graphics content being different from the first graphics content, the second frame being different from the first frame.

In some implementations, sending the plurality of compressed block groups to the graphics display device comprises: in parallel with compressing the plurality of block groups, sending the plurality of compressed block groups to the graphics display device over the wireless link.

In some implementations, the first processing unit executes a plurality of threads, compressing the plurality of block groups comprises compressing a first block group among the plurality of block groups by using a first thread among the plurality of threads; sending the plurality of compressed block groups comprises sending the first compressed block group to the graphics display device over the wireless link by using a second thread among the plurality of threads, the second thread being different from the first thread.

In some embodiments, the acts further comprise: in parallel with sending the first compressed block group by using the second thread, compressing a second block group among the plurality of block groups by using the first thread, the second block group being different from the first block group.

In some implementations, the first processing unit is coupled to a first wireless network interface unit in the computing device, and sending the plurality of compressed block groups comprises sending the plurality of compressed block groups to the graphics display device via the first wireless network interface unit over the wireless link.

In some implementations, the first processing unit comprises at least one of a graphics processing unit, a field-programmable gate array and an application-specific integrated circuit.

In one aspect, there is provided a machine readable storage medium storing machine executable instructions therein, the machine executable instructions, when running on a device, causing the device to perform decompression processing of graphics contents, including acts comprising: receiving a plurality of compressed block groups from a computing device over a wireless link, the plurality of compressed block groups being generated based on a graphics content for an application running on the computing device; decompressing the plurality of received block groups; and generating, based on the plurality of decompressed block groups, a frame for display on a graphics display device.

In some implementations, decompressing the plurality of block groups comprises: decompressing the plurality of block groups in parallel with receiving the plurality of block groups.

In some implementations, the second processing unit executes a plurality of threads, receiving the plurality of block groups comprises receiving a third block group among the plurality of block groups from the computing device over the wireless link by using a third thread among the plurality of threads, and decompressing the plurality of block groups comprises decompressing the third received block group by using a fourth thread among the plurality of threads, the fourth thread being different from the third thread.

In some implementations, the acts further comprise: in parallel with decompressing the third received block group by using the fourth thread, receiving a fourth block group among the plurality of block groups from the computing device over the wireless link by using the third thread, the fourth block group being different from the third block group.

In some implementations, the second processing unit is coupled to a second wireless network interface unit in the programmable device, and receiving the plurality of block groups comprises receiving the plurality of block groups from the computing device via the second wireless network interface unit over the wireless link.

In some implementations, the second processing unit comprises at least one of a graphics processing unit, a field-programmable gate array and an application-specific integrated circuit.

Although the subject matter described herein has been described in language specific to structural features and/or methodological acts, it should be appreciated that the subject matter as defined in the appended claims is not limited to the specific features or acts described above. On the contrary, the specific features and acts described above are merely example forms for implementing the claims.


Microsoft Patent | Spatially consistent representation of hand motion https://patent.nweon.com/27477 Thu, 16 Mar 2023 04:07:50 +0000 https://patent.nweon.com/?p=27477 ...


Patent: Spatially consistent representation of hand motion

Patent PDF: Available to Nweon (映维网) members

Publication Number: 20230079335

Publication Date: 2023-03-16

Assignee: Microsoft Technology Licensing

Abstract

Examples are disclosed that relate to representing recorded hand motion. One example provides a computing device comprising a logic subsystem and a storage subsystem comprising instructions executable by the logic subsystem to receive a recorded representation of hand motion determined relative to a virtual model aligned to a first instance of an object, receive image data corresponding to an environment, and recognize a second instance of the object in the environment. The instructions are further executable to align the virtual model to the second instance of the object, and output a parametric representation of hand motion for display relative to the virtual model as aligned to the second instance of the object, such that the parametric representation is spatially consistent with the recorded representation of hand motion relative to the virtual model as aligned to the first instance of the object.

Claims

1.A computing device, comprising: a logic subsystem; and a storage subsystem comprising instructions executable by the logic subsystem to: receive a recorded representation of hand motion with respect to a first object; receive image data corresponding to an environment; recognize a second object in the environment; and based on the recorded representation of hand motion, output a parametric representation of hand motion for display with respect to the second object, such that the parametric representation of hand motion is spatially consistent with the recorded representation of hand motion relative to the first object, wherein the parametric representation includes a multi-dimensional vector, wherein each dimension of the multi-dimensional vector encodes an articulation of at least one of a plurality of hand joints.

2.The computing device of claim 1, wherein the parametric representation of hand motion is output for display on a head-mounted display device.

3.The computing device of claim 1, further comprising instructions executable to use the multi-dimensional vector to reproduce an overall pose of a hand.

4.The computing device of claim 1, further comprising instructions executable to: use the image data to obtain a virtual model corresponding to the second object; and use the virtual model to confirm a presence of the second object in the environment.

5.The computing device of claim 1, further comprising instructions executable to convert a non-parametric representation of hand motion to the parametric representation of hand motion.

6.The computing device of claim 5, wherein the non-parametric representation of hand motion comprises a geometric representation of hand motion.

7.A computing device, comprising: a logic subsystem; and a storage subsystem comprising instructions executable by the logic subsystem to: receive image data corresponding to an environment; recognize a first object in the environment; receive a recording of hand motion; based on the recording, determine a parametric representation of hand motion with respect to the first object; and configure the parametric representation of hand motion for display with respect to a second object, such that the parametric representation of hand motion is spatially consistent with the parametric representation of hand motion relative to the first object, wherein the parametric representation includes a multi-dimensional vector, wherein each dimension of the multi-dimensional vector encodes an articulation of at least one of a plurality of hand joints.

8.The computing device of claim 7, further comprising instructions executable to output the parametric representation of hand motion for display at another computing device.

9.The computing device of claim 7, further comprising instructions executable to use the multi-dimensional vector to reproduce an overall pose of a hand.

10.The computing device of claim 7, further comprising instructions executable to: use the image data to obtain a virtual model corresponding to the first object; and use the virtual model to confirm a presence of the first object in the environment.

11.The computing device of claim 7, further comprising instructions executable to convert a non-parametric representation of hand motion to the parametric representation of hand motion.

12.The computing device of claim 11, wherein the non-parametric representation of hand motion comprises a geometric representation of hand motion.

13.At a robotic device, a method of controlling a robot manipulator, comprising: receiving a parametric representation of hand motion determined with respect to a first object; receiving image data corresponding to an environment; recognizing a second object in the environment; and based on the parametric representation of hand motion, determining a sequence of actions for performance, relative to the second object, by a manipulator of the robotic device, where the sequence of actions is spatially consistent with the parametric representation of hand motion relative to the first object, wherein the parametric representation includes a multi-dimensional vector, wherein each dimension of the multi-dimensional vector encodes an articulation of at least one of a plurality of hand joints.

14.The method of claim 13, further comprising, for each action of the sequence of actions, generating one or more corresponding commands configured to cause the manipulator to perform the action.

15.The method of claim 14, further comprising updating the one or more commands based on the image data to thereby align the manipulator to the second object.

16.The method of claim 13, wherein determining the sequence of actions comprises translating human hand motion into motion of the manipulator.

17.The method of claim 13, wherein a number of joints of the manipulator differs from a number of the hand joints encoded in the parametric representation, and wherein determining the sequence of actions comprises transforming one or more articulations encoded in the parametric representation to reproduce an overall pose of a hand.

18.The method of claim 13, further comprising using a predetermined vocabulary to translate the parametric representation into the sequence of actions.

19.The method of claim 13, further comprising inputting the parametric representation into a neural network or a support vector machine trained on a predetermined vocabulary of actions to thereby classify the parametric representation based upon the predetermined vocabulary of actions; and converting the classification of the parametric representation into the sequence of actions for the manipulator.

20.The method of claim 13, further comprising generating feedback based upon the image data to thereby update one or more control commands issued to the manipulator.

Description

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 16/529,632, filed Aug. 1, 2019, which is a continuation-in-part of U.S. application Ser. No. 16/363,964, filed Mar. 25, 2019, the entirety of each of which is hereby incorporated herein by reference for all purposes.

BACKGROUND

In video tutorials, instructors may teach viewers how to perform a particular task by performing the task themselves. For a hands-on task, a video tutorial may demonstrate hand motion performed by an instructor. Viewers may thus learn the hands-on task by mimicking the hand motion and other actions shown in the video tutorial. In other scenarios, a robotic device may learn to perform a task by observing the performance of the task in video.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A-1C illustrate the recording of hand motion.

FIGS. 2A-2C illustrate playback of a representation of recorded hand motion.

FIG. 3 shows an example head-mounted display (HMD) device.

FIG. 4 shows a flowchart illustrating a method of recording hand motion.

FIG. 5 illustrates separately scanning an object instance.

FIG. 6 schematically shows an example system in which recorded data is transmitted to a computing device.

FIG. 7 shows example static and time-varying representations of an environment.

FIG. 8 shows an example image frame including a plurality of depth pixels.

FIG. 9 illustrates an object-centric coordinate system.

FIG. 10 shows an articulated object instance.

FIG. 11 illustrates switching object-centric coordinate systems.

FIG. 12 shows an example graphical user interface of an editor application.

FIGS. 13A-13B show a flowchart illustrating a method of processing recording data including recorded hand motion.

FIG. 14 schematically shows an example system in which playback data is transmitted to an HMD device.

FIG. 15 shows a flowchart illustrating a method of outputting a geometric representation of hand motion.

FIGS. 16A-16C illustrate an example process of determining a representation of hand motion.

FIG. 17 illustrates an example of displaying a parametric representation of the hand motion illustrated in FIGS. 16A-16C.

FIG. 18 illustrates an example in which a manipulator of a robotic device is controlled according to a parametric representation of hand motion.

FIG. 19 shows an example system for sharing representations of hand motion.

FIG. 20 shows a flowchart illustrating an example method of determining a parametric representation of hand motion.

FIG. 21 shows a flowchart illustrating an example method of outputting a parametric representation of hand motion.

FIG. 22 shows a block diagram of an example computing system.

DETAILED DESCRIPTION

In video tutorials, instructors may teach viewers how to perform a particular task by performing the task themselves. For hands-on tasks, a video tutorial may demonstrate hand motion performed by an instructor. Viewers may thus learn the hands-on task by mimicking the hand motion and other actions shown in the video tutorial.

Recording a video tutorial may prove cumbersome, however. For example, the presence of another person in addition to an instructor demonstrating a task may be required to record the demonstration. Where instructors instead record video tutorials themselves, an instructor may alternate between demonstrating a task and operating recording equipment. Frequent cuts and/or adjustments to the recorded scene may increase the difficulty and length of the recording process.

Video tutorials may pose drawbacks for viewers as well. Where a video tutorial demonstrates actions performed with respect to an object—as in repairing equipment, for example—viewers may continually alternate between watching the tutorial on a display (e.g., of a phone or tablet) and looking at the object and their hands to mimic those actions. Complex or fine hand motion may render its imitation even more difficult, causing viewers to frequently alternate their gaze and pause video playback. In some examples, viewers may be unable to accurately mimic hand motion due to its complexity and/or the angle from which it was recorded.

As such, alternative solutions for recording and demonstrating hand motion have been developed. In some alternatives, hand motion is represented by animating a virtual three-dimensional model of a hand using computer graphics rendering techniques. While this may enable hand motion to be perceived in ways a real hand recorded in video cannot, modeling the motion of human hands can be highly challenging and time-consuming, requiring significant effort and skill. Further, where a real hand represented by a virtual model holds a real object, the virtual model may be displayed without any representation of the object. Other approaches record hand motion via wearable input devices (e.g., a glove) that sense kinematic motion or include markers that are optically imaged to track motion. Such devices may be prohibitively expensive, difficult to operate, and/or unsuitable for some environments, however.

Accordingly, examples are disclosed that relate to representing hand motion in a manner that may streamline both its recording and viewing. As described below, a user may employ a head-mounted display (HMD) device to optically record hand motion simply by directing their attention toward their hands. As such, the user’s hands may remain free to perform hand motion without requiring external recording equipment, body suits/gloves, or the presence of another person. Via the HMD device or another device, the recorded hand motion may be separated from irrelevant parts of the background environment recorded by the HMD device. A graphical representation (e.g., virtual model) of the hand motion may then be programmatically created, without forming a manual representation using a three-dimensional graphics editor. The representation can be shared with viewers (e.g., via a see-through display of an augmented-reality device), enabling the hand motion—without the irrelevant background environment—to be perceived from different angles and positions in a viewer’s own environment.

In some scenarios, recorded hand motion may be performed relative to one or more objects. As examples, a user’s hands may rotate a screwdriver to unscrew a threaded object, open a panel, or otherwise manipulate an object. The disclosed examples provide for recognizing an object manipulated by the user and the pose of the user’s hands relative to the object as the hands undergo motion. At the viewer side, an instance of that object, or a related object, in the viewer’s environment may also be recognized. The user’s hand motion may be displayed relative to the viewer’s instance of the object, and with the changing pose that was recorded in the user’s environment as the hands underwent motion. Examples are also disclosed in which hand-object interaction is parameterized. In some examples in which hand motion is recorded as part of a tutorial or other educational/instructive context, the user may be referred to as an “instructor”, and the viewer a “student” (e.g., of the instructor).
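As one illustration of what a parameterized representation of hand motion could look like in code (the joint list, ordering, and angle encoding below are assumptions rather than details taken from this disclosure), a hand pose can be packed into a fixed-length vector in which each dimension encodes the articulation of one hand joint:

```python
# Illustrative only: a hand pose as a multi-dimensional vector, one dimension
# per joint articulation (here, flexion angles in radians). The joint names
# and ordering are assumptions, not the disclosure's specification.
import numpy as np

HAND_JOINTS = [
    "wrist",
    "thumb_cmc", "thumb_mcp", "thumb_ip",
    "index_mcp", "index_pip", "index_dip",
    "middle_mcp", "middle_pip", "middle_dip",
    "ring_mcp", "ring_pip", "ring_dip",
    "pinky_mcp", "pinky_pip", "pinky_dip",
]

def encode_pose(joint_angles):
    """Map {joint name: angle} to a vector with one dimension per joint."""
    return np.array([joint_angles.get(name, 0.0) for name in HAND_JOINTS])

def decode_pose(vector):
    """Recover per-joint articulations for reproducing the overall hand pose."""
    return dict(zip(HAND_JOINTS, vector))

# A recorded hand motion is then simply a sequence of such vectors,
# i.e. an array of shape (num_frames, len(HAND_JOINTS)).
```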

Other spatial variables of recorded hand motion may be preserved between user and viewer sides. For example, one or more of the position, orientation, and scale of a user’s hand motion relative to an object may be recorded, such that the recorded hand motion can be displayed at the viewer’s side with the (e.g., substantially same) recorded position, orientation, and scale relative to a viewer’s instance of the object. The display of recorded hand motion and/or object instances with one or more spatial attributes consistent with those assumed by the hand motion/object instances when recorded may be referred to as “spatial consistency”. By displaying recorded hand motion in such a spatially consistent manner, the viewer may gain a clear and intuitive understanding of the hand motion and how it relates to the object, making the hand motion easier to mimic. Further, spatial consistency may help give the viewer the impression that the user is present in the viewer’s environment. This presence may be of particular benefit where hand motion is recorded as part of an instructive tutorial intended to teach the viewer a task.

As one example of how hand motion may be recorded in one location and later shared with viewers in other locations, FIGS. 1A-1C illustrate respective steps in the recording process of a home repair guide. In the depicted example, an HMD device 100 worn by an instructor 102 is used to record motion of the right hand 104 of the instructor, and to image various objects manipulated by the instructor as described below. Instructor 102 performs hand motion in demonstrating how to repair a dimming light switch 106 in an environment 108 occupied by instructor 102. The examples disclosed herein may utilize any suitable device to record hand motion, however, including but not limited to a video camera, a depth camera (e.g., including one or more time-of-flight or structured light depth sensors), and any suitable combination of such devices.

FIG. 1A represents a particular instance of time in the recording process at which instructor 102 is gesticulating toward light switch 106 with hand 104, and is narrating the current step in the repair process, as represented by speech bubble 110. HMD device 100 records video data capturing motion of hand 104. In some examples, HMD device 100 may record audio data capturing the speech uttered by instructor 102, and/or eye-tracking data that enables the determination of a gaze point 112 representing the location at which the instructor is looking. The video data may capture both motion of hand 104 and portions of instructor environment 108 that are irrelevant to the hand motion and repair of light switch 106. Accordingly, the video data may be processed to discard the irrelevant portions and create a representation of the hand motion that can be shared with viewers located in other environments. As described below, in some examples this representation may include a three-dimensional video representation of the hand motion.

FIG. 2A illustrates the playback of represented hand motion in a viewer environment 200 different from the instructor environment 108 in which the hand motion was recorded. FIG. 2A depicts an instant of time during playback that corresponds to the instant of time of the recording process depicted in FIG. 1A. Via a display 202 of an HMD device 204 worn by a viewer 206, a representation 208 of the motion of hand 104 recorded in instructor environment 108 is displayed relative to a light switch 210 in viewer environment 200. Representation 208 resembles hand 104 and is animated with the hand’s time-varying pose recorded by HMD device 100 (e.g., by configuring the representation with its own time-varying pose that substantially tracks the time-varying pose of the real hand). In this way, the hand motion recorded in instructor environment 108 may be played back in viewer environment 200 without displaying irrelevant portions of the instructor environment. As described below, representation 208 may also be played back with respect to relevant objects of interest—e.g., objects manipulated by hand motion—where the objects may be used to depict hand motion in the appropriate spatial context.

Representation 208 is displayed upon the determination by HMD device 204 that the object which the representation should be displayed in relation to—viewer light switch 210—corresponds to the object that the hand motion was recorded in relation to—instructor light switch 106. HMD device 204 may receive data indicating an identity, object type/class, or the like of instructor light switch 106 obtained from the recognition of the light switch by HMD device 100. HMD device 204 itself may recognize viewer light switch 210, and determine that the viewer light switch corresponds to instructor light switch 106.

Viewer light switch 210 is referred to as a “second instance” of a designated object (in this case, a light switch), and instructor light switch 106 is referred to as a “first instance” of the designated object. As described below, light switch 106 may be identified as a designated object based on user input from instructor 102, via hand tracking, through automatic detection as a relevant object of interest (e.g., based on a virtual model representing the object), and/or by inference during the recording of hand motion. As represented by the examples shown in FIGS. 1A and 2A, object instances may be the same model of an object. Object instances may exhibit any suitable correspondence, however—for example, object instances may be similar but different models of an object, or objects of the same object class. As such, hand motion recorded in relation to a first object instance may be represented in relation to a second object instance that differs in model, type, or in any other suitable attribute. As described in further detail below with reference to FIG. 6, any suitable object recognition/detection techniques may be used to detect an object instance as a designated object instance, to detect the correspondence of an object instance to another object instance, or to recognize, identify, and/or detect an object instance in general.

In addition to animating representation 208 in accordance with the time-varying pose of hand 104 recorded in instructor environment 108, the representation may be consistent with other attributes of the recorded hand motion. With respect to the time instances depicted in FIGS. 1A and 2A, the three-dimensional position (e.g., x/y/z), three-dimensional orientation (e.g., yaw/pitch/roll), and scale of representation 208 relative to light switch 210 are substantially equal to the three-dimensional position, three-dimensional orientation, and scale of hand 104 relative to light switch 106. Such spatial consistency may be maintained throughout playback of the recorded hand motion. As described in further detail below, spatial consistency may be achieved by associating recorded hand motion and its representation with respective object-centric coordinate systems specific to the objects they are recorded/displayed in relation to.
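The object-centric bookkeeping described above can be sketched as follows. Representing poses as 4x4 homogeneous transforms is an assumption for illustration; the disclosure does not prescribe a particular representation. A hand pose recorded in the coordinate system of the instructor's object instance is simply re-expressed relative to the viewer's object instance at playback time:

```python
# Illustrative sketch of spatial consistency via object-centric coordinates.
# Poses are assumed to be 4x4 homogeneous transforms (rotation + translation).
import numpy as np

def to_object_frame(T_world_hand, T_world_object):
    """Express a hand pose, recorded in the instructor's world frame,
    relative to the designated object's coordinate system."""
    return np.linalg.inv(T_world_object) @ T_world_hand

def to_viewer_world(T_object_hand, T_viewerworld_object):
    """Re-attach the object-relative hand pose to the viewer's object instance."""
    return T_viewerworld_object @ T_object_hand

# Recording side: T_obj_hand = to_object_frame(T_world_hand, T_world_switch)
# Playback side:  T_render   = to_viewer_world(T_obj_hand, T_viewerworld_switch)
# The pose relative to the object is preserved, so the displayed hand motion
# is spatially consistent with the recorded hand motion.
```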

Even with such spatial consistency, viewer 206 may perceive a different portion of hand 104—via representation 208—than the portion of the hand recorded by HMD device 100. This arises from viewer 206 perceiving viewer light switch 210 from an angle that is significantly different than the angle from which instructor light switch 106 was recorded by HMD device 100. By altering the position, angle, and distance from which representation 208 is viewed, viewer 206 may observe different portions of the recorded hand motion.

Other aspects of the demonstration recorded in instructor environment 108 may be represented in viewer environment 200. As examples, FIG. 2A illustrates the playback at HMD device 204 of the narration spoken by instructor 102, and the display of gaze point 112 at a position relative to light switch 210 that is consistent with its position determined relative to light switch 106. The playback of instructor narration and gaze point may provide additional information that helps viewer 206 understand how to perform the task at hand. FIG. 2A also shows the output, via display 202, of controls 212 operable to control the playback of recorded hand motion. For example, controls 212 may be operable to pause, fast forward, and rewind playback of recorded hand motion, and to move among the different sections into which the recording is divided.

Objects manipulated through hand motion recorded in instructor environment 108 may be represented and displayed in locations other than the instructor environment. Referring again to the recording process carried out by instructor 102, FIG. 1B depicts an instance of time at which the instructor handles a screwdriver 128 in the course of removing screws 130 from a panel 132 of light switch 106. HMD device 100 may collect image data capturing screwdriver 128, where such data is used to form a representation of the screwdriver for display at another location. As described in further detail below, data enabling the representation of screwdriver 128—and other objects manipulated by recorded hand motion—may be collected as part of the hand motion recording process, or in a separate step in which manipulated objects are separately scanned.

Referring to viewer environment 200, FIG. 2B shows the output, via display 202, of hand representation 208 holding a screwdriver representation 218. FIG. 2B depicts an instant of time during playback that corresponds to the instant of time of the recording process depicted in FIG. 1B. As with representation 208 alone, the collective representation of hand 104 holding screwdriver 128 is displayed relative to viewer light switch 210 in a manner that is spatially consistent with the real hand and screwdriver relative to instructor light switch 106. As described below, representation 208 of hand 104 may be associated with an object-centric coordinate system determined for screwdriver 128 for the duration that the hand manipulates the screwdriver. Further, representation 218 of screwdriver 128 may be displayed for the duration that the screwdriver is manipulated or otherwise undergoes motion. Once screwdriver 128 remains substantially stationary for a threshold duration, the display of representation 218 may cease. Any other suitable conditions may control the display of hand/object representations and other virtual imagery on display 202, however, including user input from instructor 102.

In some examples, a removable part of a designated object may be manipulated by recorded hand motion and represented in another location. Referring again to the recording process carried out by instructor 102, FIG. 1C depicts an instance of time at which the instructor handles panel 132 after having removed the panel from light switch 106. HMD device 100 may collect image data capturing panel 132, where such data is used to form a representation of the panel for display at another location.

Referring to viewer environment 200, FIG. 2C shows the output, via display 202, of hand representation 208 holding a representation 220 of panel 132. FIG. 2C depicts an instant of time during playback that corresponds to the instant of time of the recording process depicted in FIG. 1C. The collective representation of hand 104 holding panel 132 is displayed relative to viewer light switch 210 in a manner that is spatially consistent with the real hand holding the panel relative to instructor light switch 106.

FIGS. 1A-2C illustrate how hand motion recorded relative to one object instance in an environment may be displayed in a spatially consistent manner relative to a corresponding object instance in a different environment. The disclosed examples are applicable to any suitable context, however. As further examples, recorded hand motion may be shared to teach users how to repair home appliances, perform home renovations, diagnose and repair vehicle issues, and play musical instruments. In professional settings, recorded hand motion may be played back to on-board new employees, to train doctors on medical procedures, and to train nurses to care for patients. Other contexts are possible in which recorded hand motion is shared for purposes other than learning and instruction, such as interactive (e.g., gaming) and non-interactive entertainment contexts and artistic demonstrations. Further, examples are possible in which spatially consistent hand motion is carried between object instances in a common environment. For example, a viewer in a given environment may observe hand motion previously recorded in that environment, where the recorded hand motion may be overlaid on the same object instance that the hand motion was recorded in relation to, or on a different object instance.

FIG. 3 shows an example HMD device 300. As described in further detail below, HMD device 300 may be used to implement one or more phases of a pipeline in which hand motion recorded in one context is displayed in another context. Generally, these phases include (1) recording data capturing hand motion in one context (as illustrated in FIGS. 1A-1C), (2) processing the data to create a sharable representation of the hand motion, and (3) displaying the representation in another context (as illustrated in FIGS. 2A-2C). Aspects of HMD device 300 may be implemented in HMD device 100 and/or HMD device 204, for example.

HMD device 300 includes a near-eye display 302 configured to present any suitable type of visual experience. In some examples, display 302 is substantially opaque, presenting virtual imagery as part of a virtual-reality experience in which a wearer of HMD device 300 is completely immersed. In other implementations, display 302 is at least partially transparent, allowing a user to view presented virtual imagery along with a real-world background viewable through the display to form an augmented-reality experience, such as a mixed-reality experience. In some examples, the opacity of display 302 is adjustable (e.g., via a dimming filter), enabling the display to function both as a substantially opaque display for virtual-reality experiences and as a see-through display for augmented-reality experiences.

In augmented-reality implementations, display 302 may present augmented-reality objects that appear display-locked and/or world-locked. A display-locked augmented-reality object may appear to move along with a perspective of the user as a pose (e.g., six degrees of freedom (DOF): x/y/z/yaw/pitch/roll) of HMD device 300 changes. As such, a display-locked augmented-reality object may appear to occupy the same portion of display 302 and may appear to be at the same distance from the user, even as the user moves in the surrounding physical space. A world-locked augmented-reality object may appear to remain in a fixed location in the physical space, even as the pose of HMD device 300 changes. In some examples, a world-locked object may appear to move in correspondence with movement of a real, physical object. In yet other examples, a virtual object may be displayed as body-locked, in which the object is locked to an estimated pose of a user’s head or other body part.
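As a rough illustration of the distinction drawn above (not the device's actual rendering code), the difference between display-locked and world-locked objects is essentially where the render pose comes from; the 4x4 transforms and function names below are assumptions:

```python
# Illustrative only: render-pose selection for display-locked vs. world-locked
# virtual objects. Poses are assumed to be 4x4 homogeneous transforms.
import numpy as np

def render_pose_display_locked(T_display_object):
    # A fixed offset relative to the display, independent of head pose, so the
    # object keeps the same on-screen position as the wearer moves.
    return T_display_object

def render_pose_world_locked(T_world_head, T_world_object):
    # Re-derived every frame from the current 6DOF head pose, so the object
    # appears to stay fixed in the physical space as the wearer moves.
    return np.linalg.inv(T_world_head) @ T_world_object
```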

HMD device 300 may take any other suitable form in which a transparent, semi-transparent, and/or non-transparent display is supported in front of a viewer’s eye(s). Further, examples described herein are applicable to other types of display devices, including other wearable display devices and non-wearable display devices such as a television, monitor, and mobile device display. In some examples, a display device including a non-transparent display may be used to present virtual imagery. Such a display device may overlay virtual imagery (e.g., representations of hand motion and/or objects) on a real-world background presented on the display device as sensed by an imaging system.

Any suitable mechanism may be used to display images via display 302. For example, display 302 may include image-producing elements located within lenses 306. As another example, display 302 may include a liquid crystal on silicon (LCOS) device or an organic light-emitting diode (OLED) microdisplay located within a frame 308. In this example, the lenses 306 may serve as, or otherwise include, a light guide for delivering light from the display device to the eyes of a wearer. In yet other examples, display 302 may include a scanning mirror system (e.g., a microelectromechanical display) configured to scan light from a light source in one or more directions to thereby form imagery. In some examples, display 302 may present left-eye and right-eye imagery via respective left-eye and right-eye displays.

HMD device 300 includes an on-board computer 304 operable to perform various operations related to receiving user input (e.g., voice input, gesture recognition, eye gaze detection), recording hand motion and the surrounding physical space, processing data obtained from recording hand motion and the physical space, presenting imagery (e.g., representations of hand motion and/or objects) on display 302, and/or other operations described herein. In some implementations, some or all of the computing functions described above may be performed off-board. Example computer hardware is described in more detail below with reference to FIG. 22.

HMD device 300 may include various sensors and related systems to provide information to on-board computer 304. Such sensors may include, but are not limited to, one or more inward facing image sensors 310A and 310B, one or more outward facing image sensors 312A, 312B, and 312C of an imaging system 312, an inertial measurement unit (IMU) 314, and one or more microphones 316. The one or more inward facing image sensors 310A, 310B may acquire gaze tracking information from a wearer’s eyes (e.g., sensor 310A may acquire image data for one of the wearer’s eyes and sensor 310B may acquire image data for the other of the wearer’s eyes). One or more such sensors may be used to implement a sensor system of HMD device 300, for example.

Where gaze-tracking sensors are included, on-board computer 304 may determine gaze directions of each of a wearer’s eyes in any suitable manner based on the information received from the image sensors 310A, 310B. The one or more inward facing image sensors 310A, 310B, and on-board computer 304 may collectively represent a gaze detection machine configured to determine a wearer’s gaze target on display 302. In other implementations, a different type of gaze detector/sensor may be employed to measure one or more gaze parameters of the user’s eyes. Examples of gaze parameters measured by one or more gaze sensors that may be used by on-board computer 304 to determine an eye gaze sample may include an eye gaze direction, head orientation, eye gaze velocity, eye gaze acceleration, change in angle of eye gaze direction, and/or any other suitable tracking information. In some implementations, gaze tracking may be recorded independently for both eyes.

Imaging system 312 may collect image data (e.g., images, video) of a surrounding physical space in any suitable form. Image data collected by imaging system 312 may be used to measure physical attributes of the surrounding physical space. While the inclusion of three image sensors 312A-312C in imaging system 312 is shown, the imaging system may implement any suitable number of image sensors. As examples, imaging system 312 may include a pair of greyscale cameras (e.g., arranged in a stereo formation) configured to collect image data in a single color channel. Alternatively or additionally, imaging system 312 may include one or more color cameras configured to collect image data in one or more color channels (e.g., RGB) in the visible spectrum. Alternatively or additionally, imaging system 312 may include one or more depth cameras configured to collect depth data. In one example, the depth data may take the form of a two-dimensional depth map having a plurality of depth pixels that each indicate the depth from a corresponding depth camera (or other part of HMD device 300) to a corresponding surface in the surrounding physical space. A depth camera may assume any suitable form, such as that of a time-of-flight depth camera or a structured light depth camera. Alternatively or additionally, imaging system 312 may include one or more infrared cameras configured to collect image data in the infrared spectrum. In some examples, an infrared camera may be configured to function as a depth camera. In some examples, one or more cameras may be integrated in a common image sensor—for example, an image sensor may be configured to collect RGB color data and depth data.
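One way to picture the two-dimensional depth map described above is to back-project each depth pixel into a 3D point in the camera's coordinate system. The pinhole camera model and the intrinsic parameters (fx, fy, cx, cy) in this sketch are assumptions for illustration:

```python
# Illustrative back-projection of a depth map to a point cloud using an
# assumed pinhole camera model; fx, fy, cx, cy are hypothetical intrinsics.
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """depth: (H, W) array of metric depths; returns (H*W, 3) points in camera space."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```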

Data from imaging system 312 may be used by on-board computer 304 to detect movements, such as gesture-based inputs or other movements performed by a wearer, person, or physical object in the surrounding physical space. In some examples, HMD device 300 may record hand motion performed by a wearer by recording image data via imaging system 312 capturing the hand motion. HMD device 300 may also image objects manipulated by hand motion via imaging system 312. Data from imaging system 312 may be used by on-board computer 304 to determine direction/location and orientation data (e.g., from imaging environmental features) that enables position/motion tracking of HMD device 300 in the real-world environment. In some implementations, data from imaging system 312 may be used by on-board computer 304 to construct still images and/or video images of the surrounding environment from the perspective of HMD device 300. In some examples, HMD device 300 may utilize image data collected by imaging system 312 to perform simultaneous localization and mapping (SLAM) of the surrounding physical space.

IMU 314 may be configured to provide position and/or orientation data of HMD device 300 to on-board computer 304. In one implementation, IMU 314 may be configured as a three-axis or three-degree of freedom (3DOF) position sensor system. This example position sensor system may, for example, include three gyroscopes to indicate or measure a change in orientation of HMD device 300 within three-dimensional space about three orthogonal axes (e.g., roll, pitch, and yaw).

In another example, IMU 314 may be configured as a six-axis or six-degree of freedom (6DOF) position sensor system. Such a configuration may include three accelerometers and three gyroscopes to indicate or measure a change in location of HMD device 300 along three orthogonal spatial axes (e.g., x/y/z) and a change in device orientation about three orthogonal rotation axes (e.g., yaw/pitch/roll). In some implementations, position and orientation data from imaging system 312 and IMU 314 may be used in conjunction to determine a position and orientation (or 6DOF pose) of HMD device 300. In yet other implementations, the pose of HMD device 300 may be computed via visual inertial SLAM.
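A minimal sketch of how gyroscope samples from such an IMU might be integrated into an orientation estimate between visual pose updates is shown below; the use of scipy, and the omission of bias correction and accelerometer fusion, are simplifying assumptions rather than the device's actual processing:

```python
# Minimal gyroscope-integration sketch (sensor bias, noise, and the full
# visual-inertial fusion are ignored). scipy is an assumed dependency.
import numpy as np
from scipy.spatial.transform import Rotation as R

def integrate_gyro(orientation, angular_velocity, dt):
    """Advance a device-to-world orientation by one IMU sample.

    orientation: scipy Rotation (device-to-world)
    angular_velocity: (3,) angular rates in rad/s about the device axes
    dt: sample interval in seconds
    """
    delta = R.from_rotvec(np.asarray(angular_velocity) * dt)
    return orientation * delta

# Example: starting from identity and rotating at 0.1 rad/s about yaw for 1 s.
pose = R.identity()
for _ in range(100):
    pose = integrate_gyro(pose, [0.0, 0.0, 0.1], 0.01)
```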

HMD device 300 may also support other suitable positioning techniques, such as GPS or other global navigation systems. Further, while specific examples of position sensor systems have been described, it will be appreciated that any other suitable sensor systems may be used. For example, head pose and/or movement data may be determined based on sensor information from any combination of sensors mounted on the wearer and/or external to the wearer including, but not limited to, any number of gyroscopes, accelerometers, inertial measurement units, GPS devices, barometers, magnetometers, cameras (e.g., visible light cameras, infrared light cameras, time-of-flight depth cameras, structured light depth cameras, etc.), communication devices (e.g., WIFI antennas/interfaces), etc.

The one or more microphones 316 may be configured to collect audio data from the surrounding physical space. Data from the one or more microphones 316 may be used by on-board computer 304 to recognize voice commands provided by the wearer to control the HMD device 300. In some examples, HMD device 300 may record audio data via the one or more microphones 316 by capturing speech uttered by a wearer. The speech may be used to annotate a demonstration in which hand motion performed by the wearer is recorded.

While not shown in FIG. 3, on-board computer 304 may include a logic subsystem and a storage subsystem holding instructions executable by the logic subsystem to perform any suitable computing functions. For example, the storage subsystem may include instructions executable to implement one or more of the recording phase, editing phase, and display phase of the pipeline described above in which hand motion recorded in one context is displayed in another context. Example computing hardware is described below with reference to FIG. 22.

FIG. 4 shows a flowchart illustrating a method 400 of recording hand motion. Method 400 may represent the first phase of the three-phase pipeline mentioned above in which hand motion recorded in one context is displayed in another context. Additional detail regarding the second and third phases is described below with reference to FIGS. 13A-13B and 15. Further, reference to the examples depicted in FIGS. 1A-2C is made throughout the description of method 400. As such, method 400 may be at least partially implemented on HMD device 100. Method 400 also may be at least partially implemented on HMD device 204. However, examples are possible in which method 400 and the recording phase are implemented on a non-HMD device having a hardware configuration that supports the recording phase.

At 402, method 400 includes, at an HMD device, three-dimensionally scanning an environment including a first instance of a designated object. Here, the environment in which a demonstration including hand motion is to be performed is scanned. As examples, instructor environment 108 may be scanned using an imaging system integrated in HMD device 100, such as imaging system 312 of HMD device 300. The environment may be scanned by imaging the environment from different perspectives (e.g., via a wearer of the HMD device varying the perspective from which the environment is perceived by the HMD device), such that a geometric representation of the environment may be later constructed as described below. The geometric representation may assume any suitable form, such as that of a three-dimensional point cloud or mesh.

The environmental scan also includes scanning the first instance of the designated object, which occupies the environment. The first instance is the object instance in relation to which at least a portion of the hand motion is performed. For example, the first instance may be instructor light switch 106 in instructor environment 108. As with the environment, the first instance may be scanned from different angles to enable a geometric representation of the first instance to be formed later.

At 404, method 400 optionally includes separately scanning one or more objects in the environment. In some examples, object(s) to be manipulated by later hand motion or otherwise involved in a demonstration to be recorded may be scanned in a discrete step separate from the environmental scan conducted at 402. Separately scanning the object(s) may include, at 406, scanning the first instance of the designated object; at 408, scanning a removable part of the first instance (e.g., panel 132 of instructor light switch 106); and/or, at 410, scanning an object instance other than the first instance of the designated object (e.g., screwdriver 128).

FIG. 5 illustrates how a separate scanning step may be conducted by instructor 102 via HMD device 100 for screwdriver 128. At a first instance of time indicated at 500, screwdriver 128 is scanned from a first perspective. At a second instance of time indicated at 502, screwdriver 128 is scanned from a second perspective obtained by instructor 102 changing the orientation of the screwdriver through hand motion. By changing the orientation of an object instance through hand motion, sufficient image data corresponding to the object instance may be obtained to later construct a geometric representation of the object instance. This may enable a viewer to perceive the object instance from different angles, and thus see different portions of the object instance, via the geometric representation. However, any suitable mechanism may be employed to scan an object instance from different perspectives. For scenarios in which separately scanning an object instance is impracticable (e.g., for a non-removable object instance fixed in a surrounding structure), the object instance instead may be scanned as part of scanning its surrounding environment. In other examples, a representation of an object instance in the form of a virtual model of the object instance may be created, instead of scanning the object instance. For example, the representation may include a three-dimensional representation formed in lieu of three-dimensionally scanning the object instance. Three-dimensional modeling software, or any other suitable mechanism, may be used to create the virtual model. The virtual model, and a representation of hand motion performed in relation to the virtual model, may be displayed in an environment other than that in which the hand motion is recorded.

Returning to FIG. 4, at 412, method 400 includes recording video data capturing motion of a hand relative to the first instance of the designated object. For example, HMD device 100 may record video data capturing motion of hand 104 of instructor 102 as the hand gesticulates relative to light switch 106 (as shown in FIG. 1A), handles screwdriver 128 (as shown in FIG. 1B), and handles panel 132 (as shown in FIG. 1C). The video data may assume any suitable form—for example, the video data may include a sequence of three-dimensional point clouds or meshes captured at 30 Hz or any other suitable rate. Alternatively or additionally, the video data may include RGB and/or RGB+D video, where D refers to depth map frames acquired via one or more depth cameras. As the field of view in which the video data is captured may include both relevant object instances and irrelevant portions of the background environment, the video data may be processed to discard the irrelevant portions as described below. However, in other examples, non-HMD devices may be used to record hand motion, including but not limited to a mobile device (e.g., a smartphone), a video camera, or a webcam.

At 414, method 400 optionally includes recording user input from the wearer of the HMD device. User input may include audio 416, which in some examples may correspond to narration of the recorded demonstration by the wearer—e.g., the narration spoken by instructor 102. User input may include gaze 418, which as described above may be determined by a gaze-tracking system implemented in the HMD device. User input may include gesture input 420, which may include gaze gestures, hand gestures, or any other suitable form of gesture input. As described below, gesture input from the wearer of the HMD device may be used to identify the designated object that hand motion is recorded in relation to.

As mentioned above, a pipeline in which hand motion recorded in one context is displayed in another context may include a processing phase following the recording phase in which hand motion and related objects are captured. In the processing phase, data obtained in the recording phase may be processed to remove irrelevant portions corresponding to the background environment, among other purposes. In some examples, at least a portion of the processing phase may be implemented at a computing device different than an HMD device at which the recording phase is conducted.

FIG. 6 schematically shows an example system 600 in which recorded data 602 obtained by an HMD device 604 from recording hand motion and associated object(s) is transmitted to a computing device 606 configured to process the recorded data. HMD device 604 may be instructor HMD device 100 or HMD device 300, as examples. Computing device 606 may implement aspects of an example computing system described below with reference to FIG. 16. HMD device 604 and computing device 606 are communicatively coupled via a communication link 608. Communication link 608 may assume any suitable wired or wireless form, and may directly or indirectly couple HMD device 604 and computing device 606 through one or more intermediate computing and/or network devices. In other examples, however, at least a portion of recorded data 602 may be obtained by a non-HMD device, such as a mobile device (e.g., smartphone), video camera, and webcam.

Recorded data 602 may include scan data 610 capturing an environment (e.g., instructor environment 108) and an instance of a designated object (e.g., light switch 106) in the environment. Scan data 610 may assume any suitable form, such as that of three-dimensional point cloud or mesh data. Recorded data 602 may include video data 612 capturing motion of a hand (e.g., hand 104), including hand motion alone and/or hand motion performed in the course of manipulating an object instance. Video data 612 may include a sequence of three-dimensional point clouds or meshes, as examples.

Further, recorded data 602 may include audio data 614, for example audio data corresponding to narration performed by a wearer of HMD device 604. Recorded data 602 may include gaze data 616 representing a time-varying gaze point of the wearer of HMD device 604. Recorded data 602 may include gesture data 618 representing gestural input (e.g., hand gestures) performed by the wearer of HMD device 604. Further, recorded data 602 may include object data 620 corresponding to one or more object instances that are relevant to the hand motion captured in the recorded data. In some examples, object data 620 may include, for a given relevant object instance, an identity of the object, an identity of a class or type of the object, and/or output from a recognizer fed image data capturing the object instance. Generally, object data 620 may include data that, when received by another HMD device in a location different from that of HMD device 604, enables the other HMD device to determine that an object instance in the different location is an instance of the object represented by the object data. Finally, recorded data 602 may include pose data 621 indicating a sequence of poses of HMD device 604 and/or the wearer of the HMD device. Poses may be determined via data from an IMU and/or via SLAM as described above.

Computing device 606 includes various engines configured to process recorded data 602 received from HMD device 604. Specifically, computing device 606 may include a fusion engine 622 configured to fuse image data from different image sensors. In one example, video data 612 in recorded data 602 may include image data from one or more of greyscale, color, infrared, and depth cameras. Via fusion engine 622, computing device 606 may perform dense stereo matching of image data received from a first greyscale camera and of image data received from a second greyscale camera to obtain a depth map, based on the greyscale camera image data, for each frame in video data 612. Via fusion engine 622, computing device 606 may then fuse the greyscale depth maps with temporally corresponding depth maps obtained by a depth camera. As the greyscale depth maps and the depth maps obtained by the depth camera may have a different field of view and/or framerate, fusion engine 622 may be configured to fuse image data of such differing attributes.
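
As an illustrative sketch of such a fusion step (not a specific implementation of fusion engine 622), the following assumes both depth maps have already been reprojected into a common camera frame at the same resolution and timestamp, and combines them by inverse-variance weighting; the function name and noise parameters are hypothetical.

```python
import numpy as np

def fuse_depth_maps(stereo_depth, tof_depth, stereo_sigma=0.05, tof_sigma=0.01):
    """Fuse two depth maps (meters) of identical shape by inverse-variance weighting.

    Assumes both maps have already been reprojected into a common camera frame
    and resampled to the same resolution and timestamp. Pixels with no valid
    measurement in either map remain 0.
    """
    stereo_valid = stereo_depth > 0
    tof_valid = tof_depth > 0

    w_stereo = stereo_valid / (stereo_sigma ** 2)
    w_tof = tof_valid / (tof_sigma ** 2)
    weight_sum = w_stereo + w_tof

    fused = np.zeros_like(stereo_depth, dtype=float)
    any_valid = weight_sum > 0
    fused[any_valid] = (
        w_stereo[any_valid] * stereo_depth[any_valid]
        + w_tof[any_valid] * tof_depth[any_valid]
    ) / weight_sum[any_valid]
    return fused
```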

Computing device 606 may include a representation engine 624 configured to determine static and/or time-varying representations of the environment captured in recorded data 602. Representation engine 624 may determine a time-varying representation of the environment based on fused image data obtained via fusion engine 622. In one example in which fused image frames are obtained by fusing a sequence of greyscale image frames and a sequence of depth frames, representation engine 624 may determine a sequence of three-dimensional point clouds based on the fused image frames. Then, color may be associated with each three-dimensional point cloud by projecting points in the point cloud into spatially corresponding pixels of a temporally corresponding image frame from a color camera. This sequence of color point clouds may form the time-varying representation of the environment, which also may be referred to as a four-dimensional reconstruction of the environment. In this example, the time-varying representation comprises a sequence of frames each consisting of a three-dimensional point cloud with per-point (e.g., RGB) color. The dynamic elements of the time-varying (e.g., three-dimensional) representation may include hand(s) undergoing motion and object instances manipulated in the course of such hand motion. Other examples are possible in which representation engine 624 receives or determines a non-scanned representation of an object instance—e.g., a virtual (e.g., three-dimensional) model of the object instance.
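
One minimal sketch of the point-to-pixel projection used to associate color with a point cloud frame is shown below, assuming a pinhole model for the color camera and points already expressed in that camera's frame; the function name and intrinsics are hypothetical.

```python
import numpy as np

def colorize_point_cloud(points_cam, rgb_image, fx, fy, cx, cy):
    """Attach per-point RGB color to a point cloud given in the color camera's frame.

    points_cam: (N, 3) array of X, Y, Z in meters, with Z > 0 in front of the camera.
    rgb_image:  (H, W, 3) uint8 color frame temporally corresponding to the points.
    fx, fy, cx, cy: pinhole intrinsics of the color camera.
    Returns an (M, 6) array of [X, Y, Z, R, G, B] for points that project into the image.
    """
    X, Y, Z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    u = np.round(fx * X / Z + cx).astype(int)
    v = np.round(fy * Y / Z + cy).astype(int)

    H, W, _ = rgb_image.shape
    in_view = (Z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)

    colors = rgb_image[v[in_view], u[in_view]].astype(np.float32)
    return np.hstack([points_cam[in_view], colors])
```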

In some examples, representation engine 624 may determine a static representation of the environment in the form of a three-dimensional point cloud reconstruction of the environment. The static representation may be determined based on one or more of scan data 610, video data 612, and pose data 621, for example. In particular, representation engine 624 may determine the static representation via any suitable three-dimensional reconstruction algorithms, including but not limited to structure from motion and dense multi-view stereo reconstruction algorithms (e.g., based on image data from color and/or greyscale cameras, or based on a surface reconstruction of the environment based on depth data from a depth camera).

FIG. 7 shows an example static representation 700 of instructor environment 108 of FIGS. 1A-1C. In this example, static representation 700 includes a representation of the environment in the form of a three-dimensional point cloud or mesh, with different surfaces in the representation represented by different textures. FIG. 7 illustrates representation 700 from one angle, but as the representation is three-dimensional, the angle from which it is viewed may be varied. FIG. 7 also shows an example time-varying representation of the environment in the form of a sequence 702 of point cloud frames. Unlike static representation 700, the time-varying representation includes image data corresponding to hand motion performed in the environment.

In some examples, a static representation may be determined in a world coordinate system different than a world coordinate system in which a time-varying representation is determined. As a brief example, FIG. 7 shows a first world coordinate system 704 determined for static representation 700, and a second world coordinate system 706 determined for the time-varying representation. Accordingly, computing device 606 may include a coordinate engine 626 configured to align the differing world coordinate systems of static and time-varying representations and thereby determine an aligned world coordinate system. The coordinate system alignment process may be implemented in any suitable manner, such as via image feature matching and sparse 3D-3D point cloud registration algorithms. In other examples, dense alignment algorithms or iterated closest point (ICP) techniques may be employed.
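
A minimal sketch of the rigid 3D-3D registration underlying such alignment is shown below, assuming matched point pairs (for example, from image feature matching) are already available; it estimates a rotation and translation via a singular value decomposition (the Kabsch/Umeyama approach), which is one of several techniques coordinate engine 626 could use.

```python
import numpy as np

def align_world_frames(points_static, points_dynamic):
    """Estimate a rigid transform aligning corresponding 3D points.

    points_static, points_dynamic: (N, 3) arrays of matched points expressed in
    the static and time-varying world coordinate systems, respectively.
    Returns (R, t) such that R @ p_dynamic + t approximately equals p_static.
    """
    mu_s = points_static.mean(axis=0)
    mu_d = points_dynamic.mean(axis=0)

    # Cross-covariance of the centered point sets.
    H = (points_dynamic - mu_d).T @ (points_static - mu_s)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection in the solution.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_s - R @ mu_d
    return R, t
```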

As described above, the field of view in which video data 612 is captured may include relevant hand motion and object instances, and irrelevant portions of the background environment. Accordingly, computing device 606 may include a segmentation engine 628 configured to segment a relevant foreground portion of the video data, including relevant hand motion and object instances, from an irrelevant background portion of the video data, including irrelevant motion and a static background of the environment. In one example, segmentation engine 628 performs segmentation on a sequence of fused image frames obtained by fusing a sequence of greyscale image frames and a sequence of depth frames as described above. The sequence of fused image frames may be compared to the static representation of the environment produced by representation engine 624 to identify static and irrelevant portions of the fused image frames. For example, the static representation may be used to identify points in the fused image data that remain substantially motionless, where at least a subset of such points may be identified as irrelevant background points. Any suitable (e.g., three-dimensional video) segmentation algorithms may be used. For example, a segmentation algorithm may attempt to identify the subset of three-dimensional points that within a certain threshold are similar to corresponding points in the static representation, and discard these points from the fused image frames. Here, the segmentation process may be likened to solving a three-dimensional change detection task.
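
The following sketch illustrates one simple form of such three-dimensional change detection, assuming the fused frame and the static representation are already expressed in the aligned world coordinate system; the distance threshold and the use of a k-d tree are illustrative choices, not a prescribed implementation of segmentation engine 628.

```python
import numpy as np
from scipy.spatial import cKDTree

def segment_foreground(frame_points, static_points, threshold=0.02):
    """Segment dynamic foreground points from one fused frame.

    frame_points:  (N, 3) points of the fused frame in the aligned world frame.
    static_points: (M, 3) points of the static environment reconstruction.
    Points farther than `threshold` meters from every static point are kept as
    foreground (e.g., moving hands and manipulated objects); the rest are
    discarded as static background.
    """
    distances, _ = cKDTree(static_points).query(frame_points, k=1)
    return frame_points[distances > threshold]
```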

As a particular example regarding the segmentation of hand motion, FIG. 8 shows an example image frame 800 including a plurality of pixels 802 that each specify a depth value. Image frame 800 captures hand 104 of instructor 102 (FIGS. 1A-1C), which, by virtue of being closer to the image sensor that captured the image frame, has corresponding pixels with substantially lesser depth than pixels that correspond to the background environment. For example, a hand pixel 804 has a depth value of 15, whereas a non-hand pixel 806 has a depth value of 85. In this way, a set of hand pixels corresponding to hand 104 may be identified and segmented from non-hand pixels. As illustrated by the example shown in FIG. 8, segmentation engine 628 may perform hand segmentation based on depth values for each frame having depth data in a sequence of such frames.
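
A minimal sketch of this depth-based hand segmentation is shown below, using the depth values from the FIG. 8 example (hand pixels near 15, background near 85); the threshold value is an illustrative assumption.

```python
import numpy as np

def segment_hand_pixels(depth_frame, hand_max_depth=40):
    """Return a boolean mask of likely hand pixels in a depth frame.

    depth_frame: (H, W) array of per-pixel depth values (same units as the
    FIG. 8 example). Pixels with a valid (nonzero) reading closer than
    `hand_max_depth` are labeled as hand pixels.
    """
    return (depth_frame > 0) & (depth_frame < hand_max_depth)

# Toy frame echoing FIG. 8: hand pixels at depth 15, background at 85.
frame = np.array([[85, 85, 15],
                  [85, 15, 15]])
print(segment_hand_pixels(frame))
# [[False False  True]
#  [False  True  True]]
```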

Returning to FIG. 6, in some examples segmentation engine 628 may receive, for a sequence of frames, segmented hand pixels that image a hand in that frame. Segmentation engine 628 may further label such hand pixels, and determine a time-varying geometric representation of the hand as it undergoes motion throughout the frames based on the labeled hand pixels. In some examples, the time-varying geometric representation may also be determined based on a pose of HMD device 604 determined for each frame. The time-varying geometric representation of the hand motion may take any suitable form—for example, the time-varying geometric representation may include a sequence of geometric representations for each frame, with each representation including a three-dimensional point cloud encoding the pose of the hand in that frame. In this way, a representation of hand motion may be configured with a time-varying pose that corresponds to (e.g., substantially matches or mimics) the time-varying pose of the real hand represented by the representation. In other examples, a so-called “2.5D” representation of hand motion may be generated for each frame, with each representation for a frame encoded as a depth map or height field mesh. Such 2.5D representations may be smaller compared to fully three-dimensional representations, making their storage, transmission, and rendering less computationally expensive.
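
One way a per-frame 2.5D encoding might be sketched is shown below, assuming segmented hand points in the recording camera's frame and pinhole intrinsics; the resolution and the handling of overlapping points are illustrative choices.

```python
import numpy as np

def hand_points_to_depth_map(hand_points_cam, fx, fy, cx, cy, shape=(480, 640)):
    """Encode the segmented hand points of one frame as a 2.5D depth map.

    hand_points_cam: (N, 3) hand points in the recording camera's frame.
    Returns an (H, W) float map holding the nearest hand depth per pixel and
    0 where no hand point projects, which is cheaper to store and stream than
    a full 3D point cloud per frame.
    """
    H, W = shape
    depth_map = np.zeros(shape, dtype=np.float32)

    X, Y, Z = hand_points_cam[:, 0], hand_points_cam[:, 1], hand_points_cam[:, 2]
    u = np.round(fx * X / Z + cx).astype(int)
    v = np.round(fy * Y / Z + cy).astype(int)
    ok = (Z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)

    # Write far points first so the nearest depth wins at each pixel.
    for ui, vi, zi in sorted(zip(u[ok], v[ok], Z[ok]), key=lambda p: -p[2]):
        depth_map[vi, ui] = zi
    return depth_map
```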

In other examples, skeletal hand tracking may be used to generate a geometric representation of hand motion. As such, computing device 606 may include a skeletal tracking engine 630. Skeletal tracking engine 630 may receive labeled hand pixels determined as described above, and fit a skeletal hand model comprising a plurality of finger joints with variable orientations to the imaged hand. This in turn may allow representation engine 624 to fit a deformable mesh to the hand and ultimately facilitate a fully three-dimensional model to be rendered as a representation of the hand. This may enable the hand to be viewed from virtually any angle. In some examples, skeletal tracking may be used to track an imaged hand for the purpose of identifying a designated object.

In some examples, video data 612 may capture both the left and right hands of the wearer of HMD device 604. In these examples, both hands may be segmented via segmentation engine 628 and separately labeled as the left hand and right hand. This may enable separate geometric representation of the left and right hands to be displayed.

As mentioned above, segmentation engine 628 may segment object instances in addition to hand motion. For objects that undergo motion, including articulated motion about a joint, segmentation engine 628 may employ adaptive background segmentation algorithms to subtract irrelevant background portions. As examples of objects undergoing motion, in one demonstration an instructor may open a panel of a machine by rotating the panel about a hinge. Initially, the panel may be considered a foreground object instance that should be represented for later display by a viewer. Once the panel stops moving and is substantially motionless for at least a threshold duration, the lack of motion may be detected, causing the panel to be considered part of the irrelevant background. As such, the panel may be segmented, and the viewer may perceive the representation of the panel fade from display. To this end, a representation of the panel may include a transparency value for each three-dimensional point that varies with time.

Computing device 606 may further include a recognition engine 632 configured to recognize various aspects of an object instance. In some examples, recognition engine 632 may further detect an object instance as a designated object instance, detect the correspondence of an object instance to another object instance, or recognize, identify, and/or detect an object instance in general. To this end, recognition engine 632 may utilize any suitable machine vision and/or object recognition/detection/matching techniques.

Alternatively or additionally, recognition engine 632 may recognize the pose of an object instance. In some examples, a 6DOF pose of the object instance may be recognized via any suitable 6D detection algorithm. More specifically, pose recognition may utilize feature matching algorithms (e.g., based on hand-engineered features) and robust fitting or learning-based methods. Pose recognition may yield a three-dimensional position (e.g., x/y/z) and a three-dimensional orientation (e.g., yaw/pitch/roll) of the object instance. Recognition engine 632 may estimate the pose of an object instance based on any suitable data in recorded data 602. As examples, the pose may be recognized based on color (e.g., RGB) images or images that include both color and depth values (e.g., RGB+D).

For an object instance that undergoes motion, a time-varying pose (e.g., a time-stamped sequence of 6DOF poses) may be estimated for the object instance. In some examples, time intervals in which the object instance remained substantially motionless may be estimated, and a fixed pose estimate may be used for such intervals. Any suitable method may be used to estimate a time-varying pose, including but not limited to performing object detection/recognition on each of a sequence of frames, or performing 6DOF object detection and/or tracking. As described below, an editor application may be used to receive user input for refining an estimated pose. Further, for an object instance that has multiple parts undergoing articulated motion, a 6DOF pose may be estimated for each part.

For an object instance with an estimated pose, an object-centric coordinate system specific to that object instance may be determined. Segmented (e.g., three-dimensional) points on hand(s) recorded when hand motion was performed may be placed in the object-centric coordinate system by transforming the points using the estimated (e.g., 6DOF) object pose, which may allow the hand motion to be displayed (e.g., on an augmented-reality device) relative to another object instance in a different scene in a spatially consistent manner. To this end, coordinate engine 626 may transform a geometric representation of hand motion from a world coordinate system (e.g., a world coordinate system of the time-varying representation) to an object-centric coordinate system of the object instance. As one example, FIG. 9 shows representation 208 (FIG. 2A) of hand 104 (FIG. 1) placed in an object-centric coordinate system 900 associated with viewer light switch 210. While shown as being placed toward the upper-right of light switch 210, the origin of coordinate system 900 may be placed at an estimated centroid of the light switch, and the coordinate system may be aligned with the estimated pose of the light switch.
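
A minimal sketch of this world-to-object transform is shown below, assuming the estimated 6DOF object pose is available as a rotation matrix and a translation placed at the object's centroid; the function name is hypothetical.

```python
import numpy as np

def hand_points_to_object_frame(hand_points_world, R_obj, t_obj):
    """Express segmented hand points in an object-centric coordinate system.

    R_obj, t_obj: estimated 6DOF pose of the object in the world frame, i.e.
    p_world = R_obj @ p_object + t_obj, with t_obj at the estimated centroid.
    Applying the inverse transform places hand points in the object's own
    frame, so the same motion can later be replayed relative to a different
    instance of the object.
    """
    # Row-vector form of R_obj.T @ (p_world - t_obj) for every point at once.
    return (hand_points_world - t_obj) @ R_obj
```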

For an object instance with multiple parts that undergo articulated motion, a particular part of the object instance may be associated with its own object-centric coordinate system. As one example, FIG. 10 shows a laptop computing device 1000 including an upper portion 1002 coupled to a lower portion 1004 via a hinge 1006. A hand 1008 is manipulating upper portion 1002. As such, a coordinate system 1010 is associated with upper portion 1002, and not lower portion 1004. Coordinate system 1010 may remain the active coordinate system with which hand 1008 is associated until lower portion 1004 is manipulated, for example. Generally, the portion of an articulating object instance that is associated with an active coordinate system may be inferred by estimating the surface contact between a user’s hands and the portion.

For an object instance with removable parts, the active coordinate system may be switched among the parts according to the particular part being manipulated at any given instant. As one example, FIG. 11 shows a coordinate system 1100 associated with light switch 106 (FIG. 1A). At a later instance in time, panel 132 is removed from light switch 106 and manipulated by hand 104. Upon detecting that motion of hand 104 has changed from motion relative to light switch 106 to manipulation of panel 132, the active coordinate system is switched from coordinate system 1100 to a coordinate system 1102 associated with the panel. As illustrated by this example, each removable part of an object instance may have an associated coordinate system that is set as the active coordinate system while that part is being manipulated or is otherwise relevant to hand motion. The removable parts of a common object may be determined based on object recognition, scanning each part separately, explicit user input identifying the parts, or in any other suitable manner. Further, other mechanisms for identifying the active coordinate system may be used, including setting the active coordinate system based on user input, as described below.

Returning to FIG. 6, computing device 606 may include an editor application 634 configured to receive user input for processing recorded data 602. FIG. 12 shows an example graphical user interface (GUI) 1200 of editor application 634. As shown, GUI 1200 may display video data 612 in recorded data 602, though any suitable type of image data in the recorded data may be represented in the GUI. Alternatively or additionally, GUI 1200 may display representations (e.g., three-dimensional point clouds) of hand motion and/or relevant object instances. In the depicted example, GUI 1200 is switchable between the display of video data and representations via controls 1202.

GUI 1200 may include other controls selectable to process recorded data 602. For example, GUI 1200 may include an insert pause control 1204 operable to insert pauses into playback of the recorded data 602. At a viewer’s side, playback may be paused where the pauses are inserted. A user of editor application 634 may specify the duration of each pause, specify that playback be resumed in response to receiving a particular input from the viewer, or specify any other suitable criteria. The user may insert pauses to divide the recorded demonstration into discrete steps, which may render the demonstration easier to follow. As an example, the instances of time respectively depicted in FIGS. 1A-1C may each correspond to a respective step, with the steps separated from one another by pauses.

GUI 1200 may include a coordinate system control 1206 operable to identify, for a given time period in the recorded demonstration, the active coordinate system. In some examples, control 1206 may be used to place cuts where the active coordinate system changes. This may increase the accuracy with which hand motion is associated with the correct coordinate system, particularly for demonstrations that include the manipulation of moving and articulated object instances, and the removal of parts from object instances.

GUI 1200 may include a designated object control 1208 operable to identify the designated object that is relevant to recorded hand motion. This may supplement or replace at least a portion of the recognition process described above for determining the designated object. Further, GUI 1200 may include a gaze control 1210 operable to process a time-varying gaze in the recorded demonstration. In some examples, the gaze of an instructor may vary erratically and rapidly in the natural course of executing the demonstration. As such, gaze control 1210 may be used to filter, smooth, suppress, or otherwise process the recorded gaze.

While FIG. 6 depicts the implementation of computing device 606 and its functions separately from HMD device 604, examples are possible in which aspects of the computing device are implemented at the HMD device. As such, HMD device 604 may perform at least portions of image data fusion, representation generation, coordinate alignment and association, segmentation, skeletal tracking, and recognition. Alternatively or additionally, HMD device 604 may implement aspects of editor application 634—for example by executing the application. This may enable the use of HMD 604 for both recording and processing a demonstration. In this example, a user of HMD device 604 may annotate a demonstration with text labels or narration (e.g., via one or more microphones integrated in the HMD device), oversee segmentation (e.g., via voice input or gestures), and insert pauses into playback, among other functions.

FIGS. 13A-13B show a flowchart illustrating a method 1300 of processing recorded data including recorded hand motion. Method 1300 may represent the second phase of the three-phase pipeline mentioned above in which hand motion recorded in one context is displayed in another context. Reference to the example depicted in FIG. 6 is made throughout the description of method 1300. As such, method 1300 may be at least partially implemented on HMD device 604 and/or computing device 606.

At 1302, method 1300 includes receiving recorded data obtained in the course of recording a demonstration in an environment. The recorded data (e.g., recorded data 602) may be received from HMD device 604, for example. The recorded data may include one or more of scan data (e.g., scan data 610) obtained from three-dimensionally scanning the environment, video data (e.g., video data 612) obtained from recording the demonstration, object data (e.g., object data 620) corresponding to a designated object instance relating to the recorded hand motion and/or a removable part of the object instance, and pose data (e.g., pose data 621) indicating a sequence of poses of an HMD device, for examples in which the recorded data is received from the HMD device.

At 1304, method 1300 includes, based on the scan data obtained by three-dimensionally scanning the environment, determining a static representation of the environment. Representation engine 624 may be used to determine the static representation, for example. The static representation may include a three-dimensional point cloud, mesh, or any other suitable representation of the environment.

At 1306, method 1300 includes, based on the video data, determining a time-varying representation of the environment. The time-varying representation may be determined via representation engine 624 based on fused image data, for example. In some examples, the time-varying representation comprises a sequence of frames each consisting of a three-dimensional point cloud with per-point (e.g., RGB) color.

At 1308, method 1300 includes determining a first pose of a first instance of a designated object. As indicated at 1310, the first pose may be a time-varying pose. The first pose may be determined via recognition engine 632, for example.

At 1312, method 1300 includes, based on the first pose, associating a first coordinate system with the first instance of the designated object. In some examples, the origin of the first coordinate system may be placed at an estimated centroid of the first instance, and the first coordinate system may be aligned to the first pose.

At 1314, method 1300 includes associating a first world coordinate system with the static representation. At 1316, method 1300 includes associating a second world coordinate system with the time-varying representation. At 1318, method 1300 includes aligning the first and second coordinate systems to determine an aligned world coordinate system. Such coordinate system association and alignment may be performed via coordinate engine 626, for example.

Turning to FIG. 13B, at 1320, method 1300 includes determining a geometric representation of hand motion, captured in the time-varying representation, in the aligned world coordinate system. At 1322, the geometric representation may be determined based on a foreground portion of the time-varying representation segmented from a background portion. In some examples, the foreground portion may include hand motion, moving object instances, and other generally relevant dynamic object instances, whereas the background portion may include static and irrelevant data. At 1324, the background portion may be identified based on the three-dimensional scan data in the recorded data received at 1302. The geometric representation may be determined via representation engine 624 using segmentation engine 628, for example.

At 1326, method 1300 includes transforming the geometric representation of the hand motion from the aligned world coordinate system to the first coordinate system associated with the first instance of the designated object to thereby determine a geometric representation of the hand motion in the first coordinate system. Such transformation may be performed via coordinate engine 626, for example.

At 1328, method 1300 includes configuring the geometric representation of the hand motion in the first coordinate system for display relative to a second instance of the designated object in a spatially consistent manner. Configuring this geometric representation may include saving the geometric representation at a storage device that can be accessed by another HMD device for viewing the geometric representation in a location different from the location in which the hand motion was recorded. Alternatively or additionally, configuring the geometric representation may include transmitting the geometric representation to the other HMD device. Here, spatial consistency may refer to displaying a geometric representation of hand motion that was recorded relative to a first object instance, relative to a second object instance, with the same changing pose that the hand motion had relative to the first object instance. Spatial consistency may also refer to the preservation of other spatial variables between the first and second object instance sides. For example, the position, orientation, and scale of the recorded hand motion relative to the first object instance may be assigned to the position, orientation, and scale of the geometric representation, such that the geometric representation is displayed relative to the second object instance with those spatial variables.

At 1330, method 1300 optionally includes, based on the static and time-varying representations of the environment, determining a geometric representation of hand motion in the recorded data relative to a first instance of a removable part of the designated object, relative to a third coordinate system associated with the removable part. At 1332, method 1300 optionally includes configuring the geometric representation of hand motion, relative to the first instance of the removable part, for display relative to a second instance of the removable part with spatial consistency.

At 1334, method 1300 optionally includes determining a geometric representation of the first instance of the designated object. The geometric representation of the first instance of the designated object may be determined via representation engine 624, for example. Such representation alternatively or additionally may include a representation of a removable or articulated part of the first instance. At 1336, method 1300 optionally includes configuring the geometric representation of the first instance of the designated object for display with the second instance of the designated object.

FIG. 14 schematically shows an example system 1400 in which playback data 1402, produced by computing device 606 in processing recorded data 602, is transmitted to an HMD device 1404 for playback. In particular, HMD device 1404 may play back representations of hand motion and/or object instances encoded in playback data 1402. HMD device 1404 may be viewer HMD device 204 or HMD device 300, as examples. HMD device 1404 and computing device 606 are communicatively coupled via a communication link 1406, which may assume any suitable wired or wireless, and direct or indirect, form. Further, playback data 1402 may be transmitted to HMD device 1404 in any suitable manner—as examples, the playback data may be downloaded as a whole or streamed to the HMD device.

Playback data 1402 may include a geometric representation of recorded hand motion 1408. Geometric representation 1408 may include a three-dimensional point cloud or mesh, or in other examples a 2.5D representation. For examples in which the pose of hand motion varies in time, geometric representation 1408 may be a time-varying geometric representation comprising a sequence of poses. Playback data 1402 may include a geometric representation of an object instance 1410, which may assume 3D or 2.5D forms. Geometric representation 1410 may represent an instance of a designated object, a removable part of the designated object, an articulated part of the designated object, or any other suitable aspect of the designated object. Further, in some examples, geometric representation 1410 may be formed by scanning an object as described above. In other examples, geometric representation 1410 may include a virtual model of an object instance created without scanning the object instance (e.g., by creating the virtual model via modeling software).

Further, playback data 1402 may include object data 1412, which may comprise an identity, object type/class, and/or output from a recognizer regarding the object instance that the recorded hand motion was performed in relation to. HMD device 1404 may utilize object data 1412 to identify that a second object instance in the surrounding physical space of the HMD device corresponds to the object instance that the recorded hand motion was performed in relation to, and thus that geometric representation 1408 of the recorded hand motion should be displayed in relation to the second instance. Generally, object data 1412 may include any suitable data to facilitate this identification.

To achieve spatial consistency between geometric representation 1408 relative to the second object instance and the recorded hand motion relative to the first object instance, playback data 1402 may include spatial data 1414 encoding one or more of a position, orientation, and scale of the geometric representation. Geometric representation 1408 may be displayed with these attributes relative to the second object instance.

Further, playback data 1402 may include audio data 1416, which may include narration spoken by a user that recorded the playback data, where the narration may be played back by HMD device 1404. Playback data 1402 may include gaze data 1418 of the user, which may be displayed via a display of HMD device 1404.

In other implementations, a non-HMD device may be used to present playback data 1402. For example, a non-HMD device including an at least partially transparent display may enable the viewing of representations of object instances and/or hand motion, along with a view of the surrounding physical space. As another example, a non-transparent display (e.g., mobile device display such as that of a smartphone or tablet, television, monitor) may present representations of object instances and/or hand motion, potentially along with image data capturing the physical space surrounding the display or the environment in which the hand motion was recorded. In yet another example, an HMD device may present representations of object instances and/or hand motion via a substantially opaque display. Such an HMD device may present imagery corresponding to a physical space via passthrough stereo video, for example.

FIG. 15 shows a flowchart illustrating a method 1500 of outputting a geometric representation of hand motion relative to a second instance of a designated object. The geometric representation may have been recorded relative to a first instance of the designated object. Method 1500 may be performed by HMD device 1404 and/or HMD device 300, as examples. The computing device on which method 1500 is performed may implement one or more of the engines described above with reference to FIG. 6.

At 1502, method 1500 includes, at an HMD device, receiving a geometric representation of motion of a hand, the geometric representation having a time-varying pose determined relative to a first pose of a first instance of a designated object in a first coordinate system. At 1504, method 1500 optionally includes receiving a geometric representation of motion of the hand determined relative to a first instance of a removable part of the first instance of the designated object in a third coordinate system. At 1506, method 1500 optionally includes receiving a geometric representation of the first instance of the removable part.

At 1508, method 1500 includes receiving image data obtained by scanning an environment occupied by the HMD device and by a second instance of the designated object. The HMD device may collect various forms of image data (e.g., RGB+D) and construct a three-dimensional point cloud or mesh of the environment, as examples. At 1510, method 1500 includes, based on the image data, determining a second pose of the second instance of the designated object. To this end, the HMD device may implement recognition engine 632, for example. The second pose may include a 6DOF pose of the second object instance, in some examples. At 1512, the second pose may be time-varying in some examples.

At 1514, method 1500 includes associating a second coordinate system with the second instance of the designated object based on the second pose. To this end, the HMD device may implement coordinate engine 626, for example. At 1516, method 1500 includes outputting, via a display of the HMD device, the geometric representation of hand motion relative to the second instance of the designated object with a time-varying pose relative to the second pose that is spatially consistent with the time-varying pose relative to the first pose. Here, the geometric representation of hand motion may be rendered with respect to the second object instance with specific 6DOF poses, such that the relative pose between the hand motion and the second object instance substantially matches the relative pose between the hand and the first object instance that the hand motion was recorded in relation to.
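
The rendering transform described here can be sketched as a simple pose composition, assuming 4x4 homogeneous matrices: the hand pose stored in the first instance's coordinate system is mapped into the viewer's world via the estimated pose of the second instance. The function name is hypothetical, and one such composition would be applied per frame of the time-varying representation.

```python
import numpy as np

def replay_pose(T_hand_in_obj1, T_obj2_in_world):
    """Pose at which to render one frame of the hand representation.

    T_hand_in_obj1:  4x4 pose of the recorded hand expressed in the first
                     object instance's coordinate system.
    T_obj2_in_world: 4x4 pose of the second object instance estimated in the
                     viewer's world coordinate system.
    The composition reproduces, relative to the second instance, the same
    relative pose the hand had to the first instance.
    """
    return T_obj2_in_world @ T_hand_in_obj1
```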

At 1518, method 1500 optionally includes outputting, via the display, the geometric representation of the motion of the hand determined relative to the first instance of the removable part relative to a second instance of the removable part in a fourth coordinate system. At 1520, method 1500 optionally includes outputting, via the display, a geometric representation of the first instance of the removable part for viewing with the second instance of the removable part. In other implementations, however, a non-HMD device (e.g., mobile device display, television, monitor) may be used to present representations of object instances and/or hand motion, potentially along with a view of a physical space.

Modifications to the disclosed examples are possible, as are modifications to the contexts in which the disclosed examples are practiced. For example, motion of both of a user’s hands may be recorded and represented for viewing in another location. In such examples, motion of both hands may be recorded in relation to a common object, or to objects respectively manipulated by the left and right hands. For example, a demonstration may be recorded and represented for later playback in which an object is held in one hand, and another object (e.g., in a fixed position) is manipulated by the other hand. Where two objects are respectively relevant to left and right hands, representations of both objects may be determined and displayed in another location.

Further, aspects of the disclosed examples may interface with other tools for authoring demonstrations and data produced by such tools. For example, aspects of the processing phase described above in which a recorded demonstration is processed (e.g., labeled, segmented, represented, recognized) for later playback may be carried out using other tools and provided as input to the processing phase. As a particular example with reference to FIG. 6, object instance labels (e.g., identities) and user annotations created via other tools, and thus not included in recorded data 602, may be provided as input to editor application 634. Such data may be determined via a device other than HMD device 604, for example.

Still further, the disclosed examples are applicable to the annotation of object instances, in addition to the recording of hand motion relative to object instances. For example, user input annotating an object instance in one location, where annotations may include hand gestures, gaze patterns, and/or audio narration, may be recorded and represented for playback in another location. In yet other examples, the disclosed examples are applicable to recording other types of motion (e.g., object motion as described above) in addition to hand motion, including motion of other body parts, motion of users external to the device on which the motion is recorded, etc.

In examples described above, a representation of hand motion may be determined in the coordinate system of an object in an environment. The object coordinate system may be determined based on the pose of the object, with the pose being estimated based on image data capturing the object. Similarly, image data may be used to determine the representation of hand motion—a static representation of the environment (generated by three-dimensionally imaging the environment) and a time-varying representation of the environment (generated from video data capturing hand motion in the environment) may be compared to determine a varying foreground portion of the time-varying representation that is segmented from a substantially fixed background portion to thereby produce the representation of hand motion. However, other techniques may enable the representation of hand motion with increased accuracy, reduced complexity, and in manners beyond display of the representation.

In view of the above, examples are disclosed that employ a parametric approach to representing hand motion. This parametric approach differs from the non-parametric, image-based approaches described above in several ways. For example, in the parametric approach a representation of hand motion may be determined in the coordinate system of a virtual model representing an object that the hand motion is performed in relation to, rather than in the coordinate system of the object itself as estimated from image data. As the virtual model may be encoded by a computational data structure, its coordinate system may be known rather than estimated (or at least to a higher degree of precision). As such, aligning the representation of hand motion to the virtual model may be more accurate than aligning the representation to the estimated coordinate system of the object—both for the recording and playback of the hand motion, as the virtual model may be aligned to different instances of the object (e.g., in different environments).

The parametric approach may further differ in its use of a parametric representation of hand motion rather than a geometric representation (e.g., mesh, point cloud). The parametric representation may be determined via a hand tracking engine, and may encode the respective articulation of one or more joints of a human hand. In contrast, the geometric representation is determined by segmenting different portions of a time-varying representation of an environment as described above. As such, the computational expense of segmentation, as well as other associated steps in the non-parametric approach including the construction of static and time-varying environmental representations, coordinate association, coordinate alignment, and other image processing, may be saved using the parametric approach. The parametric representation may also occupy less space in storage/memory compared to the geometric representation. Further, as described below, the parametric approach may enable the transfer of a parametric representation of hand motion to the manipulator of a robotic device, thereby enabling the robotic device to mimic or model hand motions. Specifically, as described below, the use of parameters enables easy mapping of hand motion to how mechanical robotic structures move. This transfer may facilitate an additional class of use cases and scenarios, with increased accuracy and reduced complexity relative to observation-based approaches to training robotic devices.

FIGS. 16A-C illustrate an example process of recording hand motion in accordance with the parametric approach introduced above. Via an HMD device 1600, a user 1602 records motion of a hand 1604 of the user performed in the opening of a fuel cap 1606 of a vehicle—e.g., as part of a video tutorial demonstrating how to refuel a vehicle. The recording captures the unscrewing of fuel cap 1606, with FIG. 16A illustrating the fuel cap and hand 1604 at an initial orientation at the start of the unscrewing, and FIG. 16B illustrating the fuel cap and hand at a subsequent orientation with the fuel cap partially unscrewed. Any suitable device other than an HMD device may be used to record hand motion, however.

The motion of hand 1604—including its rotation illustrated between FIGS. 16A-16B—is recorded in relation to a virtual model 1608, schematically shown in FIG. 16C, that represents fuel cap 1606. HMD device 1600 aligns the pose of virtual model 1608 to the pose of fuel cap 1606, such that the three-dimensional position and orientation (e.g., 6DOF pose) of the virtual model are respectively aligned with the three-dimensional position and orientation (e.g., 6DOF pose) of fuel cap 1606. A representation of the recorded motion of hand 1604—referred to herein as a “recorded representation”—may then be determined in a coordinate system 1610 associated with virtual model 1608. With virtual model 1608 aligned to fuel cap 1606, and the representation placed in the coordinate system of the virtual model, the representation is accordingly aligned to the pose of the fuel cap. Placing the representation in this coordinate system also enables the representation to be transferred to other devices (which, in some examples, may modify the representation or determine another representation based on the received representation) and accurately displayed in relation to other instances of a fuel cap. Thus, the representation and fuel cap 1606 (e.g., through its own representation via a virtual model described below) may be used to depict hand motion in the appropriate spatial context with respect to the fuel cap. Other potential portions of the recording not illustrated similarly may be represented—for example, articulation of an articulable fuel door 1612 may be represented by an articulating virtual model, and removal of a removable screw 1614 may also be represented by a virtual model. In other examples, virtual model 1608 may include such articulating and/or removable portions.

FIG. 17 illustrates one such example of the display of a representation 1700 of motion of hand 1604 relative to a fuel cap 1702. Representation 1700 is output based on the recorded representation of the motion of hand 1604 determined via HMD device 1600—for example, representation 1700 may be the recorded representation, or may be determined based on the recorded representation (e.g., modified relative to the recorded representation), as described in further detail below. Fuel cap 1702 may be the same model as, or a model similar to, fuel cap 1606, or both may generally be a similar type of object. As such, fuel cap 1606 is referred to as a “first instance” of a fuel cap, and fuel cap 1702 is referred to as a “second instance” of the fuel cap. Representation 1700 is displayed via an HMD device 1704, which may occupy a different environment than that occupied by HMD device 1600. Representation 1700, relative to fuel cap 1702, is spatially consistent with the recorded representation of the motion of hand 1604 relative to fuel cap 1606, with respect to the instant of time depicted in FIG. 16A. In the depicted example, this spatial consistency is of the form that the three-dimensional position (e.g., x/y/z), three-dimensional orientation (e.g., yaw/pitch/roll), and scale of representation 1700 relative to fuel cap 1702 are substantially and respectively equal to the three-dimensional position, three-dimensional orientation, and scale of the recorded representation of hand 1604 relative to fuel cap 1606 (and in some examples substantially equal to the three-dimensional position, three-dimensional orientation, and scale of the hand itself, respectively). A user 1706 of HMD device 1704 perceives a different portion of hand 1604 (via representation 1700), however, due to the differing view perspective from that of user 1602 who recorded motion of their hand.

To achieve spatial consistency between representation 1700 and the recorded representation of the motion of hand 1604, HMD device 1704 aligns the pose of virtual model 1608 to the pose of fuel cap 1702. Accordingly, coordinate system 1610 is accurately placed within the scene observed by user 1706 such that representation 1700—determined in this coordinate system—is accurately rendered in relation to fuel cap 1702 in spatial consistency with the recorded representation of the motion of hand 1604 in relation to fuel cap 1606. Such alignment may be maintained as fuel cap 1702 is rotated (and potentially translated—e.g., as a result of being removed from a surrounding fuel door). Thus, the determined pose of an object may encompass changes to the object’s pose as it undergoes rotation and/or translation (e.g., encoded as a time-varying sequence of poses). It will be understood, however, that in other examples different virtual models may be aligned to different object instances—for example, where the instances differ (e.g., due to differing model, wear and tear).

In the parametric approach to representing hand motion illustrated by this example, objects that hand motion is recorded in relation to are parameterized by virtual models that represent those objects. For many objects, corresponding virtual models may be readily available (e.g., accessible from a remote source via a network connection), allowing their availability to be leveraged for the representation of hand motion. Virtual models may take the form of three-dimensional computer-aided design (CAD) models, for example, or any other suitable form. Further, in some examples virtual models may be used to identify and/or recognize the objects the models represent, as described in further detail below.

Hand motion is also parameterized in the parametric approach. For example, representation 1700 of hand motion may be a parametric representation that encodes the articulation of one or more joints of hand 1604. Representation 1700 is determined based on the recorded representation of the hand motion taken by HMD device 1600 as mentioned above. In some examples, the recorded representation itself may be a parametric representation of the hand motion, in which case the parametric representation displayed on HMD device 1704 may be the parametric representation determined by HMD device 1600. In other examples, parametric representation 1700 may be determined based on the recorded representation but modified relative to the recorded representation, for example with respect to its geometry, pose, animation, or any other suitable aspect.

As a particular example, one parametric form of a parametric representation of motion of a hand may include a 28-dimensional vector that encodes the articulation of each hand joint, thereby enabling the articulation of the fingers and joints of the hand, as well as the overall pose (e.g., 6DOF pose) of the hand, to be reproduced. A parametric representation of hand motion may include a time-varying sequence of such 28D vectors that collectively encode the hand motion as it changes over time. As described above, a parametric representation of hand motion may be determined in the coordinate system (e.g., coordinate system 1610) of a virtual model, enabling its transfer (whether with or without modification) between different object instances and environments. The parametric representation may further enable the computational cost and complexity associated with segmenting image data to determine a geometric representation of hand motion described above to be avoided.
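
A minimal data-structure sketch of such a parametric representation is shown below; the split of the 28 dimensions into a 6DOF global pose plus joint angles is an illustrative assumption about the vector layout, and the names are hypothetical.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class HandPoseFrame:
    """One frame of a parametric hand-motion representation.

    params: length-28 vector; as an illustrative layout, the first 6 entries
    hold the global 6DOF pose of the hand in the virtual model's coordinate
    system (e.g., coordinate system 1610) and the remaining 22 hold joint
    angles in radians. The exact layout is an assumption for this sketch.
    timestamp: capture time in seconds.
    """
    timestamp: float
    params: np.ndarray  # shape (28,)

def make_sequence(frames):
    """Stack per-frame vectors into a (T, 28) array for storage or streaming."""
    return np.stack([f.params for f in frames])
```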

As alluded to above, parametric representations of hand motion may enable recorded hand motion to be used in manners other than playback. FIG. 18 illustrates one such example in which a manipulator 1800 of a robotic device 1802 is controlled according to a parametric representation of motion of a hand 1804. The parametric representation, which also may be considered a recorded representation, is determined relative to a virtual model 1806 aligned to a first instance 1808 of a screw, in a coordinate system of the virtual model. The parametric representation is then transferred (e.g., via a network connection) to robotic device 1802, which upon identifying a second instance 1810 of a screw, aligns virtual model 1806 to the second instance based at least in part on image data collected by an image sensor 1812. Based on the parametric representation, a corresponding sequence of actions to be performed by manipulator 1800 is then determined. For example, for each action in the sequence of actions, one or more corresponding commands may be generated and issued to manipulator 1800 to thereby cause the manipulator to perform that action. FIG. 18 illustrates a particular action in the form of a pinching gesture carried out by manipulator 1800 in accordance with the pinching of first instance 1808 of the screw by hand 1804. Other gestures and actions carried out by hand 1804 may be substantially mimicked by manipulator 1800 to perform the hand motion recorded by HMD device 1801.

FIG. 18 illustrates an example in which the number of fingers of manipulator 1800 differs from the number of fingers of hand 1804 whose motion informs that of the manipulator. In such examples, predetermined transformation(s) can be used to convert the pose and articulation of hand 1804, as encoded in its parametric representation, to a corresponding pose and articulation of manipulator 1800, or of other manipulators having different numbers of fingers or other types of articulating appendages such as hands, arms, and grippers. Thus, “manipulator” as used herein refers to any suitable type of robotic appendage. Any suitable methods may be employed to determine such transformation(s), including but not limited to parameter-to-parameter transfer and deep learning. For example, a neural network or support vector machine may be trained to classify human grasps and manipulations in a (e.g., fixed) vocabulary of actions with associated parameter(s) (e.g., width of grasp). Further engineering may be performed to convert this vocabulary into a parameterized set of actions for a robot manipulator. In some examples, a synthetic model of human hands may be used. Alternatively or additionally, a mapping between human hand and robot manipulator actions may be determined using image data capturing motion of real human hands. In addition, a (e.g., teleoperated) robot manipulator may be controlled in accordance with the imaged human hand motion to build the mapping. Translating human hand motion to motion of a robot manipulator in this manner may provide a simpler, more accurate, and less computationally expensive method of controlling robot manipulators in accordance with hand motion than approaches that rely on observational learning (e.g., of image data), potentially with reduced human engineering.
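As a rough illustration of the vocabulary-based approach described above, the sketch below maps a classified human grasp to a command for a gripper with a different number of fingers. The grasp labels, the width parameter, the gripper limits, and the stubbed-out classifier are all assumptions; only the overall structure (classify into a fixed vocabulary, then convert to a parameterized robot action) follows the description.

```python
from dataclasses import dataclass

@dataclass
class GraspAction:
    label: str       # e.g., "pinch"; an entry in an assumed fixed vocabulary
    width_m: float   # associated parameter, e.g., width of grasp

def classify_grasp(hand_frame) -> GraspAction:
    # Placeholder for a trained classifier (e.g., neural network or SVM) that would
    # operate on the parametric hand representation; returns a fixed result here.
    return GraspAction(label="pinch", width_m=0.012)

def to_manipulator_command(action: GraspAction) -> dict:
    """Convert a vocabulary action to a command for an assumed two-finger gripper."""
    if action.label == "pinch":
        # Clamp the observed human grasp width to the gripper's assumed mechanical range.
        width = min(max(action.width_m, 0.0), 0.08)
        return {"command": "close_gripper", "width_m": width}
    return {"command": "open_gripper"}

print(to_manipulator_command(classify_grasp(hand_frame=None)))
```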

In some examples, error in the control of manipulator 1800, and/or in the optical sensing of second instance 1810 of the screw via image sensor 1812, may result in a mismatch between the intended positioning of the manipulator (e.g., where the manipulator is to be positioned based on the location of hand 1804) and the actual positioning of the manipulator. As such, robotic device 1802 may employ a technique referred to as “visual servoing” in which image data from image sensor 1812 is used as a feedback signal in the control of manipulator 1800 to thereby update and correct positioning error. In particular, such image data may be used to continuously estimate the pose of second instance 1810 of the screw in real time, and, where error is detected, update commands issued to manipulator 1800 to thereby align the manipulator with the second instance of the screw.
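The visual-servoing feedback loop can be sketched as follows. This is a generic position-based servoing loop under assumed conventions: estimate_target_pose() stands in for continuously estimating the pose of the target (e.g., second instance 1810) from image-sensor data, and the proportional gain and convergence threshold are illustrative.

```python
import numpy as np

def estimate_target_pose(image) -> np.ndarray:
    """Stand-in for estimating the target's position (x, y, z) from the latest image."""
    return np.array([0.40, 0.05, 0.10])

def visual_servo_step(current_tool_pos: np.ndarray, image, gain: float = 0.5):
    """One feedback iteration: measure the error from the image, command a correction."""
    target = estimate_target_pose(image)
    error = target - current_tool_pos
    new_command = current_tool_pos + gain * error   # proportional correction toward the target
    converged = bool(np.linalg.norm(error) < 1e-3)
    return new_command, converged

tool_pos = np.array([0.35, 0.00, 0.12])   # intended vs. actual positioning may disagree
for _ in range(25):
    tool_pos, done = visual_servo_step(tool_pos, image=None)
    if done:
        break
print(np.round(tool_pos, 4))   # converges toward the estimated target position
```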

FIG. 19 shows an example system 1900 for sharing representations (e.g., recorded and/or parametric representations) of hand motion recorded by an HMD device 1902 with another HMD device 1904 and/or a robotic device 1906. HMD device 1904 may play back the representation of hand motion via a display 1905, while robotic device 1906 may generate commands to control a manipulator 1908 in accordance with the hand motion. System 1900 also includes a computing device 1910, aspects of which may be implemented by the HMD devices and/or robotic devices described herein to support the recording and sharing of representations of hand motion.

Computing device 1910 includes a recognition engine 1912 configured to recognize an instance of an object in an environment. Recognition engine 1912 may recognize the instance based on image data (e.g., RGB, greyscale, and/or depth data), for example collected by HMD device 1902. In some examples, recognition engine 1912 may recognize the instance based on a virtual model 1914 representing the instance, whose reception by HMD device 1902 is schematically shown. In such examples, a pre-recognition process may be performed in which image data indicating or suggesting the presence of the instance is used to obtain virtual model 1914, which is then used to confirm the presence of the instance. Other data may be used alternatively or in addition to image data to obtain virtual model 1914, such as user input 1916 (e.g., voice input, hand gestures, gaze patterns) identifying or suggesting the presence of the instance. In other examples, virtual model 1914 may be obtained (e.g., from a remote source via a network connection, or stored locally on HMD device 1902) in response to identifying the instance. As described above, virtual model 1914 may assume any suitable form, including but not limited to that of a mesh, point cloud, three-dimensional CAD model, etc. In some examples, virtual model 1914 may include an articulable part (representing an articulable part of the instance) and/or a removable part (representing a removable part of the instance), while in other examples, separate virtual models may be used for such parts. Recognition engine 1912 may implement aspects of recognition engine 632 (FIG. 6), in some examples.
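The pre-recognition flow described above might be sketched as follows. The detector, model catalog, and confirmation step are placeholders whose names are not from the disclosure; the sketch only illustrates the idea of using image data (or user input) to suggest a candidate object, fetching the corresponding virtual model, and then using the model to confirm the instance.

```python
# Hypothetical model catalog; in practice the model might be fetched from a remote
# source over a network connection or loaded from local storage.
MODEL_CATALOG = {"fuel_cap": "models/fuel_cap.obj", "screw_m6": "models/screw_m6.obj"}

def detect_candidates(image_data):
    """Stand-in for an object detector (or for parsing user input such as a voice command)."""
    return ["fuel_cap"]

def fetch_model(label):
    return MODEL_CATALOG[label]

def confirm_with_model(image_data, model_path):
    """Stand-in for confirming the instance using the model (e.g., render-and-compare)."""
    return True

def recognize_instance(image_data):
    for label in detect_candidates(image_data):
        model = fetch_model(label)
        if confirm_with_model(image_data, model):
            return label, model
    return None, None

print(recognize_instance(image_data=None))   # ('fuel_cap', 'models/fuel_cap.obj')
```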

Computing device 1910 further includes an alignment engine 1918 configured to align virtual model 1914 to the instance. To this end, alignment engine 1918 may determine the pose of the instance, and align the pose of virtual model 1914 to the pose of the instance. This may enable knowledge of the location of various portions/parts of the instance. Virtual model 1914 may have an associated coordinate system, where alignment of the virtual model with the instance properly places the coordinate system in the relevant environmental scene. Alignment engine 1918 may maintain the alignment of virtual model 1914 and its coordinate system to the instance as the instance undergoes motion—for example, in the event of an articulable part of the instance moving, or a removable part—or the instance itself—being removed from attachment to another part.
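A minimal sketch of the alignment step, assuming poses are expressed as 4x4 homogeneous transforms, is shown below. Aligning the virtual model to the instance amounts to adopting the instance's estimated world pose as the model-to-world transform, so that anything expressed in the model's coordinate system can be placed in the observed scene; re-running the step as the instance moves maintains the alignment.

```python
import numpy as np

def make_pose(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Build a 4x4 homogeneous transform from a rotation matrix and translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def align_model_to_instance(instance_pose_world: np.ndarray) -> np.ndarray:
    """Return the model-to-world transform after alignment (re-call as the instance moves)."""
    return instance_pose_world.copy()

def model_point_to_world(model_to_world: np.ndarray, p_model: np.ndarray) -> np.ndarray:
    return (model_to_world @ np.append(p_model, 1.0))[:3]

# Instance pose as it might be estimated from image data (identity rotation assumed here).
instance_pose = make_pose(np.eye(3), np.array([1.0, 0.2, -0.5]))
model_to_world = align_model_to_instance(instance_pose)
print(model_point_to_world(model_to_world, np.array([0.0, 0.0, 0.05])))  # [ 1.    0.2  -0.45]
```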

Computing device 1910 further includes a representation engine 1920 configured to determine a parametric representation of hand motion. As such, representation engine 1920 may receive image data (e.g., video data) recording hand motion, and determine the parametric representation of the recorded hand motion. Alternatively or additionally, representation engine 1920 may receive a recorded representation of hand motion (which itself may be in parametric form), and determine the parametric representation based on the recorded representation. As described above, the recorded representation may be used as the parametric representation (e.g., where the recorded representation is in parametric form), in which case determination of a parametric representation separate from the recorded representation may be foregone. In other examples, the parametric representation may be determined based on the recorded representation but may differ from the recorded representation—e.g., with respect to geometry, pose, articulation, animation/variance in time, or any other suitable aspect. In yet other examples, representation engine 1920 may convert a non-parametric representation (e.g., a geometric representation of hand motion) to the parametric representation.
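Where a non-parametric (geometric) representation must be converted to a parametric one, one simple, purely illustrative approach is to derive per-joint articulation values from 3D joint keypoints, as in the sketch below; a production system would more likely fit a full kinematic hand model to the geometry.

```python
import numpy as np

def joint_angle(p_prev: np.ndarray, p_joint: np.ndarray, p_next: np.ndarray) -> float:
    """Articulation of one joint, approximated as the angle between adjacent bone vectors."""
    a = p_prev - p_joint
    b = p_next - p_joint
    cos_ang = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos_ang, -1.0, 1.0)))

# Three keypoints along one finger (base, proximal joint, fingertip), in meters.
base = np.array([0.0, 0.0, 0.0])
joint = np.array([0.0, 0.03, 0.0])
tip = np.array([0.0, 0.05, 0.02])
print(round(joint_angle(base, joint, tip), 3))  # one articulation value, in radians
```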

The parametric representation may be determined relative to virtual model 1914 representing the instance that the hand motion was performed in relation to—e.g., the parametric representation may be determined in the coordinate system of the virtual model. Then, virtual model 1914 may be aligned with another instance of the object (e.g., in another environment), such that its coordinate system is aligned with the other instance, enabling the parametric representation of hand motion (or another representation) to be displayed relative to the other instance in spatial consistency with how the hand motion was performed in relation to the initial instance.

The parametric representation may encode the articulation of one or more joints of a hand, and/or the overall pose of the hand. As such, representation engine 1920 may utilize a hand tracking engine 1922 to determine the parametric representation. As described above, the parametric representation may assume any suitable form, such as that of a 28D vector encoding the respective articulation of each joint of a human hand, and thus the overall pose of the hand. Further, the parametric representation may include a time-varying sequence of poses that each represent respective articulations of a plurality of hand joints.

In some examples, representation engine 1920/hand tracking engine 1922 may produce the parametric representation of hand motion in a head coordinate system associated with the head of a user of HMD device 1902. As such, computing device 1910 may utilize a transformation engine 1924 to transform the parametric representation from the head coordinate system to the coordinate system of virtual model 1914. The head coordinate system may be determined based on the head pose of the user, which in turn may be determined via an IMU implemented in HMD device 1902, image data collected by the HMD device, wireless sensing performed by the HMD device, or in any other suitable manner. The determination of the head pose/head coordinate system may enable the parametric representation to be accurately rendered with respect to an object instance as well as a viewer’s perspective.
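A sketch of the transformation performed by transformation engine 1924, assuming 4x4 homogeneous transforms and that the world poses of both the head (from head tracking) and the aligned virtual model are available, is shown below; the function names are illustrative.

```python
import numpy as np

def inv_pose(T: np.ndarray) -> np.ndarray:
    """Invert a rigid 4x4 transform without a general matrix inverse."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti

def head_to_model(p_head: np.ndarray, T_world_head: np.ndarray, T_world_model: np.ndarray) -> np.ndarray:
    """Re-express a point given in head coordinates in the virtual model's coordinate system."""
    T_model_head = inv_pose(T_world_model) @ T_world_head
    return (T_model_head @ np.append(p_head, 1.0))[:3]

T_world_head = np.eye(4)
T_world_head[:3, 3] = [0.0, 1.6, 0.0]    # head pose from tracking
T_world_model = np.eye(4)
T_world_model[:3, 3] = [0.5, 1.0, -0.3]  # pose of the aligned model
print(head_to_model(np.array([0.0, -0.2, -0.4]), T_world_head, T_world_model))
```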

FIG. 19 illustrates an example in which a parametric representation 1926 of hand motion, recorded by HMD device 1902, is shared with HMD device 1904 via a network connection 1930. Parametric representation 1926 may be determined via computing device 1910, aspects of which may be implemented by HMD device 1902. As shown, parametric representation 1926 and virtual model 1914 are transmitted from HMD device 1902 to HMD device 1904. Collectively, parametric representation 1926 and virtual model 1914 may encode a sequence of representations of hand/joint poses and articulations, the location of such representations in the coordinate system of the virtual model, as well as the pose(s) of the virtual model (and potentially parts thereof). This data may be considered to parameterize the tasks performed in connection with the represented hand motion. Alternative or additional suitable data may be shared, however, including but not limited to user gaze patterns, voice data, hand gestures, and other forms of annotations. In other examples, HMD device 1902 may share a recorded representation (parametric or non-parametric) with HMD device 1904, where HMD device 1904 may determine a parametric representation based on the recorded representation, as described above.
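The data shared between devices over network connection 1930 might be bundled roughly as follows; the field names are assumptions, and the description above only requires that the parametric representation, the virtual model, the model pose(s), and optional annotations (e.g., gaze patterns, voice data, hand gestures) can be conveyed.

```python
from dataclasses import dataclass, field

@dataclass
class SharedHandMotion:
    model_id: str                 # identifies (or references) virtual model 1914
    model_poses: list             # pose(s) of the model, and potentially of its parts, over time
    frames: list                  # sequence of 28-D hand vectors in the model's coordinate system
    timestamps: list              # one timestamp per frame
    annotations: dict = field(default_factory=dict)  # e.g., gaze patterns, voice data, gestures

payload = SharedHandMotion(model_id="fuel_cap", model_poses=[], frames=[], timestamps=[])
print(payload.model_id)
```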

FIG. 19 also depicts the sharing of parametric representation 1926 of hand motion with robotic device 1906 via a network connection 1932. As described above, a sequence of actions may be determined for performance by manipulator 1908 in accordance with parametric representation 1926, with one or more commands being generated for each action that cause the manipulator to perform that action. To this end, computing device 1910 further includes a robot engine 1934 configured to determine the sequence of actions and corresponding commands for manipulator 1908. Robotic device 1906 may implement aspects of computing device 1910, including robot engine 1934, for example. As described above, actions and commands for manipulator 1908 may be determined based on parametric representation 1926 in any suitable manner, such as via transformations computed via a support vector machine or neural network, and/or via a fixed vocabulary. Further, robot engine 1934 may be configured to perform visual servoing using feedback (e.g., from an image sensor such as image sensor 1812 of FIG. 18) to correct errors in the positioning of manipulator 1800 and update control commands issued to the manipulator that effect such correction.

FIG. 20 shows a flowchart illustrating a method 2000 of determining a parametric representation of hand motion. Method 2000 may be implemented by one or more of the HMD devices described herein and/or computing device 1910, for example.

At 2002, method 2000 includes receiving image data corresponding to an environment. The image data may include RGB data, greyscale data, depth data, and/or any other suitable type of image data. At 2004, method 2000 includes recognizing a first instance of an object in the environment. In some examples, the first instance may be recognized based on a virtual model 2006 representing the first instance. In other examples, the virtual model may be received in response to recognizing the first instance.

At 2008, method 2000 includes receiving a virtual model representing the first instance of the object. At 2010, method 2000 includes aligning the virtual model to the first instance and maintaining such alignment (e.g., as the first instance undergoes motion). In some examples, aligning the virtual model may include aligning 2012 the pose of the virtual model to the pose of the first instance. The pose of the first instance may be recognized as part of recognizing the first instance at 2004, for example.

At 2014, method 2000 includes receiving a recording of hand motion. The recording may include video data capturing the hand motion, for example. At 2016, method 2000 includes, based on the recording, determining a parametric representation of hand motion relative to the virtual model aligned with the first instance of the object. The parametric representation may include a sequence of vectors representing respective articulations of a plurality of hand joints, for example. In some examples, the parametric representation may be determined in a coordinate system 2018 of the virtual model. At 2020, method 2000 includes configuring the parametric representation of hand motion for display relative to the virtual model as aligned to a second instance of the object, where the display is spatially consistent with the parametric representation of hand motion relative to the virtual model as aligned to the first instance of the object. For example, the three-dimensional position, three-dimensional orientation, and scale of the parametric representation relative to the second instance may be substantially equal to the three-dimensional position, three-dimensional orientation, and scale of the recorded hand relative to the first instance.

In other examples, a recorded representation of hand motion may be received at 2014, and the parametric representation may be determined at 2016 based on the recorded representation. The recorded representation may be a parametric representation, in which case determining the parametric representation at 2016 may include configuring the parametric representation received at 2014 for use with or without modification. In other examples, a non-parametric (e.g., geometric) representation may be received at 2014, and determining the parametric representation at 2016 may include converting the non-parametric representation to the parametric representation.
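Taken together, the steps of method 2000 can be summarized in the following pipeline sketch. Every helper is a stub standing in for the corresponding step of FIG. 20 (recognition, model receipt, alignment, hand-motion receipt, coordinate transformation); none of the function names come from the disclosure.

```python
def recognize_first_instance(image_data):        # step 2004
    return "fuel_cap"

def receive_virtual_model(instance_label):       # step 2008
    return {"id": instance_label}

def align_model(model, image_data):              # steps 2010/2012: model pose := instance pose
    return {"model_to_world": "T_instance"}      # placeholder alignment

def hand_frames_from_recording(recording):       # step 2014
    return [[0.0] * 28 for _ in recording]       # one 28-D vector per recorded frame

def to_model_coordinates(frame, alignment):      # step 2016, coordinate system 2018
    return frame                                 # placeholder coordinate transform

def method_2000(image_data, recording):
    instance = recognize_first_instance(image_data)
    model = receive_virtual_model(instance)
    alignment = align_model(model, image_data)
    parametric = [to_model_coordinates(f, alignment)
                  for f in hand_frames_from_recording(recording)]
    # The result is ready to be configured for display relative to a second instance (step 2020).
    return {"model": model, "parametric_representation": parametric}

print(len(method_2000(image_data=None, recording=[1, 2, 3])["parametric_representation"]))  # 3
```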

FIG. 21 shows a flowchart illustrating a method 2100 of outputting a parametric representation of hand motion. Method 2100 may be implemented by one or more of the HMD devices described herein and/or computing device 1910, for example.

At 2102, method 2100 includes receiving a recorded representation of hand motion determined relative to a virtual model aligned to a first instance of an object. At 2104, method 2100 includes receiving image data corresponding to an environment. At 2106, method 2100 includes recognizing a second instance of the object in the environment. At 2108, method 2100 includes aligning the virtual model to the second instance of the object. At 2110, method 2100 optionally includes, based on the recorded representation of hand motion, outputting a parametric representation of hand motion for display relative to the virtual model as aligned to the second instance of the object, such that the parametric representation of hand motion relative to the virtual model as aligned to the second instance of the object is spatially consistent with the recorded representation of hand motion relative to the virtual model as aligned to the first instance of the object. For example, the parametric representation may be output for display at an HMD device. As described above, outputting the parametric representation at 2110 may include outputting the recorded representation with or without modification, or may include determining the parametric representation based on the recorded representation (e.g., via method 2000).

At 2112, method 2100 optionally includes, based on the parametric representation of hand motion, determining a sequence of actions for performance, relative to the virtual model as aligned to the second instance of the object, by a manipulator of the robotic device, where the sequence of actions is spatially consistent with the parametric representation of hand motion relative to the virtual model as aligned to the first instance of the object. In such examples, method 2100 may include, for each action of the sequence of actions, generating one or more corresponding commands 2114 configured to cause the manipulator to perform the action. In such examples, method 2100 may include updating 2116 the one or more commands based on the image data to thereby align the manipulator to the second instance of the object.

In some embodiments, the methods and processes described herein may be tied to a computing system of one or more computing devices. In particular, such methods and processes may be implemented as a computer-application program or service, an application-programming interface (API), a library, and/or other computer-program product.

FIG. 22 schematically shows a non-limiting embodiment of a computing system 2200 that can enact one or more of the methods and processes described above. Computing system 2200 is shown in simplified form. Computing system 2200 may take the form of one or more personal computers, server computers, tablet computers, home-entertainment computers, network computing devices, gaming devices, mobile computing devices, mobile communication devices (e.g., smart phone), and/or other computing devices.

Computing system 2200 includes a logic subsystem 2202 and a storage subsystem 2204. Computing system 2200 may optionally include a display subsystem 2206, input subsystem 2208, communication subsystem 2210, and/or other components not shown in FIG. 22.

Logic subsystem 2202 includes one or more physical devices configured to execute instructions. For example, the logic subsystem may be configured to execute instructions that are part of one or more applications, services, programs, routines, libraries, objects, components, data structures, or other logical constructs. Such instructions may be implemented to perform a task, implement a data type, transform the state of one or more components, achieve a technical effect, or otherwise arrive at a desired result.

The logic subsystem may include one or more processors configured to execute software instructions. Additionally or alternatively, the logic subsystem may include one or more hardware or firmware logic machines configured to execute hardware or firmware instructions. Processors of the logic subsystem may be single-core or multi-core, and the instructions executed thereon may be configured for sequential, parallel, and/or distributed processing. Individual components of the logic subsystem optionally may be distributed among two or more separate devices, which may be remotely located and/or configured for coordinated processing. Aspects of the logic subsystem may be virtualized and executed by remotely accessible, networked computing devices configured in a cloud-computing configuration.

Storage subsystem 2204 includes one or more physical devices configured to hold instructions executable by the logic subsystem to implement the methods and processes described herein. When such methods and processes are implemented, the state of storage subsystem 2204 may be transformed—e.g., to hold different data.

Storage subsystem 2204 may include removable and/or built-in devices. Storage subsystem 2204 may include optical memory (e.g., CD, DVD, HD-DVD, Blu-Ray Disc, etc.), semiconductor memory (e.g., RAM, EPROM, EEPROM, etc.), and/or magnetic memory (e.g., hard-disk drive, floppy-disk drive, tape drive, MRAM, etc.), among others. Storage subsystem 2204 may include volatile, nonvolatile, dynamic, static, read/write, read-only, random-access, sequential-access, location-addressable, file-addressable, and/or content-addressable devices.

It will be appreciated that storage subsystem 2204 includes one or more physical devices. However, aspects of the instructions described herein alternatively may be propagated by a communication medium (e.g., an electromagnetic signal, an optical signal, etc.) that is not held by a physical device for a finite duration.

Aspects of logic subsystem 2202 and storage subsystem 2204 may be integrated together into one or more hardware-logic components. Such hardware-logic components may include field-programmable gate arrays (FPGAs), program- and application-specific integrated circuits (PASIC/ASICs), program- and application-specific standard products (PSSP/ASSPs), system-on-a-chip (SOC), and complex programmable logic devices (CPLDs), for example.

The terms “module,” “program,” and “engine” may be used to describe an aspect of computing system 2200 implemented to perform a particular function. In some cases, a module, program, or engine may be instantiated via logic subsystem 2202 executing instructions held by storage subsystem 2204. It will be understood that different modules, programs, and/or engines may be instantiated from the same application, service, code block, object, library, routine, API, function, etc. Likewise, the same module, program, and/or engine may be instantiated by different applications, services, code blocks, objects, routines, APIs, functions, etc. The terms “module,” “program,” and “engine” may encompass individual or groups of executable files, data files, libraries, drivers, scripts, database records, etc.

It will be appreciated that a “service”, as used herein, is an application program executable across multiple user sessions. A service may be available to one or more system components, programs, and/or other services. In some implementations, a service may run on one or more server-computing devices.

When included, display subsystem 2206 may be used to present a visual representation of data held by storage subsystem 2204. This visual representation may take the form of a graphical user interface (GUI). As the herein described methods and processes change the data held by the storage subsystem, and thus transform the state of the storage subsystem, the state of display subsystem 2206 may likewise be transformed to visually represent changes in the underlying data. Display subsystem 2206 may include one or more display devices utilizing virtually any type of technology. Such display devices may be combined with logic subsystem 2202 and/or storage subsystem 2204 in a shared enclosure, or such display devices may be peripheral display devices.

When included, input subsystem 2208 may comprise or interface with one or more user-input devices such as a keyboard, mouse, touch screen, or game controller. In some embodiments, the input subsystem may comprise or interface with selected natural user input (NUI) componentry. Such componentry may be integrated or peripheral, and the transduction and/or processing of input actions may be handled on- or off-board. Example NUI componentry may include a microphone for speech and/or voice recognition; an infrared, color, stereoscopic, and/or depth camera for machine vision and/or gesture recognition; a head tracker, eye tracker, accelerometer, and/or gyroscope for motion detection and/or intent recognition; as well as electric-field sensing componentry for assessing brain activity.

When included, communication subsystem 2210 may be configured to communicatively couple computing system 2200 with one or more other computing devices. Communication subsystem 2210 may include wired and/or wireless communication devices compatible with one or more different communication protocols. As non-limiting examples, the communication subsystem may be configured for communication via a wireless telephone network, or a wired or wireless local- or wide-area network. In some embodiments, the communication subsystem may allow computing system 2200 to send and/or receive messages to and/or from other devices via a network such as the Internet.

Another example provides a computing device comprising a logic subsystem, and a storage subsystem comprising instructions executable by the logic subsystem to receive a recorded representation of hand motion determined relative to a virtual model aligned to a first instance of an object, receive image data corresponding to an environment, recognize a second instance of the object in the environment, align the virtual model to the second instance of the object, and based on the recorded representation of hand motion, output a parametric representation of hand motion for display relative to the virtual model as aligned to the second instance of the object, such that the parametric representation of hand motion relative to the virtual model as aligned to the second instance of the object is spatially consistent with the recorded representation of hand motion relative to the virtual model as aligned to the first instance of the object. In such an example, the second instance may be recognized based on the virtual model. In such an example, the instructions may be further executable to determine a pose of the second instance of the object. In such an example, the instructions executable to align the virtual model to the second instance of the object may be executable to align a pose of the virtual model to the pose of the second instance of the object. In such an example, the recorded representation of hand motion may be determined in a coordinate system of the virtual model. In such an example, the parametric representation of hand motion may be output for display on a head-mounted display device. In such an example, the parametric representation of hand motion alternatively or additionally may be output for display based on a head pose of a user of the head-mounted display device. In such an example, the instructions alternatively or additionally may be executable to maintain an alignment of the virtual model to the second instance of the object as the second instance of the object undergoes motion. In such an example, the motion may include motion of an articulable part of the second instance of the object. In such an example, the motion alternatively or additionally may include motion of a removable part of the second instance of the object. In such an example, the recorded representation of hand motion may include a time-varying sequence of poses that each represent respective articulations of a plurality of hand joints.

Another example provides a computing device comprising a logic subsystem, and a storage subsystem comprising instructions executable by the logic subsystem to receive image data corresponding to an environment, recognize a first instance of an object in the environment, receive a virtual model representing the first instance of the object, align the virtual model to the first instance of the object and maintain such alignment, receive a recording of hand motion, based on the recording, determine a parametric representation of hand motion relative to the virtual model as aligned with the first instance of the object, and configure the parametric representation of hand motion for display relative to the virtual model as aligned to a second instance of the object, such that the parametric representation of hand motion relative to the virtual model as aligned to the second instance of the object is spatially consistent with the parametric representation of hand motion relative to the virtual model as aligned to the first instance of the object. In such an example, the first instance of the object may be recognized based on the virtual model. In such an example, the computing device may further comprise instructions executable to determine a pose of the first instance of the object. In such an example, the instructions executable to align the virtual model to the first instance of the object may be executable to align a pose of the virtual model to the pose of the first instance of the object. In such an example, the parametric representation of hand motion may be determined in a coordinate system of the virtual model. In such an example, the computing device alternatively or additionally may comprise instructions executable to output the parametric representation of hand motion for display at another computing device.

Another example provides, at a robotic device, a method of controlling a robot manipulator, comprising receiving a parametric representation of hand motion determined relative to a virtual model aligned to a first instance of an object, receiving image data corresponding to an environment, recognizing a second instance of the object in the environment, aligning the virtual model to the second instance of the object, and based on the parametric representation of hand motion, determining a sequence of actions for performance, relative to the virtual model as aligned to the second instance of the object, by a manipulator of the robotic device, where the sequence of actions is spatially consistent with the parametric representation of hand motion relative to the virtual model as aligned to the first instance of the object. In such an example, the method may further comprise, for each action of the sequence of actions, generating one or more corresponding commands configured to cause the manipulator to perform the action. In such an example, the method may further comprise updating the one or more commands based on the image data to thereby align the manipulator to the second instance of the object.

It will be understood that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered in a limiting sense, because numerous variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated and/or described may be performed in the sequence illustrated and/or described, in other sequences, in parallel, or omitted. Likewise, the order of the above-described processes may be changed.

The subject matter of the present disclosure includes all novel and non-obvious combinations and sub-combinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as any and all equivalents thereof.

Microsoft Patent | Control of variable-focus lenses in a mixed-reality device for presbyopes https://patent.nweon.com/27399 Thu, 09 Mar 2023 15:08:04 +0000 https://patent.nweon.com/?p=27399 ...

Patent: Control of variable-focus lenses in a mixed-reality device for presbyopes

Patent PDF: Available to Nweon (映维网) members

Publication Number: 20230069895

Publication Date: 2023-03-09

Assignee: Microsoft Technology Licensing

Abstract

Variable-focus lenses are arranged as a lens pair that work on opposite sides of a see-through optical combiner used in a mixed-reality head-mounted display (HMD) device. An eye-side variable-focus lens is configured as a negative lens over an eyebox of the see-through optical combiner to enable virtual-world objects to be set at a close distance. The negative lens is compensated by its conjugate using a real-world-side variable-focus lens configured as a positive lens to provide for an unperturbed see-through experience. For non-presbyopes, the powers of the lenses are perfectly offset. For presbyopes, the lens powers may be mismatched at times to provide simultaneous views of both virtual-world and real-world objects on the display in sharp focus. Responsive to an eye tracker indicating that the user is engaged in close viewing, optical power is added to the real-world-side lens to push close real-world objects optically farther away and into sharp focus for the presbyopic user.

Claims

What is claimed:

1.A mixed-reality display system that is utilizable by a presbyopic user, comprising: a see-through optical combiner through which real-world objects are viewable by the user, the see-through optical combiner being adapted to display virtual-world images that are superimposed over the real-world objects over an eyebox of the display system, the see-through optical combiner having an eye-side and a real-world side; a first variable-focus lens disposed on the eye-side of the see-through optical combiner; a second variable-focus lens disposed on the real-world side of the see-through optical combiner; and an optical power controller operatively coupled to the first and second variable-focus lenses, in which the optical power controller controls a baseline configuration for each of the first and second variable-focus lenses, wherein the optical power controller is adapted to add positive optical power to the baseline configuration of the second variable-focus lens responsive to the presbyopic user accommodating to the predetermined distance or less than the predetermined distance.

2.The mixed-reality display system of claim 1 in which the baseline configuration for the first variable-focus lens provides negative optical power over the eyebox to display the virtual-world images in a focal plane at a predetermined distance from the user, and the baseline configuration of the second variable-focus lens provides positive optical power to offset the negative power of the first variable-focus lens.

3.The mixed-reality display system of claim 2 in which the baseline configuration for the first variable-focus lens comprises negative optical power having a range between -0.20 and -3.0 diopters.

4.The mixed-reality display system of claim 2 in which the baseline configuration for the second variable-focus lens includes optical power comprising a positive conjugate of the negative optical power of the baseline configuration of the first variable-focus lens.

5.The mixed-reality display system of claim 1 in which each of the variable-focus lenses comprises technologies using one or more of liquid oil push/pull, liquid crystal, reflective MEMS (micro-electromechanical system), MEMS Fresnel structures, geometric phase holograms, meta-surface optical elements, deformable membranes, Alvarez lenses, or multi-order DOEs (diffractive optical elements).

6.The mixed-reality display system of claim 1 as configured for use in a head-mounted display (HMD) device wearable by the presbyopic user.

7.A head-mounted display (HMD) device wearable by a presbyopic user and configured for supporting a mixed-reality experience including viewing, by the presbyopic user, of holographic images from a virtual world that are combined with views of real-world objects in a physical world, comprising: a see-through display system through which the presbyopic user can view the real-world objects and on which the holographic images are displayed within a field of view (FOV) of the see-through display system; a negative lens disposed between the see-through display system and an eye of the presbyopic user, the negative lens acting over the FOV and configured to render the holographic images at a focal plane having a predetermined depth from the presbyopic user; a variable-focus positive lens disposed on an opposite side of the see-through display system from the negative lens, the variable-focus positive lens being controllably configured to cancel effects of the negative lens on the views of the real-world objects responsive to the presbyopic user being engaged in viewing beyond the predetermined depth, and the variable-focus positive lens being controllably configured with increased optical power to optically push real-world objects into sharp focus responsive to the presbyopic user being engaged in viewing within the predetermined depth.

8.The HMD device of claim 7 further comprising an optical power controller operatively coupled to the variable-focus positive lens.

9.The HMD device of claim 8 further comprising an eye tracker operatively coupled to the optical power controller, the eye tracker tracking vergence of the presbyopic user’s eyes or tracking a gaze direction of at least one eye of the presbyopic user, in which the optical power controller controls the variable-focus positive lens responsively to operations of the eye tracker.

10.The HMD device of claim 9 further comprising one or more illumination sources for producing glints for the eye tracker.

11.The HMD device of claim 10 further comprising one or more sensors configured to capture glints from the illumination sources that are reflected from features of an eye of the user for eye tracking.

12.The HMD device of claim 8 in which the negative lens comprises a variable-focus lens that is operatively coupled to the optical power controller.

13.The HMD device of claim 12 in which the optical power controller is configured to control the negative lens to include a corrective lens prescription for an eye of the presbyopic user.

14.The HMD device of claim 13 in which the corrective lens prescription provides correction for myopia.

15.The HMD device of claim 7 in which the see-through display system comprises one or more waveguides that each include an input coupler and an output coupler, in which the input coupler is configured to in-couple one or more optical beams for the holographic images into the waveguide from a virtual image source and the output coupler is configured to out-couple the holographic image beams from the waveguide to an eye of the presbyopic user, in which holographic images associated with the out-coupled beams are rendered within the FOV of the display system.

16.The HMD device of claim 15 in which the input coupler and output coupler each comprise a diffractive optical element (DOE) and in which each of the one or more display system waveguides further comprise an intermediate DOE disposed on a light path between the input coupler and the output coupler, wherein the intermediate DOE provides exit pupil expansion of the display system in a first direction and the output coupler provides exit pupil expansion of the display system in a second direction.

17.The HMD device of claim 7 in which the predetermined depth is within arm’s length of the presbyopic user.

18.A method for operating an electronic device that includes an eye tracker and a mixed-reality see-through optical display system for showing scenes comprising virtual images that are rendered over views of real-world objects, the method comprising: calibrating the electronic device for utilization by a presbyopic user; operating the mixed-reality see-through optical display system to support a near field and a far field, the near field being closer to the presbyopic user relative to the far field, and the mixed-reality see-through optical display system having an eye side and a real-world side; operating a conjugate pair of variable-focus lenses in matched configurations to provide for setting rendered virtual images within the near field without perturbing the views of the real-world objects in the far field; using the eye tracker to determine a depth of the presbyopic user’s gaze in the scene; and responsively to a depth determination by the eye tracker, operating the conjugate pair of variable-focus lenses in mismatched configurations to enable the presbyopic user to simultaneously accommodate rendered virtual images and real-world objects in the near field.

19.The method of claim 18 in which variable-focus lenses in the conjugate pair are located on opposite sides of the mixed-reality see-through optical display system, and in which the matched configurations comprise the conjugate pair of variable-focus lenses providing zero net optical power to the views of the real-world objects, and in which the mismatched configuration comprises optical power being added to the variable-focus lens disposed on the real-world side.

20.The method of claim 18 further comprising adding optical power to the variable-focus lens on the eye side to incorporate a corrective prescription of the presbyopic user for distance vision.

Description

BACKGROUND

Presbyopia is an ocular condition in which one loses the ability to optically focus (or accommodate) one’s eyes to varying distances. While the age of people experiencing the onset of presbyopia (referred to as “presbyopes”) may vary, rapid reduction in accommodation range typically begins around 45 years of age, with virtually 100 percent of people over the age of 55 years being presbyopic. For presbyopes who began with normal vision, their natural accommodation state effectively rests at optical infinity, making it hard to accommodate on near (e.g., < 1 m) objects.

SUMMARY

Variable-focus lenses are arranged as a conjugate lens pair that work on opposite sides of a see-through optical combiner used in a mixed-reality head-mounted display (HMD) device in which virtual images are superimposed over views of real-world objects. An eye-side variable-focus lens is configured as a negative lens over an eyebox of the optical combiner to enable virtual images to be placed at predetermined (i.e., non-infinite) depth from the device user to enhance visual comfort. The negative lens is compensated by its conjugate using a real-world-side variable-focus lens that is configured as a positive lens to provide for an unperturbed see-through experience.

For non-presbyopes (i.e., emmetropes), the powers of the negative and positive lenses are perfectly offset so that no net optical power is provided to the real world viewed through the see-through optical combiner. For a presbyopic HMD device user, the lens powers may be mismatched at times to enable the user to simultaneously view both virtual-world and real-world objects on the display in sharp focus. Responsively to an eye tracker in the HMD device that indicates that the user is engaged in close viewing, optical power is added to the real-world-side variable-focus lens to push close real-world objects optically farther away and into sharp focus for the presbyopic user.

In an illustrative embodiment, the variable-focus lens pair may be configured to work in combination to integrate a user’s corrective lens prescription into the HMD device. Such integration enables the HMD device to be utilized without the need for the user to wear glasses or contact lenses. The HMD device can replicate dual-prescription functionality to correct for both near and far vision impairments of the user by adapting the eye-side variable focus lens in a modified configuration to include the user’s corrective prescription for both close and far use cases. The real-world-side lens may provide additional optical power when the eye tracker indicates that the user is engaged in close viewing to push close real-world objects optically farther away and into sharp focus.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.

DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a pictorial partially cutaway view of an illustrative HMD device that is configured with the present control of variable-focus lenses in a mixed-reality device for presbyopes;

FIG. 2 illustratively shows holographic virtual images that are overlayed onto real-world images within a field of view (FOV) of a mixed-reality head-mounted display (HMD) device;

FIGS. 3A, 3B, and 3C show illustrative partially spherical wavefronts that are respectively associated with a distant object, an object at infinity, and a nearby object;

FIG. 4 shows an illustrative negative lens that provides for a virtual image that is located at a focal point of the lens;

FIG. 5 shows a side view of an illustrative virtual display system that includes a waveguide-based optical combiner providing for rendering of virtual images in a focal plane having predetermined depth that may be used in an HMD device;

FIG. 6 shows a side view of an illustrative virtual display system in which variable-focus lenses are arranged as a conjugate lens pair;

FIG. 7 shows a side view of an illustrative virtual display system in operative relationship with HMD device components including an eye tracking system, optical power controller, and processors;

FIG. 8 is a table that shows illustrative operational configurations for the variable-focus lens pair for different user types and use cases;

FIG. 9 is a flowchart of an illustrative workflow for operating a variable-focus lens pair in an HMD device;

FIG. 10 is a flowchart of an illustrative method for operating an electronic device that includes an eye tracker and a mixed-reality see-through optical display system for showing scenes comprising virtual images that are superimposed over views of real-world objects;

FIG. 11 shows a pictorial front view of an illustrative sealed visor that may be used as a component of an HMD device;

FIG. 12 shows a pictorial rear view of an illustrative sealed visor;

FIG. 13 shows a partially disassembled view of an illustrative sealed visor;

FIG. 14 shows an illustrative arrangement of diffractive optical elements (DOEs) configured for in-coupling, exit pupil expansion in two directions, and out-coupling;

FIG. 15 shows a side view of an illustrative assembly of three waveguides with integrated coupling elements that are stacked to form an optical combiner, in which each waveguide handles a different color in an RGB (red, green, blue) color model;

FIG. 16 is a pictorial view of an illustrative example of a virtual-reality or mixed-reality HMD device that may use the present control of variable-focus lenses in a mixed-reality device for presbyopes;

FIG. 17 shows a block diagram of an illustrative example of a virtual-reality or mixed-reality HMD device that may use the present control of variable-focus lenses in a mixed-reality device for presbyopes; and

FIG. 18 schematically shows an illustrative example of a computing system that can enact one or more of the methods and processes described above for the present control of variable-focus lenses in a mixed-reality device for presbyopes.

Like reference numerals indicate like elements in the drawings. Elements are not drawn to scale unless otherwise indicated.

DETAILED DESCRIPTION

Presbyopes (i.e., persons who have presbyopia) present a unique challenge to mixed-reality HMD devices that do not afford wearing spectacles during use. These devices usually have a fixed focal plane for the digital content they overlay on the real world. Thus, even if a presbyopic user does not require glasses for distance viewing and the virtual images are placed at a far distance that appears sharp, the user would be unable to see a close-by object (e.g., a smartphone) sharply without removing the HMD device and donning reading glasses. Meanwhile, if digital content were meant to be overlaid on nearby real-world objects, the user would not be able to see both the digital and real content in sharp focus at the same time.

While some conventional HMD devices can provide comfortable user experiences for users who wear glasses, such devices typically do not accommodate presbyopes. The disclosed arrangement provides for control of a pair of variable-focus lenses to enable presbyopes to sharply and comfortably view both virtual-world and real-world objects at any distance.

Turning now to the drawings, FIG. 1 shows a pictorial partially cutaway view of an illustrative mixed-reality HMD device 100 that is configured to implement the present control of variable-focus lenses in a mixed-reality device for presbyopes. In this example, the HMD device includes a display device 105 and a frame 110 that wraps around the head of a user 115 to position the display device near the user’s eyes to provide a mixed-reality experience to the user.

Any suitable technology and configuration may be used to display virtual images, which may also be referred to as holograms or holographic images, using the display device 105. For a mixed-reality experience, the display device may be see-through so that the user of the HMD device 100 can view physical, real-world objects in the physical environment over which pixels for virtual objects are overlayed. For example, the display device may include one or more partially transparent waveguides used in conjunction with a virtual image source such as, for example, a microdisplay comprising RGB (red, green, blue) LEDs (light emitting diodes), an organic LED (OLED) array, liquid crystal on silicon (LCoS) device, and/or MEMS device, or any other suitable displays or microdisplays operating in transmission, reflection, or emission. The virtual image source may also include electronics such as processors, optical components such as mirrors and/or lenses, and/or mechanical and other components that enable a virtual display to be composed and provide one or more input optical beams to the display system. Virtual image sources may be referred to as light or display engines in some contexts.

In some implementations, outward facing cameras 120 that are configured to capture images of the surrounding physical environment may be provided. Such captured images may be rendered on the display device 105 along with computer-generated virtual images that augment the captured images of the physical environment.

The frame 110 may further support additional components of the HMD device 100, including a processor 125, an inertial measurement unit (IMU) 130, and an eye tracker 135. In some implementations, the eye tracker can be configured to support one or more of vergence tracking and/or gaze tracking functions. The processor may include logic and associated computer memory configured to receive sensory signals from the IMU and other sensors, to provide display signals to the display device 105, to derive information from collected data, and to enact various control processes described herein.

The display device 105 may be arranged in some implementations as a near-eye display. In a near-eye display, the virtual image source does not actually shine the images on a surface such as a glass lens to create the display for the user. This is not feasible because the human eye cannot focus on something that close. Rather than create a visible image on a surface, the near-eye display uses an optical system to form an exit pupil, and the user’s eye acts as the last element in the optical chain, converting the light from the pupil into an image on the eye’s retina as a virtual display. It may be appreciated that the exit pupil is a virtual aperture in an optical system; only rays which pass through this virtual aperture can exit the system. Thus, the exit pupil describes a minimum diameter of the holographic virtual image light after leaving the display system. The exit pupil defines the eyebox, which comprises the spatial range of eye positions of the user in which the holographic virtual images projected by the display device are visible.

FIG. 2 shows the HMD device 100 worn by a user 115 as configured for mixed-reality experiences in which the display device 105 is configured as a near-eye display system having at least a partially transparent, see-through waveguide, among various other components, and may be further adapted to utilize variable-focus lenses in accordance with the principles discussed herein. As noted above, a virtual image source (not shown) generates holographic virtual images that are guided by the waveguide in the display device to the user. Being see-through, the waveguide in the display device enables the user to perceive light from the real world.

The see-through waveguide-based display device 105 can render holographic images of various virtual objects that are superimposed over the real-world images that are collectively viewed using the see-through waveguide display to thereby create a mixed-reality environment 200 within the HMD device’s FOV (field of view) 220. It is noted that the FOV of the real world and the FOV of the holographic images in the virtual world are not necessarily identical, as the virtual FOV provided by the display device is typically a subset of the real FOV. FOV is typically described as an angular parameter in horizontal, vertical, or diagonal dimensions.

It is noted that FOV is just one of many parameters that are typically considered and balanced by HMD device designers to meet the requirements of a particular implementation. For example, such parameters may include eyebox size, brightness, transparency and duty time, contrast, resolution, color fidelity, depth perception, size, weight, form factor, and user comfort (i.e., wearable, visual, and social), among others.

In the illustrative example shown in FIG. 2, the user 115 is physically walking in a real-world urban area that includes city streets with various buildings, stores, etc., with a countryside in the distance. The FOV of the cityscape viewed on HMD device 100 changes as the user moves through the real-world environment and the device can render static and/or dynamic virtual images over the real-world view. In this illustrative example, the holographic virtual images include a tag 225 that identifies a restaurant business and directions 230 to a place of interest in the city. The mixed-reality environment 200 seen visually on the waveguide-based display device may also be supplemented by audio and/or tactile/haptic sensations produced by the HMD device in some implementations.

Virtual images and digital content can be located in various positions within the FOV along all three axes of the coordinate system 235. The immersiveness of the content in three dimensions may be enhanced as the reach of the display along the “z” axis extends from the near field focus plane (i.e., generally within arm’s length of the HMD device user) to the far field focus plane (i.e., generally beyond arm’s reach) to facilitate arm’s length virtual display interactions. Many mixed-reality HMD device experiences will employ a mix of near-field and far-field visual components. The boundary between near and far fields is not necessarily strictly defined and can vary by implementation. For example, distances beyond 2 m may be considered as a part of the far field in some mixed-reality HMD device scenarios.

During natural viewing, the human visual system relies on multiple sources of information, or “cues,” to interpret three-dimensional shapes and the relative positions of objects. Some cues rely only on a single eye (monocular cues), including linear perspective, familiar size, occlusion, depth-of-field blur, and accommodation. Other cues rely on both eyes (binocular cues), and include vergence (essentially the relative rotations of the eyes required to look at an object) and binocular disparity (the pattern of differences between the projections of the scene on the back of the two eyes).

To view objects clearly, humans must accommodate, or adjust their eyes’ focus, to the distance of the object. At the same time, the rotation of both eyes must converge to the object’s distance to avoid seeing double images. In natural viewing, vergence and accommodation are linked. When viewing something near (e.g., a housefly close to the nose) the eyes cross and accommodate to a near point. Conversely, when viewing something at optical infinity, the eyes’ lines of sight become parallel, and the eyes’ lenses accommodate to infinity.

In typical HMD devices, users will always accommodate to the focal distance of the display (to get a sharp image) but converge to the distance of the object of interest (to get a single image). When users accommodate and converge to different distances, the natural link between the two cues must be broken, and this can lead to visual discomfort or fatigue due to such vergence-accommodation conflict (VAC). Accordingly, to maximize the quality of the user experience and comfort with the HMD device 100, virtual images may be rendered in a plane to appear at a constant distance from the user’s eyes. For example, virtual images, including the images 225 and 230, can be set at a fixed depth (e.g., 2 m) from the user 115. Thus, the user will always accommodate near 2 m to maintain a clear image in the HMD device. It may be appreciated that 2 m is an illustrative distance and is intended to be non-limiting. Other distances may be utilized, and virtual images may typically be optimally placed at distances between 1.5 and 5 m from the HMD device user for many applications of a mixed-reality HMD device while ensuring user comfort; however, in some applications and use cases, virtual images can be rendered closer to the user.

In the real world as shown in FIG. 3A, light rays 305 from distant objects 310 reaching an eye of a user 115 are almost parallel. Real-world objects at optical infinity (roughly around 6 m and farther for normal vision) have light rays 320 that are exactly parallel when reaching the eye, as shown in FIG. 3B. Light rays 325 from a nearby real-world object 330 reach the eye with different, more divergent angles, as shown in FIG. 3C, compared to those for more distant objects.

Various approaches may be utilized to render virtual images with the suitable divergent angles to thereby appear at the targeted depth of focus. For example, FIG. 4 shows that a negative (i.e., concave) lens 405 can diverge the collimated/parallel rays 450 that are received from a conventional output coupler element (not shown) in an HMD device to produce a holographic virtual image having a location that is apparent to the user at a focal point, F (as indicated by reference numeral 415), that is determined by the focal length of the lens. For example, in various mixed-reality HMD device scenarios, the optical power of the negative lens can range between -0.2 and -3.0 diopters (i.e., focal lengths of 5 m to 33 cm, respectively) to position virtual objects from the boundary of the far field (near infinity) to slightly more than one foot away. As shown, the rays from the negative lens arriving at the eye of the user 115 are non-parallel and divergent and are converged by the eye’s internal lens to form the image on the retina, as indicated by reference numeral 420.
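As a hedged illustration of the relationship between lens power and apparent image distance (optical power in diopters is the reciprocal of the focal length in meters), the short Python sketch below converts the illustrative powers mentioned above into distances; the function name and printed values are for illustration only and are not part of the patent.

```python
def apparent_distance_m(power_diopters: float) -> float:
    """Convert lens optical power in diopters to the magnitude of its
    focal distance in meters (D = 1 / f, so f = 1 / D)."""
    if power_diopters == 0:
        return float("inf")  # zero power leaves collimated rays at optical infinity
    return abs(1.0 / power_diopters)

# Illustrative values from the range discussed above
for d in (-0.2, -0.5, -3.0):
    print(f"{d:+.1f} D -> virtual image appears at ~{apparent_distance_m(d):.2f} m")
# -0.2 D -> ~5.00 m, -0.5 D -> ~2.00 m, -3.0 D -> ~0.33 m
```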

FIG. 5 shows a simplified side view of an illustrative virtual display system 500 that is incorporated into the display device 105 (FIG. 1) and which may be used in the HMD device 100 to render virtual images. The virtual display system may function as an optical combiner by superimposing the rendered virtual images over the user’s view of light from real-world objects to thus form a mixed-reality display.

It is noted that the side view of FIG. 5 shows virtual display components for a single eye of the user 115. However, it may be appreciated that the components can be extended such that separate displays are provided for each eye of the user in binocular implementations. Such an arrangement may facilitate, for example, stereoscopic rendering of virtual images in the FOV of the HMD device and enable prescription lens integration, as discussed below, on a per-eye basis.

The display system includes at least one partially transparent (i.e., see-through) waveguide 510 that is configured to propagate visible light. While a single waveguide is shown in FIG. 5 for the sake of clarity in exposition of the present principles, it will be appreciated that a plurality of waveguides may be utilized in some applications. For example, a stack of two or three waveguides can support a red, green, blue (RGB) color model that is utilized for rendering full color virtual images in some cases.

The waveguide 510 facilitates light transmission between the virtual image source and the eye. One or more waveguides can be utilized in the near-eye display system because they are transparent and because they are generally small and lightweight. This is desirable in applications such as HMD devices where size and weight are generally sought to be minimized for reasons of performance and user comfort. Use of the waveguide 510 can enable the virtual image source to be located out of the way, for example on the side of the user’s head or near the forehead, leaving only a relatively small, light, and transparent waveguide optical element in front of the eyes.

In an illustrative implementation, the waveguide 510 operates using a principle of total internal reflection (TIR) so that light can be coupled among the various optical elements in the HMD device 100. TIR is a phenomenon which occurs when a propagating light wave strikes a medium boundary (e.g., as provided by the optical substrate of a waveguide) at an angle larger than the critical angle with respect to the normal to the surface. In other words, the critical angle (θc) is the angle of incidence above which TIR occurs, which is given by Snell’s Law, as is known in the art. More specifically, Snell’s law states that the critical angle (θc) is specified using the following equation:

θc = sin⁻¹(n2/n1)

where θc is the critical angle for two optical mediums (e.g., the waveguide substrate and air or some other medium that is adjacent to the substrate) that meet at a medium boundary, n1 is the index of refraction of the optical medium in which light is traveling towards the medium boundary (e.g., the waveguide substrate, once the light is coupled therein), and n2 is the index of refraction of the optical medium beyond the medium boundary (e.g., air or some other medium adjacent to the waveguide substrate).
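As a worked example of this TIR condition, the sketch below evaluates the critical angle for an assumed high-index waveguide substrate bounded by air; the index values are illustrative assumptions, not values specified in this document.

```python
import math

def critical_angle_deg(n1: float, n2: float) -> float:
    """Critical angle (degrees) at a boundary where light travels from a
    medium of index n1 into a medium of index n2; TIR requires n1 > n2."""
    if n1 <= n2:
        raise ValueError("TIR is only possible when n1 > n2")
    return math.degrees(math.asin(n2 / n1))

# Assumed example: high-index waveguide glass (n1 = 1.7) bounded by air (n2 = 1.0)
print(f"critical angle ~= {critical_angle_deg(1.7, 1.0):.1f} degrees")
# Rays striking the boundary at angles larger than this (measured from the
# surface normal) remain trapped in the waveguide by TIR.
```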

The user 115 can look through the waveguide 510 to see real-world objects on the real-world side of the display device 105 (the real-world side is indicated by reference numeral 512 in FIG. 5). For the virtual part of the FOV of the display system, virtual image light 515 is provided by a virtual image source 520 (e.g., a microdisplay or light engine, etc.). The virtual image light is in-coupled to the waveguide by an input coupler 525 and propagated through the waveguide in total internal reflection. The image light is out-coupled from the waveguide by an output coupler 530. The combination of see-through waveguide and coupling elements may be referred to as a mixed-reality optical combiner 535 because it functions to combine real-world and virtual-world images into a single display.

Typically, in such waveguide-based optical combiners, the input pupil needs to be formed over a collimated field, otherwise each waveguide exit pupil will produce an image at a slightly different distance. This results in a mixed visual experience in which images are overlapping with different focal depths in an optical phenomenon known as focus spread.

In some embodiments, the input coupler 525 and output coupler 530 may be configured as diffractive optical elements (DOEs). DOEs may comprise, for example, surface relief grating (SRG) structures and volumetric holographic grating (VHG) structures. An intermediate DOE (not shown) may also be disposed in the light path between the input coupler and output coupler in some cases. The intermediate DOE may be configured to provide exit pupil expansion in one direction (e.g., horizontal) while the output coupler may be configured to provide exit pupil expansion in a second direction (e.g., vertical).

In alternative embodiments, the optical combiner functionality provided by the waveguide and DOEs may be implemented using a reflective waveguide combiner. For example, partially reflective surfaces may be embedded in a waveguide and/or stacked in a geometric array to implement an optical combiner that uses partial field propagation. The reflectors can be half-tone, dielectric, holographic, polarized thin layer, or be fractured into a Fresnel element.

In other embodiments, the principles of the present control of variable-focus lenses in a mixed-reality device for presbyopes may be implemented using a reflective waveguide combiner with any suitable in-coupling and/or out-coupling methods. A reflective waveguide combiner may utilize a single waveguide in some implementations for all colors in the virtual images which may be desirable in some applications. By comparison, diffractive combiners typically require multiple waveguides to meet a target FOV in polychromatic applications due to limitations on angular range that are dictated by the waveguide TIR condition.

The present control of variable-focus lenses in a mixed-reality device for presbyopes may also be utilized with various other waveguide/coupling configurations beyond reflective and diffractive. For example, it may be appreciated that the principles of the present invention may be alternatively applied to waveguides that are refractive, polarized, hybrid diffractive/refractive, phase multiplexed holographic, and/or achromatic metasurfaces.

A variable-focus lens 540 configured to function as a negative lens is located on the eye side of the waveguide 510 (the eye side is indicated by reference numeral 514 in FIG. 5). The negative lens acts over the entire extent of the eyebox associated with the user’s eye to thereby create the diverging rays 545 from the collimated rays 550 that exit the output coupler 530. When the virtual image source 520 is operated to project virtual images that are in-coupled into the waveguide 510, the output diverging rays present the virtual images at a predetermined focal depth, d, from the display system at an apparent or virtual point of focus, F. For example, if the negative lens is configured with -0.5 diopters of optical power, then d is equal to 2 m.

To ensure that the user’s view of the real world remains unperturbed by the negative lens, a variable-focus lens 605 is configured to function as a conjugate positive (i.e., convex) lens, as shown in FIG. 6. This variable-focus lens is located on the real-world side of the waveguide 510 to compensate for the impact of the negative lens on the eye side. The conjugate pair of positive and negative lenses may be referred to as a push-pull lens pair in some contexts. For example, if the eye side variable-focus lens is controlled to provide -0.5 diopters of optical power, then the real-world side lens is controlled to provide an opposite +0.5 diopters of optical power to cancel out the effect of the negative lens. Accordingly, light 610 reflected from a real-world object 615 reaches the user with no net optical power being applied by the combined operations of the pair of variable-focus lenses. In this example, the object is in the distance so the parallel rays of real-world light incident on the display system 500 remain parallel when viewed by the user 115.
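The cancellation performed by the push-pull pair can be illustrated numerically: under a thin-lens approximation, the powers of two closely spaced lenses add, so the baseline powers sum to zero for real-world light, while virtual image light, which only traverses the eye-side lens, still acquires the negative power. The sketch below is a minimal illustration under that assumption.

```python
def net_power(eye_side_d: float, world_side_d: float) -> float:
    """Approximate net power (diopters) applied to real-world light by two
    thin lenses in close contact: powers add."""
    return eye_side_d + world_side_d

# Baseline push-pull pair from the example above
print(net_power(-0.5, +0.5))   # 0.0 -> real-world view is unperturbed
# Virtual image light only passes through the eye-side lens, so it still
# acquires -0.5 D and appears at the 2 m focal plane.
```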

The eye-side variable-focus lens 540 and real-world-side variable-focus lens 605 may be implemented using various known technologies. Variable-focus lenses may also be referred to as “tunable” lenses. Exemplary technologies include liquid oil push/pull, liquid crystal, reflective MEMS (micro-electromechanical system), MEMS Fresnel structures, geometric phase holograms, meta-surface optical elements, deformable membranes, Alvarez lenses, multi-order DOEs, combinations thereof, and the like. The lenses may be implemented using single optical elements in some applications, or as arrays in other applications.

FIG. 7 is a side view of the virtual display system 500 in operative relationship with HMD device components including an eye tracker 705, optical power controller 710, and one or more processors 715. The components and the variable-focus lenses 540 and 605 are operatively coupled by one or more buses as representatively indicated by reference numeral 720. The components may be disposed in a frame (not shown) or other suitable structure of the HMD device 100 or the exemplary HMD device 1600 shown in FIGS. 16 and 17 and described in the accompanying text.

The eye tracker 705 is operatively coupled to one or more illumination sources 725 and one or more sensors 730. For example, the illumination sources may comprise IR (infrared) LEDs that are located around the periphery of the display system 500 (FIG. 5) and/or optical combiner 535 and/or be disposed in some other suitable HMD device component such as a frame. The eye tracker illumination sources can function as glint sources and/or provide general or structured illumination of the user’s eye features. The eye tracker sensors may comprise inward-facing cameras that have sensitivity, for example, to IR light. Image-based and/or feature-based eye tracking, or other suitable eye-tracking techniques may be utilized to meet requirements of an implementation of the present control of variable-focus lenses in a mixed-reality device for presbyopes.

In an illustrative example, the IR light from the illumination sources 725 causes highly visible reflections, and the eye tracker sensors 730 capture an image of the eye showing these reflections. The images captured by the sensors are used to identify the reflection of the light source on the cornea (i.e., “glints”) and in the pupil. Typically, a vector formed by the angle between the cornea and pupil reflections may be calculated using real-time image analysis, and the vector direction combined with other geometrical features of the reflections is then used to determine where the user is looking (the gaze point) and to calculate eye movement, location, and orientation.
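A greatly simplified sketch of this glint/pupil processing is shown below. A practical eye tracker would rely on calibrated camera and eye models plus per-user calibration, so the linear mapping, function name, and values here are purely hypothetical.

```python
import numpy as np

def gaze_vector_2d(pupil_center_px, glint_center_px, calibration_matrix):
    """Estimate a 2D gaze direction from the pupil-to-glint offset in the
    eye image using an (assumed, pre-computed) linear calibration mapping."""
    offset = np.asarray(pupil_center_px, float) - np.asarray(glint_center_px, float)
    return calibration_matrix @ offset  # maps pixel offset -> gaze angles (radians)

# Illustrative usage with made-up values
calib = np.array([[0.002, 0.0], [0.0, 0.002]])        # assumed per-user calibration
print(gaze_vector_2d((312, 240), (300, 236), calib))  # small horizontal/vertical angles
```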

During operation of the HMD device 100, the optical power controller 710 controllably varies the optical power of the eye-side variable-focus lens 540 and real-world-side variable focus lens 605. Different amounts of optical power may be utilized at the eye-side variable-focus lens when configured as a negative lens to provide for focal planes that are located at different fixed or variable distances to suit requirements of a particular application. The power of the negative lens does not affect the zeroth diffraction order that travels in TIR down the waveguide 510 (i.e., from top to bottom in the drawings), but instead only the diffracted out-coupled field. In addition, the see-through field is not affected by the negative lens because whatever portion of the see-through field that is diffracted by the output coupler 530 is trapped by TIR in the waveguide and is therefore not transmitted to the user’s eye.

A static lens 735 may be optionally utilized in some implementations of the HMD device 100. For example, the static lens may be implemented as an optical insert to a portion of the HMD device such as a sealed visor shown in FIGS. 11-13 and described in the accompanying text. In some HMD devices having size and space limits due to eyebox and/or form factor considerations, it may not be comfortable or possible for users to wear prescription glasses. The static lens can be provided to correct impairments in the vision of the user 115 and may comprise, for example, the user’s corrective lens prescription for glasses or contact lenses. The static lens may be used in combination with a modified configuration for the eye-side variable-focus lens discussed below in some scenarios. Worldwide, visual impairments due to refractive errors are distributed among people with myopia, hyperopia, and presbyopia. Corrections for most of the population fall between -6.0 and +4.0 diopters.

FIG. 8 provides an illustrative table 800 that shows operational configurations for the variable-focus lens pair for different user types and use cases. It may be noted that FIG. 8 refers to the elements shown in FIG. 7. Table 800 shows how the optical power controller 710 may controllably vary the optical power of the eye-side variable-focus lens 540 and real-world-side variable-focus lens 605 for different types of users and HMD device use cases. Two different types of presbyopic users are shown in the first column 805 of the table. User 1 is able to see far away real-world objects clearly without glasses. User 1 may have always had clear (i.e., emmetropic) vision but developed presbyopia with age. User 1 may currently use reading glasses to see close real-world objects and read text. For example, prescriptions for reading glasses typically increase in increments of 0.25 diopters, such as +1.00, +1.25, +1.50, and so on.

User 2 may have developed myopia as a child and is unable to see far away real-world objects clearly without corrective lenses such as glasses or contacts. To deal with presbyopia, user 2 may currently use bifocals or progressive lenses, or wear contact lenses for distance vision and don reading glasses for close accommodation. Monovision is another solution in which different accommodative distances are provided to each eye of the user via contact lenses or surgical methods, for example. User 2 may also remove or lift their glasses to focus on near objects in some situations.

The second column 810 in table 800 shows two use cases for a presbyopic user of an HMD device, including far viewing and close viewing. It is noted that the terms “close” and “far” are relative to each other and that specific distances associated with each term can vary by context and application of the present principles. Close regions of interest are generally within an arm’s length of the user, for example < 1 m and within the near field of an HMD device. The far field for the device may generally start around 2 m and a user’s eye generally accommodates to optical infinity around distances of 6 m.

As noted above, virtual images may typically be displayed at fixed focal plane depths of around 1.25 to 2.5 m in mixed-reality HMD and other immersive devices to reduce user discomfort due to VAC. Accordingly, in typical implementations, an objective of the optical power controller 710 is to enable presbyopes to simultaneously view both close virtual and real-world objects in sharp focus through the HMD device.

The third column 815 in table 800 shows the operations of the eye-side variable-focus lens 540 responsive to the optical power controller 710 for each use case and for each user type. The fourth column 820 shows the operations of the real-world-side variable-focus lens 605 responsive to the optical power controller for each use case and for each user type.

For user 1 during far viewing, the eye-side variable-focus lens 540 is operated in its baseline configuration to support the rendering of virtual images at some predetermined mixed-reality focal plane depth. For example, in an illustrative and non-limiting embodiment, the optical power controller 710 can set the optical power of the eye-side variable-focus lens at -0.5 diopters to fix the mixed-reality focal plane at 2 m. In alternative embodiments, focus tuning for the virtual images at some non-infinite distance may be implemented in the optical display system before light for the virtual images is out-coupled from the waveguide to the user’s eye. In such alternative embodiments, it may be appreciated that the out-coupled light is not necessarily collimated, and thus the optical power of the eye-side variable-focus lens may be set by the optical power controller to zero or some other suitable value for its baseline configuration. For example, with an optical combiner employing a reflective waveguide having no exit pupil replication, focus tuning may be performed at the virtual image source, at a tunable display engine, or using some other suitable technique. In another alternative embodiment, focus tuning of the virtual images may be performed by the output-coupler.

The optical power controller 710 may operate the real-world-side variable-focus lens 605 in its baseline configuration in which the optical power provided by the eye-side lens is canceled out for real-world image light 610 entering the see-through HMD display system. Here, for example, the baseline configuration for the real-world-side lens may be +0.5 diopters so that the net optical power applied by the lens pair to light from real-world objects equals zero.

For close viewing by user 1, the optical power controller 710 also configures the eye-side variable-focus lens 540 to support the rendering of virtual images at a predetermined mixed-reality focal plane depth, for example 2 m. In addition, the optical power controller 710 adds optical power to the real-world-side variable-focus lens 605 to push close real-world objects optically farther away and into sharp focus for user 1. The amount of added optical power can vary according to one or more of degree of presbyopia experienced by user 1, the amount of ambient light, or other factors. For example, the added optical power could be +1.5 diopters for moderate presbyopia correction.

For far viewing by user 2, the real-world-side variable-focus lens 605 is operated by the optical power controller 710 in its baseline configuration to counteract operations of the eye-side variable-focus lens 540 in its respective baseline configuration. In this illustrative example, the baseline configuration of the real-world-side lens is +0.5 diopters, and the baseline configuration of the eye-side lens is -0.5 diopters, as discussed above.

To enable user 2 to utilize the HMD device 100 without needing to wear glasses or contacts, the optical power controller 710 may operate the eye-side variable-focus lens 540 in a modified configuration. The modified configuration includes incorporating the prescription of the user’s corrective lenses into the baseline configuration of the eye-side variable-focus lens. For example, if user 2 has mild myopia with a corrective lens prescription of -1.5 diopters, then the optical power controller 710 can control the eye-side variable-focus lens to provide -2.0 diopters of optical power, in which -1.5 diopters provides the corrective lens prescription for user 2 and -0.5 diopters counteracts the +0.5 diopters of optical power provided by the real-world-side variable-focus lens.

It may be appreciated that in alternative configurations, various combinations of optical powers can be utilized to meet particular implementation requirements. For example, in the above scenario in which user 2 has a corrective lens prescription of -1.5 diopters, in the far viewing use case, the eye-side variable-focus lens 540 could be controlled to provide -1.5 diopters of optical power and the real-world-side variable-focus lens could be controlled to provide zero optical power. A given lens-pair configuration can depend, for example, on physical characteristics of the HMD device and variable-focus lenses such as switching speed/refresh rate, range of optical powers supported, display FOV, virtual image rendering plane depth, etc., as well as application factors such as motion blur, virtual scene composition, etc.

During close viewing by user 2, the optical power controller 710 can control the eye-side variable-focus lens 540 to provide -2.0 diopters of optical power, as discussed above, to enable the user to simultaneously see both close virtual-world and real-world objects in sharp focus. In addition, the optical power controller adds optical power to the real-world-side variable-focus lens 605 to push close-by real-world objects optically farther away and into sharp focus. The amount of added optical power can vary according to one or more of degree of presbyopia experienced by the user, level of ambient light, or other factors.
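The lens-pair settings walked through above for both user types and use cases can be summarized in a small decision routine. The sketch below is a non-authoritative paraphrase of table 800 that reuses the illustrative diopter values from the discussion; the function and constant names are assumptions made for the example.

```python
BASELINE_EYE_SIDE_D = -0.5      # renders virtual images at the 2 m focal plane
BASELINE_WORLD_SIDE_D = +0.5    # cancels the eye-side baseline for real-world light

def lens_pair_powers(user_prescription_d: float,
                     presbyopia_add_d: float,
                     close_viewing: bool) -> tuple[float, float]:
    """Return (eye_side_power, world_side_power) in diopters.

    user_prescription_d: distance correction (0.0 for an emmetropic type 1 user,
                         e.g. -1.5 for a mildly myopic type 2 user).
    presbyopia_add_d:    positive power added for near work, e.g. +1.5.
    close_viewing:       True when eye tracking indicates near-field gaze.
    """
    eye_side = BASELINE_EYE_SIDE_D + user_prescription_d
    world_side = BASELINE_WORLD_SIDE_D
    if close_viewing:
        world_side += presbyopia_add_d   # optically push near objects farther away
    return eye_side, world_side

# Type 1 user, close viewing:          (-0.5, +2.0)
print(lens_pair_powers(0.0, +1.5, True))
# Type 2 user (-1.5 D), far viewing:   (-2.0, +0.5)
print(lens_pair_powers(-1.5, +1.5, False))
```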

FIG. 9 is a flowchart of an illustrative method 900 for operating the HMD device 100 that includes an optical display system 500 and an eye tracker 705. Unless specifically stated, the methods or steps shown in the flowchart and described in the accompanying text are not constrained to a particular order or sequence. In addition, some of the methods or steps thereof can occur or be performed concurrently and not all the methods or steps have to be performed in a given implementation depending on the requirements of such implementation and some methods or steps may be optionally utilized. FIG. 9 makes references to the elements shown in FIG. 7.

At block 905, the user 115 dons the HMD device 100. Typically, the user will have already undertaken an initialization, personalization, calibration, or other suitable processes or procedures to enhance user comfort and/or enable, for example, various systems and subsystems to perform accurate tracking of eyes, hands, head, and/or other body parts, or provide for virtual image display alignment (e.g., if the HMD device shifts on the user’s head). Such processes may be utilized to determine a suitable amount of presbyopia correction to be implemented for the user and identify the user type (e.g., user type 1 or 2 from table 800 shown in FIG. 8). Integration of the user’s vision prescription into the HMD device can also be supported by such initialization/personalization/calibration processes to improve visual comfort and enhance mitigation effects for VAC.

At block 910, the optical power controller 710 can control the optical power of the eye-side variable-focus lens 540 depending on user type. The eye-side lens may be operated in its modified configuration responsively to the user being a type 2 user. In the modified configuration, as discussed above when referring to the description accompanying FIG. 8, appropriate optical power for the user’s corrective prescription, for example -1.5 diopters, may be added to the baseline configuration. Otherwise, for a type 1 user, the eye-side lens operates just in its baseline configuration to provide for the rendering of virtual images at the fixed focal plane depth (e.g., 2 m).

At block 915, the HMD device 100 including the eye-side variable-focus lens 540 is operated to render one or more virtual images at the predetermined focal plane depth (e.g., 2 m), as appropriate for a given HMD device user experience. At block 920, the location of the user’s gaze in the FOV of the display system is determined. The location includes depth along the z axis of the display and may be determined, for example, using vergence tracking of the user’s eyes or by projecting a gaze vector and determining its intersection with the rendered scene.
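One common way to estimate gaze depth from binocular eye tracking is to triangulate from the vergence angle. The sketch below assumes symmetric fixation and an assumed interpupillary distance; it is a simplified illustration rather than the specific method used by the device.

```python
import math

def vergence_depth_m(ipd_m: float, vergence_angle_deg: float) -> float:
    """Approximate fixation distance from the total vergence angle between
    the two eyes' lines of sight, assuming symmetric fixation straight ahead."""
    if vergence_angle_deg <= 0:
        return float("inf")  # parallel lines of sight -> optical infinity
    half_angle = math.radians(vergence_angle_deg) / 2.0
    return (ipd_m / 2.0) / math.tan(half_angle)

# Assumed 63 mm interpupillary distance
print(f"{vergence_depth_m(0.063, 1.8):.2f} m")   # ~2 m  -> treat as far-field gaze
print(f"{vergence_depth_m(0.063, 7.2):.2f} m")   # ~0.5 m -> treat as near-field gaze
```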

At decision block 925, if the determination is made from the eye tracking that the user is looking at far objects, then at block 930, the optical power controller 710 operates the real-world-side variable-focus lens 605 in its baseline configuration. The baseline configuration of the real-world-side lens provides opposite optical power to that of the eye-side variable-focus lens to cancel out the impact of that lens’s baseline configuration. For a type 1 user, this means that no net optical power is provided to real-world image light by the variable-lens pair.

For the type 2 user, optical power is provided by the real-world-side lens 605 to offset only the baseline optical power provided by eye-side lens 540 without impacting the added optical power for the user’s prescription (e.g., -1.5 diopters) for the modified configuration of the eye-side lens. For example, for a 2 m virtual image focal plane, the eye-side lens is controlled to provide -0.5 diopters of optical power; therefore, the real-world-side lens is controlled to provide +0.5 diopters of optical power as its baseline. This offset enables the eye-side lens to provide the prescribed correction for the type 2 user’s distance vision.

If a determination is made at decision block 925 that the user is engaged in close viewing, then at block 935 the optical power controller 710 controls the real-world-side variable-focus lens 605 to add optical power to push close real-world objects optically farther away and into sharp focus for the user. For example, +1.5 diopters for mild presbyopia correction could be added to the +0.5 diopters of baseline optical power of the real-world-side lens.
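Taken together, blocks 920 through 935 amount to choosing the real-world-side lens power from the tracked gaze depth. The following sketch captures that decision with illustrative thresholds and diopter values; the names and numbers are assumptions for the example only.

```python
NEAR_FIELD_THRESHOLD_M = 1.0        # illustrative boundary for "close viewing"
BASELINE_WORLD_SIDE_D = +0.5        # cancels an assumed -0.5 D eye-side baseline
PRESBYOPIA_ADD_D = +1.5             # illustrative near-work correction

def world_side_power(gaze_depth_m: float) -> float:
    """Blocks 920-935: pick the real-world-side lens power from the gaze depth."""
    if gaze_depth_m < NEAR_FIELD_THRESHOLD_M:           # close viewing (block 935)
        return BASELINE_WORLD_SIDE_D + PRESBYOPIA_ADD_D
    return BASELINE_WORLD_SIDE_D                        # far viewing (block 930)

print(world_side_power(0.4))   # 2.0 -> near objects pushed optically farther away
print(world_side_power(3.0))   # 0.5 -> baseline, real-world view unperturbed
```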

FIG. 10 is a flowchart of an illustrative method 1000 for operating an electronic device that includes an eye tracker and a mixed-reality see-through optical display system for showing scenes comprising virtual images that are rendered over views of real-world objects. At block 1005, the electronic device is calibrated for utilization by a presbyopic user. Such calibration may include, for example, initially setting up the electronic device such as an HMD device for a particular presbyopic user, personalizing the device to the user such as providing for a corrective prescription, and/or performing suitable procedures to ensure that various systems and subsystems in the device can accurately perform their functions.

At block 1010, the mixed-reality see-through optical display system is operated to support a near field and a far field, in which the near field is closer to the presbyopic user relative to the far field, and in which the mixed-reality see-through optical display system has an eye side and a real-world side. As noted above, 2 m may be considered a threshold between near and far fields, although other threshold distances may be utilized depending on application requirements. At block 1015, a conjugate pair of variable-focus lenses are operated in matched configurations to provide for setting rendered virtual images within the near field without perturbing the views of the real-world objects in the far field. For example, matching configurations include the variable-focus lenses operating to cancel the effects of their respective optical powers.

At block 1020, the eye tracker is used to determine a depth of the presbyopic user’s gaze in the scene. At block 1025, responsively to a depth determination by the eye tracker, the conjugate pair of variable-focus lenses are operated in mismatched configurations to enable the presbyopic user to simultaneously accommodate rendered virtual images and real-world objects in the near field. For example, the mismatch can provide for additional optical power being added to the real-world-side variable-focus lens to thereby enable real-world objects in the near field to be pushed out optically and brought into sharp focus for the presbyopic user.

FIGS. 11 and 12 show respective front and rear views of an illustrative example of a visor 1100 that incorporates an internal near-eye display device 105 (FIG. 1) that is used in the HMD device 100 as worn by a user 115. The visor, in some implementations, may be sealed to protect the internal display device. The visor typically interfaces with other components of the HMD device such as head-mounting/retention systems and other subsystems including sensors, power management, controllers, etc., as illustratively described in conjunction with FIGS. 16 and 17. Suitable interface elements (not shown) including snaps, bosses, screws and other fasteners, etc. may also be incorporated into the visor.

The visor 1100 may include see-through front and rear shields, 1105 and 1110 respectively, that can be molded using transparent or partially transparent materials to facilitate unobstructed vision to the display device and the surrounding real-world environment. Treatments may be applied to the front and rear shields such as tinting, mirroring, anti-reflective, anti-fog, and other coatings, and various colors and finishes may also be utilized. The front and rear shields are affixed to a chassis 1305 shown in the disassembled view in FIG. 13.

The sealed visor 1100 can physically protect sensitive internal components, including a display device 105, when the HMD device is operated and during normal handling for cleaning and the like. The display device in this illustrative example includes left and right waveguides 1310L and 1310R that respectively provide holographic virtual images to the user’s left and right eyes for mixed- and/or virtual-reality applications. The visor can also protect the display device from environmental elements and damage should the HMD device be dropped or bumped, impacted, etc.

As shown in FIG. 12, the rear shield 1110 is configured in an ergonomically suitable form 1205 to interface with the user’s nose, and nose pads and/or other comfort features can be included (e.g., molded-in and/or added-on as discrete components). In some applications, the sealed visor 1100 can also incorporate some level of optical diopter curvature (i.e., eye prescription) within the molded shields, as discussed above. The sealed visor 1100 can also be configured to incorporate the conjugate lens pair, that is, the negative lens 540 and the positive lens 605 (FIG. 6), on either side of the display device 105.

FIG. 14 shows an illustrative waveguide display 1400 having multiple DOEs that may be used as an embodiment of the see-through waveguide 510 in the display device 105 (FIG. 1) to provide in-coupling, expansion of the exit pupil in two directions, and out-coupling. The waveguide display 1400 may be utilized to provide holographic virtual images from a virtual imager to one of the user’s eyes. Each DOE is an optical element comprising a periodic structure that can modulate various properties of light in a periodic pattern such as the direction of the optical axis, optical path length, and the like. The structure can be periodic in one dimension, such as a one-dimensional (1D) grating, and/or periodic in two dimensions, such as a two-dimensional (2D) grating.

The waveguide display 1400 includes an in-coupling DOE 1405, an out-coupling DOE 1415, and an intermediate DOE 1410 that couples light between the in-coupling and out-coupling DOEs. The in-coupling DOE is configured to couple image light comprising one or more imaging beams from a virtual image source 520 (FIG. 5) into a waveguide 1430. The intermediate DOE expands the exit pupil in a first direction along a first coordinate axis (e.g., horizontal), and the out-coupling DOE expands the exit pupil in a second direction along a second coordinate axis (e.g., vertical) and couples light out of the waveguide to the user’s eye (i.e., outwards from the plane of the drawing page). The angle ρ is a rotation angle between the periodic lines of the in-coupling DOE and the intermediate DOE as shown. As the light propagates in the intermediate DOE (horizontally from left to right in the drawing), it is also diffracted (in the downward direction) to the out-coupling DOE.

While the DOEs are shown in this illustrative example with a single in-coupling DOE disposed to the left of the intermediate DOE 1410, which is located above the out-coupling DOE, in some implementations the in-coupling DOE may be centrally positioned within the waveguide, and one or more intermediate DOEs can be disposed laterally from the in-coupling DOE to enable light to propagate to the left and right while providing for exit pupil expansion along the first direction. It may be appreciated that other numbers and arrangements of DOEs may be utilized to meet the needs of a particular implementation.

As noted above, in implementations using a color model such as RGB, multiple waveguides may be utilized in the display device 105 (FIG. 1). FIG. 15 shows illustrative propagation of light from the virtual image source 520 through an optical combiner 1500 that uses a separate waveguide for each color component in the RGB color model. In alternative implementations, two waveguides may be utilized in which one waveguide can support two color components and the other waveguide may support a single color component.

For a given angular range within the virtual FOV, light for each color component 1505, 1510, and 1515 provided by the virtual image source 520 is in-coupled into respective waveguides 1530, 1535, and 1540 using respective individual input couplers (representatively indicated by element 1520). The light for each color propagates through the respective waveguides in TIR and is out-coupled by respective output couplers (representatively indicated by element 1525) to the user’s eye 115. In some implementations the output may have an expanded pupil relative to the input in the horizontal and vertical directions, for example when using DOEs that provide for pupil expansion, as discussed above.

The input coupler 1520 for each waveguide 1530, 1535, and 1540 is configured to in-couple light within an angular range described by the FOV and within a particular wavelength range into the waveguide. Light outside the wavelength range passes through the waveguide. For example, the blue light 1505 is outside the range of wavelength sensitivity for both of the input couplers in the red waveguide 1540 and green waveguide 1535. The blue light therefore passes through the red and green waveguides to reach the in-coupling DOE in the blue waveguide 1530 where it is in-coupled, propagated in TIR within the waveguide, propagated to the output coupler and out-coupled to the user’s eye 115.
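This wavelength-selective in-coupling can be thought of as a simple routing step, as sketched below; the wavelength bands are rough, assumed values for the red, green, and blue components and are not taken from this document.

```python
from typing import Optional

# Assumed, approximate wavelength bands (nm) for each waveguide's input coupler
WAVEGUIDE_BANDS_NM = {
    "blue waveguide 1530": (430, 500),
    "green waveguide 1535": (500, 565),
    "red waveguide 1540": (600, 680),
}

def route_to_waveguide(wavelength_nm: float) -> Optional[str]:
    """Return the waveguide whose input coupler in-couples this wavelength;
    light outside every band simply passes through the stack."""
    for name, (lo, hi) in WAVEGUIDE_BANDS_NM.items():
        if lo <= wavelength_nm < hi:
            return name
    return None

print(route_to_waveguide(465))   # 'blue waveguide 1530' (passes red and green first)
print(route_to_waveguide(630))   # 'red waveguide 1540'
```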

As noted above, the present control of variable-focus lenses in a mixed-reality device for presbyopes may be utilized in mixed- or virtual-reality applications. FIG. 16 shows one particular illustrative example of a mixed-reality HMD device 1600, and FIG. 17 shows a functional block diagram of the device 1600. The HMD device 1600 provides an alternative form factor to the HMD device 100 shown in FIGS. 1, 2, 11, 12, and 13. HMD device 1600 comprises one or more lenses 1602 that form a part of a see-through display subsystem 1604, so that images may be displayed using lenses 1602 (e.g., using projection onto lenses 1602, one or more waveguide systems, such as a near-eye display system, incorporated into the lenses 1602, and/or in any other suitable manner).

HMD device 1600 further comprises one or more outward-facing image sensors 1606 configured to acquire images of a background scene and/or physical environment being viewed by a user and may include one or more microphones 1608 configured to detect sounds, such as voice commands from a user. Outward-facing image sensors 1606 may include one or more depth sensors and/or one or more two-dimensional image sensors. In alternative arrangements, as noted above, a mixed-reality or virtual-reality display system, instead of incorporating a see-through display subsystem, may display mixed-reality or virtual-reality images through a viewfinder mode for an outward-facing image sensor.

The HMD device 1600 may further include a gaze detection subsystem 1610 configured for detecting a direction of gaze of each eye of a user or a direction or location of focus, as described above. Gaze detection subsystem 1610 may be configured to determine gaze directions of each of a user’s eyes in any suitable manner. For example, in the illustrative example shown, the gaze detection subsystem 1610 includes one or more glint sources 1612, such as IR (infrared) or visible light sources as described above, that are configured to cause a glint of light to reflect from each eyeball of a user, and one or more image sensors 1614, such as inward-facing sensors, that are configured to capture an image of each eyeball of the user. Changes in the glints from the user’s eyeballs and/or a location of a user’s pupil, as determined from image data gathered using the image sensor(s) 1614, may be used to determine a direction of gaze.

In addition, a location at which gaze lines projected from the user’s eyes intersect the external display may be used to determine an object at which the user is gazing (e.g., a displayed virtual object and/or real background object). Gaze detection subsystem 1610 may have any suitable number and arrangement of light sources and image sensors. In some implementations, the gaze detection subsystem 1610 may be omitted.

The HMD device 1600 may also include additional sensors. For example, HMD device 1600 may comprise a global positioning system (GPS) subsystem 1616 to allow a location of the HMD device 1600 to be determined. This may help to identify real-world objects, such as buildings, etc., that may be located in the user’s adjoining physical environment.

The HMD device 1600 may further include one or more motion sensors 1618 (e.g., inertial, multi-axis gyroscopic, or acceleration sensors) to detect movement and position/orientation/pose of a user’s head when the user is wearing the system as part of a mixed-reality or virtual-reality HMD device. Motion data may be used, potentially along with eye-tracking glint data and outward-facing image data, for gaze detection, as well as for image stabilization to help correct for blur in images from the outward-facing image sensor(s) 1606. The use of motion data may allow changes in gaze direction to be tracked even if image data from outward-facing image sensor(s) 1606 cannot be resolved.

In addition, motion sensors 1618, as well as microphone(s) 1608 and gaze detection subsystem 1610, also may be employed as user input devices, such that a user may interact with the HMD device 1600 via gestures of the eye, neck and/or head, as well as via verbal commands in some cases. It may be understood that sensors illustrated in FIGS. 16 and 17 and described in the accompanying text are included for the purpose of example and are not intended to be limiting in any manner, as any other suitable sensors and/or combination of sensors may be utilized to meet the needs of a particular implementation. For example, biometric sensors (e.g., for detecting heart and respiration rates, blood pressure, brain activity, body temperature, etc.) or environmental sensors (e.g., for detecting temperature, humidity, elevation, UV (ultraviolet) light levels, etc.) may be utilized in some implementations.

The HMD device 1600 can further include a controller 1620 such as one or more processors having a logic subsystem 1622 and a data storage subsystem 1624 in communication with the sensors, gaze detection subsystem 1610, display subsystem 1604, and/or other components through a communications subsystem 1626. The communications subsystem 1626 can also facilitate the display system being operated in conjunction with remotely located resources, such as processing, storage, power, data, and services. That is, in some implementations, an HMD device can be operated as part of a system that can distribute resources and capabilities among different components and subsystems.

The storage subsystem 1624 may include instructions stored thereon that are executable by logic subsystem 1622, for example, to receive and interpret inputs from the sensors, to identify location and movements of a user, to identify real objects using surface reconstruction and other techniques, and dim/fade the display based on distance to objects so as to enable the objects to be seen by the user, among other tasks.

The HMD device 1600 is configured with one or more audio transducers 1628 (e.g., speakers, earphones, etc.) so that audio can be utilized as part of a mixed-reality or virtual-reality experience. A power management subsystem 1630 may include one or more batteries 1632 and/or protection circuit modules (PCMs) and an associated charger interface 1634 and/or remote power interface for supplying power to components in the HMD device 1600.

It may be appreciated that the HMD device 1600 is described for the purpose of example, and thus is not meant to be limiting. It may be further understood that the display device may include additional and/or alternative sensors, cameras, microphones, input devices, output devices, etc. than those shown without departing from the scope of the present arrangement. Additionally, the physical configuration of an HMD device and its various sensors and subcomponents may take a variety of different forms without departing from the scope of the present arrangement.

FIG. 18 schematically shows an illustrative example of a computing system that can enact one or more of the methods and processes described above for the present control of variable-focus lenses in a mixed-reality device for presbyopes. Computing system 1800 is shown in simplified form. Computing system 1800 may take the form of one or more personal computers, server computers, tablet computers, home-entertainment computers, network computing devices, gaming devices, mobile computing devices, mobile communication devices (e.g., smartphone), wearable computers, and/or other computing devices.

Computing system 1800 includes a logic processor 1802, volatile memory 1804, and a non-volatile storage device 1806. Computing system 1800 may optionally include a display subsystem 1808, input subsystem 1810, communication subsystem 1812, and/or other components not shown in FIG. 18.

Logic processor 1802 includes one or more physical devices configured to execute instructions. For example, the logic processor may be configured to execute instructions that are part of one or more applications, services, programs, routines, libraries, objects, components, data structures, or other logical constructs. Such instructions may be implemented to perform a task, implement a data type, transform the state of one or more components, achieve a technical effect, or otherwise arrive at a desired result.

The logic processor may include one or more processors configured to execute software instructions. In addition, or alternatively, the logic processor may include one or more hardware or firmware logic processors configured to execute hardware or firmware instructions. Processors of the logic processor may be single-core or multi-core, and the instructions executed thereon may be configured for sequential, parallel, and/or distributed processing. Individual components of the logic processor optionally may be distributed among two or more separate devices, which may be remotely located and/or configured for coordinated processing. Aspects of the logic processor may be virtualized and executed by remotely accessible, networked computing devices configured in a cloud-computing configuration. In such a case, these virtualized aspects may be run on different physical logic processors of various different machines.

Non-volatile storage device 1806 includes one or more physical devices configured to hold instructions executable by the logic processors to implement the methods and processes described herein. When such methods and processes are implemented, the state of non-volatile storage device 1806 may be transformed, e.g., to hold different data.

Non-volatile storage device 1806 may include physical devices that are removable and/or built-in. Non-volatile storage device 1806 may include optical memory (e.g., CD, DVD, HD-DVD, Blu-Ray Disc, etc.), semiconductor memory (e.g., ROM, EPROM, EEPROM, FLASH memory, etc.), and/or magnetic memory (e.g., hard-disk drive, floppy-disk drive, tape drive, MRAM, etc.), or other mass storage device technology. Non-volatile storage device 1806 may include nonvolatile, dynamic, static, read/write, read-only, sequential-access, location-addressable, file-addressable, and/or content-addressable devices. It will be appreciated that non-volatile storage device 1806 is configured to hold instructions even when power is cut to the non-volatile storage device 1806.

Volatile memory 1804 may include physical devices that include random access memory. Volatile memory 1804 is typically utilized by logic processor 1802 to temporarily store information during processing of software instructions. It will be appreciated that volatile memory 1804 typically does not continue to store instructions when power is cut to the volatile memory 1804.

Aspects of logic processor 1802, volatile memory 1804, and non-volatile storage device 1806 may be integrated together into one or more hardware-logic components. Such hardware-logic components may include field-programmable gate arrays (FPGAs), program- and application-specific integrated circuits (PASIC/ASICs), program- and application-specific standard products (PSSP/ASSPs), system-on-a-chip (SOC), and complex programmable logic devices (CPLDs), for example.

The term “program” may be used to describe an aspect of computing system 1800 typically implemented in software by a processor to perform a particular function using portions of volatile memory, which function involves transformative processing that specially configures the processor to perform the function. Thus, a program may be instantiated via logic processor 1802 executing instructions held by non-volatile storage device 1806, using portions of volatile memory 1804. It will be understood that different programs may be instantiated from the same application, service, code block, object, library, routine, API, function, etc. Likewise, the same program may be instantiated by different applications, services, code blocks, objects, routines, APIs, functions, etc. The term “program” may encompass individual or groups of executable files, data files, libraries, drivers, scripts, database records, etc.

When included, display subsystem 1808 may be used to present a visual representation of data held by non-volatile storage device 1806. This visual representation may take the form of a graphical user interface (GUI). As the herein described methods and processes change the data held by the non-volatile storage device, and thus transform the state of the non-volatile storage device, the state of display subsystem 1808 may likewise be transformed to visually represent changes in the underlying data. Display subsystem 1808 may include one or more display devices utilizing virtually any type of technology; however, one utilizing a MEMS projector to direct laser light may be compatible with the eye-tracking system in a compact manner. Such display devices may be combined with logic processor 1802, volatile memory 1804, and/or non-volatile storage device 1806 in a shared enclosure, or such display devices may be peripheral display devices.

When included, input subsystem 1810 may comprise or interface with one or more user-input devices such as a keyboard, mouse, touch screen, or game controller. In some embodiments, the input subsystem may comprise or interface with selected natural user input (NUI) componentry. Such componentry may be integrated or peripheral, and the transduction and/or processing of input actions may be handled on- or off-board. Example NUI componentry may include a microphone for speech and/or voice recognition; an infrared, color, stereoscopic, and/or depth camera for machine vision and/or gesture recognition; a head tracker, eye tracker, accelerometer, and/or gyroscope for motion detection and/or intent recognition; as well as electric-field sensing componentry for assessing brain activity.

When included, communication subsystem 1812 may be configured to communicatively couple various computing devices described herein with each other, and with other devices. Communication subsystem 1812 may include wired and/or wireless communication devices compatible with one or more different communication protocols. As non-limiting examples, the communication subsystem may be configured for communication via a wireless telephone network, or a wired or wireless local- or wide-area network. In some embodiments, the communication subsystem may allow computing system 1800 to send and/or receive messages to and/or from other devices via a network such as the Internet.

Various exemplary embodiments of the present control of variable-focus lenses in a mixed-reality device for presbyopes are now presented by way of illustration and not as an exhaustive list of all embodiments. An example includes a mixed-reality display system that is utilizable by a presbyopic user, comprising: a see-through optical combiner through which real-world objects are viewable by the user, the see-through optical combiner being adapted to display virtual-world images that are superimposed over the real-world objects over an eyebox of the display system, the see-through optical combiner having an eye-side and a real-world side; a first variable-focus lens disposed on the eye-side of the see-through optical combiner; a second variable-focus lens disposed on the real-world side of the see-through optical combiner; and an optical power controller operatively coupled to the first and second variable-focus lenses, in which the optical power controller controls a baseline configuration for each of the first and second variable-focus lenses, wherein the optical power controller is adapted to add positive optical power to the baseline configuration of the second variable-focus lens responsive to the presbyopic user accommodating to the predetermined distance or less than the predetermined distance.

In another example, the baseline configuration for the first variable-focus lens provides negative optical power over the eyebox to display the virtual-world images in a focal plane at a predetermined distance from the user, and the baseline configuration of the second variable-focus lens provides positive optical power to offset the negative power of the first variable-focus lens. In another example, the baseline configuration for the first variable-focus lens comprises negative optical power in a range between -0.20 and -3.0 diopters. In another example, the baseline configuration for the second variable-focus lens includes optical power comprising a positive conjugate of the negative optical power of the baseline configuration of the first variable-focus lens. In another example, each of the variable-focus lenses comprises technologies using one or more of liquid oil push/pull, liquid crystal, reflective MEMS (micro-electromechanical system), MEMS Fresnel structures, geometric phase holograms, meta-surface optical elements, deformable membranes, Alvarez lenses, or multi-order DOEs (diffractive optical elements). In another example, the mixed-reality display system is configured for use in a head-mounted display (HMD) device wearable by the presbyopic user.

A further example includes a head-mounted display (HMD) device wearable by a presbyopic user and configured for supporting a mixed-reality experience including viewing, by the presbyopic user, of holographic images from a virtual world that are combined with views of real-world objects in a physical world, comprising: a see-through display system through which the presbyopic user can view the real-world objects and on which the holographic images are displayed within a field of view (FOV) of the see-through display system; a negative lens disposed between the see-through display system and an eye of the presbyopic user, the negative lens acting over the FOV and configured to render the holographic images at a focal plane having a predetermined depth from the presbyopic user; a variable-focus positive lens disposed on an opposite side of the see-through display system from the negative lens, the variable-focus positive lens being controllably configured to cancel effects of the negative lens on the views of the real-world objects responsive to the presbyopic user being engaged in viewing beyond the predetermined depth, and the variable-focus positive lens being controllably configured with increased optical power to optically push real-world objects into sharp focus responsive to the presbyopic user being engaged in viewing within the predetermined depth.

In another example, the HMD device further comprises an optical power controller operatively coupled to the variable-focus positive lens. In another example, the HMD device further comprises an eye tracker operatively coupled to the optical power controller, the eye tracker tracking vergence of the presbyopic user’s eyes or tracking a gaze direction of at least one eye of the presbyopic user, in which the optical power controller controls the variable-focus positive lens responsively to operations of the eye tracker. In another example, the HMD device further comprises one or more illumination sources for producing glints for the eye tracker. In another example, the HMD device further comprises one or more sensors configured to capture glints from the illumination sources that are reflected from features of an eye of the user for eye tracking. In another example, the negative lens comprises a variable-focus lens that is operatively coupled to the optical power controller. In another example, the optical power controller is configured to control the negative lens to include a corrective lens prescription for an eye of the presbyopic user. In another example, the corrective lens prescription provides correction for myopia. In another example, the see-through display system comprises one or more waveguides that each include an input coupler and an output coupler, in which the input coupler is configured to in-couple one or more optical beams for the holographic images into the waveguide from a virtual image source and the output coupler is configured to out-couple the holographic image beams from the waveguide to an eye of the presbyopic user, in which holographic images associated with the out-coupled beams are rendered within the FOV of the display system. In another example, the input coupler and output coupler each comprise a diffractive optical element (DOE) and in which each of the one or more display system waveguides further comprise an intermediate DOE disposed on a light path between the input coupler and the output coupler, wherein the intermediate DOE provides exit pupil expansion of the display system in a first direction and the output coupler provides exit pupil expansion of the display system in a second direction. In another example, the predetermined depth is within arm’s length of the presbyopic user.

A further example includes a method for operating an electronic device that includes an eye tracker and a mixed-reality see-through optical display system for showing scenes comprising virtual images that are rendered over views of real-world objects, the method comprising: calibrating the electronic device for utilization by a presbyopic user; operating the mixed-reality see-through optical display system to support a near field and a far field, the near field being closer to the presbyopic user relative to the far field, and the mixed-reality see-through optical display system having an eye side and a real-world side; operating a conjugate pair of variable-focus lenses in matched configurations to provide for setting rendered virtual images within the near field without perturbing the views of the real-world objects in the far field; using the eye tracker to determine a depth of the presbyopic user’s gaze in the scene; and responsively to a depth determination by the eye tracker, operating the conjugate pair of variable-focus lenses in mismatched configurations to enable the presbyopic user to simultaneously accommodate rendered virtual images and real-world objects in the near field.

In another example, variable-focus lenses in the conjugate pair are located on opposite sides of the mixed-reality see-through optical display system, and in which the matched configurations comprise the conjugate pair of variable-focus lenses providing zero net optical power to the views of the real-world objects, and in which the mismatched configuration comprises optical power being added to the variable-focus lens disposed on the real-world side. In another example, the method further comprises adding optical power to the variable-focus lens on the eye side to incorporate a corrective prescription of the presbyopic user for distance vision.
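
To make the matched/mismatched lens operation concrete, the following is a minimal sketch (in Python) of how an optical power controller might set the conjugate pair from an eye-tracker depth estimate. The function names, the 0.7 m near-field threshold, and the diopter values are illustrative assumptions and are not taken from the disclosure.

    # Illustrative sketch of conjugate variable-focus lens control for a presbyopic user.
    # All names, thresholds, and diopter values are hypothetical.

    NEAR_FIELD_DEPTH_M = 0.7      # assumed "predetermined depth" (roughly arm's length)
    NEGATIVE_LENS_POWER_D = -2.0  # fixed negative lens setting the virtual image focal plane
    DISTANCE_RX_D = 0.0           # optional corrective prescription for distance viewing

    def update_lens_powers(gaze_depth_m):
        """Return (eye_side_power_d, world_side_power_d) in diopters.

        Matched configuration: the world-side positive lens cancels the eye-side
        negative lens, so views of real-world objects see zero net optical power.
        Mismatched configuration: extra positive power is added on the world side
        to optically push near real-world objects into sharp focus.
        """
        eye_side = NEGATIVE_LENS_POWER_D + DISTANCE_RX_D
        if gaze_depth_m > NEAR_FIELD_DEPTH_M:
            # Far-field viewing: cancel the negative lens (matched configuration).
            world_side = -NEGATIVE_LENS_POWER_D
        else:
            # Near-field viewing: add accommodation assistance (mismatched configuration).
            world_side = -NEGATIVE_LENS_POWER_D + 1.0 / max(gaze_depth_m, 0.1)
        return eye_side, world_side

    # Example: the user reads a label at roughly 40 cm.
    print(update_lens_powers(0.4))   # -> (-2.0, 4.5)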

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Microsoft Patent | Thickness-modulated conformal coatings on optical components https://patent.nweon.com/27375 Thu, 09 Mar 2023 13:27:35 +0000 https://patent.nweon.com/?p=27375 ...

Patent: Thickness-modulated conformal coatings on optical components

Patent PDF: Available to Nweon members

Publication Number: 20230072472

Publication Date: 2023-03-09

Assignee: Microsoft Technology Licensing

Abstract

A near-eye optical display system that may be utilized in mixed reality applications and devices includes a see-through waveguide on which diffractive optical elements (DOEs) are disposed that are configured for in-coupling, exit pupil expansion, and out-coupling. The optical display system includes a conformal coating that is thickness modulated over different areas of the display to enable tuning of the optical parameters such as refractive index and reflectivity to meet various design requirements. The conformal coating may also be utilized to enhance physical characteristics of the optical display system to thereby improve reliability and resist wear and damage from handling and exposure to environmental elements.

Claims

1.-14. (canceled)

15.A method for growing a thickness-modulated conformal coating on an optical component having diffractive optical elements (DOEs) with three-dimensional grating structures that are configured to transmit a field of view (FOV) in a near-eye display system, the method comprising: maintaining an array of coating heads in a thin-film reaction chamber in which the coating head array provides spatially diverse conformal coating coverage over the optical component when transported through the thin-film reaction chamber; configuring elements in the array of coating heads to emit a first precursor in a first zone of the thin-film reaction chamber; configuring elements in the array of coating heads to emit a second precursor in a second zone of the thin-film reaction chamber, wherein the first and second precursors are reactive; alternately transporting the optical component between the first and second zones to expose a surface of the optical component to the precursors multiple times, wherein each exposure results in some of the precursors being adsorbed on the surface whereby a thin film is formed on the surface; and tuning optical parameters of the DOEs in the optical component over different angles of the FOV by controlling growth of the conformal coating with modulated thickness over the optical component.

16.The method of claim 15 in which one or more members of the coating head array are configured to emit at least two different precursors.

17.The method of claim 15 in which one or more members in the coating head array are configured to emit an inert gas.

18.The method of claim 15 in which the thin-film reaction chamber uses one or more of thermal ALD (atomic layer deposition), spatial ALD, PEALD (plasma enhanced ALD), CVD (chemical vapor deposition), or pulsed plasma CVD.

19.The method of claim 15 in which the thin-film reaction chamber has a configuration including one of linear reactor or rotary reactor.

20.The method of claim 15 in which the array of coating heads comprises one of linear array, two-dimensional spatial array, or pixelated array.

21.The method of claim 15 further comprising controlling exposure of selected areas of the surface of the optical component to the first or second precursor.

22.The method of claim 21 in which the controlled exposure comprises one of a masking, preventative, or subtractive process.

23.The method of claim 22 in which the preventative process comprises poisoning and the subtractive process comprises etching.

24.The method of claim 15 in which the optical component comprises an in-coupling DOE and an out-coupling DOE and wherein the thickness-modulated conformal coating coats the optical component from at least a portion of the in-coupling DOE to at least a portion of the out-coupling DOE, wherein the coating is thicker on the in-coupling DOE relative to the coating on the out-coupling DOE.

25.The method of claim 15 in which the thickness-modulated conformal coating comprises one or more of zinc sulfide, zinc oxide, tantalum oxide, hafnium oxide, zirconium oxide, titanium oxide, aluminum oxide, magnesium fluoride, silicon nitride or silicon oxide.

26.A method for manufacturing polymeric optical components each having diffractive optical elements (DOEs) with three-dimensional grating structures that are configured to transmit a field of view (FOV) in a near-eye display system, the DOEs including an in-coupling DOE and an out-coupling DOE, the method comprising: providing a master grating from which the polymeric optical components are replicated; maintaining an array of coating heads in a thin-film reaction chamber in which the coating head array provides spatially diverse conformal coating coverage over the master grating when transported through the thin-film reaction chamber; configuring elements in the array of coating heads to emit a first precursor in a first zone of the thin-film reaction chamber; configuring elements in the array of coating heads to emit a second precursor in a second zone of the thin-film reaction chamber, wherein the first and second precursors are reactive; alternately transporting the master grating between the first and second zones to expose a surface of the optical component to the precursors multiple times, wherein each exposure results in some of the precursors being adsorbed on the surface whereby a thin film is formed on the surface as a conformal coating; and selectively exposing the master grating to the first or second precursors to control a thickness of the conformal coating to adjust dimensions of the replicated polymeric optical components.

27.The method of claim 26 in which the master grating comprises a quartz crystalline structure.

28.The method of claim 26 further comprising subjecting the master grating to a subtractive process.

29.The method of claim 26 in which the adjusting of dimensions of the replicated polymeric optical components enables grating parameters for the DOEs to be tuned over different angles of the FOV.

30.The method of claim 26 further comprising replicating the polymeric optical components from the master grating.

31.A method for manufacturing a polymeric optical component, the method comprising: providing the polymeric optical component with diffractive optical elements (DOEs) having three-dimensional grating structures that are configured to transmit a field of view (FOV) in a near-eye display system, the DOEs including an in-coupling DOE and an out-coupling DOE disposed on the optical component; maintaining an array of coating heads in a thin-film reaction chamber in which the coating head array provides spatially diverse conformal coating coverage over the optical component when transported through the thin-film reaction chamber; configuring elements in the array of coating heads to emit a first precursor in a first zone of the thin-film reaction chamber; configuring elements in the array of coating heads to emit a second precursor in a second zone of the thin-film reaction chamber, wherein the first and second precursors are reactive; alternately transporting the optical component between the first and second zones to expose the DOEs of the optical component to the precursors multiple times, wherein each exposure results in some of the precursors being adsorbed on the DOEs to form a thin film as a conformal coating; and controlling the exposure to modulate a thickness of the conformal coating, wherein the conformal coating thickness decreases with increasing distance from the in-coupling DOE.

32.The method of claim 31 in which the modulation of thickness is smooth without sharp thickness transitions.

33.The method of claim 31 further including forming multiple thickness-modulated conformal coatings on the DOEs as a stack, in which a refractive index of adjacent thickness-modulated conformal coatings in the stack is different.

34.The method of claim 31 further including providing a thickness-modulated conformal coating on the in-coupling DOE that is higher relative to the thickness-modulated conformal coating on the out-coupling DOE.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of and claims priority to U.S. patent application Ser. No. 16/451,866, filed on Jun. 25, 2019 entitled “Thickness-Modulated Conformal Coatings on Optical Components,” the disclosure of which is incorporated herein by reference in its entirety.

BACKGROUND

Mixed reality computing devices, such as head mounted display (HMD) devices and systems and handheld mobile devices (e.g. smart phones, tablet computers, etc.), may be configured to display information to a user about virtual and/or real objects in a field of view of the user and/or a field of view of a camera of the device. For example, an HMD device may be configured to display, using a see-through display system, virtual environments with real-world objects mixed in, or real-world environments with virtual objects mixed in. Similarly, a mobile device may display such information using a camera viewfinder window.

SUMMARY

A near-eye optical display system that may be utilized in mixed reality applications and devices includes a see-through waveguide on which diffractive optical elements (DOEs) are disposed that are configured for in-coupling, exit pupil expansion, and out-coupling. The optical display system includes a conformal coating that is thickness-modulated over different areas of the display to enable tuning of the optical parameters such as refractive index and reflectivity to meet various design requirements. The conformal coating may comprise layers of different materials in a thickness-modulated thin film stack that may be utilized to enhance physical characteristics of the optical display system to thereby improve reliability and resist wear and damage from handling and exposure to environmental elements.

In various illustrative embodiments, the conformal coating on the optical display system may be thickness-modulated in a single direction or in multiple directions. For example, a relatively thick high refractive index conformal coating may be disposed on the in-coupling DOE and the thickness of the conformal coating over the rest of the display can gradually diminish towards the out-coupling DOE. The smooth transition in thickness advantageously avoids degradation of the MTF (modulation transfer function) in the optical display system that might otherwise occur with sharp transitions. The relatively thinner conformal coating on the out-coupling DOE reduces undesirable reflections in the area of the optical display through which a user looks to see the real-world environment. Such anti-reflection properties of the conformal coating on the grating side of the optical display system can further optimize the see-through characteristics of the system. Light is either transmitted through the display or is coupled out of the display by the out-coupling DOE so that unwanted reflected light in the system is minimized.

The thickness-modulated conformal coating may be implemented using spatial ALD (atomic layer deposition) which may be enhanced with plasma in a technique known as plasma enhanced ALD (PEALD). Other CVD (chemical vapor deposition) processes such as pulsed plasma CVD and traditional thermal ALD may also be utilized in some implementations. In addition to additive processes using ALD or CVD, the parameters of the optical display system can be further refined using subtractive processes such as etching. In applications where the DOEs are replicated with polymeric materials, thickness-modulated conformal coatings may be applied to a hard master grating, and optionally refined using subtractive processes. This advantageously enables multiple types of DOEs with varying characteristics to be replicated by changing the thickness-modulated conformal coating on a single hard master.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.

DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an illustrative mixed reality environment, a portion of which is rendered within the field of view of a user of a head-mounted display (HMD) device;

FIG. 2 shows a block diagram of an illustrative see-through near-eye display system that supports a mixed reality environment;

FIG. 3 shows propagation of light in a waveguide by total internal reflection;

FIG. 4 shows a view of an illustrative exit pupil expander;

FIG. 5 shows a view of the illustrative exit pupil expander in which the exit pupil is expanded along two directions;

FIG. 6 shows a pictorial front view of a sealed visor that may be used as a component of a head mounted display (HMD) device;

FIG. 7 shows a partially disassembled view of the sealed visor;

FIG. 8 shows an illustrative arrangement of three DOEs using surface relief gratings (SRG) configured for in-coupling, exit pupil expansion, and out-coupling;

FIG. 9 shows an illustrative stack of three waveguides with integrated DOEs in which each waveguide handles a different color in an RGB (red, green, blue) color space;

FIG. 10 shows a profile of a portion of an illustrative diffraction grating that has straight grating features;

FIG. 11 shows an asymmetric profile of a portion of an illustrative diffraction grating that has asymmetric or slanted grating features;

FIG. 12 shows an illustrative conformal coating having substantially uniform thickness that is disposed over straight diffraction grating features;

FIGS. 13 and 14 show an illustrative distribution of a thickness-modulated conformal coating on an optical component;

FIG. 15 shows a cutaway side view of an illustrative thickness-modulated conformal coating reactor in which coating heads or nozzles are arranged to operate in a linear manner;

FIG. 16 shows a cutaway top view of an illustrative thickness-modulated conformal coating reactor;

FIG. 17 shows cutaway top and side views of an illustrative thickness-modulated conformal coating reactor that operates in a rotary manner;

FIG. 18 shows an illustrative sequence of processing cycles in a rotary thickness-modulated conformal coating reactor in which a variably operable opening is configured to expose certain areas of the SRG optical component to a precursor or plasma;

FIG. 19 shows an illustrative pixelated plasma or gas source;

FIG. 20 shows an illustrative mechanical mask that is applied to an SRG optical component;

FIG. 21 shows cutaway top and side views of an illustrative rotary thickness-modulated conformal coating reactor in which an SRG optical component is positioned vertically against a wall of the reactor;

FIG. 22 shows an illustrative method that may be used to implement the present thickness-modulated conformal coatings on optical components;

FIG. 23 is a pictorial view of an illustrative example of a virtual reality or mixed reality HMD device;

FIG. 24 shows a block diagram of an illustrative example of a virtual reality or mixed reality HMD device; and

FIG. 25 shows a block diagram of an illustrative electronic device that incorporates a mixed reality display system.

Like reference numerals indicate like elements in the drawings. Elements are not drawn to scale unless otherwise indicated.

DETAILED DESCRIPTION

FIG. 1 shows an illustrative mixed reality environment 100, a portion of which is rendered within the field of view of a user of a head-mounted display (HMD) device 110. A mixed reality environment typically combines real-world elements and computer-generated virtual elements to enable a variety of user experiences. In the illustrative example shown in FIG. 1, a user 105 can employ the HMD device to experience a mixed reality environment 100 that is rendered visually on an optics display and may include audio and/or tactile/haptic sensations in some implementations. In this particular non-limiting example, the HMD device user is physically walking in a real-world urban area that includes city streets with various buildings, stores, etc. The field of view (FOV) of the cityscape supported on the HMD device, represented by the area 112 in FIG. 1, changes as the user moves through the environment, and the device can render virtual images over the real-world view. Here, the virtual images illustratively include a tag 115 that identifies a restaurant business and directions 120 to a place of interest in the city.

FIG. 2 shows a block diagram of an illustrative near-eye display system 200 that can be utilized to support a mixed reality environment and may include an imager 205 that works with an optical system 210 to deliver images as a virtual display to a user’s eye 215. The imager 205 may include, for example, RGB (red, green, blue) light emitting diodes (LEDs), LCOS (liquid crystal on silicon) devices, OLED (organic light emitting diode) arrays, MEMS (micro-electro mechanical system) devices, or any other suitable displays or micro-displays operating in transmission, reflection, or emission. The imager 205 may also include mirrors and other components that enable a virtual display to be composed and provide one or more input optical beams to the optical system. The optical system 210 can typically include pupil expanding optics 220, pupil forming optics 225, and one or more waveguides 230 that are collectively referred to here as an exit pupil expander (EPE) 235.

In a near-eye display system the imager does not actually shine the images on a surface such as a glass lens to create the visual display for the user. This is not feasible because the human eye cannot focus on something that is that close. Rather than create a visible image on a surface, the near-eye optical system 200 uses the pupil forming optics 225 to form a pupil and the eye 215 acts as the last element in the optical chain and converts the light from the pupil into an image on the eye’s retina as a virtual display.

The waveguide 230 facilitates light transmission between the imager and the eye. One or more waveguides can be utilized in the near-eye display system 200 because they are transparent and because they are generally small and lightweight (which are desirable in applications such as HMD devices where size and weight are generally sought to be minimized for reasons of performance and user comfort). For example, the waveguide 230 can enable the imager 205 to be located out of the way, for example, on the side of the head, leaving only a relatively small, light, and transparent waveguide optical element in front of the eyes. In one implementation, the waveguide 230 operates using a principle of total internal reflection, as shown in FIG. 3, so that light can be coupled among the various optical elements in the system 200.

FIG. 4 shows a view of an illustrative exit pupil expander (EPE) 235. EPE 235 receives an input optical beam from the imager 205 through pupil expanding optics 220 to produce one or more output optical beams with expanded exit pupil in one or two directions relative to the exit pupil of the imager (in general, the input may include more than one optical beam which may be produced by separate sources). The expanded exit pupil typically facilitates a virtual display to be sufficiently sized to meet the various design requirements of a given optical system, such as image resolution, field of view, and the like, while enabling the imager and associated components to be relatively light and compact.

The EPE 235 is configured, in this illustrative example, to support binocular operation for both the left and right eyes. Components that may be utilized for stereoscopic operation such as scanning mirrors, lenses, filters, beam splitters, MEMS devices, or the like are not shown in FIG. 4 for sake of clarity in exposition. The EPE 235 utilizes two out-coupling gratings, 410L and 410R that are supported on a waveguide 430 and a central in-coupling grating 440. In alternative embodiments, multiple in-coupling gratings may be utilized. The in-coupling and out-coupling gratings may be configured using multiple DOEs, as described in the illustrative example described below and shown in FIG. 8. While the EPE 235 is depicted as having a planar configuration, other shapes may also be utilized including, for example, curved or partially spherical shapes, in which case, the gratings disposed thereon are non-co-planar.

As shown in FIG. 5, the EPE 235 may be configured to provide an expanded exit pupil in two directions (i.e., along each of a first and second coordinate axis). As shown, the exit pupil is expanded in both the vertical and horizontal directions. It may be understood that the terms “direction,” “horizontal,” and “vertical” are used primarily to establish relative orientations in the illustrative examples shown and described herein for ease of description. These terms may be intuitive for a usage scenario in which the user of the near-eye display device is upright and forward facing, but less intuitive for other usage scenarios. The listed terms are not to be construed to limit the scope of the configurations (and usage scenarios therein) of the present mixed reality display system using optical components with thickness-modulated conformal coatings.

FIG. 6 shows an illustrative example of a visor 600 that incorporates an internal near-eye optical display system that is used in an exemplary HMD device 605 worn by a user 105. The visor 600, in this example, is sealed to protect the internal near-eye optical display system. The visor 600 typically interfaces with other components of the HMD device 605 such as head mounting/retention systems and other subsystems including sensors, power management, controllers, etc., as illustratively described in conjunction with FIGS. 23 and 24. Suitable interface elements (not shown) including snaps, bosses, screws and other fasteners, etc. may also be incorporated into the visor 600.

The visor 600 includes see-through front and rear shields, 610 and 615 respectively, that can be molded using transparent materials to facilitate unobstructed vision to the optical displays and the surrounding real-world environment. Treatments may be applied to the front and rear shields such as tinting, mirroring, anti-reflective, anti-fog, and other coatings, and various colors and finishes may also be utilized. The front and rear shields are affixed to a chassis 705 shown in the disassembled view in FIG. 7.

The sealed visor 600 can physically protect sensitive internal components, including an instance of a near-eye optical display system 710 (shown in FIG. 7), when the HMD device is used in operation and during normal handling for cleaning and the like. The near-eye optical display system 710 includes left and right waveguide displays 720 and 725 that respectively provide virtual world images to the user’s left and right eyes for mixed- and/or virtual-reality applications. The visor 600 can also protect the near-eye optical display system 710 from environmental elements and damage should the HMD device be dropped or bumped, impacted, etc.

As shown in FIG. 7, the rear shield 715 is configured in an ergonomically suitable form to interface with the user’s nose, and nose pads and/or other comfort features can be included (e.g., molded-in and/or added-on as discrete components). The sealed visor 600 can also incorporate some level of optical diopter curvature (i.e., eye prescription) within the molded shields in some cases.

FIG. 8 shows an illustrative arrangement 800 of three DOEs that may be used with, or as a part of, a diffractive waveguide to provide in-coupling, expansion of the exit pupil in two directions, and out-coupling in an EPE. In this particular illustrative example, DOEs are utilized for in-coupling and out-coupling, however in other implementations either or both the in-coupling and out-coupling may be performed using one or more of dichroic mirrors, polarization-selective coatings or materials, or prism structures that operate in refraction or reflection.

Each DOE is an optical element comprising a periodic structure that can modulate various properties of light in a periodic pattern such as the direction of optical axis, optical path length, and the like. The first DOE, DOE 1 (indicated by reference numeral 805), is configured to couple an imaging beam from an imager into the waveguide. The second DOE, DOE 2 (810), expands the exit pupil in a first direction along a first coordinate axis, and the third DOE, DOE 3 (815), expands the exit pupil in a second direction along a second coordinate axis and couples light out of the waveguide 820 (it is noted that the various directions of propagation in FIG. 8 are depicted in an arbitrary manner and that the directions are not necessarily orthogonal). A rotation angle, shown in the figure, is defined between the periodic lines of DOE 2 and DOE 3.

DOE 1 thus functions as an in-coupling grating and DOE 3 functions as an out-coupling grating while expanding the pupil in one direction. DOE 2 may be considered as an intermediate grating that functions to couple light between the in-coupling and out-coupling gratings while performing exit pupil expansion in another direction. Using such an intermediate grating may eliminate a need for conventional functionalities for exit pupil expansion in an EPE such as collimating lenses. Some near-eye display system applications, such as those using HMD devices for example, can benefit by minimization of weight and bulk. As a result, the DOEs and waveguides used in an EPE may be fabricated using lightweight polymers. Such polymeric components can support design goals for size, weight, and cost, and generally facilitate manufacturability, particularly in volume production settings. While DOE 2 is shown as a single grating in FIG. 8, multiple intermediate gratings may be used, or DOE 2 may include a plurality of discrete grating areas depending on requirements for a given implementation of thickness-modulated conformal coatings.

FIG. 9 shows an illustrative stack 900 of three waveguides with integrated DOEs in a waveguide display in which each waveguide 905, 910, and 915 handles a different color in the RGB (red, green, blue) color space. The color order within the stack can vary by implementation and other color spaces may also be used. Use of the waveguide stack enables virtual images to be guided to the eye 115 across a full-color spectrum. In alternative implementations, stacks with more or fewer waveguides can be utilized, for example, for monochromatic and reduced-color spectrum applications. A single plate may be used in some applications, while other applications can use other plate counts.

The three-dimensional microstructure forming the DOEs can be configured to provide particular targeted optical characteristics by manipulating a combination of grating parameters such as grating depth, line asymmetry, and fill ratio. Grating line asymmetry is described in more detail while making reference to FIGS. 10 and 11. FIG. 10 shows a profile of straight (i.e., non-slanted) grating features 1000 (referred to as grating bars, grating lines, or simply “gratings”) that are formed in a substrate 1005. By comparison, FIG. 11 shows grating features 1100 formed in a substrate 1105 that have an asymmetric profile. That is, the gratings may be slanted (i.e., non-orthogonal) relative to a plane of the waveguide. In implementations where the waveguide is non-planar, the gratings may be slanted relative to a direction of light propagation in the waveguide. Asymmetric grating profiles can also be implemented using blazed gratings, or echelette gratings, in which grooves are formed to create grating features with asymmetric triangular or sawtooth profiles. In FIGS. 10 and 11, the grating period is represented by d, the grating height by h (also referred to as grating “depth”), bar width by c, and the fill factor by f, where f=c/d. The aspect ratio is defined by h/c. The slanted gratings in FIG. 11 may be described by two slant angles, one on each side of the grating line.
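
As a worked example of the grating parameters defined above, the short sketch below computes the fill factor and aspect ratio for a hypothetical grating; the dimensions are assumptions chosen only for illustration.

    # Hypothetical SRG geometry; the values are illustrative only.
    d = 400e-9   # grating period (m)
    c = 200e-9   # grating bar width (m)
    h = 300e-9   # grating depth (m)

    fill_factor = c / d    # f = c/d = 0.5
    aspect_ratio = h / c   # = 1.5; deeper, narrower features are harder to coat conformally
    print(fill_factor, aspect_ratio)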

FIG. 12 shows an illustrative coating 1205 having substantially uniform thickness that is disposed over straight diffraction grating structures 1210. The coating is considered conformal as it reaches and covers all of the three-dimensional grating structures including the floors and walls of the trenches. In most thin-film deposition processes, conformal film growth is managed by manipulating exposure and purge times to increase saturation of the grating’s nonplanar surfaces (e.g., the bottoms of the trenches). Accordingly, conformal coatings for very high aspect ratio gratings can become impractical due to lengthy exposure times.

FIGS. 13 and 14 show an illustrative distribution of a thickness-modulated conformal coating on an SRG optical component 710 such as the DOE elements in the near-eye waveguide display used in the HMD device 605 (FIG. 6). In some implementations, the conformal coating thickness may be modulated along a single direction, for example, in either the X or Y direction shown (it is noted that the coordinate system is arbitrary and the axes may be oriented to match particular features of the optical elements in the display such as DOE shape, groove direction, and the like). In other implementations, the conformal coating thickness may be modulated in two directions, for example in both the X and Y directions.

In an exemplary implementation, a relatively thick, high-refractive-index conformal coating, such as zinc sulfide, zinc oxide, tantalum oxide, hafnium oxide, zirconium oxide, titanium oxide, aluminum oxide, silicon nitride, or mixtures thereof, is disposed on the in-coupling DOE 805. For example, increasing the refractive index at the in-coupling DOE can increase the field of view of the optical display system. The conformal coating thickness is modulated to gradually diminish towards the out-coupling DOE 815. The smooth transition advantageously avoids degradation of MTF in the SRG optical component 710. Thus, in both the one-dimensional and two-dimensional thickness modulation scenarios, conformal coating thickness decreases with increasing distance from the in-coupling DOE, as shown in graphs 1405 and 1410 in FIG. 14. The relatively thinner conformal coating on the out-coupling DOE 815 reduces undesirable reflections in the area of the optical display through which the user looks to see the real-world environment. In another embodiment, multiple thickness-controlled conformal coatings of either high or low refractive index could be combined to form a thickness-modulated optical thin film stack. Suitable lower-index materials include, for example, aluminum oxide, silicon oxide, or magnesium fluoride.
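
For the one-dimensional case, the sketch below illustrates one way a smoothly diminishing target thickness could be expressed as a function of distance from the in-coupling DOE. The cosine ramp, lengths, and thickness values are assumptions for illustration and are not design values from the disclosure.

    import math

    # Illustrative one-dimensional thickness-modulation profile (all values are assumptions).
    MAX_THICKNESS_NM = 120.0    # thicker, high-index coating over the in-coupling DOE
    MIN_THICKNESS_NM = 15.0     # thinner, anti-reflective coating over the out-coupling DOE
    DISPLAY_LENGTH_MM = 50.0    # assumed distance from in-coupling to out-coupling DOE

    def target_thickness_nm(x_mm):
        """Smoothly decreasing thickness versus distance x from the in-coupling DOE.

        A cosine ramp avoids the sharp thickness transitions that could degrade MTF.
        """
        t = min(max(x_mm / DISPLAY_LENGTH_MM, 0.0), 1.0)
        ramp = 0.5 * (1.0 + math.cos(math.pi * t))   # 1 at the in-coupler, 0 at the out-coupler
        return MIN_THICKNESS_NM + (MAX_THICKNESS_NM - MIN_THICKNESS_NM) * ramp

    for x in (0, 10, 25, 40, 50):
        print(x, round(target_thickness_nm(x), 1))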

The present thickness-modulated conformal coatings may be applied to the SRG optical component 710 using a thermal ALD process. In alternative implementations, the thickness-modulated conformal coating may be applied to a hard master, such as a quartz crystalline structure. Such coating application can thereby be utilized to adjust various physical characteristics to enable multiple different types and configurations of gratings to be cast from a single master. For example, a conformal coating may enable certain dimensions to be adjusted more readily than can be otherwise achieved using subtractive processes such as etching.

FIG. 15 shows a cutaway side view of an illustrative thickness-modulated conformal coating reactor 1500 in which coating heads or nozzles (collectively referred to by reference numeral 1505) are arranged to operate in a linear manner at some distance D from the SRG substrate. In a typical spatial ALD process, the reactor heats the components to a desired deposition temperature. Precursors 1510 and 1515 (labeled precursor 1 and precursor 2 in the drawing) and/or co-reactants (not shown) are delivered from manifolds 1518 and injected into the interior reactor volume 1520. An inert gas 1525 may be utilized to form a diffusion barrier between each ALD precursor inlet. The inert gas/precursor mixture is then pumped away around the precursor nozzles to prevent the precursors from reacting with each other in the gas phase. In this illustrative example, the ALD processes are carried out at atmospheric pressure. In alternative implementations, the reactor volume may be continuously pumped to achieve a certain pressure. Pressures are typically between 0.1 and 750 Torr, but pressures at the mTorr level may also be utilized. Typically, constant pumping of the reactor volume is needed even at atmospheric pressure, utilizing one or more pumping sources 1530, to form an inert gas diffusion barrier between the precursors.

As an alternative to traditional thermal ALD, in some implementations the thickness-modulated conformal coating reactor 1500 may be configured to utilize plasma-enhanced ALD (PEALD). For example, mechanical masking may be utilized in some scenarios to enable additional control over conformal coating thickness modulation. PEALD may offer better opportunities for mechanical masking as the lifetime of plasma radicals is limited. PEALD can also be advantageously utilized to lower costs at large-scale manufacturing volumes.

PEALD typically provides much sharper coating edges because the plasma radical lifetime limits the deposition distance and macroscopic cavities. Plasma processes also enable more control over the film stresses, which enables thicker film stacks on organic materials. However, temporal PEALD processes have been difficult to scale into large batch reactors. Spatial PEALD may be expected to enable easier control over the ALD growth as both the precursor exposure and plasma exposure can be more easily limited spatially. Also, the low temperatures, fast cycle times, and better stress control should enable deposition of thicker ALD layers cost-effectively.

The thickness-modulated conformal coating reactor 1500 is arranged to limit the precursor or plasma exposure to certain areas of the SRG optical component 710. As shown in FIG. 15, the first and second precursor heads 1505 each deliver a substantially continuous flow of precursor gas or plasma or pulses of precursor gas or plasma. In an ALD reaction, the first precursor delivered from the first head reacts with and chemically alters the coating surface of the SRG optical component 710 before being exposed to the second precursor exiting from the second head. The chemically altered coating surface then reacts with the second precursor to form a solid material layer or thin film on the SRG optical component 710. For example, the precursor gases or plasmas may include TMA (trimethyl aluminum) and carbon dioxide plasma for an aluminum oxide conformal coating, TTIP (titanium tetraisopropoxide) and carbon dioxide plasma for a titanium dioxide conformal coating, and DEZ (diethyl zinc) and H2S (hydrogen sulfide) for a zinc sulfide conformal coating.
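
The precursor pairings mentioned above can be summarized as a small lookup table; this is only a restatement of the examples in the text, with process conditions omitted.

    # Precursor pairings for the example coatings mentioned in the text
    # (restated as a lookup table; process conditions are omitted).
    PRECURSOR_PAIRS = {
        "aluminum oxide":   ("TMA (trimethyl aluminum)", "carbon dioxide plasma"),
        "titanium dioxide": ("TTIP (titanium tetraisopropoxide)", "carbon dioxide plasma"),
        "zinc sulfide":     ("DEZ (diethyl zinc)", "H2S (hydrogen sulfide)"),
    }

    def precursors_for(coating):
        return PRECURSOR_PAIRS[coating]

    print(precursors_for("zinc sulfide"))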

Using a transport mechanism (not shown), the SRG optical component 710 is moved within the reactor volume over a path (from left to right in the drawing) to thereby be exposed to a linear array of coating heads. As shown in the top view of the thickness-modulated conformal coating reactor 1500 in FIG. 16, the SRG optical component 710 is sequentially exposed to alternating precursors by the coating head array 1505. Individual coating heads in the array 1505 may be configured with different sizes and/or shapes. The coating heads may be fixed or variably configurable in some implementations. In this illustrative example, the coating heads for precursor 1 increase in width along the length of the reactor. The coating heads for precursor 2 have a fixed width that is approximately the width of the SRG optical component 710.

The sequence of increasing coating head width for precursor 1 provides for thickness modulation of the conformal coatings that are grown on the SRG optical component 710 because each sequential exposure increases the coating thickness. However, as some portions of the SRG optical component are exposed to fewer coating heads in the sequence compared to other portions, those portions will have a relatively thinner conformal coating thickness. This thickness modulation is shown on the ALD-processed SRG optical component 1605 where darker shading indicates relatively greater coating thickness. Multiple passes of the SRG optical component through the reactor 1500 may be utilized to grow the conformal coating to a target thickness, as needed.
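
A toy model helps picture how increasing precursor 1 head widths yield a graded coating: thickness at any position across the component scales with the number of precursor 1 heads that reach that position during a pass. The head widths, component width, and growth per cycle below are assumptions, not a process recipe.

    # Toy model of thickness modulation in a linear spatial-ALD reactor.
    # Head widths, component width, and growth per cycle are illustrative assumptions.

    GROWTH_PER_CYCLE_NM = 0.1    # ALD growth is typically on the order of 1 angstrom per cycle
    COMPONENT_WIDTH_MM = 50.0    # width of the SRG optical component across the transport direction
    HEAD_WIDTHS_MM = [5.0, 10.0, 20.0, 35.0, 50.0]   # precursor 1 heads of increasing width

    def thickness_profile(passes=1, n_points=11):
        """Thickness across the component after the given number of passes.

        A point completes an ALD cycle only where a precursor 1 head reaches it;
        the full-width precursor 2 heads cover the whole component.
        """
        profile = []
        for i in range(n_points):
            y = COMPONENT_WIDTH_MM * i / (n_points - 1)    # position across the component
            cycles = sum(1 for w in HEAD_WIDTHS_MM if y <= w)
            profile.append(passes * cycles * GROWTH_PER_CYCLE_NM)
        return profile

    print(thickness_profile())   # thicker where every head reaches, thinner elsewhere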

FIG. 17 shows cutaway top and side views of an illustrative thickness-modulated conformal coating reactor 1700 that operates in a rotary manner whereby a substrate carrier 1705 rotates about its axis within an ALD reaction chamber that is divided into several sub-chambers or processing zones 1710, 1715, 1720, and 1725. In some configurations, the processing zones can be physically separated using one or more mechanical barriers 1730, while in other configurations the processing zones are more simply implemented within a single interior volume. Physically isolated processing zones may be utilized, for example, when differential environments are utilized between processing steps, such as different temperatures and/or pressures. In addition, physical partitions may help inhibit precursors in one zone from leaking into others.

The SRG optical component 710 is transported along a circular path (as indicated by reference numeral 1735) sequentially through first and second precursor zones 1710 and 1720. Purge zones 1715 and 1725 may be interleaved between the precursor zones in some implementations. The processing zones can vary in size and shape to meet the needs of a particular implementation. As with the illustrative reactor example shown in FIGS. 15 and 16 and described in the accompanying text, PEALD or other CVD processes may be utilized to grow the conformal coating.

As shown in a cutaway top view in FIG. 18, the SRG optical component 710 may be cyclically processed in the reactor 1700 whereby each 360° rotation of the substrate carrier 1705 exposes the component to one ALD growth cycle comprising precursor 1 followed by precursor 2. As discussed above, reaction between the precursors provides for growth of the conformal coating.

A variable plasma/gas opening 1750 disposed in the reactor 1700 is configured to preferentially expose particular areas of the SRG optical component 710 to precursor 2 in processing zone 1720. By changing the size of the variable opening at each rotational cycle, thickness-modulated conformal growth can be achieved on the component.

FIG. 19 shows an illustrative pixelated plasma or gas source 1900, which may alternatively be utilized in either or both of the linear or rotary reactors 1500 and 1700 described above. The pixelated source 1900 includes a two-dimensional array of coating heads where each element in the array provides a discrete spatial location for the discharge of an appropriate gas or plasma. For purposes of illustration, each array element is simplified; however, it may be understood that multiple different coating heads may be provided at each array location. Thus, for example, different precursors and/or inert materials may be discharged in a desired two-dimensional spatial pattern.

The pixelated source 1900 may be operated in a cyclical manner to provide spatially thickness-modulated conformal coatings on the SRG optical component 710 (FIG. 7) along two directions, X and Y, as shown. It is noted that the dimensions of the pixelated array can be varied to meet the needs of a particular implementation.

Thickness-modulated conformal coatings may also be implemented, as a standalone technique or in combination with other techniques (e.g. variably sized coating heads and variable openings) using mechanical masks. For example, as shown in FIG. 20, a mechanical mask 2005 is operably coupled to a motor 2010 via a linkage 2015 to enable select portions of the SRG optical component 710 to be preferentially masked during one or more processing cycles in an ALD reactor. In some implementations, the mask can be variably configurable (e.g., be expanded or contracted, change size/shape, etc.) to enable additional processing flexibility. Alternatively, multiple different variably configurable or statically configured masks may be utilized.

In addition to the use of mechanical masks, various areas 2020 of the SRG optical component 710 can be subjected to surface treatments known as poisoning that prevent film growth on the treated areas. Poisoning could also be done using the pixelated gas/plasma source. Other subtractive and preventative processes to control conformal coating thickness may also be utilized in some implementations, including, for example, etching.

FIG. 21 shows cutaway top and side views of an illustrative rotary thickness-modulated conformal coating reactor 2100 in which an SRG optical component 710 is positioned vertically against an interior wall of the reactor. Sets of vertically oriented coating head/nozzle arrays (as representatively indicated by reference numeral 2120) are positioned in the reactor at various different processing zones to enable preferential spatial exposure of the component to precursors in an ALD process. As the substrate carrier 2105 rotates about its axis, the SRG optical component traverses a circular path 2125 in the reactor. Utilization of multiple rotational cycles can thereby enable thickness-modulated conformal coating growth on the component. In an alternative embodiment, the coating head/nozzle array 2120 may rotate while the substrates remain stationary. In another alternative embodiment, the substrates may be mounted on the outer wall of a rotating cylinder and nozzles 2125 may be placed against the reactor wall.

FIG. 22 is a flowchart of an illustrative method 2200 for growing a thickness-modulated conformal coating on an optical component having diffractive optical elements with three-dimensional grating structures that are configured for use in a near-eye display system. Unless specifically stated, the methods or steps shown in the flowchart and described in the accompanying text are not constrained to a particular order or sequence. In addition, some of the methods or steps thereof can occur or be performed concurrently, and not all of the methods or steps have to be performed in a given implementation, depending on the requirements of such implementation; some methods or steps may be optionally utilized.

In step 2205, an array of coating heads is maintained in a thin-film reaction chamber in which the coating head array provides spatially diverse conformal coating coverage over the optical component when transported through the thin-film reaction chamber. In step 2210, elements in the array of coating heads are configured to emit a first precursor in a first zone of the thin-film reaction chamber. In step 2215, elements in the array of coating heads are configured to emit a second precursor in a second zone of the thin-film reaction chamber, wherein the first and second precursors are reactive. In step 2220, the optical component is alternately transported between the first and second zones to expose a surface of the optical component to the precursors multiple times, wherein each exposure results in some of the precursors being adsorbed on the surface whereby a thin film is formed on the surface, wherein the elements in the array of coating heads are configured to grow the conformal coating with modulated thickness over the optical component.

FIG. 23 shows one particular illustrative example of a see-through, mixed reality or virtual reality display system 2300, and FIG. 24 shows a functional block diagram of the system 2300. Display system 2300 comprises one or more lenses 2302 that form a part of a see-through display subsystem 2304, such that images may be displayed using lenses 2302 (e.g. using projection onto lenses 2302, one or more waveguide systems incorporated into the lenses 2302, and/or in any other suitable manner). Display system 2300 further comprises one or more outward-facing image sensors 2306 configured to acquire images of a background scene and/or physical environment being viewed by a user, and may include one or more microphones 2308 configured to detect sounds, such as voice commands from a user. Outward-facing image sensors 2306 may include one or more depth sensors and/or one or more two-dimensional image sensors. In alternative arrangements, as noted above, a mixed reality or virtual reality display system, instead of incorporating a see-through display subsystem, may display mixed reality or virtual reality images through a viewfinder mode for an outward-facing image sensor.

The display system 2300 may further include a gaze detection subsystem 2310 configured for detecting a direction of gaze of each eye of a user or a direction or location of focus, as described above. Gaze detection subsystem 2310 may be configured to determine gaze directions of each of a user’s eyes in any suitable manner. For example, in the illustrative example shown, a gaze detection subsystem 2310 includes one or more glint sources 2312, such as infrared light sources, that are configured to cause a glint of light to reflect from each eyeball of a user, and one or more image sensors 2314, such as inward-facing sensors, that are configured to capture an image of each eyeball of the user. Changes in the glints from the user’s eyeballs and/or a location of a user’s pupil, as determined from image data gathered using the image sensor(s) 2314, may be used to determine a direction of gaze.

In addition, a location at which gaze lines projected from the user’s eyes intersect the external display may be used to determine an object at which the user is gazing (e.g. a displayed virtual object and/or real background object). Gaze detection subsystem 2310 may have any suitable number and arrangement of light sources and image sensors. In some implementations, the gaze detection subsystem 2310 may be omitted.
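
As a rough illustration of how glint and pupil locations can yield a gaze direction, the sketch below fits a simple affine map from pupil-to-glint offset vectors to gaze angles using a per-user calibration. The calibration data and the affine model are hypothetical simplifications; production eye trackers typically use more elaborate geometric models.

    import numpy as np

    # Minimal pupil-center / corneal-reflection style gaze sketch.
    # The affine mapping and the calibration data are hypothetical illustrations.

    def fit_gaze_mapping(pupil_glint_vectors, gaze_angles):
        """Least-squares fit of a 2-D affine map from pupil-glint vectors (pixels) to gaze angles (degrees)."""
        ones = np.ones((pupil_glint_vectors.shape[0], 1))
        A = np.hstack([pupil_glint_vectors, ones])      # rows of [dx, dy, 1]
        coeffs, *_ = np.linalg.lstsq(A, gaze_angles, rcond=None)
        return coeffs                                   # shape (3, 2)

    def estimate_gaze(coeffs, pupil_px, glint_px):
        d = pupil_px - glint_px
        return np.array([d[0], d[1], 1.0]) @ coeffs     # (yaw_deg, pitch_deg)

    # Hypothetical calibration: the user fixates known targets while vectors are recorded.
    vectors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 8.0], [10.0, 8.0]])
    angles = np.array([[0.0, 0.0], [15.0, 0.0], [0.0, 12.0], [15.0, 12.0]])
    mapping = fit_gaze_mapping(vectors, angles)
    print(estimate_gaze(mapping, np.array([105.0, 84.0]), np.array([100.0, 80.0])))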

The display system 2300 may also include additional sensors. For example, display system 2300 may comprise a global positioning system (GPS) subsystem 2316 to allow a location of the display system 2300 to be determined. This may help to identify real-world objects, such as buildings, etc. that may be located in the user’s adjoining physical environment.

The display system 2300 may further include one or more motion sensors 2318 (e.g., inertial, multi-axis gyroscopic, or acceleration sensors) to detect movement and position/orientation/pose of a user’s head when the user is wearing the system as part of a mixed reality or virtual reality HMD device. Motion data may be used, potentially along with eye-tracking glint data and outward-facing image data, for gaze detection, as well as for image stabilization to help correct for blur in images from the outward-facing image sensor(s) 2306. The use of motion data may allow changes in gaze location to be tracked even if image data from outward-facing image sensor(s) 2306 cannot be resolved.

In addition, motion sensors 2318, as well as microphone(s) 2308 and gaze detection subsystem 2310, also may be employed as user input devices, such that a user may interact with the display system 2300 via gestures of the eye, neck and/or head, as well as via verbal commands in some cases. It may be understood that sensors illustrated in FIGS. 23 and 24 and described in the accompanying text are included for the purpose of example and are not intended to be limiting in any manner, as any other suitable sensors and/or combination of sensors may be utilized to meet the needs of a particular implementation. For example, biometric sensors (e.g., for detecting heart and respiration rates, blood pressure, brain activity, body temperature, etc.) or environmental sensors (e.g., for detecting temperature, humidity, elevation, UV (ultraviolet) light levels, etc.) may be utilized in some implementations.

The display system 2300 can further include a controller 2320 having a logic subsystem 2322 and a data storage subsystem 2324 in communication with the sensors, gaze detection subsystem 2310, display subsystem 2304, and/or other components through a communications subsystem 2326. The communications subsystem 2326 can also facilitate the display system being operated in conjunction with remotely located resources, such as processing, storage, power, data, and services. That is, in some implementations, an HMD device can be operated as part of a system that can distribute resources and capabilities among different components and subsystems.

The storage subsystem 2324 may include instructions stored thereon that are executable by logic subsystem 2322, for example, to receive and interpret inputs from the sensors, to identify location and movements of a user, to identify real objects using surface reconstruction and other techniques, and dim/fade the display based on distance to objects so as to enable the objects to be seen by the user, among other tasks.

The display system 2300 is configured with one or more audio transducers 2328 (e.g., speakers, earphones, etc.) so that audio can be utilized as part of a mixed reality or virtual reality experience. A power management subsystem 2330 may include one or more batteries 2332 and/or protection circuit modules (PCMs) and an associated charger interface 2334 and/or remote power interface for supplying power to components in the display system 2300.

It may be appreciated that the display system 2300 is described for the purpose of example, and thus is not meant to be limiting. It may be further understood that the display device may include additional and/or alternative sensors, cameras, microphones, input devices, output devices, etc. than those shown without departing from the scope of the present arrangement. Additionally, the physical configuration of a display device and its various sensors and subcomponents may take a variety of different forms without departing from the scope of the present arrangement.

As shown in FIG. 25, a mixed reality display system using optical components with thickness-modulated conformal coatings can be used in a mobile or portable electronic device 2500, such as a mobile phone, smartphone, personal digital assistant (PDA), communicator, portable Internet appliance, hand-held computer, digital video or still camera, wearable computer, computer game device, specialized bring-to-the-eye product for viewing, or other portable electronic device. As shown, the portable device 2500 includes a housing 2505 to house a communication module 2510 for receiving and transmitting information from and to an external device, or a remote system or service (not shown).

The portable device 2500 may also include an image processing module 2515 for handling the received and transmitted information, and a virtual display system 2520 to support viewing of images. The virtual display system can include a micro-display or an imager 2525 and an optical display system 2530 that may use thickness-modulated conformal coatings on various optical components therein. The image processing module 2515 may be operatively connected to the optical display system to provide image data, such as video data, to the imager to display an image thereon. An EPE 2535 can be optically linked to an optical display system.

A mixed reality display system using optical components with thickness-modulated conformal coatings may also be utilized in non-portable devices, such as gaming devices, multimedia consoles, personal computers, vending machines, smart appliances, Internet-connected devices, and home appliances, such as an oven, microwave oven and other appliances, and other non-portable devices.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Microsoft Patent | Ambient light based mixed reality object rendering https://patent.nweon.com/27367 Thu, 09 Mar 2023 13:18:40 +0000 https://patent.nweon.com/?p=27367 ...

Patent: Ambient light based mixed reality object rendering

Patent PDF: Available to Nweon members

Publication Number: 20230072701

Publication Date: 2023-03-09

Assignee: Microsoft Technology Licensing

Abstract

Implementations of the subject matter described herein relate to mixed reality object rendering based on ambient light conditions. According to the embodiments of the subject matter described herein, while rendering an object a wearable computing device acquires light conditions of the real world, thereby increasing the realism of the rendered object. In particular, the wearable computing device is configured to acquire an image of an environment where the wearable computing device is located. The image is adjusted based on a camera parameter used when the image is captured. Subsequently, ambient light information is determined based on the adjusted image. In this way, the wearable computing device can obtain more realistic and accurate ambient light information, so as to render to the user an object with enhanced realism. Accordingly, the user can have a better interaction experience.
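
The claims that follow spell out the mechanism: pixel values are adjusted by the capture parameters (exposure time, light sensitivity, gamma correction) so that frames taken with different settings become comparable estimates of ambient light intensity. Below is a minimal sketch assuming a simple gamma and exposure model; the function name, constants, and the omission of the claimed opacity adjustment are simplifications, not the patent's implementation.

    import numpy as np

    def normalize_to_relative_radiance(img_srgb, exposure_time_s, iso, gamma=2.2):
        """Convert an 8-bit camera frame into relative scene radiance.

        Undo the gamma correction, then divide out exposure time and sensitivity so
        that frames captured with different settings can be merged into a single
        ambient-light estimate. (Simplified model; a real pipeline would also handle
        white balance, vignetting, and the claimed opacity adjustment.)
        """
        linear = (img_srgb.astype(np.float32) / 255.0) ** gamma
        return linear / (exposure_time_s * (iso / 100.0))

    # Hypothetical frames of the same scene captured with different camera settings.
    frame_a = np.full((4, 4, 3), 128, dtype=np.uint8)
    frame_b = np.full((4, 4, 3), 200, dtype=np.uint8)
    rad_a = normalize_to_relative_radiance(frame_a, exposure_time_s=1 / 60, iso=400)
    rad_b = normalize_to_relative_radiance(frame_b, exposure_time_s=1 / 30, iso=400)
    print(float(rad_a.mean()), float(rad_b.mean()))   # now on a common relative-radiance scale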

Claims

1.-15. (canceled)

16.A wearable computing device, comprising: a processing unit; and a memory coupled to the processing unit and storing instructions which, when executed by the processing unit, perform operations comprising: receiving image data of an image of an environment in which the wearable computing device is situated; adjusting pixel color values and pixel opacity of the image based on camera parameters used by a camera in capturing the image resulting in an adjusted image, the camera parameters including one or more of an exposure time, light sensitivity, and a gamma correction parameter; determining, based on the adjusted image, ambient light information that indicates light intensities in the environment; and adjusting specular or reflection of an object in a subsequent image of the environment based on the ambient light information.

17.The device according to claim 16, wherein the camera is onboard the wearable computing device, wherein the image data is received from the camera and wherein capturing the image of the environment comprises: determining, based on a parameter indicating a field of view range of the camera, a plurality of shooting directions required for covering the environment; and causing the camera to capture a plurality of images according to the plurality of shooting directions.

18.The device according to claim 16, wherein the determining ambient light information of the wearable computing device comprises: generating a panorama image of the environment based on the adjusted image; and mapping the panorama image to a cube map indicating the ambient light information.

19.The device according to claim 16, wherein the operations further comprise: determining whether there is original ambient light information for the environment where the wearable computing device is located; and in response to determining that there is the original ambient light information, updating the original ambient light information by using the determined ambient light information.

20.The device according to claim 19, wherein the updating the original ambient light information by using the determined ambient light information comprises: dividing the original ambient light information into a first plurality of portions; dividing the determined ambient light information into a second plurality of portions; and in response to a difference between one of the first plurality of portions and a corresponding one of the second plurality of portions exceeding a threshold difference, modifying the original ambient light information with the determined ambient light information.

21.The device according to claim 16, wherein the operations further comprise: rendering an object to a user of the wearable computing device based on the ambient light information.

22.The device according to claim 21, wherein the rendering an object to a user of the wearable computing device based on the ambient light information comprises: generating an initial light map associated with the object based on the ambient light information; down-sampling the initial light map to generate a set of down-sampled light maps having different resolutions; and rendering the object based on the set of down-sampled light maps.

23.A method implemented by a wearable computing device, comprising: receiving image data of an image of an environment in which the wearable computing device is situated; adjusting pixel color values and pixel opacity of the image based on camera parameters used by a camera in capturing the image resulting in an adjusted image, the camera parameters including one or more of an exposure time, light sensitivity, and a gamma correction parameter; determining, based on the adjusted image, ambient light information that indicates light intensities; and adjusting specular or reflection of an object in a subsequent image of the environment based on the ambient light information.

24.The method according to claim 23, wherein the camera is onboard the wearable computing device and wherein the capturing the image of the environment where the wearable computing device is located comprises: determining, based on a parameter indicating a field of view range of the camera, a plurality of shooting directions required for covering the environment; and causing the camera to capture a plurality of images according to the plurality of shooting directions.

25.The method according to claim 23, wherein the determining ambient light information of the wearable computing device comprises: generating a panorama image of the environment based on the adjusted image; and mapping the panorama image to a cube map indicating the ambient light information.

26.The method according to claim 23, further comprising: determining whether there is original ambient light information for the environment where the wearable computing device is located; and in response to determining that there is the original ambient light information, updating the original ambient light information by using the determined ambient light information.

27.The method according to claim 26, wherein the updating the original ambient light information by using the determined ambient light information comprises: dividing the original ambient light information into a first plurality of portions; dividing the determined ambient light information into a second plurality of portions; and in response to a difference between one of the first plurality of portions and a corresponding one of the second plurality of portions exceeding a threshold difference, modifying the original ambient light information with the determined ambient light information.

28.The method according to claim 23, further comprising: rendering an object to a user of the wearable computing device based on the ambient light information.

29.The method according to claim 28, wherein the rendering an object to a user of the wearable computing device based on the ambient light information comprises: generating an initial light map associated with the object based on the ambient light information; down-sampling the initial light map to generate a set of down-sampled light maps having different resolutions; and rendering the object based on the set of down-sampled light maps.

30.A non-transitory computer storage medium storing a computer program product and comprising machine executable instructions which, when running on a wearable computing device, cause the wearable computing device to perform operations comprising: receiving image data of an image of an environment in which the wearable computing device is situated; adjusting pixel color values and pixel opacity of the image based on a camera parameter used by a camera in capturing the image resulting in an adjusted image, the camera parameter including one or more of an exposure time, light sensitivity, and a gamma correction parameter; determining, based on the adjusted image, ambient light information that indicates light intensities in the environment, and adjusting specular or reflection of an object in a subsequent image of the environment based on the ambient light information.

31.The non-transitory computer storage medium according to claim 30, wherein adjusting pixel color values of the image based on camera parameters includes raising the pixel color values to the gamma correction parameter and dividing by a result of multiplying the light sensitivity by the exposure time.

32.The non-transitory computer storage medium according to claim 30, wherein the operations further comprise, wherein the camera is onboard the wearable computing device and wherein capturing the image of the environment comprises: determining, based on a parameter indicating a field of view range of the camera, a plurality of shooting directions required for covering the environment; and causing the camera to capture a plurality of images according to the plurality of shooting directions.

33.The non-transitory computer storage medium according to claim 30, wherein the determining ambient light information of the wearable computing device comprises: generating a panorama image of the environment based on the adjusted image; and mapping the panorama image to a cube map indicating the ambient light information.

34.The non-transitory computer storage medium according to claim 30, wherein the operations further comprise: determining whether there is original ambient light information for the environment where the wearable computing device is located; and in response to determining that there is the original ambient light information, updating the original ambient light information by using the determined ambient light information.

35.The non-transitory computer storage medium according to claim 34, wherein the updating the original ambient light information by using the determined ambient light information comprises: dividing the original ambient light information into a first plurality of portions; dividing the determined ambient light information into a second plurality of portions; and in response to a difference between one of the first plurality of portions and a corresponding one of the second plurality of portions exceeding a threshold difference, modifying the original ambient light information with the determined ambient light information.

Description

BACKGROUND

A wearable computing device is a portable device which can be directly borne on a user’s body or worn on a user’s clothes or accessories. Wearable computing devices take various forms, such as head-mounted devices like glasses and helmets, hand-mounted devices like watches and bracelets, leg-mounted devices like shoes and socks, as well as other forms like smart clothing, bags, crutches and accessories.

Through hardware and software support as well as data interaction and cloud interaction, wearable computing devices may provide a variety of functions, exerting an increasingly great influence on people’s work, living and learning. Take a head-mounted device as an example. By combining virtuality and reality, the head-mounted device can provide better interactivity to users. In particular, a user may easily identify a virtual object in a real scenario and send instructions to the object, so that the object is caused to complete corresponding operations according to the instructions. By means of such head-mounted devices, users may carry out operations in games, simulate real meetings and perform 3D modeling by gesturing, thereby effectively improving the user interaction experience.

SUMMARY

While rendering a virtual object, light conditions of the real world are important to reality of the rendered object and user experience. Embodiments of the subject matter described herein provide a method and device for mixed reality object rendering. According to the embodiments of the subject matter described herein, while rendering an object, a wearable computing device takes light conditions in the real world into account, thereby improving reality of the rendered object. In particular, the wearable computing device is configured to acquire an image of an environment where the wearable computing device is located. The image is adjusted based on a camera parameter used when the image is captured. Subsequently, ambient light information is determined based on the adjusted image. In this way, the wearable computing device can obtain more real and accurate ambient light information, so as to render to the user an object with enhanced reality. Accordingly, the user can have a better interaction experience.

It is to be understood that the Summary is not intended to identify key or essential features of implementations of the subject matter described herein, nor is it intended to be used to limit the scope of the subject matter described herein. Other features of the subject matter described herein will become easily comprehensible through the description below.

BRIEF DESCRIPTION OF THE DRAWINGS

Through the more detailed description in the accompanying drawings, the above and other features, advantages and aspects of the subject matter described herein will become more apparent, wherein the same or similar reference numerals refer to the same or similar elements.

FIG. 1 shows a block diagram of a wearable computing device 100 in which one or more embodiments of the subject matter described herein can be implemented;

FIG. 2 shows a flowchart of a method 200 for acquiring ambient light information according to embodiments of the subject matter described herein;

FIG. 3 shows a schematic view of a shooting direction 300 according to embodiments of the subject matter described herein;

FIG. 4 shows a schematic view of a process 400 of generating ambient light information according to embodiments of the subject matter described herein;

FIG. 5 shows a schematic view of a cube map 500 of ambient light information according to embodiments of the subject matter described herein;

FIG. 6 shows a flowchart of a method 600 for updating ambient light information according to embodiments of the subject matter described herein; and

FIGS. 7A and 7B show schematic diagrams of an object rendered according to the prior art and an object rendered according to embodiments of the subject matter described herein, respectively.

Throughout the figures, same or similar reference numbers will always indicate same or similar elements.

DETAILED DESCRIPTION

Embodiments of the subject matter described herein will be described in more detail with reference to the accompanying drawings, in which some embodiments of the subject matter described herein have been illustrated. However, the subject matter described herein can be implemented in various manners, and thus should not be construed to be limited to the embodiments disclosed herein. On the contrary, those embodiments are provided for the thorough and complete understanding of the subject matter described herein, and completely conveying the scope of the subject matter described herein to those skilled in the art. It should be understood that the accompanying drawings and embodiments of the subject matter described herein are merely for the illustration purpose, rather than limiting the protection scope of the subject matter described herein.

The term “comprise” and its variants used in embodiments of the subject matter described herein are to be read as open terms that mean “comprise, but is not limited to.” The term “based on” is to be read as “based at least in part on.” The terms “one embodiment” and “an implementation” are to be read as “at least one embodiment.” The term “another embodiment” is to be read as “at least one other embodiment.” Definitions of other terms will be presented in description below.

The subject matter described herein proposes a method and device for mixed reality object rendering, which determine ambient light information based on an image of an environment where a wearable computing device is located, and further render the object to a user based on the ambient light information. The ambient light information discussed herein comprises information about light intensities in a plurality of directions of an environment where the wearable computing device is located. In this way, the method and device according to embodiments of the subject matter described herein can apply ambient light factors to the object rendering process, thereby rendering the object to the user more realistically and accurately. As such, the user experience can be improved effectively.

FIG. 1 shows a block diagram illustrating a wearable computing device 100 in which the embodiments of the subject matter described herein can be implemented. It should be understood that the wearable computing device 100 shown in FIG. 1 is merely illustrative and does not form any limitation to the functionality and scope of the embodiments described herein.

The wearable computing device 100 may be used for implementing the object rendering process according to the embodiments of the subject matter described herein, and may be implemented in various forms, such as smart glasses, smart helmets and smart headphones, which are wearable by a user 101.

An image 105 of an environment 103 in which the wearable computing device 100 is located may be acquired by a camera 104, and the image 105 may be adjusted on the basis of a camera parameter used by the camera 104 for capturing the image 105. Then, ambient light information may be determined based on the adjusted image. The ambient light information determined as such at least indicates light intensities in a plurality of directions under the environment, such that the wearable computing device 100 can render an object 106 to the user 101 by using the ambient light information.

The wearable computing device 100 may further adjust a shooting direction of the image 105, depending on a Field of View (FOV) range of the camera 104. In addition, the wearable computing device 100 may further update existing ambient light information (also referred to as “original ambient light information” below) by using the determined ambient light information.

Components of the wearable computing device 100 may comprise, but not limited to, one or more processors or processing units 110, a storage device 120, one or more input devices 130 as well as one or more output devices 140. The processing unit 110 may be a real or virtual processor and can execute various processing according to programs stored in the storage device 120. In a multi-processor system, multiple processing units concurrently execute computer executable instructions so as to increase the concurrent processing capability of the wearable computing device 100.

The wearable computing device 100 usually comprises a plurality of computer storage media. Such media may be any available media that are accessible to the wearable computing device 100, comprising, but not limited to, volatile and non-volatile media, removable and non-removable media. The storage device 120 may be a volatile memory (e.g., register, cache, random-access memory (RAM)), non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof. The storage device 120 may also be removable or non-removable media, and may comprise machine readable media, such as flash drives, magnetic disks or any other media, which can be used for storing information and/or data and which can be accessed within the wearable computing device 100.

The wearable computing device 100 may further comprise other removable/non-removable and volatile/non-volatile storage media. Although not shown in FIG. 1, there may be provided magnetic disk drives for reading from or writing to removable and non-volatile magnetic disks, and optical disk drives for reading from or writing to removable and non-volatile optical disks. In these cases, each drive may be connected to a bus (not shown) by one or more data media interfaces. The storage device 120 may comprise one or more program products 122 having one or more sets of program modules, which program modules are configured to perform the functions of various embodiments described herein.

The input device 130 may be one or more of different input devices, such as a mouse, keyboard, trackball, voice input device, etc. The output device 140 may be one or more output devices, such as a display, speaker, printer, etc.

As shown in FIG. 1, the camera 104 acquires the image 105 of the environment 103 where the wearable computing device 100 is located, and provides the image to the input device 130 of the wearable computing device 100. Then, the wearable computing device 100 acquires ambient light information based on the received image 105 and thereby renders the object 106 based on the ambient light information, so that the user 101 can see the object 106 having an ambient light effect. It should be understood that the ambient light information may be determined according to one or more images 105 captured by the camera 104. Although FIG. 1 shows a plurality of images 105, this is merely exemplary and not intended to limit the scope of the subject matter described herein.

Several exemplary embodiments of the method and device for object rendering that take ambient light conditions into consideration will now be described in more detail. FIG. 2 shows a flowchart of a method 200 for acquiring ambient light information according to an embodiment of the subject matter described herein. In some embodiments, the method 200 may be executed by the processing unit 110 described with reference to FIG. 1.

In block 210, at least one image 105 of the environment 103 where the wearable computing device 100 is located is acquired. Ambient light information at least indicates light intensities in multiple directions of the environment 103 where the wearable computing device 100 is located. According to the embodiments of the subject matter described herein, the user 101, the wearable computing device 100 and the object 106 are located in the same environment 103, so the environment 103 where the wearable computing device 100 is located is the same as the environment where the user 101 and/or object 106 are located.

Such an image 105 may be acquired in a variety of ways. In some embodiments, the wearable computing device 100 receives an image of the environment 103 captured by the camera 104 that operatively communicates with the wearable computing device 100. The camera 104 may be a normal camera, such as a digital camera, a camera of a smart telephone, or a non-panorama camera of a tablet computer. It should be understood that the foregoing examples of the camera 104 are merely for purposes of discussion and are not intended to limit the scope of the subject matter described herein in any way. Those skilled in the art may use any other available devices to acquire the image of the environment 103.

According to the embodiment of the subject matter described herein, the camera 104 operatively communicates with the wearable computing device 100. In one embodiment, the camera 104 and the wearable computing device 100 are separately disposed. The camera 104 may be disposed at a fixed location relative to the wearable computing device 100, for example, a location in front of the wearable computing device 100 at a predefined distance. The camera 104 may be connected with the wearable computing device 100 via a communication network (e.g., WIFI, Bluetooth, etc.) and deliver the acquired image to the wearable computing device 100 in the form of a video stream.

Alternatively, in another embodiment, the camera 104 may be integrated on the wearable computing device 100, so that it can change its location according to the movement of the user 101 who wears the wearable computing device 100. In this way, the scenario captured by the camera 104 can be ensured to keep consistent with the location of the user 101. As a result, a light effect that better matches the environment 103 can be acquired.

In some embodiments, while capturing the image 105, a shooting direction of the camera 104 may be determined in advance, so that the environment 103 where the wearable computing device 100 is located can be fully covered. In one embodiment, multiple shooting directions for covering the environment 103 may be determined based on a parameter indicating the FOV range of the camera 104, and the camera 104 is caused to capture images according to the determined shooting directions. Different models of cameras may have different FOV range parameters. The FOV range may be fixed or variable. Depending on different FOV ranges, the number and shooting directions of images for covering the environment 103 also differ somewhat. For example, where the largest Horizontal Field of View (FOV) of the camera 104 is 67 degrees, 34 shooting directions may be determined to capture 34 images capable of covering the environment 103.

FIG. 3 shows a schematic view of a shooting direction 300 according to an embodiment of the subject matter described herein. In an example as shown in FIG. 3, assuming that the wearable computing device 100 with the camera 104 is in a position 301, then 34 points are shown in a spherical coordinate system whose coordinate origin is the position 301, each point corresponding to one shooting direction. Specifically, one point is shown in each of positions 310 and 370 that correspond to 90 degrees and −90 degrees, respectively; four points are shown in each of positions 320 and 360 that correspond to 60 degrees and −60 degrees, respectively; moreover, eight points are shown in each of positions 330, 340, 350 that correspond to 30 degrees, 0 degree and −30 degrees, respectively. In one embodiment, the user 101 of the wearable computing device 100 at the coordinate origin 301 may take images towards these 34 points, thereby acquiring 34 images capable of covering the environment 103.

It should be understood the shooting directions shown in FIG. 3 are merely exemplary but not limiting. A relationship between the FOV range and the determined shooting direction of the camera may be determined according to multiple conventional modes or algorithms, which is omitted here.
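
As an illustration only, the following Python sketch enumerates the 34 example directions of FIG. 3 as unit vectors; the ring-to-count table mirrors the figure, while the function name and vector convention are assumptions made for this sketch (a real implementation would derive the per-ring counts from the camera's FOV parameter).

    import math

    # Ring layout taken from the 34-direction example of FIG. 3: one shot at each
    # pole, four shots on each of the +/-60 degree rings, and eight shots on each
    # of the 0 and +/-30 degree rings.
    RINGS = [(90, 1), (60, 4), (30, 8), (0, 8), (-30, 8), (-60, 4), (-90, 1)]

    def shooting_directions(rings=RINGS):
        """Return (elevation_deg, azimuth_deg, unit_vector) shooting directions."""
        directions = []
        for elev_deg, count in rings:
            for i in range(count):
                azim_deg = i * 360.0 / count
                elev, azim = math.radians(elev_deg), math.radians(azim_deg)
                unit_vector = (math.cos(elev) * math.cos(azim),
                               math.cos(elev) * math.sin(azim),
                               math.sin(elev))
                directions.append((elev_deg, azim_deg, unit_vector))
        return directions

    print(len(shooting_directions()))  # 34 directions, as in the example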

In some embodiments, the wearable computing device 100 may prompt the determined shooting directions to the user 101, for example, displaying in these directions reference objects (e.g., white balloons) having a predefined shape, color, flicker mode, and/or the like. Thus, when the user 101 looks at the reference object according to prompts (e.g., voice prompts, visual prompts, etc.), the camera 104 of the wearable computing device 100 may automatically take an image according to the shooting direction. In this way, the plurality of images 105 covering the environment 103 may be captured according to the multiple shooting directions.

In some embodiments, the camera 104 may comprise more than one camera (e.g., front camera and rear camera) that may capture an image respectively, so images of the environment 103 may be captured more rapidly and effectively. It should be understood although the foregoing embodiments have described the example of acquiring multiple images 105 in block 210, this is merely exemplary and does not limit the scope of the subject matter described herein. In other embodiments of the subject matter described herein, the ambient light information may also be determined only according to a single image 105 acquired in block 210. The single image 105 may be acquired according to the determined shooting direction, or may be an image of the environment 103 which is acquired by the camera 104 according to a current direction at a predefined time point. Acquiring a single image 105 is more rapid and flexible than acquiring multiple images 105.

After receiving the image 105 of the environment 103 from the camera 104, the wearable computing device 100 may store the image to a temporary image buffer so as to adjust the image.

Returning to FIG. 2, in block 220, pixel values of the image 105 are adjusted based on the camera parameter used by the camera 104 in capturing the image 105. The camera parameter may comprise one or more parameters used by the camera 104 in capturing the image 105, such as an exposure time, light sensitivity (ISO), light exposure, an aperture size, a shutter speed and/or other parameter. In some embodiments, pixel values of the image 105 may be adjusted based on the camera parameter by various means. For example, the image 105 may be regularized using a formula as below:

Color(r, g, b, a) = OriginalColor(r, g, b, a)^Gamma * (1 / (ExposureTime * ISO)),    (1)

where Gamma represents a gamma correction parameter, OriginalColor(r, g, b, a) represents pixel values of an unadjusted image, Color(r, g, b, a) represents pixel values of an adjusted image, and r, g, b and a denote a red value, green value, blue value and alpha value of one pixel of an image respectively, the alpha value indicating the pixel’s opacity which, for example, ranges between 0 and 255. In addition, in Formula (1), ExposureTime represents an exposure time of the camera, and ISO represents the light sensitivity of the camera.

According to Formula (1), the image 105 may be adjusted into a “regularized” image. Where the ambient light information is determined based on multiple images 105, by adjusting these images 105 in this way, the brightness of the respective images may be adjusted to a uniform reference level, so that the ambient light information may be determined more accurately on the basis of these images.

It should be understood Formula (1) is merely one example of adjusting the image and is not intended to limit the embodiments of the subject matter described herein. Those skilled in the art should appreciate that besides Formula (1), the image may be adjusted by any other appropriate means.
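
As a minimal sketch of applying Formula (1), assuming NumPy and floating-point RGBA images; the function and parameter names are placeholders chosen for this example:

    import numpy as np

    def regularize_image(rgba, gamma, exposure_time, iso):
        """Apply Formula (1): raise the pixel values to the gamma correction
        parameter and divide by the product of exposure time and ISO, so that
        images captured with different camera settings share one brightness
        reference level."""
        rgba = np.asarray(rgba, dtype=np.float32)
        return np.power(rgba, gamma) / (exposure_time * iso)

    # Example call with made-up capture settings: gamma 2.2, 1/60 s exposure, ISO 400.
    adjusted = regularize_image(np.random.rand(4, 4, 4), gamma=2.2, exposure_time=1 / 60, iso=400)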

In block 230, ambient light information is determined based on the adjusted image 105. The ambient light information at least indicates light intensities in multiple directions under the environment 103. In some embodiments, a panorama image of the environment 103 may be generated based on the adjusted image 105. This may be implemented according to a conventional panorama stitching method, for example, which is omitted here. In embodiments where the number of images 105 is small (e.g., only one image 105), ambient light information may be determined directly based on the image, instead of a panorama image being generated. In an alternative embodiment, a panorama image may be generated based on the image, wherein the panorama image can reflect a part of the ambient light conditions and thus may be referred to as a “partial panorama image.” Then, the image 105 or the generated panorama image may be converted to a stereogram, such as a cube map, a mirror ball, etc., and may be used as ambient light information in its entirety or in part. Such a conversion process may be completed by a predefined remapping operation. FIG. 4 shows a schematic view of a relevant process 400, which will be described in detail below. Take a cube map as an example. It uses a hexahedral cube to represent the surrounding light environment, allowing a graphics processing unit (GPU) of the wearable computing device 100 to render the object 106 more efficiently. FIG. 5 shows an exemplary cube map 500 according to embodiments of the subject matter described herein.

With reference to the embodiments shown in connection with FIG. 4, a detailed discussion is presented to the process 400 of generating ambient light information. In the example shown in FIG. 4, suppose the images 105 captured by the camera 104 each have locatable information which can be used to calculate the position of the camera, and a panorama image 410 may be generated based on the images 105. A point 411 on the panorama image 410 may be transformed to a point 421 of a sphere 420 in a first transformation. Then, the point 421 may be transformed to a point 431 on a cube 430 in a second transformation. Later, the point 431 may be transformed to a point 441 on a cube map 440 with six sides in a third transformation. Through the three transformations, each point on the panorama image 410 may be transformed to a point on the cube map 440, so that the cube map 440 corresponding to the panorama image 410 may be obtained. The cube map 440 may be implemented as a stereogram 500 shown in FIG. 5 for example.

It should be understood that various technologies for transformation under different coordinate systems are well known. Therefore, the transformations (e.g., the first transformation, the second transformation and/or the third transformation) shown in the embodiment of FIG. 4 may be implemented by any method that is currently known or to be developed in the future, which will not limit the embodiments of the subject matter described herein and is omitted accordingly.
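
A minimal sketch of the remapping for one cube-map face is shown below, with the sphere and cube steps collapsed into a single lookup. It assumes an equirectangular panorama, nearest-neighbour sampling and one particular face-orientation convention; none of these choices are prescribed by the text.

    import numpy as np

    def front_face_from_panorama(pano, face_size):
        """Sample the front face of a cube map from an equirectangular panorama.

        Each face pixel is placed on the unit cube, projected onto the unit
        sphere, and looked up in the panorama by its longitude/latitude,
        mirroring the panorama -> sphere -> cube -> cube map chain described
        above. The other five faces differ only in how (x, y, z) are assigned.
        """
        h, w = pano.shape[:2]
        u, v = np.meshgrid(np.linspace(-1, 1, face_size), np.linspace(-1, 1, face_size))
        x, y, z = np.ones_like(u), u, -v                      # front face of the unit cube
        lon = np.arctan2(y, x)                                # longitude in [-pi, pi]
        lat = np.arcsin(z / np.sqrt(x * x + y * y + z * z))   # latitude in [-pi/2, pi/2]
        col = ((lon + np.pi) / (2 * np.pi) * (w - 1)).round().astype(int)
        row = ((np.pi / 2 - lat) / np.pi * (h - 1)).round().astype(int)
        return pano[row, col]

    # Example: build a 256x256 front face from a random 512x1024 "panorama".
    face = front_face_from_panorama(np.random.rand(512, 1024, 3), face_size=256)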

According to embodiments of the subject matter described herein, the ambient light information may be implemented as an image, video, or any other file in an appropriate format. It should be understood that the ambient light information described in the form of a cube map is discussed for illustration, without suggesting any limitation to the scope of the subject matter described herein.

Additionally, in some embodiments of the subject matter described herein, the object 106 may be rendered to the user 101 of the wearable computing device 100 based on the determined ambient light information. For example, the wearable computing device 100 may use the cube map as an initial light map and perform down-sampling on the initial light map. For example, pixels in the initial light map may be iteratively averaged by a predefined resolution reduction factor, thereby generating a set of down-sampled light maps having different resolutions.

Specifically, for the initial light cube map, a complete set of down-sampled lighting maps, e.g., a Mip-map chain, may be generated quickly. The set of down-sampled light maps are composed of light cube maps having different resolutions, and are approximate representations of light cube maps under different resolutions. The down-sampling solution according to the subject matter described herein may be implemented in various ways. In some embodiments, a predefined number (e.g., 4) of pixels at corresponding positions of the upper-layer light map may be directly averaged.

Then, the wearable computing device 100 may determine an appearance of the object 106 on the basis of the set of down-sampled light maps and render the appearance to the user 101. In embodiments of the subject matter described herein, the appearance of the object 106 may be composed of a plurality of points. The wearable computing device 100 may use the set of down-sampled light maps to determine diffuse reflectance intensities and specular reflectance intensities of the plurality of points on the object 106. Afterwards, the appearance of the object 106 may be determined on the basis of diffuse reflectance intensities and specular reflectance intensities of these points.
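
The following sketch illustrates both steps, assuming square power-of-two cube-map faces stored as NumPy arrays; the 2x2 (four-pixel) averaging follows the description above, while the linear roughness-to-level mapping in the lookup is only an illustrative assumption.

    import numpy as np

    def build_light_map_chain(light_map):
        """Build a chain of down-sampled light maps (a Mip-map chain) by averaging
        2x2 pixel blocks of the previous level. light_map is a (H, W, C) array for
        one cube-map face; H and W are assumed to be powers of two."""
        chain = [np.asarray(light_map, dtype=np.float32)]
        while min(chain[-1].shape[:2]) > 1:
            prev = chain[-1]
            h, w, c = prev.shape
            chain.append(prev.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3)))
        return chain

    def sample_light(chain, row_frac, col_frac, roughness):
        """Look up a light value at fractional face position (row_frac, col_frac).
        Rougher surfaces read from a blurrier (smaller) level of the chain; the
        roughness-to-level mapping here is an assumption, not taken from the text."""
        level = int(round(roughness * (len(chain) - 1)))
        face = chain[level]
        r = min(int(row_frac * face.shape[0]), face.shape[0] - 1)
        c = min(int(col_frac * face.shape[1]), face.shape[1] - 1)
        return face[r, c]

    chain = build_light_map_chain(np.random.rand(128, 128, 3))
    value = sample_light(chain, row_frac=0.25, col_frac=0.5, roughness=0.3)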

In this way, a more real appearance of the object 106 under the current environment may be provided to the user 101. As compared with conventional solutions for rendering an object according to predefined light conditions, the wearable computing device 100 according to the embodiments of the subject matter described herein can obtain more real and accurate ambient light information, thereby improving the reality of the rendered object. As such, the drawback of the conventional solutions, namely that the user cannot perceive the real light conditions of the real world, can be avoided, and accordingly the user’s interaction experience can be improved.

Optionally, in some embodiments of the subject matter described herein, existing ambient light information may further be updated using the determined ambient light information. Such existing ambient light information may be, for example, information concerning ambient light preset in the wearable computing device 100. Alternatively or additionally, the existing ambient light information may further be historical ambient light information determined by the wearable computing device 100 at a previous time point, etc. For the sake of discussion, such information is collectively referred to as “original ambient light information” here.

The process of updating the ambient light information may be executed automatically by the wearable computing device 100 at a preset time, periodically or aperiodically, and/or may be initiated by the user 101 where necessary. In one embodiment, when the user 101 feels a significant change in the surrounding ambient light, he/she may trigger the camera 104 to take pictures of an area where the light changes dramatically. In this way, the ambient light information used by the wearable computing device 100 can be more consistent with the actual light situation of a current environment, which helps to improve the reality of the rendered object.

Specifically, in some embodiments, the wearable computing device 100 may determine whether there exists such original ambient light information or not. If yes, then the original ambient light information may be updated using the ambient light information determined in block 230. During updating of the original ambient light information, the original ambient light information may be modified if a predefined condition is met, or the original ambient light information will not be modified at all if a predefined condition is not met.

For example, in one embodiment, the original ambient light information may be divided into a plurality of portions (hereinafter referred to as “a plurality of original portions” for purpose of discussion), and the determined ambient light information in block 230 may be divided into a plurality of portions (hereinafter referred to as “a plurality of determined portions” for purpose of discussion). Then, by comparing the plurality of original portions with the plurality of determined portions, it may be determined whether to modify the original ambient light information by using the determined ambient light information. FIG. 6 shows a flowchart of a method 600 for updating ambient light information according to the embodiments.

In block 610, the original ambient light information is divided into N original portions, denoted as P1, P2, . . . , PN, wherein N is an integer larger than or equal to 1. In block 620, the determined ambient light information is divided into N determined portions, denoted as Q1, Q2, . . . , QN. If a difference between one original portion PK among the N original portions and a corresponding determined portion QK (wherein 1≤K≤N, and K is an integer) is larger than a threshold difference, then the original ambient light information may be modified using the determined ambient light information.

In block 630, it is judged whether the difference between PK and QK is larger than the threshold difference. The threshold difference may be predetermined in various ways, e.g., according to an empirical value, a previously calculated difference value, or the like. It may be determined from the threshold difference whether there is a significant change between PK and QK. If the difference between PK and QK exceeds the threshold difference, it may be determined that there is a considerable difference between the original ambient light information and the currently determined ambient light information. At this point, in block 660 the original ambient light information is modified using the determined ambient light information.

On the contrary, if it is decided in block 630 that the difference between PK and QK is less than or equal to the threshold difference, then in block 640 let K=K+1. In block 650, it is judged whether the flow goes to the last determined portion or original portion, i.e., it is judged whether K calculated in block 640 is larger than N. If K>N, this means the judgment on all original portions and their corresponding determined portions has been completed, and the difference between the original ambient light information and the current determined ambient light information is rather trivial. Thus, there is no need to modify the original ambient light information.

In this way, the ambient light information of the wearable computing device 100 may be updated dynamically or in real time, so that the ambient light information used by the wearable computing device 100 may be made more consistent with the actual light situation of a current environment and the reality of the rendered object may be improved.
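
A minimal sketch of this update check is given below, assuming the ambient light information is held as a NumPy array and using a mean-absolute-difference metric; the metric, the band-wise split and the threshold value are assumptions made for the sketch.

    import numpy as np

    def maybe_update_ambient_light(original, determined, n_portions=8, threshold=0.1):
        """Return the ambient light map to use after the check of method 600.

        Both maps are (H, W, C) arrays with the same layout (e.g. an unfolded cube
        map). They are split into n_portions bands; if any band of the newly
        determined map differs from the corresponding original band by more than
        `threshold` (mean absolute difference), the original is replaced,
        otherwise it is kept unchanged."""
        if original is None:
            return determined
        orig_parts = np.array_split(np.asarray(original, dtype=np.float32), n_portions)
        new_parts = np.array_split(np.asarray(determined, dtype=np.float32), n_portions)
        for p_k, q_k in zip(orig_parts, new_parts):
            if np.mean(np.abs(p_k - q_k)) > threshold:
                return determined      # block 660: modify with the new information
        return original                # difference is trivial: keep the original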

According to the embodiment of the subject matter described herein, the wearable computing device 100 acquires one or more images of an environment where it is located, determines ambient light information based on the adjusted image, and renders an object to a user based on the ambient light information. Therefore, while rendering the object 106, the wearable computing device 100 takes into consideration light conditions of the real world, thereby efficiently improving the reality of the rendered object and enhancing the user experience. FIGS. 7A and 7B show an object rendered according to the prior art and an object rendered according to embodiments of the subject matter described herein, respectively. It is clear that, compared with the object rendered according to the prior art as shown in FIG. 7A, the object rendered according to the embodiments of the subject matter described herein as shown in FIG. 7B has a better ambient light effect and presents to the user a stronger sense of reality. This can significantly improve the user experience and accuracy of user interaction.

The methods and functions described in this specification may be executed, at least in part, by one or more hardware logic components, and illustrative types of hardware logic components that may be used comprise field programmable gate array (FPGA), application-specific standard product (ASSP), system on chip (SOC), complex programmable logic device (CPLD), etc.

Program codes for carrying out methods of the subject matter described herein may be written in any combination of one or more programming languages. These computer program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may be executed entirely on a machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of the subject matter described herein, a machine readable medium may be any tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine readable medium may be a machine readable signal medium or a machine readable storage medium. A machine readable medium may comprise, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of the machine readable storage medium would comprise an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are contained in the above discussions, these should not be construed as limitations on the scope of the subject matter described herein. Certain features that are described in this specification in the context of separate implementations may also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation may also be implemented in multiple embodiments separately or in any suitable sub-combination.

Listed below are some example implementations of the subject matter described herein.

The embodiments of the subject matter described herein comprise a computer implemented method. The method comprises: acquiring at least one image of an environment where a wearable computing device is located, the at least one image being captured by a camera that operatively communicates with the wearable computing device; adjusting pixel values of the at least one image based on a camera parameter used by the camera in capturing the at least one image; and determining, based on the at least one adjusted image, ambient light information that indicates light intensities in a plurality of directions in the environment.

In some embodiments, the acquiring at least one image of an environment where a wearable computing device is located comprises: determining a plurality of shooting directions required for covering the environment, based on a parameter indicating a Field of View range of the camera; and causing the camera to capture a plurality of images according to the plurality of shooting directions.

In some embodiments, the camera parameter may comprise at least one of: an exposure time, ISO, light exposure, an aperture size and a shutter speed.

In some embodiments, the determining ambient light information of the wearable computing device comprises: generating a panorama image of the environment based on the at least one adjusted image; and mapping the panorama image to a cube map indicating the ambient light information.

In some embodiments, the method further comprises: determining whether or not there exists original ambient light information of the environment where the wearable computing device is located; and in response to determining there exists the original ambient light information, updating the original ambient light information by using the determined ambient light information.

In some embodiments, the updating the original ambient light information by using the determined ambient light information comprises: dividing the original ambient light information into a first plurality of portions; dividing the determined ambient light information into a second plurality of portions; and in response to a difference between one of the first plurality of portions and a corresponding one of the second plurality of portions exceeding a threshold difference, modifying the original ambient light information by using the determined ambient light information.

In some embodiments, the method further comprises: rendering an object to a user of the wearable computing device based on the ambient light information.

In some embodiments, the rendering an object to a user of the wearable computing device based on the ambient light information comprises: generating an initial light map associated with the object based on the ambient light information; down-sampling the initial light map to generate a set of down-sampled light maps having different resolutions; and rendering the object based on the set of down-sampled light maps.

The embodiments of the subject matter described herein comprise a wearable computing device, comprising: a processing unit; a memory, coupled to the processing unit and having instructions stored therein which, when executed by the processing unit, perform actions comprising: acquiring at least one image of an environment where the wearable computing device is located, the at least one image being captured by a camera that operatively communicates with the wearable computing device; adjusting pixel values of the at least one image based on a camera parameter used by the camera in capturing the at least one image; and determining, based on the at least one adjusted image, ambient light information that indicates light intensities in a plurality of directions in the environment.

In some embodiments, the acquiring at least one image of an environment where the wearable computing device is located comprises: determining a plurality of shooting directions required for covering the environment, based on a parameter indicating a Field of View range of the camera; and causing the camera to capture a plurality of images according to the plurality of shooting directions.

In some embodiments, the camera parameter may comprise at least one of: an exposure time, ISO, light exposure, an aperture size and a shutter speed.

In some embodiments, the determining ambient light information of the wearable computing device comprises: generating a panorama image of the environment based on the at least one adjusted image; and mapping the panorama image to a cube map indicating the ambient light information.

In some embodiments, the acts further comprise: determining whether or not there exists original ambient light information of the environment where the wearable computing device is located; and in response to determining there exists the original ambient light information, updating the original ambient light information by using the determined ambient light information.

In some embodiments, the updating the original ambient light information by using the determined ambient light information comprises: dividing the original ambient light information into a first plurality of portions; dividing the determined ambient light information into a second plurality of portions; and in response to a difference between one of the first plurality of portions and a corresponding one of the second plurality of portions exceeding a threshold difference, modifying the original ambient light information by using the determined ambient light information.

In some embodiments, the acts further comprise: rendering an object to a user of the wearable computing device based on the ambient light information.

In some embodiments, the rendering an object to a user of the wearable computing device based on the ambient light information comprises: generating an initial light map associated with the object based on the ambient light information; down-sampling the initial light map to generate a set of down-sampled light maps having different resolutions; and rendering the object based on the set of down-sampled light maps.

The embodiments of the subject matter described herein further provide a computer program product stored in a non-transient storage medium and comprising machine executable instructions which, when running on a wearable computing device, cause the device to: acquire at least one image of an environment where the wearable computing device is located, the at least one image being captured by a camera that operatively communicates with the wearable computing device; adjust pixel values of the at least one image based on a camera parameter used by the camera in capturing the at least one image; and determine, based on the at least one adjusted image, ambient light information that indicates light intensities in a plurality of directions in the environment.

In some embodiments, the machine executable instructions, when running on a device, further cause the device to: determine a plurality of shooting directions required for covering the environment, based on a parameter indicating a Field of View range of the camera; and cause the camera to capture a plurality of images according to the plurality of shooting directions.

In some embodiments, the camera parameter may comprise at least one of: an exposure time, ISO, light exposure, an aperture size and a shutter speed.

In some embodiments, the machine executable instructions, when running on a device, further cause the device to: generate a panorama image of the environment based on the at least one adjusted image; and map the panorama image to a cube map indicating the ambient light information.

In some embodiments, the machine executable instructions, when running on a device, further cause the device to: determine whether or not there exists original ambient light information of the environment where the wearable computing device is located; and in response to determining there exists the original ambient light information, update the original ambient light information by using the determined ambient light information.

In some embodiments, the machine executable instructions, when running on a device, further cause the device to: divide the original ambient light information into a first plurality of portions; divide the determined ambient light information into a second plurality of portions; and in response to a difference between one of the first plurality of portions and a corresponding one of the second plurality of portions exceeding a threshold difference, modify the original ambient light information by using the determined ambient light information.

In some embodiments, the machine executable instructions, when running on a device, further cause the device to: render an object to a user of the wearable computing device based on the ambient light information.

In some embodiments, the machine executable instructions, when running on a device, further cause the device to: generate an initial light map associated with the object based on the ambient light information; down-sample the initial light map to generate a set of down-sampled light maps having different resolutions; and render the object based on the set of down-sampled light maps.

Although the subject matter described herein has been described in a language specific to structural features and/or method logic actions, it should be appreciated that the subject matter as defined in the appended claims is not limited to the specific features or actions described above. On the contrary, the specific features and actions described above are merely example forms for implementing the claims.

Microsoft Patent | Adaptive panoramic video streaming using composite pictures https://patent.nweon.com/27349 Thu, 09 Mar 2023 12:45:51 +0000 https://patent.nweon.com/?p=27349 ...

Patent: Adaptive panoramic video streaming using composite pictures

Patent PDF: available to Nweon (映维网) members

Publication Number: 20230073542

Publication Date: 2023-03-09

Assignee: Microsoft Technology Licensing

Abstract

Innovations in stream configuration operations and playback operations for adaptive streaming of panoramic video are described. The innovations include features of adaptive streaming of panoramic video with composite pictures. For example, a stream configuration tool splits an input picture of panoramic video into multiple sections and creates multiple composite pictures. A composite picture includes one of the sections as well as a low-resolution version of the input picture. A playback tool reconstructs one or more composite pictures. Under normal operation, the playback tool can use the reconstructed section(s) of the composite picture(s) to render high-quality views of the panoramic video. If the view window dramatically changes, however, or if encoded data for a section is lost or corrupted, the playback tool can use the low-resolution version of the input picture to render lower-quality details for views of the panoramic video, without disruption of playback.
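
As an illustration of the composite-picture construction described in the abstract, the following Python sketch splits an input picture into a section grid and packs each full-resolution section above a low-resolution copy of the whole picture; the 2x4 grid, the packing position and the downscale factor are assumptions for this sketch, not values taken from the patent.

    import numpy as np

    def make_composite_pictures(input_picture, rows=2, cols=4, lowres_factor=4):
        """Split an input panoramic picture into rows x cols sections and return one
        composite picture per section: the full-resolution section with a
        low-resolution copy of the whole input picture packed beneath it."""
        pic = np.asarray(input_picture)
        h, w, c = pic.shape
        # Low-resolution version of the whole picture (nearest-neighbour decimation).
        low = pic[::lowres_factor, ::lowres_factor]
        sec_h, sec_w = h // rows, w // cols
        composites = []
        for r in range(rows):
            for col in range(cols):
                section = pic[r * sec_h:(r + 1) * sec_h, col * sec_w:(col + 1) * sec_w]
                canvas_w = max(sec_w, low.shape[1])
                composite = np.zeros((sec_h + low.shape[0], canvas_w, c), dtype=pic.dtype)
                composite[:sec_h, :sec_w] = section            # full-resolution section
                composite[sec_h:, :low.shape[1]] = low         # low-res whole picture below it
                composites.append(composite)
        return composites

    # Example: eight composite pictures from a 2048x4096 equirectangular input.
    composites = make_composite_pictures(np.zeros((2048, 4096, 3), dtype=np.uint8))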

Claims

We claim:

1.A computer system comprising one or more processing units and memory, wherein the computer system implements a panoramic video stream configuration tool that includes: an input buffer configured to store an input picture of panoramic video; a formatter configured to: create a low-resolution version of the input picture; split the input picture into multiple sections according to partition settings; create multiple composite pictures, each of the multiple composite pictures including one of the multiple sections and also including the low-resolution version of the input picture; and add the multiple composite pictures, respectively, to corresponding video streams; one or more video encoders configured to encode the multiple composite pictures in the corresponding video streams, respectively, thereby producing encoded data for the multiple composite pictures as part of multiple bitstreams for the corresponding video streams, respectively; and one or more output buffers configured to store the encoded data for delivery.

2.The computer system of claim 1, wherein, for each of the multiple composite pictures, the low-resolution version of the input picture is adjacent one of the multiple sections within the composite picture.

3.The computer system of claim 1, wherein, for each of the multiple composite pictures, one of the multiple sections provides a first view of a frame packing arrangement, and the low-resolution version of the input picture provides a second view of the frame packing arrangement.

4.The computer system of claim 1, wherein, for each of the multiple composite pictures, the low-resolution version of the input picture is positioned at a pre-defined location relative to the one of the multiple sections in the composite picture.

5.The computer system of claim 1, wherein a manifest file includes information that indicates where the low-resolution version of the input picture is positioned in the multiple composite pictures, respectively.

6.The computer system of claim 1, wherein the input picture has a first spatial resolution, wherein the low-resolution version of the input picture has a second spatial resolution lower than the first spatial resolution, and wherein each of the multiple sections has a third spatial resolution lower than the first spatial resolution.

7.The computer system of claim 1, wherein the input picture and the low-resolution version of the input picture are in an input projection, and wherein the formatter is further configured to: project the input picture from the input projection to an intermediate projection, the multiple sections being in the intermediate projection.

8.The computer system of claim 7, wherein the input projection is an equirectangular projection, and wherein the intermediate projection is a sinusoidal projection, at least one of the multiple sections including at least some sample values having default values not representing content of the input picture.

9.The computer system of claim 1, wherein the multiple bitstreams are video elementary bitstreams, and wherein the panoramic video stream configuration tool further includes: a multiplexer for combining the encoded data, for the multiple bitstreams, into a single container stream.

10.In a computer system that implements a panoramic video stream configuration tool, a method comprising: receiving an input picture of panoramic video; creating a low-resolution version of the input picture; splitting the input picture into multiple sections according to partition settings; creating multiple composite pictures, each of the multiple composite pictures including one of the multiple sections and also including the low-resolution version of the input picture; adding the multiple composite pictures, respectively, to corresponding video streams; encoding the multiple composite pictures in the corresponding video streams, respectively, thereby producing encoded data for the multiple composite pictures as part of multiple bitstreams for the corresponding video streams, respectively; and storing the encoded data for delivery.

11.A computer system comprising one or more processing units and memory, wherein the computer system implements a panoramic video playback tool that includes: a view controller configured to: determine a view window for playback of panoramic video; from among multiple sections of the panoramic video, identify one or more sections that contain at least part of the view window; and for the one or more identified sections, select one or more bitstreams among multiple bitstreams for corresponding video streams; a streaming controller configured to request encoded data, in the one or more selected bitstreams for the one or more identified sections, respectively, for an input picture of the panoramic video, each of the one or more identified sections being part of a composite picture that also includes a low-resolution version of the input picture; one or more input buffers configured to store the encoded data; one or more video decoders configured to decode the encoded data to reconstruct the one or more identified sections for the input picture and/or reconstruct the low-resolution version of the input picture; a mapper configured to, based at least in part on the one or more reconstructed sections and/or the reconstructed low-resolution version of the input picture, create an output picture; and one or more output buffers configured to store the output picture for output to a display device.

12.The computer system of claim 11, wherein the view controller is configured to, in multiple iterations, perform operations to determine the view window, identify the one or more sections that contain at least part of the view window, and, for the one or more identified sections, select the one or more bitstreams.

13.The computer system of claim 11, wherein the mapper is further configured to: determine which portions of the output picture cannot be created using the one or more reconstructed sections; and for any portion of the output picture that cannot be created using the one or more reconstructed sections, create that portion of the output picture using the reconstructed low-resolution version of the input picture.

14.The computer system of claim 13, wherein at least part of the output picture is created using the one or more reconstructed sections, and wherein at least part of the output picture is created using the reconstructed low-resolution version of the input picture.

15.The computer system of claim 11, wherein the low-resolution version of the input picture and the one of the multiple sections in the composite picture are located at pre-defined positions within the composite picture.

16.The computer system of claim 11, wherein a manifest file includes information that indicates where the low-resolution version of the input picture is positioned in the composite pictures, respectively.

17.The computer system of claim 11, wherein the input picture has a first spatial resolution, wherein the low-resolution version of the input picture has a second spatial resolution lower than the first spatial resolution, and wherein each of the multiple sections has a third spatial resolution lower than the first spatial resolution.

18.The computer system of claim 11, wherein the mapper is further configured to: when creating the output picture, project the one or more reconstructed sections from an intermediate projection to an output projection.

19.The computer system of claim 18, wherein the intermediate projection is a sinusoidal projection, and wherein the output projection is a screen projection.

20.The computer system of claim 11, wherein the mapper is further configured to: when creating the output picture, project the low-resolution version of the input picture from an input projection to an output projection.

Description

PRIORITY APPLICATION(S)

This patent application is a continuation of U.S. patent application Ser. No. 16/935,476, filed Jul. 22, 2020, which is a divisional of U.S. patent application Ser. No. 15/990,548, filed May 25, 2018 (now U.S. Pat. No. 10,764,494), which are hereby incorporated in their entirety by reference.

BACKGROUND

When video is streamed over the Internet and played back through a Web browser or media player, the video is delivered in digital form. Digital video is also used when video is delivered through many broadcast services, satellite services and cable television services. Real-time videoconferencing often uses digital video, and digital video is used during video capture with most smartphones, Web cameras and other video capture devices. Digital video is also used for technologies such as virtual reality (“VR”) and augmented reality (“AR”), whether video is played back in a head-mounted display, mobile device, or other type of device.

Panoramic video is video in which views in multiple directions around a central position are recorded at the same time. The recorded video can include image content in every direction, or at least image content in every direction in a 360-degree circle around the central position, as well as at least some image content above the central position and at least some image content underneath the central position. Panoramic video is sometimes called 360-degree video, immersive video, or spherical video. Panoramic video can be captured using an omnidirectional camera or a collection of multiple cameras pointing in different directions. For modern-day applications, panoramic video is processed in digital form during stages of creation, editing, and delivery, as well as stages of reconstruction and rendering for playback.

During playback, a viewer typically can control a view direction relative to the central position, potentially changing which section of the panoramic video is viewed over time. In some systems, a viewer can also zoom in or zoom out. When panoramic video is rendered for display, the section of the panoramic video that is viewed may be projected to a flat image for output. For a mobile device or computer monitor, a single output picture may be rendered. For a head-mounted display (or mobile device held in a head-mounted band), the section of the panoramic video that is viewed may be projected to two output pictures, for the left and right eyes, respectively.

When a playback tool reconstructs and renders panoramic video, resources may be wasted retrieving and reconstructing image content that is not viewed. For example, memory may be used to store sample values for areas of the panoramic video that are not viewed, and processing cycles may be used to determine the non-viewed sample values and their locations at different stages of processing.

To use fewer resources, a playback tool may retrieve and reconstruct only part (not all) of the panoramic video. For example, considering the view direction and zoom factor for a viewer, the playback tool may retrieve encoded data and reconstruct panoramic video just for those sections of the panoramic video that are visible. In this way, the playback tool may save memory, processing cycles, and other resources while correctly rendering the visible sections of the panoramic video. If the view direction or zoom factor changes, however, the playback tool may not have image content needed to correctly render sections of the panoramic video that should be visible. Playback may freeze or stall until the playback tool can recover by retrieving encoded data and reconstructing panoramic video for the newly visible sections.

SUMMARY

In summary, the detailed description presents innovations in stream configuration operations and playback operations for adaptive streaming of panoramic video. In some example implementations, the innovations can help avoid disruption in playback of panoramic video if a viewer dramatically changes view direction or zoom factor during playback, or if encoded data for a section of panoramic video is lost (e.g., due to network congestion) or corrupted.

According to one aspect of the innovations described herein, a computer system implements a panoramic video stream configuration tool that includes an input buffer, a formatter, one or more video encoders, and one or more output buffers. The input buffer is configured to store an input picture of panoramic video. The formatter is configured to create a low-resolution version of the input picture, split the input picture into multiple sections according to partition settings, and create multiple composite pictures. Each of the composite pictures includes one of the multiple sections and also includes the low-resolution version of the input picture. The formatter is configured to add the composite pictures, respectively, to corresponding video streams. The video encoder(s) are configured to encode the composite pictures in the corresponding video streams, respectively. This produces encoded data for the composite pictures as part of multiple bitstreams for the corresponding video streams, respectively. The output buffer(s) are configured to store the encoded data for delivery. In this way, even if a playback tool retrieves encoded data for only one of the bitstreams, the playback tool has image content (specifically, the low-resolution version of the input picture) that it can use to render views of the panoramic video if the view direction or zoom factor dramatically changes, or if encoded data for a specific section is lost or corrupted. The quality of the rendered views (at least for details created from the low-resolution version of the input picture) may be degraded temporarily, but playback is not disrupted.

According to another aspect of the innovations described herein, a computer system implements a panoramic video playback tool that includes a view controller, a streaming controller, one or more input buffers, one or more video decoders, a mapper, and an output buffer. The view controller is configured to determine a view window for playback of panoramic video. The view controller is further configured to, from among multiple sections of the panoramic video, identify one or more sections that contain at least part of the view window. For the identified section(s), the view controller is configured to select one or more bitstreams among multiple bitstreams for corresponding video streams. The streaming controller is configured to request encoded data, in the selected bitstream(s) for the identified section(s), respectively, for an input picture of the panoramic video. Each of the identified section(s) is part of a composite picture that also includes a low-resolution version of the input picture. The input buffer(s) are configured to store the encoded data. The video decoder(s) are configured to decode the encoded data to reconstruct the identified section(s) for the input picture and/or reconstruct the low-resolution version of the input picture. The mapper is configured to, based at least in part on the reconstructed section(s) and/or the reconstructed low-resolution version of the input picture, create an output picture. Finally, the output buffer is configured to store the output picture for output to a display device. Under normal operation, the playback tool can use the reconstructed section(s) to render high-quality views of the panoramic video. If the view direction or zoom factor dramatically changes, however, or if encoded data for a specific section is lost or corrupted, the playback tool can use the low-resolution version of the input picture to render lower-quality details for views of the panoramic video, without disruption of playback.
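
To make the fallback behavior concrete, the following Python sketch (illustrative only, not part of the patent disclosure) assembles an output view from whatever high-resolution section content is available and fills the remaining locations from an upscaled low-resolution version of the input picture; the array shapes and helper names are assumptions.

import numpy as np

def compose_view(section_pixels, section_mask, lowres_view):
    # section_pixels: (H, W, 3) samples copied from the reconstructed section(s);
    #                 undefined wherever section_mask is False.
    # section_mask:   (H, W) boolean array, True where a reconstructed section
    #                 covers the view window.
    # lowres_view:    (h, w, 3) samples from the reconstructed low-resolution
    #                 version of the input picture, already mapped to the same
    #                 view window but at lower resolution.
    H, W, _ = section_pixels.shape
    h, w, _ = lowres_view.shape

    # Nearest-neighbor upscale of the low-resolution fallback to the view size.
    ys = np.arange(H) * h // H
    xs = np.arange(W) * w // W
    fallback = lowres_view[ys[:, None], xs[None, :]]

    # Prefer high-resolution section samples; use low-resolution detail elsewhere,
    # so playback continues without disruption.
    return np.where(section_mask[..., None], section_pixels, fallback)

Under normal operation the mask is entirely True and the low-resolution content is never shown; after an abrupt view change, or when encoded data for a section is lost or corrupted, the affected locations temporarily come from the fallback.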

The innovations can be implemented as part of a method, as part of a computer system configured to perform the method, or as part of tangible computer-readable media storing computer-executable instructions for causing a computer system to perform the method. The various innovations can be used in combination or separately. This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. The foregoing and other objects, features, and advantages of the invention will become more apparent from the following detailed description, which proceeds with reference to the accompanying figures.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram illustrating an example computer system in which some described embodiments can be implemented.

FIGS. 2a and 2b are diagrams illustrating example network environments in which some described embodiments can be implemented.

FIGS. 3a to 3d are diagrams of example projections for a picture of panoramic video, and FIG. 3e is a diagram illustrating an example of a screen projection for a view of a picture of panoramic video.

FIG. 4 is a diagram illustrating an example architecture for a panoramic video stream configuration tool that supports overlapping sections and composite pictures.

FIG. 5 is a diagram illustrating an example architecture for a panoramic video playback tool that supports overlapping sections and composite pictures.

FIGS. 6a and 6b are diagrams illustrating examples of stream configuration operations for adaptive streaming of panoramic video with overlapping sections.

FIG. 7 is a diagram illustrating an example of overlapping sections of a picture of panoramic video in a sinusoidal projection.

FIGS. 8a and 8b are diagrams illustrating examples of playback operations for adaptive streaming of panoramic video with overlapping sections.

FIG. 9 is a flowchart illustrating an example technique for stream configuration of panoramic video with overlapping sections.

FIG. 10 is a flowchart illustrating an example technique for playback of panoramic video with overlapping sections.

FIGS. 11a and 11b are diagrams illustrating examples of stream configuration operations for adaptive streaming of panoramic video with composite pictures.

FIG. 12 is a diagram illustrating an example composite picture of panoramic video.

FIGS. 13a and 13b are diagrams illustrating examples of playback operations for adaptive streaming of panoramic video with composite pictures.

FIG. 14 is a flowchart illustrating an example technique for stream configuration of panoramic video with composite pictures.

FIG. 15 is a flowchart illustrating an example technique for playback of panoramic video with composite pictures.

DETAILED DESCRIPTION

The detailed description presents innovations in stream configuration operations and playback operations for adaptive streaming of panoramic video. The innovations include features of adaptive streaming of panoramic video with composite pictures. In some example implementations, the innovations can help avoid disruption in playback of panoramic video if a viewer dramatically changes view direction or zoom factor during playback, or if encoded data for a section of panoramic video is lost (e.g., due to network congestion) or corrupted. The innovations also include features of adaptive streaming of panoramic video with overlapping sections. In other example implementations, the innovations can help avoid disruption in playback of panoramic video as a viewer gradually changes view direction or zoom factor during playback.

In the examples described herein, identical reference numbers in different figures indicate an identical component, module, or operation. Depending on context, a given component or module may accept a different type of information as input and/or produce a different type of information as output.

More generally, various alternatives to the examples described herein are possible. For example, some of the methods described herein can be altered by changing the ordering of the method acts described, by splitting, repeating, or omitting certain method acts, etc. The various aspects of the disclosed technology can be used in combination or separately. Some of the innovations described herein address one or more of the problems noted in the background. Typically, a given technique/tool does not solve all such problems.

I. Example Computer Systems

FIG. 1 illustrates a generalized example of a suitable computer system (100) in which several of the described innovations may be implemented. The innovations described herein relate to panoramic video stream configuration, streaming, and playback. Aside from its use in panoramic video stream configuration, streaming, and/or playback, the computer system (100) is not intended to suggest any limitation as to scope of use or functionality, as the innovations may be implemented in diverse computer systems, including special-purpose computer systems adapted for panoramic video stream configuration, streaming, and/or playback.

With reference to FIG. 1, the computer system (100) includes one or more processing cores (110 . . . 11x) of a central processing unit (“CPU”) and local, on-chip memory (118). The processing core(s) (110 . . . 11x) execute computer-executable instructions. The number of processing core(s) (110 . . . 11x) depends on implementation and can be, for example, 4 or 8. The local memory (118) may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some combination of the two, accessible by the respective processing core(s) (110 . . . 11x).

The local memory (118) can store software (180) implementing tools for adaptive panoramic video stream configuration, streaming, and/or playback, using overlapping sections and/or composite pictures, for operations performed by the respective processing core(s) (110 . . . 11x), in the form of computer-executable instructions. In FIG. 1, the local memory (118) is on-chip memory such as one or more caches, for which access operations, transfer operations, etc. with the processing core(s) (110 . . . 11x) are fast.

The computer system (100) can include processing cores (not shown) and local memory (not shown) of a graphics processing unit (“GPU”). In general, a GPU is any specialized circuit, different from the CPU, that accelerates creation and/or manipulation of image data in a graphics pipeline. The GPU can be implemented as part of a dedicated graphics card (video card), as part of a motherboard, as part of a system on a chip (“SoC”), or in some other way (even on the same die as the CPU). The number of processing cores of the GPU depends on implementation. The processing cores of the GPU are, for example, part of single-instruction, multiple data (“SIMD”) units of the GPU. The SIMD width n, which depends on implementation, indicates the number of elements (sometimes called lanes) of a SIMD unit. For example, the number of elements (lanes) of a SIMD unit can be 16, 32, 64, or 128 for an extra-wide SIMD architecture. The local memory may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some combination of the two, accessible by the respective processing cores of the GPU. The processing core(s) of the GPU can execute computer-executable instructions for one or more innovations for adaptive panoramic video stream configuration, streaming, and/or playback.

Alternatively, the computer system (100) includes one or more processing cores (not shown) of a system-on-a-chip (“SoC”), application-specific integrated circuit (“ASIC”) or other integrated circuit, along with associated memory (not shown). The processing core(s) can execute computer-executable instructions for one or more innovations for adaptive panoramic video stream configuration, streaming, and/or playback.

The computer system (100) includes shared memory (120), which may be volatile memory (e.g., RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some combination of the two, accessible by the processing core(s). Depending on architecture (e.g., whether a GPU is part of a video card, motherboard, or SoC), CPU memory can be completely separate from GPU memory, or CPU memory and GPU memory can, at least in part, be shared memory or drawn from the same source (e.g., RAM). The memory (120) stores software (180) implementing tools for adaptive panoramic video stream configuration, streaming, and/or playback, using overlapping sections and/or composite pictures, for operations performed, in the form of computer-executable instructions. In FIG. 1, the shared memory (120) is off-chip memory, for which access operations, transfer operations, etc. with the processing cores are slower.

The computer system (100) includes one or more network adapters (140). As used herein, the term network adapter indicates any network interface card (“NIC”), network interface, network interface controller, or network interface device. The network adapter(s) (140) enable communication over a network to another computing entity (e.g., server, other computer system). The network can be a wide area network, local area network, storage area network or other network. The network adapter(s) (140) can support wired connections and/or wireless connections, for a wide area network, local area network, storage area network or other network. The network adapter(s) (140) convey data (such as computer-executable instructions, audio or video input or output, or other data) in a modulated data signal over network connection(s). A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, the network connections can use an electrical, optical, RF, or other carrier.

The computer system (100) also includes one or more input device(s) (150). The input device(s) may be a touch input device such as a keyboard, mouse, pen, or trackball, a scanning device, or another device that provides input to the computer system (100). For video, the input device(s) (150) may be a camera, video card, screen capture module, TV tuner card, or similar device that accepts video input in analog or digital form, or a CD-ROM or CD-RW that reads video input into the computer system (100). The computer system (100) can also include an audio input, a motion sensor/tracker input, and/or a game controller input.

The computer system (100) includes one or more output devices (160). The output device(s) (160) may be a printer, CD-writer, or another device that provides output from the computer system (100). For video playback, the output device(s) (160) may be a head-mounted display, computer monitor, or other display device. An audio output can provide audio output to one or more speakers.

The computer system (100) also includes storage (170), which may be removable or non-removable, and which includes magnetic media (such as magnetic disks, magnetic tapes, or cassettes), optical disk media, and/or any other media that can be used to store information and that can be accessed within the computer system (100). The storage (170) stores instructions for the software (180) implementing tools for adaptive panoramic video stream configuration, streaming, and/or playback, using overlapping sections and/or composite pictures.

An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computer system (100). Typically, operating system software (not shown) provides an operating environment for other software executing in the computer system (100), and coordinates activities of the components of the computer system (100).

The computer system (100) of FIG. 1 is a physical computer system. A virtual machine can include components organized as shown in FIG. 1.

The innovations can be described in the general context of computer-readable media. Computer-readable media are any available tangible media that can be accessed within a computing environment. By way of example, and not limitation, with the computer system (100), computer-readable media include memory (118, 120), storage (170), and combinations thereof. The term computer-readable media does not encompass transitory propagating signals or carrier waves.

The innovations can be described in the general context of computer-executable instructions being executed in a computer system on a target real or virtual processor. The computer-executable instructions can include instructions executable on processing cores of a general-purpose processor to provide functionality described herein, instructions executable to control a GPU or special-purpose hardware to provide functionality described herein, instructions executable on processing cores of a GPU to provide functionality described herein, and/or instructions executable on processing cores of a special-purpose processor to provide functionality described herein. In some implementations, computer-executable instructions can be organized in program modules. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computer system.

In general, a computer system or device can be local or distributed, and can include any combination of special-purpose hardware and/or hardware with software implementing the functionality described herein. For the sake of presentation, the detailed description uses terms like “determine,” “receive” and “provide” to describe computer operations in a computer system. These terms denote operations performed by a computer and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on implementation.

II. Example Network Environments

FIGS. 2a and 2b show example network environments (201, 202) that include video encoders (220) and video decoders (270). The encoders (220) and decoders (270) are connected over a network (250) using an appropriate communication protocol. The network (250) can include the Internet or another computer network.

In the network environment (201) shown in FIG. 2a, each real-time communication (“RTC”) tool (210) includes both one or more encoders (220) and one or more decoders (270) for bidirectional communication. Each RTC tool (210) is an example of a panoramic video stream configuration tool and a panoramic video playback tool. A given encoder (220) can produce output compliant with the H.265/HEVC standard, ISO/IEC 14496-10 standard (also known as H.264/AVC), another standard, or a proprietary format such as VP8 or VP9, or a variation or extension thereof, with a corresponding decoder (270) accepting and decoding encoded data from the encoder (220). The bidirectional communication can be part of a video conference, video telephone call, or other two-party or multi-party communication scenario. Although the network environment (201) in FIG. 2a includes two RTC tools (210), the network environment (201) can instead include three or more RTC tools (210) that participate in multi-party communication.

An RTC tool (210), as a panoramic video stream configuration tool, manages encoding by the encoder(s) (220) and also, as a panoramic video playback tool, manages decoding by the decoder(s) (270). FIG. 4 shows an example panoramic video stream configuration tool (400) that can be implemented in the RTC tool (210). FIG. 5 shows an example panoramic video playback tool (500) that can be implemented in the RTC tool (210). Alternatively, the RTC tool (210) uses another panoramic video stream configuration tool and/or another panoramic video playback tool.

In the network environment (202) shown in FIG. 2b, a panoramic video stream configuration tool (212) includes one or more encoders (220) that encode video for delivery to multiple panoramic video playback tools (214), which include decoders (270). The unidirectional communication can be provided for live broadcast video streaming, a video surveillance system, web camera monitoring system, remote desktop conferencing presentation or sharing, wireless screen casting, cloud computing or gaming, or other scenario in which panoramic video is encoded and sent from one location to one or more other locations. Although the network environment (202) in FIG. 2b includes two playback tools (214), the network environment (202) can include more or fewer playback tools (214). In general, a playback tool (214) communicates with the stream configuration tool (212) to determine one or more streams of video for the playback tool (214) to receive. The playback tool (214) receives the stream(s), buffers the received encoded data for an appropriate period, and begins decoding and playback.

The stream configuration tool (212) can include server-side controller logic for managing connections with one or more playback tools (214). A playback tool (214) can include client-side controller logic for managing connections with the stream configuration tool (212). FIG. 4 shows an example panoramic video stream configuration tool (400) that can be implemented in the stream configuration tool (212). Alternatively, the stream configuration tool (212) uses another panoramic video stream configuration tool. FIG. 5 shows an example panoramic video playback tool (500) that can be implemented in the playback tool (214). Alternatively, the playback tool (214) uses another panoramic video playback tool.

Alternatively, a Web server or other media server can store encoded video for delivery to one or more panoramic video playback tools (214), which include decoders (270). The encoded video can be provided, for example, for on-demand video streaming, broadcast, or another scenario in which encoded video is sent from one location to one or more other locations. A playback tool (214) can communicate with the media server to determine one or more streams of video for the playback tool (214) to receive. The media server can include server-side controller logic for managing connections with one or more playback tools (214). A playback tool (214) receives the stream(s), buffers the received encoded data for an appropriate period, and begins decoding and playback.

III. Example Projections for a Picture of Panoramic Video

Panoramic video (sometimes called 360-degree video, immersive video, or spherical video) is video in which views in multiple directions around a central position are recorded at the same time. A picture of panoramic video is a representation of the views in multiple directions recorded at a given time. The picture of panoramic video can include image content in every direction or substantially every direction from the central position. More commonly, a picture of panoramic video includes image content in every direction in a 360-degree circle around the central position, including at least some image content above the central position and at least some image content underneath the central view/camera position.

A picture of panoramic video includes sample values, which represent colors at locations of the picture. Depending on how the picture is projected, sample values of the picture can have various attributes. In general, sample values can have 8 bits per sample value, 10 bits per sample value, 12 bits per sample value, or some other number of bits per sample value. The dynamic range of sample values can be standard dynamic range (e.g., 0 to 100 nits), high dynamic range (e.g., 0 nits to 1000 nits, 0 nits to 1500 nits, 0 nits to 4000 nits), or some other dynamic range. With respect to color gamut, the sample values can have a narrow color gamut (common for standard dynamic range video) or a wider color gamut (common for high dynamic range video), which can potentially represent colors that are more saturated, or vivid. For a rectilinear projection, the spatial resolution of a picture of panoramic video can be 1280×720 sample values (so-called 720p), 1920×1080 sample values (so-called 1080p), 2160×1080 sample values, 3840×2160 (so-called 4K), 4320×2160 sample values, 7680×3840 sample values, 7680×4320 sample values (so-called 8K), 8640×4320 sample values, or some other number of sample values per picture. Often, the spatial resolution of a picture of panoramic video is very high (e.g., 8K or higher), so as to provide sufficient spatial resolution when a smaller view within the picture is rendered. In general, a pixel is the set of one or more collocated sample values for a location in a picture, which may be arranged in different ways for different chroma sampling formats. For a spherical projection, spatial resolution can vary.

Typically, before encoding in a rectilinear projection (e.g., an equirectangular projection), sample values of a picture are converted to a color space such as YUV, in which sample values of a luma (Y) component represent brightness or intensity values, and sample values of chroma (U, V) components represent color-difference values. The precise definitions of the color-difference values (and conversion operations between YUV color space and another color space such as RGB) depend on implementation. In general, as used herein, the term YUV indicates any color space with a luma (or luminance) component and one or more chroma (or chrominance) components, including Y′UV, YIQ, Y′IQ and YDbDr as well as variations such as YCbCr and YCoCg. Chroma sample values may be sub-sampled to a lower chroma sampling rate (e.g., for a YUV 4:2:0 format) in order to reduce the spatial resolution of chroma sample values, or the chroma sample values may have the same resolution as the luma sample values (e.g., for a YUV 4:4:4 format). After decoding, sample values in a rectilinear projection may be converted to another color space, such as an RGB color space. Sample values in a spherical projection or screen projection for a picture of panoramic video may be in an RGB color space or other color space.
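
As a concrete illustration of this color handling, the Python sketch below converts full-range RGB sample values to Y'CbCr using the BT.709 luma coefficients and reduces the chroma planes to a 4:2:0 layout by 2x2 averaging; the specific matrix, value range, and subsampling filter are implementation choices, not requirements of the patent.

import numpy as np

def rgb_to_ycbcr_bt709(rgb):
    # rgb: (H, W, 3) array of full-range values in [0.0, 1.0].
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b   # luma (brightness/intensity)
    cb = (b - y) / 1.8556                       # blue-difference chroma
    cr = (r - y) / 1.5748                       # red-difference chroma
    return y, cb, cr

def subsample_420(chroma):
    # Average each 2x2 block to halve the chroma resolution in both dimensions,
    # as in a YUV 4:2:0 format.
    h, w = chroma.shape[0] & ~1, chroma.shape[1] & ~1
    c = chroma[:h, :w]
    return 0.25 * (c[0::2, 0::2] + c[0::2, 1::2] + c[1::2, 0::2] + c[1::2, 1::2])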

The image content for a picture of panoramic video can be organized in various ways. FIG. 3a shows a spherical projection (301) for a picture of panoramic video. In the spherical projection (301), sample values are mapped to locations equally distant from a central view/camera position. Sample values may be in an RGB color space or other color space close to the final color space for rendering. The spherical projection (301) provides a conceptually simple way to represent the sample values of the picture of panoramic video, and may be useful for some modeling and rendering operations. For other stages of processing (e.g., storage, compression, decompression), however, the spherical projection (301) may not be as efficient as other types of projections.

FIG. 3b shows an equirectangular projection (302) for a picture of panoramic video. The equirectangular projection (302) is a useful representation for storing, compressing, and decompressing sample values of the picture of panoramic video. In particular, sample values of the equirectangular projection (302) can be processed with conventional video coding/decoding tools, which process blocks of sample values in rectangular pictures. The equirectangular projection (302) depicts image content in 360 degrees, rotating sideways from a central view/camera position, along the horizontal axis that bisects the equirectangular projection (302); it depicts image content in 180 degrees, rotating up or down from a central view/camera position, along the vertical axis. In the equirectangular projection (302), content towards the top of the picture and content towards the bottom of the picture is stretched horizontally, and content midway between the top and bottom is squeezed horizontally. In addition to causing visible distortion (which is not a problem to the extent the equirectangular projection (302) is not directly rendered for display), the equirectangular projection (302) uses extra sample values to represent the content towards the top of the picture and content towards the bottom of the picture, which can decrease compression efficiency. Metadata associated with the equirectangular projection (302) can indicate resolution of the equirectangular projection (302) as well as a view direction at each of one or more locations of the equirectangular projection (302) (e.g., view direction at the center of the equirectangular projection (302), view direction at the midpoint of the vertical axis along an edge of the equirectangular projection (302)). Or, a default view direction for a location of the equirectangular projection (302) can be defined. For example, the center of the equirectangular projection (302) is defined to be the view direction with pan of zero degrees and pitch of zero degrees.
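
For reference, the mapping from a view direction to a location in an equirectangular picture is straightforward; the short Python sketch below assumes the convention stated above, with the picture center at a pan of zero degrees and a pitch of zero degrees.

def equirect_location(pan_deg, pitch_deg, width, height):
    # pan_deg:   rotation sideways from the central view direction, in [-180, 180).
    # pitch_deg: rotation up (+) or down (-) from the central view direction, in [-90, 90].
    # Returns fractional (x, y) coordinates in a width x height equirectangular picture.
    x = (pan_deg / 360.0 + 0.5) * width
    y = (0.5 - pitch_deg / 180.0) * height
    return x, y

For example, equirect_location(0, 0, 7680, 3840) returns the picture center (3840.0, 1920.0); as the pitch approaches +90 or -90 degrees, every pan value maps near the same row at the top or bottom of the picture, which corresponds to the horizontal stretching noted above.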

FIG. 3c shows a sinusoidal projection (303) for a picture of panoramic video. The sinusoidal projection (303) is another useful representation for storing, compressing, and decompressing sample values of the picture of panoramic video. A sinusoidal projection is a pseudo-cylindrical, equal-area map projection, in which scale is constant along a central meridian (or multiple central meridians), and horizontal scale is constant throughout the map. A sinusoidal projection can have a single fold (single central meridian) or multiple folds (multiple central meridians). For example, a bi-fold sinusoidal projection can have two central meridians of equal length, with the two folds corresponding to hemispheres of the map. Thus, the sinusoidal projection (303) depicts image content in 360 degrees, rotating sideways from a central view/camera position, along the horizontal axis that bisects the sinusoidal projection (303); it depicts image content in 180 degrees, rotating up or down from a central view/camera position, along the vertical axis. Unlike the equirectangular projection (302), in the sinusoidal projection (303), content towards the top of the picture and content towards the bottom of the picture is not stretched horizontally, and content midway between the top and bottom is not squeezed horizontally. The sinusoidal projection (303) uses extra sample values having default values (e.g., black, gray) to represent areas outside the actual content, towards the top or bottom of the picture. Although this approach results in some sample values not being used to represent actual coded panoramic video, compression efficiency still tends to be better than with the equirectangular projection (302). Metadata associated with the sinusoidal projection (303) can indicate resolution of the sinusoidal projection (303) as well as a view direction at each of one or more locations of the sinusoidal projection (303) (e.g., view direction at the center of the sinusoidal projection (303), view direction at the midpoint of the vertical axis along an edge of the sinusoidal projection (303)). Or, a default view direction for a location of the sinusoidal projection (303) can be defined. For example, the center of the sinusoidal projection (303) is defined to be the view direction with pan of zero degrees and pitch of zero degrees.
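
The corresponding mapping for a single-fold sinusoidal projection scales the horizontal coordinate by the cosine of the latitude, which is what keeps areas equal and leaves unused default-valued samples toward the top and bottom corners. A minimal Python sketch, under the same center convention as above, follows.

import math

def sinusoidal_location(pan_deg, pitch_deg, width, height):
    # Single-fold sinusoidal projection with the center at pan = 0, pitch = 0.
    lon = math.radians(pan_deg)       # [-pi, pi)
    lat = math.radians(pitch_deg)     # [-pi/2, pi/2]
    x = (0.5 + (lon * math.cos(lat)) / (2.0 * math.pi)) * width
    y = (0.5 - lat / math.pi) * height
    return x, y

Near the top and bottom of the picture, cos(lat) approaches zero, so the actual content collapses toward the central meridian and the remaining samples in each row hold default values.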

FIG. 3d shows a cubemap projection (304) for a picture of panoramic video. Like the equirectangular projection (302) and sinusoidal projection (303), the cubemap projection (304) is a useful representation for storing, compressing, and decompressing sample values of the picture of panoramic video, because the faces of the cubemap projection (304) can be “unfolded” and/or split into separate sections for such operations. In the cubemap projection (304), content towards the edges of faces of a cube is stretched horizontally and/or vertically, and content towards the middle of faces is squeezed horizontally and/or vertically. In general, the extent of such stretching is less than at the top and bottom of the equirectangular projection (302), and the cubemap projection (304) may use fewer extra sample values to represent stretched content. Metadata associated with the cubemap projection (304) can indicate resolution of the cubemap projection (304) as well as a view direction at each of one or more locations of the cubemap projection (304). Or, default view directions for locations of the cubemap projection (304) can be defined.

During playback, pictures of panoramic video are reconstructed. At least conceptually, a picture may be represented in spherical projection at this stage. Typically, a viewer can control a view direction relative to the central view/camera position for the spherical projection, potentially changing which section of the panoramic video is viewed. For example, in addition to specifying heading in degrees or radians from side to side (i.e., yaw, or pan) for a view direction, the viewer can specify an inclination in degrees or radians up or down (i.e., pitch, or tilt) for the view direction and even a rotation in degrees or radians of the view (i.e., roll) for the view direction. Alternatively, the view direction can be parameterized in some other way (e.g., as a matrix of affine transform coefficients that specify a spatial rotation in three dimensions using Euler angles or quaternion units, corresponding to heading, pitch, and roll values). The viewer may also be able to zoom in or zoom out. A field of view can be specified in degrees (e.g., 90 degrees for normal view, 120 degrees for wide view) or radians. When a view of panoramic video is rendered for display, the section of the panoramic video that is viewed may be projected to a flat image, which is called a screen projection.

FIG. 3e shows an example of screen projection for a view of a picture of panoramic video. An equirectangular projection (302) of the picture is reconstructed, e.g., through video decoding operations and color conversion operations. The sample values of the picture of panoramic video are mapped to the spherical projection (303). In essence, the sample values are projected to the “inside” of the sphere for the spherical projection (303), as viewed from the perspective of a view/camera position at the center of the sphere. Locations in the spherical projection (303) are mapped to corresponding locations in the equirectangular projection (302). If a corresponding location in the equirectangular projection (302) is at or near an integer (whole pixel) offset, the sample value from the corresponding location is assigned to the location in the spherical projection (303). Otherwise, a sample value can be calculated by interpolation between sample values at nearby locations in the equirectangular projection (302) (e.g., using bilinear interpolation), and the (interpolated) sample value is assigned to the location in the spherical projection (303).

A view window (310) in the spherical projection (303) is found, based on a view direction, zoom factor, and field of view from the central view/camera position. The view window (310) is projected to a screen projection (320) for rendering. For example, a perspective transform is applied to assign sample values to the respective locations of the screen projection (320) from the sample values of the spherical projection (303). For every location of the screen projection (320), a sample value is assigned directly from the spherical projection (303) or from interpolation between sample values of the spherical projection (303). Thus, the screen projection (320) includes sample values from the spherical projection (303) and, by extension, sample values from relevant parts of the equirectangular projection (302).
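
The following Python sketch renders such a screen projection directly from a reconstructed equirectangular picture. It assumes a simple pinhole camera, ignores roll, and uses nearest-neighbor lookup where the description above would use bilinear interpolation; none of these choices is prescribed by the patent.

import numpy as np

def render_view(equirect, pan_deg, pitch_deg, fov_deg, out_h, out_w):
    # Render a flat view (screen projection) from an (H, W, 3) equirectangular
    # picture whose center corresponds to pan = 0, pitch = 0.  A pinhole camera
    # with horizontal field of view fov_deg is assumed; roll is omitted.
    H, W, _ = equirect.shape
    pan, pitch = np.radians(pan_deg), np.radians(pitch_deg)
    f = 0.5 * out_w / np.tan(0.5 * np.radians(fov_deg))   # focal length in pixels

    # Camera-space ray for every output pixel (x right, y up, z forward).
    xs = np.arange(out_w) - 0.5 * (out_w - 1)
    ys = 0.5 * (out_h - 1) - np.arange(out_h)
    px, py = np.meshgrid(xs, ys)
    rays = np.stack([px, py, np.full_like(px, f)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)

    # Rotate rays by pitch (about the x axis), then by pan (about the y axis).
    c_pit, s_pit = np.cos(pitch), np.sin(pitch)
    c_pan, s_pan = np.cos(pan), np.sin(pan)
    rot_pitch = np.array([[1, 0, 0], [0, c_pit, s_pit], [0, -s_pit, c_pit]])
    rot_pan = np.array([[c_pan, 0, s_pan], [0, 1, 0], [-s_pan, 0, c_pan]])
    rays = rays @ (rot_pan @ rot_pitch).T

    # Convert each ray to longitude/latitude, then to source coordinates.
    lon = np.arctan2(rays[..., 0], rays[..., 2])          # [-pi, pi]
    lat = np.arcsin(np.clip(rays[..., 1], -1.0, 1.0))     # [-pi/2, pi/2]
    src_x = np.clip(np.round((lon / (2 * np.pi) + 0.5) * (W - 1)).astype(int), 0, W - 1)
    src_y = np.clip(np.round((0.5 - lat / np.pi) * (H - 1)).astype(int), 0, H - 1)

    # Nearest-neighbor lookup; bilinear interpolation would refine this.
    return equirect[src_y, src_x]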

IV. Examples of Identifying Sections of Pictures in Input Projections

When an application provides a view direction, field of view (if not pre-defined), and zoom factor (if configurable) for rendering a view of a picture of panoramic video, the application specifies a view window to be rendered. For example, an application provides an indication of view direction to a module of a panoramic video playback tool. The view direction can be specified as (1) a heading in degrees or radians from side to side (i.e., yaw, or pan) from a central view/camera position and (2) an inclination in degrees or radians up or down (i.e., pitch, or tilt) from the view/camera position. The view direction can also include (3) a rotation in degrees or radians of the view (i.e., roll) from the view/camera position. Alternatively, the view direction can be parameterized in some other way (e.g., as a matrix of affine transform coefficients that specify a spatial rotation in three dimensions using Euler angles or quaternion units, which correspond to heading, pitch, and roll values). The field of view can be specified in degrees (e.g., 90 degrees for normal view, 120 degrees for wide view) or radians. A zoom factor can be specified as a distance from a view camera position, size of view window, or in some other way. Alternatively, instead of directly providing indications of view direction (and possibly field of view and zoom factor), an application can specify a source for indications of view direction (and possibly field of view and zoom factor), in which case the specified source provides the indications during rendering. In any case, the module of the panoramic video playback tool finds the appropriate view window for a spherical projection of the picture of panoramic video.

The view window typically includes a small proportion of the overall content of a picture of panoramic video. To simplify processing and save resources during operations such as retrieval and decoding of encoded data, a panoramic video playback tool can identify one or more sections of an input picture, in an input projection (such as an equirectangular projection, cubemap projection, sinusoidal projection, or other projection), that contain the view window, then use that information to limit which operations are performed when reconstructing the picture of panoramic video. In particular, the panoramic video playback tool can limit operations to the identified section(s) of the picture in the input projection.

For example, a panoramic video playback tool finds a view window of a spherical projection based on a view direction (and field of view and zoom factor, which may be pre-defined). Based on the view window, the playback tool identifies one or more sections of an input picture (in an input projection such as an equirectangular projection, cubemap projection, or sinusoidal projection) that contain the view window of the spherical projection. Given a view window of the spherical projection, the playback tool can project from the spherical projection back to the input projection to identify a corresponding window in the input picture of panoramic video, then identify those sections in the input picture that include any part of the corresponding window. The corresponding window in the input picture can have an irregular boundary and be split (e.g., across an edge). In this way, the playback tool can identify any section of the picture that contains at least part of the view window.
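
A minimal Python sketch of this identification step follows. It assumes the input picture is split into a regular grid of equirectangular sections and that the playback tool samples directions along the boundary of the view window (corners plus a few edge points); these are assumptions for illustration rather than details taken from the patent.

def sections_for_view(boundary_dirs, grid_cols, grid_rows):
    # boundary_dirs: iterable of (pan_deg, pitch_deg) pairs sampled along the
    #                boundary of the view window.
    # Returns the set of (col, row) indices of grid sections that contain at
    # least one sampled direction.
    needed = set()
    for pan_deg, pitch_deg in boundary_dirs:
        u = (pan_deg / 360.0 + 0.5) % 1.0               # wrap at the +/-180 degree seam
        v = min(max(0.5 - pitch_deg / 180.0, 0.0), 1.0)
        col = min(int(u * grid_cols), grid_cols - 1)
        row = min(int(v * grid_rows), grid_rows - 1)
        needed.add((col, row))
    return needed

Because the corresponding window can be split across the left and right edges of the picture, the identified sections need not be contiguous, which matches the behavior described above.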

Typically, the identified section(s) are aligned with boundaries of groups of sample values (e.g., blocks, slices, tiles) for different operations in the reconstruction process. Depending on the position and size of the view window, one section of the picture may include the entire view window. Or, multiple sections of the picture may collectively include the view window. The multiple sections can be contiguous or, if the view window crosses an edge of the picture, be non-contiguous. The playback tool can limit operations (such as retrieval of encoded data, decoding of encoded data, and creation of output pictures) to the identified section(s).

V. Example Architectures for Adaptive Streaming of Panoramic Video

When a panoramic video stream configuration tool receives input pictures of panoramic video, the stream configuration tool produces encoded data for the panoramic video in multiple bitstreams. When a panoramic video playback tool receives encoded data for panoramic video, the playback tool renders views of the panoramic video. This section describes various aspects of example architectures for stream configuration and example architectures for playback of panoramic video, including use of overlapping sections and composite pictures.

Panoramic video can be produced and streamed for various use case scenarios. For example, panoramic video can be produced and streamed for a live event such as a concert or sporting event. Or, as another example, panoramic video can be produced and streamed for an immersive experience for education, virtual travel, or a virtual walk-through for a real estate listing. Or, as another example, panoramic video can be produced and streamed for conferencing or tele-medicine. Or, as another example, panoramic video can be produced and streamed for immersive gameplay broadcasting.

Panoramic video can be played back in various ways. For example, panoramic video can be played back through a Web browser or video playback application, executing on a game console, desktop computer, or other computing platform. Or, as another example, panoramic video can be played back through a mobile device or head-mounted display for a VR or AR application.

In some configurations, a single entity manages end-to-end behavior of a panoramic video stream configuration tool and one or more panoramic video playback tools. In such configurations, the stream configuration tool and playback tool(s) can exchange information about partitioning of input pictures into sections, organization of composite pictures, stream selection decisions, etc. in one or more private channels. In alternative configurations, the panoramic video stream configuration tool and panoramic video playback tool(s) are managed by different entities. In such configurations, the stream configuration tool and playback tool(s) can interoperate across standardized interfaces, according to defined protocols, to exchange information about partitioning of input pictures into sections, organization of composite pictures, stream selection decisions, etc.

A. Example Stream Configuration Architectures

FIG. 4 shows an example architecture for a panoramic video stream configuration tool (400) that supports overlapping sections and composite pictures. In addition to a video source (410) and a media server (490), the example architecture includes a panoramic video stream configuration tool (400) with an input buffer (430), a formatter (440), one or more video encoders (460), and one or more output buffers (470).

The video source (410) provides input pictures (420) of panoramic video to the input buffer (430). For example, the video source (410) includes a buffer associated with an omnidirectional camera, which produces input pictures (420) of panoramic video. Alternatively, the video source (410) includes buffers associated with a collection of cameras, which produce pictures taken in different directions at a location, and a buffer that stores input pictures (420) of panoramic video aggregated, mosaicked, composited, etc. from the pictures produced by the cameras. The cameras can be physical cameras that record natural video or virtual cameras that record video in a synthetic environment (e.g., game environment). Alternatively, the stream configuration tool (400) can itself create the input pictures (420) of panoramic video, which are stored in the input buffer (430), from pictures of streams that the stream configuration tool (400) receives. The panoramic video stream configuration tool (400) can be implemented at a content production site, co-located with the video source (410) or cameras. Alternatively, the panoramic video stream configuration tool (400) can be implemented at a remote site (e.g., Web server), with the video source (410) providing input pictures (420) of panoramic video to the configuration tool (400) over a network, or cameras providing streams of video to the configuration tool (400) over a network.

The input buffer (430) is configured to receive and store one or more input pictures (420) of panoramic video. Typically, an input picture (420) is in an input projection. For example, the input projection can be an equirectangular projection, cubemap projection, sinusoidal projection, or other type of projection. In some example implementations, an input picture (420) has a spatial resolution of 4K or higher. Alternatively, an input picture (420) can have a lower spatial resolution.

The formatter (440) is configured to split each input picture (420) into multiple sections (445) (n sections) according to partition settings. The value of n depends on implementation. For example, n is 6, 8, 12, or 16. A data store (not shown) can store various settings for the panoramic video stream configuration tool (400). For example, the settings can include partition settings used to split input pictures (420) of panoramic video into sections (445). The partition settings can include the count n of sections (445) into which input pictures (420) are partitioned, the relative sizes and positions of the sections (445), and (for overlapping sections) the extent of overlap between sections (445). The spatial resolution of the sections (445) depends on implementation. In some example implementations, the sections (445) each have a spatial resolution of 1080p, 720p, or some other resolution that is readily accepted by the video encoder(s) (460) and large enough to contain the content for a typical view window in playback, but small enough to exclude content of the panoramic video outside of a typical view window (to avoid unnecessary retrieval and reconstruction of content during playback).

In some configurations, the n sections (445) are non-overlapping. In other configurations, the n sections (445) are overlapping. That is, each of the n sections (445) overlaps at least one other section among the n sections (445). In some example implementations, each of the n sections (445) overlaps each adjacent section among the n sections. The overlapping of the sections (445) tends to decrease overall compression efficiency (because the same sample values may be redundantly encoded in different sections). On the other hand, the overlapping of the sections (445) tends to reduce the incidence of disruption of playback caused by bitstream switching. The formatter (440) is configured to add the n sections (445) to corresponding video streams. In FIG. 4, there are n streams for the n sections (445), which are labeled 0 . . . n−1.
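
As an illustration of such partition settings, the Python sketch below computes overlapping section rectangles for a regular cols x rows grid; the overlap amount, grid shape, and clamping at the picture edges are assumptions made for the sketch, and a real tool might instead wrap sections across the horizontal seam.

def overlapping_sections(width, height, cols, rows, overlap):
    # Returns (left, top, right, bottom) rectangles, one per section, where each
    # section is extended by `overlap` samples into its neighbors and clamped at
    # the picture edges.
    base_w, base_h = width // cols, height // rows
    sections = []
    for r in range(rows):
        for c in range(cols):
            left = max(c * base_w - overlap, 0)
            top = max(r * base_h - overlap, 0)
            right = min((c + 1) * base_w + overlap, width)
            bottom = min((r + 1) * base_h + overlap, height)
            sections.append((left, top, right, bottom))
    return sections

For example, overlapping_sections(7680, 3840, 4, 2, 128) yields n = 8 overlapping sections, with 256 samples shared between horizontally adjacent interior sections.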

The formatter (440) can be configured to project the input picture (420) from an input projection to an intermediate projection, such that the n sections (445) are in the intermediate projection. For example, the input projection is an equirectangular projection or a cubemap projection, and the intermediate projection is a sinusoidal projection. In this case, at least one of the n sections (445) includes at least some sample values having default values, not representing content of the input picture of panoramic video.

The formatter (440) can be configured to receive an indication of feedback and, based at least in part on the indication of feedback, adjust the partition settings. For example, the indication of feedback includes an indication of network connection quality, an indication of magnitude of view window change activity, an indication of which view direction is prevalent, and/or some other type of feedback. To adjust the partition settings, the formatter (440) can be configured to change an extent of overlap between overlapping sections, change a count of the n sections (445), change relative sizes of at least some of the n sections (445), change positions of at least some of the n sections (445), add one or more sections, at new positions, to the n sections (445), remove one or more sections from the n sections (445), and/or make some other change to the partition settings.

In some configurations, the formatter (440) is configured to create a low-resolution version of the input picture (420). For example, the formatter (440) downsamples the input picture (420) horizontally and/or vertically. The low-resolution version of the input picture (420) can have a width the same as one of the n sections (e.g., 1920 sample values for a 1080p section, 1280 sample values for a 720p section). The height of the input picture (420) can be reduced proportionally. The formatter (440) is further configured to, after splitting the input picture (420) into n sections (445) (which can be overlapping or non-overlapping, depending on implementation) according to partition settings, create n composite pictures (446). Each of the n composite pictures (446) includes one of the n sections (445) and also includes the low-resolution version of the input picture (420). The formatter (440) is configured to add the n composite pictures (446), including the n sections (445), respectively, to corresponding video streams.

A composite picture (446) can be organized in various ways. For example, for each of the n composite pictures (446), the low-resolution version of the input picture (420) is adjacent one of the n sections (445) within the composite picture (446). Or, as another example, for each of the n composite pictures (446), one of the n sections (445) provides a first view of a frame packing arrangement, and the low-resolution version of the input picture (420) provides a second view of the frame packing arrangement. Within a composite picture (446), the low-resolution version of the input picture (420) can be positioned at a pre-defined location relative to one of the n sections (445). Alternatively, within a composite picture (446), the low-resolution version of the input picture (420) can be positioned at a variable location relative to one of the n sections (445).
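For concreteness, one way a composite picture could be packed (top-to-bottom, with the low-resolution version below the section) is sketched next; the make_composite name, the naive decimation, and the scale factor derivation are illustrative assumptions, not part of the described design:

    import numpy as np

    def make_composite(section: np.ndarray, input_picture: np.ndarray) -> np.ndarray:
        """Stack one high-resolution section above a low-resolution copy of
        the whole input picture to form a composite picture."""
        sec_w = section.shape[1]
        in_w = input_picture.shape[1]
        scale = max(1, in_w // sec_w)               # shrink width to match the section
        low_res = input_picture[::scale, ::scale]   # naive decimation; real code would filter
        low_res = low_res[:, :sec_w]                # trim to the exact section width
        return np.vstack([section, low_res])        # section on top, low-res copy below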

The input picture (420) and the low-resolution version of the input picture (420) can be in an input projection, such as an equirectangular projection or a cubemap projection. The formatter (440) can be further configured to project the input picture (420) from the input projection to an intermediate projection, such as a sinusoidal projection. In a composite picture (446), the low-resolution version of the input picture (420) can be in the input projection or the intermediate projection.

The video encoder(s) (460) are configured to encode sample values of the n sections (445) or n composite pictures (446) in the corresponding video streams, respectively. The sample values are, for example, 8-bit sample values or 10-bit sample values in a YUV color space, with a chroma sampling rate of 4:2:0. Alternatively, the sample values encoded by the video encoder(s) (460) are in another format. The encoding produces encoded data (465) for the n sections (445) or n composite pictures (446) as part of n bitstreams for the corresponding video streams, respectively. For example, the n bitstreams are video elementary bitstreams. Depending on implementation and the format of the encoded data, the video encoder(s) (460) can produce encoded data conformant to the H.265/HEVC standard, ISO/IEC 14496-10 standard (also known as H.264/AVC), another standard, or a proprietary format such as VP8 or VP9, or a variation or extension thereof. The stream configuration tool (400) can include a multiplexer (not shown) configured to combine the encoded data, for the n bitstreams, into a single container stream.

The formatter (440) is further configured to produce one or more manifest files (442). The manifest file(s) (442) include information indicating, for each of the n bitstreams, the position (e.g., in coordinates of the input picture (420), or in coordinates of a spherical projection) of one of the n sections (445) whose content is part of the corresponding video stream for that bitstream. The manifest file(s) (442) can also include information that indicates where the low-resolution version of the input picture (420) is positioned in the n composite pictures (446), respectively.
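A manifest might be organized along the following lines (shown here as a Python dictionary; the field names, coordinate ranges, and low-resolution offset are hypothetical and only illustrate the kind of per-bitstream position information described above):

    manifest = {
        "bitstreams": [
            {
                "id": 0,
                "phi_range": [-90.0, -30.0],     # spherical coordinates (degrees)
                "theta_range": [0.0, 120.0],     # covered by this section
                "low_res_offset": [0, 1080],     # where the low-resolution copy sits
            },                                   # in the composite picture, if used
            # ... one entry per bitstream, 0 through n-1
        ],
    }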

The output buffer(s) (470) are configured to store the encoded data (465) for delivery to the media server (490). The output buffer(s) can also store the manifest file(s) (442) for delivery to the media server (490). The media server (490) can be a Web server or other server, connected over a network, that stores encoded data (465) for the n streams of sections (or composite pictures) of the panoramic video and streams the encoded data (465) for selected bitstreams to playback tools for playback.

Depending on implementation and the type of processing desired, modules of the panoramic video stream configuration tool (400) can be added, omitted, split into multiple modules, combined with other modules, and/or replaced with like modules. In alternative embodiments, panoramic video stream configuration tools with different modules and/or other configurations of modules perform one or more of the described techniques. Specific embodiments of panoramic video stream configuration tools typically use a variation or supplemented version of the panoramic video stream configuration tool (400). The relationships shown between modules within the panoramic video stream configuration tool (400) indicate general flows of information in the panoramic video stream configuration tool (400); other relationships are not shown for the sake of simplicity.

In general, a given module of the panoramic video stream configuration tool (400) can be implemented by software executable on a CPU, by software controlling special-purpose hardware (e.g., a GPU or other graphics hardware for video acceleration), or by special-purpose hardware (e.g., in an ASIC). In particular, in some example implementations, video encoding operations and re-projection operations to map sample values between different projections are implemented with shader instructions executable on a GPU. Thus, computationally-intensive, repetitive operations (e.g., for video encoding, for mapping between different types of projections when splitting input pictures into sections) are likely to be implemented with graphics hardware (e.g., as shader instructions for a GPU) or other special-purpose hardware, and higher-level operations (e.g., deciding how to partition input pictures) are likely to be implemented in software executable on a CPU.

B. Example Playback Architectures

FIG. 5 shows an example architecture for a panoramic video playback tool (500) that supports overlapping sections and composite pictures. In addition to a media server (530), application (580), and display device (590), the example architecture includes a panoramic video playback tool (500) with a view controller (510), a streaming controller (520), one or more input buffers (540), one or more video decoders (550), a mapper (570), and one or more output buffers (585).

The application (580) can be provided by a third party or packaged as part of the panoramic video playback tool (500). The application (580) can be separated from other modules of the panoramic video playback tool (500) (system-provided modules) by an application programming interface (“API”).

A data store (not shown) can store various settings for the panoramic video playback tool (500). For example, the settings can include information provided by the application (580) when the application (580) is installed. Other modules can interact with the data store across an interface.

The view controller (510) is configured to determine a view window for playback of panoramic video. For example, the view window depends on a view direction. The view controller (510) is configured to receive an indication of a view direction (582) for the application (580). In FIG. 5, the application (580) provides the indication of the view direction (582). Instead of the application (580), another source (e.g., a source based on one or more sensors such as one or more accelerometers, gyroscopes, tilt sensors, optical sensors, cameras, etc., or a source of user input events for key presses, mouse cursor movements, mouse scroll wheel movements, remote control input, game controller input, touch screen input, etc.) can provide the indication of the view direction (582). For example, the view direction (582) is parameterized as described in section IV. The view window can also depend on a field of view and/or zoom factor. In some configurations, the view controller (510) is also configured to receive an indication of a field of view (584) for the application (580), from the application (580) or another source. For example, the field of view (584) is parameterized as described in section IV. The field of view (584) can be defined for the application (580) or for a playback session. In some configurations, the view controller (510) is also configured to receive an indication of a zoom factor for the application (580), from the application (580) or another source. For example, the zoom factor is parameterized as described in section IV. Typically, the zoom factor can change dynamically (e.g., to zoom in or zoom out) during a playback session.

The view controller (510) is configured to receive one or more manifest files (542) and use the manifest file(s) (542) to identify one or more sections that contain at least part of the view window and/or select one or more bitstreams for the identified section(s). The manifest file(s) (542) are provided from an input buffer (540), which can receive the manifest file(s) (542) from the media server (530) or directly from a stream configuration tool. The manifest file(s) (542) include information indicating, for each of n bitstreams, the position (e.g., in coordinates of the input picture (420), or in coordinates of a spherical projection) of one of n sections whose content is part of the corresponding video stream. The manifest file(s) (542) can also include information that indicates where the low-resolution version of an input picture is positioned in composite pictures (556), respectively.

The view controller (510) is also configured to, from among multiple sections (n sections) of the panoramic video, identify one or more sections that contain at least part of the view window. For example, the view controller (510) is configured to identify each of the n sections that contains at least part of the view window. The view controller (510) can identify the section(s) that each contain at least part of the view window as described in section IV. Or, the view controller (510) can simply use position information (e.g., coordinates of a spherical projection) in the manifest file(s) (542) to identify the section(s) that each contain at least part of the view window. The identified section(s) can be contiguous sections for an input picture in an input projection (e.g., equirectangular projection, cubemap projection, sinusoidal projection). Or, the identified section(s) can be non-contiguous sections that wrap around one or more edges of an input picture in an input projection. The view controller (510) is further configured to, for the identified section(s), select one or more bitstreams among n bitstreams for corresponding video streams. The view controller (510) is configured to pass a control signal (512) to the streaming controller (520) that indicates the selected bitstream(s). In this way, the view controller (510) can iteratively perform operations to determine the view window, identify the section(s) that contain at least part of the view window, and select the bitstream(s) for the identified section(s).
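A minimal sketch of the identification step, assuming the hypothetical manifest layout shown earlier and ignoring wrap-around at the edges of the panorama, might test the view window's angular extent against each section's range:

    def ranges_overlap(a, b):
        """True if the closed intervals a = [a0, a1] and b = [b0, b1] overlap."""
        return a[0] <= b[1] and b[0] <= a[1]

    def identify_sections(view_phi_range, view_theta_range, manifest):
        """Return the ids of sections whose angular range overlaps the view window."""
        hits = []
        for entry in manifest["bitstreams"]:
            if (ranges_overlap(view_phi_range, entry["phi_range"])
                    and ranges_overlap(view_theta_range, entry["theta_range"])):
                hits.append(entry["id"])
        return hits

A real view controller would also handle view windows that wrap around one or more edges of the input projection, as noted above.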

In some configurations, the n sections are non-overlapping. In other configurations, the n sections are overlapping. That is, each of the n sections overlaps at least one other section among the n sections. In some example implementations, each of the n sections overlaps each adjacent section among the n sections. The overlapping of the n sections tends to decrease overall compression efficiency (because the same sample values may be redundantly encoded in different sections). On the other hand, the overlapping of the n sections also tends to reduce the incidence of playback disruption attributable to bitstream switching. In particular, when there is gradual panning motion out of a section or zooming to a new section, the overlapping of the n sections gives a way to render high-resolution views quickly and without playback disruption. Although the playback tool (500) can switch to a new bitstream for a new section, stream switching can take time (e.g., to send the request for the new bitstream to the media server (530), and to wait for a switch point at which decoding can begin in the new bitstream). For example, stream switching can take 3-5 seconds. With overlapping sections, if the view direction and/or zoom factor changes, the view controller (510) can identify new sections/streams that are to be used to create an output picture (575) for the view window sooner. Because of overlap between adjacent sections, for changes in view direction and/or zoom factor that are gradual and consistent, the view controller (510) in effect pre-fetches a new section as the view window moves out of a current section into the new section. By the time the view window reaches a non-overlapping part of the adjacent, new section, content for the adjacent, new section has already been retrieved and reconstructed, assuming the motion of the view window is not extreme. This hides network latency and stream switching latency from the viewer, and disruption of playback is avoided.

The view controller (510) can be configured to send an indication of feedback (e.g., to a stream configuration tool or to an aggregator for feedback). The feedback can then be used to adjust the partition settings applied when splitting an input picture into sections. For example, the indication of feedback includes an indication of network connection quality, an indication of magnitude of view window change activity, an indication of which view direction is prevalent, and/or another type of feedback.

The view controller (510) is configured to provide an indication (515) of the identified section(s) to the mapper (570). The mapper (570) can use the indication (515) of the identified section(s), as well as the manifest file(s) (542), when creating an output picture (575) for the view window.

The streaming controller (520) is configured to request encoded data, in the selected bitstream(s) for the identified section(s), respectively, for an input picture of the panoramic video. Depending on configuration, the streaming controller (520) can send a request (522) for encoded data (532) to the media server (530), directly to a panoramic video stream configuration tool, or to a local media store (531). The streaming controller (520) can make separate requests (522) for encoded data (532) for different portions (e.g., slices, tiles) of an input picture or for each input picture, or it can batch requests.

The media server (530) can be a Web server or other server, connected over a network, that is configured to store encoded data (532) for the n bitstreams for sections of the panoramic video, and stream the encoded data (532) for selected ones of the n bitstreams to playback tools for playback. In the scenario shown in FIG. 5, the media server (530) streams encoded data (532) for one or more selected bitstreams, which correspond to the identified section(s) that contain a view window for playback.

If a media server (530) is not used, the panoramic video playback tool (500) can retrieve encoded data (532) for the selected bitstream(s) from a media store (531). The media store (531) can be a magnetic disk, optical storage media, non-volatile memory, or other storage or memory, connected locally to the panoramic video playback tool (500), that is configured to store encoded data (532) for panoramic video, and provide it for playback.

In some configurations, each of the identified section(s) is part of a composite picture (556) that also includes a low-resolution version of the input picture. The low-resolution version typically results from downsampling the input picture horizontally and/or vertically. The low-resolution version of the input picture can have a width the same as one of the n sections (e.g., 1920 sample values for a 1080p section, 1280 sample values for a 720p section), with the height of the input picture reduced proportionally.

A composite picture (556) can be organized in various ways. For example, for each of n composite pictures (556), the low-resolution version of the input picture is adjacent one of the n sections within the composite picture (556). Or, as another example, for each of n composite pictures (556), one of the n sections (555) provides a first view of a frame packing arrangement, and the low-resolution version of the input picture provides a second view of the frame packing arrangement. Within a composite picture (556), the low-resolution version of the input picture can be positioned at a pre-defined location relative to one of the n sections (555). Alternatively, within a composite picture (556), the low-resolution version of the input picture can be positioned at a variable location relative to one of the n sections (555).

The input buffer(s) (540) are configured to store the encoded data (532) for the selected bitstream(s). One of the input buffer(s) can also store the manifest file(s) (542), which may be provided by the media server (530), local media store (531), or a stream configuration tool. The input buffer(s) (540) are configured to provide encoded data (532) for selected bitstreams(s) to the video decoder(s) (550).

The video decoder(s) (550) are configured to decode the encoded data (532) to reconstruct the identified section(s) for the input picture, producing sample values for one or more reconstructed sections (555) from the corresponding video streams. When the selected bitstream(s) include composite picture(s) (556), the video decoder(s) are also configured to decode the encoded data (532) for the low-resolution version of the input picture, producing sample values for the low-resolution version of the input picture. Depending on implementation and the format of the encoded data, the video decoder(s) (550) can decode the encoded data (532) in a manner consistent with the H.265/HEVC standard, ISO/IEC 14496-10 standard (also known as H.264/AVC), another standard, or a proprietary format such as VP8 or VP9, or a variation or extension thereof. The sample values are, for example, 8-bit sample values or 10-bit sample values in a YUV color space, with a chroma sampling rate of 4:2:0. Alternatively, the sample values output by the video decoder(s) (550) are in another format.

The mapper (570) is configured to, based at least in part on the reconstructed section(s) (555) and/or the reconstructed low-resolution version of the input picture from the composite picture(s) (556), create an output picture (575). For example, the mapper (570) is configured to use the indication (515) of the identified section(s), as well as the manifest file(s) (542), to determine which sample values of the reconstructed section(s) (555), respectively, to map to the output picture (575). The mapper (570) can be configured to determine which sample values of the output picture (575) cannot be determined using the reconstructed section(s) (555) and, for any sample value of the output picture (575) that cannot be determined using the reconstructed section(s) (555), determine that sample value of the output picture (575) using the reconstructed low-resolution version of the input picture. Thus, the output picture (575) can be created using only sample values of the reconstructed section(s) (555). Or, the output picture (575) can be created using only sample values of the reconstructed low-resolution version of the input picture. Or, at least part of the output picture (575) can be created using sample values of the reconstructed section(s) (555), and at least part of the output picture (575) can be created using sample values of the reconstructed low-resolution version of the input picture.
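The fallback rule can be summarized with a small sketch; the covers and sample helpers are assumed interfaces on a reconstructed section object, not methods defined by this description:

    def sample_output(output_locations, sections, low_res_picture):
        """For each panorama location, prefer a reconstructed high-resolution
        section that covers it; otherwise fall back to the low-resolution copy."""
        values = []
        for (u, v) in output_locations:
            value = None
            for sec in sections:                  # reconstructed section(s)
                if sec.covers(u, v):
                    value = sec.sample(u, v)      # may involve interpolation
                    break
            if value is None:                     # no section covers this location
                value = low_res_picture.sample(u, v)
            values.append(value)
        return values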

In general, the mapper (570) is configured to perform various color space conversion operations, chroma rate upsampling operations, projection operations, interpolation operations, and spatial upsampling operations. For example, the mapper (570) is configured to convert at least some of the sample values produced by the video decoder(s) (550) from a first color space (such as a YUV color space) to a second color space (such as an RGB color space). The mapper (570) can be configured to, before color space conversion or as part of color space conversion, perform chroma sample rate upsampling, to restore chroma sample values to have the same resolution as luma sample values in the decoded video. To create the output picture (575), the mapper (570) can be configured to project the reconstructed section(s) (555) from an intermediate projection (e.g., a sinusoidal projection) to an output projection (e.g., a screen projection). To create the output picture (575), the mapper (570) can also be configured to project the reconstructed low-resolution version of the input picture from an input projection (e.g., equirectangular projection, cubemap projection) or intermediate projection (e.g., a sinusoidal projection) to an output projection (e.g., a screen projection). The mapper (570) is configured to determine appropriate sample values of the output picture (575) from sample values at corresponding locations in the reconstructed section(s) (555) or reconstructed low-resolution version of the input picture, potentially selecting sample values at the corresponding locations or performing interpolation operations (e.g., bilinear interpolation operations) to determine sample values at the corresponding locations between adjacent sample values of the reconstructed section(s) (555) or reconstructed low-resolution version of the input picture. The mapper (570) can be configured to perform spatial upsampling operations on sample values of the reconstructed low-resolution version of the input picture, to reverse downsampling operations performed when creating the low-resolution version of the input picture.

The output buffer(s) (585) are configured to store the output picture (575) for output to a display device (590). The display device (590) can be a head-mounted display, computer monitor, television screen, mobile device screen, or other type of display device.

In some example implementations, for a platform rendering mode, the mapper (570) provides the output picture (575) in a screen projection (586) to the application (580), e.g., to an output buffer (585) indicated by the application (580) for rendering. The application (580) can be a lightweight application that does not itself perform rendering operations for panoramic video, which simplifies implementation for the application (580). For example, the application (580) is a news viewer, real estate site listing application, or other application that does not specialize in presentation of panoramic video. Instead, the application (580) provides a view direction (582) and may also provide a field of view (584), and the “platform” (system-provided modules of the playback tool (500)) performs operations to generate a screen projection. Alternatively, the application (580) can set a source for view direction (582) and field of view (584), and the platform gets the view direction (582) and field of view (584) information from that source. The application (580) may also have an on/off control for rendering.

In other example implementations, in an application rendering mode, the mapper (570) provides the output picture (575) in a flat projection to the application (580), e.g., to an output buffer (585) indicated by the application (580). The flat projection can be an equirectangular projection or a cubemap projection, which may be re-projected so that it is centered at the view window, may have irrelevant details cropped away, and/or may have its spatial resolution enhanced for relevant details. In application rendering mode, the application (580) includes a module that performs additional transformations to the sample values of the output picture (575) in the flat projection (e.g., mapping to spherical projection, mapping to screen projection) so as to generate one or more screen projections appropriate for the application (580), which gives the application (580) more control over rendering decisions. For example, the application (580) is a VR application, AR application, or specialty media application for panoramic video. In application rendering mode, different applications can use different approaches to rendering of flat projections. For a mobile device or computer monitor, a single screen projection may be rendered. Or, for a head-mounted display (or mobile device held in a head-mounted band), an application (580) may generate two screen projections, for the left and right eyes, respectively.

The streaming controller (520) can selectively retrieve encoded data for additional bitstream(s). For example, if playback of panoramic video is paused, the streaming controller (520) can request encoded data for the rest of an input picture, and the video decoder(s) (550) can decode the rest of the input picture. In this way, the entire input picture is available for rendering should the viewer choose to navigate through the “paused” environment of the panoramic video.

Depending on implementation and the type of processing desired, modules of the panoramic video playback tool (500) can be added, omitted, split into multiple modules, combined with other modules, and/or replaced with like modules. For example, although shown separately in FIG. 5, the view controller (510) can be combined with the mapper (570) (i.e., part of the mapper (570)), or the view controller (510) can be combined with the streaming controller (520) (i.e., part of the streaming controller (520)). In alternative embodiments, panoramic video playback tools with different modules and/or other configurations of modules perform one or more of the described techniques. Specific embodiments of panoramic video playback tools typically use a variation or supplemented version of the panoramic video playback tool (500). The relationships shown between modules within the panoramic video playback tool (500) indicate general flows of information in the panoramic video playback tool (500); other relationships are not shown for the sake of simplicity.

In general, a given module of the panoramic video playback tool (500) can be implemented by software executable on a CPU, by software controlling special-purpose hardware (e.g., a GPU or other graphics hardware for video acceleration), or by special-purpose hardware (e.g., in an ASIC). In particular, in some example implementations, video decoding operations and re-projection operations to map sample values between different projections are implemented with shader instructions executable on a GPU. Thus, computationally-intensive, repetitive operations (e.g., for video decoding, for mapping between different types of projections when creating an output picture) are likely to be implemented with graphics hardware (e.g., as shader instructions for a GPU) or other special-purpose hardware, and higher-level operations (e.g., selecting which streams to request) are likely to be implemented in software executable on a CPU.

VI. Examples of Panoramic Video Streaming with Overlapping Sections

This section describes examples of panoramic video streaming with overlapping sections. Some examples relate to stream configuration operations, and other examples relate to playback operations.

Partitioning pictures of panoramic video into overlapping sections can decrease overall compression efficiency, because the same sample values (for overlap regions) are encoded separately in different sections. That is, more bits are used for encoded data for the overlapping sections, collectively.

On the other hand, partitioning pictures of panoramic video into overlapping sections also tends to reduce the incidence of playback disruption attributable to bitstream switching. For example, when gradual panning motion or zooming causes a view window to no longer overlap a section, the overlapping can allow a playback tool to render high-resolution views quickly and without playback disruption. Switching to a new bitstream for a new section can take time (e.g., to send the request for the new bitstream to a media server, and to wait for a switch point at which decoding can begin in the new bitstream). With overlapping sections, if the view direction and/or zoom factor changes, the playback tool can more quickly identify new sections/streams that are to be used to create an output picture for the view window. Because of overlap between adjacent sections, for changes in view direction and/or zoom factor that are gradual and consistent, the playback tool in effect preemptively fetches encoded data for a new section as the view window moves out of a current section into the new section. By the time the view window reaches a non-overlapping part of the adjacent, new section, content for the adjacent, new section has already been retrieved and reconstructed, assuming the motion of the view window is not extreme. This hides network latency and stream switching latency from the viewer, and disruption of playback is avoided. Thus, using overlapping sections can facilitate local responsiveness where there are gradual changes in view direction or zoom during playback. Using overlapping sections provides for some “cushion” if a view window suddenly changes position within sections whose content has been retrieved and reconstructed.

A. First Example of Stream Configuration Operations for Adaptive Streaming of Panoramic Video with Overlapping Sections

FIG. 6a shows a first example (601) of stream configuration operations for adaptive streaming of panoramic video with overlapping sections. In the first example (601), an input picture (610) of panoramic video is in an equirectangular projection.

With reference to FIG. 6a, a stream configuration tool receives or creates a series of input pictures—such as the input picture (610)—in an equirectangular projection. The input pictures can be created from multiple camera video streams (associated with different views from a central position) or from a video stream from an omnidirectional camera. For example, the resolution of the input pictures in the equirectangular projection is 4K (3840×2160) or higher.

For different values of phi (φ) and theta (θ) in spherical projection coordinates, the stream configuration tool splits the input picture (610) into overlapping sections (630). In general, the overlapping sections (630) are associated with different view directions. Each of the overlapping sections (630) corresponds to a region of the surface of the sphere for the panoramic video, and is parameterized using a range of phi and theta coordinates for the surface of the sphere. Alternatively, the sections (630) can be parameterized in some other way.

The overlapping sections (630) can have the same size or different sizes. For example, each of the overlapping sections (630) has a spatial resolution of 1080p, 720p, or some other size. Collectively, the overlapping sections (630) cover all of the actual content of the input picture (610). In FIG. 6a, the input picture (610) is partitioned into six overlapping sections. More generally, the number n of sections depends on implementation (e.g., n is 8, 12, 16, or 32). Each of the overlapping sections (630) overlaps with neighboring sections of the input picture (610). The extent of overlap depends on implementation (e.g., 10%, 20%, 30%). In general, having more extensive overlap provides more “lead time” for retrieval of encoded data for adjacent sections when view direction, field of view, or zoom factor changes during playback. On the other hand, having more extensive overlap results in redundant encoding of more sample values, which can increase overall bit rate and increase the bit rate per stream (section).

The extent of overlap can be static. For example, by default, the input picture (610) is always partitioned in the same way (or in a way that depends on resolution of the input picture (610), picture size for the video streams to be encoded, extent of overlap, or other factors known at the start of stream configuration). Alternatively, the count of overlapping sections (630), sizes of overlapping sections (630), and extent of overlap can be configurable and/or adjustable during stream configuration and playback. In particular, the extent of overlap can be adjusted depending on factors such as the expected reliability of a network connection and how much the position of a view window is expected to change (e.g., due to panning or zooming). If a network connection is reliable, less overlap is needed to hide latency problems for stream switching events. If a network connection is unreliable, more overlap is needed to hide latency problems for stream switching events. If the position of a view window is expected to be stationary, less overlap is needed to hide latency problems for stream switching events. If the position of a view window is expected to change quickly, more overlap is needed to hide latency problems for stream switching events.

Different variations of partitioning can be applied that have different extents of overlap. For example, resources permitting, one version of overlapping sections (630) can have less overlap (e.g., 5%) and be used for playback tools that have view windows that are relatively stationary and connect over reliable network connections. Another version of sections can have more overlap (e.g., 30%) and be used for playback tools that have view windows that change position quickly and/or connect over unreliable network connections.
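As an illustration only (the thresholds and percentages below are examples, not values prescribed by this description), the extent of overlap could be chosen from such feedback as follows:

    def choose_overlap(network_reliable: bool, window_activity: str) -> float:
        """Pick a fractional overlap between adjacent sections from coarse feedback."""
        if network_reliable and window_activity == "low":
            return 0.05     # stable view, fast switching: little cushion needed
        if (not network_reliable) or window_activity == "high":
            return 0.30     # slow switching or fast panning: more cushion
        return 0.20         # middle ground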

The stream configuration tool can generate as many phi/theta combinations as desired for the overlapping sections (630). These combinations can be preset or adapted to a requested view. For example, the “center” of the partitioning pattern can change, based on where the focus (or expected focus) of most view windows will be. If there is a section/stream centered at the focus, a playback tool might not need to request and combine sections from multiple bitstreams. Alternatively, a new section could simply be added, centered at the focus.

The stream configuration tool also creates a manifest file (not shown) that indicates the spherical coordinates associated with the respective overlapping sections (630). Alternatively, parameters can be sent in some other way for each of the overlapping sections (630), indicating what part of the input picture (610) is covered by that section. The parameters can be values of phi and theta per section or other information used to derive the same information about the scope of the section.

The stream configuration tool adds the overlapping sections (630) to corresponding video streams. In FIG. 6a, the six overlapping sections (630) are added to corresponding video streams (650 . . . 655), respectively.

The stream configuration tool encodes the corresponding video streams (650 . . . 655), respectively, producing bitstreams of encoded data (670 . . . 675) for the respective sections. Thus, the stream configuration tool encodes the multiple overlapping sections (630) partitioned from the input picture (610) as part of different video streams (650 . . . 655). Section 1 is encoded as a picture of stream 1, section 2 is encoded as a picture of stream 2, and so on. In this way, different sections of the input picture (610) of panoramic video are represented in different bitstreams of encoded data (670 . . . 675).

Finally, the stream configuration tool buffers the encoded data (670 . . . 675). The encoded data (670 . . . 675) can be directly sent to one or more playback tools. In most configurations, however, the encoded data (670 . . . 675) is sent to a media server. The media server can also store a manifest file with details about the overlapping sections (630) and streams.

B. Second Example of Stream Configuration Operations for Adaptive Streaming of Panoramic Video with Overlapping Sections

FIG. 6b shows a second example (602) of stream configuration operations for adaptive streaming of panoramic video with overlapping sections. In the second example (602), the input picture (612) is in a sinusoidal projection.

With reference to FIG. 6b, the stream configuration tool receives or creates a series of input pictures—such as the input picture (612)—in a sinusoidal projection. From an input picture in an equirectangular projection or cubemap projection, the stream configuration tool can convert the input picture to a sinusoidal projection. The sinusoidal projection can have a single fold or multiple folds (e.g., two-fold projection). For a single-fold sinusoidal projection, the sinusoidal projection includes panoramic video content surrounded by empty regions (e.g., with default values such as black or gray). For a two-fold sinusoidal projection, the sinusoidal projection includes panoramic video content for two “hemispheres” separated by empty regions (e.g., with default values such as black or gray). For example, the resolution of the input pictures in the sinusoidal projection is 8K or higher, which is higher than the resolution of the input pictures in the equirectangular projection. Compared to the equirectangular projection, the sinusoidal projection represents content towards the “poles” of the sphere without excessive stretching, distortion, etc., which can make subsequent compression operations (such as motion estimation) more effective. (In the equirectangular projection, details around the poles are scattered and stretched. Redundancy cannot be exploited as well during compression.)
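The re-projection can follow the standard sinusoidal mapping x = longitude * cos(latitude), y = latitude. The sketch below maps an equirectangular pixel to its position in a same-size, single-fold sinusoidal picture; the pixel-coordinate conventions are assumptions for illustration:

    import math

    def equirect_to_sinusoidal(x_eq, y_eq, width, height):
        """Map an equirectangular pixel (x_eq, y_eq) to sinusoidal pixel coordinates."""
        lon = (x_eq / width) * 2.0 * math.pi - math.pi      # longitude in [-pi, pi]
        lat = math.pi / 2.0 - (y_eq / height) * math.pi     # latitude in [pi/2, -pi/2]
        x_sin = lon * math.cos(lat)                         # squeeze toward the poles
        y_sin = lat
        px = (x_sin + math.pi) / (2.0 * math.pi) * width    # back to pixel coordinates
        py = (math.pi / 2.0 - y_sin) / math.pi * height
        return px, py

Samples that land outside the squeezed content area correspond to the empty regions, which receive default values such as black or gray.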

For different values of phi (φ) and theta (θ) in spherical projection coordinates, the stream configuration tool splits the input picture (612) into overlapping sections (632), generally as described with reference to FIG. 6a. Since the input picture (612) is in a sinusoidal projection, however, the overlapping sections (632) can include default sample values (e.g., black or gray) for portions in empty regions outside of the actual content of the input picture (612). In the partitioning, the placement of the overlapping sections (632) can be configured to reduce the number of default sample values in the sections. Collectively, the overlapping sections (632) still cover all of the actual content of the input picture (612) of panoramic video. In FIG. 6b, the input picture (612) is partitioned into seven overlapping sections. More generally, the number n of sections depends on implementation (e.g., n is 8, 12, 16, or 32). The extent of overlap between the overlapping sections (632) depends on implementation and can be static or dynamic, generally as described with reference to FIG. 6a. The stream configuration tool can generate various phi/theta combinations for the overlapping sections (632), generally as described with reference to FIG. 6a.

Although FIG. 6b shows a separate input picture (612) in a sinusoidal projection, in practice, conversion of an input picture to a sinusoidal projection can be notional. That is, sample values of the respective overlapping sections (632) in the sinusoidal projection can be determined directly from the input picture in an equirectangular projection, cubemap projection, or other input projection.

The stream configuration tool can create a manifest file, generally as described with reference to FIG. 6a. The stream configuration tool adds the overlapping sections (632) to corresponding video streams. In FIG. 6b, the seven overlapping sections (632) are added to corresponding video streams (660 . . . 666), respectively. The stream configuration tool encodes the corresponding video streams (660 . . . 666), respectively, producing bitstreams of encoded data (680 . . . 686) for the respective sections. Thus, the stream configuration tool encodes the multiple overlapping sections (632) partitioned from the input picture (612) as part of different video streams (660 . . . 666). Section 1 is encoded as a picture of stream 1, section 2 is encoded as a picture of stream 2, and so on. In this way, different sections of the input picture (612) of panoramic video are represented in different bitstreams of encoded data (680 . . . 686). Finally, the stream configuration tool buffers the encoded data (680 . . . 686). The encoded data (680 . . . 686) can be directly sent to one or more playback tools. In most configurations, however, the encoded data (680 . . . 686) is sent to a media server, which can also store manifest files with details about the overlapping sections (632) and streams.

C. Example of Picture of Panoramic Video with Overlapping Sections

FIG. 7 is a diagram illustrating an example (700) of overlapping sections of a picture (710) of panoramic video in a sinusoidal projection. The picture (710) in the sinusoidal projection has been partitioned into seven overlapping sections of equal size. The overlapping sections are labeled 0 . . . 6.

FIG. 7 also shows an expanded view of one of the overlapping sections—section 5. Section 5 includes actual content of the picture (710) of panoramic video—shown with no hatching. While section 5 includes some actual content that is only part of section 5 (and not any other section), the overlap regions (740) of section 5 include actual content that is also part of one or more other sections (specifically, sections 3, 4, and 6). If a view window moves into one or more of the overlap regions (740), a playback tool can preemptively fetch content for one or more of the adjacent sections that also includes that overlap region. Section 5 also includes default values (750) (e.g., mid-gray values) for empty regions, which are shown with hatching.

D. First Example of Playback Operations for Adaptive Streaming of Panoramic Video with Overlapping Sections

FIG. 8a shows a first example (801) of playback operations for adaptive streaming of panoramic video with overlapping sections. In the first example (801), overlapping sections (830) of a picture of panoramic video are in an equirectangular projection.

A playback tool periodically determines a view window (811) in a spherical projection (810) of the panoramic video for a viewer. During playback, the viewer can control view direction, relative to a viewer/camera position at the center of the panoramic video. The viewer may also be able to control the field of view (e.g., narrow, wide) and/or zoom factor. The view window (811) depends on view direction, and can also depend on field of view and/or zoom factor.

The playback tool also requests a manifest file from a media server (or stream configuration tool). After receiving the manifest file, the playback tool identifies which sections of the panoramic video are to be used to create an output picture (890) for the view window (811). Specifically, the playback tool identifies one or more sections that each contain at least part of the view window (811). FIG. 8a shows overlapping sections (830) of a picture of panoramic video. In FIG. 8a, two of the overlapping sections (830)—sections 4 and 5—each contain at least part of the view window (811), which is shown as a projection (834) onto the picture of panoramic video. The playback tool selects one or more bitstreams for the identified section(s), respectively.

The playback tool requests encoded data for the selected bitstream(s). For example, depending on configuration, the playback tool requests the encoded data from the stream configuration tool or a media server. The playback tool can request encoded data for the selected bitstream(s) on a picture-by-picture basis or some other basis. In the example (801) of FIG. 8a, the playback tool requests encoded data (854) for bitstream 4 and requests encoded data (855) for bitstream 5.

The playback tool receives and decodes the encoded data for the selected bitstream(s), thereby reconstructing the identified section(s). In the example (801) of FIG. 8a, the playback tool decodes the encoded data (854) for bitstream 4 to reconstruct section 4 (874), and the playback tool decodes the encoded data (855) for bitstream 5 to reconstruct section 5 (875).

The playback tool creates an output picture (890) for the view window (811) from the reconstructed section(s). In doing so, for locations of the output picture (890), the playback tool selects sample values at corresponding locations of the reconstructed section(s), or determines sample values at the corresponding locations by interpolating between adjacent sample values of the reconstructed section(s). The output picture (890) can be in a screen projection (for display) or other (e.g., equirectangular) projection (for subsequent rendering). Thus, to find the corresponding locations in the reconstructed section(s), the playback tool can warp between different projections, e.g., from a screen projection for the output picture to an equirectangular projection for the overlapping sections (830). The playback tool can also perform various post-processing operations (e.g., color conversion to a color space appropriate for a display device).
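When a corresponding location falls between sample positions, a simple bilinear interpolation such as the following sketch (assuming a NumPy array indexed as [row, column]) can determine the output sample value:

    import numpy as np

    def bilinear_sample(section: np.ndarray, x: float, y: float):
        """Bilinearly interpolate the section at the non-integer location (x, y)."""
        x0, y0 = int(np.floor(x)), int(np.floor(y))
        x1 = min(x0 + 1, section.shape[1] - 1)
        y1 = min(y0 + 1, section.shape[0] - 1)
        fx, fy = x - x0, y - y0
        top = (1 - fx) * section[y0, x0] + fx * section[y0, x1]
        bottom = (1 - fx) * section[y1, x0] + fx * section[y1, x1]
        return (1 - fy) * top + fy * bottom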

During playback, if the view window (811) changes, the playback tool can identify new sections/bitstreams to be used to create an output picture (890) for the view window (811). Because of the overlap between adjacent sections, for changes in view direction and/or zoom factor that are gradual and consistent, the playback tool can, in effect, preemptively fetch a new section as the view window (811) moves out of a current section into the new section. By the time the view window (811) reaches a non-overlapping part of the new section, content for the new section has already been retrieved and reconstructed. In this way, the playback tool can hide network latency and stream switching latency from the viewer, and disruption of playback is avoided.

E. Second Example of Playback Operations for Adaptive Streaming of Panoramic Video with Overlapping Sections

FIG. 8b shows a second example (802) of playback operations for adaptive streaming of panoramic video with overlapping sections. In the second example (802), overlapping sections (831) of a picture of panoramic video are in a sinusoidal projection. The overlapping sections (831) of the picture of panoramic video can be organized as described with reference to the example (602) of FIG. 6b.

As in the example (801) of FIG. 8a, a playback tool periodically determines a view window (811) in a spherical projection (810) of the panoramic video for a viewer. After receiving a manifest file, the playback tool identifies which of the overlapping sections (831) of the panoramic video are to be used to create an output picture (892) for the view window (811). Specifically, the playback tool identifies one or more of the overlapping sections (831) that each contain at least part of the view window (811). In FIG. 8b, two of the overlapping sections (831)—sections 2 and 6—each contain at least part of the view window (811), which is shown as a projection (832) onto the picture of panoramic video. The playback tool selects one or more bitstreams for the identified section(s), respectively.

The playback tool requests encoded data for the selected bitstream(s), generally as described with reference to FIG. 8a. In the example (802) of FIG. 8b, the playback tool requests encoded data (852) for bitstream 2 and requests encoded data (856) for bitstream 6. The playback tool receives and decodes the encoded data for the selected bitstream(s), thereby reconstructing the identified section(s). In the example (802) of FIG. 8b, the playback tool decodes the encoded data (852) for bitstream 2 to reconstruct section 2 (872), and the playback tool decodes the encoded data (856) for bitstream 6 to reconstruct section 6 (876). The playback tool creates an output picture (892) for the view window (811) from the reconstructed section(s), generally as described with reference to FIG. 8a. To find the corresponding locations in the reconstructed section(s), the playback tool can warp between different projections, e.g., from a screen projection for the output picture to a sinusoidal projection for the overlapping sections (831).

VII. Example Techniques for Stream Configuration of Panoramic Video with Overlapping Sections

FIG. 9 shows an example technique (900) for stream configuration of panoramic video with overlapping sections. A panoramic video stream configuration tool as described with reference to FIG. 4, or other panoramic video stream configuration tool, can perform the example technique (900).

The stream configuration tool receives (910) an input picture of panoramic video. Typically, the input picture is in an input projection such as an equirectangular projection, a cubemap projection, or a sinusoidal projection. The stream configuration tool splits (920) the input picture into multiple overlapping sections according to partition settings. For example, the partition settings include the count of sections, sizes of sections, positions of sections, and extent of overlap between sections. Each of the multiple sections overlaps at least one other section among the multiple sections. (This ultimately decreases overall compression efficiency but facilitates reduction of incidence of disruption attributable to bitstream switching during playback of the panoramic video.) For example, for each of the multiple sections, the section overlaps each adjacent section among the multiple sections.

In some configurations, the stream configuration tool projects the input picture from an input projection to an intermediate projection when splitting the input picture into multiple sections (in the intermediate projection). For example, the input projection is an equirectangular projection or a cubemap projection, and the intermediate projection is a sinusoidal projection. When the multiple sections are in a sinusoidal projection, at least one of the multiple sections may include at least some sample values having default values (e.g., black values or gray values, not representing content of the input picture of panoramic video).

The stream configuration tool adds (930) the multiple sections, respectively, to corresponding video streams for encoding.

FIG. 9 shows two loops. As part of a loop for a formatting pipeline, the stream configuration tool checks (940) whether to continue operations for a next input picture. If so, the stream configuration tool receives (910) the next input picture, splits (920) it into multiple overlapping sections, and adds (930) the sections to corresponding video streams. In this way, the stream configuration tool iteratively splits input pictures and adds sections for the input pictures to corresponding video streams.

As part of an encoding pipeline, the stream configuration tool encodes (950) the multiple sections in the corresponding video streams, respectively, for an input picture. This produces encoded data for the multiple sections as part of multiple bitstreams for the corresponding video streams, respectively. Typically, the bitstreams are video elementary bitstreams. The encoded data in the video elementary bitstreams can be multiplexed into a single container stream for delivery to a media server. The stream configuration tool stores (960) the encoded data for delivery (e.g., to a media server, or directly to one or more panoramic video playback tools). The stream configuration tool checks (970) whether to continue encoding operations and, if so, encodes (950) the sections of the next input picture. In this way, as part of a loop for the encoding pipeline, the stream configuration tool encodes the sections of pictures added to video streams in the formatting pipeline.

The stream configuration tool can also produce one or more manifest files. The manifest file(s) include information indicating, for each of the multiple bitstreams, the position of one of the multiple sections (in terms of an input projection or spherical projection) whose content is part of the corresponding video stream for that bitstream. For example, for each section, the manifest file includes phi and theta coordinates for the section. The stream configuration tool can deliver the manifest file(s) to a media server, for subsequent delivery to one or more playback tools. Or, the stream configuration tool can directly deliver the manifest file(s) to one or more playback tools. The manifest file(s) can be delivered as user data of elementary bitstreams, as metadata in a container, or in some other way. Alternatively, the stream configuration tool and playback tool(s) can operate without exchanging information in manifest file(s), and input pictures are partitioned into sections according to a static, pre-defined pattern.

In some configurations, the partition settings can adaptively change. For example, the stream configuration tool can receive an indication of feedback and, based at least in part on the indication of feedback, adjust the partition settings. The indication of feedback can include an indication of network connection quality, an indication of magnitude of view window change activity, an indication of which view direction is prevalent, and/or some other type of feedback. To adjust the partition settings, the stream configuration tool can change an extent of overlap between the multiple sections. For example, if network connection quality is poor or view window change activity is high, the stream configuration tool can increase the extent of overlap between adjacent sections. Or, if network connection quality is good and view window change activity is low, the stream configuration tool can decrease the extent of overlap between adjacent sections. Alternatively, the stream configuration tool can change the count of overlapping sections, change relative sizes of at least some of the overlapping sections, change positions of at least some of the overlapping sections, add one or more sections, at new positions, to the overlapping sections, and/or remove one or more sections from the overlapping sections. For example, in response to an indication of which view direction is prevalent, the stream configuration tool can add one or more sections or re-position sections to focus on the prevalent view direction (and thereby reduce the incidence of switching around the prevalent view direction). Alternatively, the stream configuration tool can make some other change to the partition settings.

VIII. Example Techniques for Playback of Panoramic Video with Overlapping Sections

FIG. 10 shows an example technique (1000) for playback of panoramic video with overlapping sections. A panoramic video playback tool as described with reference to FIG. 5, or other panoramic video playback tool, can perform the example technique (1000).

The panoramic video playback tool determines (1010) a view window for playback of panoramic video. For example, the view window depends on view direction, field of view, and/or zoom factor. The playback tool can receive an indication of a view direction for an application. For example, the indication of the view direction is a set of heading, pitch, and roll values for the view direction. Or, the indication of the view direction is a set of affine transform coefficients that specify a spatial rotation for the view direction. Or, the view direction is specified in some other way. The playback tool can receive the indication of the view direction from the application or from a source specified by the application. The playback tool can also receive an indication of a field of view and/or zoom factor for the application.
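One way heading, pitch, and roll values could be turned into the affine transform coefficients mentioned above is to compose three rotations; the axis conventions and composition order below are assumptions for illustration:

    import math

    def rotation_from_view_direction(heading, pitch, roll):
        """Compose yaw (heading), pitch, and roll into a 3x3 rotation matrix."""
        ch, sh = math.cos(heading), math.sin(heading)
        cp, sp = math.cos(pitch), math.sin(pitch)
        cr, sr = math.cos(roll), math.sin(roll)
        r_yaw = [[ch, 0.0, sh], [0.0, 1.0, 0.0], [-sh, 0.0, ch]]
        r_pitch = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
        r_roll = [[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]]

        def matmul(a, b):
            return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
                    for i in range(3)]

        return matmul(r_yaw, matmul(r_pitch, r_roll))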

From among multiple sections of the panoramic video, the playback tool identifies (1020) one or more sections that contain at least part of the view window. Each of the multiple sections overlaps at least one other section among the multiple sections, which reduces incidence of disruption attributable to bitstream switching. For example, the playback tool identifies each of the overlapping sections that contains at least part of the view window. For the identified section(s), the playback tool selects (1030) one or more bitstreams among multiple bitstreams for corresponding video streams.

FIG. 10 shows two loops. As part of a loop for a view window pipeline, the playback tool checks (1040) whether there has been a change to the view window (e.g., due to a change in view direction, field of view, or zoom factor). If so, the playback tool determines (1010) the view window, identifies (1020) the section(s) that contain the view window, and selects (1030) the bitstream(s) for the identified section(s). In this way, the playback tool can iteratively perform operations to determine (1010) the view window, identify (1020) the section(s) that contain at least part of the view window, and select (1030) the bitstream(s) for the identified section(s).

As part of a decoding and reconstruction pipeline, the playback tool requests (1050) encoded data, in the selected bitstream(s) for the identified section(s), respectively, for an input picture of the panoramic video. Depending on configuration, the playback tool can request the encoded data from a media server or directly from a panoramic video stream configuration tool. The playback tool can make separate requests for portions of an input picture or for each input picture, or the playback tool can batch requests.

The playback tool receives (1060) the encoded data (e.g., from a media server, or directly from a panoramic video stream configuration tool). The playback tool decodes (1070) the encoded data to reconstruct sample values for the identified section(s) for the input picture. Then, based at least in part on the reconstructed section(s), the playback tool creates (1080) an output picture. When creating the output picture, the playback tool can project the reconstructed section(s) from an input projection (e.g., an equirectangular projection) or an intermediate projection (e.g., sinusoidal projection) to an output projection (e.g., screen projection). The playback tool stores (1090) the output picture for output to a display device. The playback tool checks (1095) whether to continue decoding and reconstruction operations and, if so, requests (1050) and decodes (1070) encoded data for one or more sections of the next input picture. In this way, as part of a loop for the decoding and reconstruction pipeline, the playback tool reconstructs sections identified in the view window pipeline.

The playback tool can receive one or more manifest files (e.g., from a media server or directly from a stream configuration tool). The manifest file(s) include information indicating, for each of the multiple bitstreams, the position of one of the multiple sections (in terms of an input projection or spherical projection) whose content is part of the corresponding video stream for that bitstream. For example, for each section, the manifest file includes phi and theta coordinates for the section. The playback tool can use the manifest file(s) to identify (1020) the section(s) that contain at least part of the view window and/or select (1030) the bitstream(s) for the identified sections. The playback tool can also use the manifest file(s) when creating (1080) the output picture based on the reconstructed section(s). The manifest file(s) can be delivered as user data of elementary bitstreams, as metadata in a container, or in some other way. Alternatively, the stream configuration tool and playback tool(s) can operate without exchanging information in manifest file(s), and input pictures are partitioned into sections according to a static, pre-defined pattern.
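
For concreteness, a manifest might look like the following. This is a hypothetical JSON layout; the field names and the six-section split are illustrative and are not a format defined by the patent or by any standard.

```python
import json

manifest = {
    "sections": [
        {"id": 0, "bitstream": "stream0.mp4", "phi_range": [0, 120],   "theta_range": [0, 90]},
        {"id": 1, "bitstream": "stream1.mp4", "phi_range": [120, 240], "theta_range": [0, 90]},
        {"id": 2, "bitstream": "stream2.mp4", "phi_range": [240, 360], "theta_range": [0, 90]},
        {"id": 3, "bitstream": "stream3.mp4", "phi_range": [0, 120],   "theta_range": [-90, 0]},
        {"id": 4, "bitstream": "stream4.mp4", "phi_range": [120, 240], "theta_range": [-90, 0]},
        {"id": 5, "bitstream": "stream5.mp4", "phi_range": [240, 360], "theta_range": [-90, 0]},
    ],
}

with open("manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```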

The playback tool can send an indication of feedback to a stream configuration tool or intermediary that aggregates feedback. The indication of feedback can include an indication of network connection quality, an indication of magnitude of view window change activity, an indication of which view direction is prevalent, and/or some other type of feedback. The feedback can be used to adjust partition settings, for example, as described in the previous section.

IX. Examples of Panoramic Video Streaming with Composite Pictures

This section describes examples of panoramic video streaming with composite pictures. Some examples relate to stream configuration operations, and other examples relate to playback operations.

A composite picture includes a high-resolution section of an input picture of panoramic video as well as a low-resolution version of the input picture. Under normal operation, a playback tool can use reconstructed high-resolution section(s) to render high-quality views of the panoramic video. If the view window drastically changes, however, or if encoded data for a specific section is lost or corrupted, the playback tool can temporarily use the low-resolution version of the input picture to render lower-quality details for views of the panoramic video, without disruption of playback, until encoded data for the high-resolution section(s) is retrieved or recovered.

Using composite pictures of panoramic video can decrease overall compression efficiency, because sample values for low-resolution versions of input pictures, which might not be used in rendering at all, are redundantly encoded. That is, extra bits are used for encoded data for the low-resolution versions of the input pictures. On the other hand, using composite pictures also tends to reduce the incidence of playback disruption when a view window drastically changes or encoded data for a high-resolution section is lost or corrupted. When a view window changes drastically, switching to a new bitstream for a new section can take time (e.g., to send the request for the new bitstream to a media server, and to wait for a switch point at which decoding can begin in the new bitstream). Similarly, when encoded data for a high-resolution section is lost or corrupted, recovering encoded data of the bitstream for the section can take time. Until encoded data for the high-resolution section(s) is retrieved/recovered, the playback tool can use the low-resolution version of the input picture to render lower-quality details for views of the panoramic video, without disruption of playback. This hides network latency and stream switching latency from the viewer, and disruption of playback is avoided.

A. First Example of Stream Configuration Operations for Adaptive Streaming of Panoramic Video with Composite Pictures

FIG. 11a shows a first example (1101) of stream configuration operations for adaptive streaming of panoramic video with composite pictures. In the first example (1101), an input picture (1110) of panoramic video is in an equirectangular projection.

With reference to FIG. 11a, a stream configuration tool receives or creates a series of input pictures—such as the input picture (1110)—in an equirectangular projection. The input pictures can be created from multiple camera video streams (associated with different views from a central position) or from a video stream from an omnidirectional camera. For example, the resolution of the input pictures in the equirectangular projection is 4K (3840×2160) or higher.

For different values of phi (φ) and theta (θ) in spherical projection coordinates, the stream configuration tool splits the input picture (1110) into sections (1120). The sections (1120) can be non-overlapping or, as described in section VI, overlapping. In general, the sections (1120) are associated with different view directions. Each of the sections (1120) corresponds to a region of the surface of the sphere for the panoramic video, and is parameterized using a range of phi and theta coordinates for the surface of the sphere. Alternatively, the sections (1120) can be parameterized in some other way. The sections (1120) can have the same size or different sizes. For example, each of the sections (1120) has a spatial resolution of 1080p, 720p, or some other size. Collectively, the sections (1120) cover all of the actual content of the input picture (1110). In FIG. 11a, the input picture (1110) is partitioned into six non-overlapping sections. More generally, the number n of sections depends on implementation (e.g., n is 8, 12, 16, or 32).
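
Under the equirectangular parameterization, a section's phi/theta ranges map linearly to a pixel rectangle, so splitting can be as simple as cropping. A sketch, assuming phi runs 0-360 degrees left to right and theta runs +90 to -90 degrees top to bottom (wraparound in phi is not handled):

```python
import numpy as np

def crop_section_equirect(picture: np.ndarray, phi_range, theta_range):
    """Crop a section from an H x W x C equirectangular picture given phi/theta ranges."""
    h, w = picture.shape[:2]
    x0 = int(round(phi_range[0] / 360.0 * w))
    x1 = int(round(phi_range[1] / 360.0 * w))
    y0 = int(round((90.0 - theta_range[1]) / 180.0 * h))
    y1 = int(round((90.0 - theta_range[0]) / 180.0 * h))
    return picture[y0:y1, x0:x1].copy()

# Usage, following the illustrative six-section manifest above:
# section4 = crop_section_equirect(input_picture, phi_range=(120, 240), theta_range=(-90, 0))
```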

By default, the input picture (1110) is always partitioned in the same way (or in a way that depends on resolution of the input picture (1110), picture size for the video streams to be encoded, or other factors known at the start of stream configuration). Alternatively, the count of sections (1120) and sizes of sections (1120) can be configurable and/or adjustable during stream configuration and playback.

The stream configuration tool can generate as many phi/theta combinations as desired for the overlapping sections (1120). These combinations can be preset or adapted to a requested view. For example, the “center” of the partitioning pattern can change, based on where the focus (or expected focus) of most view windows will be. If there is a section/stream centered at the focus, a playback tool might not need to request and combine sections from multiple bitstreams. Alternatively, a new section could simply be added, centered at the focus.

The stream configuration tool also creates a manifest file (not shown) that indicates the spherical coordinates associated with the respective sections (1120). Alternatively, parameters can be sent in some other way for each of the sections (1120), indicating what part of the input picture (1110) is covered by that section. The parameters can be values of phi and theta per section or other information used to derive the same information about the scope of the section.

The stream configuration tool creates a low-resolution version (1130) of the input picture of panoramic video. For example, the stream configuration tool downsamples the input picture (1110) of panoramic video. The stream configuration tool can downsample the input picture (1110) by the same factor horizontally and vertically, or by different factors horizontally and vertically.
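
A simple block-averaging downsampler is sketched below. The factors and the averaging method are implementation choices, and an H x W x C array layout is assumed.

```python
import numpy as np

def downsample(picture: np.ndarray, factor_x: int = 4, factor_y: int = 4) -> np.ndarray:
    """Create a low-resolution version by block averaging; factors may differ per axis."""
    h, w = picture.shape[:2]
    h2, w2 = (h // factor_y) * factor_y, (w // factor_x) * factor_x
    cropped = picture[:h2, :w2].astype(np.float32)
    blocks = cropped.reshape(h2 // factor_y, factor_y, w2 // factor_x, factor_x, -1)
    return blocks.mean(axis=(1, 3)).astype(picture.dtype)
```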

The stream configuration tool adds the sections (1120) and the low-resolution version of the input picture to composite pictures (1140 . . . 1145). For each one of the sections (1120), the stream configuration tool combines that section and the low-resolution version (1130) of the input picture to make a corresponding one of the composite pictures (1140 . . . 1145). In this way, the stream configuration tool creates a different composite picture for each of the sections (1120). For example, composite picture 0 (1140) includes section 0 and the low-resolution version (1130) of the input picture. Composite picture 5 (1145) includes section 5 and the low-resolution version (1130) of the input picture.

In the example (1101) of FIG. 11a, the low-resolution version (1130) of the input picture is in an equirectangular projection. Alternatively, the low-resolution version (1130) of the input picture can be in a sinusoidal projection or other projection. The low-resolution version (1130) of the input picture can be in the same projection as the sections (1120) (e.g., both sinusoidal or both equirectangular) or different projection (e.g., sinusoidal for the sections (1120), equirectangular for the low-resolution version (1130) of the input picture).

The low-resolution version (1130) of the input picture can be put in the composite pictures (1140 . . . 1145) at a predefined location. Or, the location can be specified, e.g., in a manifest file. The low-resolution version (1130) of the input picture can be composited below one of the sections (1120) or arranged in some other configuration in a single picture. Alternatively, the composite picture can be organized as multiple views in a frame packing arrangement.
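
A sketch of the single-picture layout, with the low-resolution version composited below the section. The padding scheme is an assumption; a real tool would record or pre-define the layout so the playback tool can separate the two regions again.

```python
import numpy as np

def make_composite(section: np.ndarray, low_res: np.ndarray) -> np.ndarray:
    """Stack a high-resolution section above the low-resolution version of the input picture.

    The narrower of the two H x W x C arrays is padded on the right with black so widths match.
    """
    width = max(section.shape[1], low_res.shape[1])

    def pad(img):
        pad_w = width - img.shape[1]
        if pad_w == 0:
            return img
        padding = np.zeros((img.shape[0], pad_w, img.shape[2]), dtype=img.dtype)
        return np.concatenate([img, padding], axis=1)

    return np.concatenate([pad(section), pad(low_res)], axis=0)
```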

The stream configuration tool adds the composite pictures (1140 . . . 1145) to corresponding video streams. In FIG. 11a, the six composite pictures (1140 . . . 1145) are added to corresponding video streams (1150 . . . 1155), respectively.

The stream configuration tool encodes the corresponding video streams (1150 . . . 1155), respectively, producing bitstreams of encoded data (1170 . . . 1175) for the respective composite pictures. Thus, the stream configuration tool encodes the composite pictures (1140 . . . 1145), including different sections (1120) of the input picture (1110), as part of different video streams (1150 . . . 1155). Composite picture 1 is encoded as a picture of stream 1, composite picture 2 is encoded as a picture of stream 2, and so on. In this way, composite pictures for different sections of the input picture (1110) of panoramic video are represented in different bitstreams of encoded data (1170 . . . 1175).

Finally, the stream configuration tool buffers the encoded data (1170 . . . 1175). The encoded data (1170 . . . 1175) can be directly sent to one or more playback tools. In most configurations, however, the encoded data (1170 . . . 1175) is sent to a media server. The media server can also store a manifest file with details about the sections (1120) and streams.

B. Second Example of Stream Configuration Operations for Adaptive Streaming of Panoramic Video with Composite Pictures

FIG. 11b shows a second example (1102) of stream configuration operations for adaptive streaming of panoramic video with composite pictures. In the second example (1102), the input picture (1112) is in a sinusoidal projection, and the input picture (1112) is split into overlapping sections (1122).

With reference to FIG. 11b, the stream configuration tool receives or creates a series of input pictures—such as the input picture (1112)—in a sinusoidal projection. From an input picture in an equirectangular projection or cubemap projection, the stream configuration tool can convert the input picture to a sinusoidal projection, as described with reference to the example (602) of FIG. 6b.

For different values of phi (φ) and theta (θ) in spherical projection coordinates, the stream configuration tool splits the input picture (1112) into overlapping sections (1122), generally as described in section VI. Since the input picture (1112) is in a sinusoidal projection, the overlapping sections (1122) can include default sample values (e.g., black or gray) for portions in empty regions outside of the actual content of the input picture (1112). In FIG. 11b, the input picture (1112) is partitioned into seven overlapping sections. More generally, the number n of sections depends on implementation (e.g., n is 8, 12, 16, or 32). The extent of overlap between the overlapping sections (1122) depends on implementation and can be static or dynamic, as described in section VI. The stream configuration tool can create a manifest file that includes information about the sections (1122), as described in section VI.

Although FIG. 11b shows a separate input picture (1112) in a sinusoidal projection, in practice, conversion of an input picture to a sinusoidal projection can be notional. That is, sample values of the respective overlapping sections (1122) in the sinusoidal projection can be determined directly from the input picture in an equirectangular projection, cubemap projection, or other input projection.

As described with reference to the example (1101) of FIG. 11a, the stream configuration tool creates a low-resolution version (1132) of the input picture of panoramic video. The low-resolution version (1132) of the input picture can be in the same projection as the sections (1122) (e.g., both sinusoidal) or different projection (e.g., sinusoidal for the sections (1122), equirectangular for the low-resolution version (1132) of the input picture).

The stream configuration tool adds the sections (1122) and the low-resolution version (1132) of the input picture to composite pictures (1190 . . . 1196). For each one of the sections (1122), the stream configuration tool combines that section and the low-resolution version (1132) of the input picture to make a corresponding one of the composite pictures (1190 . . . 1196). In this way, the stream configuration tool creates a different composite picture for each of the sections (1122). For example, composite picture 0 (1190) includes section 0 and the low-resolution version (1132) of the input picture. Composite picture 6 (1196) includes section 6 and the low-resolution version (1132) of the input picture.

The low-resolution version (1132) of the input picture can be put in the composite pictures (1190 . . . 1196) at a predefined location. Or, the location can be specified, e.g., in a manifest file. The low-resolution version (1132) of the input picture can be composited below one of the sections (1122) or arranged in some other configuration in a single picture. Alternatively, a composite picture can be organized as multiple views in a frame packing arrangement.

The stream configuration tool adds the composite pictures (1190 . . . 1196) to corresponding video streams. In FIG. 11b, the seven composite pictures (1190 . . . 1196) are added to corresponding video streams (1160 . . . 1166), respectively. The stream configuration tool encodes the corresponding video streams (1160 . . . 1166), respectively, producing bitstreams of encoded data (1180 . . . 1186) for the respective composite pictures. Thus, the stream configuration tool encodes the composite pictures (1190 . . . 1196), including different sections (1122) of the input picture (1112), as part of different video streams (1160 . . . 1166).

Finally, the stream configuration tool buffers the encoded data (1180 . . . 1186). The encoded data (1180 . . . 1186) can be directly sent to one or more playback tools. In most configurations, however, the encoded data (1180 . . . 1186) is sent to a media server. The media server can also store a manifest file with details about the sections (1122) and streams.

C. Example of Composite Picture

FIG. 12 is a diagram illustrating an example composite picture (1200) of panoramic video. The composite picture (1200) includes a section (1220) of an input picture. For example, the section (1220) is a high-resolution section partitioned from an input picture in an equirectangular projection, cubemap projection, sinusoidal projection, or other projection. The composite picture (1200) also includes a low-resolution version (1230) of the input picture. The low-resolution version (1230) of the input picture depicts the entire input picture, albeit at a lower spatial resolution than the original input picture. The low-resolution version (1230) of the input picture can be in an equirectangular projection, cubemap projection, sinusoidal projection, or other projection. In FIG. 12, the low-resolution version (1230) of the input picture is below the section (1220) of the input picture. Alternatively, the composite picture (1200) can be organized in some other way.

D. First Example of Playback Operations for Adaptive Streaming of Panoramic Video with Composite Pictures

FIG. 13a shows a first example (1301) of playback operations for adaptive streaming of panoramic video with composite pictures. In the first example (1301), sections (1330) of a picture of panoramic video are in an equirectangular projection.

A playback tool periodically determines a view window (1311) in a spherical projection (1310) of the panoramic video for a viewer. During playback, the viewer can control view direction, relative to a viewer/camera position at the center of the panoramic video. The viewer may also be able to control the field of view (e.g., narrow, wide) and/or zoom factor. The view window (1311) depends on view direction, and can also depend on field of view and/or zoom factor.

The playback tool also requests a manifest file from a media server (or stream configuration tool). After receiving the manifest file, the playback tool identifies which sections of the panoramic video are to be used to create an output picture (1390) for the view window (1311). Specifically, the playback tool identifies one or more sections that each contain at least part of the view window (1311). FIG. 13a shows non-overlapping sections (1330) of a picture of panoramic video. In FIG. 13a, one of the sections (1330)—section 4—contains the view window (1311), which is shown as a projection (1334) onto the picture of panoramic video. The playback tool selects one or more bitstreams for the identified section(s), respectively.

The playback tool requests encoded data for the selected bitstream(s). For example, depending on configuration, the playback tool requests the encoded data from the stream configuration tool or a media server. The playback tool can request encoded data for the selected bitstream(s) on a picture-by-picture basis or some other basis. In the example (1301) of FIG. 13a, the playback tool requests encoded data (1354) for bitstream 4.

The playback tool receives and decodes the encoded data for the selected bitstream(s), thereby reconstructing composite picture(s) that include the identified section(s). In the example (1301) of FIG. 13a, the playback tool decodes the encoded data (1354) for bitstream 4 to reconstruct a composite picture (1364) that includes section 4.

The playback tool creates an output picture (1390) for the view window (1311) from the reconstructed section(s) and/or the reconstructed low-resolution version of the input picture. For a normal mode of output picture creation, the reconstructed high-resolution section(s) of the composite picture(s) support rendering of high-resolution views/details from the content of the section(s). In FIG. 13a, the reconstructed section(s) include section 4. For locations of the output picture (1390), the playback tool selects sample values at corresponding locations of the reconstructed section(s), or determines sample values at the corresponding locations by interpolating between adjacent sample values of the reconstructed section(s). So long as the view window (1311) falls within the reconstructed section(s), the playback tool can render a high-quality view.

If the view window (1311) falls outside the reconstructed section(s), however, the playback tool uses a fallback mode of output picture creation. In the fallback mode, the playback tool can render low-quality views/details from the low-resolution version of the composite picture(s), without requesting additional content from the stream configuration tool or media server. For locations of the output picture (1390), the playback tool selects sample values at corresponding locations of the reconstructed low-resolution version of the input picture, or determines sample values at the corresponding locations by interpolating between adjacent sample values of the reconstructed low-resolution version of the input picture. The playback tool can scale up (and otherwise process) the low-resolution version of the input picture before rendering. Thus, the low-resolution version of the input picture supports rendering of lower-resolution views/details, as needed, if the view direction or zoom factor dramatically changes or encoded data for a high-resolution section is lost. In this way, rendering operations are not interrupted during playback, although quality of rendered views may temporarily suffer.

The output picture (1390) can be in a screen projection (for display) or other (e.g., equirectangular) projection (for subsequent rendering). To find the corresponding locations in the reconstructed section(s) and/or reconstructed low-resolution version of the input picture, the playback tool can warp between different projections, e.g., from a screen projection for the output picture to an equirectangular projection for the sections (1330). The playback tool can also perform various post-processing operations (e.g., color conversion to a color space appropriate for a display device).
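
The per-pixel rendering with fallback can be sketched as follows. This is a deliberately simplified CPU loop with nearest-neighbor sampling; a real playback tool would use GPU texture mapping and interpolation. The camera model, coordinate conventions, and section parameterization are assumptions, and phi wraparound within a section is not handled.

```python
import numpy as np

def rotation_matrix(heading_deg, pitch_deg, roll_deg):
    """Rotation taking camera-space directions to world space (heading, then pitch, then roll)."""
    y, p, r = np.radians([heading_deg, pitch_deg, roll_deg])
    Ry = np.array([[np.cos(y), 0, np.sin(y)], [0, 1, 0], [-np.sin(y), 0, np.cos(y)]])
    Rp = np.array([[1, 0, 0], [0, np.cos(p), -np.sin(p)], [0, np.sin(p), np.cos(p)]])
    Rr = np.array([[np.cos(r), -np.sin(r), 0], [np.sin(r), np.cos(r), 0], [0, 0, 1]])
    return Ry @ Rp @ Rr

def sample_equirect(image, phi_deg, theta_deg):
    """Nearest-neighbor sample from an equirectangular image at (phi, theta)."""
    h, w = image.shape[:2]
    x = int(phi_deg / 360.0 * w) % w
    y = min(h - 1, max(0, int((90.0 - theta_deg) / 180.0 * h)))
    return image[y, x]

def render_view(section, section_bounds, low_res, view, out_w=640, out_h=360, fov_deg=90.0):
    """Render a screen-projection output picture.

    For each output pixel, find the view ray on the sphere; sample the
    high-resolution section if the ray falls inside its phi/theta bounds,
    otherwise fall back to the low-resolution version of the input picture.
    `section_bounds` is {"phi_range": (lo, hi), "theta_range": (lo, hi)}.
    """
    R = rotation_matrix(*view)                       # view = (heading, pitch, roll)
    focal = (out_w / 2.0) / np.tan(np.radians(fov_deg) / 2.0)
    out = np.zeros((out_h, out_w, 3), dtype=low_res.dtype)
    for v in range(out_h):
        for u in range(out_w):
            d = R @ np.array([u - out_w / 2.0, out_h / 2.0 - v, focal])
            d = d / np.linalg.norm(d)
            theta = np.degrees(np.arcsin(d[1]))               # elevation
            phi = np.degrees(np.arctan2(d[0], d[2])) % 360.0  # azimuth
            p_lo, p_hi = section_bounds["phi_range"]
            t_lo, t_hi = section_bounds["theta_range"]
            if p_lo <= phi <= p_hi and t_lo <= theta <= t_hi:
                # Map (phi, theta) into the section's local pixel coordinates.
                sh, sw = section.shape[:2]
                sx = int((phi - p_lo) / (p_hi - p_lo) * (sw - 1))
                sy = int((t_hi - theta) / (t_hi - t_lo) * (sh - 1))
                out[v, u] = section[sy, sx]
            else:
                out[v, u] = sample_equirect(low_res, phi, theta)
    return out
```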

During playback, if the view window (1311) changes, the playback tool can identify new sections/bitstreams to be used to create an output picture (1390) for the view window (1311). If there is a sudden, dramatic change in view direction and/or zoom factor, or if encoded data for a high-resolution section is lost, however, the playback tool can render views/details from the low-resolution version of the input picture until encoded data for high-resolution section(s) is retrieved or recovered. In this way, the playback tool can hide network latency and stream switching latency from the viewer, and disruption of playback is avoided.

E. Second Example of Playback Operations for Adaptive Streaming of Panoramic Video with Composite Pictures

FIG. 13b shows a second example (1302) of playback operations for adaptive streaming of panoramic video with composite pictures. In the second example (1302), overlapping sections (1331) of a picture of panoramic video are in a sinusoidal projection. The overlapping sections (1331) of the picture of panoramic video can be organized as described with reference to the example (602) of FIG. 6b and example (1102) of FIG. 11b.

As in the example (1301) of FIG. 13a, a playback tool periodically determines a view window (1311) in a spherical projection (1310) of the panoramic video for a viewer. After receiving a manifest file, the playback tool identifies which of the overlapping sections (1331) of the panoramic video are to be used to create an output picture (1392) for the view window (1311). Specifically, the playback tool identifies one or more of the overlapping sections (1331) that each contain at least part of the view window (1311). In FIG. 13b, two of the overlapping sections (1331)—sections 2 and 6—each contain at least part of the view window (1311), which is shown as a projection (1332) onto the picture of panoramic video. The playback tool selects one or more bitstreams for the identified section(s), respectively.

The playback tool requests encoded data for the selected bitstream(s), generally as described with reference to FIG. 13a. In the example (1302) of FIG. 13b, the playback tool requests encoded data (1352) for bitstream 2 and requests encoded data (1356) for bitstream 6. The playback tool receives and decodes the encoded data for the selected bitstream(s), thereby reconstructing composite picture(s) that include the identified section(s). In the example (1302) of FIG. 13b, the playback tool decodes the encoded data (1352) for bitstream 2 to reconstruct a composite picture (1362) that includes section 2, and the playback tool decodes the encoded data (1356) for bitstream 6 to reconstruct a composite picture (1366) that includes section 6.

The playback tool creates an output picture (1392) for the view window (1311) from the reconstructed section(s) and/or the reconstructed low-resolution version of the input picture, generally as described with reference to FIG. 13a. For a normal mode of output picture creation, the reconstructed high-resolution section(s) of the composite picture(s) support rendering of high-resolution views/details from the content of the section(s). In FIG. 13b, the reconstructed section(s) include section 2 and section 6. So long as the view window (1311) falls within the reconstructed section(s), the playback tool can render a high-quality view. If the view window (1311) falls outside the reconstructed section(s) (in FIG. 13b, sections 2 and 6), the playback tool uses a fallback mode of output picture creation. In the fallback mode, the playback tool can render low-quality views/details from the low-resolution version of the composite picture(s), without requesting additional content from the stream configuration tool or media server. Thus, the low-resolution version of the input picture supports rendering of lower-resolution views/details, as needed, if the view direction or zoom factor dramatically changes or encoded data for a high-resolution section is lost. To find the corresponding locations in the reconstructed section(s) and/or reconstructed low-resolution version of the input picture, the playback tool can warp between different projections, e.g., from a screen projection for the output picture to a sinusoidal projection for the overlapping sections (1331).

X. Example Techniques for Stream Configuration of Panoramic Video with Composite Pictures

FIG. 14 shows an example technique (1400) for stream configuration of panoramic video with composite pictures. A panoramic video stream configuration tool as described with reference to FIG. 4, or other panoramic video stream configuration tool, can perform the example technique (1400).

The stream configuration tool receives (1410) an input picture of panoramic video. Typically, the input picture is in an input projection such as an equirectangular projection, a cubemap projection, or a sinusoidal projection. The input picture has a spatial resolution such as 4K or higher.

The stream configuration tool creates (1420) a low-resolution version of the input picture. For example, the stream configuration tool downsamples the input picture horizontally and/or vertically. The low-resolution version of the input picture can be in an input projection (e.g., an equirectangular projection, a cubemap projection, or a sinusoidal projection) or intermediate projection (e.g., a sinusoidal projection). In general, the low-resolution version of the input picture has a lower spatial resolution than the input picture.

The stream configuration tool also splits (1430) the input picture into multiple sections according to partition settings. The sections can be overlapping sections, as described above, or non-overlapping sections. For example, the partition settings include the count of sections, sizes of sections, positions of sections, and extent of overlap between sections. In some configurations, the stream configuration tool projects the input picture from an input projection to an intermediate projection when splitting the input picture into multiple sections (in the intermediate projection). For example, the input projection (for the input picture and low-resolution version of the input picture) is an equirectangular projection or a cubemap projection, and the intermediate projection is a sinusoidal projection. When the multiple sections are in a sinusoidal projection, at least one of the multiple sections may include at least some sample values having default values (e.g., black values or gray values, not representing content of the input picture of panoramic video). In general, each of the sections has a spatial resolution that is lower than the spatial resolution of the input picture, but may be higher or lower than the spatial resolution of the low-resolution version of the input picture.
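
Where a sinusoidal intermediate projection is used, the reprojection from an equirectangular input compresses each row by the cosine of its latitude and fills the emptied margins with default values. A rough nearest-neighbor sketch (the array layout and the gray default are assumptions):

```python
import numpy as np

def equirect_to_sinusoidal(picture: np.ndarray, default_value=128) -> np.ndarray:
    """Reproject an equirectangular picture to a same-sized sinusoidal projection.

    Rows keep their theta; each row is horizontally compressed by cos(theta)
    about the picture's center. Pixels outside the compressed content are
    filled with a default (gray) value, as in the empty regions described
    for sinusoidal sections.
    """
    h, w = picture.shape[:2]
    out = np.full_like(picture, default_value)
    for y in range(h):
        theta = np.radians(90.0 - (y + 0.5) / h * 180.0)   # +90 at top, -90 at bottom
        row_w = max(1, int(round(w * np.cos(theta))))
        x0 = (w - row_w) // 2
        # Nearest-neighbor resample of the source row into the compressed row.
        src_x = (np.arange(row_w) / row_w * w).astype(int)
        out[y, x0:x0 + row_w] = picture[y, src_x]
    return out
```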

The stream configuration tool creates (1440) multiple composite pictures. Each of the composite pictures includes one of the multiple sections and also includes the low-resolution version of the input picture. For example, within each of the composite pictures, the low-resolution version of the input picture is adjacent one of the sections. Within each of the composite pictures, the low-resolution version of the input picture can be positioned at a pre-defined location relative to the one of the multiple sections. Alternatively, for each of the composite pictures, one of the multiple sections provides a first view of a frame packing arrangement, and the low-resolution version of the input picture provides a second view of the frame packing arrangement. Alternatively, the composite pictures are organized in some other way. The stream configuration tool adds (1450) the multiple composite pictures, respectively, to corresponding video streams for encoding.

FIG. 14 shows two loops. As part of a loop for a formatting pipeline, the stream configuration tool checks (1460) whether to continue operations for a next input picture. If so, the stream configuration tool receives (1410) the next input picture, creates (1420) a low-resolution version of the next input picture, splits (1430) the next input picture into multiple sections, creates (1440) composite pictures for the next input picture, and adds (1450) the composite pictures to corresponding video streams. In this way, the stream configuration tool iteratively creates composite pictures and adds them to corresponding video streams.

As part of an encoding pipeline, the stream configuration tool encodes (1470) the multiple composite pictures in the corresponding video streams, respectively, for an input picture. This produces encoded data for the composite pictures as part of multiple bitstreams for the corresponding video streams, respectively. Typically, the bitstreams are video elementary bitstreams. The encoded data in the video elementary bitstreams can be multiplexed into a single container stream for delivery to a media server. The stream configuration tool stores (1480) the encoded data for delivery (e.g., to a media server, or directly to one or more panoramic video playback tools). The stream configuration tool checks (1490) whether to continue encoding operations and, if so, encodes (1470) the composite pictures for the next input picture. In this way, as part of a loop for the encoding pipeline, the stream configuration tool encodes the composite pictures added to video streams in the formatting pipeline.

The stream configuration tool can also produce one or more manifest files. The manifest file(s) include information indicating, for each of the multiple bitstreams, the position of one of the multiple sections (in terms of an input projection or spherical projection) whose content is part of the corresponding video stream for that bitstream. For example, for each section, the manifest file includes phi and theta coordinates for the section. The manifest file(s) can also include information that indicates where the low-resolution version of the input picture is positioned in the composite pictures, respectively. The stream configuration tool can deliver the manifest file(s) to a media server, for subsequent delivery to one or more playback tools. Or, the stream configuration tool can directly deliver the manifest file(s) to one or more playback tools. The manifest file(s) can be delivered as user data of elementary bitstreams, as metadata in a container, or in some other way. Alternatively, the stream configuration tool and playback tool(s) can operate without exchanging information in manifest file(s), and input pictures are partitioned into sections according to a static, pre-defined pattern.

XI. Example Techniques for Playback of Panoramic Video with Composite Pictures

FIG. 15 shows an example technique (1500) for playback of panoramic video with composite pictures. A panoramic video playback tool as described with reference to FIG. 5, or other panoramic video playback tool, can perform the example technique (1500).

The panoramic video playback tool determines (1510) a view window for playback of panoramic video. For example, the view window depends on view direction, field of view, and/or zoom factor. The playback tool can receive an indication of a view direction for an application. For example, the indication of the view direction is a set of heading, pitch, and roll values for the view direction. Or, the indication of the view direction is a set of affine transform coefficients that specify a spatial rotation for the view direction. Or, the view direction is specified in some other way. The playback tool can receive the indication of the view direction from the application or from a source specified by the application. The playback tool can also receive an indication of a field of view and/or zoom factor for the application.

From among multiple sections of the panoramic video, the playback tool identifies (1520) one or more sections that contain at least part of the view window. The sections can be overlapping sections, which reduces incidence of disruption attributable to bitstream switching, or non-overlapping sections. For example, the playback tool identifies each of the multiple sections that contains at least part of the view window. For the identified section(s), the playback tool selects (1530) one or more bitstreams among multiple bitstreams for corresponding video streams.

FIG. 15 shows two loops. As part of a loop for a view window pipeline, the playback tool checks (1540) whether there has been a change to the view window (e.g., due to a change in view direction, field of view, or zoom factor). If so, the playback tool determines (1510) the view window, identifies (1520) the section(s) that contain the view window, and selects (1530) the bitstream(s) for the identified section(s). In this way, the playback tool can iteratively perform operations to determine (1510) the view window, identify (1520) the section(s) that contain at least part of the view window, and select (1530) the bitstream(s) for the identified section(s).

As part of a decoding and reconstruction pipeline, the playback tool requests (1550) encoded data, in the selected bitstream(s) for the identified section(s), respectively, for an input picture of the panoramic video. Depending on configuration, the playback tool can request the encoded data from a media server or directly from a panoramic video stream configuration tool. The playback tool can make separate requests for portions of an input picture or for each input picture, or the playback tool can batch requests.

Each of the identified section(s) is part of a composite picture that also includes a low-resolution version of an input picture. Each composite picture includes one of the multiple sections and also includes the low-resolution version of the input picture. For example, for each composite picture, the low-resolution version of the input picture is adjacent one of the multiple sections within the composite picture. For each composite picture, the low-resolution version of the input picture and the one of the multiple sections can be located at pre-defined positions within the composite picture. Alternatively, for each composite picture, one of the multiple sections provides a first view of a frame packing arrangement, and the low-resolution version of the input picture provides a second view of the frame packing arrangement. Alternatively, the composite pictures are organized in some other way.

Typically, the input picture is in an input projection such as an equirectangular projection, a cubemap projection, or a sinusoidal projection. The input picture has a spatial resolution such as 4K or higher. The low-resolution version of the input picture can be in an input projection (e.g., an equirectangular projection, a cubemap projection, or a sinusoidal projection) or intermediate projection (e.g., a sinusoidal projection). In general, the low-resolution version of the input picture has a lower spatial resolution than the input picture. In general, each of the multiple sections has a spatial resolution that is lower than the spatial resolution of the input picture, but may be higher or lower than the spatial resolution of the low-resolution version of the input picture.

The playback tool receives (1560) the encoded data (e.g., from a media server, or directly from a panoramic video stream configuration tool). The playback tool decodes (1570) the encoded data to reconstruct sample values for the identified section(s) for the input picture and/or reconstruct sample values for the low-resolution version of the input picture. For example, the playback tool reconstructs both the identified section(s) and the low-resolution version of the input picture. Or, if the view window is entirely contained by the identified section(s), the playback tool reconstructs only the identified section(s). Or, if the view window has changed dramatically and is completely outside of the identified section(s), the playback tool reconstructs only the low-resolution version of the input picture.
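
The decision about what to decode can be sketched as follows. Containment is tested against a single section for simplicity, and phi wraparound is ignored; both are assumptions rather than behavior defined by the patent.

```python
def choose_reconstruction_targets(view_window, identified_sections, section_bounds):
    """Decide which parts of the composite picture(s) to reconstruct.

    Returns (decode_sections, decode_low_res). `section_bounds[sid]` gives the
    phi/theta ranges of section sid, in the same format as the view window.
    """
    def contains(bounds, window):
        return (bounds["phi_range"][0] <= window["phi_range"][0] and
                window["phi_range"][1] <= bounds["phi_range"][1] and
                bounds["theta_range"][0] <= window["theta_range"][0] and
                window["theta_range"][1] <= bounds["theta_range"][1])

    if not identified_sections:
        # View window is completely outside the identified section(s):
        # only the low-resolution version of the input picture is needed.
        return [], True
    fully_covered = any(contains(section_bounds[sid], view_window) for sid in identified_sections)
    # If some section fully contains the window, the low-resolution version
    # can be skipped; otherwise decode it as a fallback for uncovered areas.
    return identified_sections, not fully_covered
```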

Then, based at least in part on the reconstructed section(s) and/or the reconstructed low-resolution version of the input picture, the playback tool creates (1580) an output picture. For example, the playback tool determines which portions of the output picture cannot be created using the reconstructed section(s) and, for any portion of the output picture that cannot be created using the reconstructed section(s), creates that portion of the output picture using the reconstructed low-resolution version of the input picture. Thus, the output picture can be created using only the reconstructed section(s). Or, the output picture can be created using only the reconstructed low-resolution version of the input picture. Or, at least part of the output picture can be created using the one or more reconstructed sections, and at least part of the output picture can be created using the reconstructed low-resolution version of the input picture.

When creating the output picture, the playback tool can project the reconstructed section(s) and/or the reconstructed low-resolution version of the input picture from an intermediate projection (e.g., sinusoidal projection) to an output projection (e.g., screen projection). Or, when creating the output picture, the playback tool can project the reconstructed section(s) and/or the reconstructed low-resolution version of the input picture from an input projection (e.g., equirectangular projection, cubemap projection, sinusoidal projection) to an output projection (e.g., screen projection).

The playback tool stores (1590) the output picture for output to a display device. The playback tool checks (1595) whether to continue decoding and reconstruction operations and, if so, requests (1550) and decodes (1570) encoded data for one or more sections of the next input picture. In this way, as part of a loop for the decoding and reconstruction pipeline, the playback tool reconstructs sections identified in the view window pipeline.

The playback tool can receive one or more manifest files (e.g., from a media server or directly from a stream configuration tool). The manifest file(s) include information indicating, for each of the multiple bitstreams, the position of one of the multiple sections (in terms of an input projection or spherical projection) whose content is part of the corresponding video stream for that bitstream. For example, for each section, the manifest file includes phi and theta coordinates for the section. The playback tool can use the manifest file(s) to identify (1520) the section(s) that contain at least part of the view window and/or select (1530) the bitstream(s) for the identified sections. The playback tool can also use the manifest file(s) when creating (1580) the output picture based on the reconstructed section(s) and/or reconstructed low-resolution version of the input picture. The manifest file(s) can also include information that indicates where the low-resolution version of the input picture is positioned in the composite pictures, respectively. The manifest file(s) can be delivered as user data of elementary bitstreams, as metadata in a container, or in some other way. Alternatively, the stream configuration tool and playback tool(s) can operate without exchanging information in manifest file(s), and input pictures are partitioned into sections according to a static, pre-defined pattern.

In view of the many possible embodiments to which the principles of the disclosed invention may be applied, it should be recognized that the illustrated embodiments are only preferred examples of the invention and should not be taken as limiting the scope of the invention. Rather, the scope of the invention is defined by the following claims. We therefore claim as our invention all that comes within the scope and spirit of these claims.

The article "Microsoft Patent | Adaptive panoramic video streaming using composite pictures" was first published on Nweon Patent

Microsoft Patent | Displaying holograms via hand location https://patent.nweon.com/27323 Thu, 09 Mar 2023 11:26:54 +0000 https://patent.nweon.com/?p=27323 ...

The article "Microsoft Patent | Displaying holograms via hand location" was first published on Nweon Patent

Patent: Displaying holograms via hand location

Patent PDF: Join Nweon membership to access

Publication Number: 20230075560

Publication Date: 2023-03-09

Assignee: Microsoft Technology Licensing

Abstract

Examples are disclosed that relate to computing devices, head-mounted display devices, and methods for displaying holographic objects using slicing planes or volumes. In one example a computing device causes a display system to display a holographic object associated with a holographic volume, the holographic object occluding an occluded holographic object that is not displayed. Location data of at least a portion of a hand is received from a sensor. The location data of the hand is used to locate a slicing plane or a slicing volume within the holographic volume. Based on the location of the slicing plane or the slicing volume, at least a portion of the occluded holographic object is displayed via the display system.

Claims

1.A computing device, comprising: a logic subsystem comprising a processor; and memory storing instructions executable by the logic subsystem to: display via a display system at least a portion of a holographic object associated with a holographic volume; refrain from displaying at least a portion of a first occluded holographic object and at least a portion of a second occluded holographic object; display a first affordance for manipulating a first slicing volume; display a second affordance for manipulating a second slicing volume; receive, from a sensor, first depth image data indicating that a user is grasping and moving the first affordance; at least in response to the first depth image data, display the first slicing volume expanding within the holographic volume; based on a location of the first slicing volume, display at least the portion of the first occluded holographic object and at least the portion of the second occluded holographic object; receive, from the sensor, second depth image data indicating that the user is grasping and moving the second affordance; at least in response to the second depth image data, display the second slicing volume expanding within the holographic volume; and based on a location of the second slicing volume, cease displaying at least the portion of the second occluded holographic object while continuing to display at least the portion of the first occluded holographic object.

2.The computing device of claim 1, wherein the instructions are further executable to, based on the location of the first slicing volume, cease displaying the portion of the holographic object.

3.The computing device of claim 1, wherein the instructions are further executable to: display a third affordance for manipulating the first slicing volume; and prior to displaying the first slicing volume, in response to third depth image data indicating that the user is grasping the first affordance and the third affordance, display an initial slicing plane within the holographic volume.

4.The computing device of claim 3, wherein the instructions are further executable to: detect user manipulation of the first affordance and the third affordance; and in response to detecting the user manipulation, display the first slicing volume and cease displaying the initial slicing plane.

5.The computing device of claim 4, wherein the instructions are further executable to: receive, from the sensor, fourth depth image data indicating that the user is performing a release gesture with the first affordance and with the third affordance; and in response to detecting the release gesture, freeze a state of display of the holographic volume, the first occluded holographic object, and the second occluded holographic object.

6.The computing device of claim 5, wherein the instructions are further executable to, in response to detecting the release gesture, cease displaying the first slicing volume.

7.The computing device of claim 3, wherein the second affordance is displayed between the first affordance and the third affordance.

8.The computing device of claim 1, wherein the first slicing volume is a first cuboid volume, and the instructions are further executable to: display the first cuboid volume growing in volume as the user moves their hands apart; and on condition that the first cuboid volume expands to include the first occluded holographic object and the second occluded holographic object, display the first occluded holographic object and the second occluded holographic object.

9.The computing device of claim 1, wherein the instructions are further executable to: display a fourth affordance for manipulating the second slicing volume; and in response to fifth depth image data indicating that the user is grasping the second affordance and the fourth affordance, display another initial slicing plane within the holographic volume.

10.The computing device of claim 9, wherein the instructions are further executable to: detect user manipulation of the second affordance and the fourth affordance; and in response to detecting the user manipulation, display the second slicing volume and cease displaying the another initial slicing plane.

11.The computing device of claim 1, wherein the first affordance and the second affordance are displayed outside of the holographic volume and spaced from the holographic object.

12.The computing device of claim 1, wherein the holographic object is associated with a first layer of holographic objects, the first occluded holographic object and the second occluded holographic object are associated with a second layer of holographic objects, and the instructions are further executable to: manipulate the holographic object in response to movement of a first hand of the user; and manipulate the first occluded holographic object and the second occluded holographic object in response to movement of a second hand of the user.

13.A method enacted on a computing device, the method comprising: displaying via a display system at least a portion of a holographic object associated with a holographic volume; refraining from displaying at least a portion of a first occluded holographic object and at least a portion of a second occluded holographic object; displaying a first affordance for manipulating a first slicing volume; displaying a second affordance for manipulating a second slicing volume; receiving, from a sensor, first depth image data indicating that a user is grasping and moving the first affordance; at least in response to the first depth image data, displaying the first slicing volume expanding within the holographic volume; based on a location of the first slicing volume, displaying at least the portion of the first occluded holographic object and at least the portion of the second occluded holographic object; receiving, from the sensor, second depth image data indicating that the user is grasping and moving the second affordance; at least in response to the second depth image data, displaying the second slicing volume expanding within the holographic volume; and based on a location of the second slicing volume, cease displaying at least the portion of the second occluded holographic object while continuing to display at least the portion of the first occluded holographic object.

14.The method of claim 13, further comprising, based on the location of the first slicing volume, cease displaying the portion of the holographic object.

15.The method of claim 13, further comprising: displaying a third affordance for manipulating the first slicing volume; and prior to displaying the first slicing volume, in response to third depth image data indicating that the user is grasping the first affordance and the third affordance, displaying an initial slicing plane within the holographic volume.

16.The method of claim 15, further comprising: detecting user manipulation of the first affordance and the third affordance; and in response to detecting the user manipulation, displaying the first slicing volume and cease displaying the initial slicing plane.

17.The method of claim 16, further comprising: receiving, from the sensor, fourth depth image data indicating that the user is performing a release gesture with the first affordance and with the third affordance; and in response to detecting the release gesture, freezing a state of display of the holographic volume, the first occluded holographic object, and the second occluded holographic object.

18.The method of claim 13, wherein the first slicing volume is a first cuboid volume, the method further comprising: displaying the first cuboid volume growing in volume as the user moves their hands apart; and on condition that the first cuboid volume expands to include the first occluded holographic object and the second occluded holographic object, displaying the first occluded holographic object and the second occluded holographic object.

19.The method of claim 18, further comprising: displaying a fourth affordance for manipulating the second slicing volume; detecting user manipulation of the second affordance and the fourth affordance; and in response to detecting the user manipulation, displaying the second slicing volume.

20.A head-mounted display device, comprising: a see-through display system; a logic subsystem comprising one or more processors; and memory storing instructions executable by the logic subsystem to: display via the see-through display system at least a portion of a holographic object associated with a holographic volume; refrain from displaying at least a portion of a first occluded holographic object and at least a portion of a second occluded holographic object; display a first affordance for manipulating a first slicing volume; display a second affordance for manipulating a second slicing volume; receive, from a sensor, first depth image data indicating that a user is grasping and moving the first affordance; at least in response to the first depth image data, display the first slicing volume expanding within the holographic volume; based on a location of the first slicing volume, display at least the portion of the first occluded holographic object and at least the portion of the second occluded holographic object; receive, from the sensor, second depth image data indicating that the user is grasping and moving the second affordance; at least in response to the second depth image data, display the second slicing volume expanding within the holographic volume; and based on a location of the second slicing volume, cease to display at least the portion of the second occluded holographic object while continuing to display at least the portion of the first occluded holographic object.

Description

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. Non-Provisional application Ser. No. 16/391,048, filed Apr. 22, 2019, which claims priority to U.S. Provisional patent application Ser. No. 62/809,627, filed Feb. 23, 2019, and entitled “DISPLAYING HOLOGRAMS VIA HAND LOCATION,” the entirety of each of which is hereby incorporated herein by reference for all purposes.

BACKGROUND

Some display systems are configured to display virtual imagery as admixed with a real-world background, for example via a see-through display system or via augmentation of a video image of the real-world background.

SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.

Examples are disclosed that relate to displaying holographic objects using slicing planes or volumes. In one example a computing device causes a display system to display a holographic object associated with a holographic volume, the holographic object occluding an occluded holographic object that is not displayed. Location data of at least a portion of a hand is received from a sensor. The location data of the hand is used to locate a slicing plane or a slicing volume within the holographic volume. Based on the location of the slicing plane or the slicing volume, at least a portion of the occluded holographic object is displayed via the display system.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram of an example use case environment showing an HMD device displaying a holographic object according to examples of the present disclosure.

FIG. 2 shows the use case environment of FIG. 1 with the user’s hands extended into the field of view of the HMD device according to examples of the present disclosure.

FIG. 3 shows an example of utilizing one hand to manipulate a slicing plane within a holographic volume according to examples of the present disclosure.

FIGS. 4-7 show examples of utilizing one hand to manipulate a slicing plane within a holographic volume according to examples of the present disclosure.

FIGS. 8-11 show examples of utilizing two hands to manipulate two slicing planes within a holographic volume according to examples of the present disclosure.

FIGS. 12-19 show examples of utilizing two hands to manipulate two slicing planes that define a slicing volume within a holographic volume according to examples of the present disclosure.

FIGS. 20-22 show examples of utilizing two hands to manipulate a slicing volume within a holographic volume according to examples of the present disclosure.

FIGS. 23 and 24 show examples of utilizing two digits of one hand to manipulate a slicing volume within a holographic volume according to examples of the present disclosure.

FIG. 25 schematically shows an example use environment comprising a display system that may be utilized to control the display of holographic objects according to examples of the present disclosure.

FIGS. 26A and 26B are a flowchart illustrating an example method for displaying holographic objects associated with a holographic volume according to examples of the present disclosure.

FIG. 27 is a block diagram illustrating an example computing system.

DETAILED DESCRIPTION

An augmented or virtual reality system, such as a head-mounted display (HMD), may permit a user to interact with a variety of displayed holographic objects. In some examples, one or more holographic objects may occupy a volume of space. For example and with reference to the example use environment 100 shown in FIG. 1, a user 104 wears a head-mounted display (HMD) device in the form of an augmented reality display system 102. The augmented reality display system 102 displays virtual imagery to the user 104 via a see-through display system such that at least a portion of a real-world background is viewable concurrently with the displayed virtual imagery. While described in the context of an augmented reality display system and the use of a see-through display, it will be understood that examples described herein also may be enacted via a virtual reality display system or a video augmented reality display system in which a video image of a physical environment is obtained by a camera and then augmented with virtual image data when displayed to a user of the system.

In this example, the HMD 102 displays a three-dimensional holographic volume in the form of a virtual house 106 within the field of view 108 of the augmented reality display system 102. Additional holographic objects may be located inside the volume of the virtual house 106. These objects are occluded from view by the HMD 102 such that the user 104 sees only exterior elements of the house (roof, walls, etc.). In some systems, if the user desires to view holographic objects located inside the house, they first must find an “edit mode” in their display system, select a separate control feature, and then manipulate the control feature to change their view. Such a control feature interposes a mediating interface between the user’s actual input and the user’s ability to change the view of occluded objects inside the house. For example, the user may be required to operate an editing affordance via digital manipulation, speech command, gaze direction, head direction, button press, or other manipulation, to change their view of the house. This approach is slow, demands high precision, and requires indirect manipulation by the user.

Accordingly, examples of interaction modes are disclosed that relate to viewing inside a holographic volume in a potentially more natural, intuitive, and efficient manner. Briefly and as described in more detail below, in some examples a user of a display system may reveal holographic objects located within a holographic volume by simply moving one or both hands of the user. In some examples, location data of at least a portion of a hand is received from a sensor. Based on the location data, a change in location of the hand relative to the holographic volume is determined. Based at least on the change in location of the hand relative to the holographic volume, one or more occluded holographic objects associated with the holographic volume, which were previously occluded from view, are displayed via the display system.

As used herein, in some examples location and location data may comprise 3 degree-of-freedom (3DOF) data, such as position or orientation information relative to 3 orthogonal axes. In some examples, location and location data may comprise 6 degree-of-freedom (6DOF) data, including position information along 3 perpendicular axes and changes in orientation through rotation about those three axes (yaw, pitch and roll).
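
For illustration only, the following sketch shows one possible representation of such 6 degree-of-freedom location data as a simple data structure; the class and field names are assumptions and do not appear in the disclosure.

```python
from dataclasses import dataclass

@dataclass
class HandPose:
    """Illustrative 6 degree-of-freedom hand pose (names are assumptions)."""
    # Position along three perpendicular axes, in meters, in a stationary frame.
    x: float
    y: float
    z: float
    # Orientation as a unit quaternion encoding rotation about those same axes
    # (yaw, pitch and roll combined).
    qw: float = 1.0
    qx: float = 0.0
    qy: float = 0.0
    qz: float = 0.0
```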

In some examples and as described in more detail below, using articulated hand location data obtained from a sensor, a slicing plane is defined along an axis that is aligned with the backside of the palm of the user’s hand. On one side of the slicing plane, holographic objects within the holographic volume are displayed, while on the other side of the slicing plane other holographic objects within the volume are not displayed to the user. As the user moves her hand, the slicing plane is correspondingly relocated, and holographic objects within the volume are correspondingly displayed or occluded. In this manner, the slicing plane may provide a “flashlight” experience in which the user may easily and selectively reveal previously occluded holographic objects within the volume.
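
The side-of-plane test underlying this “flashlight” behavior can be illustrated with a signed-distance check. The sketch below assumes the slicing plane is given as a point on the back of the hand plus a unit normal, and that the revealing side is the normal’s positive side; the argument names and sign convention are assumptions for illustration.

```python
import numpy as np

def reveal_object(object_center, hand_back_point, hand_back_normal):
    """Return True if a holographic object should be revealed by the slicing plane.

    The plane passes through a tracked point on the back of the hand and is
    oriented by the back-of-hand normal; objects on the normal's positive side
    are displayed, objects on the other side remain occluded.
    """
    offset = np.asarray(object_center, dtype=float) - np.asarray(hand_back_point, dtype=float)
    return float(np.dot(offset, np.asarray(hand_back_normal, dtype=float))) >= 0.0
```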

In some examples, both hands of the user may each define a slicing plane. In some examples, the plane is defined along an axis that is aligned with the palm of the user’s hand. When the user’s palms at least partially face each other, the slicing planes may define a sub-volume within the holographic volume in which holographic objects are displayed, and outside of which holographic objects are occluded. This can create an experience of the user “holding” and dynamically resizing a volume of space between the user’s hands in which holographic objects within the volume are revealed.
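
One way to illustrate the two-plane sub-volume test is shown below; the assumption that each palm normal points toward the opposite hand is an illustrative choice, not a detail of the disclosure.

```python
import numpy as np

def inside_two_plane_volume(object_center,
                            left_palm_point, left_palm_normal,
                            right_palm_point, right_palm_normal):
    """Return True if an object lies in the sub-volume between two palm-aligned planes.

    Each slicing plane is anchored at a palm point with its normal assumed to
    point toward the opposite hand, so an object is inside the sub-volume only
    when it sits on the positive side of both planes.
    """
    c = np.asarray(object_center, dtype=float)
    on_left_side = np.dot(c - np.asarray(left_palm_point, dtype=float),
                          np.asarray(left_palm_normal, dtype=float)) >= 0.0
    on_right_side = np.dot(c - np.asarray(right_palm_point, dtype=float),
                           np.asarray(right_palm_normal, dtype=float)) >= 0.0
    return bool(on_left_side and on_right_side)
```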

In some examples, both hands of the user may define a sub-volume (spherical, oblong, polyhedral, or other shape) between the hands within the holographic volume in which holographic objects are displayed, and outside of which holographic objects are occluded. This can create an experience of the user “holding” and dynamically resizing a “beach ball”, “football” or other portion of space between the user’s hands in which holographic objects within the volume are revealed.
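
A minimal sketch of one possible spherical sub-volume test follows; the construction (a sphere centered between the hands with a diameter equal to the hand separation) is an assumption made only for illustration.

```python
import numpy as np

def inside_held_sphere(object_center, left_hand_point, right_hand_point):
    """Return True if an object lies inside a spherical sub-volume "held" between the hands.

    The sphere is centered at the midpoint between the two tracked hand points
    and its diameter equals the hand separation, so moving the hands apart
    dynamically resizes the revealed region.
    """
    left = np.asarray(left_hand_point, dtype=float)
    right = np.asarray(right_hand_point, dtype=float)
    center = (left + right) / 2.0
    radius = np.linalg.norm(right - left) / 2.0
    return bool(np.linalg.norm(np.asarray(object_center, dtype=float) - center) <= radius)
```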

As a more specific example and with reference to FIG. 2, in an augmented-reality scenario, user 104 may view the holographic house 106 and other holographic objects located within the house (occluded from view in FIG. 2) in a stationary frame of reference for the real-world. The term “stationary frame of reference” indicates that the house is fixed in position relative to the real-world as a user moves through the use environment 100. The house and the internally-located objects are displayed in the real-world using a coordinate location (e.g. Cartesian coordinates) within a coordinate system of the stationary frame of reference. As described in more detail in the examples below, the user 104 orients one or both hands 120, 124 to be within the field of view 108 of the augmented reality display system 102. In some examples, moving one or both hands 120, 124 to be within the field of view 108 triggers an interaction mode that enables the user to reveal holographic objects located within the holographic volume by simply moving one or both hands 120, 124.

In some examples, the user may trigger an interaction mode as described herein by penetrating the holographic volume of house 106 with one or both hands 120, 124. In other examples, the interaction mode may be triggered in any suitable manner, such as via verbal command, button press, etc.
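
For illustration, a simple penetration trigger might test a tracked hand point against a bounding box of the holographic volume, as in the sketch below; the axis-aligned box is an assumption, not a detail of the disclosure.

```python
import numpy as np

def hand_penetrates_volume(hand_point, volume_min, volume_max):
    """Return True if a tracked hand point lies inside the holographic volume.

    Models the holographic volume as an axis-aligned bounding box given by its
    minimum and maximum corners, purely as an illustrative trigger check.
    """
    p = np.asarray(hand_point, dtype=float)
    return bool(np.all(p >= np.asarray(volume_min, dtype=float)) and
                np.all(p <= np.asarray(volume_max, dtype=float)))
```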

As mentioned above and as described in more detail below, in some examples the augmented reality display system 102 uses one or more sensors to capture depth image data of the real-world use environment 100 and detects, via the depth image data, an appendage (hand 120, 124) of the user 104. Such image data may comprise articulated hand image data representing multiple joints, lengths, and/or surfaces of the hand. In this manner, the system may track the location of one or more joints, lengths, surfaces, and digits of the hand and/or planes defined by the hand. In some examples, the augmented reality display system 102 may fit a skeletal model to each image in a series of depth images, and apply one or more gesture filters to detect whether the user has performed a recognized gesture. In other examples, the augmented reality display system 102 may receive depth image data and/or other image data from one or more cameras external to the display system.
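
As one illustration of a gesture filter operating on articulated hand joint data, the sketch below flags a pinch gesture; the choice of gesture, the joint names, and the threshold are assumptions chosen only for illustration and are not values from the disclosure.

```python
import numpy as np

def pinch_gesture_filter(thumb_tip, index_tip, threshold_m=0.02):
    """One example of a gesture filter applied to articulated hand joint positions.

    Flags a pinch when the tracked thumb tip and index fingertip come within a
    small distance (here, an assumed 2 cm) of each other.
    """
    distance = np.linalg.norm(np.asarray(thumb_tip, dtype=float) -
                              np.asarray(index_tip, dtype=float))
    return bool(distance < threshold_m)
```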

With reference now to FIG. 3, in this example the left hand 120 of user 104 is within the field of view of HMD 102. A holographic volume in the form of another house 300 is displayed via the HMD. The HMD 102 receives location data of the hand 120 that may include a backside point location 130 on the upper portion 134 of the user’s hand opposite to the palm side (see also FIG. 1) and one or more other locations of the hand. Using such location data, a slicing plane 304 may be defined that is substantially parallel to the surface of the upper portion 134 of the hand. As described in more detail below, the user may conveniently and naturally move hand 120 to correspondingly move the slicing plane 304 through the house 300 to selectively reveal and occlude from view other holographic objects located within the volume of house 300.
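
One possible construction of such a plane from tracked points on the back of the hand is sketched below; the specific choice of three points is an assumption for illustration, not the disclosed implementation.

```python
import numpy as np

def slicing_plane_from_hand(back_point, knuckle_point, wrist_point):
    """Construct a slicing plane roughly parallel to the back (upper portion) of the hand.

    Three tracked points on the back of the hand define two in-plane vectors;
    their normalized cross product gives the plane normal, and the plane passes
    through the back-of-hand point.
    """
    p0 = np.asarray(back_point, dtype=float)
    v1 = np.asarray(knuckle_point, dtype=float) - p0
    v2 = np.asarray(wrist_point, dtype=float) - p0
    normal = np.cross(v1, v2)
    normal /= np.linalg.norm(normal)  # assumes the three points are not collinear
    return p0, normal                 # plane represented as (point on plane, unit normal)
```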

In the example of FIG. 3, an affordance of the slicing plane 304 is displayed via the HMD 102 to enable the user to more clearly perceive the current location of the plane. In this example, the affordance comprises a translucent pane that defines the boundaries of the slicing plane. In other examples, other suitable affordances (such as a simple rectangle, glowing outline, etc.) may be displayed. In other examples, an affordance of the slicing plane 304 may not be displayed.

In some examples, the slicing plane 304 may be “snapped” to align with a closest axis of the holographic volume. In the example of FIG. 3, the upper portion 134 of hand 120 is most closely aligned with the Y-Z plane of the three mutually orthogonal coordinate planes. Accordingly, the slicing plane 304 is snapped to align with the Y-Z plane. In this example, the X-Y-Z axes and the corresponding three orthogonal planes are determined with respect to the surfaces of the holographic house 300. In other examples the coordinate axes and corresponding orthogonal planes may be determined and set in any suitable manner.
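
A minimal sketch of one way such snapping could be computed is shown below, by selecting the coordinate axis most nearly parallel to the hand plane’s normal; this is an illustrative sketch, not the disclosed implementation.

```python
import numpy as np

def snap_normal_to_coordinate_plane(plane_normal):
    """Snap a hand-derived plane normal to the closest of the three coordinate planes.

    The coordinate axis (X, Y or Z) most nearly parallel to the hand plane's
    normal is selected; for example, a normal pointing roughly along X snaps
    the slicing plane to the Y-Z plane.
    """
    n = np.asarray(plane_normal, dtype=float)
    axes = np.eye(3)                          # unit vectors for X, Y, Z
    best = int(np.argmax(np.abs(axes @ n)))   # axis most parallel to the normal
    sign = 1.0 if n[best] >= 0.0 else -1.0
    return sign * axes[best]                  # e.g. [1, 0, 0] => snapped to the Y-Z plane
```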

In some examples and as described below, the slicing plane 304 may be locked to the closest axis to which it is snapped. In this manner, the user may freely move her hand within the volume, including rotating her hand about such axis, while the slicing plane remains constrained to move along a single axis. In the example of FIG. 3, when the slicing plane 304 is snapped to the Y-Z plane, the user may move the slicing plane laterally along the X-axis to conveniently reveal and occlude other holographic objects within the house 300. In this manner, the system maintains alignment of the slicing plane with the closest coordinate plane during movement of the hand.
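
For illustration, the sketch below shows one way a snapped plane could be constrained to slide along a single locked axis as the hand moves; the representation of the plane and axis is an assumption, not the disclosed implementation.

```python
import numpy as np

def slide_locked_plane(plane_point, locked_axis, hand_point):
    """Translate a snapped slicing plane along its locked axis only.

    The hand may rotate and translate freely, but only the component of the
    hand's position along the locked axis (e.g. X when snapped to the Y-Z
    plane) moves the plane, keeping it aligned with the closest coordinate
    plane during movement.
    """
    p = np.asarray(plane_point, dtype=float)
    a = np.asarray(locked_axis, dtype=float)  # unit vector of the locked axis
    h = np.asarray(hand_point, dtype=float)
    travel = np.dot(h - p, a)                 # hand displacement along the axis
    return p + travel * a                     # plane point slides along that axis only
```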

In other examples and as described below, a slicing plane may be free to rotate about all three axes through 0-360 degrees, and thereby follow the orientation of the upper portion 134 of the user’s hand.

In the example of FIG. 3, the slicing plane 304 selectively reveals holographic objects that are located behind the upper portion 134 of the hand (e.g., rearward in the negative X-axis direction). In this example, the slicing plane 304 may operate like a “flashlight” to reveal previously hidden or occluded holographic objects located in a predetermined revealing direction relative to the plane—in this example, in the negative X-axis direction relative to the plane. In other examples, other revealing directions may be utilized, such as in the positive X-axis direction relative to the plane.

With reference now to FIGS. 4-7, another example of utilizing one hand to manipulate a slicing plane within a holographic volume is provided. These figures show the user’s view through the see-through display of an augmented reality device, such as HMD 102. As shown in FIG. 4, a holographic volume in the form of a house model 400 is displayed via HMD 102 to a user. The house model 400 comprises a plurality of holographic objects, including structural features (f