Patent: Method and system for resolving hemisphere ambiguity in six degree of freedom pose measurements

Publication Number: 20230021404

Publication Date: 2023-01-26

Assignee: Magic Leap

Abstract

Techniques for resolving hemisphere ambiguity are disclosed. One or more magnetic fields are emitted at a handheld controller. The one or more magnetic fields are detected by one or more sensors positioned relative to a headset. Movement data corresponding to the handheld controller or the headset is detected. During a first time interval, a first position and a first orientation of the handheld controller within a first hemisphere are determined based on the detected one or more magnetic fields, and a first discrepancy is calculated based on the first position, the first orientation, and the movement data. During a second time interval, a second position and a second orientation of the handheld controller within a second hemisphere are determined based on the detected one or more magnetic fields, and a second discrepancy is calculated based on the second position, the second orientation, and the movement data.

Claims

What is claimed is:

1.A method comprising: emitting magnetic fields at a handheld controller; detecting the magnetic fields at a magnetic field sensor positioned relative to a headset; determining a first position and a first orientation of the handheld controller within a first hemisphere with respect to the headset based on the magnetic fields detected by the magnetic field sensor; and determining a second position and a second orientation of the handheld controller within a second hemisphere with respect to the headset based on the magnetic fields detected by the magnetic field sensor, wherein the second hemisphere is diametrically opposite the first hemisphere.

2.The method of claim 1, further comprising: detecting movement data corresponding to movement of the handheld controller or the headset.

3.The method of claim 2, further comprising: calculating a first discrepancy based on the first position, the first orientation, and the movement data; and calculating a second discrepancy based on the second position, the second orientation, and the movement data.

4.The method of claim 3, further comprising: continuing to determine positions and orientations of the handheld controller either within the first hemisphere or the second hemisphere based on one or both of the first discrepancy and the second discrepancy.

5.The method of claim 2, wherein the movement data corresponds to the movement of the handheld controller, and wherein the movement data is detected by a movement sensor positioned within the handheld controller.

6.The method of claim 2, wherein the movement data corresponds to the movement of the headset, and wherein the movement data is detected by a movement sensor positioned within the headset.

7.The method of claim 1, wherein the first hemisphere is a front hemisphere with respect to the headset and the second hemisphere is a back hemisphere with respect to the headset.

8.A system comprising: a handheld controller comprising a magnetic field emitter configured to emit magnetic fields; a headset comprising a magnetic field sensor configured to detect the magnetic fields; and one or more processors configured to perform operations including: determining a first position and a first orientation of the handheld controller within a first hemisphere with respect to the headset based on the magnetic fields detected by the magnetic field sensor; and determining a second position and a second orientation of the handheld controller within a second hemisphere with respect to the headset based on the magnetic fields detected by the magnetic field sensor, wherein the second hemisphere is diametrically opposite the first hemisphere.

9.The system of claim 8, further comprising: a movement sensor configured to detect movement data corresponding to movement of the handheld controller or the headset.

10.The system of claim 9, wherein the operations further comprise: calculating a first discrepancy based on the first position, the first orientation, and the movement data; and calculating a second discrepancy based on the second position, the second orientation, and the movement data.

11.The system of claim 10, wherein the operations further comprise: continuing to determine positions and orientations of the handheld controller either within the first hemisphere or the second hemisphere based on one or both of the first discrepancy and the second discrepancy.

12.The system of claim 9, wherein the movement data corresponds to the movement of the handheld controller, and wherein the movement data is detected by a movement sensor positioned within the handheld controller.

13.The system of claim 9, wherein the movement data corresponds to the movement of the headset, and wherein the movement data is detected by a movement sensor positioned within the headset.

14.The system of claim 8, wherein the first hemisphere is a front hemisphere with respect to the headset and the second hemisphere is a back hemisphere with respect to the headset.

15.A non-transitory computer-readable medium comprising instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: causing magnetic fields to be emitted at a handheld controller; causing the magnetic fields to be detected at a sensor positioned relative to a headset; determining a first position and a first orientation of the handheld controller within a first hemisphere with respect to the headset based on the magnetic fields detected by the magnetic field sensor; and determining a second position and a second orientation of the handheld controller within a second hemisphere with respect to the headset based on the magnetic fields detected by the magnetic field sensor, wherein the second hemisphere is diametrically opposite the first hemisphere.

16.The non-transitory computer-readable medium of claim 15, wherein the operations further comprise: detecting movement data corresponding to movement of the handheld controller or the headset.

17.The non-transitory computer-readable medium of claim 16, wherein the operations further comprise: calculating a first discrepancy based on the first position, the first orientation, and the movement data; and calculating a second discrepancy based on the second position, the second orientation, and the movement data.

18.The non-transitory computer-readable medium of claim 17, wherein the operations further comprise: continuing to determine positions and orientations of the handheld controller either within the first hemisphere or the second hemisphere based on one or both of the first discrepancy and the second discrepancy.

19.The non-transitory computer-readable medium of claim 16, wherein the movement data corresponds to the movement of the handheld controller, and wherein the movement data is detected by a movement sensor positioned within the handheld controller.

20.The non-transitory computer-readable medium of claim 16, wherein the movement data corresponds to the movement of the headset, and wherein the movement data is detected by a movement sensor positioned within the headset.

Description

CROSS-REFERENCES TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 16/752,165, filed on Jan. 24, 2020, entitled “METHOD AND SYSTEM FOR RESOLVING HEMISPHERE AMBIGUITY IN SIX DEGREE OF FREEDOM POSE MEASUREMENTS,” which is a non-provisional of and claims the benefit of and priority to U.S. Provisional Patent Application No. 62/797,776, filed Jan. 28, 2019, entitled “METHOD AND SYSTEM FOR RESOLVING HEMISPHERE AMBIGUITY IN SIX DEGREE OF FREEDOM POSE MEASUREMENTS,” which are hereby incorporated by reference in their entirety for all purposes.

BACKGROUND

Modern computing and display technologies have facilitated the development of systems for so-called “virtual reality” or “augmented reality” experiences, wherein digitally reproduced images or portions thereof are presented to a user in a manner wherein they seem to be, or may be perceived as, real. A virtual reality, or “VR,” scenario typically involves presentation of digital or virtual image information without transparency to other actual real-world visual input; an augmented reality, or “AR,” scenario typically involves presentation of digital or virtual image information as an augmentation to visualization of the actual world around the user.

Despite the progress made in these display technologies, there is a need in the art for improved methods, systems, and devices related to augmented reality systems, particularly, display systems.

SUMMARY

The present disclosure relates generally to techniques for improving the performance and user experience of optical systems. More particularly, embodiments of the present disclosure provide methods for operating an augmented reality (AR) or virtual reality (VR) device in which a handheld controller is employed for assisting operation of the device. A summary of the present disclosure is described in reference to the examples given below. As used below, any reference to a series of examples is to be understood as a reference to each of those examples disjunctively (e.g., “Examples 1-4” is to be understood as “Examples 1, 2, 3, or 4”).

Example 1 is a method of resolving hemisphere ambiguity at a system comprising one or more sensors, the method comprising: emitting, at a handheld controller, one or more magnetic fields; detecting, by one or more sensors positioned within a headset or a belt pack of the system, the one or more magnetic fields; running a first processing stack during a first time interval, wherein running the first processing stack includes: determining a first position and a first orientation of the handheld controller within a first hemisphere with respect to the headset based on the one or more magnetic fields; and calculating a first discrepancy based on the first position, the first orientation, and movement data corresponding to either the handheld controller or the headset; running a second processing stack during a second time interval, wherein running the second processing stack includes: determining a second position and a second orientation of the handheld controller within a second hemisphere with respect to the headset based on the one or more magnetic fields, wherein the second hemisphere is diametrically opposite the first hemisphere; and calculating a second discrepancy based on the second position, the second orientation, and the movement data; and aborting either the first processing stack or the second processing stack based on one or both of the first discrepancy and the second discrepancy.

Example 2 is the method of example(s) 1, further comprising: detecting the movement data by an inertial measurement unit (IMU) positioned within the handheld controller.

Example 3 is the method of example(s) 1, further comprising: detecting the movement data by an inertial measurement unit (IMU) positioned within the headset.

Example 4 is the method of example(s) 2 or 3, wherein the movement data is detected during one or both of the first time interval and the second time interval.

Example 5 is the method of example(s) 2 or 3, wherein the movement data is detected prior to both the first time interval and the second time interval.

Example 6 is the method of example(s) 1, wherein the first time interval is concurrent with the second time interval.

Example 7 is the method of example(s) 1, wherein the first time interval is simultaneous with the second time interval.

Example 8 is the method of example(s) 1, wherein the first time interval has a first start time and the second time interval has a second start time, and wherein the first start time and the second start time are simultaneous or are separated by less than a threshold.

Example 9 is the method of example(s) 1, further comprising: comparing the first discrepancy to a threshold; determining that the first discrepancy exceeds the threshold; and in response to determining that the first discrepancy exceeds the threshold: aborting the first processing stack; and allowing the second processing stack to continue.

Example 10 is the method of example(s) 1, further comprising: comparing the second discrepancy to a threshold; determining that the second discrepancy exceeds the threshold; and in response to determining that the second discrepancy exceeds the threshold: aborting the second processing stack; and allowing the first processing stack to continue.

Example 11 is the method of example(s) 1, further comprising: comparing the first discrepancy to the second discrepancy; determining that the first discrepancy exceeds the second discrepancy; and in response to determining that the first discrepancy exceeds the second discrepancy: aborting the first processing stack; and allowing the second processing stack to continue.

Example 12 is the method of example(s) 1, further comprising: comparing the first discrepancy to the second discrepancy; determining that the second discrepancy exceeds the first discrepancy; and in response to determining that the second discrepancy exceeds the first discrepancy: aborting the second processing stack; and allowing the first processing stack to continue.

Example 13 is the method of example(s) 1, wherein the first hemisphere is a front hemisphere with respect to the headset and the second hemisphere is a back hemisphere with respect to the headset.

Example 14 is the method of example(s) 1, further comprising: delivering virtual content to the user based on either: the first position and the first orientation; or the second position and the second orientation.

Example 15 is the method of example(s) 1, wherein the system is an optical device.

Example 16 is a system comprising: a handheld controller comprising a magnetic field emitter configured to emit one or more magnetic fields; a headset or belt pack comprising one or more magnetic field sensors configured to detect the one or more magnetic fields; a processor configured to perform operations including: running a first processing stack during a first time interval, wherein running the first processing stack includes: determining a first position and a first orientation of the handheld controller within a first hemisphere with respect to the headset based on the one or more magnetic fields; and calculating a first discrepancy based on the first position, the first orientation, and movement data corresponding to either the handheld controller or the headset; running a second processing stack during a second time interval, wherein running the second processing stack includes: determining a second position and a second orientation of the handheld controller within a second hemisphere with respect to the headset based on the one or more magnetic fields, wherein the second hemisphere is diametrically opposite the first hemisphere; and calculating a second discrepancy based on the second position, the second orientation, and the movement data; and aborting either the first processing stack or the second processing stack based on one or both of the first discrepancy and the second discrepancy.

Example 17 is the system of example(s) 16, further comprising: detecting the movement data by an inertial measurement unit (IMU) positioned within the handheld controller.

Example 18 is the system of example(s) 16, wherein the operations further comprise: detecting the movement data by an inertial measurement unit (IMU) positioned within the headset.

Example 19 is the system of example(s) 17 or 18, wherein the movement data is detected during one or both of the first time interval and the second time interval.

Example 20 is the system of example(s) 17 or 18, wherein the movement data is detected prior to both the first time interval and the second time interval.

Example 21 is the system of example(s) 16, wherein the first time interval is concurrent with the second time interval.

Example 22 is the system of example(s) 16, wherein the first time interval is simultaneous with the second time interval.

Example 23 is the system of example(s) 16, wherein the first time interval has a first start time and the second time interval has a second start time, and wherein the first start time and the second start time are simultaneous or are separated by less than a threshold.

Example 24 is the system of example(s) 16, wherein the operations further comprise: comparing the first discrepancy to a threshold; determining that the first discrepancy exceeds the threshold; and in response to determining that the first discrepancy exceeds the threshold: aborting the first processing stack; and allowing the second processing stack to continue.

Example 25 is the system of example(s) 16, wherein the operations further comprise: comparing the second discrepancy to a threshold; determining that the second discrepancy exceeds the threshold; and in response to determining that the second discrepancy exceeds the threshold: aborting the second processing stack; and allowing the first processing stack to continue.

Example 26 is the system of example(s) 16, wherein the operations further comprise: comparing the first discrepancy to the second discrepancy; determining that the first discrepancy exceeds the second discrepancy; and in response to determining that the first discrepancy exceeds the second discrepancy: aborting the first processing stack; and allowing the second processing stack to continue.

Example 27 is the system of example(s) 16, wherein the operations further comprise: comparing the first discrepancy to the second discrepancy; determining that the second discrepancy exceeds the first discrepancy; and in response to determining that the second discrepancy exceeds the first discrepancy: aborting the second processing stack; and allowing the first processing stack to continue.

Example 28 is the system of example(s) 16, wherein the first hemisphere is a front hemisphere with respect to the headset and the second hemisphere is a back hemisphere with respect to the headset.

Example 29 is the system of example(s) 16, wherein the operations further comprise: delivering virtual content to the user based on either: the first position and the first orientation; or the second position and the second orientation.

Example 30 is the system of example(s) 16, wherein the system is an optical device.

Example 31 is a method of resolving hemisphere ambiguity, the method comprising: emitting one or more magnetic fields at a handheld controller; detecting the one or more magnetic fields by one or more sensors positioned relative to a headset; detecting movement data corresponding to the handheld controller or the headset; during a first time interval: determining a first position and a first orientation of the handheld controller within a first hemisphere with respect to the headset based on the detected one or more magnetic fields; and calculating a first discrepancy based on the first position, the first orientation, and the movement data; during a second time interval: determining a second position and a second orientation of the handheld controller within a second hemisphere with respect to the headset based on the detected one or more magnetic fields, wherein the second hemisphere is diametrically opposite the first hemisphere; and calculating a second discrepancy based on the second position, the second orientation, and the movement data.

Example 32 is a system comprising: a handheld controller comprising a magnetic field emitter configured to emit one or more magnetic fields; a headset comprising one or more magnetic field sensors configured to detect the one or more magnetic fields; a movement sensor configured to detect movement data corresponding to the handheld controller or the headset; and one or more processors configured to perform operations including: during a first time interval: determining a first position and a first orientation of the handheld controller within a first hemisphere with respect to the headset based on the detected one or more magnetic fields; and calculating a first discrepancy based on the first position, the first orientation, and the movement data; during a second time interval: determining a second position and a second orientation of the handheld controller within a second hemisphere with respect to the headset based on the detected one or more magnetic fields, wherein the second hemisphere is diametrically opposite the first hemisphere; and calculating a second discrepancy based on the second position, the second orientation, and the movement data.

Example 33 is a non-transitory computer-readable medium comprising instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: emitting one or more magnetic fields at a handheld controller; detecting the one or more magnetic fields by one or more sensors positioned relative to a headset; detecting movement data corresponding to the handheld controller or the headset; during a first time interval: determining a first position and a first orientation of the handheld controller within a first hemisphere with respect to the headset based on the detected one or more magnetic fields; and calculating a first discrepancy based on the first position, the first orientation, and the movement data; during a second time interval: determining a second position and a second orientation of the handheld controller within a second hemisphere with respect to the headset based on the detected one or more magnetic fields, wherein the second hemisphere is diametrically opposite the first hemisphere; and calculating a second discrepancy based on the second position, the second orientation, and the movement data.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are included to provide a further understanding of the disclosure, are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the detailed description serve to explain the principles of the disclosure. No attempt is made to show structural details of the disclosure in more detail than may be necessary for a fundamental understanding of the disclosure and various ways in which it may be practiced.

FIG. 1 illustrates an augmented reality (AR) scene as viewed through a wearable AR device, according to some embodiments.

FIG. 2 illustrates various possible components of an AR system.

FIG. 3 illustrates an example system diagram of an electromagnetic tracking system.

FIG. 4 illustrates an example of how an electromagnetic tracking system may be incorporated with an AR system.

FIG. 5 illustrates the hemisphere ambiguity problem that may be present in electromagnetic tracking systems.

FIG. 6 illustrates a method for resolving hemisphere ambiguity at a system or device including one or more sensors.

FIG. 7 illustrates a method for resolving hemisphere ambiguity at a system or device including one or more sensors.

FIG. 8 illustrates results of a simulation showing the expected totem pose when the head is moving and the totem is still.

FIG. 9 illustrates experimental data when the head is moving and the totem is still, with the totem being initialized in the correct hemisphere.

FIG. 10 illustrates experimental totem movement data when the totem is still, with the totem being initialized in the correct hemisphere.

FIG. 11 illustrates experimental data when the head is moving and the totem is still, with the totem being initialized in the wrong hemisphere.

FIG. 12 illustrates experimental totem movement data with no totem motion corresponding to the pose data displayed in FIG. 11.

FIG. 13 illustrates a computer system, according to some embodiments described herein.

DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS

A typical head-worn augmented reality (AR) display is at least loosely coupled to a user’s head, and thus moves when the user’s head moves. If the user’s head motions are detected by the display system, the data being displayed can be updated to take the change in head pose into account. As an example, if a user wearing a head-worn display views a virtual representation of a three-dimensional (3D) object on the display and walks around the area where the 3D object appears, that 3D object can be re-rendered for each viewpoint, giving the user the perception that they are walking around an object that occupies real space. If the head-worn display is used to present multiple objects within a virtual space (for instance, a rich virtual world), measurements of head pose (e.g., the location and orientation of the user’s head) can be used to re-render the scene to match the user’s dynamically changing head location and orientation and provide an increased sense of immersion in the virtual space.

Accordingly, detection or calculation of head pose can facilitate the display system to render virtual objects such that they appear to occupy a space in the real world in a manner that makes sense to the user. In addition, detection of the position and/or orientation of a real object, such as a handheld device or controller (which also may be referred to as a “totem”), haptic device, or other real physical object, in relation to the user’s head or AR system may also facilitate the display system in presenting display information to the user to enable the user to interact with certain aspects of the AR system efficiently. At least for AR applications, placement of virtual objects in spatial relation to physical objects (e.g., presented to appear spatially proximate a physical object in two- or three-dimensions) may be a non-trivial problem.

For example, head movement may significantly complicate placement of virtual objects in a view of an ambient environment. Such is true whether the view is captured as an image of the ambient environment and then projected or displayed to the end user, or whether the end user perceives the view of the ambient environment directly. For instance, head movement will likely cause a field of view of the end user to change, which will likely require an update to where various virtual objects are displayed in the field of view of the end user.

Additionally, head movements may occur within a large variety of ranges and speeds. Head movement speed may vary not only between different head movements, but within or across the range of a single head movement. For instance, head movement speed may initially increase (e.g., linearly or not) from a starting point, and may decrease as an ending point is reached, obtaining a maximum speed somewhere between the starting and ending points of the head movement. Rapid head movements may even exceed the ability of the particular display or projection technology to render images that appear uniform and/or as smooth motion to the end user.

Head tracking accuracy and latency (i.e., the elapsed time between when the user moves his or her head and the time when the image gets updated and displayed to the user) have been challenges for virtual reality (VR) and AR systems. Especially for display systems that fill a substantial portion of the user’s visual field with virtual elements, it can be important that the accuracy of head-tracking is high and that the overall system latency is very low from the first detection of head motion to the updating of the light that is delivered by the display to the user’s visual system. If the latency is high, the system can create a mismatch between the user’s vestibular and visual sensory systems, and generate a user perception scenario that can lead to motion sickness or simulator sickness. If the system latency is high, the apparent location of virtual objects will appear unstable during rapid head motions.

In addition to head-worn display systems, other display systems can benefit from accurate and low latency head pose detection. These include head-tracked display systems in which the display is not worn on the user’s body, but is, e.g., mounted on a wall or other surface. The head-tracked display acts like a window onto a scene, and as a user moves his head relative to the “window” the scene is re-rendered to match the user’s changing viewpoint. Other systems include a head-worn projection system, in which a head-worn display projects light onto the real world.

Additionally, in order to provide a realistic augmented reality experience, AR systems may be designed to be interactive with the user. For example, multiple users may play a ball game with a virtual ball and/or other virtual objects. One user may “catch” the virtual ball, and throw the ball back to another user. In another embodiment, a first user may be provided with a totem (e.g., a real bat communicatively coupled to the AR system) to hit the virtual ball. In other embodiments, a virtual user interface may be presented to the AR user to allow the user to select one of many options. The user may use totems, haptic devices, wearable components, or simply touch the virtual screen to interact with the system.

Detecting head pose and orientation of the user, and detecting a physical location of real objects in space enable the AR system to display virtual content in an effective and enjoyable manner. However, although these capabilities are key to an AR system, they are difficult to achieve. For example, the AR system may need to recognize a physical location of a real object (e.g., user’s head, totem, haptic device, wearable component, user’s hand, etc.) and correlate the physical coordinates of the real object to virtual coordinates corresponding to one or more virtual objects being displayed to the user. This can require highly accurate sensors and sensor recognition systems that track a position and orientation of one or more objects at rapid rates.

Current approaches do not perform localization at satisfactory speed or precision standards. Thus, there is a need for better localization systems in the context of AR and VR devices. In the following description, various embodiments will be described. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding of the embodiments. However, it will also be apparent to one skilled in the art that the embodiments may be practiced without the specific details. Furthermore, well-known features may be omitted or simplified in order not to obscure the embodiments being described.

FIG. 1 illustrates an AR scene as viewed through a wearable AR device according to some embodiments described herein. An AR scene 100 is depicted wherein a user of an AR technology sees a real-world park-like setting 106 featuring people, trees, buildings in the background, and a concrete platform 120. In addition to these items, the user of the AR technology also perceives that he “sees” a robot statue 110 standing upon the real-world platform 120, and a cartoon-like avatar character 102 flying by, which seems to be a personification of a bumble bee, even though these elements (character 102 and statue 110) do not exist in the real world. Due to the extreme complexity of the human visual perception and nervous system, it is challenging to produce a VR or AR technology that facilitates a comfortable, natural-feeling, rich presentation of virtual image elements amongst other virtual or real-world imagery elements.

FIG. 2 illustrates various possible components of an AR system. In the illustrated embodiment, an AR system user 260 is depicted wearing a head mounted component 258 featuring a frame 264 structure coupled to a display system 262 positioned in front of the eyes of the user. A speaker 266 is coupled to frame 264 in the depicted configuration and is positioned adjacent the ear canal of the user (in one embodiment, another speaker, not shown, is positioned adjacent the other ear canal of the user to provide for stereo/shapeable sound control). Display 262 is operatively coupled (as indicated by 268), such as by a wired lead or wireless connectivity, to a local processing and data module 270 which may be mounted in a variety of configurations, such as fixedly attached to frame 264, fixedly attached to a helmet or hat, removably attached to the torso of user 260 in a backpack-style configuration, or removably attached to the hip of user 260 in a belt-coupling style configuration.

Local processing and data module 270 may include a power-efficient processor or controller, as well as digital memory, such as flash memory, both of which may be utilized to assist in the processing, caching, and storage of data that is (1) captured from sensors which may be operatively coupled to frame 264, such as image capture devices (such as cameras), microphones, inertial measurement units, accelerometers, compasses, GPS units, radio devices, and/or gyroscopes; and/or is (2) acquired and/or processed using remote processing module 272 and/or remote data repository 274, possibly for passage to display 262 after such processing or retrieval.

Local processing and data module 270 may be operatively coupled (as indicated by 276, 278), such as via one or more wired or wireless communication links, to remote processing module 272 and remote data repository 274 such that these remote modules 272, 274 are operatively coupled to each other and available as resources to local processing and data module 270. In one embodiment, remote processing module 272 may include one or more relatively powerful processors or controllers configured to analyze and process data and/or image information. In one embodiment, remote data repository 274 may include a relatively large-scale digital data storage facility, which may be available through the internet or other networking configuration in a “cloud” resource configuration. In one embodiment, all data is stored and all computation is performed in the local processing and data module, allowing fully autonomous use from any remote modules.

One approach to achieve high precision localization may involve the use of an electromagnetic field coupled with electromagnetic sensors that are strategically placed on the user’s AR headset, belt pack, and/or other ancillary devices (e.g., totems, haptic devices, gaming instruments, etc.). Electromagnetic tracking systems typically include at least an electromagnetic field emitter and at least one electromagnetic field sensor. The sensors may measure electromagnetic fields with a known distribution. Based on these measurements a position and orientation of a field sensor relative to the emitter is determined.

FIG. 3 illustrates an example system diagram of an electromagnetic tracking system, which may have similar components to those developed by organizations such as the Biosense (RTM) division of Johnson & Johnson Corporation and Polhemus (RTM), Inc. of Colchester, Vt., those manufactured by Sixense (RTM) Entertainment, Inc. of Los Gatos, Calif., and other tracking companies. In one or more embodiments, the electromagnetic tracking system includes an electromagnetic field emitter 302 which is configured to emit a known magnetic field. As shown in FIG. 3, electromagnetic field emitter 302 may be coupled to a power supply 310 (e.g., electric current, batteries, etc.) to provide power to electromagnetic field emitter 302.

In one or more embodiments, electromagnetic field emitter 302 includes several coils (e.g., at least three coils positioned perpendicular to each other to produce fields in the X, Y, and Z directions) that generate magnetic fields. These magnetic fields are used to establish a coordinate space, which allows the system to map a position of the sensors in relation to the known magnetic field, and helps determine a position and/or orientation of the sensors. In one or more embodiments, electromagnetic sensors 304A, 304B, etc. may be attached to one or more real objects. Electromagnetic sensors 304 may include smaller coils in which current may be induced through the emitted electromagnetic field.

Generally, the components of electromagnetic field sensors 304 may include small coils or loops, such as a set of three differently oriented (e.g., orthogonally oriented relative to each other) coils coupled together within a small structure such as a cube or other container, that are positioned and oriented to capture incoming magnetic flux from the magnetic field emitted by electromagnetic field emitter 302. By comparing the currents induced in these coils, and knowing the relative positioning and orientation of the coils, the relative position and orientation of a sensor with respect to the emitter may be calculated.
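As a rough illustration of that idea (a minimal sketch, not the patent's algorithm: the 3x3 coupling matrix, the calibration constant k, and the SVD-based decomposition are assumptions), the nine induced-signal amplitudes can be treated as a matrix whose overall gain falls off with range and whose rotational part reflects the relative orientation of the two coil triads:

```python
import numpy as np

def estimate_range_and_rotation(coupling, k=1.0):
    """Hypothetical sketch: recover an approximate range and relative
    rotation from a 3x3 emitter-coil x sensor-coil coupling matrix.

    coupling[i, j] is the amplitude induced in sensor coil j by emitter
    coil i; k is an assumed calibration constant (coil gains, drive
    current, sensor sensitivity).
    """
    u, s, vt = np.linalg.svd(coupling)
    # For a dipole-like field the matrix gain scales roughly as 1/r^3,
    # so a scalar gain (geometric mean of the singular values) can be
    # inverted for range.
    gain = np.cbrt(np.prod(s))
    r = (k / gain) ** (1.0 / 3.0)
    # The nearest proper rotation to U @ Vt approximates the relative
    # orientation between the emitter and sensor coil triads.
    rotation = u @ vt
    if np.linalg.det(rotation) < 0:
        rotation = u @ np.diag([1.0, 1.0, -1.0]) @ vt
    return r, rotation
```

In practice, noise, coil non-idealities, and field distortion from nearby metal would require calibration and filtering well beyond this sketch.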

As will be further described in reference to FIG. 4, one or more movement sensors such as inertial measurement units (IMUs) may be operatively coupled to each of electromagnetic field emitter 302 and electromagnetic field sensors 304 to detect the position and orientation of each component relative to each other and/or relative to a coordinate system. In one or more embodiments, multiple sensors (possibly including IMUs) may be used in relation to electromagnetic field emitter 302 and electromagnetic field sensors 304 to detect the position and orientation of each component. In some instances, the electromagnetic tracking system may provide positions in three directions (i.e., X, Y and Z directions), and further in two or three orientation angles. In some embodiments, measurements of the IMU(s) may be compared to the measurements of the coil to determine a position and orientation of the sensors. In one or more embodiments, both electromagnetic (EM) data and movement data, along with various other sources of data, such as cameras, depth sensors, and other sensors, may be combined to determine the position and orientation. This information may be transmitted (e.g., wireless communication, Bluetooth, etc.) to a processing unit 306. In some embodiments, pose (or position and orientation) may be reported at a relatively high refresh rate in conventional systems.
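As a minimal illustration of combining the EM and movement data (the blend weight, the names, and the structure below are assumptions; a production system would typically use a Kalman-style filter rather than this simple blend), a low-rate EM position fix can be used to correct an IMU-propagated prediction:

```python
def blend_em_and_imu(imu_predicted_position, em_position, em_weight=0.1):
    """Hypothetical complementary-style blend: trust the IMU prediction
    for smoothness and pull it toward the latest EM fix to bound drift."""
    return [(1.0 - em_weight) * p + em_weight * e
            for p, e in zip(imu_predicted_position, em_position)]

# Example: an IMU prediction of [0.30, 0.00, 0.40] nudged toward an EM
# fix of [0.32, 0.01, 0.41] with a 10% EM weight.
print(blend_em_and_imu([0.30, 0.00, 0.40], [0.32, 0.01, 0.41]))
```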

Conventionally, an electromagnetic emitter is coupled to a relatively stable and large object, such as a table, operating table, wall, or ceiling, and one or more sensors are coupled to smaller objects, such as medical devices, handheld gaming components, or the like. Alternatively, as described below in reference to FIG. 4, various features of the electromagnetic tracking system may be employed to produce a configuration wherein changes or deltas in position and/or orientation between two objects that move in space relative to a more stable global coordinate system may be tracked. In other words, FIG. 4 shows a configuration wherein a variation of an electromagnetic tracking system may be utilized to track the position and orientation delta between a head-mounted component and a handheld component, while head pose relative to the global coordinate system (say, of the room environment local to the user) is determined otherwise, such as by simultaneous localization and mapping (SLAM) techniques using outward-capturing cameras coupled to the head-mounted component of the system.

Processing unit 306 may control electromagnetic field emitter 302, and may also capture data from the various electromagnetic field sensors 304. It should be appreciated that the various components of the system may be coupled to each other through any electro-mechanical or wireless/Bluetooth means. Processing unit 306 may also include data regarding the known magnetic field, and the coordinate space in relation to the magnetic field. This information is then used to detect the position and orientation of the sensors in relation to the coordinate space corresponding to the known electromagnetic field. Processing unit 306 may further be coupled to a threshold module 312 (as shown in the illustrated embodiment) or may, alternatively or additionally, include threshold module 312 as a subcomponent. In some implementations, threshold module 312 may be configured to dynamically adjust a threshold against which discrepancy values calculated by processing unit 306 may be compared for determining the pose of the handheld controller, as will be described in greater detail below.

One advantage of electromagnetic tracking systems is that they produce highly accurate tracking results with minimal latency and high resolution. Additionally, the electromagnetic tracking system does not necessarily rely on optical trackers, and sensors/objects not in the user’s line of vision may be easily tracked. It should be appreciated that the strength v of the electromagnetic field drops as a cubic function of the distance r from a coil transmitter (e.g., electromagnetic field emitter 302), i.e., v ∝ 1/r³. Thus, processing unit 306 may be configured to execute certain functions, such as algorithms predicting a distance based on a measured strength, to determine a position and orientation of the sensor/object at varying distances away from electromagnetic field emitter 302.
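As a concrete sketch of such a distance prediction (the calibration constant k is an assumed placeholder folding in coil gain, drive current, and sensor sensitivity), the cubic falloff can simply be inverted:

```python
def range_from_field_strength(v, k=1.0):
    """Invert the dipole-like falloff v ~ k / r**3 to predict range."""
    return (k / v) ** (1.0 / 3.0)

# With k = 1, a reading one eighth as strong corresponds to twice the range.
assert abs(range_from_field_strength(0.125) - 2.0) < 1e-9
```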

Given the rapid decline of the strength of the electromagnetic field as one moves farther away from the electromagnetic emitter, best results, in terms of accuracy, efficiency, and low latency, may be achieved at closer distances. In typical electromagnetic tracking systems, the electromagnetic field emitter is powered by electric current (e.g., a plug-in power supply) and has sensors located within a 20 ft radius of the electromagnetic field emitter. A shorter radius between the sensors and field emitter may be more desirable in many applications, including AR applications.

FIG. 4 illustrates an example of how an electromagnetic tracking system may be incorporated with an AR system, with an electromagnetic field emitter 402 incorporated as part of a handheld controller 406. In one or more embodiments, the handheld controller may be a totem to be used in a gaming scenario. In other embodiments, the handheld controller may be a haptic device. In yet other embodiments, the electromagnetic field emitter may simply be incorporated as part of a belt pack 470. Handheld controller 406 may include a battery 410 or other power supply that powers electromagnetic field emitter 402. It should be appreciated that electromagnetic field emitter 402 may also include or be coupled to an IMU 450 or other movement sensor configured to assist in determining positioning and/or orientation of electromagnetic field emitter 402 relative to other components. This may be especially important in cases where both field emitter 402 and sensors 404 are mobile. Placing electromagnetic field emitter 402 in the handheld controller rather than the belt pack, as shown in the embodiment of FIG. 4, ensures that the electromagnetic field emitter is not competing for resources at the belt pack, but rather uses its own battery source at handheld controller 406.

In some embodiments, electromagnetic sensors 404 may be positioned relative to AR headset 458, such as placed on one or more locations on AR headset 458 and/or on one or more locations on belt pack 470. Sensors placed on AR headset 458 may be placed along with other sensing devices such as one or more IMUs or additional magnetic flux capturing coils 408. For example, as shown in FIG. 4, sensors 404, 408 may be placed on either side of headset 458. Since these sensors are engineered to be rather small (and hence may be less sensitive, in some cases), having multiple sensors may improve efficiency and precision. In one or more embodiments, one or more sensors may also be placed on the belt pack 470 or any other part of the user’s body. Sensors 404, 408 may communicate wirelessly or through Bluetooth to a computing apparatus that determines a pose and orientation of the sensors (and the AR headset to which they are attached). In one or more embodiments, the computing apparatus may reside at belt pack 470. In other embodiments, the computing apparatus may reside at the headset itself, or even handheld controller 406. The computing apparatus may in turn include a mapping database 430 (e.g., passable world model, coordinate space, etc.) to detect pose, to determine the coordinates of real objects and virtual objects, and may even connect to cloud resources and the passable world model, in one or more embodiments.

In many instances, conventional electromagnetic emitters may be too bulky for AR devices. Therefore the electromagnetic field emitter may be engineered to be compact, using smaller coils compared to traditional systems. However, given that the strength of the electromagnetic field decreases as a cubic function of the distance away from the field emitter, a shorter radius between electromagnetic sensors 404 and electromagnetic field emitter 402 (e.g., about 3-3.5 ft) may reduce power consumption when compared to conventional systems. In one or more embodiments, this aspect may be utilized to prolong the life of battery 410 that powers handheld controller 406 and electromagnetic field emitter 402. In other embodiments, this aspect may be utilized to reduce the size of the coils generating the magnetic field at electromagnetic field emitter 402; however, in order to obtain the same magnetic field strength from smaller coils, the drive power may need to be increased. Either way, this allows for a compact electromagnetic field emitter unit 402 that may fit at handheld controller 406.

Several other changes may be made when using the electromagnetic tracking system for AR devices, which may require a more efficient pose reporting rate than other applications. For example, movement-based or IMU-based pose tracking may be employed. In many cases, increased stability of the IMUs can lead to increased efficiency of the pose detection process. The IMUs may be engineered such that they remain stable up to 50-100 milliseconds. It should be appreciated that some embodiments may utilize an outside pose estimator module (i.e., IMUs may drift over time) that may enable pose updates to be reported at a rate of 10-20 Hz. By keeping the IMUs stable at a reasonable rate, the rate of pose updates may be decreased to 10-20 Hz (as compared to higher frequencies in conventional systems).

If the electromagnetic tracking system can be run at a 10% duty cycle (e.g., only pinging for ground truth every 100 milliseconds), this would be an additional way to save power at the AR system. This would mean that the electromagnetic tracking system wakes up every 10 milliseconds out of every 100 milliseconds to generate a pose estimate. This directly translates to power consumption savings, which may, in turn, affect size, battery life and cost of the AR device. In one or more embodiments, this reduction in duty cycle may be strategically utilized by providing two handheld controllers (not shown) rather than just one. For example, the user may be playing a game that requires two totems, etc. Or, in a multi-user game, two users may have their own totems/handheld controllers to play the game. When two controllers (e.g., symmetrical controllers for each hand) are used rather than one, the controllers may operate at offset duty cycles. The same concept may also be applied to controllers utilized by two different users playing a multiplayer game, for example.
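A minimal sketch of such a duty-cycle gate follows (the function name, the millisecond bookkeeping, and the 50 ms offset are illustrative assumptions); offsetting the second controller's schedule keeps the two emitters from pinging at the same time:

```python
def em_emitter_active(t_ms, period_ms=100, on_ms=10, offset_ms=0):
    """10% duty cycle: the emitter is on for on_ms out of every period_ms,
    optionally phase-shifted by offset_ms for a second controller."""
    return ((t_ms - offset_ms) % period_ms) < on_ms

# Controller B wakes 50 ms after controller A, so their pings never overlap.
for t in range(0, 1000):
    assert not (em_emitter_active(t) and em_emitter_active(t, offset_ms=50))
```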

FIG. 5 illustrates the hemisphere ambiguity problem that may be present in electromagnetic tracking systems such as those described herein. For six degree of freedom (DOF) tracking, a handheld controller 502 (labeled “TX”), which may also be referred to as a handheld device or totem, may generate EM signals modulated on three separate frequencies, one for each axis X, Y, and Z. A wearable 504 (labeled “RX”), which may be implemented as an AR headset, has an EM receiving component that is receiving EM signals on the X, Y, and Z frequencies. The position and orientation (i.e., pose) of the handheld controller can be derived based on the characteristics of the received EM signals. However, due to the symmetric nature of the EM signals, it may not be possible to determine which hemisphere the handheld controller is in (e.g., front hemisphere 506A or back hemisphere 506B) without an additional reference frame. That is, the same EM values can be obtained at the wearable for two diametrically opposed totem poses (one in each hemisphere), with a chosen plane passing through the center of a sphere dividing the two hemispheres. This is illustrated in FIG. 5, which shows that for a single snapshot of received EM signals by wearable 504, either pose can be valid. However, when handheld controller 502 is moved, the tracking algorithm will typically encounter errors if the wrong hemisphere is chosen due to inconsistency in the data from the various sensors. The hemisphere ambiguity arises in part due to the fact that when a six DOF tracking session is started, the initial EM totem data does not have an unequivocal absolute position. Instead, it provides a relative distance, which can be interpreted as one of two positions in the 3D volume that is divided into two equal spheres with the wearable (e.g., the AR headset mounted on the head of the user) centered between the two hemispheres. Thus, embodiments of the present disclosure provide methods and systems that resolve hemisphere ambiguity in order to enable successful tracking of the actual position of the handheld controller.
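The ambiguity can be made concrete with the small sketch below (illustrative only; the coordinate values and the function name are assumptions). Under an ideal dipole model, the field measured at the receiver is identical whether the emitter sits at a given position in the headset frame or at that position reflected through the headset's origin, so a single EM snapshot yields two candidate positions:

```python
import numpy as np

def hemisphere_counterpart(position):
    """Return the diametrically opposed candidate position that is
    consistent with the same single EM snapshot."""
    return -np.asarray(position)

front_candidate = np.array([0.4, -0.1, 0.3])        # metres, headset frame
back_candidate = hemisphere_counterpart(front_candidate)
print(back_candidate)                               # [-0.4  0.1 -0.3]
```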

According to some embodiments of the present disclosure, a method that can be used to solve the hemisphere ambiguity problem includes starting two processing stacks when the tracking pipeline is started: one associated with a reference pose in the front hemisphere and the other associated with a reference pose in the back hemisphere. While both processing stacks are running, their output is checked; the processing stack with the wrong assumption of hemisphere will quickly have problems tracking the handheld controller’s actual motion. At that point, the erroneous processing stack can be stopped (e.g., destroyed or terminated) and the remaining instance is allowed to continue processing. In some embodiments, a check is periodically performed to calculate a divergence between an estimated pose (a fusion of the electromagnetic data and positional data as determined by a movement sensor such as an IMU) and each reference pose (e.g., the reference pose in hemisphere 506A and the reference pose in hemisphere 506B). If either calculated divergence exceeds a threshold, then the corresponding instance of the tracking module for the particular estimated pose is stopped. These methods are described more fully below.
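The sketch below illustrates this dual-stack strategy in simplified form (the function and variable names, the streaming interface, and the fixed divergence budget are assumptions; the patent's actual fusion and bookkeeping are more involved):

```python
import numpy as np

DIVERGENCE_THRESHOLD = 0.05  # illustrative budget, metres

def resolve_hemisphere(em_pose_stream, imu_delta_stream):
    """Run one tracker per hemisphere, accumulate each tracker's
    disagreement with IMU-reported motion, and keep whichever stays
    consistent.

    em_pose_stream yields (front_position, back_position) candidate
    solutions per EM sample; imu_delta_stream yields the controller
    translation over the same interval as reported by the IMU.
    """
    prev = {"front": None, "back": None}
    divergence = {"front": 0.0, "back": 0.0}
    alive = {"front", "back"}

    for (front_pos, back_pos), imu_delta in zip(em_pose_stream, imu_delta_stream):
        for name, pos in (("front", front_pos), ("back", back_pos)):
            if name not in alive:
                continue
            if prev[name] is not None:
                em_delta = np.asarray(pos) - prev[name]
                # Disagreement between EM-implied motion and IMU motion.
                divergence[name] += float(np.linalg.norm(em_delta - np.asarray(imu_delta)))
            prev[name] = np.asarray(pos)

        for name in list(alive):
            if len(alive) > 1 and divergence[name] > DIVERGENCE_THRESHOLD:
                alive.discard(name)   # abort the inconsistent processing stack

    return alive.pop() if len(alive) == 1 else None
```

Keeping at least one stack alive and returning None when neither has been ruled out are design choices of this sketch rather than requirements of the method.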

FIG. 6 illustrates a method 600 for resolving hemisphere ambiguity at a system or device including one or more sensors. The system may be an electromagnetic tracking system or an optical device, such as an AR device, or any system supporting the emission and reception of electromagnetic signals or magnetic signals, among other possibilities. One or more steps of method 600 may be omitted during performance of method 600, and steps need not be performed in the order shown. One or more steps of method 600 may be performed by processing unit 306, local processing and data module 270, and/or remote processing module 272, among other possibilities.

At step 602, one or more magnetic fields may be emitted by a magnetic field emitter positioned at the handheld controller. The magnetic field emitter may generate the magnetic fields with each coil generating a field in one direction (e.g., X, Y, or Z). The magnetic fields may be generated with an arbitrary waveform. In one or more embodiments, each of the axes may oscillate at a slightly different frequency. Although magnetic fields are discussed in some embodiments, this discussion is not intended to limit embodiments of the present disclosure, and other fields, including electric fields and electromagnetic fields, are included within the scope of the present disclosure.
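A minimal sketch of separating the per-axis contributions when each transmit axis is modulated at its own frequency might look like the following (the specific frequencies, the plain single-bin DFT, and the function name are assumptions; a real receiver would use calibrated synchronous demodulation):

```python
import numpy as np

def demultiplex_axes(samples, fs, tx_freqs=(27_000.0, 33_000.0, 39_000.0)):
    """Estimate the amplitude contributed by each transmit axis by
    correlating one sensor coil's samples against each TX frequency."""
    n = len(samples)
    t = np.arange(n) / fs
    amplitudes = []
    for f in tx_freqs:
        ref = np.exp(-2j * np.pi * f * t)
        amplitudes.append(2.0 * np.abs(np.dot(samples, ref)) / n)
    return amplitudes  # one amplitude per TX axis (X, Y, Z)
```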

At step 604, the one or more magnetic fields may be detected by one or more sensors positioned within the headset or the belt pack. In some embodiments, a coordinate space corresponding to the magnetic or electromagnetic field may be determined. For example, the coordinate space around the emitter (e.g., the handheld controller) may be determined based on the detected magnetic field. In some embodiments, the behavior of the coils at the sensors (which may be attached to a known object) may be detected. For example, a current induced at the coils may be calculated. In other embodiments, a rotation of coils, or any other quantifiable behavior may be tracked and measured.

At step 606A, a first processing stack is initiated causing the first processing stack to run during a first time interval. Running the first processing stack may include performing one or more of steps 608A, 612A, and 614A. The first processing stack may run on local processing and data module 270, remote processing module 272, or on a server remote to the device. At step 606B, a second processing stack is initiated causing the second processing stack to run during a second time interval. Running the second processing stack may include performing one or more of steps 608B, 612B, and 614B. The second processing stack may run on local processing and data module 270, remote processing module 272, or on a server remote to the device. The first time interval may be simultaneous, concurrent, or nonconcurrent (i.e., non-overlapping) with the second time interval. In some embodiments, the first processing stack and the second processing stack are initialized simultaneously. In some embodiments, the first processing stack and the second processing stack are initialized sequentially.

At step 608A, the position and orientation of the handheld controller within a first hemisphere is determined based on the detected magnetic fields. At step 608B, the position and orientation of the handheld controller within a second hemisphere is determined based on the detected magnetic fields. The first hemisphere and the second hemisphere may be diametrically opposite and, in some embodiments, may correspond to the front hemisphere and the back hemisphere with respect to the headset. However, the first hemisphere and the second hemisphere may be defined in any of a variety of configurations, including front/back, above/below, left/right. In some embodiments, the interface between the hemispheres is defined by the plane having a normal that is pointing 10, 20, 30, 40, 50, 60, 70, or 80 degrees downward from a forward direction of the headset. In one example, controller 406 may consult a mapping table that correlates a behavior of the coils at the sensors to various positions or orientations. Based on these calculations, the position in the coordinate space along with the orientation of the sensors may be determined.
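As an illustration of the dividing-plane conventions mentioned above, the sketch below classifies a candidate position against a plane whose normal is pitched downward from the headset's forward direction (the 30-degree default tilt, the axis convention, and the function name are assumptions chosen for illustration):

```python
import numpy as np

def hemisphere_of(position, tilt_deg=30.0):
    """Classify a position as 'front' or 'back' of a dividing plane whose
    normal is pitched tilt_deg downward from the headset's forward
    direction. Assumed headset frame: +Z forward, +Y up."""
    tilt = np.radians(tilt_deg)
    normal = np.array([0.0, -np.sin(tilt), np.cos(tilt)])  # forward, pitched down
    return "front" if float(np.dot(np.asarray(position), normal)) >= 0.0 else "back"

# A totem held slightly below and ahead of the headset falls in the front
# hemisphere under this convention.
print(hemisphere_of([0.0, -0.2, 0.4]))   # 'front'
```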

At step 610, movement data is detected by a movement sensor, such as an IMU sensor. The movement data may correspond to (e.g., may be indicative of) movement of the handheld controller and/or the headset. The movement data may include linear acceleration, angular rate, and/or orientation data, among other possibilities. In some embodiments, the movement data may be referred to as IMU data when detected by an IMU sensor. In some instances, the IMU sensor may be positioned within the handheld controller or the headset. For example, the IMU sensor may be mounted within the handheld controller or the headset. In some embodiments, the movement data is detected by two different IMUs, one within the handheld controller and another within the headset. In such embodiments, the movement data may be used to determine relative movement of the handheld controller with respect to the headset.

At steps 612A, 612B, one or more performance statistics associated with the first processing stack and the second processing stack are analyzed, respectively. In the illustrated embodiment, a discrepancy is calculated between the determined positions and orientations from steps 608A, 608B and a relative movement of the handheld controller with respect to the headset as determined using the movement data. In some examples, for each processing stack, multiple poses may be determined and compared to the movement of the handheld controller as indicated by the movement data. If, for example, the movement data indicates that the handheld controller is not moving (with respect to the headset) while the determined pose(s) (determined using the detected magnetic data) indicate that the handheld controller is moving, a discrepancy with a high value may be calculated.
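One simple way to express such a discrepancy as a single number (illustrative only; the window handling, the names, and the use of position alone rather than full pose are assumptions) is to compare the net motion implied by the EM solution with the net motion reported by the IMU over the same window:

```python
import numpy as np

def pose_discrepancy(em_positions, imu_translation):
    """Per-check discrepancy used to score one processing stack.

    em_positions is a short window of EM-derived controller positions
    (headset frame); imu_translation is the net controller motion over
    the same window implied by the movement data. If the IMU says the
    totem is still while this hemisphere's EM solution drifts, the value
    grows quickly.
    """
    em_motion = np.asarray(em_positions[-1]) - np.asarray(em_positions[0])
    return float(np.linalg.norm(em_motion - np.asarray(imu_translation)))

# A still totem (zero IMU translation) whose wrong-hemisphere EM track
# appears to sweep 0.6 m produces a large discrepancy.
print(pose_discrepancy([[0.3, 0.0, 0.4], [0.9, 0.0, 0.4]], [0.0, 0.0, 0.0]))  # 0.6
```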

At steps 614A, 614B, the discrepancies calculated in steps 612A, 612B are compared to a predetermined threshold. If either discrepancy exceeds the threshold, the corresponding processing stack is aborted or terminated and the other processing stack is allowed to continue processing (e.g., continue determinations of the pose of the handheld controller in the respective hemisphere based on additional detected magnetic fields). In some embodiments, the threshold acts as an acceptable accuracy threshold such that if either discrepancy is less than the threshold, the corresponding processing stack is allowed to continue processing and the other processing stack is aborted. In some alternative embodiments, if the discrepancy is greater than the threshold but lower than a second threshold, indicating a discrepancy of relatively small value, which can be referred to as an intermediate discrepancy, portions of the process may be repeated. For example, if this intermediate discrepancy is calculated at 612A, portions 608A, 610, 612A, and 614A may be repeated until the discrepancy exceeds the second threshold, which results in termination of the first hemisphere processing. In some embodiments, the threshold value is dynamic, as opposed to fixed, and may permit a larger or smaller discrepancy depending on the estimated pose over time. Non-limiting examples of dynamic thresholds include the following: when large changes in the estimated pose are detected over a short time, controller 306 (or an intermediate decision block) may temporarily permit a larger discrepancy; or the number of samples that must remain below a threshold value in a given time interval may be adjusted.
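The decision logic of steps 614A/614B, including the alternative intermediate-threshold behavior, might be organized as in the sketch below; the threshold values and the string return codes are illustrative only.

```python
def decide(d, threshold, second_threshold=None):
    """Possible decision logic for steps 614A/614B (names are illustrative).

    Basic form: abort the processing stack when its discrepancy exceeds the
    threshold. Alternative form (second_threshold given): a discrepancy between
    the two thresholds is "intermediate" and means "repeat" (collect more
    samples); only a discrepancy above the second threshold aborts the stack."""
    if second_threshold is None:
        return "abort" if d > threshold else "continue"
    if d > second_threshold:
        return "abort"
    if d > threshold:
        return "repeat"
    return "continue"

print(decide(0.02, threshold=0.05))                        # continue
print(decide(0.08, threshold=0.05, second_threshold=0.20)) # repeat
print(decide(0.35, threshold=0.05, second_threshold=0.20)) # abort
```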

FIG. 7 illustrates a method 700 for resolving hemisphere ambiguity at a system or device including one or more sensors. The system may be an electromagnetic tracking system or an optical device, such as an AR device, or any system supporting the emission and reception of magnetic signals, among other possibilities. One or more steps of method 700 may be omitted during performance of method 700, and steps need not be performed in the order shown. One or more steps of method 700 may be performed by controller 306, local processing and data module 270, and/or remote processing module 272, among other possibilities. One or more steps of method 700 may correspond to one or more steps of method 600.

At step 702, one or more magnetic fields may be emitted by a magnetic field emitter positioned at the handheld controller. At step 704, the one or more magnetic fields may be detected by one or more sensors positioned within the headset or the belt pack. At step 706A, a first processing stack is initiated causing the first processing stack to run during a first time interval. At step 706B, a second processing stack is initiated causing the second processing stack to run during a second time interval. At step 708A, the position and orientation of the handheld controller within a first hemisphere is determined based on the detected magnetic fields. At step 708B, the position and orientation of the handheld controller within a second hemisphere is determined based on the detected magnetic fields. The first hemisphere and the second hemisphere may be diametrically opposite and, in some embodiments, may correspond to the front hemisphere and the back hemisphere with respect to the headset.

At step 710, movement data is detected by a movement sensor, similar to that described in reference to step 610. At steps 712A, 712B, one or more performance statistics of the first processing stack and the second processing stack are analyzed, respectively. In the illustrated embodiment, a discrepancy is calculated between the determined positions and orientations from steps 708A, 708B and a relative movement of the handheld controller with respect to the headset as determined using the movement data.

At step 714 the discrepancies calculated in steps 712A, 712B are compared to each other. If the discrepancy calculated in step 712A (i.e., the “first discrepancy”) exceeds the discrepancy calculated in step 712B (i.e., the “second discrepancy”), then the first processing stack is aborted and the second processing stack is allowed to continue processing (e.g., continue determinations of the pose of the handheld controller in the respective hemisphere based on additional detected magnetic fields). On the other hand, if the first discrepancy is less than the second discrepancy, then the second processing stack is aborted and the first processing stack is allowed to continue processing.

In some embodiments described herein, the term “EM tracking” may refer to the usage of EM waves to determine the location of an object. In some embodiments described herein, the term “pose” may refer to the (x, y, z) definition of an object's position in relation to the world pose. In some embodiments described herein, the term “world pose” may refer to the position of an object in relation to the (0, 0, 0) reference point. In some embodiments described herein, the term “head pose” may refer to the position of the head in the world. In some embodiments described herein, the term “totem pose” may refer to the position of the totem/handheld controller in the world. In some embodiments, the totem pose is computed by adding the fused pose to the head pose. In some embodiments described herein, the term “IMU pose” corresponds to the movement of the totem as measured by the IMU. In some embodiments described herein, the term “EM pose” corresponds to the movement of the totem as measured by the EM tracking. In some embodiments described herein, the term “fused pose” refers to a pose calculated by factoring the IMU pose into the EM pose along with some additional filtering. In some embodiments, simultaneous poses are run through the tracking algorithm to obtain two versions of all totem poses: EM, IMU, and fused. Based on the employed method, the appropriate pose velocity (change in position) is compared with the expected change, which is assumed based on the conditions observed.
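To make the terminology concrete, the toy one-dimensional sketch below composes a "fused pose" with a head pose to obtain a totem world pose. The simple weighted blend stands in for the unspecified filtering and is purely an assumption.

```python
def fuse(em_pose, imu_pose, alpha=0.9):
    """Toy one-dimensional "fused pose": weighted blend of the EM-derived pose
    and the IMU-propagated pose. The disclosure only says the IMU pose is
    factored into the EM pose with additional filtering; this blend is an
    assumption made for illustration."""
    return alpha * em_pose + (1.0 - alpha) * imu_pose

def totem_world_pose(head_pose, fused_pose):
    """Totem pose in the world frame: the fused (head-relative) pose added to
    the head pose, per the terminology above (1-D positions for simplicity)."""
    return head_pose + fused_pose

head, em, imu = 6.0, 2.1, 1.9                     # stand-in 1-D values
print(totem_world_pose(head, fuse(em, imu)))      # approximately 8.1
```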

In some implementations, the totem pose hemisphere is established when no totem motion is detected concurrently with a changing head pose. This approach relies on the totem IMU detecting no totem motion while the head pose is changing at the same time instant. The premise is that if the totem pose were initialized in the wrong hemisphere, then a change in the head pose would result in a change in the computed totem pose, both expressed in the world frame. Conversely, a totem pose initialized in the correct hemisphere should see no change in its pose even if the head pose changes.

In some embodiments, the totem pose is calculated by taking the EM-sensor-measured distance between the EM emitter and receiver and adding it to the head pose, after putting the data through the necessary transformations so that the reference frames are aligned. If the totem has not moved and the head moves away from the totem, the EM distance should change by the same absolute value to ensure that the calculated totem pose remains the same. In some embodiments, this method provides an algebraic solution in which an inequality in the calculated totem pose is an indicator of incorrect hemisphere initialization. Multiple samples can be collected with the totem at rest; if all samples result in a moving totem pose in the world frame while the head is moving, then the chosen hemisphere can be discarded as the incorrect one.
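A minimal one-dimensional sketch of this algebraic consistency check, assuming the reference frames are already aligned and modeling the wrong-hemisphere hypothesis as a sign flip on the EM-measured offset (the function names and tolerance are invented):

```python
def world_totem_pose(head_position, em_offset, hemisphere_sign):
    """World-frame totem position: head position plus the EM-measured offset,
    with the sign flipped for the opposite-hemisphere hypothesis (frames are
    assumed to be aligned already; one dimension for illustration)."""
    return head_position + hemisphere_sign * em_offset

def hemisphere_is_consistent(head_positions, em_offsets, sign, tol=1e-3):
    """With the totem at rest while the head moves, the computed world totem
    pose should stay (nearly) constant; otherwise this hemisphere is discarded."""
    poses = [world_totem_pose(h, d, sign) for h, d in zip(head_positions, em_offsets)]
    return max(poses) - min(poses) < tol

# Stand-in samples: totem fixed at world position 8, head moving 6 -> 3 -> 4.
heads = [6.0, 3.0, 4.0]
offsets = [2.0, 5.0, 4.0]                # EM-measured head-to-totem offsets
print(hemisphere_is_consistent(heads, offsets, sign=+1))   # True: correct hemisphere
print(hemisphere_is_consistent(heads, offsets, sign=-1))   # False: ghost poses 4, -2, 0
```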

FIG. 8 illustrates results of a simulation showing the expected totem pose when the head is moving and the totem is still. The simulated data is backed up by the experimental data shown in reference to FIGS. 9-12 and validates embodiments of the present disclosure that are able to determine that with the head moving and the totem still, the actual totem pose will not change in the world reference frame. The fused totem pose is shown in this example since the EM totem pose tends to have significant noise and the fused pose includes processing to account for the movement data and the necessary filtering.

Referring to FIG. 8, the totem pose is unchanged in examples 0 through 4, illustrated as position 8 in the world frame. In example 0, the head pose is static at position 6 in the world frame and a ghost totem pose is present at position 4 in the world frame. Referring to example 1, the head pose shifts from position 6 to approximately position 3 in the world frame. In response to this change in head pose, although the actual totem position is static at position 8, the ghost totem pose shifts from position 4 to position −2 in the world frame. Referring to example 2, the head pose shifts from position 6 to approximately position 4 and again the actual totem position is static while the ghost totem pose shifts from position 4 to approximately position 0. Referring to example 3, the head pose shifts from position 6 to position 8 and the ghost totem pose also moves to position 8. Referring to example 4, the head pose shifts from position 6 to approximately position 9 while the ghost totem pose shifts from position 4 to approximately position 10. As can be observed in examples 0 through 4, in some embodiments, the relative distance between the head pose and each of the totem poses (actual and ghost) in the world frame may be identical over a wide range of positions of the head pose.
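The ghost positions quoted above are consistent with the ghost totem pose mirroring the actual totem pose about the head position (the EM offset is effectively negated under the wrong hemisphere). This mirror relation is inferred from the numbers in the passage rather than stated explicitly; a short check:

```python
# Ghost totem pose under the wrong-hemisphere hypothesis: with the EM offset
# negated, the ghost position mirrors the actual totem position about the head
# position, i.e. ghost = 2 * head - actual.
actual_totem = 8.0
head_positions = [6.0, 3.0, 4.0, 8.0, 9.0]               # examples 0 through 4
ghosts = [2.0 * h - actual_totem for h in head_positions]
print(ghosts)   # [4.0, -2.0, 0.0, 8.0, 10.0] -- matches the positions described above
```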

FIG. 9 illustrates experimental data when the head is moving and the totem is still, with the totem being initialized in the correct hemisphere. As shown in FIG. 9, the totem fused pose remains mostly unchanged in spite of head motion when the totem is initialized in the correct hemisphere, as predicted by the simulated data in FIG. 8, which showed no movement of the actual totem pose across examples 0 through 4.

FIG. 10 illustrates experimental totem movement data when the totem is still, with the totem being initialized in the correct hemisphere. The data shown in FIG. 10 may correspond to the pose data shown in FIG. 9. The upper plot of FIG. 10 shows rotation (in degrees) measurements by the IMU gyros as a function of time and the lower plot of FIG. 10 shows acceleration (in m/s²) measurements by the IMU accelerometers as a function of time. As shown in FIG. 10, the IMU gyro data has some noise around the zero degrees rotation value. The IMU acceleration data shows that no translational change is seen on the X, Y, and Z axes. Since the change in acceleration is measured with respect to the gravitational acceleration, the constant offset on the Y and Z axes, even with no totem motion, is expected.

FIG. 11 illustrates experimental data when the head is moving and the totem is still, with the totem being initialized in the wrong hemisphere. The experimental data shows that the totem pose is changing with head movement even when the totem is still. It is apparent that the totem fused pose is a magnified version of the head pose when initialized in the wrong hemisphere, as was theorized by the “ghost totem pose” simulated data in FIG. 8, which showed significant movement of the ghost totem pose across examples 0 through 4.

FIG. 12 illustrates experimental totem movement data with no totem motion, corresponding to the pose data displayed in FIG. 11. The upper plot of FIG. 12 shows rotation (in degrees) measurements by the IMU gyros as a function of time and the lower plot of FIG. 12 shows acceleration (in m/s²) measurements by the IMU accelerometers as a function of time. The IMU gyro data shows little noise around the zero degrees rotation value and the IMU acceleration data shows that no translational change is seen on the X, Y, and Z axes.

FIG. 13 illustrates a simplified computer system 1300 according to an embodiment described herein. Computer system 1300 as illustrated in FIG. 13 may be incorporated into devices described herein. FIG. 13 provides a schematic illustration of one embodiment of computer system 1300 that can perform some or all of the steps of the methods provided by various embodiments. It should be noted that FIG. 13 is meant only to provide a generalized illustration of various components, any or all of which may be utilized as appropriate. FIG. 13, therefore, broadly illustrates how individual system elements may be implemented in a relatively separated or relatively more integrated manner.

Computer system 1300 is shown including hardware elements that can be electrically coupled via a bus 1305, or may otherwise be in communication, as appropriate. The hardware elements may include one or more processors 1310, including without limitation one or more general-purpose processors and/or one or more special-purpose processors such as digital signal processing chips, graphics acceleration processors, and/or the like; one or more input devices 1315, which can include without limitation a mouse, a keyboard, a camera, and/or the like; and one or more output devices 1320, which can include without limitation a display device, a printer, and/or the like.

Computer system 1300 may further include and/or be in communication with one or more non-transitory storage devices 1325, which can include, without limitation, local and/or network accessible storage, and/or can include, without limitation, a disk drive, a drive array, an optical storage device, a solid-state storage device, such as a random access memory (“RAM”), and/or a read-only memory (“ROM”), which can be programmable, flash-updateable, and/or the like. Such storage devices may be configured to implement any appropriate data stores, including without limitation, various file systems, database structures, and/or the like.

Computer system 1300 might also include a communications subsystem 1319, which can include without limitation a modem, a network card (wireless or wired), an infrared communication device, a wireless communication device, and/or a chipset such as a Bluetooth™ device, an 802.11 device, a WiFi device, a WiMax device, cellular communication facilities, etc., and/or the like. The communications subsystem 1319 may include one or more input and/or output communication interfaces to permit data to be exchanged with a network such as the network described below to name one example, other computer systems, television, and/or any other devices described herein. Depending on the desired functionality and/or other implementation concerns, a portable electronic device or similar device may communicate image and/or other information via the communications subsystem 1319. In other embodiments, a portable electronic device, e.g., the first electronic device, may be incorporated into computer system 1300, e.g., an electronic device as an input device 1315. In some embodiments, computer system 1300 will further include a working memory 1335, which can include a RAM or ROM device, as described above.

Computer system 1300 also can include software elements, shown as being currently located within the working memory 1335, including an operating system 1340, device drivers, executable libraries, and/or other code, such as one or more application programs 1345, which may include computer programs provided by various embodiments, and/or may be designed to implement methods, and/or configure systems, provided by other embodiments, as described herein. Merely by way of example, one or more procedures described with respect to the methods discussed above, might be implemented as code and/or instructions executable by a computer and/or a processor within a computer; in an aspect, then, such code and/or instructions can be used to configure and/or adapt a general purpose computer or other device to perform one or more operations in accordance with the described methods.

A set of these instructions and/or code may be stored on a non-transitory computer-readable storage medium, such as the storage device(s) 1325 described above. In some cases, the storage medium might be incorporated within a computer system, such as computer system 1300. In other embodiments, the storage medium might be separate from a computer system e.g., a removable medium, such as a compact disc, and/or provided in an installation package, such that the storage medium can be used to program, configure, and/or adapt a general purpose computer with the instructions/code stored thereon. These instructions might take the form of executable code, which is executable by computer system 1300 and/or might take the form of source and/or installable code, which, upon compilation and/or installation on computer system 1300 e.g., using any of a variety of generally available compilers, installation programs, compression/decompression utilities, etc., then takes the form of executable code.

It will be apparent to those skilled in the art that substantial variations may be made in accordance with specific requirements. For example, customized hardware might also be used, and/or particular elements might be implemented in hardware, software including portable software, such as applets, etc., or both. Further, connection to other computing devices such as network input/output devices may be employed.

As mentioned above, in one aspect, some embodiments may employ a computer system such as computer system 1300 to perform methods in accordance with various embodiments of the technology. According to a set of embodiments, some or all of the procedures of such methods are performed by computer system 1300 in response to processor 1310 executing one or more sequences of one or more instructions, which might be incorporated into the operating system 1340 and/or other code, such as an application program 1345, contained in the working memory 1335. Such instructions may be read into the working memory 1335 from another computer-readable medium, such as one or more of the storage device(s) 1325. Merely by way of example, execution of the sequences of instructions contained in the working memory 1335 might cause the processor(s) 1310 to perform one or more procedures of the methods described herein. Additionally or alternatively, portions of the methods described herein may be executed through specialized hardware.

The terms “machine-readable medium” and “computer-readable medium,” as used herein, refer to any medium that participates in providing data that causes a machine to operate in a specific fashion. In an embodiment implemented using computer system 1300, various computer-readable media might be involved in providing instructions/code to processor(s) 1310 for execution and/or might be used to store and/or carry such instructions/code. In many implementations, a computer-readable medium is a physical and/or tangible storage medium. Such a medium may take the form of a non-volatile media or volatile media. Non-volatile media include, for example, optical and/or magnetic disks, such as the storage device(s) 1325. Volatile media include, without limitation, dynamic memory, such as the working memory 1335.

Common forms of physical and/or tangible computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, EPROM, a FLASH-EPROM, any other memory chip or cartridge, or any other medium from which a computer can read instructions and/or code.

Various forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to the processor(s) 1310 for execution. Merely by way of example, the instructions may initially be carried on a magnetic disk and/or optical disc of a remote computer. A remote computer might load the instructions into its dynamic memory and send the instructions as signals over a transmission medium to be received and/or executed by computer system 1300.

The communications subsystem 1319 and/or components thereof generally will receive signals, and the bus 1305 then might carry the signals and/or the data, instructions, etc. carried by the signals to the working memory 1335, from which the processor(s) 1310 retrieves and executes the instructions. The instructions received by the working memory 1335 may optionally be stored on a non-transitory storage device 1325 either before or after execution by the processor(s) 1310.

The methods, systems, and devices discussed above are examples. Various configurations may omit, substitute, or add various procedures or components as appropriate. For instance, in alternative configurations, the methods may be performed in an order different from that described, and/or various stages may be added, omitted, and/or combined. Also, features described with respect to certain configurations may be combined in various other configurations. Different aspects and elements of the configurations may be combined in a similar manner. Also, technology evolves and, thus, many of the elements are examples and do not limit the scope of the disclosure or claims.

Specific details are given in the description to provide a thorough understanding of exemplary configurations including implementations. However, configurations may be practiced without these specific details. For example, well-known circuits, processes, algorithms, structures, and techniques have been shown without unnecessary detail in order to avoid obscuring the configurations. This description provides example configurations only, and does not limit the scope, applicability, or configurations of the claims. Rather, the preceding description of the configurations will provide those skilled in the art with an enabling description for implementing described techniques. Various changes may be made in the function and arrangement of elements without departing from the spirit or scope of the disclosure.

Also, configurations may be described as a process which is depicted as a schematic flowchart or block diagram. Although each may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process may have additional steps not included in the figure.

Furthermore, examples of the methods may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments to perform the necessary tasks may be stored in a non-transitory computer-readable medium such as a storage medium. Processors may perform the described tasks.

Having described several example configurations, various modifications, alternative constructions, and equivalents may be used without departing from the spirit of the disclosure. For example, the above elements may be components of a larger system, wherein other rules may take precedence over or otherwise modify the application of the technology. Also, a number of steps may be undertaken before, during, or after the above elements are considered. Accordingly, the above description does not bind the scope of the claims.

As used herein and in the appended claims, the singular forms “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise. Thus, for example, reference to “a user” includes one or more of such users, and reference to “the processor” includes reference to one or more processors and equivalents thereof known to those skilled in the art, and so forth.

Also, the words “comprise”, “comprising”, “contains”, “containing”, “include”, “including”, and “includes”, when used in this specification and in the following claims, are intended to specify the presence of stated features, integers, components, or steps, but they do not preclude the presence or addition of one or more other features, integers, components, steps, acts, or groups.

It is also understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims.

Sony Patent | Augmented reality placement for user feedback https://patent.nweon.com/26739 Thu, 26 Jan 2023 15:52:10 +0000 https://patent.nweon.com/?p=26739 ...

Patent: Augmented reality placement for user feedback

Patent PDF: Available with Nweon (映维网) membership

Publication Number: 20230021433

Publication Date: 2023-01-26

Assignee: Sony Interactive Entertainment Inc

Abstract

Methods and systems are provided for generating augmented reality (AR) scenes where the AR scenes include one or more artificial intelligence elements (AIEs) that are rendered as visual objects in the AR scenes. The method includes generating an AR scene for rendering on a display; the AR scene includes a real-world space and virtual objects projected in the real-world space. The method includes analyzing a field of view into the AR scene; the analyzing is configured to detect an action by a hand of the user when reaching into the AR scene. The method includes generating one or more AIEs rendered as virtual objects in the AR scene, each AIE is configured to provide a dynamic interface that is selectable by a gesture of the hand of the user. In one embodiment, each of the AIEs is rendered proximate to a real-world object present in the real-world space; the real-world object is located in a direction of where the hand of the user is detected to be reaching when the user makes the action by the hand.

Claims

What is claimed is:

1.A computer-implemented method, comprising: generating an augmented reality (AR) scene for rendering on a display, the AR scene includes a real-world space and virtual objects projected in the real-world space; analyzing a field of view into the AR scene, the analyzing is configured to detect an action by a hand of a user when reaching into the AR scene; and generating one or more artificial intelligence elements (AIEs) rendered as virtual objects in the AR scene, each AIE is configured to provide a dynamic interface that is selectable by a gesture of the hand of the user; wherein each of the AIEs is rendered proximate to a real-world object present in the real-world space, the real-world object is located in a direction of where the hand of the user is detected to be reaching when the user makes the action by the hand; wherein the dynamic interface provides information related to the real-world object, said information is contextually related to the real-world object, and the AIE is rendered proximate to the real-world object to highlight a feature associated with the real-world object responsive to detecting the reaching by the hand of the user toward the real-world object.

2.(canceled)

3.The computer-implemented method of claim 1, wherein said feature enables the real-world object to perform an action associated with the feature when selected by the user.

4.The computer-implemented method of claim 1, wherein the real-world object, when provided with the AIEs, enables use of the real-world object in the AR scene.

5.The computer-implemented method of claim 1, wherein the AIEs are updated when the user is detected to reach toward another real-world object in the AR scene, said updated AIEs cause the AIEs to move from an initial position in the AR scene to an adjusted position in the AR scene.

6.The computer-implemented method of claim 1, wherein additional AIEs are rendered proximate to the hand of the user when reaching into the AR scene.

7.The computer-implemented method of claim 1, wherein the gesture of the hand of the user is determined to select the dynamic interface of an AIE, the selection of the AIE triggers an update to the AIEs rendered for providing additional AIEs for selection by additional one or more gestures.

8.The computer-implemented method of claim 7, wherein the additional AIEs are related to the selected dynamic interface of the AIE.

9.The computer-implemented method of claim 1, wherein the one or more AIEs are predicted to be preferred by the user.

10.The computer-implemented method of claim 1, wherein the generated AIEs are based on learning AIEs selected by the user in one or more prior interactions with AR interactive content.

11.The computer-implemented method of claim 1, wherein the generated AIEs are based on processing a profile of the user, physical actions of the user, and contextual data through a model of the user, the model configured to identify features from the profile of the user, the physical actions of the user, and the contextual data to classify attributes of the user, the attributes of the user being used to select the AIEs.

12.The computer-implemented method of claim 11, wherein the physical actions of the user include voice associated with the user, eye gaze associated with the user, body movement associated with the user, the gesture of the hand of the user, or a combination of two or more thereof.

13.The computer-implemented method of claim 1, wherein the action by the hand of the user can be detected based on eye gaze associated with the user, body movement associated with the user, or a combination of two or more thereof.

14.A system for displaying an augmented reality (AR) scene, the system comprising: an AR head mounted display (HMD), said AR HMD includes a display for rendering the AR scene, said AR scene includes a real-world space and virtual objects projected in the real-world space; and analyzing a field of view into the AR scene, the analyzing is configured to detect an action by a hand of a user when reaching into the AR scene; and a processing unit associated with the AR HMD for processing one or more artificial intelligence elements (AIEs) rendered as virtual objects in the AR scene, each AIE is configured to provide a dynamic interface that is selectable by a gesture of a hand of a user; wherein each of the AIEs is rendered proximate to a real-world object present in the real-world space, the real-world object is located in a direction of where the hand of the user is detected to be reaching when the user makes an action by the hand; wherein the dynamic interface provides information related to the real-world object, said information is contextually related to the real-world object, and the AIE is rendered proximate to the real-world object to highlight a feature associated with the real-world object responsive to detecting the reaching by the hand of the user toward the real-world object.

15.(canceled)

16.The system of claim 14, wherein said feature enables the real-world object to perform an action associated with the feature when selected by the user.

17.The system of claim 14, wherein the real-world object, when provided with the AIEs, enables use of the real-world object in the AR scene.

18.The system of claim 14, wherein the AIEs are updated when the user is detected to reach toward another real-world object in the AR scene, said updated AIEs cause the AIEs to move from an initial position in the AR scene to an adjusted position in the AR scene.

19.The system of claim 14, wherein the generated AIEs are based on processing a profile of the user, physical actions of the user, and contextual data through a model of the user, the model configured to identify features from the profile of the user, the physical actions of the user, and the contextual data to classify attributes of the user, the attributes of the user being used to select the AIEs.

20.The system of claim 19, wherein the physical actions of the user include voice associated with the user, eye gaze associated with the user, body movement associated with the user, the gesture of the hand of the user, or a combination of two or more thereof.

Description

BACKGROUND1. Field of the Disclosure

The present disclosure relates generally to generating an augmented reality (AR) scene and more particularly to methods and systems for generating one or more artificial intelligence elements (AIEs) in the AR scene that are selectable by the user.

2. Description of the Related Art

Augmented reality (AR) technology has seen unprecedented growth over the years and is expected to continue growing at a significant compound annual growth rate. AR technology provides an interactive three-dimensional (3D) experience that combines a view of the real world with computer-generated elements (e.g., virtual objects) in real-time. In AR simulations, the real world is infused with virtual objects to provide an interactive experience. With the rise in popularity of AR technology, various industries have implemented AR technology to enhance the customer experience. Some of these industries include, for example, video games, shopping & retail, education, entertainment, healthcare, real estate, virtual assistance, etc.

For example, a growing trend in the video game industry is to incorporate AR gaming, where the AR game superimposes a pre-created environment on top of a user’s actual environment. AR gaming enhances the gaming experience of the user and keeps games interesting, since new AR scenes can be generated based on the real-world environment of the user. In another example, a growing trend is to incorporate AR technology into sophisticated tools and operations that may assist a user with various personal tasks, e.g., navigation & visual guidance, etc. Unfortunately, some users may find that current AR technologies used to assist users are not personalized enough to help them with their day-to-day tasks.

It is in this context that implementations of the disclosure arise.

SUMMARY

Implementations of the present disclosure include methods, systems, and devices relating to generating augmented reality (AR) scenes for rendering on a display. In some embodiments, methods are disclosed to enable one or more artificial intelligence elements (AIEs) to be generated and rendered as virtual objects in the AR scene where each AIE is configured to provide a dynamic interface that is selectable by a gesture of the hand of the user. For example, a user may be wearing AR goggles (e.g., AR head mounted display) and immersed in an AR environment that includes both real-world objects and virtual objects. While interacting with various AR scenes of the AR environment, the system may be configured to detect an action by a hand of the user when reaching into the AR scene. When the hand of the user is detected to be reaching toward a real-world object (e.g., video game controller, mobile phone, book, streaming device controller, etc.), the system is configured to generate one or more AIEs proximate to the real-world object that the user is reaching toward. In one embodiment, each AIE is configured to provide a dynamic interface that is selectable by the user where the dynamic interface provides information related to the real-world object. Since the AR scenes can include various real-world objects, the methods disclosed herein outline ways of generating one or more AIEs related to a real-world object to provide the user with information related to the real-world object and to use the features that are associated with the real-world object.

Thus, as a user interacts with the AR scenes in an AR environment, the actions of the user are monitored and the system is configured to detect and determine the real-world object that the hand of the user is reaching toward so that AIEs can be rendered proximate to the real-world object. In this way, as the user interacts with the AR scene, real-world objects that may be of interest to the user may have AIEs rendered proximate to them, and each of the AIEs may have a corresponding dynamic interface that provides information related to the real-world object. Thus, users can find out information related to a real-world object in an AR scene seamlessly and efficiently by selecting the respective AIE using their hand. In some embodiments, the AIEs that are selected for generating in the AR scene can be based on a model, and the selected AIEs can be based on the interests and preferences of the user. In one embodiment, the model can use as inputs a profile of the user, user-captured physical actions, and contextual data to select which AIEs to use for generating in the AR scene.

In one embodiment, a computer-implemented method is provided. The method includes generating an augmented reality (AR) scene for rendering on a display, the AR scene includes a real-world space and virtual objects projected in the real-world space. The method includes analyzing a field of view into the AR scene; the analyzing is configured to detect an action by a hand of the user when reaching into the AR scene. The method includes generating one or more artificial intelligence elements (AIEs) rendered as virtual objects in the AR scene, each AIE is configured to provide a dynamic interface that is selectable by a gesture of the hand of the user. In one embodiment, each of the AIEs is rendered proximate to a real-world object present in the real-world space; the real-world object is located in a direction of where the hand of the user is detected to be reaching when the user makes the action by the hand. In this way, AIEs are dynamically generated and placed proximate to real-world objects to provide a dynamic interface that provides information related to the real-world object so that the user can seamlessly obtain information or use features that are associated with the real-world object.

In another embodiment, a system for displaying an AR scene is provided. The system includes an AR head mounted display (HMD); said AR HMD includes a display for rendering the AR scene. In one embodiment, the AR scene includes a real-world space and virtual objects projected in the real-world space. The system includes a processing unit associated with the AR HMD for processing one or more artificial intelligence elements (AIEs) rendered as virtual objects in the AR scene. In one embodiment, each AIE is configured to provide a dynamic interface that is selectable by a gesture of a hand of a user. In one embodiment, each of the AIEs is rendered proximate to a real-world object present in the real-world space; the real-world object is located in a direction of where the hand of the user is detected to be reaching when the user makes an action by the hand.

Other aspects and advantages of the disclosure will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

The disclosure may be better understood by reference to the following description taken in conjunction with the accompanying drawings in which:

FIG. 1A illustrates an embodiment of a system for interaction with an augmented reality (AR) environment via an AR head-mounted display (HMD), in accordance with an implementation of the disclosure.

FIG. 1B illustrates an embodiment of the system for interaction with an AR environment via an AR HMD shown in FIG. 1A illustrating a plurality of real-world objects on the surface of the coffee table, in accordance with an implementation of the disclosure.

FIG. 1C illustrates an embodiment of a system for interaction with an AR environment via a mobile device, in accordance with an implementation of the disclosure.

FIGS. 2A-2C illustrate various embodiments of different AR scenes that are viewed from a field of view (FOV) of a user 100 which include real-world objects and a plurality of AIEs 104 that are rendered at various positions within the respective AR scenes, in accordance with an implementation of the disclosure.

FIG. 3 illustrates an embodiment of a user interacting with various AIEs in an AR environment which includes the user initiating gameplay of a board game, in accordance with an implementation of the disclosure.

FIG. 4 illustrates an embodiment of a user interacting with various AIEs that are rendered proximate to a universal controller in an AR environment, in accordance with an implementation of the disclosure.

FIG. 5 illustrates an embodiment of a user interacting with various AIEs that are rendered proximate to a mobile device in an AR environment, in accordance with an implementation of the disclosure.

FIG. 6 illustrates an embodiment of a method for using a model to dynamically select one or more AIEs for rendering in an AR scene using a user profile, user captured physical actions, and contextual data as inputs, in accordance with an implementation of the disclosure.

FIG. 7 illustrates an embodiment of a user captured physical actions table illustrating various physical actions that are captured while the user is interacting with the AR environment, in accordance with an implementation of the disclosure.

FIG. 8 illustrates a method for generating an AR scene for a user and generating one or more AIEs for rendering in the AR scene, in accordance with an implementation of the disclosure.

FIG. 9 illustrates components of an example device that can be used to perform aspects of the various embodiments of the present disclosure.

DETAILED DESCRIPTION

The following implementations of the present disclosure provide methods, systems, and devices for generating augmented reality (AR) scenes for a user of an AR head mounted display (HMD) and generating one or more artificial intelligence elements (AIEs) that are rendered as visual objects in the AR scenes. In one embodiment, each generated AIE is configured to provide a dynamic interface that is selectable by the user, where the dynamic interface provides information related to the real-world object. In particular, while a user is interacting with an AR scene, one or more AIEs are generated and rendered proximate to the hand of the user and to the real-world objects that are present in the real-world space. As used herein, the term “AIEs” should be broadly understood to refer to rendered virtual objects in the AR scene where each AIE is configured to provide a dynamic interface that is selectable by a gesture of the hand of the user. In one embodiment, the dynamic interface associated with the corresponding AIE may provide information related to the real-world objects that are present in the real-world space of the user. Accordingly, because the AIEs are dynamically generated and placed proximate to the hand of the user and proximate to real-world objects, an enhanced and improved AR experience is enabled for the user, since the AIEs provide information related to the real-world object and facilitate a way for the user to use the features that are associated with the real-world object. This allows the user to obtain information related to the real-world object quickly and efficiently, since the information is accessible by selecting the AIE. In turn, this can enhance the AR experience for users who may want a seamless way to navigate an AR environment while quickly accessing information related to a real-world object.

By way of example, in one embodiment, a method is disclosed that enables generating AR scenes and generating one or more AIEs that are rendered as virtual objects proximate to real-world objects in the AR scenes. The method includes generating an augmented reality (AR) scene for rendering on a display, the AR scene including a real-world space and virtual objects projected in the real-world space. In one embodiment, the method may further include analyzing a field of view into the AR scene. In one example, the analyzing is configured to detect an action by a hand of the user when reaching into the AR scene. In another embodiment, the method may include generating one or more AIEs rendered as virtual objects in the AR scene. In one example, each AIE is configured to provide a dynamic interface that is selectable by a gesture of the hand of the user. In another example, each of the AIEs is rendered proximate to a real-world object present in the real-world space. In one embodiment, the real-world object is located in a direction of where the hand of the user is detected to be reaching when the user makes the action by the hand. It will be obvious, however, to one skilled in the art that the present disclosure may be practiced without some or all of the specific details presently described. In other instances, well known process operations have not been described in detail in order not to unnecessarily obscure the present disclosure.

In accordance with one embodiment, a system is disclosed for generating AR scenes and generating one or more AIEs for rendering as virtual objects proximate to real-world objects in the AR scenes. For example, a user may be using AR goggles to interact in an AR environment which includes various AR scenes generated by a cloud computing and gaming system. While the user views and interacts with the AR scenes through the display of the AR goggles, the system is configured to analyze the field of view (FOV) into the AR scene and detect an action by a hand of the user when reaching into the AR scene.

In one embodiment, the system is configured to identify which one of the real-world objects in the AR scene the user is reaching toward so that the corresponding AIEs can be rendered proximate to that real-world object. In one embodiment, the AIEs for rendering in the AR scene are selected based on a model. In some embodiments, the model may receive as inputs a profile of the user, user-captured physical actions, and contextual data. In other embodiments, each AIE is configured to provide a dynamic interface which can provide information related to its corresponding real-world object. In other embodiments, each AIE is configured to provide a dynamic interface which can allow the user to use the features that are associated with the real-world object. In this way, users are provided with a seamless and efficient way of accessing information related to the real-world objects, since the information can be accessed quickly by viewing or selecting the dynamic interface associated with the AIE.
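As a rough sketch of how such a model-driven selection could be wired together (the attribute names, inputs, and AIE labels below are invented for illustration and are not taken from the disclosure):

```python
def classify_attributes(profile, physical_actions, context):
    """Toy stand-in for the user model: a real system would run a trained
    model over these inputs; here we just derive a few flags."""
    return {
        "gamer": "video games" in profile.get("interests", []),
        "reaching_for": physical_actions.get("reach_target"),
        "time_of_day": context.get("time_of_day"),
    }

def select_aies(attributes):
    """Pick AIEs predicted to be useful for this user and the object being
    reached toward (the AIE labels are invented examples)."""
    aies = []
    if attributes["reaching_for"] == "video game controller":
        aies.append("describe controller")
        if attributes["gamer"]:
            aies.extend(["launch favorite game", "open PlayStation Store"])
    return aies

profile = {"interests": ["video games", "chess"]}
actions = {"reach_target": "video game controller", "gaze": "controller"}
context = {"time_of_day": "evening"}
print(select_aies(classify_attributes(profile, actions, context)))
```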

With the above overview in mind, the following provides several example figures to facilitate understanding of the example embodiments.

FIG. 1A illustrates an embodiment of a system for interaction with an augmented reality (AR) environment via an AR head-mounted display (HMD) 102, in accordance with implementations of the disclosure. As used herein, the term “augmented reality” (AR) generally refers to user interaction with an AR environment where a real-world environment is enhanced by computer-generated perceptual information (e.g., virtual objects). An AR environment may include both real-world objects and virtual objects where the virtual objects are overlaid onto the real-world environment to enhance the experience of the user 100. In one embodiment, the AR scenes of an AR environment can be viewed through a display of a device such as an AR HMD, mobile phone, or any other device in a manner that is responsive in real-time to the movements of the AR HMD (as controlled by the user) to provide the sensation to the user of being in the AR environment. For example, the user may see a three-dimensional (3D) view of the AR environment when facing in a given direction, and when the user turns to a side and thereby turns the AR HMD likewise, the view to that side in the AR environment is rendered on the AR HMD.

As illustrated in FIG. 1A, a user 100 is shown physically located in a real-world space 105 wearing an AR HMD 102 to interact with an AIE 104 that is rendered in an AR scene of the AR environment. In one embodiment, the AR HMD 102 is worn in a manner similar to glasses, goggles, or a helmet, and is configured to display AR scenes, video game content, or other content to the user 100. The AR HMD 102 provides a very immersive experience to the user by virtue of its provision of display mechanisms in close proximity to the user’s eyes. Thus, the AR HMD 102 can provide display regions to each of the user’s eyes which occupy large portions or even the entirety of the field of view of the user, and may also provide viewing with three-dimensional depth and perspective.

In some embodiments, the AR HMD 102 may include an externally facing camera that is configured to capture images of the real-world space of the user 100 such as real-world objects that may be located in the real-world space 105 of the user. In some embodiments, the images captured by the externally facing camera can be analyzed to determine the location/orientation of the real-world objects relative to the AR HMD 102. Using the known location/orientation of the AR HMD 102, the real-world objects, and inertial sensor data from the AR HMD, the physical actions and movements of the user can be continuously monitored and tracked during the user’s interaction.

In some embodiments, the AR HMD 102 may provide a user with a field of view (FOV) 118 into the AR scene. Accordingly, as the user 100 turns their head and looks toward different regions within the real-world space 105, the AR scene is updated to include any additional virtual objects and real-world objects that may be within the FOV 118 of the user 100. In one embodiment, the AR HMD 102 may include a gaze tracking camera that is configured to capture images of the eyes of the user 100 to determine the gaze direction 116 of the user 100 and the specific virtual objects or real-world objects that the user 100 is focused on. Accordingly, based on the FOV 118 and the gaze direction 116 of the user 100, the system may detect specific actions by a hand of the user when reaching into the AR scene, e.g., user reaching toward a mobile phone that is lying on the coffee table.
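One simple way such a system might decide which real-world object the hand is reaching toward is to compare the reach direction against the direction to each detected object, as in the hypothetical sketch below (the object names and coordinates are invented; a fuller implementation would also weigh the gaze direction 116 and object depth):

```python
import numpy as np

def reach_target(hand_origin, reach_direction, objects):
    """Pick the real-world object whose direction from the hand is most closely
    aligned with the reach direction (largest cosine similarity)."""
    d = np.asarray(reach_direction, dtype=float)
    d /= np.linalg.norm(d)
    best_name, best_score = None, -1.0
    for name, pos in objects.items():
        v = np.asarray(pos, dtype=float) - np.asarray(hand_origin, dtype=float)
        score = float(np.dot(d, v / np.linalg.norm(v)))
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Invented object positions relative to the headset/hand frame.
objects = {"controller": (0.6, -0.2, 0.9), "phone": (-0.4, -0.2, 0.8), "book": (0.1, -0.3, 1.2)}
print(reach_target(hand_origin=(0.0, 0.0, 0.0),
                   reach_direction=(0.5, -0.2, 0.8),
                   objects=objects))          # "controller"
```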

As illustrated in the example shown in FIG. 1A, the gaze direction 116 of the user 100 is focused on a generated AIE 104 (e.g., Start AR Session?) that is selectable by the user. In the illustrated example, the AR scene may include the generated AIE 104 that is rendered as a virtual object in the AR scene and various real-world objects (e.g., coffee table 120, computer 108, display 110) since the real-world objects are within the FOV 118 of the user. In some embodiments, the AIE 104 is configured to provide a dynamic interface that is selectable by a gesture of the hand of the user 100. In the illustrated example, if the user 100 touches the dynamic interface, an AR session may initiate and the user 100 can proceed and explore other features provided by the AR environment. In some embodiments, the user may select the dynamic interface of an AIE by various physical actions such as pointing, tapping, touching, eye gaze, voice command, etc.

In other implementations, the AR HMD 102 can be wirelessly connected to a computer 108. The computer 108 can be any general or special purpose computer known in the art, including but not limited to, a gaming console, personal computer, laptop, tablet computer, mobile device, cellular phone, tablet, thin client, set-top box, media streaming device, etc. In some implementations, the computer 108 can be configured to execute a video game, and output the video and audio from the video game for rendering by the AR HMD 102. In some implementations, the computer 108 is configured to execute any other type of interactive application (e.g., AR scenes) that provides a virtual space/environment that can be viewed through an AR HMD. In some implementations, the AR HMD 102 may also communicate with the computer through alternative mechanisms or channels, such as via a network 112 to which both the AR HMD 102 and the computer 108 are connected via a modem and router.

In the illustrated implementation, the AR HMD 102 is wirelessly connected to a cloud computing and gaming system 114 over a network 112. In one embodiment, the cloud computing and gaming system 114 maintains and executes the AR scenes and video game being played by the user 100. In some embodiments, the cloud computing and gaming system 114 is configured to receive inputs from the AR HMD 102 over the network 112. The cloud computing and gaming system 114 is configured to process the inputs to affect the state of the AR scenes of the AR environment. The output from the executing AR scenes, such as virtual objects, real-world objects, video data, audio data, and user interaction data, is transmitted to the AR HMD 102. In other implementations, the AR HMD 102 may communicate with the cloud computing and gaming system 114 wirelessly through alternative mechanisms or channels such as a cellular network.

FIG. 1B illustrates an embodiment of the system for interaction with an augmented reality (AR) environment via an AR HMD 102 shown in FIG. 1A illustrating a plurality of real-world objects 106a-106n on the surface of the coffee table 120. As illustrated, the user 100 is shown physically located in a real-world space 105 wearing an AR HMD 102 while interacting with AR scenes of the AR environment. As shown in the figure, various real-world objects 106a-106n are located on the surface of the coffee table 120. In one embodiment, the real-world objects 106a-106n may be included in the generated AR scenes since the real-world objects are located within the FOV 118 of the user 100. In particular, real-world object 106a is a “video game controller,” real-world object 106b is a “mobile device,” real-world objects 106c are “books,” and real-world object 106n is a “knight chess piece.” As further illustrated in FIG. 1B, the system is configured to detect that the hand of the user 100 is reaching toward the real-world object 106a (e.g., video game controller). Further, the gaze direction 116 of the user is directed toward the real-world object 106a (e.g., video game controller). Accordingly, in one embodiment, based on the physical actions of the user 100 (e.g., reaching toward the video game controller and gaze direction directed at the video game controller), the system may generate one or more AIEs 104a-104n and render the AIEs proximate to the real-world object 106a (e.g., video game controller) to highlight and emphasize the real-world object 106a. As a result, the AIEs 104a-104n may correspond to the real-world object 106a and may provide information that is contextually related to the real-world object 106a.

As illustrated in the example shown in FIG. 1B, the AIEs 104a-104n are rendered as virtual objects in the AR scene. In some embodiments, each AIE is configured to provide a dynamic interface that is selectable by a gesture of the user or by the hand of the user. For example, a gesture of the user may include pointing at the AIE, looking at the AIE, providing a voice command prompt, touching the AIE, etc. In one embodiment, the dynamic interface associated with the corresponding AIE 104 may provide information related to the real-world object, and the information may be contextually related to the real-world object. In other embodiments, the dynamic interface associated with the corresponding AIE 104 may enable the user to use features that are associated with the real-world object, such as performing specific functions associated with the real-world object. For example, referring to FIG. 1B, AIE 104a may provide information describing the real-world object 106a, e.g., PS4 video game controller. In another example, AIE 104b may automatically launch the user’s favorite video game by turning on the computer 108 (e.g., game console) and displaying the video game on the display screen 110. In yet another example, AIE 104n may automatically launch and display the PlayStation Store on the display screen 110.

In some embodiments, the system in FIG. 1B may include an AR HMD 102 that is wirelessly connected to a cloud computing and gaming system 114 over a network 112. As noted above, the cloud computing and gaming system 114 maintains and executes the AR scenes and video game being played by the user 100. The cloud computing and gaming system 114 is configured to process the inputs from the AR HMD 102 to affect the state of the AR scenes of the AR environment. The output from the executing AR scenes, such as virtual objects, real-world objects, video data, and audio data, is transmitted to the AR HMD 102 for rendering on the display of the AR HMD 102.

FIG. 1C illustrates an embodiment of a system for interaction with an AR environment via a mobile device 106b. In some embodiments, non-HMDs may be substituted, including without limitation, portable device screens (e.g., tablet, smartphone, laptop, etc.) or any other type of display that can be configured to render video and/or provide for display of an interactive scene or virtual environment in accordance with the present implementations. For example, as illustrated in FIG. 1C, a user 100 is shown physically located in a real-world space 105 using a mobile device 106b to view AR scenes. In the illustrated example, the display of the mobile device 106b illustrates an AR scene which includes a real-world object 106n (e.g., knight chess piece) and a virtual object 124 (e.g., virtual chess board). In one embodiment, the virtual object 124 (e.g., virtual chess board) is generated in response to a selection of an AIE that enables the system to generate the virtual chess board. Accordingly, in one example, the real-world object 106n (e.g., knight chess piece), when provided with AIEs, enables use of the real-world object in the AR scene with the generated virtual chess board.

In some embodiments, as shown in FIG. 1C, the mobile device 106b is wirelessly connected to a cloud computing and gaming system 114 over a network 112. In one embodiment, the cloud computing and gaming system 114 maintains and executes the AR scenes and video game being played by the user 100. The cloud computing and gaming system 114 is configured to process the inputs from the mobile device 106b to affect the state of the AR scenes of the AR environment. The output from the executing AR scenes, such as virtual objects, real-world objects, video data, and audio data, is transmitted to the mobile device 106b for rendering on the display of the mobile device 106b.

FIGS. 2A-2C illustrate various embodiments of different AR scenes that are viewed from a FOV of a user 100, which include real-world objects 106a-106n and a plurality of AIEs 104 (e.g., AIE-1 to AIE-n) that are rendered at various positions within the respective AR scenes. In the illustrated example shown in FIG. 2A, the AR scene illustrates various real-world objects 106a-106n on the surface of the coffee table 120 and the generated AIEs 104 (e.g., AIE-1 to AIE-n) at various locations within the AR scene. During the user’s interaction with the AR scene, the externally facing camera located on the AR HMD 102 is configured to capture images of the real-world objects 106a-106n within the FOV of the user and to track the actions of the user 100. For example, as shown in FIG. 2A, the externally facing camera can be used to identify that real-world object 106a corresponds to a “video game controller,” real-world object 106b corresponds to a “mobile device,” real-world objects 106c correspond to “books,” and real-world object 106n corresponds to a “knight chess piece.” In some embodiments, the externally facing camera can detect that the hand of the user 100 is reaching into the AR scene in a direction toward real-world object 106a (e.g., video game controller).

In one embodiment, as the user interacts with the AR scene, a plurality of AIEs 104 can be generated along the hand 202 of the user 100 and proximate to the real-world object 106 that the hand of the user 100 is reaching toward. As shown in FIG. 2A, the AIEs 104 (e.g., AIE-1 to AIE-n) are rendered at different locations within the AR scene. In particular, a total of three AIEs 104 (e.g., AIE-1, AIE-2, AIE-3) that correspond to real-world object 106a (e.g., video game controller) are rendered proximate to the video game controller since the hand 202 of the user 100 is reaching in a direction toward the real-world object 106a (e.g., video game controller). As illustrated, the three AIEs 104 (e.g., AIE-1, AIE-2, AIE-3) form an AIE group 204a which is shaped in a triangular formation and surrounds the video game controller to highlight and emphasize the video game controller. In other embodiments, the AIE group 204a may form any other shape, e.g., circle, square, rectangle, oval, etc.

In one embodiment, when an AIE group 204 surrounds a real-world object 106, this may indicate that the corresponding AIEs are contextually related to the real-world object and that the AIEs can provide information and enable selectable features that correspond to the real-world object 106. For example, AIE-1 may provide information indicating that the real-world object 106a is a PS4 video game controller. AIE-2 may include a selectable feature associated with the PS4 video game controller that allows the user 100 to initiate an action associated with the video game controller, such as automatically launching the user’s favorite video game. AIE-3 may include a selectable feature that automatically launches and displays the PlayStation Store on the display screen of the user.

As further illustrated in the AR scene shown in FIG. 2A, AIE groups 204b-204n are rendered proximate to the hand and arm of the user 100. As shown, AIE group 204b forms a triangular shape which includes three AIEs 104 (e.g., AIE-4, AIE-5, AIE-6). Further, AIE group 204n forms a triangular shape which includes three AIEs 104 (e.g., AIE-7, AIE-8, AIE-n). In other embodiments, AIE groups 204b-204n may form any other shape, e.g., circle, square, rectangle, etc. As illustrated, neither AIE group 204b nor 204n surrounds a particular real-world object 106 since the AIE groups 204b-204n and their corresponding AIEs 104 are not relevant to the real-world object 106 that the hand 202 of the user is reaching toward, e.g., the video game controller. Instead, the non-relevant AIEs are rendered proximate to the hand and arm of the user, and the position of the non-relevant AIEs is updated to a position proximate to a real-world object when the hand 202 of the user reaches toward a real-world object that relates to those AIEs.
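
A short sketch of one possible placement rule is shown below: the AIE group tied to the object being reached for surrounds that object in a triangular formation, while the remaining groups are parked near the hand. The coordinate conventions, radii, and group-to-object mapping are assumptions made for illustration only.

```python
import math

def triangle_formation(center, radius=0.15):
    """Return three positions forming a triangle around `center` (x, y, z).

    The relevant AIE group surrounds its real-world object; the radius is an
    illustrative value in meters.
    """
    cx, cy, cz = center
    return [
        (cx + radius * math.cos(a), cy + radius * math.sin(a), cz)
        for a in (math.pi / 2,
                  math.pi / 2 + 2 * math.pi / 3,
                  math.pi / 2 + 4 * math.pi / 3)
    ]

def place_aie_groups(groups, target_object_id, object_positions, hand_position):
    """Lay out AIE groups: the group tied to the reached-for object surrounds it,
    every other group is parked near the user's hand until it becomes relevant."""
    layout = {}
    for group_id, related_object in groups.items():
        if related_object == target_object_id:
            layout[group_id] = triangle_formation(object_positions[related_object])
        else:
            layout[group_id] = triangle_formation(hand_position, radius=0.05)
    return layout

# Example: the hand reaches toward the controller, so only group "204a" surrounds it.
groups = {"204a": "controller", "204b": "chess_piece", "204n": "mobile_device"}
positions = {"controller": (0.3, 0.0, 1.0), "chess_piece": (-0.2, 0.0, 1.0),
             "mobile_device": (0.0, 0.1, 1.1)}
print(place_aie_groups(groups, "controller", positions, hand_position=(0.25, -0.2, 0.6)))
```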

FIG. 2B illustrates an embodiment of an AR scene illustrating the hand 202 of the user reaching toward another real-world object 106n (e.g., knight chess piece) in the AR scene. In some embodiments, the system is configured to detect an action by a hand 202 of the user 100 when reaching into the AR scene. As illustrated in the AR scene, the hand 202 of the user is directed toward a different real-world object 106n (e.g., knight chess piece) instead of the real-world object 106a (e.g., video game controller) shown in FIG. 2A. Since the hand 202 of the user is directed toward another real-world object in the AR scene, the system is configured to update the position of the AIEs such that the AIEs move from an initial position in the AR scene to an adjusted position in the AR scene.

For example, the system may determine that the hand 202 of the user 100 is directed toward the real-world object 106n (e.g., knight chess piece) and that AIE group 204b and its three AIEs 104 (e.g., AIE-4, AIE-5, AIE-6) are related to the real-world object 106n (e.g., knight chess piece). Accordingly, the system is configured to render the AIE group 204b at a position that is proximate to the real-world object 106n (e.g., knight chess piece) to highlight and emphasize the knight chess piece. In one example, AIE-4 may provide information indicating that the real-world object 106n is a knight chess piece and is used for playing chess. AIE-5 may include a selectable feature that sends a request to the system to automatically launch and display a virtual representation of a chess board. AIE-6 may include a selectable feature that sends a request to a friend of the user to join the user 100 in a game of chess.

As further illustrated in FIG. 2B, AIE groups 204a and 204n are rendered proximate to the hand and arm of the user 100 since the corresponding AIEs are not relevant to the real-world object 106n (e.g., knight chess piece). For example, in one embodiment, AIE group 204a and its corresponding AIEs moved from a position proximate to the real-world object 106a (e.g., video game controller) to a position proximate to the hand and arm of the user since the hand 202 of the user 100 is no longer reaching in a direction toward the real-world object 106a (e.g., video game controller).

FIG. 2C illustrates another embodiment of an AR scene illustrating the hand 202 of the user reaching toward the real-world object 106n (e.g., knight chess piece) and the position of AIE groups 204a-204n within the AR scene. Based on a prior AR interaction, with respect to FIG. 2A, the system previously determined that the AIE group 204a and its corresponding AIEs (e.g., AIE-1, AIE-2, AIE-3) correspond to the real-world object 106a (e.g., video game controller). Accordingly, the AIE group 204a was rendered proximate to the video game controller since the hand 202 of the user 100 was reaching in a direction toward the real-world object 106a (e.g., video game controller). Referring to the embodiment shown in FIG. 2C, the hand 202 of the user 100 is shown reaching in a direction toward a different real-world object 106n (e.g., knight chess piece), where AIE group 204b is rendered proximate to the real-world object 106n (e.g., knight chess piece) to highlight and emphasize the knight chess piece. Accordingly, instead of moving the AIE group 204a and its corresponding AIEs to a different position within the AR scene, the AIE group 204a automatically stays at a position proximate to the real-world object 106a (e.g., video game controller) since the system has previously associated the AIE group 204a with the real-world object 106a (e.g., video game controller). As a result, over time, based on prior AR scene interactions, the system is trained to learn which AIEs are related to the appropriate real-world objects in the AR scene so that the AIEs can be rendered at positions proximate to those real-world objects.

In some embodiments, the plurality of AIEs 104 can be selected and generated based on processing a profile of the user, physical actions of the user, and contextual data through a model of the user. In one embodiment, the model is configured to identify features from the profile of the user, the physical actions of the user, and the contextual data to classify attributes of the user, the attributes of the user being used to select the AIEs. In one embodiment, the system may use the model to select and generate AIEs 104 that are predicted to be preferred by the user. For example, during the user’s interaction with an AR environment, the system may determine that the user is reaching toward real-world object 106n (e.g., knight chess piece). Using the model, the system may select and generate AIEs 104 that will allow the user to play chess on a virtual chess board since the user has played chess using a virtual chess board in prior interactions in the AR environment.

FIG. 3 illustrates an embodiment of a user 100 interacting with various AIEs 104a-104n in an AR environment, which includes the user 100 initiating gameplay of a board game, e.g., chess. Referring to FIG. 2C, when the user 100 selects the dynamic interface associated with AIE-5, a request is sent to the system to automatically launch the virtual chess board 304 shown in FIG. 3. As illustrated in the embodiment of FIG. 3, the hand 202 of the user 100 is shown interacting with and selecting various AIEs 104a-104n that are rendered within the AR scenes 302a-302n.

In one embodiment, as shown in AR scene 302a, the AR scene includes a total of three AIEs 104a-104c that are related to the real-world object 106n (e.g., knight chess piece) and the virtual chess board 304. In some embodiments, the AIEs 104a-104c may provide information related to the real-world object 106n (e.g., knight chess piece) or the virtual chess board 304. In other embodiments, the dynamic interface associated with the AIEs 104a-104c may initiate and perform specific functions and features associated with the knight chess piece or the virtual chess board 304. For example, as illustrated in the AR scene 302a, AIE 104a may allow the user 100 to initiate a “new game” where the user 100 can use the knight chess piece and generated virtual chess pieces to play chess on the virtual chess board 304. In another example, AIE 104b may automatically send a request to “invite friends” to join the user 100 in the gameplay. In another example, AIE 104c may automatically “resume game” of the user’s previous gameplay. In the illustrated example, the user 100 is shown selecting AIE 104c (e.g., resume game) which initiates a previous chess game that the user 100 participated in.

As shown in AR scene 302b, the AR scene includes one or more updates to the AIEs after the user 100 selects the dynamic interface corresponding to AIE 104c. In one embodiment, when the gesture of the hand 202 of the user is determined to select the dynamic interface of an AIE, the selection of the AIE triggers an update to the rendered AIEs, providing additional AIEs for selection by one or more additional gestures. As illustrated in AR scene 302b, the additional AIEs 104d-104g form a rectangular shape and surround the real-world object 106n (e.g., knight chess piece) and the virtual chess board 304. The AR scene includes a total of four updated AIEs 104d-104g that are related to the selected dynamic interface corresponding to AIE 104c (e.g., resume game), real-world object 106n (e.g., knight chess piece), and the virtual chess board 304. For example, as illustrated in the AR scene 302b, AIE 104d may allow the user 100 to resume the chess board game with a particular player, e.g., “with Jane.” In another example, AIE 104e may show a listing of prior moves made by the user 100 in a particular game, e.g., “show moves.” In another example, AIE 104f may allow the user 100 to select a particular prior game session to resume, e.g., “select session.” In another example, AIE 104g may allow the user 100 to resume the chess board game with a particular player, e.g., “with Bob.” In the illustrated example shown in AR scene 302b, the user 100 is shown selecting AIE 104g (e.g., with Bob) which initiates a previous chess game with Bob.

As shown in AR scene 302c, the AR scene includes one or more updates to the AIEs after the user 100 selects the dynamic interface corresponding to AIE 104g. As illustrated in the AR scene 302c, AIEs 104h-104j form a triangular shape and surround the real-world object 106n (e.g., knight chess piece) and the virtual chess board 304. The AR scene includes a total of three updated AIEs 104h-104j which are related to the selected dynamic interface corresponding to AIE 104g (e.g., w/Bob), real-world object 106n (e.g., knight chess piece), and the virtual chess board 304. For example, as illustrated in AR scene 302c, AIE 104h may allow the user 100 to resume the chess board game with Bob from a particular game session, e.g., “Last Night’s Game.” In another example, AIE 104i may initiate sending a message to Bob, e.g., “Message Bob.” In another example, AIE 104j may allow the user 100 to access Bob’s calendar to see when Bob is available for a game session, e.g., “See Bob’s Calendar.” In the illustrated example shown in the AR scene 302c, the user 100 is shown selecting AIE 104h which initiates last night’s game session with Bob.

As shown in AR scene 302n, the AR scene includes one or more updates to the AIEs after the user 100 selects the dynamic interface corresponding to AIE 104h. As illustrated in the AR scene 302n, the AIEs 104k-104n form an oval shape and surround the real-world object 106n (e.g., knight chess piece) and the virtual chess board 304. The AR scene includes a total of two updated AIEs 104k-104n which are related to the selected dynamic interface corresponding to AIE 104h, the real-world object 106n (e.g., knight chess piece), and the virtual chess board 304. For example, as illustrated in the AR scene 302n, AIE 104k may send a request to the system indicating that the user is waiting for Bob to join the game session, e.g., “Waiting for Bob.” In another example, AIE 104n may allow the user to exit out of the AR scene, e.g., “Exit.” In the illustrated example shown in the AR scene 302n, the user 100 is shown selecting AIE 104k to notify the system that the user 100 is waiting for Bob to join the game session.
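
The drill-down behavior of AR scenes 302a-302n can be modeled as a tree of AIEs in which selecting one AIE reveals its children. The sketch below reuses the labels from the figures, but the tree data structure and traversal function are assumptions, not the patent's implementation.

```python
# Each node pairs an AIE label with the child AIEs revealed once it is selected.
AIE_TREE = {
    "New Game": {},
    "Invite Friends": {},
    "Resume Game": {
        "With Jane": {},
        "Show Moves": {},
        "Select Session": {},
        "With Bob": {
            "Last Night's Game": {"Waiting for Bob": {}, "Exit": {}},
            "Message Bob": {},
            "See Bob's Calendar": {},
        },
    },
}

def current_aies(tree, selections):
    """Walk the tree along the user's selections and return the AIEs to render."""
    node = tree
    for choice in selections:
        node = node[choice]
    return list(node.keys())

# Example: reproduces the drill-down of AR scenes 302a-302n.
print(current_aies(AIE_TREE, []))                            # top level (302a)
print(current_aies(AIE_TREE, ["Resume Game"]))               # after AIE 104c (302b)
print(current_aies(AIE_TREE, ["Resume Game", "With Bob"]))   # after AIE 104g (302c)
print(current_aies(AIE_TREE, ["Resume Game", "With Bob", "Last Night's Game"]))
```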

FIG. 4 illustrates an embodiment of a user 100 interacting with various AIEs 104a-104n that are rendered proximate to a universal controller 404 in an AR environment. As illustrated, the hand 202 of the user 100 is shown interacting with and selecting various AIEs 104a-104n that are rendered within the AR scenes 402a-402n. In one embodiment, as shown in AR scene 402a, the AR scene includes a total of four AIEs 104a-104d that are related to the universal controller 404. In some embodiments, the universal controller 404 may be a real-world object or a virtual object. The AIEs 104a-104n may provide information related to the universal controller 404. In other embodiments, the AIEs 104a-104n may initiate and perform specific functions and features associated with the universal controller 404.

For example, as illustrated in the AR scene 402a, selecting AIE 104a may turn on a cable box so that the user can watch cable television, e.g., “Cable.” In another example, selecting AIE 104b may automatically turn on a streaming device so that the user can view content via the streaming device, e.g., “Streaming.” In another example, selecting AIE 104c may automatically turn on a television of the user, e.g., “TV.” In another example, selecting AIE 104d may automatically turn on a radio of the user, e.g., “Radio.” In the illustrated example, the user 100 is shown selecting AIE 104b which initiates the streaming device of the user so that the user can select specific content to view.

As shown in AR scene 402b, the AR scene includes one or more updates to the AIEs after the user 100 selects the dynamic interface corresponding to AIE 104b. As illustrated in the AR scene 402b, the AIEs 104e-104g form a rectangular shape which surrounds the universal controller 404. The AR scene includes a total of three updated AIEs 104e-104g which are related to the selected dynamic interface and the universal controller 404. For example, as illustrated in the AR scene 402b, selecting AIE 104e may allow the user 100 to access news content from the streaming device, e.g., “news.” In another example, selecting AIE 104f may allow the user 100 to access sports content from the streaming device, e.g., “sports.” In another example, selecting AIE 104g may allow the user 100 to access movie content from the streaming device, e.g., “movie.” In the illustrated example shown in the AR scene 402b, the user 100 is shown selecting AIE 104f (e.g., Sports) which allows access to a library of sports channels that are available through the streaming device.

As shown in AR scene 402c, the AR scene includes one or more updates to the AIEs after the user 100 selects the dynamic interface corresponding to AIE 104f (e.g., Sports). As illustrated in the AR scene 402c, the AIEs 104h-104j form a triangular shape which surrounds the universal controller 404. The AR scene includes a total of three updated AIEs 104h-104j which are related to the selected dynamic interface and the universal controller 404. For example, as illustrated in the AR scene 402c, selecting AIE 104h may provide the user 100 with access to baseball related content from the streaming device, e.g., “baseball.” In another example, selecting AIE 104i may allow the user 100 to access golf related content from the streaming device, e.g., “golf.” In another example, selecting AIE 104j may allow the user 100 to access basketball related content from the streaming device, e.g., “basketball.” In the illustrated example shown in the AR scene 402c, the user 100 is shown selecting AIE 104h (e.g., Baseball) which provides access to a library of baseball related content that is available through the streaming device.

As shown in AR scene 402n, the AR scene includes one or more updates to the AIEs after the user 100 selects the dynamic interface corresponding to AIE 104h (e.g., Baseball). As illustrated in the AR scene 402n, the AIEs 104k-104n form an oval shape which surrounds the universal controller 404. The AR scene includes a total of four updated AIEs 104k-104n which are related to the selected dynamic interface and the universal controller 404. For example, as illustrated in the AR scene 402n, AIE 104k may allow the user 100 to access baseball content related to the Dodgers baseball team, e.g., Dodgers. In another example, AIE 104l may allow the user 100 to access baseball content related to the Yankees baseball team, e.g., Yankees. In another example, AIE 104m may allow the user 100 to access baseball content related to the A’s baseball team, e.g., A’s. In another example, AIE 104n may allow the user 100 to access baseball content related to the Giants baseball team, e.g., Giants. In the illustrated example shown in the AR scene 402n, the user 100 is shown selecting AIE 104k (e.g., Dodgers) which provides access to baseball content related to the Dodgers.

FIG. 5 illustrates an embodiment of a user 100 interacting with various AIEs 104a-104n that are rendered proximate to a mobile device 106b in an AR environment. As illustrated, the hand 202 of the user 100 is shown selecting various AIEs 104a-104n that are rendered within the AR scenes 502a-502n. In one embodiment, as shown in AR scene 502a, the AR scene includes a total of five AIEs 104a-104e that are related to the mobile device 106b. In some embodiments, the mobile device 106b may be a real-world object or a virtual object. The AIEs may provide information related to the mobile device 106b. In other embodiments, the AIEs may enable the user to initiate and perform specific functions and features associated with the mobile device 106b.

For example, as illustrated in the AR scene 502a, selecting AIE 104a may provide the user with access to various social media platforms, e.g., “Social Media.” In another example, selecting AIE 104b may automatically initiate a call to the user’s mom, e.g., “Call Mom.” In another example, selecting AIE 104c may automatically initiate an order for pizza, e.g., “Order Pizza.” In another example, selecting AIE 104d may automatically initiate a text message using the mobile device 106b, e.g., “Send Text.” In the illustrated example, the user 100 is shown selecting AIE 104c which initiates an order for pizza using the mobile device 106b.

As shown in AR scene 502b, the AR scene includes one or more updates to the AIEs after the user 100 selects the dynamic interface corresponding to AIE 104c (e.g., Order Pizza). As illustrated in the AR scene 502b, the AIEs 104f-104i form a rectangular shape which surrounds the mobile device 106b. The AR scene includes a total of four updated AIEs 104f-104i which are related to the selected dynamic interface and the mobile device 106b. For example, as illustrated in the AR scene 502b, selecting AIE 104f may initiate an order for pizza at Mario’s Pizza. In another example, selecting AIE 104g may initiate an order for pizza at Pizza Cart. In another example, selecting AIE 104h may initiate an order for pizza at Big Mikes. In another example, selecting AIE 104i may initiate an order for pizza at Gino’s Pizza. In the illustrated example shown in AR scene 502b, the user 100 is shown selecting AIE 104f (e.g., Mario’s Pizza) which initiates an order for pizza at Mario’s Pizza.

As shown in AR scene 502c, the AR scene includes one or more updates to the AIEs after the user 100 selects the dynamic interface corresponding to AIE 104f (e.g., Mario’s Pizza). As shown, the AIEs 104h-104j form a triangular shape which surrounds the mobile device 106b. The AR scene includes a total of three updated AIEs 104h-104j which are related to the selected dynamic interface and the mobile device 106b. For example, as illustrated in the AR scene 502c, selecting AIE 104h may allow the user to place the same order as a previous order made by the user, e.g., Same Order as Previous. In another example, selecting AIE 104i may allow the user 100 to build a new pizza from scratch, e.g., Build Pizza. In another example, selecting AIE 104j may allow the user 100 to view the menu at Mario’s Pizza, e.g., Menu. In the illustrated example shown in the AR scene 502c, the user 100 is shown selecting AIE 104h which automatically places an order for the same pizza that the user had previously ordered.

As shown in AR scene 502n, the AR scene includes one or more updates to the AIEs after the user 100 selects the dynamic interface corresponding to AIE 104h (e.g., Same Order as Previous). As illustrated in the AR scene 502n, the AIEs 104k-104n form an oval shape which surrounds the mobile device 106b. The AR scene includes a total of two updated AIEs 104k-104n which are related to ordering pizza via the mobile device 106b. For example, as illustrated in the AR scene 502n, AIE 104k may request the pizza to be delivered at 9 PM.

FIG. 6 illustrates an embodiment of a method for using a model 620 to dynamically select one or more AIEs 104 for rendering in an AR scene using a user profile 602, user captured physical actions 604, and contextual data 606 as inputs. As noted above, each AIE 104 is placed in a position that is proximate to the hand of the user and proximate to the real-world objects. This facilitates an enhanced and improved AR experience for the user since the AIEs may provide information related to the real-world objects. In turn, this can enhance the AR experience for users who want a seamless way to quickly access information related to a real-world object.

As shown in FIG. 6, in one embodiment, the system may include feature extraction operations (e.g., 608, 610, 612) that are configured to identify various features in the user profile 602, the user captured physical actions 604, and the contextual data 606. After the feature extraction operations identify the features associated with the inputs, classifier operations (e.g., 614, 616, 618) may be configured to classify the features using one or more classifiers. In some embodiments, the system includes a model 620 of the user that is configured to receive the classified features from the classifier operations. Using the classified features, the model 620 can be used to select AIEs 104 for rendering in an AR scene and to determine which real-world object each one of the selected AIEs corresponds to. In some embodiments, operation 622 can use the model 620 to determine which AIEs 104 to select for rendering, the corresponding real-world objects for each AIE, and the location to render the AIEs within the AR scene.
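
The following is a minimal sketch of the FIG. 6 pipeline, with stand-ins for the extraction operations (608, 610, 612), the classifier operations (614, 616, 618), the user model 620, and selection operation 622. The feature names and the rule-based placeholder model are illustrative assumptions; an actual system would presumably use learned classifiers and a trained model.

```python
from dataclasses import dataclass, field

@dataclass
class ClassifiedFeatures:
    profile: dict = field(default_factory=dict)
    actions: dict = field(default_factory=dict)
    context: dict = field(default_factory=dict)

def extract_and_classify(user_profile, physical_actions, contextual_data):
    """Stand-ins for operations 608/610/612 (extraction) and 614/616/618 (classification)."""
    return ClassifiedFeatures(
        profile={"likes_chess": "chess" in user_profile.get("interests", [])},
        actions={"reaching_toward": physical_actions.get("reach_target")},
        context={"evening": contextual_data.get("hour", 12) >= 18},
    )

def select_aies(model, features):
    """Operation 622: ask the user model which AIEs to render and where."""
    return model(features)

def toy_model(features):
    # A rule-based placeholder for the learned model 620.
    if features.actions["reaching_toward"] == "knight_chess_piece" and features.profile["likes_chess"]:
        return [("Spawn virtual chess board", "knight_chess_piece")]
    return []

features = extract_and_classify(
    user_profile={"interests": ["chess", "baseball"]},
    physical_actions={"reach_target": "knight_chess_piece"},
    contextual_data={"hour": 20},
)
print(select_aies(toy_model, features))
```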

In another embodiment, a cloud computing and gaming system 114 located at server 628 may receive the selected AIEs 104 from operation 622 for processing. In some embodiments, the cloud computing and gaming system 114 may work together with an AR user interaction processor 630 to process the selected AIEs 104 to generate AR interactive content for the user 100. In one embodiment, at operation 624, the AR interactive content is received and can be rendered on a display of a device of the user. In another embodiment, operation 626 can be configured to capture the user’s interaction with the AIEs, which can be incorporated into the user captured physical actions 604 as feedback data.

In one embodiment, the system can process the user profile 602. The user profile 602 may include various attributes and information associated with user 100 such as gameplay tendencies, behavior tendencies, viewing history, preferences, interests, disinterests, etc. In some embodiments, the user profile extraction 608 operation is configured to process the user profile 602 to identify and extract features associated with the profile of the user 100. After the user profile extraction 608 operation processes and identifies the features from the user profile 602, the user profile classifiers 614 operation is configured to classify the features using one or more classifiers. In one embodiment, the features are labeled using a classification algorithm for further refining by the model 620.

In another embodiment, the system can process the user captured physical actions 604. In some embodiments, the physical actions of the user are continuously monitored and tracked during the user’s interaction with the AR environment. In one embodiment, the user captured physical actions 604 may include various attributes and information associated with the actions of the user such as hand movement data, head movement data, body movement data, eye gaze data, voice capture data, face capture data, and controller input data. For example, a user 100 may be viewing AR interactive content that includes a living room (e.g., real-world space) with various real-world objects located in the living room, e.g., remote control, TV, stereo system. The hand movement data and the eye gaze data of the user may provide information that the user is reaching toward the remote control and looking at the stereo system. Accordingly, the system may infer that the user is interested in turning on the stereo system and select AIEs that correspond to the stereo system for rendering in the AR scene.

In some embodiments, the user captured physical actions extraction 610 operation is configured to process the user captured physical actions 604 to identify and extract features associated with the physical actions of the user. After the user captured physical actions extraction 610 operation processes and identifies the features from the user captured physical actions 604, the user captured physical actions classifier 616 operation is configured to classify the features using one or more classifiers. In one embodiment, the features are labeled using a classification algorithm for further refining by the model 620.

In another embodiment, the system can process the contextual data 606. In one embodiment, the contextual data 606 may include a variety of information associated with the context of the AR environment that the user is interacting in such as real-world space, real-world objects, virtual objects, date, time, contextual data regarding the interaction, etc. For example, the contextual data 606 may provide information that indicates that it is a Friday evening and that the user generally eats dinner at 7 PM. In other embodiments, the contextual data extraction 612 operation is configured to process the contextual data 606 to identify and extract features associated with the context of the user. After the contextual data extraction 612 processes and identifies the features from the contextual data 606, the contextual data classifiers 618 operation is configured to classify the features using one or more classifiers. In some embodiments, the features are labeled using a classification algorithm for further refining by the model 620.

In some embodiments, the model 620 is configured to receive as inputs the classified features (e.g., user profile classified features, user captured physical actions classified features, contextual data classified features). In another embodiment, other inputs that are not direct inputs, or a lack of input/feedback, may also be taken as inputs to the model 620. The model 620 may use a machine learning model to predict one or more AIEs 104 to select and the real-world objects to which they correspond. In other embodiments, the model 620 can receive as input feedback data, such as data related to the AIEs selected by the user in prior interactions with the AR interactive content. The feedback data, in one example, may include selections and non-selections of AIEs by the user.

In one example, a user profile 602 associated with a user may indicate that the user has the tendency to order pizza from Mario’s Pizzeria every other Friday evening for dinner. Further, while interacting in the AR interactive content, based on the user captured physical actions 604 associated with the user, the system may determine that the user is reaching toward a mobile device (e.g., real-world object). The contextual data 606 may further indicate that it is a Friday evening and that the user and his friends are trying to decide what to have for dinner. Accordingly, using the user profile, user captured physical actions, and the contextual data, the model 620 may be used to select AIEs 104 that are related to the mobile device where one or more of the AIEs 104 would include selectable features that would enable the user to order pizza from Mario’s Pizzeria.

In some embodiments, operation 622 can use the model 620 to determine which AIEs 104 to select for rendering, the corresponding real-world objects associated with the AIEs, and the location to render the AIEs within the AR scene. Once the AIEs 104 are selected for rendering as virtual objects in the AR scene, the AIEs are received by server 628 for processing. In some embodiments, server 628 may include the cloud computing and gaming system 114 and the AR user interaction processor 630. In one embodiment, the cloud computing and gaming system 114 may receive the selected AIEs 104 from operation 622 for processing. In one embodiment, the cloud computing and gaming system 114 is configured to generate the AR interactive content and context of the environment which may include various AR scenes. In one embodiment, the AR scenes are continuously updated based on the interaction of the user. In other embodiments, the user profile 602 and the contextual data 606 can be provided by the cloud computing and gaming system 114.

In some embodiments, the cloud computing and gaming system 114 may work together with an AR user interaction processor 630 to render the AIEs 104 at their respective locations and positions within the AR scene. For example, the AR user interaction processor 630 may determine that the user’s eye gaze is directed toward a chess piece (e.g., real-world object) and that the user’s hand is reaching in a direction toward the chess piece. Accordingly, the cloud computing and gaming system 114 may infer that the user is interested in playing chess and generate AR interactive content (e.g., AR scenes with real-world objects and virtual objects) that includes AIEs rendered proximate to the chess piece for selection by the user. In other embodiments, the user captured physical actions 604 can be provided by the AR user interaction processor 630.

After the cloud computing and gaming system 114 and the AR user interaction processor 630 produce the AR interactive content, operation 624 is configured to receive the AR interactive content so that the user 100 can interact with the AR interactive content. In one embodiment, the AR interactive content may include composite images of the real world with AIEs positioned in the context of what the user is doing or playing (e.g., AIEs in position near a chess piece). In one example, as the hand of the user moves and changes direction within the AR scene, the AIEs 104 follow and stay at a position that is proximate to the hand of the user. In other embodiments, while the user interacts with the AR interactive content and selects the AIEs, operation 624 can continuously receive updates to the AR interactive content which may include updated AIEs that are rendered at their appropriate locations. In one embodiment, the updated AIEs may provide information related to the real-world objects in the AR scene and provide additional AIEs for selection by the user.

In one embodiment, during the user’s interaction with the AR interactive content, operation 626 is configured to capture the user’s interaction with the AIEs, the real-world objects, and any other virtual objects in the AR interactive content. In some embodiments, operation 626 can be configured to assess the user’s interaction with the AIEs and the placement of the AIEs in the AR scene. The user’s interaction with the AIEs may be explicit or implied by the user. For example, if AIEs are rendered near a real-world object and are ignored by the user, it may be implied that the user is not interested in the real-world object. Accordingly, various inferences can be captured by operation 626 and incorporated into the user captured physical actions 604 or user profile 602, which can help provide a more accurate selection of the AIEs.
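
One way such implicit and explicit feedback might be accumulated is sketched below: each render of an AIE is recorded as either selected or ignored, and a smoothed preference score is fed back to the selection model. The store, the scoring formula, and the function names are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical feedback store: per (real-world object, AIE label) counts of
# explicit selections and of renders that the user ignored (operation 626).
feedback = defaultdict(lambda: {"selected": 0, "ignored": 0})

def record_interaction(object_id, aie_label, selected):
    key = (object_id, aie_label)
    feedback[key]["selected" if selected else "ignored"] += 1

def preference_score(object_id, aie_label):
    """Simple relevance estimate fed back into the user model: selections
    raise the score, ignored renders lower it (Laplace-smoothed)."""
    stats = feedback[(object_id, aie_label)]
    return (stats["selected"] + 1) / (stats["selected"] + stats["ignored"] + 2)

record_interaction("knight_chess_piece", "Spawn virtual chess board", selected=True)
record_interaction("video_game_controller", "Open PlayStation Store", selected=False)
print(preference_score("knight_chess_piece", "Spawn virtual chess board"))
print(preference_score("video_game_controller", "Open PlayStation Store"))
```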

FIG. 7 illustrates an embodiment of a user captured physical actions table 702 illustrating various physical actions that are captured while the user is interacting with the AR interactive content. As shown, the user captured physical actions table 702 includes a physical action category 704 and a general description 706. In one embodiment, the physical action category 704 may include information associated with the user that is captured during the user’s interaction with the AR environment such as body movement data, eye gaze data, voice capture data, face capture data, controller input data, etc.

To provide an illustration of the user captured physical actions table 702 in FIG. 7, in one example, the system may determine that, based on the contextual data, the user is using an AR HMD to play a video game in the living room of the user. Using the user captured physical actions table 702, the eye gaze data and the face capture data indicate that the user is looking at a remote control that is sitting on a coffee table of the user. Based on the user profile data and the contextual data, the system may determine that a live baseball game that includes the user’s favorite team will be starting in the next five minutes. Accordingly, the system may select one or more AIEs for rendering proximate to the remote control, where the one or more AIEs correspond to features that will enable the remote control to automatically turn on a TV to display the baseball game. Accordingly, the AIEs can be selected and rendered at a position proximate to the user based on the physical actions of the user, the profile of the user, and contextual information so that the user can seamlessly select features associated with the remote control (e.g., turn on the TV and stream the baseball game) in an efficient manner.

FIG. 8 illustrates a method for generating an AR scene for a user and generating one or more AIEs for rendering in the AR scene. In one embodiment, the method described in FIG. 8 provides the user with an enhanced and improved AR experience since the AIEs may provide information related to the real-world objects in the AR scene which can be quickly accessed by the user. In one embodiment, the method includes an operation 802 that is configured to generate an augmented reality (AR) scene for rendering on a display. In some embodiments, the AR scene may include a real-world space and virtual objects projected in the real-world space. For example, a user may be wearing AR goggles that include a display. While immersed in the AR environment that includes both real-world objects and virtual objects, the user can interact with the various virtual objects and the real-world objects that are displayed to the user.

The method shown in FIG. 8 then flows to operation 804 where the operation is configured to analyze a FOV into the AR scene. In one embodiment, operation 804 is configured to detect an action by a hand of the user when reaching into the AR scene. In some embodiments, operation 804 is configured to use an externally facing camera to capture images that are within the FOV of the user, e.g., real-world objects and virtual objects. In one embodiment, the externally facing camera can be used to detect the various actions of the user such as the hand movements and hand gestures of the user when reaching into the AR scene. In other embodiments, operation 804 is configured to capture other actions of the user such as body movement data, eye gaze data, voice capture data, face capture data, and controller input data.

The method flows to operation 806 where the operation is configured to generate one or more AIEs that are rendered as virtual objects in the AR scene. In one embodiment, each AIE is configured to provide a dynamic interface that is selectable by a gesture of the hand of the user. In some embodiments, operation 806 may use a model to select the AIEs for generation in the AR scene. In one example, the plurality of AIEs can be selected and generated based on processing a profile of the user, physical actions of the user, and contextual data through a model of the user. In one embodiment, the selected AIEs are selected based on the interests and preferences of the user. In other embodiments, the selected AIEs are related to the real-world objects in the AR scene. In other embodiments, operation 806 is configured to render the AIEs proximate to the real-world object present in the real-world space. In some embodiments, the real-world object is located in a direction of where the hand of the user is detected to be reaching when the user makes the action by the hand.
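
A compact sketch of operations 802, 804, and 806 applied to a single captured frame is shown below. The frame format, the stand-in user model, and the function names are hypothetical and are only meant to show the flow of the method.

```python
def generate_ar_scene(camera_frame):
    """Operation 802: compose real-world imagery with projected virtual objects."""
    return {"real_objects": camera_frame["objects"], "virtual_objects": []}

def analyze_fov(camera_frame):
    """Operation 804: detect the hand reaching into the scene and other actions."""
    return {"reach_target": camera_frame.get("reach_target"),
            "gaze_target": camera_frame.get("gaze_target")}

def generate_aies(scene, actions, user_model):
    """Operation 806: render selectable AIEs near the object being reached for."""
    target = actions["reach_target"]
    if target and target in scene["real_objects"]:
        scene["virtual_objects"].extend(user_model(target))
    return scene

# Hypothetical single pass over one captured frame.
frame = {"objects": ["knight_chess_piece", "mobile_device"],
         "reach_target": "knight_chess_piece", "gaze_target": "knight_chess_piece"}
scene = generate_ar_scene(frame)
actions = analyze_fov(frame)
print(generate_aies(scene, actions, lambda t: [f"AIE for {t}"]))
```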

FIG. 9 illustrates components of an example device 900 that can be used to perform aspects of the various embodiments of the present disclosure. This block diagram illustrates a device 900 that can incorporate or can be a personal computer, video game console, personal digital assistant, a server or other digital device, suitable for practicing an embodiment of the disclosure. Device 900 includes a central processing unit (CPU) 902 for running software applications and optionally an operating system. CPU 902 may be comprised of one or more homogeneous or heterogeneous processing cores. For example, CPU 902 is one or more general-purpose microprocessors having one or more processing cores. Further embodiments can be implemented using one or more CPUs with microprocessor architectures specifically adapted for highly parallel and computationally intensive applications, such as processing operations of interpreting a query, identifying contextually relevant resources, and implementing and rendering the contextually relevant resources in a video game immediately. Device 900 may be localized to a player playing a game segment (e.g., a game console), or remote from the player (e.g., a back-end server processor), or one of many servers using virtualization in a game cloud system for remote streaming of gameplay to clients.

Memory 904 stores applications and data for use by the CPU 902. Storage 906 provides non-volatile storage and other computer readable media for applications and data and may include fixed disk drives, removable disk drives, flash memory devices, and CD-ROM, DVD-ROM, Blu-ray, HD-DVD, or other optical storage devices, as well as signal transmission and storage media. User input devices 908 communicate user inputs from one or more users to device 900, examples of which may include keyboards, mice, joysticks, touch pads, touch screens, still or video recorders/cameras, tracking devices for recognizing gestures, and/or microphones. Network interface 914 allows device 900 to communicate with other computer systems via an electronic communications network, and may include wired or wireless communication over local area networks and wide area networks such as the internet. An audio processor 912 is adapted to generate analog or digital audio output from instructions and/or data provided by the CPU 902, memory 904, and/or storage 906. The components of device 900, including CPU 902, memory 904, data storage 906, user input devices 908, network interface 914, and audio processor 912 are connected via one or more data buses 922.

A graphics subsystem 920 is further connected with data bus 922 and the components of the device 900. The graphics subsystem 920 includes a graphics processing unit (GPU) 916 and graphics memory 918. Graphics memory 918 includes a display memory (e.g., a frame buffer) used for storing pixel data for each pixel of an output image. Graphics memory 918 can be integrated in the same device as GPU 916, connected as a separate device with GPU 916, and/or implemented within memory 904. Pixel data can be provided to graphics memory 918 directly from the CPU 902. Alternatively, CPU 902 provides the GPU 916 with data and/or instructions defining the desired output images, from which the GPU 916 generates the pixel data of one or more output images. The data and/or instructions defining the desired output images can be stored in memory 904 and/or graphics memory 918. In an embodiment, the GPU 916 includes 3D rendering capabilities for generating pixel data for output images from instructions and data defining the geometry, lighting, shading, texturing, motion, and/or camera parameters for a scene. The GPU 916 can further include one or more programmable execution units capable of executing shader programs.

The graphics subsystem 920 periodically outputs pixel data for an image from graphics memory 918 to be displayed on display device 910. Display device 910 can be any device capable of displaying visual information in response to a signal from the device 900, including CRT, LCD, plasma, and OLED displays. Device 900 can provide the display device 910 with an analog or digital signal, for example.

It should be noted, that access services, such as providing access to games of the current embodiments, delivered over a wide geographical area often use cloud computing. Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet. Users do not need to be an expert in the technology infrastructure in the “cloud” that supports them. Cloud computing can be divided into different services, such as Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Cloud computing services often provide common applications, such as video games, online that are accessed from a web browser, while the software and data are stored on the servers in the cloud. The term cloud is used as a metaphor for the Internet, based on how the Internet is depicted in computer network diagrams and is an abstraction for the complex infrastructure it conceals.

A game server may be used to perform the operations of the durational information platform for video game players, in some embodiments. Most video games played over the Internet operate via a connection to the game server. Typically, games use a dedicated server application that collects data from players and distributes it to other players. In other embodiments, the video game may be executed by a distributed game engine. In these embodiments, the distributed game engine may be executed on a plurality of processing entities (PEs) such that each PE executes a functional segment of a given game engine that the video game runs on. Each processing entity is seen by the game engine as simply a compute node. Game engines typically perform an array of functionally diverse operations to execute a video game application along with additional services that a user experiences. For example, game engines implement game logic, perform game calculations, physics, geometry transformations, rendering, lighting, shading, audio, as well as additional in-game or game-related services. Additional services may include, for example, messaging, social utilities, audio communication, game play replay functions, help function, etc. While game engines may sometimes be executed on an operating system virtualized by a hypervisor of a particular server, in other embodiments, the game engine itself is distributed among a plurality of processing entities, each of which may reside on different server units of a data center.

According to this embodiment, the respective processing entities for performing the operations may be a server unit, a virtual machine, or a container, depending on the needs of each game engine segment. For example, if a game engine segment is responsible for camera transformations, that particular game engine segment may be provisioned with a virtual machine associated with a graphics processing unit (GPU) since it will be doing a large number of relatively simple mathematical operations (e.g., matrix transformations). Other game engine segments that require fewer but more complex operations may be provisioned with a processing entity associated with one or more higher power central processing units (CPUs).
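
A toy sketch of such a provisioning rule follows; the segment names, the single `parallel_math` flag, and the returned entity types are simplifying assumptions, not a description of any particular game cloud system.

```python
# Illustrative provisioning rule: segments dominated by simple, highly parallel
# math (e.g., camera transforms) go to GPU-backed virtual machines, while
# segments with fewer but more complex operations go to CPU-heavy containers.
SEGMENT_PROFILES = {
    "camera_transforms": {"parallel_math": True},
    "physics":           {"parallel_math": True},
    "game_logic":        {"parallel_math": False},
    "audio":             {"parallel_math": False},
}

def provision(segment):
    profile = SEGMENT_PROFILES[segment]
    if profile["parallel_math"]:
        return {"entity": "virtual_machine", "accelerator": "GPU"}
    return {"entity": "container", "accelerator": "high-power CPU"}

for seg in SEGMENT_PROFILES:
    print(seg, provision(seg))
```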

By distributing the game engine, the game engine is provided with elastic computing properties that are not bound by the capabilities of a physical server unit. Instead, the game engine, when needed, is provisioned with more or fewer compute nodes to meet the demands of the video game. From the perspective of the video game and a video game player, the game engine being distributed across multiple compute nodes is indistinguishable from a non-distributed game engine executed on a single processing entity, because a game engine manager or supervisor distributes the workload and integrates the results seamlessly to provide video game output components for the end user.

Users access the remote services with client devices, which include at least a CPU, a display and I/O. The client device can be a PC, a mobile phone, a netbook, a PDA, etc. In one embodiment, the network executing on the game server recognizes the type of device used by the client and adjusts the communication method employed. In other cases, client devices use a standard communications method, such as html, to access the application on the game server over the internet.

It should be appreciated that a given video game or gaming application may be developed for a specific platform and a specific associated controller device. However, when such a game is made available via a game cloud system as presented herein, the user may be accessing the video game with a different controller device. For example, a game might have been developed for a game console and its associated controller, whereas the user might be accessing a cloud-based version of the game from a personal computer utilizing a keyboard and mouse. In such a scenario, the input parameter configuration can define a mapping from inputs which can be generated by the user’s available controller device (in this case, a keyboard and mouse) to inputs which are acceptable for the execution of the video game.

In another example, a user may access the cloud gaming system via a tablet computing device, a touchscreen smartphone, or other touchscreen driven device. In this case, the client device and the controller device are integrated together in the same device, with inputs being provided by way of detected touchscreen inputs/gestures. For such a device, the input parameter configuration may define particular touchscreen inputs corresponding to game inputs for the video game. For example, buttons, a directional pad, or other types of input elements might be displayed or overlaid during running of the video game to indicate locations on the touchscreen that the user can touch to generate a game input. Gestures such as swipes in particular directions or specific touch motions may also be detected as game inputs. In one embodiment, a tutorial can be provided to the user indicating how to provide input via the touchscreen for gameplay, e.g., prior to beginning gameplay of the video game, so as to acclimate the user to the operation of the controls on the touchscreen.

In some embodiments, the client device serves as the connection point for a controller device. That is, the controller device communicates via a wireless or wired connection with the client device to transmit inputs from the controller device to the client device. The client device may in turn process these inputs and then transmit input data to the cloud game server via a network (e.g., accessed via a local networking device such as a router). However, in other embodiments, the controller can itself be a networked device, with the ability to communicate inputs directly via the network to the cloud game server, without being required to communicate such inputs through the client device first. For example, the controller might connect to a local networking device (such as the aforementioned router) to send to and receive data from the cloud game server. Thus, while the client device may still be required to receive video output from the cloud-based video game and render it on a local display, input latency can be reduced by allowing the controller to send inputs directly over the network to the cloud game server, bypassing the client device.

In one embodiment, a networked controller and client device can be configured to send certain types of inputs directly from the controller to the cloud game server, and other types of inputs via the client device. For example, inputs whose detection does not depend on any additional hardware or processing apart from the controller itself can be sent directly from the controller to the cloud game server via the network, bypassing the client device. Such inputs may include button inputs, joystick inputs, embedded motion detection inputs (e.g., accelerometer, magnetometer, gyroscope), etc. However, inputs that utilize additional hardware or require processing by the client device can be sent by the client device to the cloud game server. These might include captured video or audio from the game environment that may be processed by the client device before sending to the cloud game server. Additionally, inputs from motion detection hardware of the controller might be processed by the client device in conjunction with captured video to detect the position and motion of the controller, which would subsequently be communicated by the client device to the cloud game server. It should be appreciated that the controller device in accordance with various embodiments may also receive data (e.g., feedback data) from the client device or directly from the cloud gaming server.
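
The routing policy described above might look roughly like the sketch below, where inputs needing no client-side processing go straight to the cloud game server and the rest are relayed through the client device. The input categories and the function are assumptions drawn loosely from the examples in the text.

```python
# Inputs that need no extra hardware or client-side processing can bypass the
# client device and go straight to the cloud game server; the rest are relayed
# through the client. The categories below mirror the examples in the text.
DIRECT_INPUTS = {"button", "joystick", "accelerometer", "magnetometer", "gyroscope"}
CLIENT_PROCESSED_INPUTS = {"captured_video", "captured_audio", "fused_controller_pose"}

def route_input(input_type):
    """Return which hop receives the input first under this hypothetical policy."""
    if input_type in DIRECT_INPUTS:
        return "controller -> router -> cloud game server"
    if input_type in CLIENT_PROCESSED_INPUTS:
        return "controller -> client device -> cloud game server"
    return "unknown input type; default to client device"

for t in ("button", "gyroscope", "captured_video"):
    print(t, "=>", route_input(t))
```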

It should be understood that the various embodiments defined herein may be combined or assembled into specific implementations using the various features disclosed herein. Thus, the examples provided are just some possible examples, without limitation to the various implementations that are possible by combining the various elements to define many more implementations. In some examples, some implementations may include fewer elements, without departing from the spirit of the disclosed or equivalent implementations.

Embodiments of the present disclosure may be practiced with various computer system configurations including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers and the like. Embodiments of the present disclosure can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a wire-based or wireless network.

Although the method operations were described in a specific order, it should be understood that other housekeeping operations may be performed in between operations, or operations may be adjusted so that they occur at slightly different times or may be distributed in a system which allows the occurrence of the processing operations at various intervals associated with the processing, as long as the processing of the telemetry and game state data for generating modified game states is performed in the desired way.

One or more embodiments can also be fabricated as computer readable code on a computer readable medium. The computer readable medium is any data storage device that can store data, which can thereafter be read by a computer system. Examples of the computer readable medium include hard drives, network attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, magnetic tapes and other optical and non-optical data storage devices. The computer readable medium can include a computer readable tangible medium distributed over a network-coupled computer system so that the computer readable code is stored and executed in a distributed fashion.

In one embodiment, the video game is executed either locally on a gaming machine, a personal computer, or on a server. In some cases, the video game is executed by one or more servers of a data center. When the video game is executed, some instances of the video game may be a simulation of the video game. For example, the video game may be executed by an environment or server that generates a simulation of the video game. The simulation, in some embodiments, is an instance of the video game. In other embodiments, the simulation may be produced by an emulator. In either case, if the video game is represented as a simulation, that simulation is capable of being executed to render interactive content that can be interactively streamed, executed, and/or controlled by user input.

Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications can be practiced within the scope of the appended claims. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the embodiments are not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.

The article "Sony Patent | Augmented reality placement for user feedback" was first published on Nweon Patent.

MagicLeap Patent | Virtual and augmented reality systems and methods https://patent.nweon.com/26713 Thu, 26 Jan 2023 15:51:47 +0000 https://patent.nweon.com/?p=26713 ...

The article "MagicLeap Patent | Virtual and augmented reality systems and methods" was first published on Nweon Patent.

Patent: Virtual and augmented reality systems and methods

Patent PDF: Available to Nweon (映维网) members

Publication Number: 20230021443

Publication Date: 2023-01-26

Assignee: Magic Leap

Abstract

A virtual or augmented reality display system that controls power inputs to the display system as a function of image data. The image data is made up of a plurality of image data frames, each with constituent color components of rendered content and depth planes for displaying that content. Light sources, or spatial light modulators that relay illumination from the light sources, may receive signals from a display controller to adjust a power setting of the light source or spatial light modulator based on control information embedded in an image data frame.

Claims

What is claimed is:

1.A system comprising: a display configured to display image data, the image data comprising a current image data frame with a plurality of depth planes; a plurality of light sources configured to illuminate the display; and a display system controller configured to receive the current image data frame, read control information embedded in the current image data frame, the control information comprising an indication to inactivate a selected depth plane within the current image data frame, and control at least one control input for at least one of the plurality of light sources based on the embedded control information so as to inactivate the selected depth plane within the current image data frame.

2.The system of claim 1, wherein the at least one control input comprises a reduced-power setting for a light source corresponding to the selected depth plane.

3.The system of claim 2, wherein the at least one control input comprises a complete power shut-off setting for the light source corresponding to the selected depth plane.

4.The system of claim 1, further comprising a plurality of spatial light modulators configured to modulate light from the plurality of light sources with the image data, wherein the at least one control input comprises a reduced-power setting for a spatial light modulator corresponding to the selected depth plane.

5.The system of claim 4, wherein the at least one control input comprises a complete power shut-off setting for the spatial light modulator corresponding to the selected depth plane.

6.The system of claim 1, wherein the embedded control information further comprises advance frame display information which comprises an indication to selectively re-activate the selected depth plane within a subsequent image data frame.

7.The system of claim 6, wherein the display system controller is configured to control at least one control input for at least one of the plurality of light sources based on the advance frame display information embedded in the current image data frame.

8.The system of claim 7, wherein the at least one control input comprises an increased-power setting for a light source corresponding to the selected depth plane.

9.The system of claim 8, wherein the advance frame display information includes an indication of a timing for implementing the increased-power setting.

10.The system of claim 9, wherein the timing for implementing the increased-power setting is prior to a timing for displaying the subsequent image data frame.

11.The system of claim 9, wherein the timing for implementing the increased-power setting is based on a user’s motion.

12.The system of claim 8, wherein the at least one control input comprises a full power setting for the light source corresponding to the inactivated depth plane.

13.The system of claim 12, wherein the advance frame display information includes an indication of a timing for implementing the full power setting.

14.The system of claim 13, wherein the timing for implementing the full power setting is prior to a timing for displaying the subsequent image data frame.

15.The system of claim 13, wherein the timing for implementing the full power setting is based on a user’s motion.

16.The system of claim 1, wherein the embedded control information is inserted into the current image data frame in place of a portion of the image data.

17.The system of claim 16, wherein the embedded control information comprises one or more substituted values in one or more pixels of the image data.

18.The system of claim 1, wherein the display system controller is further configured to remove the embedded control information from the current image data frame before the current image data frame is displayed.

19.A display system controller configured to perform a method comprising: receiving a current image data frame with a plurality of depth planes; reading control information embedded in the current image data frame, the control information comprising an indication to inactivate a selected depth plane within the current image data frame; and controlling at least one control input for at least one light source based on the embedded control information so as to inactivate the selected depth plane within the current image data frame.

Description

INCORPORATION BY REFERENCE TO ANY PRIORITY APPLICATION

This application is a continuation of U.S. patent application Ser. No. 16/902,820, filed on Jun. 16, 2020, and entitled, “VIRTUAL AND AUGMENTED REALITY SYSTEMS AND METHODS USING DISPLAY SYSTEM CONTROL INFORMATION EMBEDDED IN IMAGE DATA,” which is a continuation of U.S. patent application Ser. No. 15/902,710, filed on Feb. 22, 2018, and entitled “VIRTUAL AND AUGMENTED REALITY SYSTEMS AND METHODS USING DISPLAY SYSTEM CONTROL INFORMATION EMBEDDED IN IMAGE DATA.” These and any other applications for which a foreign or domestic priority claim is identified in the Application Data Sheet, as filed with the present application, are hereby incorporated by reference under 37 CFR 1.57.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is related to U.S. Non-Provisional application Ser. No. 15/239,710 filed on Aug. 17, 2016, entitled “VIRTUAL AND AUGMENTED REALITY SYSTEMS AND METHODS,” and U.S. Non-Provisional application Ser. No. 15/804,356 filed on Nov. 6, 2017, entitled “VIRTUAL AND AUGMENTED REALITY SYSTEMS AND METHODS,” each of which is incorporated by reference herein in its entirety.

BACKGROUND

Field

This disclosure relates to virtual and augmented reality imaging and visualization systems.

Description of the Related Art

Modern computing and display technologies have facilitated the development of virtual reality and augmented reality systems. Virtual reality, or “VR,” systems create a simulated environment for a user to experience. This can be done by presenting computer-generated imagery to the user through a display. This imagery creates a sensory experience which immerses the user in the simulated environment. A virtual reality scenario typically involves presentation of only computer-generated imagery rather than also including actual real-world imagery.

Augmented reality systems generally supplement a real-world environment with simulated elements. For example, augmented reality, or “AR,” systems may provide a user with a view of the surrounding real-world environment via a display. However, computer-generated imagery can also be presented on the display to enhance the real-world environment. This computer-generated imagery can include elements which are contextually-related to the real-world environment. Such elements can include simulated text, images, objects, etc. The simulated elements can oftentimes be interactive in real time. FIG. 1 depicts an example augmented reality scene 1 where a user of an AR technology sees a real-world park-like setting 6 featuring people, trees, and buildings in the background, and a concrete platform 1120. In addition to these items, computer-generated imagery is also presented to the user. The computer-generated imagery can include, for example, a robot statue 1110 standing upon the real-world platform 1120, and a cartoon-like avatar character 2 flying by which seems to be a personification of a bumble bee, even though these elements 2, 1110 are not actually present in the real-world environment.

Because the human visual perception system is complex, it is challenging to produce a VR or AR technology that facilitates a comfortable, natural-feeling, rich presentation of virtual image elements amongst other virtual or real-world imagery elements.

SUMMARY

In some embodiments, a virtual or augmented reality display system comprises: a display configured to display imagery for a plurality of depth planes; a display controller configured to receive rendered virtual or augmented reality imagery data from a graphics processor, and to control the display based at least in part on control information embedded in the rendered imagery, wherein the embedded control information indicates a shift to apply to at least a portion of the rendered imagery when displaying the imagery.

In some embodiments, the shift alters the displayed position of one or more virtual or augmented reality objects as compared to the position of the one or more objects in the rendered imagery.

In some embodiments, the shift comprises a lateral shift of at least a portion of the imagery by a specified number of pixels within the same depth plane.

In some embodiments, the shift comprises a longitudinal shift of at least a portion of the imagery from one depth plane to another.

In some embodiments, the display controller is further configured to scale at least a portion of the imagery in conjunction with a longitudinal shift from one depth plane to another.

In some embodiments, the shift comprises a longitudinal shift of at least a portion of the imagery from one depth plane to a virtual depth plane, the virtual depth plane comprising a weighted combination of at least two depth planes.

In some embodiments, the shift is based on information regarding a head pose of a user.

In some embodiments, the shift is performed by the display controller without re-rendering the imagery.
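As an illustrative sketch only, and not Magic Leap's implementation, the following Python snippet shows how a display controller might apply an embedded lateral pixel shift and blend two depth planes into a virtual depth plane without re-rendering; the zero-fill behavior, the linear blend, and the function names are assumptions introduced for illustration.

    import numpy as np

    def apply_lateral_shift(plane_img, shift_px):
        # Shift an H x W x C depth-plane image horizontally by shift_px pixels
        # without re-rendering. Vacated columns are zero-filled (an assumption;
        # the embodiments only specify a shift by a number of pixels).
        shifted = np.zeros_like(plane_img)
        if shift_px >= 0:
            shifted[:, shift_px:] = plane_img[:, :plane_img.shape[1] - shift_px]
        else:
            shifted[:, :shift_px] = plane_img[:, -shift_px:]
        return shifted

    def virtual_depth_plane(near_plane, far_plane, weight):
        # Blend two depth planes into a "virtual" depth plane as a weighted
        # combination; a simple linear blend is one possible weighting.
        return weight * near_plane + (1.0 - weight) * far_plane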

In some embodiments, a method in a virtual or augmented reality display system comprises: receiving rendered virtual or augmented reality imagery data from a graphics processor; and displaying the imagery for a plurality of depth planes based at least in part on control information embedded in the rendered imagery, wherein the embedded control information indicates a shift to apply to at least a portion of the rendered imagery when displaying the imagery.

In some embodiments, the method further comprises shifting the displayed position of one or more virtual or augmented reality objects as compared to the position of the one or more objects in the rendered imagery.

In some embodiments, the method further comprises laterally shifting at least a portion of the imagery by a specified number of pixels within the same depth plane based on the control information.

In some embodiments, the method further comprises longitudinally shifting at least a portion of the imagery from one depth plane to another based on the control information.

In some embodiments, the method further comprises scaling at least a portion of the imagery in conjunction with longitudinally shifting the imagery from one depth plane to another.

In some embodiments, the method further comprises longitudinally shifting at least a portion of the imagery from one depth plane to a virtual depth plane, the virtual depth plane comprising a weighted combination of at least two depth planes.

In some embodiments, the shift is based on information regarding a head pose of a user.

In some embodiments, the method further comprises shifting the imagery without re-rendering the imagery.

In some embodiments, a virtual or augmented reality display system comprises: a display configured to display virtual or augmented reality imagery for a plurality of depth planes, the imagery comprising a series of images made up of rows and columns of pixel data; a display controller configured to receive the imagery from a graphics processor and to control the display based at least in part on control information embedded in the imagery, wherein the embedded control information comprises depth plane indicator data which indicates at which of the plurality of depth planes to display at least a portion of the imagery.

In some embodiments, the control information does not alter the number of rows and columns of pixel data in the series of images.

In some embodiments, the control information comprises a row or column of information substituted for a row or column of pixel data in one or more of the series of images.

In some embodiments, the control information comprises a row or column of information appended to the pixel data for one or more of the series of images.

In some embodiments, the pixel data comprises a plurality of color values, and wherein the depth plane indicator data is substituted for one or more bits of at least one of the color values.

In some embodiments, the depth plane indicator data is substituted for one or more least significant bits of at least one of the color values.

In some embodiments, the depth plane indicator data is substituted for one or more bits of a blue color value.

In some embodiments, each pixel comprises depth plane indicator data.

In some embodiments, the display controller is configured to order the series of images based at least in part on the depth plane indicator data.
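Because the depth plane indicator can be carried in the least significant bits of a color value, a minimal sketch of the packing and unpacking is given below in Python; the use of two bits of the blue value and the function names are assumptions for illustration rather than a required encoding.

    DEPTH_PLANE_BITS = 2  # assumption: two blue LSBs encode up to four depth planes

    def embed_depth_plane(pixel_rgb, plane_index):
        # Substitute the depth plane indicator into the least significant bits of
        # the blue value; the number of rows and columns of pixel data is unchanged.
        r, g, b = pixel_rgb
        mask = (1 << DEPTH_PLANE_BITS) - 1
        return (r, g, (b & ~mask) | (plane_index & mask))

    def extract_depth_plane(pixel_rgb):
        # Recover the indicator so the display controller can route the pixel to
        # the depth plane on which it should be displayed.
        return pixel_rgb[2] & ((1 << DEPTH_PLANE_BITS) - 1)

For example, embed_depth_plane((200, 64, 255), 2) returns (200, 64, 254), and extract_depth_plane((200, 64, 254)) returns 2.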

In some embodiments, a method in a virtual or augmented reality display system comprises: receiving virtual or augmented reality imagery from a graphics processor, the imagery comprising a series of images made up of rows and columns of pixel data for a plurality of depth planes; displaying the imagery based at least in part on control information embedded in the imagery, wherein the embedded control information comprises depth plane indicator data which indicates at which of the plurality of depth planes to display at least a portion of the imagery.

In some embodiments, the control information does not alter the number of rows and columns of pixel data in the series of images.

In some embodiments, the control information comprises a row or column of information substituted for a row or column of pixel data in one or more of the series of images.

In some embodiments, the control information comprises a row or column of information appended to the pixel data for one or more of the series of images.

In some embodiments, the pixel data comprises a plurality of color values, and wherein the depth plane indicator data is substituted for one or more bits of at least one of the color values.

In some embodiments, the depth plane indicator data is substituted for one or more least significant bits of at least one of the color values.

In some embodiments, the depth plane indicator data is substituted for one or more bits of a blue color value.

In some embodiments, each pixel comprises depth plane indicator data.

In some embodiments, the method further comprises ordering the series of images based at least in part on the depth plane indicator data.

In some embodiments, a virtual or augmented reality display system comprises: a first sensor configured to provide measurements of a user’s head pose over time; and a processor configured to estimate the user’s head pose based on at least one head pose measurement and based on at least one calculated predicted head pose, wherein the processor is configured to combine the head pose measurement and the predicted head pose using one or more gain factors, and wherein the one or more gain factors vary based upon the user’s head pose position within a physiological range of movement.

In some embodiments, the first sensor is configured to be head-mounted.

In some embodiments, the first sensor comprises an inertial measurement unit.

In some embodiments, the one or more gain factors emphasize the predicted head pose over the head pose measurement when the user’s head pose is in a central portion of the physiological range of movement.

In some embodiments, the one or more gain factors emphasize the predicted head pose over the head pose measurement when the user’s head pose is nearer the middle of the physiological range of movement than a limit of the user’s physiological range of movement.

In some embodiments, the one or more gain factors emphasize the head pose measurement over the predicted head pose when the user’s head pose approaches a limit of the physiological range of movement.

In some embodiments, the one or more gain factors emphasize the head pose measurement over the predicted head pose when the user’s head pose is nearer a limit of the physiological range of movement than the middle of the physiological range of movement.

In some embodiments, the first sensor is configured to be head-mounted, and the system further comprises a second sensor configured to be body-mounted, wherein the at least one head pose measurement is determined based on measurements from both the first sensor and the second sensor.

In some embodiments, the head pose measurement is determined based on a difference between measurements from the first sensor and the second sensor.

In some embodiments, a method of estimating head pose in a virtual or augmented reality display system comprises: receiving measurements of a user’s head pose over time from a first sensor; and estimating, using a processor, the user’s head pose based on at least one head pose measurement and based on at least one calculated predicted head pose, wherein estimating the user’s head pose comprises combining the head pose measurement and the predicted head pose using one or more gain factors, and wherein the one or more gain factors vary based upon the user’s head pose position within a physiological range of movement.

In some embodiments, the first sensor is configured to be head-mounted and the method further comprises: receiving body orientation measurements from a second sensor configured to be body-mounted; and estimating the user’s head pose based on the at least one head pose measurement and based on the at least one calculated predicted head pose, wherein the at least one head pose measurement is determined based on measurements from both the first sensor and the second sensor.
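A minimal sketch of this blending, assuming a single yaw angle, a linear gain schedule, and a 120-degree physiological range (none of which are specified by the embodiments), might look as follows in Python:

    def estimate_head_pose(measured_yaw_deg, predicted_yaw_deg, range_deg=120.0):
        # Gain on the raw measurement grows as the head approaches a limit of the
        # physiological range; near the center, the predicted pose dominates.
        half_range = range_deg / 2.0
        gain = min(abs(measured_yaw_deg) / half_range, 1.0)
        return gain * measured_yaw_deg + (1.0 - gain) * predicted_yaw_deg

With a body-mounted second sensor, measured_yaw_deg could instead be the difference between the head-mounted and body-mounted readings, as described above.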

In some embodiments, a virtual or augmented reality display system comprises: a sensor configured to determine one or more characteristics of the ambient lighting; a processor configured to adjust one or more characteristics of a virtual object based on the one or more characteristics of the ambient lighting; and a display configured to display the virtual object to a user.

In some embodiments, the one or more characteristics of the ambient lighting comprise the brightness of the ambient lighting.

In some embodiments, the one or more characteristics of the ambient lighting comprise the hue of the ambient lighting.

In some embodiments, the one or more characteristics of the virtual object comprise the brightness of the virtual object.

In some embodiments, the one or more characteristics of the virtual object comprise the hue of the virtual object.

In some embodiments, a method in a virtual or augmented reality display system comprises: receiving one or more characteristics of the ambient lighting from a sensor; adjusting, using a processor, one or more characteristics of a virtual object based on the one or more characteristics of the ambient lighting; and displaying the virtual object to a user.

In some embodiments, a virtual or augmented reality display system comprises: a processor configured to compress virtual or augmented reality imagery data, the imagery comprising imagery for multiple depth planes, the processor being configured to compress the imagery data by reducing redundant information between the depth planes of the imagery; a display configured to display the imagery for the plurality of depth planes.

In some embodiments, the imagery for a depth plane is represented in terms of differences with respect to an adjacent depth plane.

In some embodiments, the processor encodes motion of an object between depth planes.

In some embodiments, a method in a virtual or augmented reality display system comprises: compressing virtual or augmented reality imagery data with a processor, the imagery comprising imagery for multiple depth planes, the processor being configured to compress the imagery data by reducing redundant information between the depth planes of the imagery; displaying the imagery for the plurality of depth planes.

In some embodiments, the imagery for a depth plane is represented in terms of differences with respect to an adjacent depth plane.

In some embodiments, the method further comprises encoding motion of an object between depth planes.
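One way to picture this inter-plane compression is a simple delta encoding in which each depth plane is stored as its difference from the adjacent plane; the sketch below is an illustrative assumption (a real encoder would also entropy-code the residuals and could encode inter-plane object motion):

    import numpy as np

    def compress_depth_planes(planes):
        # Keep the first plane as-is and store each subsequent plane as a residual
        # relative to the adjacent plane, reducing redundant information.
        base = planes[0]
        residuals = [planes[i].astype(np.int16) - planes[i - 1].astype(np.int16)
                     for i in range(1, len(planes))]
        return base, residuals

    def decompress_depth_planes(base, residuals):
        planes = [base]
        for r in residuals:
            planes.append((planes[-1].astype(np.int16) + r).astype(base.dtype))
        return planes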

In some embodiments, a virtual or augmented reality display system comprises: a display configured to display virtual or augmented reality imagery for a plurality of depth planes; a display controller configured to control the display, wherein the display controller dynamically configures a sub-portion of the display to refresh per display cycle.

In some embodiments, the display comprises a scanning display and the display controller dynamically configures the scanning pattern to skip areas of the display where the imagery need not be refreshed.

In some embodiments, the display cycle comprises a frame of video imagery.

In some embodiments, the display controller increases the video frame rate if the sub-portion of the display to be refreshed decreases in size.

In some embodiments, the display controller decreases the video frame rate if the sub-portion of the display to be refreshed increases in size.

In some embodiments, a method in a virtual or augmented reality display system comprises: displaying virtual or augmented reality imagery for a plurality of depth planes with a display; dynamically configuring a sub-portion of the display to refresh per display cycle.

In some embodiments, the display comprises a scanning display and the method further comprises dynamically configuring the scanning pattern to skip areas of the display where the imagery need not be refreshed.

In some embodiments, the display cycle comprises a frame of video imagery.

In some embodiments, the method further comprises increasing the video frame rate if the sub-portion of the display to be refreshed decreases in size.

In some embodiments, the method further comprises decreasing the video frame rate if the sub-portion of the display to be refreshed increases in size.
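As a rough sketch of the idea (the row-level granularity and the inverse scaling rule are assumptions, not the disclosed mechanism), a controller could compare successive frames, refresh only the changed rows, and scale the frame rate with the size of the refreshed sub-portion:

    import numpy as np

    def plan_refresh(prev_frame, curr_frame, base_fps=60.0, max_fps=240.0):
        # Find display rows that changed since the last display cycle.
        changed_rows = np.any(prev_frame != curr_frame, axis=(1, 2))
        fraction = changed_rows.mean() if changed_rows.size else 1.0
        # A smaller refresh region permits a higher frame rate, and vice versa.
        fps = min(base_fps / max(fraction, base_fps / max_fps), max_fps)
        return np.flatnonzero(changed_rows), fps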

In some embodiments, a virtual or augmented reality display system comprises: a transmitter which transmits an electric or magnetic field that varies in space; a tangible object which allows a user to interact with a virtual object or scene, the tangible object comprising a sensor which detects the electric or magnetic field from the transmitter, wherein measurements from the sensor are used to determine the position or orientation of the tangible object with respect to the transmitter.

In some embodiments, the transmitter is integrated with a head-mounted portion of the virtual or augmented reality display system.

In some embodiments, a method in a virtual or augmented reality display system comprises: transmitting an electric or magnetic field that varies in space using a transmitter; detecting the electric or magnetic field using a sensor; using measurements from the sensor to determine the position or orientation of the sensor with respect to the transmitter.

In some embodiments, the transmitter is integrated with a head-mounted portion of the virtual or augmented reality display system.

In some embodiments, a virtual or augmented reality display system comprises a display configured to display imagery for a plurality of depth planes; a display controller configured to receive rendered virtual or augmented reality imagery data, and to control the display based at least in part on control information embedded in the rendered imagery, wherein the embedded control information indicates a desired brightness or color to apply to at least a portion of the rendered imagery when displaying the imagery. The desired brightness or color can alter the displayed position of one or more virtual or augmented reality objects as compared to the position of the one or more objects in the rendered imagery. The desired brightness or color can longitudinally shift at least a portion of the imagery from one depth plane to a virtual depth plane, the virtual depth plane comprising a weighted combination of at least two depth planes.

In some embodiments, a virtual or augmented reality display system comprises: a display configured to display imagery for a plurality of depth planes; a display controller configured to receive rendered virtual or augmented reality imagery data, and to control the display based at least in part on control information, wherein the control information indicates that at least one depth plane is inactive and the display controller is configured to control inputs to the display based on the indication that at least one depth plane is inactive, thereby reducing net power consumption of the system.

In some embodiments, the indication that at least one depth plane is inactive comprises control information comprising depth plane indicator data that specifies a plurality of active depth planes to display the imagery.

In some embodiments, the indication that at least one depth plane is inactive comprises control information comprising depth plane indicator data that specifies that at least one depth plane is inactive.

In some embodiments, the control information is embedded in the rendered imagery.

In some embodiments, the display controller causes one or more light sources to be reduced in power thereby reducing net power consumption of the system. In some embodiments, reduction in power is by decreasing an amplitude of an intensity input. In some embodiments, reduction in power is by supplying no power to the one or more light sources.
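A minimal sketch of how a controller might translate such an indication into light source power settings is shown below; the power values and the function name are assumptions chosen to illustrate the reduced-power and complete shut-off cases.

    def light_source_powers(active_planes, num_planes=3, reduced_power=0.0):
        # active_planes: set of depth plane indices marked active by the control
        # information for the current frame.
        # reduced_power=0.0 models supplying no power to the light source for an
        # inactive plane; a nonzero value models decreasing the intensity amplitude.
        return [1.0 if i in active_planes else reduced_power
                for i in range(num_planes)]

For example, light_source_powers({0, 2}) yields [1.0, 0.0, 1.0], cutting power to the light source for the inactive middle depth plane.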

In some embodiments, a method in a virtual or augmented reality display system comprises: receiving rendered virtual or augmented reality imagery data for displaying imagery on a plurality of depth planes; receiving control information indicating that at least one depth plane is inactive; and displaying the imagery for a plurality of depth planes based at least in part on said control information indicating that at least one depth plane is inactive, thereby reducing net power consumption of the system.

In some embodiments, the control information comprises depth plane indicator data that specifies a plurality of active depth planes to display the imagery.

In some embodiments, the control information comprises depth plane indicator data that specifies at least one depth plane that is inactive.

In some embodiments, the control information is embedded in the rendered imagery.

In some embodiments, upon control information indicating that at least one depth plane is inactive, one or more light sources is reduced in power thereby reducing net power consumption of the system. In some embodiments, reduction in power is by decreasing an amplitude of an intensity input. In some embodiments, reduction in power is by supplying no power to the one or more light sources.

In some embodiments, a virtual or augmented reality display system comprises: a display configured to display imagery for a plurality of depth planes having a plurality of color fields; a display controller configured to receive rendered virtual or augmented reality imagery data, and to control the display based at least in part on control information, wherein the control information indicates that at least one color field is inactive and the display controller is configured to control inputs to the display based on the indication that at least one color field is inactive, thereby reducing net power consumption of the system.

In some embodiments, the indication that at least one color field is inactive comprises control information comprising color field indicator data that specifies a plurality of active color fields to display the imagery.

In some embodiments, the indication that at least one color field is inactive comprises control information comprising color field indicator data that specifies that at least one color field is inactive.

In some embodiments, the control information is embedded in the rendered imagery.

In some embodiments, the display controller causes one or more light sources to be reduced in power thereby reducing net power consumption of the system. For example, in an RGB LED light source system, an inactive color component in a particular frame may direct that a single constituent red, green, or blue LED family be reduced in power. In some embodiments, reduction in power is by decreasing an amplitude of an intensity input. In some embodiments, reduction in power is by supplying no power to the one or more light sources.

In some embodiments, a method in a virtual or augmented reality display system comprises: receiving rendered virtual or augmented reality imagery data for displaying imagery on a plurality of depth planes having a plurality of color fields; receiving control information indicating that at least one color field is inactive; and displaying the imagery for a plurality of color fields in a plurality of depth planes based at least in part on said control information indicating that at least one color field is inactive, thereby reducing net power consumption of the system.

In some embodiments, the control information comprises color field indicator data that specifies a plurality of active color fields to display the imagery.

In some embodiments, the control information comprises color field indicator data that specifies at least one color field that is inactive.

In some embodiments, the control information is embedded in the rendered imagery.

In some embodiments, upon control information indicating that at least one color field is inactive, one or more light sources is reduced in power thereby reducing net power consumption of the system. For example, in an RGB LED light source system, an inactive color component in a particular frame may direct that a single constituent red, green, or blue LED family be reduced in power. In some embodiments, reduction in power is by decreasing an amplitude of an intensity input. In some embodiments, reduction in power is by supplying no power to the one or more light sources.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a user’s view of an augmented reality (AR) scene using an example AR system.

FIG. 2 illustrates an example of a wearable display system.

FIG. 3 illustrates a conventional display system for simulating three-dimensional imagery for a user.

FIG. 4 illustrates aspects of an approach for simulating three-dimensional imagery using multiple depth planes.

FIGS. 5A-5C illustrate relationships between radius of curvature and focal radius.

FIG. 6 illustrates an example of a waveguide stack for outputting image information to a user.

FIG. 7 shows an example of exit beams outputted by a waveguide.

FIG. 8 illustrates an example design of a waveguide stack in which each depth plane has three associated waveguides that each output light of a different color.

FIG. 9 illustrates an example timing scheme for a virtual or augmented reality system which displays light field imagery.

FIG. 10 illustrates an example format for a frame of video data which includes appended control information.

FIG. 11 illustrates another example format for a frame of video data which includes control information.

FIG. 12 illustrates an example format for a pixel of video data which includes embedded control information.

FIG. 13 illustrates how a frame of video can be separated into color components which can be displayed serially.

FIG. 14 illustrates how a frame of video data can be separated, using depth plane indicator information, into multiple depth planes which can each be split into color components sub-frames for display.

FIG. 15 illustrates an example where the depth plane indicator information of FIG. 12 indicates that one or more depth planes of a frame of video data are inactive.

FIG. 16 illustrates example drawing areas for a frame of computer-generated imagery in an augmented reality system.

FIG. 17 schematically illustrates the possible motion of a user’s head about two rotational axes.

FIG. 18 illustrates how a user’s head pose can be mapped onto a three-dimensional surface.

FIG. 19 schematically illustrates various head pose regions which can be used to define gain factors for improving head pose tracking.

FIG. 20 is a block diagram depicting an AR/MR system, according to one embodiment.

DETAILED DESCRIPTION

Virtual and augmented reality systems disclosed herein can include a display which presents computer-generated imagery to a user. In some embodiments, the display systems are wearable, which may advantageously provide a more immersive VR or AR experience. FIG. 2 illustrates an example of wearable display system 80. The display system 80 includes a display 62, and various mechanical and electronic modules and systems to support the functioning of that display 62. The display 62 may be coupled to a frame 64, which is wearable by a display system user or viewer 60 and which is configured to position the display 62 in front of the eyes of the user 60. In some embodiments, a speaker 66 is coupled to the frame 64 and positioned adjacent the ear canal of the user (in some embodiments, another speaker, not shown, is positioned adjacent the other ear canal of the user to provide for stereo/shapeable sound control). The display 62 is operatively coupled, such as by a wired or wireless connection 68, to a local data processing module 70 which may be mounted in a variety of configurations, such as fixedly attached to the frame 64, fixedly attached to a helmet or hat worn by the user, embedded in headphones, or otherwise removably attached to the user 60 (e.g., in a backpack-style configuration, in a belt-coupling style configuration, etc.).

The local processing and data module 70 may include a processor, as well as digital memory, such as non-volatile memory (e.g., flash memory), both of which may be utilized to assist in the processing and storing of data. This includes data captured from sensors, such as image capture devices (e.g., cameras), microphones, inertial measurement units, accelerometers, compasses, GPS units, radio devices, and/or gyros. The sensors may be, e.g., operatively coupled to the frame 64 or otherwise attached to the user 60. Alternatively, or additionally, sensor data may be acquired and/or processed using a remote processing module 72 and/or remote data repository 74, possibly for passage to the display 62 after such processing or retrieval. The local processing and data module 70 may be operatively coupled by communication links (76, 78), such as via wired or wireless communication links, to the remote processing module 72 and remote data repository 74 such that these remote modules (72, 74) are operatively coupled to each other and available as resources to the local processing and data module 70.

In some embodiments, the remote processing module 72 may include one or more processors configured to analyze and process data (e.g., sensor data and/or image information). In some embodiments, the remote data repository 74 may comprise a digital data storage facility, which may be available through the internet or other networking configuration in a “cloud” resource configuration. In some embodiments, all data is stored and all computations are performed in the local processing and data module, allowing fully autonomous use from a remote module.

In some embodiments, the computer-generated imagery provided via the display 62 can create the impression of being three-dimensional. This can be done, for example, by presenting stereoscopic imagery to the user. In some conventional systems, such imagery can include separate images of a scene or object from slightly different perspectives. The separate images can be presented to the user’s right eye and left eye, respectively, thus simulating binocular vision and its associated depth perception.

FIG. 3 illustrates a conventional display system for simulating three-dimensional imagery for a user. Two distinct images 74 and 76, one for each eye 4 and 6, are outputted to the user. The images 74 and 76 are spaced from the eyes 4 and 6 by a distance 10 along an optical or z-axis parallel to the line of sight of the viewer. The images 74 and 76 are flat and the eyes 4 and 6 may focus on the images by assuming a single accommodated state. Such systems rely on the human visual system to combine the images 74 and 76 to provide a perception of depth for the combined image.

It will be appreciated, however, that the human visual system is more complicated and providing a realistic perception of depth is more challenging. For example, many viewers of conventional 3D display systems find such systems to be uncomfortable or may not perceive a sense of depth at all. Without being limited by theory, it is believed that viewers of an object may perceive the object as being “three-dimensional” due to a combination of vergence and accommodation. Vergence movements (i.e., rolling movements of the pupils toward or away from each other to converge the lines of sight of the eyes to fixate upon an object) of the two eyes relative to each other are closely associated with focusing (or “accommodation”) of the lenses of the eyes. Under normal conditions, changing the focus of the lenses of the eyes, or accommodating the eyes, to change focus from one object to another object at a different distance will automatically cause a matching change in vergence to the same distance, under a relationship known as the “accommodation-vergence reflex.” Likewise, a change in vergence will trigger a matching change in accommodation, under normal conditions. As noted herein, many stereoscopic display systems display a scene using slightly different presentations (and, so, slightly different images) to each eye such that a three-dimensional perspective is perceived by the human visual system. Such systems are uncomfortable for many viewers, however, since they simply provide different presentations of a scene but with the eyes viewing all the image information at a single accommodated state, and thus work against the accommodation-vergence reflex. Display systems that provide a better match between accommodation and vergence may form more realistic and comfortable simulations of three-dimensional imagery.

For example, light field imagery can be presented to the user to simulate a three-dimensional view. Light field imagery can mimic the rays of light which enter the eyes of a viewer in a real-world environment. For example, when displaying light field imagery, light rays from objects that are simulated to be perceived at a distance are made to be more collimated when entering the viewer’s eyes, while light rays from objects that are simulated to be perceived nearby are made to be more divergent. Thus, the angles at which light rays from objects in a scene enter the viewer’s eyes are dependent upon the simulated distance of those objects from the viewer. Light field imagery in a virtual or augmented reality system can include multiple images of a scene or object from different depth planes. The images may be different for each depth plane (e.g., provide slightly different presentations of a scene or object) and may be separately focused by the viewer’s eyes, thereby helping to provide the user with a comfortable perception of depth.

When these multiple depth plane images are presented to the viewer simultaneously or in quick succession, the result is interpreted by the viewer as three-dimensional imagery. When the viewer experiences this type of light field imagery, the eyes accommodate to focus the different depth planes in much the same way as they would do when experiencing a real-world scene. These focal cues can provide for a more realistic simulated three-dimensional environment.

In some configurations, at each depth plane, a full color image may be formed by overlaying component images that each have a particular component color. For example, red, green, and blue images may each be separately outputted to form each full color depth plane image. As a result, each depth plane may have multiple component color images associated with it.

FIG. 4 illustrates aspects of an approach for simulating three-dimensional imagery using multiple depth planes. With reference to FIG. 4, objects at various distances from eyes 4 and 6 on the z-axis are accommodated by the eyes (4, 6) so that those objects are in focus. The eyes 4 and 6 assume particular accommodated states to bring into focus objects at different distances along the z-axis. Consequently, a particular accommodated state may be said to be associated with a particular one of depth planes 14, such that objects or parts of objects in a particular depth plane are in focus when the eye is in the accommodated state for that depth plane. In some embodiments, three-dimensional imagery may be simulated by providing different presentations of an image for each of the eyes (4, 6), and also by providing different presentations of the image corresponding to each of the depth planes.

The distance between an object and the eye (4 or 6) can change the amount of divergence of light from that object, as viewed by that eye. FIGS. 5A-5C illustrate relationships between distance and the divergence of light rays. The distance between the object and the eye 4 is represented by, in order of decreasing distance, R1, R2, and R3. As shown in FIGS. 5A-5C, the light rays become more divergent as distance to the object decreases. As distance increases, the light rays become more collimated. Stated another way, it may be said that the light field produced by a point (the object or a part of the object) has a spherical wavefront curvature, which is a function of how far away the point is from the eye of the user. The curvature increases with decreasing distance between the object and the eye 4. Consequently, at different depth planes, the degree of divergence of light rays is also different, with the degree of divergence increasing with decreasing distance between depth planes and the viewer’s eye 4. While only a single eye 4 is illustrated for clarity of illustration in FIGS. 5A-5C and other figures herein, it will be appreciated that the discussions regarding eye 4 may be applied to both eyes (4 and 6) of a viewer.
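In diopter terms, the wavefront curvature of light from a point object is simply the reciprocal of its distance in meters; the short Python check below uses hypothetical distances for R1, R2, and R3 to show the curvature increasing as the object moves closer.

    def wavefront_curvature_diopters(distance_m):
        # Curvature (in diopters, 1/m) of the spherical wavefront from a point
        # object at the given distance; larger for nearer objects.
        return 1.0 / distance_m

    for d in (3.0, 1.0, 0.33):  # hypothetical R1 > R2 > R3, in meters
        print(f"{d:.2f} m -> {wavefront_curvature_diopters(d):.2f} D")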

Without being limited by theory, it is believed that the human eye typically can interpret a finite number of depth planes to provide depth perception. Consequently, a highly believable simulation of perceived depth may be achieved by providing, to the eye, different presentations of an image corresponding to each of these limited number of depth planes.

FIG. 6 illustrates an example of a waveguide stack for outputting image information to a user. A display system 1000 includes a stack of waveguides, or stacked waveguide assembly 178, that may be utilized to provide three-dimensional perception to the eye/brain using a plurality of waveguides (182, 184, 186, 188, 190). In some embodiments, the display system 1000 is the system 80 of FIG. 2, with FIG. 6 schematically showing some parts of that system 80 in greater detail. For example, the waveguide assembly 178 may be integrated into the display 62 of FIG. 2.

With continued reference to FIG. 6, the waveguide assembly 178 may also include a plurality of features (198, 196, 194, 192) between the waveguides. In some embodiments, the features (198, 196, 194, 192) may be lenses. The waveguides (182, 184, 186, 188, 190) and/or the plurality of lenses (198, 196, 194, 192) may be configured to send image information to the eye with various levels of wavefront curvature or light ray divergence. Each waveguide level may be associated with a particular depth plane and may be configured to output image information corresponding to that depth plane. Image injection devices (200, 202, 204, 206, 208) may be utilized to inject image information into the waveguides (182, 184, 186, 188, 190), each of which may be configured, as described herein, to distribute incoming light across each respective waveguide, for output toward the eye 4. Light exits an output surface (300, 302, 304, 306, 308) of the image injection devices (200, 202, 204, 206, 208) and is injected into a corresponding input edge (382, 384, 386, 388, 390) of the waveguides (182, 184, 186, 188, 190). In some embodiments, a single beam of light (e.g., a collimated beam) may be injected into each waveguide to output an entire field of cloned collimated beams that are directed toward the eye 4 at particular angles (and amounts of divergence) corresponding to the depth plane associated with a particular waveguide.

In some embodiments, the image injection devices (200, 202, 204, 206, 208) are discrete displays that each produce image information for injection into a corresponding waveguide (182, 184, 186, 188, 190, respectively). In some other embodiments, the image injection devices (200, 202, 204, 206, 208) are the output ends of a single multiplexed display which may, e.g., pipe image information via one or more optical conduits (such as fiber optic cables) to each of the image injection devices (200, 202, 204, 206, 208).

A controller 210 controls the operation of the stacked waveguide assembly 178 and the image injection devices (200, 202, 204, 206, 208). In some embodiments, the controller 210 includes programming (e.g., instructions in a non-transitory computer-readable medium) that regulates the timing and provision of image information to the waveguides (182, 184, 186, 188, 190) according to, e.g., any of the various schemes disclosed herein. In some embodiments, the controller may be a single integral device, or a distributed system connected by wired or wireless communication channels. The controller 210 may be part of the processing modules (70 or 72) (FIG. 2) in some embodiments.

The waveguides (182, 184, 186, 188, 190) may be configured to propagate light within each respective waveguide by total internal reflection (TIR). The waveguides (182, 184, 186, 188, 190) may each be planar or curved, with major top and bottom surfaces and edges extending between those major top and bottom surfaces. In the illustrated configuration, the waveguides (182, 184, 186, 188, 190) may each include light redirecting elements (282, 284, 286, 288, 290) that are configured to redirect light, propagating within each respective waveguide, out of the waveguide to output image information to the eye 4. A beam of light is outputted by the waveguide at locations at which the light propagating in the waveguide strikes a light redirecting element. The light redirecting elements (282, 284, 286, 288, 290) may be reflective and/or diffractive optical features. While illustrated disposed at the bottom major surfaces of the waveguides (182, 184, 186, 188, 190) for ease of description and drawing clarity, in some embodiments, the light redirecting elements (282, 284, 286, 288, 290) may be disposed at the top and/or bottom major surfaces, and/or may be disposed directly in the volume of the waveguides (182, 184, 186, 188, 190). In some embodiments, the light redirecting elements (282, 284, 286, 288, 290) may be formed in a layer of material that is attached to a transparent substrate to form the waveguides (182, 184, 186, 188, 190). In some other embodiments, the waveguides (182, 184, 186, 188, 190) may be a monolithic piece of material and the light redirecting elements (282, 284, 286, 288, 290) may be formed on a surface and/or in the interior of that piece of material.

With continued reference to FIG. 6, as discussed herein, each waveguide (182, 184, 186, 188, 190) is configured to output light to form an image corresponding to a particular depth plane. For example, the waveguide 182 nearest the eye may be configured to deliver collimated light, as injected into such waveguide 182, to the eye 4. The collimated light may be representative of the optical infinity focal plane. The next waveguide up 184 may be configured to send out collimated light which passes through the first lens (192; e.g., a negative lens) before it can reach the eye 4; such first lens 192 may be configured to create a slight convex wavefront curvature so that the eye/brain interprets light coming from that next waveguide up 184 as coming from a first focal plane closer inward toward the eye 4 from optical infinity. Similarly, the third up waveguide 186 passes its output light through both the first 192 and second 194 lenses before reaching the eye 4; the combined optical power of the first 192 and second 194 lenses may be configured to create another incremental amount of wavefront curvature so that the eye/brain interprets light coming from the third waveguide 186 as coming from a second focal plane that is even closer inward toward the person from optical infinity than was light from the next waveguide up 184.

The other waveguide layers (188, 190) and lenses (196, 198) are similarly configured, with the highest waveguide 190 in the stack sending its output through all of the lenses between it and the eye for an aggregate focal power representative of the closest focal plane to the person. To compensate for the stack of lenses (198, 196, 194, 192) when viewing/interpreting light coming from the world 144 on the other side of the stacked waveguide assembly 178, a compensating lens layer 180 may be disposed at the top of the stack to compensate for the aggregate power of the lens stack (198, 196, 194, 192) below. Such a configuration provides as many perceived focal planes as there are available waveguide/lens pairings. Both the light redirecting elements of the waveguides and the focusing aspects of the lenses may be static (i.e., not dynamic or electro-active). In some alternative embodiments, they may be dynamic using electro-active features.
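As a rough numerical sketch (the lens values are hypothetical and the sign conventions are simplified), the perceived depth plane for a given waveguide can be estimated from the sum of the divergence, in diopters, added by the lenses between that waveguide and the eye:

    def perceived_depth_m(lens_diopters):
        # Sum the divergence contributed by each lens between the waveguide and
        # the eye; the apparent depth plane is the reciprocal of the total.
        total = sum(lens_diopters)
        return float("inf") if total == 0 else 1.0 / total

    print(perceived_depth_m([]))          # nearest-the-eye waveguide: optical infinity
    print(perceived_depth_m([0.5]))       # one lens: plane at 2.0 m (hypothetical)
    print(perceived_depth_m([0.5, 0.5]))  # two lenses: plane at 1.0 m (hypothetical)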

With continued reference to FIG. 6, the light redirecting elements (282, 284, 286, 288, 290) may be configured to both redirect light out of their respective waveguides and to output this light with the appropriate amount of divergence or collimation for a particular depth plane associated with the waveguide. As a result, waveguides having different associated depth planes may have different configurations of light redirecting elements (282, 284, 286, 288, 290), which output light with a different amount of divergence depending on the associated depth plane. In some embodiments, as discussed herein, the light redirecting elements (282, 284, 286, 288, 290) may be volumetric or surface features, which may be configured to output light at specific angles. For example, the light redirecting elements (282, 284, 286, 288, 290) may be volume holograms, surface holograms, and/or diffraction gratings. Light redirecting elements, such as diffraction gratings, are described in U.S. patent application Ser. No. 14/641,376, filed Mar. 7, 2015, which is incorporated by reference herein in its entirety. In some embodiments, the features (198, 196, 194, 192) may not be lenses; rather, they may simply be spacers (e.g., cladding layers and/or structures for forming air gaps).

In some embodiments, the light redirecting elements (282, 284, 286, 288, 290) are diffractive features that form a diffraction pattern, or “diffractive optical element” (also referred to herein as a “DOE”). Preferably, the DOEs have a relatively low diffraction efficiency so that only a portion of the light of the beam is deflected away toward the eye 4 with each intersection of the DOE, while the rest continues to move through a waveguide via total internal reflection. The light carrying the image information is thus divided into a number of related exit beams that exit the waveguide at a multiplicity of locations and the result is a fairly uniform pattern of exit emission toward the eye 4 for this particular collimated beam reflecting around within a waveguide.

In some embodiments, one or more DOEs may be switchable between “on” states in which they actively diffract, and “off” states in which they do not significantly diffract. For instance, a switchable DOE may comprise a layer of polymer dispersed liquid crystal, in which microdroplets comprise a diffraction pattern in a host medium, and the refractive index of the microdroplets can be switched to substantially match the refractive index of the host material (in which case the pattern does not appreciably diffract incident light) or the microdroplet can be switched to an index that does not match that of the host medium (in which case the pattern actively diffracts incident light).

FIG. 7 shows an example of exit beams outputted by a waveguide. One waveguide is illustrated, but it will be appreciated that other waveguides in the stack of waveguides 178 may function similarly. Light 400 is injected into the waveguide 182 at the input edge 382 of the waveguide 182 and propagates within the waveguide 182 by TIR. At points where the light 400 impinges on the DOE 282, a portion of the light exits the waveguide as exit beams 402. The exit beams 402 are illustrated as substantially parallel but, as discussed herein, they may also be redirected to propagate to the eye 4 at an angle (e.g., forming divergent exit beams), depending on the depth plane associated with the waveguide 182. It will be appreciated that substantially parallel exit beams may be indicative of a waveguide that corresponds to a depth plane at a large simulated distance (e.g., optical infinity) from the eye 4. Other waveguides may output an exit beam pattern that is more divergent, which would require the eye 4 to accommodate to focus on a closer simulated distance and would be interpreted by the brain as light from a distance closer to the eye 4 than optical infinity.

FIG. 8 schematically illustrates an example design of a stacked waveguide assembly in which each depth plane has three associated waveguides that each output light of a different color. A full color image may be formed at each depth plane by overlaying images in each of multiple component colors, e.g., three or more component colors. In some embodiments, the component colors include red, green, and blue. In some other embodiments, other colors, including magenta, yellow, and cyan, may be used in conjunction with or may replace one of red, green, or blue. Each waveguide may be configured to output a particular component color and, consequently, each depth plane may have multiple waveguides associated with it. Each depth plane may have, e.g., three waveguides associated with it: one for outputting red light, a second for outputting green light, and a third for outputting blue light.

With continued reference to FIG. 8, depth planes 14a-14f are shown. In the illustrated embodiment, each depth plane has three component color images associated with it: a first image of a first color, G; a second image of a second color, R; and a third image of a third color, B. As a convention herein, the numbers following each of these letters indicate diopters (1/m), or the reciprocal of the apparent distance of the depth plane from a viewer, and each box in the figures represents an individual component color image. In some embodiments, G is the color green, R is the color red, and B is the color blue. As discussed above, the perceived distance of the depth plane from the viewer may be established by the light redirecting elements (282, 284, 286, 288, 290), e.g. diffractive optical element (DOE), and/or by lenses (198, 196, 194, 192), which cause the light to diverge at an angle associated with the apparent distance.

In some arrangements, each component color image may be outputted by a different waveguide in a stack of waveguides. For example, each depth plane may have three component color images associated with it: a first waveguide to output a first color, G; a second waveguide to output a second color, R; and a third waveguide to output a third color, B. In arrangements in which waveguides are used to output component color images, each box in the figure may be understood to represent an individual waveguide.

While the waveguides associated with each depth plane are shown adjacent to one another in this schematic drawing for ease of description, it will be appreciated that, in a physical device, the waveguides may all be arranged in a stack with one waveguide per level. Different depth planes are indicated in the figure by different numbers for diopters following the letters G, R, and B.

Display Timing Schemes

In some embodiments, a virtual or augmented reality system provides light field imagery by successively displaying multiple different depth planes for a given frame of video data. The system then updates to the next frame of video data and successively displays multiple different depth planes for that frame. For example, the first frame of video data can actually include three separate sub-frames of data: a far field frame D0, a midfield frame D1, and a near field frame D2. D0, D1, and D2 can be displayed in succession. Subsequently, the second frame of video data can be displayed. The second frame of video data can likewise include a far field frame, a midfield frame, and a near field frame, which are displayed successively, and so on. While this example uses three depth planes, light field imagery is not so limited. Rather, any plural number of depth planes can be used depending, for example, upon the desired video frame rates and the capabilities of the system.

Because each frame of light field video data includes multiple sub-frames for different depth planes, systems which provide light field imagery may benefit from display panels which are capable of high refresh rates. For example, if the system displays video with a frame rate of 120 Hz but includes imagery from multiple different depth planes, then the display will need to be capable of a refresh rate greater than 120 Hz in order to accommodate the multiple depth plane images for each frame of video. In some embodiments, Liquid Crystal Over Silicon (LCOS) display panels are used, though other types of display panels can also be used (including color sequential displays and non-color sequential displays).

FIG. 9 illustrates an example timing scheme for a virtual or augmented reality system which displays light field imagery. In this example, the video frame rate is 120 Hz and the light field imagery includes three depth planes. In some embodiments, the green, red, and blue components of each frame are displayed serially rather than at the same time.

A video frame rate of 120 Hz allows 8.333 ms in which to display all of the depth planes for a single frame of video. As illustrated in FIG. 9, each frame of video data includes three depth planes and each depth plane includes green, red, and blue components. For example the depth plane D0 includes a green sub-frame, G0, a red sub-frame, R0, and a blue sub-frame, B0. Similarly, the depth plane D1 comprises green, red, and blue sub-frames, G1, R1, and B1, respectively, and the depth plane D2 comprises green, red, and blue sub-frames, G2, R2, and B2, respectively. Given that each video frame comprises three depth planes, and each depth plane has three color components, the allotted 8.333 ms is divided into nine segments of 0.926 ms each. As illustrated in FIG. 9, the green sub-frame G0 for the first depth plane is displayed during the first time segment, the red sub-frame R0 for the first depth plane is displayed during the second time segment, and so on. The total green on-time for each frame of video is 2.778 ms. The same is true of the total red on-time and blue on-time for each video frame. It should be understood, however, that other video frame rates can also be used, in which case the specific time intervals illustrated in FIG. 9 could be adjusted accordingly. While the individual color components are illustrated as having equal display times, this is not required and the ratios of the display times between the color components can be varied. Furthermore, the flashing order illustrated in FIG. 9 for the depth planes and color component sub-frames is but one example. Other flashing orders can also be used. Moreover, while FIG. 9 illustrates an embodiment which uses a color sequential display technology, the techniques described herein are not limited to color sequential displays.
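
By way of illustration only, the arithmetic behind this timing scheme can be sketched as follows (a minimal Python sketch; the function name and the printed schedule format are assumptions for illustration and not part of the described system):

def depth_plane_schedule(frame_rate_hz=120, depth_planes=("D0", "D1", "D2"),
                         colors=("G", "R", "B")):
    """Return (sub_frame_label, start_ms, duration_ms) tuples for one video frame."""
    frame_period_ms = 1000.0 / frame_rate_hz            # 8.333 ms at 120 Hz
    n_segments = len(depth_planes) * len(colors)        # 9 segments
    segment_ms = frame_period_ms / n_segments            # 0.926 ms each
    schedule = []
    t = 0.0
    for plane in depth_planes:
        for color in colors:
            schedule.append((f"{color}{plane[-1]}", t, segment_ms))
            t += segment_ms
    return schedule

for label, start, dur in depth_plane_schedule():
    print(f"{label}: starts at {start:.3f} ms, lasts {dur:.3f} ms")
# Total green on-time per frame = 3 * 0.926 ms, or about 2.778 ms, matching the text.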

Other display timing schemes are also possible. For example, the frame rate, number of depth planes, and color components can vary. In some embodiments, the frame rate of a virtual or augmented reality system as described herein is 80 Hz and there are three depth planes. In some embodiments, different depth planes can be displayed in different frames. For example, light field video with four depth planes can be displayed at an effective frame rate of 60 Hz by displaying two depth planes per frame at a frame rate of 120 Hz (depth planes D0 and D1 can be displayed in the first 8.33 ms and depth planes D2 and D3 can be displayed in the next 8.33 ms; full depth information is provided in 16.7 ms, for an effective frame rate of 60 Hz). In some embodiments, the number of depth planes which are shown can vary spatially on the display. For example, a larger number of depth planes can be shown in a sub-portion of the display in the user’s line of sight, and a smaller number of depth planes can be shown in sub-portions of the display located in the user’s peripheral vision. In such embodiments, an eye tracker (e.g., a camera and eye tracking software) can be used to determine which portion of the display the user is looking at.

Control Information for Video Data

FIG. 10 illustrates an example format for a frame of video data which includes appended control information. As illustrated in FIG. 10, each frame of video data may comprise an array of pixel data (image information 1020) formatted into rows and columns. In the illustrated example, there are 1280 columns and 960 rows of pixel data in the image information 1020 which form an image. FIG. 10 also illustrates that control information 1010 can be appended to the image information 1020. In this example, a control packet 1010 can be appended to the image information 1020 as, for example, an extra row. The first row (Row 000) comprises the control information 1010, whereas Rows 1-960 contain the image information 1020. Thus, in this embodiment, the host transmits a resolution of 1280×961 to the display controller.

The display controller reads the appended control information 1010 and uses it, for example, to configure the image information 1020 sent to one or more display panels (e.g., a left-eye and a right-eye display panel). In this example, the row of control information 1010 is not sent to the display panels. Thus, while the host transmits information, including the control information 1010 and the image information 1020, to the display controller with a resolution of 1280×961, the display controller removes the control information 1010 from the stream of data and transmits only the image information 1020 to the display panel(s) with a resolution of 1280×960. The image information 1020 can be transmitted to a display panel (e.g., an LCOS display panel) in, for example, Display Serial Interface (DSI) format. While FIG. 10 illustrates that the appended control information 1010 comprises a single row appended at the beginning of each frame of video data, other amounts of control information could alternatively be appended. Further, the control information 1010 does not necessarily have to be appended at the beginning of each frame of video data but could instead be inserted into the video data at other locations. However, appending control information 1010 at the beginning of a frame may allow the controller to more readily act on the control information 1010 at the beginning of a frame of rendered imagery prior to displaying the image information 1020.
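
By way of illustration, a minimal sketch of this arrangement is shown below, assuming the host frame is held as a NumPy array; the helper name is hypothetical. The controller consumes row 0 as the control packet and forwards only the remaining 960 rows to the panel(s).

import numpy as np

def split_control_and_image(host_frame: np.ndarray):
    """host_frame has shape (961, 1280, 3); row 0 carries the control packet."""
    control_row = host_frame[0]        # control information 1010
    image = host_frame[1:]             # image information 1020, 960 rows
    return control_row, image

host_frame = np.zeros((961, 1280, 3), dtype=np.uint8)
control_row, image = split_control_and_image(host_frame)
assert image.shape == (960, 1280, 3)   # only the image reaches the display panel(s)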

FIG. 11 illustrates another example format for a frame of video data which includes control information. FIG. 11 is similar to FIG. 10 except that the control information 1110 is inserted in place of the first row of video data rather than being appended to the frame of video data before the first row. Thus, the first row (Row 000) of the frame comprises control information 1110, while the remaining 959 rows comprise image information 1120.

In this example, the host transmits information to the display controller with a resolution of 1280×960. The display controller can use the control information 1110 to configure the image information 1120 sent to the display panel(s). The display controller then transmits the frame of video data illustrated in FIG. 11 to the display panel(s). However, in some embodiments, before transmitting the frame of video data illustrated in FIG. 11 to the display panel(s), the display controller can remove the control information 1110 by, for example, setting that row of the frame of video data to zeros. This causes the first row of each frame of video data to appear as a dark line on the display.

Using the scheme illustrated in FIG. 11, control information 1110 can be included with a frame of video data without changing the resolution of the information sent to the display controller. However, the trade-off in this example is that the effective display resolution is decreased due to the fact that some image information is replaced by the control information. While FIG. 11 illustrates that the control information 1110 is inserted in place of the first row of image information, the control information could alternatively be inserted in place of another row in the frame.

The control information illustrated in, for example, FIGS. 10 and 11 (and later in FIG. 12) can be used for a number of different purposes. For example, the control information can indicate whether a frame of video data should be displayed on the left-eye video panel or the right-eye video panel. The control information can indicate which of a plurality of depth planes the frame of video data corresponds to. The control information can indicate the flashing order for the light field video information. For example, the control information can indicate the order in which to display each depth plane, as well as the order to display the color component sub-frames for each depth plane. In addition, there may be a need to shift pixels left/right or up/down after the content for the display has already been generated by the host. Rather than adjusting and re-rendering the image information, the control information can include pixel shift information which specifies the direction and magnitude of a pixel shift which should be carried out by the display controller.

Such pixel shifts can be carried out for a number of reasons. Pixel shifts can be performed in cases in which the image content needs to be moved on the display due to, for example, a user’s head movement. In such cases, the content may be the same but its location within the viewing area on the display may need to be shifted. Rather than re-rendering the image information at the GPU and sending the whole set of pixels to the display controller again, the pixel shift can be applied to the image information using the pixel shift control information. As illustrated in FIGS. 10 and 11, the pixel shift control information can be included at the beginning of a frame. Alternatively, and/or additionally, a late update control packet can be sent within a frame (e.g., after the first row) to perform an appropriate pixel shift based on an updated head pose mid frame. This can be done using, for example, a Mobile Industry Processor Interface (MIPI) Display Serial Interface (DSI) virtual channel.

Pixel shifts can also be performed in cases in which the user is moving his or her head and a more accurate representation of the pixels is wanted. Rather than having the GPU re-render the image information, a late shift on the display can be applied using the pixel shift approach. Any pixel shift described herein could impact a single depth plane or multiple depth planes. As already discussed herein, in some embodiments, there are differences in time between when various depth planes are displayed. During these time differences, the user may shift his or her eyes such that the viewing frustum may need to be shifted. This can be accomplished using a pixel shift for any of the depth planes.

The pixel shift control information can indicate a pixel shift in the X-Y direction within a frame of a single depth plane. Alternately, and/or additionally, the pixel shift control information can indicate a shift in the Z direction between depth plane buffers. For example, an object that was previously displayed in one or more depth planes may move to another depth plane set with a Z-pixel shift. This type of shift can also include a scaler to enlarge or reduce the partial image for each depth. Assume, for example, that a displayed character is floating between two depth planes and there is no occlusion of that character with another object. Apparent movement of the character in the depth direction can be accomplished by re-drawing the character forward or backward one or more depth planes using the Z-pixel shift and scaler. This can be accomplished without re-rendering the character and sending a frame update to the display controller, resulting in a smoother motion performance at much lower computational cost.

The scaler can also be used to compensate for magnification effects that occur within the display as a result of, for example, the lenses 192, 194, 196, 198. Such lenses may create virtual images which are observable by the user. When a virtual object moves from one depth plane to another, the optical magnification of the virtual image can actually be opposite of what would be expected in the physical world. For example, in the physical world when an object is located at a further depth plane from the viewer, the object appears smaller than it would if located at a closer depth plane. However, when the virtual object moves from a nearer depth plane to a further depth plane in the display, the lenses may actually magnify the virtual image of the object. Thus, in some embodiments, a scaler is used to compensate for optical magnification effects in the display. A scaler can be provided for each depth plane to correct magnification effects caused by the optics. In addition, a scaler can be provided for each color if there are any scaling issues to be addressed on a per color basis.

In some embodiments, the maximum horizontal pixel shift can correspond to the entire panel width, while the maximum vertical pixel shift can correspond to the entire panel height. Both positive and negative shifts can be indicated by the control information. Using this pixel shift information, the display controller can shift a frame of video data left or right, up or down, and forward or backward between depth planes. The pixel shift information can also cause a frame of video data to be completely or partially shifted from the left-eye display panel to the right-eye display panel, or vice versa. Pixel shift information can be included for each of the depth planes in the light field video information.
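
A minimal sketch of applying an X-Y pixel shift from control information is shown below. The sign convention (positive dx shifts right, positive dy shifts down) and the choice to fill vacated pixels with zeros rather than wrap them are assumptions for illustration.

import numpy as np

def apply_pixel_shift(plane: np.ndarray, dx: int, dy: int) -> np.ndarray:
    """Shift a (h, w, 3) image by (dx, dy) pixels, filling vacated pixels with zeros."""
    shifted = np.zeros_like(plane)
    h, w = plane.shape[:2]
    src_y = slice(max(0, -dy), min(h, h - dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_y = slice(max(0, dy), min(h, h + dy))
    dst_x = slice(max(0, dx), min(w, w + dx))
    shifted[dst_y, dst_x] = plane[src_y, src_x]
    return shifted

frame = np.random.randint(0, 256, (960, 1280, 3), dtype=np.uint8)
late_update = {"dx": 4, "dy": -2}          # hypothetical mid-frame pixel shift packet
frame = apply_pixel_shift(frame, **late_update)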

In some embodiments, such as those where scanning-based displays are used, incremental distributed pixel shifts can be provided. For example, the images for a frame of video can be shifted incrementally in one or more depth planes until reaching the end (e.g., bottom) of the image. The pixels which are displayed first can be shifted more or less than later-displayed pixels within a frame in order to compensate for head movement or in order to simulate motion of the object. Further, there can be an incremental pixel shift on a per-plane basis. For example, pixels in one depth plane can be shifted more or less than pixels in another depth plane. In some embodiments, eye tracking technology is used to determine which portion of a display screen the user is fixated on. Objects in different depth planes, or even at different locations within a single depth plane, can be pixel shifted (or not shifted) depending on where the user is looking. If there are objects that the user is not fixating on, pixel shift information for those objects may be disregarded in order to improve performance for pixel shifts in the imagery that the user is fixating on. Again, an eye tracker can be used to determine where on the display the user is looking.

The control information can also be used to specify and/or regulate one or more virtual depth planes. A virtual depth plane can be provided at a desired interval between two defined depth planes in a virtual or augmented reality system by blending the two depth plane images with appropriate weightings to maintain the desired brightness of the imagery. For example, if a virtual depth plane is desired between depth plane D0 and depth plane D1, then a blending unit can weight the pixel values of the D0 image information by 50% while also weighting the pixel values of the D1 image information by 50%. (So long as the weightings sum to 100%, then the apparent brightness of the imagery can be maintained.) The result would be a virtual depth plane that appears to be located midway between D0 and D1. The apparent depth of the virtual depth plane can be controlled by using different blending weights. For example, if it is desired that the virtual depth plane appear closer to D1 than D0, then the D1 image can be weighted more heavily. One or more scalers can be used to ensure that a virtual object is substantially the same size in both of the depth planes that are being blended so that like portions of the virtual object are combined during the blending operation. The control information can specify when virtual depth plane imagery is to be calculated and the control information can also include blending weights for the virtual depth planes. In various embodiments, the weights can be stored in a programmable look up table (LUT). The control information can be used to select the appropriate weights from the LUT that would provide a desired virtual depth plane.
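
By way of illustration, a minimal sketch of the blending operation is shown below; the lookup table contents and the helper names are assumptions, and a real implementation would apply any required scaling before blending, as noted above.

import numpy as np

BLEND_LUT = {0: (1.0, 0.0), 1: (0.75, 0.25), 2: (0.5, 0.5), 3: (0.25, 0.75)}  # illustrative weights

def blend_virtual_plane(d0: np.ndarray, d1: np.ndarray, lut_index: int) -> np.ndarray:
    """Blend two depth plane images; weights summing to 1.0 preserve apparent brightness."""
    w0, w1 = BLEND_LUT[lut_index]
    blended = w0 * d0.astype(np.float32) + w1 * d1.astype(np.float32)
    return np.clip(blended, 0, 255).astype(np.uint8)

d0 = np.full((960, 1280, 3), 200, dtype=np.uint8)
d1 = np.full((960, 1280, 3), 100, dtype=np.uint8)
midway = blend_virtual_plane(d0, d1, lut_index=2)   # appears midway between D0 and D1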

The control information can also indicate whether image information for one of two stereo displays should be copied into the other. For example, in the case of the most distant simulated depth plane (e.g., background imagery), there may be relatively little difference (e.g., due to parallax shift) between the right and left eye images. In such cases, the control information can indicate that the image information for one of the stereo displays be copied to the other display for one or more depth planes. This can be accomplished without re-rendering the image information at the GPU for both the right and left eye displays or re-transferring image information to the display controller. If there are relatively small differences between the right and left eye images, pixel shifts can also be used to compensate without re-rendering or re-transferring image information for both eyes.

The control information illustrated in FIGS. 10 and 11 can also be used for other purposes besides those specifically enumerated here.

While FIGS. 10 and 11 illustrate that rows of control information can be included with/in a frame of video data (e.g., with image information), control information can also (or alternatively) be embedded in individual pixels of video data (e.g., image information). This is illustrated in FIG. 12, which illustrates an example format for a pixel 1200 of video data (e.g., image information) which includes embedded control information 1240. FIG. 12 illustrates that the pixel 1200 comprises a blue value 1230 (Byte 0), a green value 1220 (Byte 1), and a red value 1210 (Byte 2). In this embodiment, each of the color values has a color depth of eight bits. In some embodiments, one or more of the bits corresponding to one or more of the color values can be replaced by control information 1240 at the expense of the bit depth of the color value(s). Thus, control information can be embedded directly in pixels at the expense of dynamic range of the color value(s) for the pixel. For example, as illustrated in FIG. 12, the highlighted two least significant bits of the blue value can be dedicated as the control information 1240. Though not illustrated, bits of the other color values can also be dedicated as control information. Moreover, different numbers of pixel bits can be dedicated as control information.

In some embodiments, the control information 1240 embedded in the pixels can be depth plane indicator information (though the control information embedded in the pixels can also be any other type of control information, including other types described herein). As discussed herein, light field video information can include a number of depth planes. The bit depth for one or more pixels in the video frame can be reduced and the resulting available bit(s) can be used to indicate the depth plane to which a pixel corresponds.

As a concrete example, consider the 24-bit RGB pixel data illustrated in FIG. 12. Each of the red, green, and blue color values has a bit depth of eight bits. As already discussed, the bit depth of one or more of the color components can be sacrificed and replaced by depth plane indicator information. For example, since the eye is less sensitive to blue, the blue component can be represented by six bits (bits B3-B8 in FIG. 12) instead of eight. The resulting extra two bits (bits B1 and B2) can be used to specify which of up to four depth planes that pixel corresponds to. If there are more or fewer depth planes, then a greater or lesser number of color bits can be sacrificed. For example if the bit depth is reduced by one bit, up to two depth planes can be specified. If the bit depth is reduced by three bits, up to eight depth planes can be specified, etc. In this way, the dynamic range of a color value can be traded off for the ability to encode depth plane indicator information directly within the image information itself.
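
A minimal sketch of this encoding is shown below, assuming a (blue, green, red) byte order matching FIG. 12 (Byte 0 = blue); the helper names are illustrative.

import numpy as np

def embed_depth_indicator(pixels: np.ndarray, plane_index: np.ndarray) -> np.ndarray:
    """pixels: (..., 3) uint8 in B, G, R order; plane_index: (...) values in 0..3."""
    out = pixels.copy()
    out[..., 0] = (out[..., 0] & 0xFC) | (plane_index & 0x03)   # blue keeps its 6 MSBs
    return out

def extract_depth_indicator(pixels: np.ndarray) -> np.ndarray:
    return pixels[..., 0] & 0x03                                # two blue LSBs

frame = np.random.randint(0, 256, (960, 1280, 3), dtype=np.uint8)
planes = np.random.randint(0, 3, (960, 1280), dtype=np.uint8)   # three active depth planes
encoded = embed_depth_indicator(frame, planes)
assert np.array_equal(extract_depth_indicator(encoded), planes)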

In some embodiments, depth plane indicator information 1240 is encoded in every pixel. In other embodiments, depth plane indicator information 1240 may be encoded in one pixel per frame, or one pixel per line, one pixel per virtual or augmented reality object, etc. In addition, depth plane indicator information 1240 can be encoded in just a single color component, or in multiple color components. Similarly, the technique of encoding depth plane indicator information 1240 directly within image information is not limited solely to color image information. The technique can be practiced in the same way for grayscale images, etc.

FIG. 12 illustrates one technique for encoding depth plane indicator information in image information. Another technique is to employ chroma subsampling and use the resulting available bits as depth plane indicator information. For example, the image information can be represented in YCbCr format, where Y represents the luminance component (which may or may not be gamma corrected), Cb represents a blue-difference chroma component, and Cr represents a red-difference chroma component. Since the eye is less sensitive to chroma resolution than luminance resolution, the chroma information can be provided with a lesser resolution than the luminance information without unduly degrading image quality. In some embodiments, a YCbCr 4:2:2 format is used in which a Y value is provided for each pixel but Cb and Cr values are each only provided for every other pixel in alternating fashion. If a pixel (in the absence of chroma subsampling) normally consists of 24 bits of information (8-bit Y value, 8-bit Cb value, and 8-bit Cr value), then after employing chroma subsampling each pixel will only require 16 bits of information (8-bit Y value and 8-bit Cb or Cr value). The remaining 8 bits can be used as depth plane indicator information. The depth plane indicator information can be used to separate the pixels into the appropriate depth planes to be displayed at the appropriate times.
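
By way of illustration, one possible packing of this 4:2:2 variant is sketched below; the exact layout (which columns carry Cb versus Cr, and where the indicator byte is placed) is an assumption.

import numpy as np

def pack_422_with_depth(y: np.ndarray, cb: np.ndarray, cr: np.ndarray,
                        depth: np.ndarray) -> np.ndarray:
    """y, depth: (h, w) uint8; cb, cr: (h, w) uint8, subsampled during packing."""
    h, w = y.shape
    chroma = np.empty((h, w), dtype=np.uint8)
    chroma[:, 0::2] = cb[:, 0::2]      # even columns carry Cb
    chroma[:, 1::2] = cr[:, 1::2]      # odd columns carry Cr
    # 16 bits of luminance/chroma per pixel plus the 8-bit depth plane indicator
    return np.stack([y, chroma, depth], axis=-1)

h, w = 960, 1280
packed = pack_422_with_depth(
    np.zeros((h, w), np.uint8), np.zeros((h, w), np.uint8),
    np.zeros((h, w), np.uint8), np.zeros((h, w), np.uint8))
assert packed.shape == (h, w, 3)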

In both the embodiment illustrated in FIG. 12 and the chroma subsampling embodiment, the depth plane indicator information can specify actual depth planes supported by the virtual or augmented reality system and/or virtual depth planes as discussed herein. If the depth plane indicator information specifies a virtual depth plane, it can also specify the weightings of the depth planes to be blended, as discussed herein.

The usage of the embedded depth plane indicator information in the display controller is illustrated in FIG. 14. But first, FIG. 13 is provided by way of background to show the operation of the display controller when only a single depth plane is present. FIG. 13 illustrates how a frame of video can be separated into color components which can be displayed serially. The left-hand panel 1310 of FIG. 13 shows an image which comprises one frame of a 120 frame per second video. As indicated by the right-hand panel 1330 of FIG. 13, the image is separated into red, green, and blue color components which are flashed on the display by the display controller over the course of 1/120 of a second (8.33 ms). For simplicity, FIG. 13 shows that each of the color components is flashed once and that each of the color components is active for the same amount of time. The human vision system then fuses the individual color component sub-frames into the original color image shown in the left-hand panel of FIG. 13. FIG. 14 shows how this process can be adapted when each frame of video data includes multiple depth planes.

FIG. 14 illustrates how a frame of video data can be separated, using depth plane indicator information, into multiple depth planes which can each be split into color component sub-frames for display. In some embodiments, a host transmits a stream of video data to a display controller. This stream of video data is represented by the image in the left-hand panel 1410 of FIG. 14. The display controller can use embedded depth plane indicator information 1240 to separate the stream of video data into a plurality of RxGxBx sequences, where an R0G0B0 sequence corresponds to a first depth plane, an R1G1B1 sequence corresponds to a second depth plane, and an R2G2B2 sequence corresponds to a third depth plane. As illustrated in FIG. 12, this depth plane separation can be performed on the basis of the two least significant blue bits in each pixel. The result is shown in the middle panel 1420 of FIG. 14, which shows three separate depth plane images. Finally, each of the three separate depth plane images shown in the middle panel 1420 of FIG. 14 can be separated into its constituent color component sub-frames. The color component sub-frames of each depth plane can then be sequentially flashed to the display, as illustrated by the right-hand panel 1430 of FIG. 14. The sequence order can be, for example, R0G0B0R1G1B1R2G2B2 as illustrated in FIG. 14, or G0R0B0G1R1B1G2R2B2 as illustrated in FIG. 9.

The depth plane indicator information 1240 can be used by the display controller to determine the number of RxGxBx sequences to use and which pixels correspond to which sequence. Control information can also be provided to specify the order of RxGxBx color sequences that are flashed to the display. For example, in the case of video data which includes three depth planes (D0, D1, D2), there are six possible orders in which the individual RxGxBx sequences can be flashed to the display panel: D0, D1, D2; D0, D2, D1; D1, D0, D2; D1, D2, D0; D2, D0, D1; and D2, D1, D0. If the order specified by the control information is D0, D1, D2, then pixels with blue LSB bits 0b00 corresponding to the first depth plane, D0, can be selected as the first RxGxBx color sequence image going out. Pixels with blue LSB bits 0b01 corresponding to the second depth plane, D1, can be selected as the second RxGxBx color sequence image going out, and so on.
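
A minimal sketch of this separation step is shown below; assigning zero to non-matching pixels and the generator-style interface are assumptions for illustration, and the same blue-LSB convention as the earlier sketch (Byte 0 = blue) is assumed.

import numpy as np

def separate_depth_planes(frame: np.ndarray, order=(0, 1, 2)):
    """frame: (h, w, 3) with the plane index in the two blue LSBs."""
    indicator = frame[..., 0] & 0x03
    for plane in order:                          # e.g. D0, D1, D2
        mask = (indicator == plane)[..., None]
        plane_image = np.where(mask, frame, 0)
        # Each plane image would then be split into its R, G, B sub-frames
        # and flashed to the panel in sequence (e.g. R0 G0 B0, R1 G1 B1, ...).
        yield plane, plane_image

frame = np.random.randint(0, 256, (960, 1280, 3), dtype=np.uint8)
for plane, image in separate_depth_planes(frame, order=(0, 1, 2)):
    pass  # hand each plane's color sub-frames to the display in turn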

FIG. 15 illustrates an example where the depth plane indicator information of FIG. 12 indicates that one or more depth planes of a frame of video data are inactive. FIG. 15 is similar to FIG. 14 in that it shows a stream of video data (represented by the left-hand panel 1510 of FIG. 15) being separated into depth planes (represented by the middle panel 1520 of FIG. 15), which are then each separated into color component sub-frames (represented by the right-hand panel 1530 of FIG. 15). However, FIG. 15 is distinct from FIG. 14 in that only a single depth plane is shown as being active.

As already discussed, the depth plane indicator information 1240 in FIG. 12 comprises the two least significant bits of the blue value in each pixel. These two bits are capable of specifying up to four depth planes. However, the video data may include fewer than four depth planes. For instance, in the preceding example, the video data includes only three depth planes. In such cases where the video data includes fewer than the maximum number of specifiable depth planes, the depth plane indicator information can specify that one or more depth planes are inactive. For example, continuing with the preceding example, if the two blue LSB bits in a pixel are set to 0b11, then the pixel can be assigned to an inactive fourth depth plane D3. As shown in FIG. 15, only one of three RxGxBx color sequences is activated in the output sequence; the inactive depth planes are shown as black frames. As before, control information can be provided to specify the order in which depth planes are displayed. As shown in the middle panel 1520 of FIG. 15, in the illustrated example, the control information has specified that the inactive depth plane D3 be shown first and last in the sequence. Thus, only the middle frame in the sequence comprises image information which is flashed to the display. (Other sequences can also be used. For example, the active depth plane could be ordered first or last in the sequence, or it could be repeated in the sequence more than once.) When the display controller sees that a pixel is assigned to an inactive depth plane, then the display controller can simply disregard the pixel and not flash it to the display.

For example, when the control information indicates that one or more frames, one or more depth planes, and/or one or more color fields are/is inactive, power to the light source(s) that provides light to the display for the one or more particular frames, the one or more particular depth planes, and/or the one or more particular color fields can be reduced (e.g., entering a reduced power state or shut off completely), thereby reducing net power consumption of the system. This can save switching power at the display driver. Thus, a power-saving mode can be implemented by designating one or more frames, one or more depth planes, and/or one or more color fields of the video data as inactive. For example, in some embodiments, the control information can indicate that one or more color fields is inactive within a depth plane, while one or more other color fields in the depth plane are active. Based on this control information, the display controller can control the display to disregard the color field or fields that are inactive and display the imagery from the one or more active color fields without the inactive color field(s). For example, when the control information indicates that a color field is inactive, power to the light source(s) that provides light to the display for that particular color field can be reduced (e.g., entering a reduced power state or shut off completely), thereby reducing net power consumption of the system. Accordingly, light sources, such as light emitting diodes (LEDs), lasers, etc., that provide illumination to the display can be shut off or have their power reduced for inactive frames, inactive depth planes, and/or inactive color fields.
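
By way of illustration, the control-driven power gating described above can be sketched as follows; the control representation and the driver callbacks are hypothetical.

INACTIVE = 0b11   # e.g. the blue-LSB value assigning pixels to an inactive plane D3

def drive_frame(plane_states, set_led_power, flash):
    """plane_states: dict like {"D0": {"G": True, "R": True, "B": False}, ...}."""
    for plane, fields in plane_states.items():
        for color, active in fields.items():
            if not active:
                set_led_power(color, level=0.0)   # reduced power state or fully off
                continue                          # nothing is flashed for this field
            set_led_power(color, level=1.0)
            flash(plane, color)

drive_frame(
    {"D0": {"G": True, "R": True, "B": True},
     "D1": {"G": False, "R": False, "B": False},   # inactive depth plane: light source idle
     "D2": {"G": True, "R": False, "B": True}},    # one inactive color field
    set_led_power=lambda color, level: None,
    flash=lambda plane, color: None)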

In some embodiments, reduced power rendering may be preferred over a complete shut off, to enable faster activation of the light source when desired. As used herein, a reactivation period may refer to a time for a light source to go from a completely “off” state to peak potential intensity. In some embodiments, light sources may have a comparatively long reactivation period requiring longer periods to reach peak potential intensity from a completely “off” state as compared to alternative light sources. Such light sources may be placed in a reduced power state to achieve reduced power consumption. In the reduced power state, the light sources may not be shut off completely. In some embodiments, light sources may have a comparatively short reactivation period requiring shorter periods to reach peak intensity from a completely “off” state as compared to alternative light sources. Such light sources may be shut off completely to achieve reduced net power consumption. For example, some light sources (e.g., light emitting diodes (LEDs), organic light emitting diodes (OLEDs), lasers, etc.) may be shut off completely to achieve reduced net power consumption, as their reactivation period is comparatively short (e.g. after controlling for signal transmission speeds of a particular architecture, the speed of light), whereas other light sources (e.g., arc lamps, fluorescent lamps, backlit liquid crystal displays (LCD)) may be placed in a reduced power state as their reactivation period is comparatively long and require longer periods to reach peak potential intensity from a completely “off” state.

In some embodiments, control information comprises advance frame display information, for example, as a function of the frame rate of an image relative to a given light source, or motion of a user’s perspective. The advance frame display information may include information regarding when one or more depth planes of a plurality of depth planes and/or when one or more color fields of the one or more depth planes of the plurality of depth planes is, or is anticipated to be, active or inactive. For example, advance frame display information may include information indicating that a particular color field of a particular depth plane, for a frame subsequent to the current frame, needs to be active N (e.g., 5) frames later. Such determination may be content driven (such as a constant user head pose or rendering perspective), or user driven (such as a user changing a field of view and the display needs for rendering). For example, in systems employing light sources having a short (nearly instantaneous) reactivation period, such as LEDs, OLEDs, lasers, and the like, no advance frame display information may be embedded in the control information as the light source may be activated to full intensity instantly. In systems employing light sources having a long reactivation period, such as arc lamps, fluorescent lamps, backlit LCDs, and the like, advance frame display information may be embedded in the control information, the advance frame display information indicating when to begin supplying power, for example, full power or increased power, to a light source resulting in optimal illumination for a particular subsequent frame.

Similarly, in some embodiments, power supplied to a spatial light modulator (SLM) conveying light source illumination may be reduced in power as a function of control information. As depicted in FIG. 20, projector architecture to deliver an image to a user comprises light sources 3320 and spatial light modulator 3340 (for example, a Liquid Crystal on Silicon, LCOS, or other microdisplay). As described above, when the control information indicates one or more frames, one or more depth planes, and/or one or more color fields are/is inactive, power to the light source(s) that provides light to the display for the one or more particular frames, the one or more particular depth planes, and/or the one or more color fields can be reduced (e.g., entering a reduced power state or shut off completely), and power to the SLM can be reduced (e.g., entering a reduced power state or shut off completely) for periods corresponding to when the one or more frames, the one or more depth planes, and/or the one or more color fields are inactive. In some embodiments, rendering by a graphics processing unit or other rendering engine may still occur, but no images are displayed (for example, a user’s current head pose or field of view does not include content that otherwise is still active and will be rendered upon the user moving to the appropriate head pose or field of view).

In some embodiments, a display controller may simultaneously deliver one or two inputs to the display among a plurality of possible inputs, the first being an inactivation or reduced power setting to a particular component for a current frame (e.g. to occur at a first time, t=0), and the second being an activation or increased power setting to a particular component for a second frame (e.g. to occur at a second time, t=0+N).

Multi-Depth Plane Image Compression

In some embodiments, image compression techniques are applied across multiple depth planes in order to reduce the amount of video image information by removing redundancy of information between depth planes. For example, rather than transmitting an entire frame of image information for each depth plane, some or all of the depth planes may instead be represented in terms of changes with respect to an adjacent depth plane. (This can also be done on a temporal basis between frames at adjacent instants in time.) The compression technique can be lossless or it can be lossy, such that changes between adjacent depth plane frames, or between temporally-adjacent frames, which are less than a given threshold can be ignored, thus resulting in a reduction in image information. In addition, the compression algorithms can encode motion of objects within a single depth plane (X-Y motion) and/or between depth planes (Z motion) using motion vectors. Rather than requiring that image information for a moving object be repeatedly transmitted over time, motion of the object can be achieved entirely or partially with pixel shift control information, as discussed herein.
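
A minimal sketch of the inter-plane delta idea is shown below; the threshold value and the simple per-pixel difference encoding are assumptions for illustration (a practical codec could combine this with motion vectors, as noted above).

import numpy as np

def compress_planes(planes, threshold=4):
    """planes: list of (h, w, 3) uint8 arrays, ordered by depth."""
    base = planes[0]
    deltas = []
    for prev, curr in zip(planes[:-1], planes[1:]):
        delta = curr.astype(np.int16) - prev.astype(np.int16)
        delta[np.abs(delta) < threshold] = 0        # lossy: ignore small inter-plane changes
        deltas.append(delta)
    return base, deltas

def decompress_planes(base, deltas):
    planes = [base]
    for delta in deltas:
        restored = np.clip(planes[-1].astype(np.int16) + delta, 0, 255).astype(np.uint8)
        planes.append(restored)
    return planes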

Dynamically Configurable Display Drawing Areas

In systems that display light field imagery, it can be challenging to achieve high video frame rates owing to the relatively large amount of information (e.g., multiple depth planes, each with multiple color components) included for each video frame. However, video frame rates can be improved, particularly in augmented reality mode, by recognizing that computer-generated light field imagery may only occupy a fraction of the display at a time, as shown in FIG. 16.

FIG. 16 illustrates example drawing areas for a frame of computer-generated imagery in an augmented reality system. FIG. 16 is similar to FIG. 1 except that it shows only the portions of the display where augmented reality imagery is to be drawn. In this case, the augmented reality imagery includes the robot statue 1110 and the bumblebee character 2. The remaining area of the display in augmented reality mode may simply be a view of the real-world environment surrounding the user. As such, there may be no need to draw computer-generated imagery in those areas of the display. It may often be the case that the computer-generated imagery occupies only a relatively small fraction of the display area at a time. By dynamically re-configuring the specific drawing area(s) which are refreshed from frame-to-frame so as to exclude areas where no computer-generated imagery need be shown, video frame rates can be improved.

Computer-generated augmented reality imagery may be represented as a plurality of pixels, each having, for example, an associated brightness and color. A frame of video data may comprise an m×n array of such pixels, where m represents a number of rows and n represents a number of columns. In some embodiments, the display of an augmented reality system is at least partially transparent so as to be capable of providing a view of the user’s real-world surroundings in addition to showing the computer-generated imagery. If the brightness of a given pixel in the computer-generated imagery is set to zero or a relatively low value, then the viewer will see the real-world environment at that pixel location. Alternatively, if the brightness of a given pixel is set to a higher value, then the viewer will see computer-generated imagery at that pixel location. For any given frame of augmented reality imagery, the brightness of many of the pixels may fall below a specified threshold such that they need not be shown on the display. Rather than refresh the display for each of these below-threshold pixels, the display can be dynamically configured not to refresh those pixels.

In some embodiments, the augmented reality system includes a display controller for controlling the display. The controller can dynamically configure the drawing area for the display. For example, the controller can dynamically configure which of the pixels in a frame of video data are refreshed during any given refresh cycle. In some embodiments, the controller can receive computer-generated image information corresponding to a first frame of video. As discussed herein, the computer-generated imagery may include several depth planes. Based on the image information for the first frame of video, the controller can dynamically determine which of the display pixels to refresh for each of the depth planes. If, for example, the display utilizes a scanning-type display technology, the controller can dynamically adjust the scanning pattern so as to skip areas where the augmented reality imagery need not be refreshed (e.g., areas of the frame where there is no augmented reality imagery or the brightness of the augmented reality imagery falls below a specified threshold).

In this way, based upon each frame of video data that is received, the controller can identify a sub-portion of the display where augmented reality imagery should be shown. Each such sub-portion may include a single contiguous area or multiple non-contiguous areas (as shown in FIG. 16) on the display. Such sub-portions of the display can be determined for each of the depth planes in the light field image information. The display controller can then cause the display to only refresh the identified sub-portion(s) of the display for that particular frame of video. This process can be performed for each frame of video. In some embodiments, the controller dynamically adjusts the areas of the display which will be refreshed at the beginning of each frame of video data.
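
By way of illustration, identifying the sub-portion of the display to refresh for one depth plane can be sketched as follows; the brightness threshold and the single-bounding-box representation are assumptions (multiple non-contiguous areas could be tracked instead).

import numpy as np

def drawing_region(plane: np.ndarray, threshold: int = 8):
    """plane: (h, w, 3) uint8. Returns (row0, row1, col0, col1) or None."""
    bright = plane.max(axis=-1) > threshold
    if not bright.any():
        return None                     # nothing to refresh for this depth plane
    rows = np.flatnonzero(bright.any(axis=1))
    cols = np.flatnonzero(bright.any(axis=0))
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1

plane = np.zeros((960, 1280, 3), dtype=np.uint8)
plane[100:220, 900:1100] = 180          # e.g. the region occupied by a virtual character
print(drawing_region(plane))            # -> (100, 220, 900, 1100)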

If the controller determines that the area of the display which should be refreshed is becoming smaller over time, then the controller may increase the video frame rate because less time will be needed to draw each frame of augmented reality data. Alternatively, if the controller determines that the area of the display which should be refreshed is becoming larger over time, then it can decrease the video frame rate to allow sufficient time to draw each frame of augmented reality data. The change in the video frame rate may be inversely proportional to the fraction of the display that needs to be filled with imagery. For example, the controller can increase the frame rate by 10 times if only one tenth of the display needs to be filled.

Such video frame rate adjustments can be performed on a frame-by-frame basis. Alternatively, such video frame rate adjustments can be performed at specified time intervals or when the size of the sub-portion of the display to be refreshed increases or decreases by a specified amount. In some cases, depending upon the particular display technology, the controller may also adjust the resolution of the augmented reality imagery shown on the display. For example, if the size of the augmented reality imagery on the display is relatively small, then the controller can cause the imagery to be displayed with increased resolution. Conversely, if the size of the augmented reality imagery on the display is relatively large, then the controller can cause imagery to be displayed with decreased resolution.
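
A minimal sketch of the inverse relationship between the refreshed fraction and the frame rate is shown below; the base rate and the clamping limit are assumptions.

def adjusted_frame_rate(filled_fraction: float, base_rate_hz: float = 60.0,
                        max_rate_hz: float = 600.0) -> float:
    """Scale the frame rate inversely with the fraction of the display being refreshed."""
    filled_fraction = max(filled_fraction, 1e-3)     # avoid division by zero
    return min(base_rate_hz / filled_fraction, max_rate_hz)

print(adjusted_frame_rate(0.1))   # one tenth of the display filled -> roughly a 10x increase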

Enhanced Head Pose Estimation

As discussed herein, virtual and augmented reality systems can include body-mounted displays, such as a helmet, glasses, goggles, etc. In addition, virtual and augmented reality systems can include sensors such as gyroscopes, accelerometers, etc., which perform measurements that can be used to estimate and track the position, orientation, velocity, and/or acceleration of the user’s head in three dimensions. The sensors can be provided in an inertial measurement unit worn by the user on his or her head. In this way, the user’s head pose can be estimated. Head pose estimates can be used as a means of allowing the user to interact with the virtual or augmented reality scene. For example, if the user turns or tilts his or her head, then the virtual or augmented reality scene can be adjusted in a corresponding manner (e.g., the field of view of the scene can be shifted or tilted).

FIG. 17 schematically illustrates the possible motion of a user’s head about two rotational axes. As illustrated, the user can rotate his or her head about a vertical axis and a horizontal axis perpendicular to the page. Though not illustrated, the user can also rotate his or her head about a horizontal axis that lies in the plane of the page. In some embodiments, it may be useful to define the direction of the user’s line of sight as the head pose direction. (Although such a definition of head pose would not necessarily account for the side tilt of the head, other definitions of head pose could.) FIG. 18 illustrates how a user’s head pose can be mapped onto a three-dimensional surface 1810. FIG. 18 includes a surface normal vector 1820 which indicates the user’s head pose. Each possible surface normal vector 1820 on the three-dimensional surface corresponds to a distinct head pose. In FIG. 18, a surface normal vector pointing directly up would correspond to the user’s neutral head pose when he or she is looking directly forward.

Various algorithms can be used to estimate and track the user’s head pose based on the sensor measurements from the head-mounted inertial measurement unit. These include, for example, Kalman filters and other similar algorithms. These types of algorithms typically produce estimates which are based on sensor measurements over time rather than solely at any single instant. A Kalman filter, for example, includes a prediction phase where the filter outputs a predicted estimate of the head pose based on the head pose estimate at the previous instant. Next, during an update phase, the filter updates the head pose estimate based on current sensor measurements. Such algorithms can improve the accuracy of head pose estimates, which reduces error in displaying virtual or augmented reality imagery appropriately in response to head movements. Accurate head pose estimates can also reduce latency in the system.

Typically, a Kalman filter or similar algorithm produces the most accurate head pose estimates for head poses near the user’s neutral head pose (corresponding to a vertical surface normal vector 1820 in FIG. 18). Unfortunately, such algorithms may fail to properly estimate head pose movement as the head pose deviates further from the neutral head pose because they do not account for movement limits imposed by human physiology or the movement of the user’s head in relation to the body. However, various adaptations can be made in order to reduce the effects of these weaknesses on head pose tracking.

In some embodiments, head pose estimation and tracking using Kalman filters or similar algorithms can be improved by using variable gain factors which are different depending upon the current head pose location within an envelope of physiologically-possible head poses. FIG. 18 illustrates a three-dimensional surface 1810 corresponding to such an envelope of physiologically-possible head poses. FIG. 18 shows that the user’s head has a range of motion in any direction of no more than about 180° (e.g., side to side or up and down). The current head pose within the physiological envelope can be used to adjust the variable gain factors of the Kalman filter. In areas near the center of the envelope (i.e., neutral head pose), the gain factors can be set to emphasize the predicted head pose over the measured head pose because the Kalman filter prediction errors can be lower due to the higher linearity of the head movement in this region. This can reduce latency in the system without unduly impacting head pose estimation accuracy. When the head pose approaches the physiological head movement envelope boundaries, then the algorithm can use gain factors which are set to reduce the filter’s reliance on predicted head pose or emphasize the measured head pose over the predicted head pose in order to reduce error.

In some embodiments, each location on the physiological head pose envelope illustrated in FIG. 18 can correspond to a different gain. In other embodiments, the physiological head pose envelope can be split into separate regions and different gain values can be associated with each of the different regions. This is illustrated in FIG. 19.

FIG. 19 schematically illustrates various head pose regions which can be used to define gain factors for improving head pose tracking. FIG. 19 shows a central region 1910 corresponding to relatively neutral head poses. It also includes an outer region 1930 corresponding to head poses near the physiological boundary and an intermediate region 1920 in between the central and outer regions. In some embodiments, a different set of gain factors can be specified for each head pose region. The central region 1910 shows the areas with the higher linearity of movement which will have higher accuracy prediction values produced by a Kalman filter algorithm. When the head pose is within the central region 1910, the gain factors of the Kalman filter can be set to emphasize the predicted head pose over the measured head pose or to otherwise reduce reliance on measured head pose. As the head pose exits the central region and enters the intermediate or outer regions (1920, 1930, respectively), the movement can become more constrained by physiological factors that will adversely impact the Kalman predicted head pose if not taken into account by the algorithm. Accordingly, in these regions (particularly the outer region 1930), the Kalman filter gain values can be set to reduce the filter’s reliance on predicted head pose and increase its reliance on measured head pose. For example, it would be inaccurate to strongly rely on a predicted head pose too far into the future if it is known that the acceleration of the head will come to a stop close to the envelope boundaries. Although three head pose regions are illustrated in FIG. 19, a different number of head pose regions can be used in other embodiments.
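
By way of illustration, region-dependent gain selection can be sketched as a simple blend between predicted and measured pose; the angular thresholds, gain values, and one-dimensional pose representation are assumptions for illustration and are not the full Kalman filter described above.

# Measurement gain k: estimate = (1 - k) * predicted + k * measured
REGION_GAINS = {
    "central": 0.2,        # near neutral pose: trust the prediction more
    "intermediate": 0.5,
    "outer": 0.8,          # near the physiological boundary: trust the measurement more
}

def head_pose_region(angle_from_neutral_deg: float) -> str:
    if angle_from_neutral_deg < 30.0:
        return "central"
    if angle_from_neutral_deg < 60.0:
        return "intermediate"
    return "outer"

def fuse_head_pose(predicted_deg: float, measured_deg: float) -> float:
    k = REGION_GAINS[head_pose_region(abs(measured_deg))]
    return (1.0 - k) * predicted_deg + k * measured_deg

print(fuse_head_pose(predicted_deg=10.0, measured_deg=12.0))   # central region, leans on prediction
print(fuse_head_pose(predicted_deg=70.0, measured_deg=80.0))   # outer region, leans on measurement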

In some embodiments, head pose estimation and tracking can also be improved by sensing the position, orientation, velocity, and/or acceleration of the user’s head relative to the user’s body rather than sensing the movement of the head in an absolute sense. This can be done by providing an additional inertial measurement unit worn by the user on his or her body (e.g., on the torso or waist). It is important to note that head pose is a function of both head and body movement. The envelope of physiologically-possible head poses is not fixed in space; it moves with, for example, body rotation. If the user were sitting in a chair moving his or her head while keeping the body immobilized, then the physiological envelope would be relatively constrained such that relatively good head pose estimates could be achieved by considering only the head movement. However, when a user is actually wearing a virtual or augmented reality head-mounted display and moving around, then the physiological envelope of possible head poses varies with body movement.

A second inertial measurement unit worn on the body (e.g., mounted with the battery pack and/or processor for the virtual or augmented reality system) can help provide additional information to track the movement of the physiological envelope of head poses. Instead of fixing the envelope in space, the second inertial measurement unit can allow for movement of the head to be determined in relation to the body. For example, if the body rotates to the right, then the physiological envelope can be correspondingly rotated to the right to more accurately determine the head pose within the physiological envelope and avoid unduly constraining the operation of the Kalman filter.

In some embodiments, the motion of the head determined using the head-mounted inertial measurement unit can be subtracted from the motion of the body determined using the body-mounted inertial measurement unit. For example, the absolute position, orientation, velocity, and/or acceleration of the body can be subtracted from the absolute position, orientation, velocity, and/or acceleration of the head in order to estimate the position, orientation, velocity, and/or acceleration of the head in relation to the body. Once the orientation or motion of the head in relation to the body is known, then the actual head pose location within the physiological envelope can be more accurately estimated. As discussed herein, this allows Kalman filter gain factors to be determined in order to improve estimation and tracking of the head pose.
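
A minimal sketch of this subtraction is shown below; the per-axis angular representation is a simplification (a full implementation would compose orientations, e.g. with quaternions), and the field names are assumptions.

import numpy as np

def head_relative_to_body(head_state: dict, body_state: dict) -> dict:
    """Each state holds 3-vectors for orientation, angular velocity, and angular acceleration."""
    return {key: np.asarray(head_state[key]) - np.asarray(body_state[key])
            for key in ("orientation", "angular_velocity", "angular_acceleration")}

head = {"orientation": [40.0, 5.0, 0.0],
        "angular_velocity": [12.0, 0.0, 0.0],
        "angular_acceleration": [1.0, 0.0, 0.0]}
body = {"orientation": [30.0, 0.0, 0.0],
        "angular_velocity": [10.0, 0.0, 0.0],
        "angular_acceleration": [0.5, 0.0, 0.0]}
relative = head_relative_to_body(head, body)   # locates the head pose within the moving envelope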

Enhanced “Totem” Position Estimation

In some virtual or augmented reality systems, a specified tangible object can be used as a “totem” which allows a user to interact with a virtual object or scene. For example, a tangible block which the user holds in his or her hand could be recognized by the system as an interactive device, such as a computer mouse. The system can include, for example, a camera which tracks the movement of the tangible block in the user’s hand and then accordingly adjusts a virtual pointer. A possible drawback of using computer vision for tracking totems in space is that the totems may occasionally be outside the field of view of the camera or otherwise obscured. Thus, it would be beneficial to provide a system for robustly tracking the position and motion of the totem in three dimensions with six degrees of freedom.

In some embodiments, a system for tracking the position and motion of the totem includes one or more sensors in the totem. These one or more sensors could be accelerometers and/or gyroscopes which independently determine the position and movement of the totem in space. This data can then be transmitted to the virtual or augmented reality system.

Alternatively, the one or more sensors in the totem can work in conjunction with a transmitter to determine the position and movement of the totem in space. For example, the transmitter can create spatially-varying electric and/or magnetic fields in space and the totem can include one or more sensors which repeatedly measure the field at the location of the totem, thereby allowing the position and motion of the totem to be determined. In some embodiments, such a transmitter can advantageously be incorporated into the head-mounted display of the virtual or augmented reality system. Alternatively, the transmitter could be incorporated into a body-mounted pack. In this way, the location and/or movement of the totem with respect to the head or body, respectively, of the user can be determined. This may be more useful information than if the transmitter were simply located at a fixed location (e.g., on a nearby table) because the location and/or movement of the totem can be determined in relation to the head or body of the user.

Adjustment of Imagery Colors Based on Ambient Lighting

In some embodiments, the virtual and augmented reality systems described herein include one or more sensors (e.g., a camera) to detect the brightness and/or hue of the ambient lighting. Such sensors can be included, for example, in a display helmet of the virtual or augmented reality system. The sensed information regarding the ambient lighting can then be used to adjust the brightness or hue of generated pixels for virtual objects. For example, if the ambient lighting has a yellowish cast, computer-generated virtual objects can be altered to have yellowish color tones which more closely match those of the real objects in the room. Such pixel adjustments can be made at the time an image is rendered by the GPU. Alternatively, and/or additionally, such pixel adjustments can be made after rendering by using the control information discussed herein.
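
By way of illustration, one possible post-render adjustment is sketched below; the blend strength and the white-balance-style scaling are assumptions.

import numpy as np

def match_ambient(pixels: np.ndarray, ambient_rgb, strength: float = 0.3) -> np.ndarray:
    """pixels: (h, w, 3) uint8; ambient_rgb: mean sensed ambient color, 0-255 per channel."""
    ambient = np.asarray(ambient_rgb, dtype=np.float32)
    scale = ambient / ambient.mean()                 # relative color cast, e.g. yellowish
    per_channel = 1.0 + strength * (scale - 1.0)     # apply only part of the cast
    tinted = pixels.astype(np.float32) * per_channel
    return np.clip(tinted, 0, 255).astype(np.uint8)

virtual = np.full((960, 1280, 3), 128, dtype=np.uint8)
warmed = match_ambient(virtual, ambient_rgb=(210, 200, 150))   # yellowish room lighting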

AR/MR System

Referring now to FIG. 20, an exemplary embodiment of an AR or MR system 3300 (hereinafter referred to as “system 3300”) is illustrated. The system 3300 uses stacked light guiding optical elements (hereinafter referred to as “LOEs 3390”). The system 3300 generally includes an image generating processor 3310, a light source 3320, a controller 3330, a spatial light modulator (“SLM”) 3340, and at least one set of stacked LOEs 3390 that functions as a multiple plane focus system. The system 3300 may also include an eye-tracking subsystem 3350. It should be appreciated that other embodiments may have multiple sets of stacked LOEs 3390.

The image generating processor 3310 is configured to generate virtual content to be displayed to a user. The image generating processor 3310 may convert an image or video associated with virtual content to a format that can be projected to the user. For example, in generating virtual content, the virtual content may need to be formatted such that portions of a particular image are displayed at a particular depth plane while others are displayed at other depth planes. In one embodiment, all of the image may be generated at a particular depth plane. In another embodiment, the image generating processor 3310 may be programmed to provide slightly different images to the right and left eyes such that when viewed together, the virtual content appears coherent and comfortable to the user’s eyes.

The image generating processor 3310 may further include a memory 3312, a GPU 3314, a CPU 3316, and other circuitry for image generation and processing. The image generating processor 3310 may be programmed with the desired virtual content to be presented to the user of the system 3300. It should be appreciated that in some embodiments, the image generating processor 3310 may be housed in the system 3300. In other embodiments, the image generating processor 3310 and other circuitry may be housed in a belt pack that is coupled to the system 3300.

The image generating processor 3310 is operatively coupled to the light source 3320 which projects light associated with the desired virtual content and one or more SLMs 3340. The light source 3320 is compact and has high resolution. The light source 3320 is operatively coupled to a controller 3330. The light source 3320 may include color-specific LEDs and lasers disposed in various geometric configurations. Alternatively, the light source 3320 may include LEDs or lasers of like color, each one linked to a specific region of the field of view of the display. In another embodiment, the light source 3320 may include a broad-area emitter such as an incandescent or fluorescent lamp with a mask overlay for segmentation of emission areas and positions. Although the light source 3320 is directly connected to the system 3300 in FIG. 20, the light source 3320 may be connected to the system 3300 via optical fibers (not shown). The system 3300 may also include a condenser (not shown) configured to collimate the light from the light source 3320.

The SLM 3340 may be reflective (e.g., a liquid crystal on silicon (LCOS), a ferroelectric liquid crystal on silicon (FLCOS), a DLP dot matrix display (DMD), or a micro-electromechanical system (MEMS) mirror system), transmissive (e.g., a liquid crystal display (LCD)) or emissive (e.g., a fiber scan display (FSD) or an organic light emitting diode (OLED)) in various exemplary embodiments. The type of SLM 3340 (e.g., speed, size, etc.) can be selected to improve the creation of the perception. While DLP DMDs operating at higher refresh rates may be easily incorporated into stationary systems 3300, wearable systems 3300 may use DLPs of smaller size and power. The power of the DLP changes how depth planes/focal planes are created. The image generating processor 3310 is operatively coupled to the SLM 3340, which encodes the light from the light source 3320 with the desired virtual content. Light from the light source 3320 may be encoded with the image information when it reflects off of, emits from, or passes through the SLM 3340.

Light from the SLM 3340 is directed to the LOEs 3390 such that light beams encoded with image data for one depth plane and/or color by the SLM 3340 are effectively propagated along a single LOE 3390 for delivery to an eye of a user. Each LOE 3390 is configured to project an image or sub-image that appears to originate from a desired depth plane or FOV angular position onto a user’s retina. The light source 3320 and LOEs 3390 can therefore selectively project images (synchronously encoded by the SLM 3340 under the control of controller 3330) that appear to originate from various depth planes or positions in space. By sequentially projecting images using each of the light source 3320 and LOEs 3390 at a sufficiently high frame rate (e.g., 360 Hz for six depth planes at an effective full-volume frame rate of 60 Hz), the system 3300 can generate a 3D image of virtual objects at various depth planes that appear to exist simultaneously in the 3D image.

The controller 3330 is in communication with and operatively coupled to the image generating processor 3310, the light source 3320 and the SLM 3340 to coordinate the synchronous display of images by instructing the SLM 3340 to encode the light beams from the light source 3320 with appropriate image information from the image generating processor 3310.

The system 3300 also includes an optional eye-tracking subsystem 3350 that is configured to track the user’s eyes and determine the user’s focus. In one embodiment, the system 3300 is configured to illuminate a subset of LOEs 3390, based on input from the eye-tracking subsystem 3350 such that the image is generated at a desired depth plane that coincides with the user’s focus/accommodation. For example, if the user’s eyes are parallel to each other, the system 3300 may illuminate the LOE 3390 that is configured to deliver collimated light to the user’s eyes, such that the image appears to originate from optical infinity. In another example, if the eye-tracking subsystem 3350 determines that the user’s focus is at 1 meter away, the LOE 3390 that is configured to focus approximately within that range may be illuminated instead.
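
A minimal sketch of this depth-plane selection follows. The list of depth planes and the nearest-match rule in diopter space are assumptions for illustration rather than details taken from the text.

```python
import math

# Hypothetical depth planes (meters) associated with each LOE; infinity = collimated light.
LOE_DEPTH_PLANES_M = [math.inf, 3.0, 1.5, 1.0, 0.75, 0.5]

def select_loe(fixation_distance_m):
    """Return the index of the LOE whose depth plane is nearest to the user's focus in diopters."""
    def diopters(d):
        return 0.0 if math.isinf(d) else 1.0 / d
    target = diopters(fixation_distance_m)
    return min(range(len(LOE_DEPTH_PLANES_M)),
               key=lambda i: abs(diopters(LOE_DEPTH_PLANES_M[i]) - target))

print(select_loe(math.inf))  # parallel eyes -> LOE 0 (image appears at optical infinity)
print(select_loe(1.0))       # focus at ~1 m -> the LOE configured near the 1 m depth plane
```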

For purposes of summarizing the disclosure, certain aspects, advantages and features of the invention have been described herein. It is to be understood that not necessarily all such advantages may be achieved in accordance with any particular embodiment of the invention. Thus, the invention may be embodied or carried out in a manner that achieves or optimizes one advantage or group of advantages as taught herein without necessarily achieving other advantages as may be taught or suggested herein.

Embodiments have been described in connection with the accompanying drawings. However, it should be understood that the figures are not drawn to scale. Distances, angles, etc. are merely illustrative and do not necessarily bear an exact relationship to actual dimensions and layout of the devices illustrated. In addition, the foregoing embodiments have been described at a level of detail to allow one of ordinary skill in the art to make and use the devices, systems, methods, etc. described herein. A wide variety of variation is possible. Components, elements, and/or steps may be altered, added, removed, or rearranged.

The devices and methods described herein can advantageously be at least partially implemented using, for example, computer software, hardware, firmware, or any combination of software, hardware, and firmware. Software modules can comprise computer executable code, stored in a computer’s memory, for performing the functions described herein. In some embodiments, computer-executable code is executed by one or more general purpose computers. However, a skilled artisan will appreciate, in light of this disclosure, that any module that can be implemented using software to be executed on a general purpose computer can also be implemented using a different combination of hardware, software, or firmware. For example, such a module can be implemented completely in hardware using a combination of integrated circuits. Alternatively or additionally, such a module can be implemented completely or partially using specialized computers designed to perform the particular functions described herein rather than by general purpose computers. In addition, where methods are described that are, or could be, at least in part carried out by computer software, it should be understood that such methods can be provided on non-transitory computer-readable media (e.g., optical disks such as CDs or DVDs, hard disk drives, flash memories, diskettes, or the like) that, when read by a computer or other processing device, cause it to carry out the method.

While certain embodiments have been explicitly described, other embodiments will become apparent to those of ordinary skill in the art based on this disclosure.

The article 《MagicLeap Patent | Virtual and augmented reality systems and methods》 was first published on Nweon Patent

]]>
Meta Patent | Systems and methods of preambles for uwb transmission https://patent.nweon.com/26729 Thu, 26 Jan 2023 15:51:21 +0000 https://patent.nweon.com/?p=26729 ...

The article 《Meta Patent | Systems and methods of preambles for uwb transmission》 was first published on Nweon Patent

]]>
Patent: Systems and methods of preambles for uwb transmission

Patent PDF: Join the Nweon (映维网) membership to access

Publication Number: 20230021454

Publication Date: 2023-01-26

Assignee: Meta Platforms Technologies

Abstract

Systems and methods for selecting preamble codes for ultra-wideband (UWB) data transmissions include a device which selects a first preamble code of a plurality of preamble codes for a data transmission sent via at least one UWB antenna to a second device. Each of the plurality of preamble codes may have a sidelobe suppression ratio of at least 12 dB with respect to another one of the plurality of preamble codes. The device may transmit the data transmission including the first preamble code via the UWB antenna to the second device.

Claims

What is claimed is:

1.A method comprising: selecting, by a first device, a first preamble code of a plurality of preamble codes for a data transmission sent via at least one ultra-wideband (UWB) antenna to a second device, each of the plurality of preamble codes having a sidelobe suppression ratio of at least 12 dB with respect to another one of the plurality of preamble codes; and transmitting, by the first device, the data transmission including the first preamble code via the UWB antenna to the second device.

2.The method of claim 1, further comprising: determining, by the first device, that a third device in an environment of the first device and second device is using the first preamble code; and selecting, by the first device, a second preamble code of the plurality of preamble codes, for a subsequent data transmission.

3.The method of claim 2, wherein all of the plurality of preamble codes have a same sequence length.

4.The method of claim 3, wherein a second plurality of preamble codes having the same sequence length is defined, each of the second plurality of preamble codes being (i) distinct from the plurality of preamble codes and (ii) having a sidelobe suppression ratio of at least 12 dB with respect to another one of the second plurality of preamble codes.

5.The method of claim 1, wherein the plurality of preamble codes comprise an alphabet of at least one of two characters or four characters.

6.The method of claim 5, wherein the two characters or four characters include one or more characters from {0, 1, −1, i, −i}.

7.The method of claim 1, wherein the plurality of preamble codes comprises m-sequence preamble codes.

8.The method of claim 1, wherein the plurality of preamble codes comprises preamble codes having a sequence length of 15, 17, 19, 63, 255, 511, 1023, or 2047.

9.The method of claim 1, comprising: determining, by the first device, that a third device in an environment of the first device and the second device is using a preamble code of the plurality of preamble codes; identifying, by the first device, a second plurality of preamble codes; and selecting, by the first device, a second preamble code of the second plurality of preamble codes for a subsequent data transmission.

10.The method of claim 9, wherein the plurality of preamble codes has a same number of preamble codes as the second plurality of preamble codes.

11.A first device, comprising: at least one ultra-wideband (UWB) antenna; and at least one processor configured to: select a first preamble code of a plurality of preamble codes for a data transmission sent via at least one ultra-wideband (UWB) antenna to a second device, each of the plurality of preamble codes having a sidelobe suppression ratio of at least 12 dB with respect to another one of the plurality of preamble codes; and transmit, via the at least one UWB antenna, the data transmission including the first preamble code via the UWB antenna to the second device.

12.The first device of claim 11, wherein the at least one processor is configured to: determine that a third device in an environment of the first device and second device is using the first preamble code; and select a second preamble code of the plurality of preamble codes, for a subsequent data transmission.

13.The first device of claim 12, wherein all of the plurality of preamble codes have a same sequence length.

14.The first device of claim 13, wherein a second plurality of preamble codes having the same sequence length is defined, each of the second plurality of preamble codes being (i) distinct from the plurality of preamble codes and (ii) having a sidelobe suppression ratio of at least 12 dB with respect to another one of the second plurality of preamble codes.

15.The first device of claim 11, wherein the plurality of preamble codes comprise an alphabet of at least one of two characters or four characters.

16.The first device of claim 15, wherein the two characters or four characters include one or more characters from {0, 1, −1, i, −i}.

17.The first device of claim 11, wherein the plurality of preamble codes comprises m-sequence preamble codes.

18.The first device of claim 11, wherein the plurality of preamble codes comprises preamble codes having a sequence length of 15, 17, 19, 63, 255, 511, 1023, or 2047.

19.The first device of claim 11, wherein the at least one processor is configured to: determine that a third device in an environment of the first device and the second device is using a preamble code of the plurality of preamble codes; identify a second plurality of preamble codes; and select a second preamble code of the second plurality of preamble codes for a subsequent data transmission.

20.The first device of claim 19, wherein the plurality of preamble codes has a same number of preamble codes as the second plurality of preamble codes.

Description

CROSS-REFERENCE TO RELATED APPLICATION(S)

This application claims the benefit of and priority to U.S. Provisional Application No. 63/222,207, filed Jul. 15, 2021, the contents of which are incorporated herein by reference in its entirety.

BACKGROUND

Artificial reality such as virtual reality (VR), augmented reality (AR), or mixed reality (MR) provides an immersive experience to a user. Typically, in systems and methods which implement or otherwise provide immersive experiences, such systems utilize Wi-Fi, Bluetooth, or radio wireless links to transmit/receive data. However, using such wireless links typically requires detailed coordination between links, particularly where multiple devices in the same environment are utilizing the same wireless link technology for communications.

SUMMARY

In one aspect, this disclosure is directed to a method. The method may include selecting, by a first device, a first preamble code of a plurality of preamble codes for a data transmission sent via at least one ultra-wideband (UWB) antenna to a second device. Each of the plurality of preamble codes may have a sidelobe suppression ratio of at least 12 dB with respect to another one of the plurality of preamble codes. The method may include transmitting, by the first device, the data transmission including the first preamble code via the UWB antenna to the second device.

In some embodiments, the method includes determining, by the first device, that a third device in an environment of the first device and second device is using the first preamble code. The method may include selecting, by the first device, a second preamble code of the plurality of preamble codes, for a subsequent data transmission. In some embodiments, all of the plurality of preamble codes have a same sequence length. In some embodiments, a second plurality of preamble codes having the same sequence length is defined. Each of the second plurality of preamble codes may be (i) distinct from the plurality of preamble codes and (ii) have a sidelobe suppression ratio of at least 12 dB with respect to another one of the second plurality of preamble codes. In some embodiments, the plurality of preamble codes include an alphabet of at least one of two characters or four characters. In some embodiments, the two characters or four characters include one or more characters from {0, 1, −1, i, −i}.

In some embodiments, the plurality of preamble codes comprises m-sequence preamble codes. In some embodiments, the plurality of preamble codes include preamble codes having a sequence length of 15, 17, 19, 63, 255, 511, 1023, or 2047. In some embodiments, the method includes determining, by the first device, that a third device in an environment of the first device and the second device is using a preamble code of the plurality of preamble codes. The method may include identifying, by the first device, a second plurality of preamble codes. The method may further include selecting, by the first device, a second preamble code of the second plurality of preamble codes for a subsequent data transmission. In some embodiments, the plurality of preamble codes has a same number of preamble codes as the second plurality of preamble codes.

In another aspect, this disclosure is directed to a first device. The first device may include at least one ultra-wideband (UWB) antenna. The first device may include at least one processor configured to select a first preamble code of a plurality of preamble codes for a data transmission sent via at least one ultra-wideband (UWB) antenna to a second device. Each of the plurality of preamble codes may have a sidelobe suppression ratio of at least 12 dB with respect to another one of the plurality of preamble codes. The processor may be configured to transmit, via the at least one UWB antenna, the data transmission including the first preamble code via the UWB antenna to the second device.

In some embodiments, the at least one processor is configured to determine that a third device in an environment of the first device and second device is using the first preamble code. The at least one processor may be configured to select a second preamble code of the plurality of preamble codes, for a subsequent data transmission. In some embodiments, all of the plurality of preamble codes have a same sequence length. In some embodiments, a second plurality of preamble codes having the same sequence length is defined. Each of the second plurality of preamble codes may be (i) distinct from the plurality of preamble codes and (ii) have a sidelobe suppression ratio of at least 12 dB with respect to another one of the second plurality of preamble codes. In some embodiments, the plurality of preamble codes includes an alphabet of at least one of two characters or four characters. In some embodiments, the two characters or four characters include one or more characters from {0, 1, −1, i, −i} (e.g., possible candidate characters, and can also include any such character multiplied/divided by a respective factor/value).

In some embodiments, the plurality of preamble codes comprises m-sequence preamble codes. In some embodiments, the plurality of preamble codes includes preamble codes having a sequence length of 15, 17, 19, 63, 255, 511, 1023, or 2047. In some embodiments, the at least one processor is configured to determine that a third device in an environment of the first device and the second device is using a preamble code of the plurality of preamble codes. The at least one processor may be configured to identify a second plurality of preamble codes. The at least one processor may be configured to select a second preamble code of the second plurality of preamble codes for a subsequent data transmission. In some embodiments, the plurality of preamble codes has a same number of preamble codes as the second plurality of preamble codes.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings are not intended to be drawn to scale. Like reference numbers and designations in the various drawings indicate like elements. For purposes of clarity, not every component can be labeled in every drawing.

FIG. 1 is a diagram of a system environment including an artificial reality system, according to an example implementation of the present disclosure.

FIG. 2 is a diagram of a head wearable display, according to an example implementation of the present disclosure.

FIG. 3 is a block diagram of an artificial reality environment, according to an example implementation of the present disclosure.

FIG. 4 is a block diagram of another artificial reality environment, according to an example implementation of the present disclosure.

FIG. 5 is a block diagram of another artificial reality environment, according to an example implementation of the present disclosure.

FIG. 6 is a block diagram of a computing environment, according to an example implementation of the present disclosure.

FIG. 7 is a block diagram of a system for using/determining/selecting preamble codes for UWB transmissions, according to an example implementation of the present disclosure.

FIG. 8A-FIG. 8B are preamble code tables corresponding to m-sequence preamble codes having a sequence length of 255, according to an example implementation of the present disclosure.

FIG. 9A-FIG. 9B are preamble code tables corresponding to m-sequence preamble codes having a sequence length of 511, according to an example implementation of the present disclosure.

FIG. 10A-FIG. 10B are preamble code tables corresponding to m-sequence preamble codes having a sequence length of 1023, according to an example implementation of the present disclosure.

FIG. 11A-FIG. 11B are preamble code tables corresponding to m-sequence preamble codes having a sequence length of 2047, according to an example implementation of the present disclosure.

FIG. 12A-FIG. 12C are preamble code tables including derived preamble codes having a sequence length of 15, according to an example implementation of the present disclosure.

FIG. 13 is a table showing metrics relating to the derived preamble code tables shown in FIG. 12A, according to an example implementation of the present disclosure.

FIG. 14A-FIG. 14C are preamble code tables including derived preamble codes having a sequence length of 17, according to an example implementation of the present disclosure.

FIG. 15A-FIG. 15B are preamble code tables including derived preamble codes having a sequence length of 19, according to an example implementation of the present disclosure.

FIG. 16 is a flowchart showing a method for selecting preamble codes for UWB transmissions, according to an example implementation of the present disclosure.

DETAILED DESCRIPTION

Before turning to the figures, which illustrate certain embodiments in detail, it should be understood that the present disclosure is not limited to the details or methodology set forth in the description or illustrated in the figures. It should also be understood that the terminology used herein is for the purpose of description only and should not be regarded as limiting.

Disclosed herein are embodiments related to devices operating in the ultra-wideband (UWB) spectrum. In various embodiments, UWB devices operate in the 3-10 GHz unlicensed spectrum using 500+ MHz channels which may require low power for transmission. For example, the transmit power spectral density (PSD) for some devices may be limited to −41.3 dBm/MHz. On the other hand, UWB may have transmit PSD values in the range of −5 to +5 dBm/MHz, averaged over 1 ms, with a peak power limit of 0 dBm in a given 50 MHz band. Using simple modulation and spread spectrum, UWB devices may achieve reasonable resistance to Wi-Fi and Bluetooth interference (as well as resistance to interference with other UWB devices within a shared or common environment) for very low data rates (e.g., 10s to 100s Kbps) and may have large processing gains. However, for higher data rates (e.g., several Mbps), the processing gains may not be sufficient to overcome co-channel interference from Wi-Fi or Bluetooth. According to the embodiments described herein, the systems and methods described herein may operate in frequency bands that do not overlap with Wi-Fi and Bluetooth, but may have good global availability based on regulatory requirements. Since regulatory requirements make the 7-8 GHz spectrum the most widely available globally (and Wi-Fi is not present in this spectrum), the 7-8 GHz spectrum may operate satisfactorily based on both co-channel interference and processing gains.

Some implementations of UWB may focus on precision ranging, security, and low to moderate rate data communication. As UWB employs relatively simple modulation, it may be implemented at low cost and low power consumption. In AR/VR applications, link budget calculations for an AR/VR controller link indicate that the systems and methods described herein may be configured for effective data throughput ranging from ~2 to 31 Mbps (e.g., with 31 Mbps being the maximum possible rate in the latest 802.15.4z standard), which may depend on body loss assumptions. Using conservative body loss assumptions, the systems and methods described herein should be configured for data throughput of up to approximately 5 Mbps, which may be sufficient to meet the data throughput performance standards for AR/VR links. With a customized implementation, data throughput rate could be increased beyond 27 Mbps (e.g., to 54 Mbps), but with possible loss in link margin.

In various implementations, devices may leverage UWB devices or antennas 308 to exchange data communications. In some systems and methods, to exchange a data communication, a device may incorporate a preamble into the transmission frame or signal. The preamble may identify, signify, relate to, or otherwise be linked to a particular data channel or linkage between the devices. For example, two devices exchanging communications may use preambles which are known to the respective devices. When one of the devices receives a transmission frame or signal, the device may extract or otherwise identify the preamble from the signal. Upon determining that the preamble matches the known preamble of the other device, the device may determine that the device is the intended recipient of the signal and parse the body of the signal to extract its data/contents. In some instances, multiple devices may be located within a given environment. Such devices may communicate with other devices in the environment using different protocols including, for instance, UWB, WIFI, Bluetooth, etc. As part of multiple devices being co-located in an environment, interference or cross-talk communication metrics, such as autocorrelation and cross-correlation, become more important.
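
A minimal sketch of this receive-side check is shown below. It assumes a simple normalized-correlation test against the peer's known code; the threshold value and function names are illustrative and are not the 802.15.4z detection procedure.

```python
import numpy as np

def matches_expected_preamble(rx_samples, expected_code, threshold=0.8):
    """Return True if rx_samples begin with something close to expected_code."""
    code = np.asarray(expected_code, dtype=complex)
    window = np.asarray(rx_samples[:len(code)], dtype=complex)
    corr = np.abs(np.vdot(code, window))                      # correlate against the known code
    norm = np.linalg.norm(code) * np.linalg.norm(window) + 1e-12
    return (corr / norm) >= threshold

# Example with a toy ternary code drawn from the {0, 1, -1} alphabet.
expected = [1, 0, -1, 1, 1, -1, 0, 1]
noisy_rx = np.array(expected, dtype=float) + 0.05 * np.random.randn(len(expected))
print(matches_expected_preamble(noisy_rx, expected))          # True: this device is the intended recipient
```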

According to the systems and methods of the present solution, a device may be configured to select a first preamble code of a plurality of preamble codes for a data transmission to be sent via a UWB antenna or device to a second device. The preamble codes may have a suppression factor/metric/level/ratio (e.g., a sidelobe suppression ratio) of at least some threshold (such as 12 dB) with respect to another preamble code. The device may transmit the data transmission including the first preamble code with the UWB antenna to the second device. Rather than performing a trial-and-error or guess-and-check process of selecting (e.g., determining, identifying) preamble codes, the systems and methods described herein may perform a more targeted selection by determining suppression ratios (or other variants of such metrics) of a selected preamble code with respect to other preamble codes. For instance, the device may determine that another device in the environment is using a particular preamble code, and select another preamble code which satisfies a suppression criteria or threshold with respect to the particular preamble code.
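
The sketch below illustrates that targeted selection under an assumed definition of the suppression metric (autocorrelation peak over the worst cross-correlation sidelobe, expressed in dB on an amplitude-ratio convention). The metric definition, the fallback behavior, and the function names are assumptions, not the patent's exact formulation; only the 12 dB figure comes from the text.

```python
import numpy as np

def suppression_ratio_db(candidate, other):
    """Candidate's autocorrelation peak vs. its worst cross-correlation with `other`, in dB."""
    # Real-valued codes for simplicity; complex alphabets would use conjugated correlation.
    c = np.asarray(candidate, dtype=float)
    o = np.asarray(other, dtype=float)
    auto_peak = np.dot(c, c)
    cross = np.correlate(c, o, mode="full")
    worst_sidelobe = np.max(np.abs(cross)) + 1e-12
    return 20.0 * np.log10(auto_peak / worst_sidelobe)

def select_preamble(candidates, codes_in_use, min_suppression_db=12.0):
    """Pick the first candidate sufficiently suppressed against every code already in use."""
    for cand in candidates:
        if all(suppression_ratio_db(cand, used) >= min_suppression_db for used in codes_in_use):
            return cand
    return None  # caller could then fall back to a second plurality of codes
```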

FIG. 1 is a block diagram of an example artificial reality system environment 100. In some embodiments, the artificial reality system environment 100 includes an access point (AP) 105, one or more HWDs 150 (e.g., HWD 150A, 150B), and one or more computing devices 110 (computing devices 110A, 110B; sometimes referred to as consoles) providing data for artificial reality to the one or more HWDs 150. The access point 105 may be a router or any network device allowing one or more computing devices 110 and/or one or more HWDs 150 to access a network (e.g., the Internet). The access point 105 may be replaced by any communication device (cell site). A computing device 110 may be a custom device or a mobile device that can retrieve content from the access point 105, and provide image data of artificial reality to a corresponding HWD 150. Each HWD 150 may present the image of the artificial reality to a user according to the image data. In some embodiments, the artificial reality system environment 100 includes more, fewer, or different components than shown in FIG. 1. In some embodiments, the computing devices 110A, 110B communicate with the access point 105 through wireless links 102A, 102B (e.g., interlinks), respectively. In some embodiments, the computing device 110A communicates with the HWD 150A through a wireless link 125A (e.g., intralink), and the computing device 110B communicates with the HWD 150B through a wireless link 125B (e.g., intralink). In some embodiments, functionality of one or more components of the artificial reality system environment 100 can be distributed among the components in a different manner than is described here. For example, some of the functionality of the computing device 110 may be performed by the HWD 150. For example, some of the functionality of the HWD 150 may be performed by the computing device 110.

In some embodiments, the HWD 150 is an electronic component that can be worn by a user and can present or provide an artificial reality experience to the user. The HWD 150 may be referred to as, include, or be part of a head mounted display (HMD), head mounted device (HMD), head wearable device (HWD), head worn display (HWD) or head worn device (HWD). The HWD 150 may render one or more images, video, audio, or some combination thereof to provide the artificial reality experience to the user. In some embodiments, audio is presented via an external device (e.g., speakers and/or headphones) that receives audio information from the HWD 150, the computing device 110, or both, and presents audio based on the audio information. In some embodiments, the HWD 150 includes sensors 155, a wireless interface 165, a processor 170, and a display 175. These components may operate together to detect a location of the HWD 150 and a gaze direction of the user wearing the HWD 150, and render an image of a view within the artificial reality corresponding to the detected location and/or orientation of the HWD 150. In other embodiments, the HWD 150 includes more, fewer, or different components than shown in FIG. 1.

In some embodiments, the sensors 155 include electronic components or a combination of electronic components and software components that detects a location and an orientation of the HWD 150. Examples of the sensors 155 can include: one or more imaging sensors, one or more accelerometers, one or more gyroscopes, one or more magnetometers, or another suitable type of sensor that detects motion and/or location. For example, one or more accelerometers can measure translational movement (e.g., forward/back, up/down, left/right) and one or more gyroscopes can measure rotational movement (e.g., pitch, yaw, roll). In some embodiments, the sensors 155 detect the translational movement and the rotational movement, and determine an orientation and location of the HWD 150. In one aspect, the sensors 155 can detect the translational movement and the rotational movement with respect to a previous orientation and location of the HWD 150, and determine a new orientation and/or location of the HWD 150 by accumulating or integrating the detected translational movement and/or the rotational movement. Assuming for an example that the HWD 150 is oriented in a direction 25 degrees from a reference direction, in response to detecting that the HWD 150 has rotated 20 degrees, the sensors 155 may determine that the HWD 150 now faces or is oriented in a direction 45 degrees from the reference direction. Assuming for another example that the HWD 150 was located two feet away from a reference point in a first direction, in response to detecting that the HWD 150 has moved three feet in a second direction, the sensors 155 may determine that the HWD 150 is now located at a vector multiplication of the two feet in the first direction and the three feet in the second direction.
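
The worked example in this paragraph can be expressed compactly as below. This is a minimal sketch; the heading wrap-around and the treatment of the combined displacement as a vector sum are simplifying assumptions.

```python
import numpy as np

def update_pose(prev_heading_deg, delta_heading_deg, prev_position, delta_position):
    """Accumulate detected rotation and translation onto the previous pose."""
    heading = (prev_heading_deg + delta_heading_deg) % 360.0
    position = np.asarray(prev_position, dtype=float) + np.asarray(delta_position, dtype=float)
    return heading, position

# 25 degrees from the reference direction plus a detected 20-degree rotation -> 45 degrees.
# Two feet in a first direction combined with three feet in a second direction.
heading, position = update_pose(25.0, 20.0, prev_position=[2.0, 0.0], delta_position=[0.0, 3.0])
print(heading)    # 45.0
print(position)   # [2. 3.]
```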

In some embodiments, the wireless interface 165 includes an electronic component or a combination of an electronic component and a software component that communicates with the computing device 110. In some embodiments, the wireless interface 165 includes or is embodied as a transceiver for transmitting and receiving data through a wireless medium. The wireless interface 165 may communicate with a wireless interface 115 of a corresponding computing device 110 through a wireless link 125 (e.g., intralink). The wireless interface 165 may also communicate with the access point 105 through a wireless link (e.g., interlink). Examples of the wireless link 125 include a near field communication link, Wi-Fi direct, Bluetooth, or any wireless communication link. In some embodiments, the wireless link 125 may include one or more ultra-wideband communication links, as described in greater detail below. Through the wireless link 125, the wireless interface 165 may transmit to the computing device 110 data indicating the determined location and/or orientation of the HWD 150, the determined gaze direction of the user, and/or hand tracking measurement. Moreover, through the wireless link 125, the wireless interface 165 may receive from the computing device 110 image data indicating or corresponding to an image to be rendered.

In some embodiments, the processor 170 includes an electronic component or a combination of an electronic component and a software component that generates one or more images for display, for example, according to a change in view of the space of the artificial reality. In some embodiments, the processor 170 is implemented as one or more graphical processing units (GPUs), one or more central processing unit (CPUs), or a combination of them that can execute instructions to perform various functions described herein. The processor 170 may receive, through the wireless interface 165, image data describing an image of artificial reality to be rendered, and render the image through the display 175. In some embodiments, the image data from the computing device 110 may be encoded, and the processor 170 may decode the image data to render the image. In some embodiments, the processor 170 receives, from the computing device 110 through the wireless interface 165, object information indicating virtual objects in the artificial reality space and depth information indicating depth (or distances from the HWD 150) of the virtual objects. In one aspect, according to the image of the artificial reality, object information, depth information from the computing device 110, and/or updated sensor measurements from the sensors 155, the processor 170 may perform shading, reprojection, and/or blending to update the image of the artificial reality to correspond to the updated location and/or orientation of the HWD 150.

In some embodiments, the display 175 is an electronic component that displays an image. The display 175 may, for example, be a liquid crystal display or an organic light emitting diode display. The display 175 may be a transparent display that allows the user to see through. In some embodiments, when the HWD 150 is worn by a user, the display 175 is located proximate (e.g., less than 3 inches) to the user’s eyes. In one aspect, the display 175 emits or projects light towards the user’s eyes according to an image generated by the processor 170. The HWD 150 may include a lens that allows the user to see the display 175 in close proximity.

In some embodiments, the processor 170 performs compensation to compensate for any distortions or aberrations. In one aspect, the lens introduces optical aberrations such as a chromatic aberration, a pin-cushion distortion, barrel distortion, etc. The processor 170 may determine a compensation (e.g., predistortion) to apply to the image to be rendered to compensate for the distortions caused by the lens, and apply the determined compensation to the image from the processor 170. The processor 170 may provide the predistorted image to the display 175.
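
One common way to realize such predistortion is a radial warp of normalized image coordinates. The sketch below uses a generic polynomial model with made-up coefficients (k1, k2); it is an assumption for illustration, not the actual compensation applied by the processor 170.

```python
def predistort(x, y, k1=-0.15, k2=0.02):
    """Apply a radial pre-warp to normalized coordinates (origin at the image center).

    k1 and k2 are illustrative distortion coefficients; a real system would calibrate
    them for its specific lens.
    """
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * scale, y * scale

# A pixel near the corner of the normalized image is pulled inward before display,
# compensating for a lens that pushes it outward (pincushion distortion).
print(predistort(0.8, 0.6))
```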

In some embodiments, the computing device 110 is an electronic component or a combination of an electronic component and a software component that provides content to be rendered to the HWD 150. The computing device 110 may be embodied as a mobile device (e.g., smart phone, tablet PC, laptop, etc.). The computing device 110 may operate as a soft access point. In one aspect, the computing device 110 includes a wireless interface 115 and a processor 118. These components may operate together to determine a view (e.g., a FOV of the user) of the artificial reality corresponding to the location of the HWD 150 and the gaze direction of the user of the HWD 150, and can generate image data indicating an image of the artificial reality corresponding to the determined view. The computing device 110 may also communicate with the access point 105, and may obtain AR/VR content from the access point 105, for example, through the wireless link 102 (e.g., interlink). The computing device 110 may receive sensor measurement indicating location and the gaze direction of the user of the HWD 150 and provide the image data to the HWD 150 for presentation of the artificial reality, for example, through the wireless link 125 (e.g., intralink). In other embodiments, the computing device 110 includes more, fewer, or different components than shown in FIG. 1.

In some embodiments, the wireless interface 115 is an electronic component or a combination of an electronic component and a software component that communicates with the HWD 150, the access point 105, other computing device 110, or any combination of them. In some embodiments, the wireless interface 115 includes or is embodied as a transceiver for transmitting and receiving data through a wireless medium. The wireless interface 115 may be a counterpart component to the wireless interface 165 to communicate with the HWD 150 through a wireless link 125 (e.g., intralink). The wireless interface 115 may also include a component to communicate with the access point 105 through a wireless link 102 (e.g., interlink). Examples of wireless link 102 include a cellular communication link, a near field communication link, Wi-Fi, Bluetooth, 60 GHz wireless link, ultra-wideband link, or any wireless communication link. The wireless interface 115 may also include a component to communicate with a different computing device 110 through a wireless link 185. Examples of the wireless link 185 include a near field communication link, Wi-Fi direct, Bluetooth, ultra-wideband link, or any wireless communication link. Through the wireless link 102 (e.g., interlink), the wireless interface 115 may obtain AR/VR content, or other content from the access point 105. Through the wireless link 125 (e.g., intralink), the wireless interface 115 may receive from the HWD 150 data indicating the determined location and/or orientation of the HWD 150, the determined gaze direction of the user, and/or the hand tracking measurement. Moreover, through the wireless link 125 (e.g., intralink), the wireless interface 115 may transmit to the HWD 150 image data describing an image to be rendered. Through the wireless link 185, the wireless interface 115 may receive or transmit information indicating the wireless link 125 (e.g., channel, timing) between the computing device 110 and the HWD 150. According to the information indicating the wireless link 125, computing devices 110 may coordinate or schedule operations to avoid interference or collisions.

The processor 118 can include or correspond to a component that generates content to be rendered according to the location and/or orientation of the HWD 150. In some embodiments, the processor 118 includes or is embodied as one or more central processing units, graphics processing units, image processors, or any processors for generating images of the artificial reality. In some embodiments, the processor 118 may incorporate the gaze direction of the user of the HWD 150 and a user interaction in the artificial reality to generate the content to be rendered. In one aspect, the processor 118 determines a view of the artificial reality according to the location and/or orientation of the HWD 150. For example, the processor 118 maps the location of the HWD 150 in a physical space to a location within an artificial reality space, and determines a view of the artificial reality space along a direction corresponding to the mapped orientation from the mapped location in the artificial reality space. The processor 118 may generate image data describing an image of the determined view of the artificial reality space, and transmit the image data to the HWD 150 through the wireless interface 115. The processor 118 may encode the image data describing the image, and can transmit the encoded data to the HWD 150. In some embodiments, the processor 118 generates and provides the image data to the HWD 150 periodically (e.g., every 11 ms or 16 ms).

In some embodiments, the processors 118, 170 may configure or cause the wireless interfaces 115, 165 to toggle, transition, cycle or switch between a sleep mode and a wake up mode. In the wake up mode, the processor 118 may enable the wireless interface 115 and the processor 170 may enable the wireless interface 165, such that the wireless interfaces 115, 165 may exchange data. In the sleep mode, the processor 118 may disable (e.g., implement low power operation in) the wireless interface 115 and the processor 170 may disable the wireless interface 165, such that the wireless interfaces 115, 165 may not consume power or may reduce power consumption. The processors 118, 170 may schedule the wireless interfaces 115, 165 to switch between the sleep mode and the wake up mode periodically every frame time (e.g., 11 ms or 16 ms). For example, the wireless interfaces 115, 165 may operate in the wake up mode for 2 ms of the frame time, and the wireless interfaces 115, 165 may operate in the sleep mode for the remainder (e.g., 9 ms) of the frame time. By disabling the wireless interfaces 115, 165 in the sleep mode, power consumption of the computing device 110 and the HWD 150 can be reduced.
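
The frame-synchronized duty cycling described above could be sketched as follows. The 11 ms frame and ~2 ms wake window come from the text; the loop structure and the wake_fn/sleep_fn callbacks are illustrative assumptions.

```python
import time

FRAME_TIME_S = 0.011   # 11 ms frame
WAKE_TIME_S = 0.002    # radio awake for ~2 ms per frame

def radio_duty_cycle(frames, wake_fn, sleep_fn):
    """Toggle the wireless interface between wake and sleep once per frame."""
    for _ in range(frames):
        wake_fn()                              # enable the wireless interface, exchange data
        time.sleep(WAKE_TIME_S)
        sleep_fn()                             # disable / low-power the wireless interface
        time.sleep(FRAME_TIME_S - WAKE_TIME_S) # remaining ~9 ms of the frame

# Example with print-based stand-ins for the real enable/disable calls.
radio_duty_cycle(3, wake_fn=lambda: print("wake"), sleep_fn=lambda: print("sleep"))
```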

Systems and Methods for Ultra-Wideband Devices

In various embodiments, the devices in the environments described above may operate or otherwise use components which leverage communications in the ultra-wideband (UWB) spectrum. In various embodiments, UWB devices operate in the 3-10 GHz unlicensed spectrum using 500+ MHz channels which may require low power for transmission. For example, the transmit power spectral density (PSD) for some systems may be limited to −41.3 dBm/MHz. On the other hand, UWB may have transmit PSD values in the range of −5 to +5 dBm/MHz, averaged over 1 ms, with a peak power limit of 0 dBm in a given 50 MHz band. Using simple modulation and spread spectrum, UWB devices may achieve reasonable resistance to Wi-Fi and Bluetooth interference (as well as resistance to interference with other UWB devices located in the environment) for very low data rates (e.g., 10s to 100s Kbps) and may have large processing gains. However, for higher data rates (e.g., several Mbps), the processing gains may not be sufficient to overcome co-channel interference from Wi-Fi or Bluetooth. According to the embodiments described herein, the systems and methods described herein may operate in frequency bands that do not overlap with Wi-Fi and Bluetooth, but may have good global availability based on regulatory requirements. Since regulatory requirements make the 7-8 GHz spectrum the most widely available globally (and Wi-Fi is not present in this spectrum), the 7-8 GHz spectrum may operate satisfactorily based on both co-channel interference and processing gains.

Some implementations of UWB may focus on precision ranging, security, and low-to-moderate rate data communication. As UWB employs relatively simple modulation, it may be implemented at low cost and low power consumption. In AR/VR applications (or in other applications and use cases), link budget calculations for an AR/VR controller link indicate that the systems and methods described herein may be configured for effective data throughput ranging from ~2 to 31 Mbps (e.g., with 31 Mbps being the maximum possible rate in the latest 802.15.4z standard), which may depend on body loss assumptions.

Referring now to FIG. 3, depicted is a block diagram of an artificial reality environment 300. The artificial reality environment 300 is shown to include a first device 302 and one or more peripheral devices 304(1)-304(N) (also referred to as “peripheral device 304” or “device 304”). The first device 302 and peripheral device(s) 304 may each include a communication device 306 including a plurality of UWB devices 308. A set of UWB devices 308 may be spatially positioned/located (e.g., spaced out) relative to each other on different locations on/in the first device 302 or the peripheral device 304, so as to maximize UWB coverage and/or to enhance/enable specific functionalities. The UWB devices 308 may be or include antennas, sensors, or other devices and components designed or implemented to transmit and receive data or signals in the UWB spectrum (e.g., between 3.1 GHz and 10.6 GHz) and/or using UWB communication protocol. In some embodiments, one or more of the devices 302, 304 may include various processing engines 310. The processing engines 310 may be or include any device, component, machine, or other combination of hardware and software designed or implemented to control the devices 302, 304 based on UWB signals transmitted and/or received by the respective UWB devices 308.

As noted above, the environment 300 may include a first device 302. The first device 302 may be or include a wearable device, such as the HWD 150 described above, a smart watch, AR glasses, or the like. In some embodiments, the first device 302 may include a mobile device (e.g., a smart phone, tablet, console device, or other computing device). The first device 302 may be communicably coupled with various other devices 304 located in the environment 300. For example, the first device 302 may be communicably coupled to one or more of the peripheral devices 304 located in the environment 300. The peripheral devices 304 may be or include the computing device 110 described above, a device similar to the first device 302 (e.g., a HWD 150, a smart watch, mobile device, etc.), an automobile or other vehicle, a beacon transmitting device located in the environment 300, a smart home device (e.g., a smart television, a digital assistant device, a smart speaker, etc.), a smart tag configured for positioning on various devices, etc. In some embodiments, the first device 302 may be associated with a first entity or user and the peripheral devices 304 may be associated with a second entity or user (e.g., a separate member of a household, or a person/entity unrelated to the first entity).

In some embodiments, the first device 302 may be communicably coupled with the peripheral device(s) 304 following a pairing or handshaking process. For example, the first device 302 may be configured to exchange handshake packet(s) with the peripheral device(s) 304, to pair (e.g., establish a specific or dedicated connection or link between) the first device 302 and the peripheral device 304. The handshake packet(s) may be exchanged via the UWB devices 308, or via another wireless link 125 (such as one or more of the wireless links 125 described above). Following pairing, the first device 302 and peripheral device(s) 304 may be configured to transmit, receive, or otherwise exchange UWB data or UWB signals using the respective UWB devices 308 on the first device 302 and/or peripheral device 304. In some embodiments, the first device 302 may be configured to establish a communications link with a peripheral device 304 (e.g., without any device pairing). For example, the first device 302 may be configured to detect, monitor, and/or identify peripheral devices 304 located in the environment using UWB signals received from the peripheral devices 304 within a certain distance of the first device 302, by identifying peripheral devices 304 which are connected to a shared Wi-Fi network (e.g., the same Wi-Fi network to which the first device 302 is connected), etc. In these and other embodiments, the first device 302 may be configured to transmit, send, receive, or otherwise exchange UWB data or signals with the peripheral device 304.

Referring now to FIG. 4, depicted is a block diagram of an environment 400 including the first device 302 and a peripheral device 304. The first device 302 and/or the peripheral device 304 may be configured to determine a range (e.g., a spatial distance, separation) between the devices 302, 304. The first device 302 may be configured to send, broadcast, or otherwise transmit a UWB signal (e.g., a challenge signal). The first device 302 may transmit the UWB signal using one of the UWB devices 308 of the communication device 306 on the first device 302. The UWB device 308 may transmit the UWB signal in the UWB spectrum. The UWB signal may have a high bandwidth (e.g., 500 MHz). As such, the UWB device 308 may be configured to transmit the UWB signal in the UWB spectrum (e.g., between 3.1 GHz and 10.6 GHz) with a high bandwidth (e.g., 500 MHz). The UWB signal from the first device 302 may be detectable by other devices within a certain range of the first device 302 (e.g., devices having a line of sight (LOS) within 200 m of the first device 302). As such, the UWB signal may be more accurate for detecting range between devices than other types of signals or ranging technology.

The peripheral device 304 may be configured to receive or otherwise detect the UWB signal from the first device 302. The peripheral device 304 may be configured to receive the UWB signal from the first device 302 via one of the UWB devices 308 on the peripheral device 304. The peripheral device 304 may be configured to broadcast, send, or otherwise transmit a UWB response signal responsive to detecting the UWB signal from the first device 302. The peripheral device 304 may be configured to transmit the UWB response signal using one of the UWB devices 308 of the communication device 306 on the peripheral device 304. The UWB response signal may be similar to the UWB signal sent from the first device 302.

The first device 302 may be configured to detect, compute, calculate, or otherwise determine a time of flight (TOF) based on the UWB signal and the UWB response signal. The TOF may be a time or duration between a time in which a signal (e.g., the UWB signal) is transmitted by the first device 302 and a time in which the signal is received by the peripheral device 304. The first device 302 and/or the peripheral device 304 may be configured to determine the TOF based on timestamps corresponding to the UWB signal. For example, the first device 302 and/or peripheral device 304 may be configured to exchange transmit and receive timestamps based on when the first device 302 transmits the UWB signal (a first TX timestamp), when the peripheral device receives the UWB signal (e.g., a first RX timestamp), when the peripheral device sends the UWB response signal (e.g., a second TX timestamp), and when the first device 302 receives the UWB response signal (e.g., a second RX timestamp). The first device 302 and/or the peripheral device 304 may be configured to determine the TOF based on a first time in which the first device 302 sent the UWB signal and a second time in which the first device 302 received the UWB response signal (e.g., from the peripheral device 304), as indicated by first and second TX and RX timestamps identified above. The first device 302 may be configured to determine or calculate the TOF between the first device 302 and the peripheral device 304 based on a difference between the first time and the second time (e.g., divided by two).

In some embodiments, the first device 302 may be configured to determine the range (or distance) between the first device 302 and the peripheral device 304 based on the TOF. For example, the first device 302 may be configured to compute the range or distance between the first device 302 and the peripheral device 304 by multiplying the TOF and the speed of light (e.g., TOF×c). In some embodiments, the peripheral device 304 (or another device in the environment 400) may be configured to compute the range or distance between the first device 302 and peripheral device 304. For example, the first device 302 may be configured to transmit, send, or otherwise provide the TOF to the peripheral device 304 (or other device), and the peripheral device 304 (or other device) may be configured to compute the range between the first device 302 and peripheral device 304 based on the TOF, as described above.
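
A minimal sketch of the timestamp arithmetic follows, using the reply-delay-corrected form of two-way ranging. The variable names and example numbers are illustrative assumptions consistent with the TX/RX timestamps described above.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0

def time_of_flight_s(tx1_s, rx1_s, tx2_s, rx2_s):
    """One-way flight time: the measured round trip minus the peer's reply delay, halved."""
    round_trip = rx2_s - tx1_s      # measured at the first device
    reply_delay = tx2_s - rx1_s     # measured at the peripheral device
    return (round_trip - reply_delay) / 2.0

def range_m(tof_s):
    """Range is the one-way time of flight multiplied by the speed of light (TOF x c)."""
    return tof_s * SPEED_OF_LIGHT_M_S

# Example: a ~10 ns one-way flight time corresponds to roughly 3 meters.
tof = time_of_flight_s(tx1_s=0.0, rx1_s=10e-9, tx2_s=210e-9, rx2_s=220e-9)
print(range_m(tof))   # ~3.0 m
```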

Referring now to FIG. 5, depicted is a block diagram of an environment 500 including the first device 302 and a peripheral device 304. In some embodiments, the first device 302 and/or the peripheral device 304 may be configured to determine a position or pose (e.g., orientation) of the first device 302 relative to the peripheral device 304. The first device 302 and/or the peripheral device 304 may be configured to determine the relative position or orientation in a manner similar to determining the range as described above. For example, the first device 302 and/or the peripheral device 304 may be configured to determine a plurality of ranges (e.g., range(1), range(2), and range(3)) between the respective UWB devices 308 of the first device 302 and the peripheral device 304. In the environment 500 of FIG. 5, the first device 302 is positioned or oriented at an angle relative to the peripheral device 304. The first device 302 may be configured to compute the first range (range(1)) between central UWB devices 308(2), 308(5) of the first and peripheral device 304. The first range may be an absolute range or distance between the devices 302, 304, and may be computed as described above with respect to FIG. 4.

The first device 302 and/or the peripheral device 304 may be configured to compute the second range(2) and third range(3) similar to computing the range(1). In some embodiments, the first device 302 and/or the peripheral device 304 may be configured to determine additional ranges, such as a range between UWB device 308(1) of the first device 302 and UWB device 308(5) of the peripheral device 304, a range between UWB device 308(2) of the first device 302 and UWB device 308(6) of the peripheral device 304, and so forth. While described above as determining a range based on additional UWB signals, it is noted that, in some embodiments, the first device 302 and/or the peripheral device 304 may be configured to determine a phase difference between a UWB signal received at a first UWB device 308 and a second UWB device 308 (i.e., the same UWB signal received at separate UWB devices 308 on the same device 302, 304). The first device 302 and/or the peripheral device 304 may be configured to use each or a subset of the computed ranges (or phase differences) to determine the pose, position, orientation, etc. of the first device 302 relative to the peripheral device 304. For example, the first device 302 and/or the peripheral device 304 may be configured to use one of the ranges relative to the first range(1) (or phase differences) to determine a yaw of the first device 302 relative to the peripheral device 304, another one of the ranges relative to the first range(1) (or phase differences) to determine a pitch of the first device 302 relative to the peripheral device 304, another one of the ranges relative to the first range(1) (or phase differences) to determine a roll of the first device 302 relative to the peripheral device 304, and so forth.
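
As one hedged illustration of how a range difference can map to a relative angle, the sketch below uses a far-field (plane-wave) approximation across a two-antenna baseline. The baseline length, the function name, and the arcsin model are assumptions for illustration, not details from the text.

```python
import math

def relative_angle_deg(range_a_m, range_b_m, baseline_m):
    """Angle of arrival estimated from the path-length difference across a two-antenna baseline."""
    delta = range_a_m - range_b_m
    # Clamp to a valid arcsin argument in case measurement noise pushes the ratio past +/-1.
    ratio = max(-1.0, min(1.0, delta / baseline_m))
    return math.degrees(math.asin(ratio))

# Example: a 1.7 cm path-length difference across a 10 cm baseline -> roughly 10 degrees of yaw.
print(relative_angle_deg(2.517, 2.500, 0.10))
```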

By using the UWB devices 308 at the first device 302 and peripheral devices 304, the range and pose may be determined with greater accuracy than other ranging/wireless link technologies. For example, the range may be determined within a granularity or range of +/−0.1 meters, and the pose/orientation may be determined within a granularity or range of +/−5 degrees.

Referring to FIG. 3-FIG. 5, in some embodiments, the first device 302 may include various sensors and/or sensing systems. For example, the first device 302 may include an inertial measurement unit (IMU) sensor 312, global positioning system (GPS) 314, etc. The sensors and/or sensing systems, such as the IMU sensor 312 and/or GPS 314 may be configured to generate data corresponding to the first device 302. For example, the IMU sensor 312 may be configured to generate data corresponding to an absolute position and/or pose of the first device 302. Similarly, the GPS 314 may be configured to generate data corresponding to an absolute location/position of the first device 302. The data from the IMU sensor 312 and/or GPS 314 may be used in conjunction with the ranging/position data determined via the UWB devices 308 as described above. In some embodiments, the first device 302 may include a display 316. The display 316 may be integrated or otherwise incorporated in the first device 302. In some embodiments, the display 316 may be separate or remote from the first device 302. The display 316 may be configured to display, render, or otherwise provide visual information to a user or wearer of the first device 302, which may be rendered based at least in part on the ranging/position data of the first device 302.

One or more of the devices 302, 304 may include various processing engine(s) 310. As noted above, the processing engine(s) 310 may be or include any device, component, machine, or combination of hardware and software designed or implemented to control the devices 302, 304 based on UWB signals transmitted and/or received by the respective UWB devices 308. In some embodiments, the processing engine(s) 310 may be configured to compute or otherwise determine the ranges/positions of the first device 302 relative to the peripheral devices 304 as described above. In some embodiments, the processing engines 310 may be located or embodied on another device in the environment 300, 500 (such as at the access point 105 as described above with respect to FIG. 1). As such, the first device 302 and/or peripheral devices 304 may be configured to off-load computation to another device in the environment 300, 500 (such as the access point 105). In some embodiments, the processing engines 310 may be configured to perform various functions and computations relating to radio transmissions and scheduling (e.g., via the UWB devices 308 and/or other communication interface components), compute or otherwise determine range and relative position of the devices 302, 304, manage data exchanged between the devices 302, 304, interface with external components (such as hardware components in the environment 300, 500, external software or applications, etc.), and the like. Various examples of functions and computations which may be performed by the processing engine(s) 310 are described in greater detail below.

Various operations described herein can be implemented on computer systems. FIG. 6 shows a block diagram of a representative computing system 614 usable to implement the present disclosure. In some embodiments, the computing device 110, the HWD 150, devices 302, 304, or each of the components of FIG. 15 are implemented by or may otherwise include one or more components of the computing system 614. Computing system 614 can be implemented, for example, as a consumer device such as a smartphone, other mobile phone, tablet computer, wearable computing device (e.g., smart watch, eyeglasses, head wearable display), desktop computer, laptop computer, or implemented with distributed computing devices. The computing system 614 can be implemented to provide a VR, AR, or MR experience. In some embodiments, the computing system 614 can include conventional computer components such as processors 616, storage device 618, network interface 620, user input device 622, and user output device 624.

Network interface 620 can provide a connection to a wide area network (e.g., the Internet) to which WAN interface of a remote server system is also connected. Network interface 620 can include a wired interface (e.g., Ethernet) and/or a wireless interface implementing various RF data communication standards such as Wi-Fi, Bluetooth, UWB, or cellular data network standards (e.g., 3G, 4G, 5G, 60 GHz, LTE, etc.).

User input device 622 can include any device (or devices) via which a user can provide signals to computing system 614; computing system 614 can interpret the signals as indicative of particular user requests or information. User input device 622 can include any or all of a keyboard, touch pad, touch screen, mouse or other pointing device, scroll wheel, click wheel, dial, button, switch, keypad, microphone, sensors (e.g., a motion sensor, an eye tracking sensor, etc.), and so on.

User output device 624 can include any device via which computing system 614 can provide information to a user. For example, user output device 624 can include a display to display images generated by or delivered to computing system 614. The display can incorporate various image generation technologies, e.g., a liquid crystal display (LCD), light-emitting diode (LED) including organic light-emitting diodes (OLED), projection system, cathode ray tube (CRT), or the like, together with supporting electronics (e.g., digital-to-analog or analog-to-digital converters, signal processors, or the like). A device such as a touchscreen that functions as both an input and an output device can be used. Output devices 624 can be provided in addition to or instead of a display. Examples include indicator lights, speakers, tactile “display” devices, printers, and so on.

Some implementations include electronic components, such as microprocessors, storage and memory that store computer program instructions in a computer readable storage medium (e.g., non-transitory computer readable medium). Many of the features described in this specification can be implemented as processes that are specified as a set of program instructions encoded on a computer readable storage medium. When these program instructions are executed by one or more processors, they cause the processors to perform various operations indicated in the program instructions. Examples of program instructions or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter. Through suitable programming, processor 616 can provide various functionality for computing system 614, including any of the functionality described herein as being performed by a server or client, or other functionality associated with message management services.

It will be appreciated that computing system 614 is illustrative and that variations and modifications are possible. Computer systems used in connection with the present disclosure can have other capabilities not specifically described here. Further, while computing system 614 is described with reference to particular blocks, it is to be understood that these blocks are defined for convenience of description and are not intended to imply a particular physical arrangement of component parts. For instance, different blocks can be located in the same facility, in the same server rack, or on the same motherboard. Further, the blocks need not correspond to physically distinct components. Blocks can be configured to perform various operations, e.g., by programming a processor or providing appropriate control circuitry, and various blocks might or might not be reconfigurable depending on how the initial configuration is obtained. Implementations of the present disclosure can be realized in a variety of apparatus including electronic devices implemented using any combination of circuitry and software.

Systems and Methods for Preambles for UWB Transmissions

In various implementations, devices 302, 304 may leverage UWB devices or antennas 308 to exchange data communications. In some systems and methods, to exchange a data communication, a device may incorporate a preamble into the transmission frame or signal. The preamble may identify, signify, relate to, or otherwise be linked to a particular data channel or linkage between the devices. For example, two devices exchanging communications may use preambles which are known to the respective devices. When one of the devices receives a transmission frame or signal, the device may extract or otherwise identify the preamble from the signal. Upon determining that the preamble matches the known preamble of the other device, the device may determine that the device is the intended recipient of the signal and can parse the body of the signal to extract its data/contents. In some instances, multiple devices may be located within a given environment. Such devices may communicate with other devices in the environment using different protocols including, for instance, UWB, WIFI, Bluetooth, etc. As part of multiple devices being co-located in an environment, interference or cross-talk communication metrics, such as autocorrelation and cross-correlation, become important.

According to the systems and methods of the present solution, a device (such as device 302, 304) may be configured to select a first preamble code of a plurality of preamble codes for a data transmission to be sent via a UWB antenna or device 308 to a second device. The preamble codes may have a sidelobe suppression ratio/metric (e.g., an interference or cross-talk communication metric, such as or incorporating autocorrelation and/or cross-correlation metric(s)) of at least some threshold (such as 12 dB) with respect to another preamble code. The device may transmit the data transmission including the first preamble code with the UWB antenna to the second device. Rather than performing a trial-and-error or guess-and-check process of selecting preamble codes, the systems and methods described herein may perform a more targeted selection by determining suppression ratios of a selected preamble code with respect to other preamble codes. For instance, the device may determine that another device in the environment is using a particular preamble code, and select another preamble code which satisfies a suppression criteria or threshold with respect to the particular preamble code.

Referring now to FIG. 7, depicted is a system 700 for selecting preamble codes for UWB transmissions according to an example implementation of the present disclosure. The system 700 is shown to include the first device 302 and peripheral devices 304 described above with reference to FIG. 1A-FIG. 5. The devices 302, 304 may include the processing engine(s) 310, communications device 306, and other components/elements/hardware described above with reference to FIG. 1A-FIG. 5. The processing engine(s) 310 may include, for instance, a preamble selection engine 702 and a preamble detection engine 704. As described in greater detail below, the preamble selection engine 702 may be configured to select a preamble code from a plurality of preamble codes for a data transmission sent via the UWB device or antenna 308 (referred to hereinafter as “UWB antenna 308”) to another device in the environment. The preamble codes may have a sidelobe suppression ratio of at least 12 dB with respect to another preamble code. The device 302 (e.g., the communication device 306) may be configured to transmit a data transmission including the selected preamble code via the UWB antenna 308 to the other device.

As shown in FIG. 7, the system 700 may include several devices 302, 304 within the environment (e.g., the first device 302 and various peripheral devices 304). Some of the peripheral devices 304 may communicate using a protocol which is different (e.g., separate) from the UWB protocol. For example, the first peripheral device 304(1) may communicate with the second peripheral device 304(2) via Bluetooth, WIFI, etc. As part of such communications, the first peripheral device 304(1) and second peripheral devices 304(2) may be configured to establish, set, determine, or otherwise define a preamble for data transmissions sent by the respective devices 304(1), 304(2). While shown as two peripheral devices 304, it is noted that the environment may include any number of peripheral devices 304, some of which may communicate via UWB while others may communicate via other communications protocols.

The processing engines 310 may include a preamble selection engine 702. The preamble selection engine 702 may be or include any device, component, element, or combination of hardware configured to select a preamble code for incorporation into or use in a data transmission to be sent to another device 302, 304 in the environment. In some embodiments, the preamble selection engine 702 may be configured to select the preamble code from a preamble code table 706. The preamble code table 706 may be or include various preamble code tables described below with reference to FIG. 8A-FIG. 15B. In some embodiments, the preamble code table(s) 706 may be deployed, installed, or otherwise accessed locally at the device 302. In some embodiments, the preamble code table(s) 706 may be accessed from a remote data structure (e.g., the preamble code table(s) 706 may be stored or maintained at a remote data structure and accessed by the preamble selection engine 702).

As a general overview, the preamble code tables 706 may include preamble codes which have various metrics relating to autocorrelation and/or cross-correlation sidelobe suppression ratios. The sidelobe suppression ratios may be or include a metric or ratio indicating a degree or amount of signal radiation/interference from the UWB antenna 308 which is not in the direction of the main lobe towards the target (e.g., towards the other device). As such, as the sidelobe suppression ratio increases, the degree or amount of signal radiation from the UWB antenna 308 outside of the direction of the main lobe correspondingly decreases. The sidelobe suppression ratio may be computed as

20*log10(autocorrelation peak / cross-correlation peak).

The autocorrelation sidelobe suppression ratio may be indicative of an estimation or likelihood of data transmissions occurring on the same channel (e.g., as opposed to a different channel) over successive transmissions. The cross-correlation sidelobe suppression ratio may be indicative of coexistence properties between other devices within the environment. Having a good autocorrelation sidelobe suppression ratio and/or cross-correlation sidelobe suppression ratio may provide better channel estimation and better coexistence properties with other high-speed UWB-compatible devices (including those with high rate pulse repetition frequency and low pulse repetition frequency, or other legacy UWB devices).
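
A minimal sketch of how these ratios could be computed for candidate codes, assuming periodic (circular) correlation and randomly generated +/-1 sequences rather than the codes in the tables of FIG. 8A-FIG. 11B:

    import numpy as np

    def circular_correlation(a, b):
        """Periodic (circular) correlation of two equal-length codes."""
        return np.fft.ifft(np.fft.fft(a) * np.conj(np.fft.fft(b)))

    def autocorr_suppression_db(code):
        """20*log10(autocorrelation peak / largest autocorrelation sidelobe)."""
        corr = np.abs(circular_correlation(code, code))
        return 20.0 * np.log10(corr[0] / corr[1:].max())

    def crosscorr_suppression_db(code_a, code_b):
        """20*log10(autocorrelation peak / largest cross-correlation peak)."""
        auto_peak = np.abs(circular_correlation(code_a, code_a))[0]
        cross_peak = np.abs(circular_correlation(code_a, code_b)).max()
        return 20.0 * np.log10(auto_peak / cross_peak)

    # Randomly generated +/-1 codes of length 255 (not the codes from the tables).
    rng = np.random.default_rng(0)
    code_a = rng.choice([1.0, -1.0], size=255)
    code_b = rng.choice([1.0, -1.0], size=255)
    print(autocorr_suppression_db(code_a), crosscorr_suppression_db(code_a, code_b))

For an ideal length-255 m-sequence under this periodic definition, the autocorrelation figure approaches 20*log10(255), roughly 48 dB.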

The preamble selection engine 702 may be configured to select a preamble code from the preamble code table 706. In some embodiments, the preamble selection engine 702 may be configured to execute one or more algorithms or routines for selecting the preamble code from the preamble code table 706. In some embodiments, the preamble selection engine 702 may be configured to select an initial preamble code according to a default setting or selection rule. The preamble selection engine 702 may be configured to select additional or alternative preamble codes based on codes detected or determined to be in use within the environment, as described in greater detail below.

In some embodiments, the preamble selection engine 702 may be configured to negotiate the selection of the preamble code with another device (such as the peripheral device 304(N)) to which the device 302 is to send a data transmission. For example, as part of pairing with another device, or establishing, determining, or negotiating a channel or link between two devices 302, 304, the devices 302, 304 may be configured to select and share preamble codes with each other. The devices 302, 304 may be configured to select preamble codes which correspond to each other from the preamble code tables 706. The devices 302, 304 may be configured to identify from predefined/preconfigured preamble codes, or exchange and store preamble codes of other devices 304, 302 as part of establishing the session, link, or channel between the devices 302, 304. The devices 302, 304 may be configured to exchange preamble codes such that, upon receiving a subsequent data transmission with the preamble code, the devices 302, 304 may be configured to determine that they are the intended recipient of the data transmission (e.g., based on the preamble code matching a stored preamble code).

The processing engines 310 may include a preamble detection engine 704. The preamble detection engine 704 may be or include any device, component, element, or combination of hardware configured to detect a preamble code in use in a data transmission by another device 302, 304 in the environment. In some embodiments, the preamble detection engine 704 may be configured to detect a preamble code from a data transmission sent by another device 302, 304 with a different device (e.g., other than the detecting device) as an intended recipient. For example, the preamble detection engine 704 of the first device 302 may be configured to detect, identify, or otherwise determine that a preamble code is being used by the first peripheral device 304(1) for transmissions sent to the second peripheral device 304(2). The preamble detection engine 704 may be configured to determine that the preamble code is being used by the first peripheral device 304(1) based on receiving the transmission sent by the first peripheral device 304(1). The preamble detection engine 704 may be configured to extract, identify, or otherwise detect the preamble code from the transmission sent by the first peripheral device 304(1) (e.g., without parsing or inspecting the entirety of the transmission).
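
One plausible way to realize such detection in software is to correlate the received preamble symbols against a codebook of known codes and accept the best match above a threshold; the codebook contents, threshold, and function names below are assumptions for illustration, not the patent's implementation:

    import numpy as np

    def detect_preamble(received, codebook, threshold=0.8):
        """Return the name of the known preamble code that best matches the
        received preamble symbols, or None if no match clears the threshold."""
        best_name, best_score = None, 0.0
        for name, code in codebook.items():
            # Normalized zero-lag correlation magnitude between the stored
            # code and the received symbols.
            score = abs(np.vdot(code, received)) / (
                np.linalg.norm(code) * np.linalg.norm(received))
            if score > best_score:
                best_name, best_score = name, score
        return best_name if best_score >= threshold else None

    # Toy codebook of short +/-1 codes, purely illustrative.
    codebook = {"code_a": np.array([1, -1, 1, 1, -1, 1, -1, -1]),
                "code_b": np.array([1, 1, -1, 1, 1, -1, -1, 1])}
    received = codebook["code_a"] + 0.2 * np.random.default_rng(0).standard_normal(8)
    print(detect_preamble(received, codebook))  # -> "code_a"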

The preamble selection engine 702 may be configured to select or identify a preamble code based on the detected preamble codes currently in use in the environment. For example, the preamble selection engine 702 may be configured to select a preamble code based on a sidelobe suppression ratio for the selected preamble code with respect to detected preamble codes in the environment. The preamble selection engine 702 may be configured to select the preamble code to have a sidelobe suppression ratio which satisfies a selection threshold criteria with respect to the detected preamble code. The selection threshold criteria may depend on the particular preamble code type used or selected by the preamble selection engine 702. The selection threshold criteria may depend on a length of the preamble code selected by the preamble selection engine 702. These examples are described in greater detail below.

The communication device 306 may be configured to transmit, send, or otherwise provide the data transmission 708 using the selected preamble code. In some embodiments, the communication device 306 may be configured to receive a body of the data transmission 708 from another component or element of the device 302. The communication device 306 may be configured to produce, determine, derive, or otherwise generate the data transmission 708 by incorporating the preamble code selected by the preamble selection engine 702 into the body for the data transmission 708. The communication device 306 may be configured to send, communicate, broadcast, or otherwise transmit the data transmission via the UWB antenna 308 to another device (such as peripheral device 304(N)).

Referring generally to FIG. 8A-FIG. 15B, depicted are various examples of preamble code tables which may be maintained by, incorporated in, or otherwise accessed by the devices 302, 304 described herein. FIG. 8A-FIG. 11B show preamble code tables corresponding to m-sequence preamble codes having different code lengths (e.g., sequence length of 255 for the preamble code tables shown in FIG. 8A-FIG. 8B, sequence length of 511 for the preamble code tables shown in FIG. 9A-FIG. 9B, sequence length of 1023 for the preamble code tables shown in FIG. 10A-FIG. 10B, and sequence length of 2047 for the preamble code tables shown in FIG. 11A-FIG. 11B). The m-sequence preamble codes may have, include, or comprise characters from an alphabet including {1, −1}. The preamble code tables shown in FIG. 8A-FIG. 11B include an autocorrelation sidelobe suppression ratio (shown in FIGS. 8A, 9A, 10A, and 11A) and a cross-correlation sidelobe suppression ratio (shown in FIGS. 8B, 9B, 10B, and 11B) for transmit and receive preamble codes.

FIG. 12A-FIG. 12B and FIG. 14A-FIG. 15B show preamble code tables including derived preamble codes having different sequence lengths (e.g., sequence length of 15 for the preamble code tables shown in FIG. 12A-FIG. 12B, sequence code length of 17 for the preamble code tables shown in FIG. 14A-FIG. 14B, and sequence code length of 19 for the preamble code tables shown in FIG. 15A-FIG. 15B). The derived preamble codes may have, include, or comprise characters from an alphabet including {1, −1, i, −i}. FIG. 13 shows metrics for the preamble code table shown FIG. 12A relating to cross-correlation and autocorrelation sidelobe suppression ratios. While metrics for the preamble codes shown in FIG. 13 are provided, it is noted that similar metrics may also be applicable to the preamble codes shown in FIG. 12B-FIG. 12C. Additionally, better performing metrics may be applicable to the preamble codes shown in FIG. 14A-FIG. 15B, given that these codes have a longer sequence length and therefore may have better performance for autocorrelation and cross-correlation sidelobe suppression ratios. It is noted that, while the preamble codes described herein include m-sequence or derived preamble codes, in various implementations, the preamble selection engine 702 may be configured to apply one or more cyclic shifts (e.g., character shifts) to the preamble codes.

Referring to FIG. 7 and FIG. 8A-FIG. 8B, the preamble selection engine 702 may be configured to access the preamble code tables to select a preamble code to use for transmitting data transmissions. In some embodiments, the preamble selection engine 702 may be configured to access the preamble code tables shown in FIG. 8A-FIG. 8B to select a corresponding m-sequence preamble code having a sequence length of 255. It is noted that different preamble code tables (including those shown in FIG. 9A-FIG. 15B) may be accessed according to different design choices or applications. However, in at least some of these examples, the preamble selection engine 702 may be configured to select preamble codes from the corresponding preamble code table(s) using the same or similar logic/rules/algorithms as set forth herein.

As shown in FIG. 8B, the preamble code table may include groupings of metrics relating to autocorrelation sidelobe suppression ratios for the transmit/receive preamble codes. Specifically, M255_1, M255_2 for both transmit and receive devices may be grouped in one grouping (a first group or subset), M255_3, M255_4 for both transmit and receive devices may be grouped in another grouping (a second subset), M255_5, M255_6 for both transmit and receive devices may be grouped in another grouping (a third subset), and M255_7, M255_8 for both transmit and receive devices may be grouped in yet another grouping (a fourth subset). Similarly, the M255_3, M255_4 transmit may be grouped with the M255_1, M255_2 receive in a secondary grouping (e.g., a fifth subset), and vice versa. Finally, the M255_7, M255_8 transmit may be grouped with the M255_5, M255_6 receive in a secondary grouping (e.g., a sixth subset), and vice versa. As shown within these groupings or subsets, the sidelobe suppression ratio may be greater than 12 dB (e.g., greater than 18.3 dB for primary groupings, or the first, second, third, and fourth subsets, and greater than 12 dB for secondary groupings). Similar groupings may be established or provided for the other m-sequence and/or derived preamble codes, based on respective sidelobe suppression ratios.

The preamble selection engine 702 may be configured to select an initial preamble code based on, using, or according to the preamble code tables. For example, the preamble selection engine 702 may be configured to default to selecting a first preamble code from the first subset or grouping (e.g., M255_1, M255_2 for transmit/receive devices). The preamble selection engine 702 may be configured to share, send, indicate, or otherwise identify the selected preamble code to the intended device recipient (e.g., the N-th peripheral device 304(N) in the example shown in FIG. 7). The preamble selection engine 702 may be configured to identify the selected preamble code to the N-th peripheral device 304(N) as part of negotiating (e.g., configuring or setting up) the session or channel between the devices 302, 304(N). The communication device 306 may be configured to incorporate the selected (e.g., first) preamble code into a data transmission 708 sent to the N-th peripheral device 304(N).

In various instances, other devices 304 within the environment may use preamble codes which are near or adjacent to (e.g., may be within the same grouping or subsets) the selected preamble code. For example, the preamble detection engine 704 may be configured to detect, identify, or otherwise receive a data transmission sent by another device (e.g., peripheral device 304(1)) which is using a preamble code from the first subset (e.g., from the same subset in which the selected preamble code is included). The preamble detection engine 704 may be configured to extract or otherwise identify the preamble code from the data transmission received from the other device 304(1). The preamble detection engine 704 may be configured to determine to switch preamble codes responsive to identifying a preamble code from the data transmission which is included in the same grouping or subset as the first selected preamble code (e.g., the first subset).

The preamble selection engine 702 may be configured to select a different preamble code using the preamble code table responsive to determining that another device is using the same or related preamble code (e.g., a code from the same grouping or subset). In some embodiments, the preamble selection engine 702 may be configured to select a different preamble code from a different grouping or subset. For example, where the other device is using a preamble code from the first subset (e.g., M255_1, M255_2), the preamble selection engine 702 may be configured to select a preamble code from the second subset (e.g., M255_3, M255_4). In some embodiments, the preamble selection engine 702 may be configured to select a different preamble code by applying a selection criteria to the auto- and cross-correlation sidelobe suppression ratios provided in the preamble code tables. For example, the preamble selection engine 702 may be configured to select a different preamble code which includes an autocorrelation sidelobe suppression ratio which is less than a certain threshold dB, and a corresponding cross-correlation sidelobe suppression ratio which is greater than a certain threshold dB. Continuing this example with reference to FIG. 8A-FIG. 8B, the preamble selection engine 702 may be configured to select a preamble code having an autocorrelation sidelobe suppression ratio which is less than 63 dB, and a cross-correlation sidelobe suppression ratio which is greater than 12 dB. In some embodiments, the selection criteria may be based on one (e.g., not both) of the sidelobe suppression ratios. For example, the preamble selection engine 702 may be configured to select a preamble code having an autocorrelation sidelobe suppression ratio which is less than 63 dB, or a cross-correlation sidelobe suppression ratio which is greater than 12 dB. Once the preamble selection engine 702 selects a new preamble code (e.g., and shares the new preamble code with the intended device recipient), the communication device 306 may be configured to incorporate the new preamble code in subsequent data transmissions 708 to the intended recipient device.
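
A hedged sketch of this selection logic is shown below; the table contents, subset labels, and numeric suppression values are placeholders (not the values of FIG. 8A-FIG. 8B), and only the 63 dB/12 dB thresholds mirror the example criteria described above:

    # Hypothetical preamble table: name -> (subset, autocorr dB, cross-corr dB).
    PREAMBLE_TABLE = {
        "M255_1": ("first", 48.1, 18.3), "M255_2": ("first", 48.1, 18.3),
        "M255_3": ("second", 48.1, 18.3), "M255_4": ("second", 48.1, 18.3),
        "M255_5": ("third", 48.1, 18.3), "M255_6": ("third", 48.1, 18.3),
        "M255_7": ("fourth", 48.1, 18.3), "M255_8": ("fourth", 48.1, 18.3),
    }

    def select_preamble(codes_in_use, auto_max_db=63.0, cross_min_db=12.0):
        """Pick a code from a subset that contains none of the detected codes
        and whose suppression ratios satisfy the example selection criteria."""
        busy = {PREAMBLE_TABLE[c][0] for c in codes_in_use if c in PREAMBLE_TABLE}
        for name, (subset, auto_db, cross_db) in PREAMBLE_TABLE.items():
            if subset in busy:
                continue
            if auto_db < auto_max_db and cross_db > cross_min_db:
                return name
        return None  # nothing qualified; renegotiate or fall back to a default

    # A peripheral device is detected using M255_1, so the first subset is avoided.
    print(select_preamble({"M255_1"}))  # -> "M255_3"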

Referring now to FIG. 9A-FIG. 15B, and as noted above, different preamble code tables may be used according to different applications or use cases. For example, and as shown in FIG. 9A-FIG. 9B, some devices 302, 304 may use m-sequence preamble codes having a sequence length of 511. In this example, the devices 302, 304 may select preamble codes having an autocorrelation sidelobe suppression ratio of at least 54 dB (20*log10(511)) and/or a cross-correlation sidelobe suppression ratio which is greater than 19 dB. In the example shown in FIG. 10A-FIG. 10B, some devices 302, 304 may use m-sequence preamble codes having a sequence length of 1023. In this example, the devices 302, 304 may select preamble codes having an autocorrelation sidelobe suppression ratio of at least 60 dB (20*log10(1023)) and/or a cross-correlation sidelobe suppression ratio which is greater than 22 dB. In the example shown in FIG. 11A-FIG. 11B, some devices 302, 304 may use m-sequence preamble codes having a sequence length of 2047. In this example, the devices 302, 304 may select preamble codes having an autocorrelation sidelobe suppression ratio of at least 66 dB (20*log10(2047)) and/or a cross-correlation sidelobe suppression ratio which is greater than 29 dB.

In the example shown in FIG. 12A-FIG. 12C, some devices 302, 304 may use algebraically constructed, determined, or derived preamble codes (shown in the respective tables) having a sequence length of 15. The preamble codes may be determined using an exhaustive search (e.g., an algorithmic and/or artificial-intelligence-based search). Such devices 302, 304 may select sequences grouped together and shown in either white or gray, as these sequences may have sidelobe suppression ratios known to satisfy a threshold criteria (e.g., autocorrelation sidelobe suppression ratios of at least 23 dB and cross-correlation sidelobe suppression ratios of at least 9.5 dB, as shown in FIG. 13). Similarly, in the example shown in FIG. 14A-FIG. 15B, some devices 302, 304 may use derived sequence preamble codes (e.g., shown in the respective tables) having a sequence length of 17 (for FIG. 14A-FIG. 14C) or 19 (for FIG. 15A-FIG. 15B). Such devices 302, 304 may select sequences grouped together and shown in either white or gray. While these sequences are provided, it is noted that similar sequences with equivalent properties (e.g., autocorrelation and/or cross-correlation properties) may be obtained by performing a circular shift (e.g., shifting the sequence by N-number of elements or characters) or by multiplying the sequence by a scalar (e.g., 1, −1, i, −i, etc.).
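
The claim that circular shifts and unit-magnitude scalar multiplication leave the autocorrelation magnitude profile (and hence the suppression ratios) unchanged can be checked numerically; the sketch below uses a randomly generated length-15 code over {1, −1, i, −i} as a stand-in for the tabulated sequences:

    import numpy as np

    def circ_autocorr_mag(code):
        """Magnitude of the periodic (circular) autocorrelation of a code."""
        spectrum = np.fft.fft(code)
        return np.abs(np.fft.ifft(spectrum * np.conj(spectrum)))

    # Illustrative length-15 code over {1, -1, i, -i}; not from the patent tables.
    rng = np.random.default_rng(1)
    code = rng.choice(np.array([1, -1, 1j, -1j]), size=15)

    shifted = np.roll(code, 3)   # circular shift by three characters
    scaled = 1j * code           # multiply by a unit-magnitude scalar

    assert np.allclose(circ_autocorr_mag(code), circ_autocorr_mag(shifted))
    assert np.allclose(circ_autocorr_mag(code), circ_autocorr_mag(scaled))
    print("autocorrelation magnitudes are unchanged by the shift and scaling")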

In each of these examples, the devices 302, 304 may be configured to select and/or switch between preamble codes based on or according to sidelobe suppression ratios reflected in or otherwise included in the preamble code tables as described herein. The devices 302, 304 may be configured to share their respective preamble codes with an intended recipient, and incorporate or otherwise provide the selected preamble code into data transmissions sent to the intended recipient. The intended recipient device may be configured to receive the data transmission, extract or identify the preamble code from the data transmission, and determine that it is the intended recipient of the data transmission responsive to the preamble code matching a shared preamble code.

Referring now to FIG. 16, depicted is a flowchart showing a method 1600 for selecting preamble codes for UWB transmissions, according to an example implementation of the present disclosure. The method 1600 may be performed by the devices 302, 304 described above with reference to FIG. 1A-FIG. 7, and using one or more of the tables described above with reference to FIG. 8A-FIG. 15B. As a brief overview, at step 1602, a device selects a preamble code. At step 1604, the device transmits a data transmission using the preamble code. At step 1606, the device determines whether another preamble code has been identified. At step 1608, the device determines whether the preamble codes are related. At step 1610, the device selects a different preamble code based on a sidelobe suppression ratio.

At step 1602, a device selects a preamble code. In some embodiments, the device selects a first preamble code of a plurality of preamble codes for a data transmission sent via one or more ultra-wideband (UWB) antenna(s) to a second device. The plurality of preamble codes (e.g., from which the device selects the preamble code) may have a sidelobe suppression ratio of at least 12 dB with respect to another one of the plurality of preamble codes. In some embodiments, the device may select the preamble code at step 1602 responsive to or as part of negotiating or establishing a session with the second device. For example, the device may determine to establish a connection, channel, or session with the second device (e.g., responsive to the devices being in range of each other, responsive to a user input to a respective device that triggers establishing the session, etc.). As part of establishing the session, the devices may select respective preamble codes to use for data transmissions sent between the devices. The preamble codes may be known to the respective devices and used to determine that a respective device is an intended recipient of a particular data transmission. For example, upon receiving a given data transmission, a device may extract the preamble code from the data transmission and determine whether the preamble code matches any particular preamble code previously shared by another device as part of negotiating or establishing the session. Responsive to the preamble code matching a known preamble code, the device may parse, analyze, or inspect the body of the data transmission. On the other hand, where the preamble code does not match a known preamble code, the device may discard, ignore, or otherwise disregard the data transmission.

In some embodiments, each of the preamble codes may have the same sequence length. For example, the preamble codes may include m-sequence preamble codes (e.g., as described above with respect to FIG. 8A-FIG. 11B). As another example, the preamble codes may include derived preamble codes (e.g., as described above with respect to FIG. 12A-FIG. 15B). However, each of the preamble codes from which a particular device selects a preamble code may be the same sequence length. For instance, the sequence lengths may be or include 15, 17, 19, 63, 255, 511, 1023, or 2047 characters. The characters may include two or four characters (e.g., depending on the particular preamble code type). For example, the characters may include {0, 1, −1, i, −i}, where i is equal to the square root of (−1). The m-sequence preamble codes may include {0, 1, −1} characters. In some embodiments, the m-sequence preamble codes may include {1, −1} characters. In some embodiments, the derived preamble codes may include {1, −1, i, −i} characters. In some embodiments, the preamble codes may be shifted by one or more characters (e.g., to shift each of the characters within the preamble code by one or more characters, where the first is now the second character, the second is now the third character and so forth, until the last character is now the first character, to provide one example).

In some embodiments, some of the preamble codes may be grouped into subsets or groups (e.g., to assign to groups of devices operating in different and/or partially-overlapping spaces). For example, each of a group of preamble codes having the same sequence length may be grouped into a first plurality or subset of preamble codes and a second plurality or subset of preamble codes. The first and second subsets may be distinct from one another (e.g., such that preamble codes of the first subset are not included in the preamble codes of the second subset). Additionally, at least one preamble code from a given subset may have a sidelobe suppression ratio of at least a threshold dB (e.g., 12 dB for sequence length of 255, for instance) with respect to another preamble code of the subset. Such implementations may provide for more efficient selection of preamble codes which satisfy a selection criteria based on sidelobe suppression ratios, rather than trial-and-error or guess-and-check approaches that may be used in other systems and methods.

At step 1604, the device transmits a data transmission using the preamble code. In some embodiments, the device may transmit the data transmission including the first preamble code (e.g., selected at step 1602) via the UWB antenna(s) to the second device. The device may transmit the data transmission responsive to the device selecting the preamble code. In some embodiments, the device may transmit the data transmission including the preamble code responsive to receiving an acknowledgement from the second device of the selected preamble code. The data transmission may include the preamble code and a body including the data for transmission to the second device. The device may transmit the data transmission responsive to receiving the data to include in the body and responsive to selecting the preamble code.

At step 1606, the device determines whether another preamble code has been identified. In some embodiments, the device may receive another data transmission from a third device. The data transmission may be sent by or originate from a third device separate from the first and second device. The third device may be transmitting the data transmission to a different device other than the first or second device. The device may detect, intercept, or receive the data transmission from the third device. The device may extract or identify the preamble code from the data transmission. The device may identify the preamble code from the data transmission without analyzing or inspecting the body of the data from the transmission. The device may identify the preamble code to determine whether the device is the intended recipient of the data transmission. The device may determine that the device is not the intended recipient responsive to the preamble code not matching a known preamble code. Where the device does not identify additional preamble codes from data transmissions, the method 1600 may proceed back to step 1604, where the device generates subsequent data transmission(s) using the preamble code selected at step 1602. However, where the device does identify another preamble code, the method 1600 may proceed to step 1608.

At step 1608, the device determines whether the preamble codes are related. More specifically, the device may determine whether the preamble code selected at step 1602 is related to the preamble code identified at step 1606. The device may determine that the preamble codes are related responsive to the preamble codes matching and/or a determination of whether the preamble codes belong to a same group or different groups (e.g., of the same code length). In other words, the device may determine that another device in the environment is using the preamble code selected at step 1602, or is using a preamble code related via a same group/subset or in a counterpart group/subset (e.g., of the same code length). In some embodiments, the device may determine that the preamble codes are related responsive to the preamble code identified at step 1606 being in the same grouping or subset of preamble codes in which the preamble code selected at step 1602 is included. Where the device determines that the preamble codes are related, the method 1600 may proceed to step 1610. However, where the device determines that the preamble codes are not related, the method 1600 may proceed back to step 1604.

At step 1610, the device selects a different preamble code (e.g., of the same group or a different group) based on a sidelobe suppression ratio. In some embodiments, the device may select the different preamble code for subsequent data transmissions. The device may select the different preamble code responsive to identifying another preamble code being used in the environment by a different device and responsive to the preamble code being related to (e.g., the same as and/or grouped with) the preamble code previously selected and/or used by the device.

In some embodiments, responsive to determining to select a different preamble code, the device may identify a grouping, plurality, or subset of preamble codes. The device may identify the subset of preamble codes from which to select the different preamble code. The device may identify the subset based on the subset not including/having the preamble code selected at step 1602 and/or identified at step 1606, and/or based on a distance of the other device from the present device. The device may identify the subset of preamble codes based on at least one of the preamble codes having a sidelobe suppression ratio (e.g., autocorrelation and/or cross-correlation sidelobe suppression ratio) which satisfies a selection criteria from another preamble code of the subset. For instance, the device may identify the subset of preamble codes based on at least one preamble code having a sidelobe suppression ratio of at least 12 dB with respect to another preamble code of the subset. The device may select the different preamble code from the subset for the subsequent data transmissions. The device may share the selected preamble code (e.g., at step 1610) with the second device such that, upon the second device receiving subsequent transmissions, the second device may determine (e.g., using the preamble code) that the second device is the intended recipient of the subsequent transmissions.

Having now described some illustrative implementations, it is apparent that the foregoing is illustrative and not limiting, having been presented by way of example. In particular, although many of the examples presented herein involve specific combinations of method acts or system elements, those acts and those elements can be combined in other ways to accomplish the same objectives. Acts, elements and features discussed in connection with one implementation are not intended to be excluded from a similar role in other implementations.

The hardware and data processing components used to implement the various processes, operations, illustrative logics, logical blocks, modules and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose single- or multi-chip processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, or, any conventional processor, controller, microcontroller, or state machine. A processor also may be implemented as a combination of computing devices, such as a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. In some embodiments, particular processes and methods may be performed by circuitry that is specific to a given function. The memory (e.g., memory, memory unit, storage device, etc.) may include one or more devices (e.g., RAM, ROM, Flash memory, hard disk storage, etc.) for storing data and/or computer code for completing or facilitating the various processes, layers and modules described in the present disclosure. The memory may be or include volatile memory or non-volatile memory, and may include database components, object code components, script components, or any other type of information structure for supporting the various activities and information structures described in the present disclosure. According to an exemplary embodiment, the memory is communicably connected to the processor via a processing circuit and includes computer code for executing (e.g., by the processing circuit and/or the processor) the one or more processes described herein.

The present disclosure contemplates methods, systems and program products on any machine-readable media for accomplishing various operations. The embodiments of the present disclosure may be implemented using existing computer processors, or by a special purpose computer processor for an appropriate system, incorporated for this or another purpose, or by a hardwired system. Embodiments within the scope of the present disclosure include program products comprising machine-readable media for carrying or having machine-executable instructions or data structures stored thereon. Such machine-readable media can be any available media that can be accessed by a general purpose or special purpose computer or other machine with a processor. By way of example, such machine-readable media can comprise RAM, ROM, EPROM, EEPROM, or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code in the form of machine-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer or other machine with a processor. Combinations of the above are also included within the scope of machine-readable media. Machine-executable instructions include, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing machines to perform a certain function or group of functions.

The phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of “including” “comprising” “having” “containing” “involving” “characterized by” “characterized in that” and variations thereof herein, is meant to encompass the items listed thereafter, equivalents thereof, and additional items, as well as alternate implementations consisting of the items listed thereafter exclusively. In one implementation, the systems and methods described herein consist of one, each combination of more than one, or all of the described elements, acts, or components.

Any references to implementations or elements or acts of the systems and methods herein referred to in the singular can also embrace implementations including a plurality of these elements, and any references in plural to any implementation or element or act herein can also embrace implementations including only a single element. References in the singular or plural form are not intended to limit the presently disclosed systems or methods, their components, acts, or elements to single or plural configurations. References to any act or element being based on any information, act or element can include implementations where the act or element is based at least in part on any information, act, or element.

Any implementation disclosed herein can be combined with any other implementation or embodiment, and references to “an implementation,” “some implementations,” “one implementation” or the like are not necessarily mutually exclusive and are intended to indicate that a particular feature, structure, or characteristic described in connection with the implementation can be included in at least one implementation or embodiment. Such terms as used herein are not necessarily all referring to the same implementation. Any implementation can be combined with any other implementation, inclusively or exclusively, in any manner consistent with the aspects and implementations disclosed herein.

Where technical features in the drawings, detailed description or any claim are followed by reference signs, the reference signs have been included to increase the intelligibility of the drawings, detailed description, and claims. Accordingly, neither the reference signs nor their absence have any limiting effect on the scope of any claim elements.

Systems and methods described herein may be embodied in other specific forms without departing from the characteristics thereof. References to “approximately,” “about” “substantially” or other terms of degree include variations of +/−10% from the given measurement, unit, or range unless explicitly indicated otherwise. Coupled elements can be electrically, mechanically, or physically coupled with one another directly or with intervening elements. Scope of the systems and methods described herein is thus indicated by the appended claims, rather than the foregoing description, and changes that come within the meaning and range of equivalency of the claims are embraced therein.

The term “coupled” and variations thereof includes the joining of two members directly or indirectly to one another. Such joining may be stationary (e.g., permanent or fixed) or moveable (e.g., removable or releasable). Such joining may be achieved with the two members coupled directly with or to each other, with the two members coupled with each other using a separate intervening member and any additional intermediate members coupled with one another, or with the two members coupled with each other using an intervening member that is integrally formed as a single unitary body with one of the two members. If “coupled” or variations thereof are modified by an additional term (e.g., directly coupled), the generic definition of “coupled” provided above is modified by the plain language meaning of the additional term (e.g., “directly coupled” means the joining of two members without any separate intervening member), resulting in a narrower definition than the generic definition of “coupled” provided above. Such coupling may be mechanical, electrical, or fluidic.

References to “or” can be construed as inclusive so that any terms described using “or” can indicate any of a single, more than one, and all of the described terms. A reference to “at least one of ‘A’ and ‘B’” can include only ‘A’, only ‘B’, as well as both ‘A’ and ‘B’. Such references used in conjunction with “comprising” or other open terminology can include additional items.

Modifications of described elements and acts such as variations in sizes, dimensions, structures, shapes and proportions of the various elements, values of parameters, mounting arrangements, use of materials, colors, orientations can occur without materially departing from the teachings and advantages of the subject matter disclosed herein. For example, elements shown as integrally formed can be constructed of multiple parts or elements, the position of elements can be reversed or otherwise varied, and the nature or number of discrete elements or positions can be altered or varied. Other substitutions, modifications, changes and omissions can also be made in the design, operating conditions and arrangement of the disclosed elements and operations without departing from the scope of the present disclosure.

References herein to the positions of elements (e.g., “top,” “bottom,” “above,” “below”) are merely used to describe the orientation of various elements in the FIGURES. The orientation of various elements may differ according to other exemplary embodiments, and that such variations are intended to be encompassed by the present disclosure.

The article "Meta Patent | Systems and methods of preambles for uwb transmission" was first published on Nweon Patent.

Samsung Patent | Optical device and method of manufacturing the same https://patent.nweon.com/26743 Thu, 26 Jan 2023 15:49:13 +0000 https://patent.nweon.com/?p=26743 ...

The article "Samsung Patent | Optical device and method of manufacturing the same" was first published on Nweon Patent.

Patent: Optical device and method of manufacturing the same

Patent PDF: Available by joining the Nweon (映维网) membership

Publication Number: 20230021508

Publication Date: 2023-01-26

Assignee: Samsung Display

Abstract

An augmented reality providing apparatus is provided. The augmented reality providing apparatus includes a lens including a first lens portion including a first reflective member, and a second lens portion including a second reflective member, and a display device on one side of the lens for displaying first and second images, wherein the first reflective member reflects the first image at a first angle, and the second reflective member reflects the second image at a second angle that is different from the first angle.

Claims

What is claimed is:

1.An augmented reality providing apparatus comprising: a first lens portion defining a first groove; a first reflective member in the first groove and having a concave shape; and a display device on one side of the first lens portion for displaying a first image, wherein the first reflective member reflects the first image at a first angle.

2.The augmented reality providing apparatus of claim 1, wherein the first groove has a diameter of about 400 μm to about 2 mm.

3.The augmented reality providing apparatus of claim 2, wherein the first reflective member has a diameter of about 100 μm to about 5 mm.

4.The augmented reality providing apparatus of claim 3, wherein the first groove has a surface roughness of about 20 nm to about 40 nm.

5.The augmented reality providing apparatus of claim 3, wherein residual stress at an inflection point where the first groove and a top surface of the first lens portion meet is about 4 MPa to about 6 MPa.

6.The augmented reality providing apparatus of claim 1, further comprising a second lens portion overlapping with the first lens portion and defining a second groove.

7.The augmented reality providing apparatus of claim 6, further comprising a second reflective member in the second groove, and having a concave shape.

8.The augmented reality providing apparatus of claim 7, wherein the first and second reflective members are inclined at different angles.

9.The augmented reality providing apparatus of claim 8, further comprising a third lens portion overlapping with the first and second lens portions, and comprising a third reflective member having a flat shape.

10.The augmented reality providing apparatus of claim 9, wherein the first, second, and third reflective members have different diameters.

11.A method of manufacturing an augmented reality providing apparatus, the method comprising: heating a part of a top surface of a lens corresponding to a region in which to form a groove; and forming the groove on the top surface of the lens by cooling the lens, wherein the heating the part of the top surface of the lens comprises heating an induction heating element, and placing the heated induction heating element in contact with the part of the top surface of the lens corresponding to the region in which to form the groove for about 0.1 seconds to about 1 second.

12.The method of claim 11, wherein the forming the groove on the top surface of the lens comprises cooling the top surface of the lens to a temperature of about −200° C. to about 0° C. to peel off the part of the top surface of the lens corresponding to the region in which to form the groove, and removing the part of the top surface of the lens that is peeled off.

13.The method of claim 12, further comprising forming a reflective member along the groove.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a divisional of U.S. patent application Ser. No. 16/430,310, filed Jun. 3, 2019, which claims priority to and the benefit of Korean Patent Application No. 10-2018-0083566, filed Jul. 18, 2018, the entire content of both of which is incorporated herein by reference.

BACKGROUND1. Field

The present disclosure relates to an optical device, and a method of manufacturing the same.

2. Description of the Related Art

Augmented reality (AR) refers to a technique of superimposing a virtual image (e.g., a computer generated image) on a real image seen by the eyes of a user, and displaying the two images as a single image. The virtual image may be an image in the form of text or graphics, and the actual image may be information regarding real objects that can be observed from the field of view (FOV) of an augmented reality device.

Augmented reality can be implemented using a head-mounted display (HMD), a head-up display (HUD), or the like. When augmented reality is implemented using a head-mounted display, the head-mounted display can be provided in the form of glasses, and can thus be easily carried around, and easily worn or removed, by a user. In this case, a display device that provides a virtual image for realizing augmented reality can be implemented using a micro-display, such as an organic light-emitting diode-on-silicon (OLEDoS) display, or a liquid crystal-on-silicon (LCOS) display. Recently, there have been demands for widening a part of the display device that can be seen by the eyes of a user (i.e., widening the field of view of the user).

SUMMARY

Embodiments of the present disclosure provide an augmented reality (AR) providing apparatus (an AR-providing apparatus) capable of widening a part of a display device that can be seen by the eyes of a user (i.e., the field of view (FOV) of the user), and a method of manufacturing the augmented reality providing apparatus. To this end, a plurality of micro-displays may be suitable.

According to an aspect of the present invention, there is provided an augmented reality providing apparatus including a lens including a first lens portion including a first reflective member, and a second lens portion including a second reflective member, and a display device on one side of the lens for displaying first and second images, wherein the first reflective member reflects the first image at a first angle, and the second reflective member reflects the second image at a second angle that is different from the first angle.

The lens may further include a third lens portion including a third reflective member, wherein the display device is further for displaying a third image, and wherein the third reflective member reflects the third image at a third angle that is different from the first and second angles.

The first, second, and third lens portions may be sequentially arranged in a first direction that is a thickness direction of the lens.

The first, second, and third reflective members may overlap one another in a first direction that is a thickness direction of the lens.

The first, second, and third reflective members may be spaced apart from one another along a second direction that is a width direction of the lens.

The first, second, and third reflective members may be inclined at different angles.

The display device may include first, second, and third display panels for displaying the first, second, and third images, respectively.

According to another aspect of the present invention, there is provided an augmented reality providing apparatus including a first lens portion defining a first groove, a first reflective member in the first groove and having a concave shape, and a display device on one side of the first lens portion for displaying a first image, wherein the first reflective member reflects the first image at a first angle.

The first groove may have a diameter of about 400 μm to about 2 mm.

The first reflective member may have a diameter of about 100 μm to about 5 mm.

The first groove may have a surface roughness of about 20 nm to about 40 nm.

Residual stress at an inflection point where the first groove and a top surface of the first lens portion meet may be about 4 MPa to about 6 MPa.

The augmented reality providing apparatus may further include a second lens portion overlapping with the first lens portion and defining a second groove.

The augmented reality providing apparatus may further include a second reflective member in the second groove, and having a concave shape.

The first and second reflective members may be inclined at different angles.

The augmented reality providing apparatus may further include a third lens portion overlapping with the first and second lens portions, and including a third micro-lens having a flat shape.

The augmented reality providing apparatus may further include a third lens portion overlapping with the first lens portion and the second lens portion, and defining a third groove, and a third reflective member in the third groove, and having a concave shape, wherein the first, second, and third reflective members have different diameters.

According to still another aspect of the present invention, there is provided a method of manufacturing an augmented reality providing apparatus, the method including heating a part of a top surface of a lens corresponding to a region in which to form a groove, and forming the groove on the top surface of the lens by cooling the lens, wherein the heating the part of the top surface of the lens includes heating an induction heating element, and placing the heated induction heating element in contact with the part of the top surface of the lens corresponding to the region in which to form the groove for about 0.1 seconds to about 1 second.

The forming the groove on the top surface of the lens may include cooling the top surface of the lens to a temperature of about −200° C. to about 0° C. to peel off the part of the top surface of the lens corresponding to the region in which to form the groove, and removing the part of the top surface of the lens that is peeled off.

The method may further include forming a reflective member along the groove.
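As a compact restatement of the process window recited above, the following sketch encodes the contact-time and cooling-temperature ranges and checks a hypothetical recipe against them. It is an illustrative data-handling sketch only; the class, field names, and recipe values are invented for this example and are not part of the disclosed method.

```python
from dataclasses import dataclass

@dataclass
class GrooveFormingRecipe:
    contact_time_s: float   # heated induction element held against the lens surface
    cooling_temp_c: float   # lens surface temperature used to peel off the heated region

# Process window taken from the ranges recited above.
CONTACT_TIME_RANGE_S = (0.1, 1.0)        # about 0.1 s to about 1 s
COOLING_TEMP_RANGE_C = (-200.0, 0.0)     # about -200 degrees C to about 0 degrees C

def check_recipe(recipe: GrooveFormingRecipe) -> bool:
    """Return True if the hypothetical recipe stays inside the recited ranges."""
    t_lo, t_hi = CONTACT_TIME_RANGE_S
    c_lo, c_hi = COOLING_TEMP_RANGE_C
    return t_lo <= recipe.contact_time_s <= t_hi and c_lo <= recipe.cooling_temp_c <= c_hi

print(check_recipe(GrooveFormingRecipe(contact_time_s=0.5, cooling_temp_c=-80.0)))   # True
print(check_recipe(GrooveFormingRecipe(contact_time_s=2.0, cooling_temp_c=-80.0)))   # False
```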

According to the aforementioned and other embodiments of the present disclosure, a part of the display device that can be seen by the eyes of a user (i.e., a part of the display device corresponding to the field of view of the user) can be widened.

Other features and embodiments may be apparent from the following detailed description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other embodiments and features of the present disclosure will become more apparent by describing in detail embodiments thereof with reference to the attached drawings, in which:

FIG. 1 is a perspective view of an augmented reality (AR) providing apparatus according to an embodiment of the present disclosure;

FIG. 2 is an exploded perspective view of a lens of an augmented reality providing apparatus according to an embodiment of the present disclosure;

FIG. 3 is an exploded perspective view of a lens according to another embodiment of the present disclosure;

FIG. 4 is a schematic cross-sectional view of an augmented reality providing apparatus according to another embodiment of the present disclosure;

FIG. 5 is a schematic cross-sectional view of an augmented reality providing apparatus according to another embodiment of the present disclosure;

FIG. 6 is a perspective view of an augmented reality providing apparatus according to another embodiment of the present disclosure;

FIG. 7 is an exploded perspective view of a lens of an augmented reality providing apparatus according to another embodiment of the present disclosure;

FIG. 8 is an exploded perspective view of a lens according to another embodiment of the present disclosure;

FIG. 9 is a schematic cross-sectional view of an augmented reality providing apparatus according to another embodiment of the present disclosure;

FIG. 10 is a schematic cross-sectional view of an augmented reality providing apparatus according to another embodiment of the present disclosure;

FIG. 11 is a schematic view illustrating the field of view (FOV) of a user when reflective members are located at the same inclination angle;

FIG. 12 is a schematic view illustrating the field of view of the user when a lens according to an embodiment of the present disclosure is employed;

FIG. 13 is a perspective view of an augmented reality providing apparatus according to another embodiment of the present disclosure;

FIG. 14 is a cross-sectional view of the augmented reality providing apparatus of FIG. 13;

FIG. 15 is a cross-sectional view of an augmented reality providing apparatus according to another embodiment of the present disclosure;

FIG. 16 is a cross-sectional view of an augmented reality providing apparatus according to another embodiment of the present disclosure;

FIG. 17 is a cross-sectional view of an augmented reality providing apparatus according to another embodiment of the present disclosure;

FIG. 18 is a schematic view illustrating the field of view of a user when a flat reflective member is used;

FIG. 19 is a schematic view illustrating the field of view of a user when a reflective member having a concave shape is used according to an embodiment of the present disclosure;

FIG. 20 is a perspective view of an augmented reality providing apparatus according to another embodiment of the present disclosure;

FIG. 21 is a cross-sectional view of the augmented reality providing apparatus of FIG. 20;

FIGS. 22 through 26 are cross-sectional views illustrating a method of fabricating a lens of an augmented reality providing apparatus, having a concave reflective member formed therein, according to an embodiment of the present disclosure;

FIG. 27 is a graph showing the residual stress in the groove of a lens according to an embodiment of the present disclosure;

FIG. 28 is a cross-sectional view of a display device according to an embodiment of the present disclosure; and

FIG. 29 is a perspective view of a head-mounted display including an augmented reality providing apparatus according to various embodiments of the present disclosure.

DETAILED DESCRIPTION

Features of the inventive concept and methods of accomplishing the same may be understood more readily by reference to the following detailed description of embodiments and the accompanying drawings. Hereinafter, embodiments will be described in more detail with reference to the accompanying drawings. The described embodiments, however, may be embodied in various different forms, and should not be construed as being limited to only the illustrated embodiments herein. Rather, these embodiments are provided as examples so that this disclosure will be thorough and complete, and will fully convey the aspects and features of the present inventive concept to those skilled in the art. Accordingly, processes, elements, and techniques that are not necessary to those having ordinary skill in the art for a complete understanding of the aspects and features of the present inventive concept may not be described. Unless otherwise noted, like reference numerals denote like elements throughout the attached drawings and the written description, and thus, descriptions thereof will not be repeated. Further, parts not related to the description of the embodiments might not be shown to make the description clear. In the drawings, the relative sizes of elements, layers, and regions may be exaggerated for clarity.

Various embodiments are described herein with reference to sectional illustrations that are schematic illustrations of embodiments and/or intermediate structures. As such, variations from the shapes of the illustrations as a result, for example, of manufacturing techniques and/or tolerances, are to be expected. Further, specific structural or functional descriptions disclosed herein are merely illustrative for the purpose of describing embodiments according to the concept of the present disclosure. Thus, embodiments disclosed herein should not be construed as limited to the particular illustrated shapes of regions, but are to include deviations in shapes that result from, for instance, manufacturing. For example, an implanted region illustrated as a rectangle will, typically, have rounded or curved features and/or a gradient of implant concentration at its edges rather than a binary change from implanted to non-implanted region. Likewise, a buried region formed by implantation may result in some implantation in the region between the buried region and the surface through which the implantation takes place. Thus, the regions illustrated in the drawings are schematic in nature and their shapes are not intended to illustrate the actual shape of a region of a device and are not intended to be limiting. Additionally, as those skilled in the art would realize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present invention.

In the following description, for the purposes of explanation, numerous specific details are set forth to provide a thorough understanding of various embodiments. It is apparent, however, that various embodiments may be practiced without these specific details or with one or more equivalent arrangements. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring various embodiments.

It will be understood that, although the terms “first,” “second,” “third,” etc., may be used herein to describe various elements, components, regions, layers and/or sections, these elements, components, regions, layers and/or sections should not be limited by these terms. These terms are used to distinguish one element, component, region, layer or section from another element, component, region, layer or section. Thus, a first element, component, region, layer or section described below could be termed a second element, component, region, layer or section, without departing from the spirit and scope of the present invention.

Spatially relative terms, such as “beneath,” “below,” “lower,” “under,” “above,” “upper,” and the like, may be used herein for ease of explanation to describe one element or feature’s relationship to another element(s) or feature(s) as illustrated in the figures. It will be understood that the spatially relative terms are intended to encompass different orientations of the device in use or in operation, in addition to the orientation depicted in the figures. For example, if the device in the figures is turned over, elements described as “below” or “beneath” or “under” other elements or features would then be oriented “above” the other elements or features. Thus, the example terms “below” and “under” can encompass both an orientation of above and below. The device may be otherwise oriented (e.g., rotated 90 degrees or at other orientations) and the spatially relative descriptors used herein should be interpreted accordingly. Similarly, when a first part is described as being arranged “on” a second part, this indicates that the first part is arranged at an upper side or a lower side of the second part without the limitation to the upper side thereof on the basis of the gravity direction.

It will be understood that when an element, layer, region, or component is referred to as being “on,” “connected to,” or “coupled to” another element, layer, region, or component, it can be directly on, connected to, or coupled to the other element, layer, region, or component, or one or more intervening elements, layers, regions, or components may be present. However, “directly connected/directly coupled” refers to one component directly connecting or coupling another component without an intermediate component. Meanwhile, other expressions describing relationships between components such as “between,” “immediately between” or “adjacent to” and “directly adjacent to” may be construed similarly. In addition, it will also be understood that when an element or layer is referred to as being “between” two elements or layers, it can be the only element or layer between the two elements or layers, or one or more intervening elements or layers may also be present.

For the purposes of this disclosure, expressions such as “at least one of,” when preceding a list of elements, modify the entire list of elements and do not modify the individual elements of the list. For example, “at least one of X, Y, and Z” and “at least one selected from the group consisting of X, Y, and Z” may be construed as X only, Y only, Z only, or any combination of two or more of X, Y, and Z, such as, for instance, XYZ, XYY, YZ, and ZZ. Like numbers refer to like elements throughout. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.

In the following examples, the x-axis, the y-axis, and/or the z-axis are not limited to three axes of a rectangular coordinate system, and may be interpreted in a broader sense. For example, the x-axis, the y-axis, and the z-axis may be perpendicular to one another, or may represent different directions that are not perpendicular to one another. The same applies for first, second, and/or third directions.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the present disclosure. As used herein, the singular forms “a” and “an” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “have,” “having,” “includes,” and “including,” when used in this specification, specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.

As used herein, the term “substantially,” “about,” “approximately,” and similar terms are used as terms of approximation and not as terms of degree, and are intended to account for the inherent deviations in measured or calculated values that would be recognized by those of ordinary skill in the art. “About” or “approximately,” as used herein, is inclusive of the stated value and means within an acceptable range of deviation for the particular value as determined by one of ordinary skill in the art, considering the measurement in question and the error associated with measurement of the particular quantity (i.e., the limitations of the measurement system). For example, “about” may mean within one or more standard deviations, or within ±30%, 20%, 10%, 5% of the stated value. Further, the use of “may” when describing embodiments of the present disclosure refers to “one or more embodiments of the present disclosure.” As used herein, the terms “use,” “using,” and “used” may be considered synonymous with the terms “utilize,” “utilizing,” and “utilized,” respectively. Also, the term “exemplary” is intended to refer to an example or illustration.

When a certain embodiment may be implemented differently, a specific process order may be performed differently from the described order. For example, two consecutively described processes may be performed substantially at the same time or performed in an order opposite to the described order.

Also, any numerical range disclosed and/or recited herein is intended to include all sub-ranges of the same numerical precision subsumed within the recited range. For example, a range of “1.0 to 10.0” is intended to include all subranges between (and including) the recited minimum value of 1.0 and the recited maximum value of 10.0, that is, having a minimum value equal to or greater than 1.0 and a maximum value equal to or less than 10.0, such as, for example, 2.4 to 7.6. Any maximum numerical limitation recited herein is intended to include all lower numerical limitations subsumed therein, and any minimum numerical limitation recited in this specification is intended to include all higher numerical limitations subsumed therein. Accordingly, Applicant reserves the right to amend this specification, including the claims, to expressly recite any sub-range subsumed within the ranges expressly recited herein. All such ranges are intended to be inherently described in this specification such that amending to expressly recite any such subranges would comply with the requirements of 35 U.S.C. § 112(a) and 35 U.S.C. § 132(a).

The electronic or electric devices and/or any other relevant devices or components according to embodiments of the present disclosure described herein may be implemented utilizing any suitable hardware, firmware (e.g. an application-specific integrated circuit), software, or a combination of software, firmware, and hardware. For example, the various components of these devices may be formed on one integrated circuit (IC) chip or on separate IC chips. Further, the various components of these devices may be implemented on a flexible printed circuit film, a tape carrier package (TCP), a printed circuit board (PCB), or formed on one substrate. Further, the various components of these devices may be a process or thread, running on one or more processors, in one or more computing devices, executing computer program instructions and interacting with other system components for performing the various functionalities described herein. The computer program instructions are stored in a memory which may be implemented in a computing device using a standard memory device, such as, for example, a random access memory (RAM). The computer program instructions may also be stored in other non-transitory computer readable media such as, for example, a CD-ROM, flash drive, or the like. Also, a person of skill in the art should recognize that the functionality of various computing devices may be combined or integrated into a single computing device, or the functionality of a particular computing device may be distributed across one or more other computing devices without departing from the spirit and scope of the embodiments of the present disclosure.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present inventive concept belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and/or the present specification, and should not be interpreted in an idealized or overly formal sense, unless expressly so defined herein.

FIG. 1 is a perspective view of an augmented reality (AR) providing apparatus according to an embodiment of the present disclosure, and FIG. 2 is an exploded perspective view of a lens of an augmented reality providing apparatus according to an embodiment of the present disclosure.

Referring to FIGS. 1 and 2, an augmented reality providing apparatus 10 includes a lens 100, a display device 200, and an adhesive layer 300.

The lens 100 may include a plurality of lens portions. In one embodiment, the lens 100 may include first, second, and third lens portions 101, 103, and 105, but the present disclosure is not limited thereto. For example, the lens 100 may include at least two lens portions. The lens 100 may be formed of glass or plastic to be transparent or semitransparent. Accordingly, a user can see a real image (i.e., reality) through the lens 100. The lens 100 may have a refractive index (e.g., a predetermined refractive index) in consideration of the visual acuity of the user, and the first, second, and third lens portions 101, 103, and 105 may have the same refractive index. However, the present disclosure is not limited to this. The first, second, and third lens portions 101, 103, and 105 may be bonded to one another through a bonding material. For example, the bonding material may be an optically clear resin (OCR) or an optically clear adhesive (OCA), but the present disclosure is not limited thereto.

The lens 100 is illustrated as being formed as a decahedron consisting of first and second octagonal surfaces SF1 and SF2 having their corners CE chamfered and first, second, third, and fourth sides a, b, c, and d, but may be formed in various other shapes. For example, the lens 100 may be formed as a hexahedron consisting of first and second rectangular surfaces having right-angled corners and first, second, third, and fourth sides a, b, c, and d. That is, the lens 100 may be formed as a polyhedron consisting of first and second polygonal surfaces and a number of sides, or may even be formed as a cylinder.

The first surface SF1 of the lens 100 may be a surface of the third lens portion 105, and the second surface SF2 of the lens 100 may be a surface of the first lens portion 101. That is, the first surface SF1 of the lens 100 may be a surface facing an eye E of the user, and the second surface SF2 of the lens 100 may be the outer surface of the lens 100. The lens 100 may be formed in various shapes other than a polyhedron, such as a cylinder, an elliptical cylinder, a semicircular cylinder, a semielliptical cylinder, a distorted cylinder, or a distorted semicircular cylinder. The terms “distorted cylinder” and “distorted semicircular cylinder,” as used herein, refer to a cylinder and a semicircular cylinder, respectively, having a non-uniform diameter.

In one embodiment, the first, second, and third lens portions 101, 103, and 105 may have the same size, and may be bonded together in a third direction (or a Z-axis direction) to form the lens 100, but the present disclosure is not limited thereto.

The lens 100 may include first, second, and third reflective members 410, 420, and 430. For example, the first reflective member 410 may be located in the first lens portion 101, the second reflective member 420 may be located in the second lens portion 103, and the third reflective member 430 may be located in the third lens portion 105. The first, second, and third reflective members 410, 420, and 430 may also be referred to as pin mirrors or micro-mirrors, but the present disclosure is not limited thereto.

In one embodiment, the first, second, and third reflective members 410, 420, and 430 may be located in the first, second, and third lens portions 101, 103, and 105, respectively, but the present disclosure is not limited thereto. In another embodiment, a plurality of first reflective members 410 may be located in the first lens portion 101, a plurality of second reflective members 420 may be located in the second lens portion 103, and a plurality of third reflective members 430 may be located in the third lens portion 105. In order to widen a part of a display device 200 that can be perceived by the eye E of the user (i.e., to widen the field-of-view (FOV) of the user), the lens 100 may suitably include a plurality of first reflective members 410, a plurality of second reflective members 420, and a plurality of third reflective members 430. The display device 200 displays virtual images for realizing augmented reality. The display device 200 may be located on a side of the lens 100.

For example, the display device 200 may be located on the first side (e.g., side a) of the lens 100, but the present disclosure is not limited thereto. For example, the display device 200 may be located on at least one of the first, second, third, and fourth sides a, b, c, and d. The first, second, and third reflective members 410, 420, and 430 may be positioned to have different angles. The first, second, and third reflective members 410, 420, and 430 reflect the virtual images displayed by the display device 200, and thereby provide the virtual images to the eye E of the user. Because the virtual images displayed by the display device 200 are reflected by the first, second, and third reflective members 410, 420, and 430 having different angles, the depth of field of the virtual images is deepened.

For example, referring to FIG. 1, the first reflective member 410 of the first lens portion 101 provides a first image IM1 displayed by the display device 200 to the eye E of the user by reflecting the first image IM1 toward the first surface SF1 of the lens 100. The second reflective member 420 of the second lens portion 103 provides a second image IM2 displayed by the display device 200 to the eye E of the user by reflecting the second image IM2 toward the first surface SF1 of the lens 100. The third reflective member 430 of the third lens portion 105 provides a third image IM3 displayed by the display device 200 to the eye E of the user by reflecting the third image IM3 toward the first surface SF1 of the lens 100. The first, second, and third reflective members 410, 420, and 430 may allow the virtual images displayed by the display device 200 (e.g., the first, second, and third images IM1, IM2, and IM3) to be focused on a single point on the retina of the eye E of the user. As a result, even when the user focuses on the real image through the lens 100, the user can see the virtual images clearly. That is, the user can see the virtual images clearly without the need to shift his or her focus currently being placed on the real image.

The first, second, and third reflective members 410, 420, and 430 may be smaller in size than the pupil of the eye E of the user. For example, the first, second, and third reflective members 410, 420, and 430 may have a diameter of about 5 mm. In this case, because the user focuses on the real image, it is difficult for the user to recognize the first, second, and third reflective members 410, 420, and 430. However, as the size of the first, second, and third reflective members 410, 420, and 430 decreases, the luminance of the virtual images provided by the display device 200 to the eye E of the user decreases, and given this, the size of the first, second, and third reflective members 410, 420, and 430 may be appropriately set. FIG. 1 illustrates the first, second, and third reflective members 410, 420, and 430 as having a circular cross-sectional shape, but the first, second, and third reflective members 410, 420, and 430 may have an elliptical or polygonal cross-sectional shape.
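One way to see the size-versus-luminance trade-off quantitatively: under a simple geometric model, the light a single pin mirror redirects toward the eye scales with its area, i.e., with the square of its diameter, so shrinking a mirror from 5 mm to 0.5 mm cuts the redirected light by roughly a factor of 100. The sketch below is a minimal illustration of that scaling; the 5 mm pupil value and the function name are assumptions, not figures from this disclosure.

```python
def pupil_coverage(mirror_diameter_mm: float, pupil_diameter_mm: float = 5.0) -> float:
    """Fraction of the pupil area covered by one circular pin mirror (area ratio)."""
    return (mirror_diameter_mm / pupil_diameter_mm) ** 2

for d in (5.0, 2.0, 0.5, 0.1):   # mirror diameters in mm
    print(f"mirror {d:>3.1f} mm -> ~{100 * pupil_coverage(d):6.2f}% of a 5 mm pupil")
```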

In one embodiment, the first lens portion 101 may include first, second, third, and fourth sides 101a, 101b, 101c, and 101d, and may have chamfered edges CE at the corners thereof, but the present disclosure is not limited thereto. For example, the first lens portion 101 may have right-angled corners. The first reflective member 410 may be located at the center of the first lens portion 101.

The second lens portion 103 may include first, second, third, and fourth sides 103a, 103b, 103c, and 103d, and may have chamfered edges CE at the corners thereof, but the present disclosure is not limited thereto. For example, the second lens portion 103 may have right-angled corners. The second reflective member 420 may be located at the center of the second lens portion 103.

The third lens portion 105 may include first, second, third, and fourth sides 105a, 105b, 105c, and 105d, and may have chamfered edges CE at the corners thereof, but the present disclosure is not limited thereto. For example, the third lens portion 105 may have right-angled corners. The third reflective member 430 may be located at the center of the third lens portion 105.

The first, second, and third lens portions 101, 103, and 105 may have the same size in a plan view, and the first, second, and third reflective members 410, 420, and 430, which are located in the first, second, and third lens portions 101, 103, and 105, respectively, may be inclined at different angles, and may overlap with one another in the third direction (or the Z-axis direction).

FIG. 3 is an exploded perspective view of a lens according to another embodiment of the present disclosure. Referring to FIG. 3, a first reflective member 410 of a first lens portion 101 may be located close to a second side 101b of the first lens portion 101, a second reflective member 420 of a second lens portion 103 may be located at the center of the second lens portion 103, and a third reflective member 430 of a third lens portion 105 may be located close to a fourth side 105d of the third lens portion 105. That is, the first, second, and third reflective members 410, 420, and 430, which are located in the first, second, and third lens portions 101, 103, and 105, respectively, may be inclined at different angles, and may be spaced apart from one another in a first direction (or an X-axis direction) (i.e., not aligned in the third direction/the Z-axis direction), and the first and third reflective members 410 and 430 may be symmetrical in the first direction (or the X-axis direction) with respect to the second reflective member 420. However, the present disclosure is not limited to this. For example, alternatively, the first, second, and third reflective members 410, 420, and 430, which are located in the first, second, and third lens portions 101, 103, and 105, respectively, may be inclined at different angles and may be spaced apart from one another in a second direction (or a Y-axis direction). Still alternatively, the first, second, and third reflective members 410, 420, and 430, which are located in the first, second, and third lens portions 101, 103, and 105, respectively, may be inclined at different angles, and may be spaced apart from one another in both the first direction (or the X-axis direction) and the second direction (or the Y-axis direction). Here, the first direction (or the X-axis direction) may be defined as the width direction of a lens 100, and the second direction (or the Y-axis direction) may be defined as the height direction of the lens 100.

FIG. 4 is a schematic cross-sectional view of an augmented reality providing apparatus according to another embodiment of the present disclosure. Referring to FIG. 4, in one embodiment, a first reflective member 410 may be located in a first lens portion 101, a second reflective member 420 may be located in a second lens portion 103, and a third reflective member 430 may be located in a third lens portion 105.

The first, second, and third lens portions 101, 103, and 105 may have the same thickness or different thicknesses. In one embodiment, the second lens portion 103 may be thicker than the first and third lens portions 101 and 105, but the present disclosure is not limited thereto. For example, in another embodiment, the thickness of the first, second, and third lens portions 101, 103, and 105 may sequentially increase or decrease from the first lens portion 101 to the second lens portion 103 to the third lens portion 105. In yet another embodiment, some of the first, second, and third lens portions 101, 103, and 105 may have a different thickness from the rest of the first, second, and third lens portions 101, 103, and 105.

The first, second, and third reflective members 410, 420, and 430 may be located in parallel to one another in the thickness direction of a lens 100 (i.e., in a third direction/Z-axis direction), but the present disclosure is not limited thereto. For example, the first, second, and third reflective members 410, 420, and 430 may be located at different heights. In other words, the first, second, and third reflective members 410, 420, and 430 may be located at different locations in the height direction of the lens 100 (i.e., in a second direction/Y-axis direction).

For example, the height of, or the locations in the height direction of, the first, second, and third reflective members 410, 420, and 430 may sequentially increase or decrease from the first reflective member 410 to the second reflective member 420 to the third reflective member 430, or only some of the first, second, and third reflective members 410, 420, and 430 may have a height, or height location, that differs from that of the rest.

A first inclination angle θ1 of the first reflective member 410 may be set such that the first reflective member 410 is able to reflect and thereby provide a first image IM1 of a display device 200 to an eye E of a user. A second inclination angle θ2 of the second reflective member 420 may be set such that the second reflective member 420 is able to reflect and thereby provide a second image IM2 of the display device 200 to the eye E of the user. A third inclination angle θ3 of the third reflective member 430 may be set such that the third reflective member 430 is able to reflect and thereby provide a third image IM3 of the display device 200 to the eye E of the user.

The first, second, and third inclination angles θ1, θ2, and θ3 refer to the angles at which the first, second, and third reflective members 410, 420, and 430 are inclined toward the second direction (or the Y-axis direction) with respect to the thickness direction of the lens 100 (i.e., the third direction/Z-axis direction). In one embodiment, the first, second, and third inclination angles θ1, θ2, and θ3 of the first, second, and third reflective members 410, 420, and 430 may be set to differ from one another. For example, the second inclination angle θ2 of the second reflective member 420 may be set to be smaller than the first inclination angle θ1 of the first reflective member 410, and the third inclination angle θ3 of the third reflective member 430 may be set to be greater than the second inclination angle θ2 of the second reflective member 420. Also for example, only two of the first, second, and third reflective members 410, 420, and 430 may be set to have the same inclination angle. For example, the first and second inclination angles θ1 and θ2 of the first and second reflective members 410 and 420 may be set to be the same, and the third inclination angle θ3 of the third reflective member 430 may be set to be smaller than the first and second inclination angles θ1 and θ2 of the first and second reflective members 410 and 420.

The first, second, and third inclination angles θ1, θ2, and θ3 of the first, second, and third reflective members 410, 420, and 430 may be set depending on the angles of inclination planes IP of the first, second, and third lens portions 101, 103, and 105. Each of the first, second, and third lens portions 101, 103, and 105 may be divided into first and second parts P1 and P2, and the interface between the first and second parts P1 and P2 may be defined as an inclination plane IP. Because the first, second, and third reflective members 410, 420, and 430 are mounted on the inclination planes IP of the first, second, and third lens portions 101, 103, and 105, respectively, the first, second, and third inclination angles θ1, θ2, and θ3 of the first, second, and third reflective members 410, 420, and 430 may be determined by the inclination planes IP of the first, second, and third lens portions 101, 103, and 105. Because the lens 100 is formed by the first, second, and third lens portions 101, 103, and 105, the inclination planes IP of the first, second, and third reflective members 410, 420, and 430 may be set to differ from one another, and as a result, the first, second, and third inclination angles θ1, θ2, and θ3 of the first, second, and third reflective members 410, 420, and 430 may be set to differ from one another. Accordingly, the first, second, and third images IM1, IM2, and IM3 output by the display device 200 can be effectively focused on a single point or area on the retina of the eye E of the user.
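To make the geometry concrete: by the law of reflection, the mirror normal must bisect the reversed ray arriving from the display and the ray heading toward the pupil, so each reflective member's required tilt follows directly from its position relative to the display and the eye. The following is a minimal numerical sketch of that relation only; the coordinate choices, distances, and function name are illustrative assumptions, not values or code from this disclosure. It assumes the display emits along -Y and the eye sits along +Z, so mirrors at different heights need slightly different tilts from the Z axis, which is consistent with the inclination angles being set to differ.

```python
import numpy as np

def required_tilt(mirror_pos, pupil_pos, incoming_dir):
    """Mirror normal and its tilt from the +Z axis (degrees) needed to reflect a ray
    travelling along incoming_dir toward the pupil (law of reflection)."""
    d = np.asarray(incoming_dir, dtype=float)
    d /= np.linalg.norm(d)
    r = np.asarray(pupil_pos, dtype=float) - np.asarray(mirror_pos, dtype=float)
    r /= np.linalg.norm(r)
    n = r - d                      # the normal bisects the reversed incoming ray and the outgoing ray
    n /= np.linalg.norm(n)
    tilt_deg = np.degrees(np.arccos(np.clip(n[2], -1.0, 1.0)))
    return n, tilt_deg

# Display above the lens emits downward (-Y); the pupil sits 20 mm away along +Z.
pupil = [0.0, 0.0, 20.0]
for y in (-2.0, 0.0, 2.0):         # mirror heights in mm (illustrative only)
    _, tilt = required_tilt([0.0, y, 0.0], pupil, [0.0, -1.0, 0.0])
    print(f"mirror at y = {y:+.1f} mm -> required tilt of about {tilt:.1f} degrees from the Z axis")
```

Running the sketch gives tilts near 45 degrees that vary with mirror height, which illustrates why mirrors mounted on differently inclined planes can all aim their images at the same pupil.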

In a case where each of the first, second, and third reflective members 410, 420, and 430 consists of a plurality of mirrors, the mirrors of the first reflective member 410 may be set to have the same inclination angle as one another (i.e., the first inclination angle θ1), the mirrors of the second reflective member 420 may be set to have the same inclination angle as one another (i.e., the second inclination angle θ2), the mirrors of the third reflective member 430 may be set to have the same inclination angle as one another (i.e., the third inclination angle θ3), and the first, second, and third inclination angles θ1, θ2, and θ3 may be set to differ from one another. However, the present disclosure is not limited to this. For example, alternatively, the mirrors of the first reflective member 410 may be set to have different inclination angles than each other, the mirrors of the second reflective member 420 may be set to have different inclination angles than each other, and the mirrors of the third reflective member 430 may be set to have different inclination angles than each other. Still alternatively, the mirrors of the first reflective member 410 may be set to have different inclination angles, and the mirrors of the third reflective member 430 may be set to have different inclination angles, while the mirrors of the second reflective member 420 may be set to have the same inclination angle.

FIG. 5 is a schematic cross-sectional view of an augmented reality providing apparatus according to another embodiment of the present disclosure. Referring to FIG. 5, a first reflective member 410 may be located in a first lens portion 101, a second reflective member 420 may be located in a second lens portion 103, and a lens 100 may be formed by the first and second lens portions 101 and 103. A first inclination angle θ1 of the first reflective member 410 included in the first lens portion 101 may be set to enable the first reflective member 410 to reflect and thereby provide a first image IM1 of a display device 200 to an eye E of a user. A second inclination angle θ2 of the second reflective member 420 included in the second lens portion 103 may be set to enable the second reflective member 420 to reflect and thereby provide a second image IM2 of the display device 200 to the eye E of the user. The first and second inclination angles θ1 and θ2 of the first and second reflective members 410 and 420 may be set to differ from each other. For example, the second inclination angle θ2 of the second reflective member 420 may be set to be smaller than the first inclination angle θ1 of the first reflective member 410, and the difference between the first and second inclination angles θ1 and θ2 may be greater than that depicted in the example of FIG. 4. However, the present disclosure is not limited to this.

The display device 200 may be a flexible display device that has flexibility and can thus be bent. For example, the display device 200 may be a flexible organic light-emitting diode (OLED) display device, but the present disclosure is not limited thereto. The display device 200 will be described later in detail. An adhesive layer 300 adheres the lens 100 and the display device 200 together.

The adhesive layer 300 may be formed as an optically clear resin film or an optically clear adhesive film. As already mentioned above, the augmented reality providing apparatus 10 can provide a real image to the eye E of the user through the lens 100, and can also provide virtual images output by the display device 200 through the first, second, and third reflective members 410, 420, and 430 to the eye E of the user. That is, the virtual images may be superimposed on the real image, and the images may then be perceived by the eye E of the user as a single image. In the augmented reality providing apparatus 10, the lens 100 consists of the first, second, and third lens portions 101, 103, and 105, and the first, second, and third reflective members 410, 420, and 430 are located in the first, second, and third lens portions 101, 103, and 105, respectively, at different inclination angles (i.e., the first, second, and third inclination angles θ1, θ2, and θ3, respectively).

Thus, the virtual images output by the display device 200 can fall on the eye E of the user through the first, second, and third reflective members 410, 420, and 430. Accordingly, even images that fall beyond the retina of the eye E of the user can be reflected toward the retina of the eye E of the user, and as a result, the field of view of the user can be widened.

Also, a micro-display, such as an organic light-emitting diode-on-silicon (OLEDoS) display or a liquid crystal-on-silicon (LCOS) display, realizes colors by forming color filters on an organic light-emitting layer that emits white light, and can thus realize high luminance. On the other hand, the display device 200 of the augmented reality providing apparatus 10 can use red, green, and blue organic light-emitting layers. Thus, because there is no need to use color filters, the display device 200 may offer advantages over an OLEDoS display with respect to realizing luminance.

FIG. 6 is a perspective view of an augmented reality providing apparatus according to another embodiment of the present disclosure, and FIG. 7 is an exploded perspective view of a lens of the augmented reality providing apparatus of FIG. 6.

The embodiment of FIGS. 6 and 7 differs from the embodiment of FIG. 1 in that a display device 200_1 includes first, second, and third display panels 201, 203, and 205. The embodiment of FIGS. 6 and 7 will hereinafter be described while focusing mainly on differences from the embodiment of FIG. 1.

Referring to FIGS. 6 and 7, an augmented reality providing apparatus 10_1 may include a lens 100, the display device 200_1, and an adhesive layer 300_1.

The lens 100 may include a plurality of lens portions. In one embodiment, the lens 100 may include first, second, and third lens portions 101, 103, and 105.

The lens 100 may include first, second, and third reflective members 410, 420, and 430. For example, the first reflective member 410 may be located in the first lens portion 101, the second reflective member 420 may be located in the second lens portion 103, and the third reflective member 430 may be located in the third lens portion 105. The first, second, and third reflective members 410, 420, and 430 may also be referred to as pin mirrors.

The display device 200_1 displays virtual images for realizing augmented reality. The display device 200_1 may include a plurality of display panels. The display device 200_1 may include first, second, and third display panels 201, 203, and 205 for respectively displaying first, second, and third images IM1, IM2, and IM3. That is, the display device 200_1 may include the first display panel 201 displaying the first image IM1, the second display panel 203 displaying the second image IM2, and the third display panel 205 displaying the third image IM3. The first, second, and third display panels 201, 203, and 205 may respectively correspond to the first, second, and third lens portions 101, 103, and 105. For example, the first display panel 201 may be located on one side of the first lens portion 101, the second display panel 203 may be located on one side of the second lens portion 103, and the third display panel 205 may be located on one side of the third lens portion 105. However, the present disclosure is not limited to this.

For example, the first, second, and third display panels 201, 203, and 205 may be located in parallel in the thickness direction of the lens 100 (i.e., a third direction/Z-axis direction). For example, the first, second, and third display panels 201, 203, and 205 may be located on a first side “a” of the lens 100, but the present disclosure is not limited thereto. For example, the first, second, and third display panels 201, 203, and 205 may be located on at least one of the first side “a” and second, third, and fourth sides b, c, and d of the lens 100. Also, the first, second, and third display panels 201, 203, and 205 may be located on different sides of the lens 100. For example, the first display panel 201 may be located on the first side a of the lens 100, the second display panel 203 may be located on the second side b of the lens 100, and the third display panel 205 may be located on the third side c of the lens 100. In another example, the first and second display panels 201 and 203 may be located on the first side a of the lens 100, and the third display panel 205 may be located on the third side c of the lens 100. The adhesive layer 300_1 adheres the lens 100 and the display device 200_1 together.

The adhesive layer 300_1 may include a plurality of adhesive portions. For example, the adhesive layer 300_1 may include a first adhesive portion 301 adhering the first lens portion 101 and the first display panel 201 together, a second adhesive portion 303 adhering the second lens portion 103 and the second display panel 203 together, and a third adhesive portion 305 adhering the third lens portion 105 and the third display panel 205 together. However, the present disclosure is not limited thereto. For example, the adhesive layer 300_1 may be a single adhesive layer adhering the first, second, and third lens portions 101, 103, and 105 and the first, second, and third display panels 201, 203, and 205 together at once. The adhesive layer 300_1 may be formed as an optically clear resin film or an optically clear adhesive film. The first, second, and third reflective members 410, 420, and 430 may have different angles.

The first, second, and third reflective members 410, 420, and 430 reflect and thereby provide virtual images displayed by the display device 200_1 to an eye E of the user. Because the virtual images displayed by the display device 200_1 are reflected by the first, second, and third reflective members 410, 420, and 430 having different angles, the depth of field of the virtual images deepens.

For example, referring to FIG. 6, the first reflective member 410 of the first lens portion 101 provides a first image IM1 displayed by the display device 200_1 to the eye E of the user by reflecting the first image IM1 toward a first surface SF1 of the lens 100. Further, the second reflective member 420 of the second lens portion 103 provides a second image IM2 displayed by the display device 200_1 to the eye E of the user by reflecting the second image IM2 toward the first surface SF1 of the lens 100, and the third reflective member 430 of the third lens portion 105 provides a third image IM3 displayed by the display device 200_1 to the eye E of the user by reflecting the third image IM3 toward the first surface SF1 of the lens 100.

The first, second, and third reflective members 410, 420, and 430 may allow the virtual images displayed by the display device 200_1 (e.g., the first, second, and third images IM1, IM2, and IM3) to be focused on a single point or area on the retina of the eye E of the user. As a result, even when the user focuses on a real image through the lens 100, the user can see the virtual images (e.g., the first, second, and third images IM1, IM2, and IM3) clearly. That is, the user can see the virtual images clearly without the need to shift his or her focus placed on the real image. The first, second, and third reflective members 410, 420, and 430, which are located in the first, second, and third lens portions 101, 103, and 105, respectively, may be inclined at different angles, and may overlap with one another in the third direction (or the Z-axis direction), but the present disclosure is not limited thereto.

FIG. 8 is an exploded perspective view of an augmented reality providing apparatus according to another embodiment of the present disclosure. Referring to FIG. 8, a first reflective member 410 of a first lens portion 101 may be located relatively close to a second side 101b of the first lens portion 101, a second reflective member 420 of a second lens portion 103 may be located at the center of the second lens portion 103, and a third reflective member 430 of a third lens portion 105 may be located relatively close to a fourth side 105d of the third lens portion 105.

FIG. 9 is a schematic cross-sectional view of an augmented reality providing apparatus according to another embodiment of the present disclosure. Referring to FIG. 9, a first reflective member 410 may be located in a first lens portion 101, a second reflective member 420 may be located in a second lens portion 103, and a third reflective member 430 may be located in a third lens portion 105.

A first inclination angle θ1 of the first reflective member 410 may be set such that the first reflective member 410 is able to reflect and thereby provide a first image IM1 of a first display panel 201 to an eye E of a user. A second inclination angle θ2 of the second reflective member 420 may be set such that the second reflective member 420 is able to reflect and thereby provide a second image IM2 of a second display panel 203 to the eye E of the user. A third inclination angle θ3 of the third reflective member 430, which is different from the first and second inclination angles θ1 and θ2, may be set such that the third reflective member 430 is able to reflect and thereby provide a third image IM3 of a third display panel 205 to the eye E of the user. Because the first, second, and third reflective members 410, 420, and 430 are located at different inclination angles (i.e., the first, second, and third inclination angles θ1, θ2, and θ3, respectively), even images that fall beyond the retina of the eye E of the user can be reflected toward the retina of the eye E of the user, and as a result, the field of view of the user can be widened.

FIG. 10 is a schematic cross-sectional view of an augmented reality providing apparatus according to another embodiment of the present disclosure. Referring to FIG. 10, a display device 200 may include first and second display panels 201 and 203, and a lens 100 may include first and second lens portions 101 and 103.

A first inclination angle θ1 of a first reflective member 410 included in the first lens portion 101 may be set such that the first reflective member 410 is able to reflect and thereby provide a first image IM1 of the first display panel 201 to an eye E of a user. A second inclination angle θ2 of the second reflective member 420 may be set such that the second reflective member 420 is able to reflect and thereby provide a second image IM2 of the second display panel 203 to the eye E of the user.

FIG. 11 is a schematic view illustrating the field of view of a user when reflective members are located at the same angles, and FIG. 12 is a schematic view illustrating the field of view of a user when a lens according to an embodiment of the present disclosure is employed.

Referring to FIG. 11, an image output by a display device DP is reflected by first, second, and third reflective members MR1, MR2, and MR3, and is thus provided to the retina of an eye E of a user. When the display area of the display device DP is set as W1, images displayed on the outer sides of W1 are reflected by the first and third reflective members MR1 and MR3, but fail to fall on the retina of the eye E of the user by falling beyond, or outside of, the pupil of the eye E of the user. That is, only images displayed in less than an entire region of the display device DP (e.g., region W2) can be incident upon the pupil of the eye E of the user to be seen by the user.

Referring to FIG. 12, images output by a display device 200 are reflected by first, second, and third reflective members 410, 420, and 430, which are located in first, second, and third lens portions 101, 103, and 105, respectively, and are thus provided to the retina of an eye E of a user. A lens 100 consists of multiple lens portions (i.e., the first, second, and third lens portions 101, 103, and 105), and the inclination angles of the first, second, and third reflective members 410, 420, and 430 can be set to differ from one lens portion to another lens portion. Accordingly, images output in the display area of the display device 200 can be reflected at various angles. That is, when the display area of the display device 200 is set as W3, the inclination angles of the first and third reflective members 410 and 430 may be set such that images displayed on the outer sides of W3 can be incident upon the pupil of the eye of the user. For example, even images that would otherwise fall beyond the pupil of the eye E of the user can be made to be incident upon the pupil of the eye E of the user by setting the inclination angle of the first reflective member 410 to be greater than the inclination angle of the second reflective member 420, and by setting the inclination angle of the third reflective member 430 to be smaller than the inclination angle of the second reflective member 420. Accordingly, all images displayed in W3, which is the display area of the display device 200, can be incident upon the pupil of the eye E of the user, and as a result, the field of view of the user can be widened.
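The widening can also be estimated with a simple paraxial model. With a single common tilt, a reflected ray keeps its lateral offset, so only display points within roughly one pupil diameter reach the eye (region W2 in FIG. 11); tilting an outer mirror by an extra Δθ steers its reflected ray by 2Δθ, recovering an additional lateral offset of about L·tan(2Δθ) at the eye, so a larger region W3 becomes usable (FIG. 12). The sketch below is only an order-of-magnitude illustration; the pupil size, eye relief, and extra tilt are assumed values, not figures from this disclosure.

```python
import math

# Assumed numbers (not from this disclosure): 4 mm pupil, 20 mm eye relief.
pupil_diameter = 4.0    # mm
eye_distance   = 20.0   # mm

# One common tilt: a reflected ray keeps its lateral offset, so only display
# points within about one pupil diameter are seen (region W2 in FIG. 11).
w2 = pupil_diameter

# Tilting an outer mirror by delta_theta steers its reflected ray by 2 * delta_theta,
# cancelling a lateral offset of roughly eye_distance * tan(2 * delta_theta).
delta_theta = math.radians(3.0)            # extra tilt given to the outer mirrors
extra_reach = eye_distance * math.tan(2 * delta_theta)

w3 = pupil_diameter + 2 * extra_reach      # usable display width with varied tilts (FIG. 12)
print(f"common tilt : about {w2:.1f} mm of the display reaches the pupil")
print(f"varied tilts: about {w3:.1f} mm of the display reaches the pupil")
```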

FIG. 13 is a perspective view of an augmented reality providing apparatus according to another embodiment of the present disclosure, and FIG. 14 is a cross-sectional view of the augmented reality providing apparatus of FIG. 13. The embodiment of FIGS. 13 and 14 differs from the embodiment of FIG. 1 in that a reflective member 400 is located along the morphology of a groove H of a lens 100. The embodiment of FIGS. 13 and 14 will hereinafter be described, focusing mainly on differences from the embodiment of FIG. 1.

Referring to FIGS. 13 and 14, an augmented reality providing apparatus 10_2 includes a lens 100, a display device 200, and an adhesive layer 300.

The lens 100 may be formed of glass or a polymer to be transparent or semitransparent. Accordingly, a user can see a real image through the lens 100.

The lens 100 may include first and second parts P1 and P2. The first and second parts P1 and P2 of the lens 100 may be bonded together to collectively form a single lens 100. A part of the lens 100 below the bonding surface of the lens 100 is defined as the first part P1, and a part of the lens 100 above the bonding surface of the lens 100 is defined as the second part P2.

The groove H, which is in the form of a recess, is located on a top surface TF of the first part P1. The top surface TF of the first part P1 may have a slope (e.g., a predetermined slope), and the groove H on the top surface TF may be inclined toward a first surface SF1 that faces an eye E of the user and may have an inclination angle θ. The inclination angle θ refers to the angle at which a line HL that is normal to a central point of the groove H is inclined from a third direction (or a Z-axis direction) toward a second direction (or a Y-axis direction).

The groove H may have a semicircular cross-sectional shape, but the present disclosure is not limited thereto. For example, the groove H may have various other cross-sectional shapes, such as a triangular or elliptical cross-sectional shape. The groove H may have a diameter of about 400 μm to about 2 mm, but the present disclosure is not limited thereto. The reflective member 400, which has a concave shape, may be located in the groove H along the morphology of the groove H.

The reflective member 400 may be smaller in size than the pupil of the eye E of the user, may have a diameter of about 100 μm to about 5 mm, and/or may be formed of one of silver (Ag), aluminum (Al), and rhodium (Rh). However, the present disclosure is not limited to this. The reflective member 400 may have a surface roughness of about 50 nm or less, but the present disclosure is not limited thereto. The second part P2 of the lens 100 may be located on the first part P1 of the lens 100 and on the reflective member 400, and the bottom surface of the second part P2 may be in contact with the top surface TF of the first part P1 and with the reflective member 400 to fill the gap inside the reflective member 400.

Because the reflective member 400 is located along the morphology of the groove H, the reflective member 400 is also inclined at the same angle as the groove H (i.e., at the inclination angle θ) with respect to the first surface SF1 that faces the eye E of the user, and can thus reflect and thereby provide an image IM output by the display device 200 to the eye E of the user. The reflective member 400 has a concave shape, and may be suitable for focusing the image IM on the pupil of the eye E of the user.

FIG. 15 is a cross-sectional view of an augmented reality providing apparatus according to another embodiment of the present disclosure. Referring to FIG. 15, in one embodiment, a plurality of grooves H, which have a concave shape, may be located on a top surface TF of a first part P1. For example, first, second, and third grooves H1, H2, and H3 may be located on the top surface TF of the first part P1. First, second, and third reflective members 410, 420, and 430 may be located in the first, second, and third grooves H1, H2, and H3, respectively. That is, the first reflective member 410, which has a concave shape, may be located in the first groove H1 along the morphology of the first groove H1, the second reflective member 420, which has a concave shape, may be located in the second groove H2 along the morphology of the second groove H2, and the third reflective member 430, which has a concave shape, may be located in the third groove H3 along the morphology of the third groove H3. In one embodiment, the first, second, and third grooves H1, H2, and H3 may have the same diameter, and the first, second, and third reflective members 410, 420, and 430 may have the same diameter.

The first, second, and third reflective members 410, 420, and 430 may be spaced apart from one another in the thickness direction of the lens 100 (i.e., a third direction/Z-axis direction), and the first, second, and third reflective members 410, 420, and 430 may have, or may be located at, different heights.

That is, the first, second, and third reflective members 410, 420, and 430 may be at different locations in the height direction of the lens 100 (i.e., a second direction/Y-axis direction). For example, the height of, or vertical location of, the first, second, and third reflective members 410, 420, and 430 may sequentially increase or decrease from the first reflective member 410 to the second reflective member 420 to the third reflective member 430, but the present disclosure is not limited thereto. The first reflective member 410 may be set to reflect and thereby provide a first image IM1 of a display device 200 to an eye E of a user. The second reflective member 420 may be set to reflect and thereby provide a second image IM2 of the display device 200 to the eye E of the user. The third reflective member 430 may be set to reflect and thereby provide a third image IM3 of the display device 200 to the eye E of the user. The first, second, and third reflective members 410, 420, and 430, which have a concave shape, reflect and thereby provide virtual images displayed by the display device 200 to the eye E of the user. Because the virtual images displayed by the display device 200 are reflected by the first, second, and third reflective members 410, 420, and 430, the depth of field of the virtual images deepens.

FIG. 16 is a cross-sectional view of an augmented reality providing apparatus according to another embodiment of the present disclosure. Referring to FIG. 16, in one embodiment, multiple grooves H having different diameters (e.g., different radii of curvature) may be located on a top surface TF of a first part P1. For example, the diameter or radius of curvature of a second groove H2 may be greater than the diameter or radius of curvature of each of a first groove H1 and a third groove H3. However, the present disclosure is not limited to this. That is, in another example, the diameter or radius of curvature of the first, second, and third grooves H1, H2, and H3 may sequentially increase or decrease from the first groove H1 to the second groove H2 to the third groove H3.

In yet another example, two of the first, second, and third grooves H1, H2, and H3 may have the same diameter, and the other groove may have a different diameter from the two grooves. First, second, and third reflective members 410, 420, and 430 having different diameters may be located in the first, second, and third grooves H1, H2, and H3, respectively. For example, the diameter of the first reflective member 410 may be greater than the diameter of the second reflective member 420, and the diameter of the third reflective member 430 may be smaller than the diameter of the second reflective member 420. However, the present disclosure is not limited to this. In another example, the diameter of the first, second, and third reflective members 410, 420, and 430 may sequentially increase or decrease from the first reflective member 410 to the second reflective member 420 to the third reflective member 430. In yet another example, two of the first, second, and third reflective members 410, 420, and 430 may have the same diameter, and the other reflective member may have a different diameter from the two reflective members.

FIG. 17 is a cross-sectional view of an augmented reality providing apparatus according to another embodiment of the present disclosure. Referring to FIG. 17, in one embodiment, different types of reflective members may be located in a lens 100. For example, first and third grooves H1 and H3 may be located on a top surface TF of a first part P1 to be spaced apart from each other. A first reflective member 410, which has a concave shape, may be located in the first groove H1 along the morphology of the first groove H1, a third reflective member 430, which has a concave shape, may be located in the third groove H3 along the morphology of the third groove H3, and a second reflective member 420, which has a flat shape, may be located on a part of the top surface TF between the first and third reflective members 410 and 430. However, the present disclosure is not limited thereto. Some of the reflective members of the lens 100 may have a concave shape, and some of the reflective members of the lens 100 may have a flat shape. Alternatively, some of the reflective members of the lens 100 may have a concave shape, and some of the reflective members of the lens 100 may have a convex shape. Still alternatively, some of the reflective members of the lens 100 may have a concave shape, some of the reflective members of the lens 100 may have a convex shape, and some of the reflective members of the lens 100 may have a flat shape.

FIG. 18 is a schematic view illustrating the field of view of a user when a flat reflective member is used, and FIG. 19 is a schematic view illustrating the field of view of a user when a reflective member having a concave shape is used according to an embodiment of the present disclosure. Referring to FIG. 18, images output by a display device DP are reflected by a flat reflective member MR of a lens LS and are thus provided to the retina of an eye E of a user. When the display area of the display device DP is set as W1, images incident upon the flat reflective member MR at an angle (e.g., a predetermined angle) from the outer sides of W1 fail to reach the retina of the eye E of the user because they fall outside the pupil of the eye E of the user. As a result, only images displayed in W2, which is smaller than W1, can be incident upon the pupil of the eye E of the user and can thus be seen by the user.

Referring to FIG. 19, images output by a display device 200 are reflected by a reflective member 400, which is located in a groove H along the morphology of the groove H, and which has a concave shape, and are thus provided to the retina of an eye E of a user. When the display area of the display device 200 is set as W3, images incident upon the reflective member 400 at an angle (e.g., a predetermined angle) from the outer sides of W3 can be focused on the pupil of the eye E of the user by the reflective member 400, which has a concave shape facing the eye E of the user. Accordingly, all images displayed in W3, which is the display area of the display device 200, can be incident upon the pupil of the eye E of the user, and as a result, the field of view of the user can be widened.

FIG. 20 is a perspective view of an augmented reality providing apparatus according to another embodiment of the present disclosure, and FIG. 21 is a cross-sectional view of the augmented reality providing apparatus of FIG. 20. The embodiment of FIGS. 20 and 21 differs from the embodiment of FIG. 1 in that a lens 100 includes grooves H, and that reflective members 400 are located along the morphology of the grooves H of the lens 100. The embodiment of FIGS. 20 and 21 will hereinafter be described, focusing mainly on differences with the embodiment of FIG. 1.

Referring to FIGS. 20 and 21, an augmented reality providing apparatus 10_3 may include a lens 100, a display device 200, and an adhesive layer 300.

The lens 100 may include a plurality of lens portions. In one embodiment, the lens 100 may include first, second, and third lens portions 101, 103, and 105, but the present disclosure is not limited thereto. For example, the lens 100 may include at least two lens portions. Each of the first, second, and third lens portions 101, 103, and 105 may include first and second parts P1 and P2. The first and second parts P1 and P2 of the lens 100 may be bonded together to form each of the first, second, and third lens portions 101, 103, and 105.

First, second, and third grooves H1, H2, and H3, which have a concave shape, may be located on a top surface TF of the first part P1. The first groove H1 may be located on the top surface TF of the first part P1 of the first lens portion 101, the second groove H2 may be located on the top surface TF of the first part P1 of the second lens portion 103, and the third groove H3 may be located on the top surface TF of the first part P1 of the third lens portion 105.

The first parts P1 of the first, second, and third lens portions 101, 103, and 105 may have different slopes, and the first, second, and third grooves H1, H2, and H3 may be inclined toward a first surface SF1 that faces an eye E of a user and may have first, second, and third inclination angles θ1, θ2, and θ3, respectively. For example, the first groove H1 of the first lens portion 101 may have the first inclination angle θ1, the second groove H2 of the second lens portion 103 may have the second inclination angle θ2, which is greater than the first inclination angle θ1, and the third groove H3 of the third lens portion 105 may have the third inclination angle θ3, which is greater than the second inclination angle θ2. The first, second, and third inclination angles θ1, θ2, and θ3 refer to the angles at which lines HL that are normal to the central points of the first, second, and third grooves H1, H2, and H3 are inclined from a third direction (or a Z-axis direction) toward a second direction (or a Y-axis direction).

First, second, and third reflective members 410, 420, and 430, which have a concave shape, may be located on top of the first, second, and third grooves H1, H2, and H3, respectively, along the morphology of the first, second, and third grooves H1, H2, and H3, respectively. For example, the first reflective member 410, which has a concave shape, may be inclined toward the first surface SF1 at the same inclination angle as the first groove H1 (i.e., the first inclination angle θ1), the second reflective member 420, which has a concave shape, may be inclined toward the first surface SF1 at the same inclination angle as the second groove H2 (i.e., the second inclination angle θ2), and the third reflective member 430, which has a concave shape, may be inclined toward the first surface SF1 at the same inclination angle as the third groove H3 (i.e., the third inclination angle θ3).

The first, second, and third reflective members 410, 420, and 430 are located to have different angles. The first, second, and third reflective members 410, 420, and 430 reflect, and thereby provide, virtual images displayed by the display device 200 to the eye E of the user. Because the virtual images displayed by the display device 200 are reflected by the first, second, and third reflective members 410, 420, and 430 having different angles, the depth of field of the virtual images deepens.

For example, the first reflective member 410 of the first lens portion 101 provides a first image IM1 displayed by the display device 200 to the eye E of the user by reflecting the first image IM1 toward the first surface SF1 of the lens 100. Also, the second reflective member 420 of the second lens portion 103 provides a second image IM2 displayed by the display device 200 to the eye E of the user by reflecting the second image IM2 toward the first surface SF1 of the lens 100, and the third reflective member 430 of the third lens portion 105 provides a third image IM3 displayed by the display device 200 to the eye E of the user by reflecting the third image IM3 toward the first surface SF1 of the lens 100. The first, second, and third reflective members 410, 420, and 430, which have a concave shape, may allow the virtual images displayed by the display device 200, e.g., the first, second, and third images IM1, IM2, and IM3, to be focused on a single point on the retina of the eye E of the user. As a result, the field of view of the user can be widened. Also, even when the user focuses on a real image through the lens 100, the user can see the virtual images clearly.

FIGS. 22 through 26 are cross-sectional views illustrating a method of fabricating a lens of an augmented reality providing apparatus, the lens having a concave reflective member formed therein, according to an embodiment of the present disclosure.

Referring to FIG. 22, a raw lens 520 is placed on a work table 510. The raw lens 520 is illustrated as having a flat top surface, but the present disclosure is not limited thereto. For example, the top surface of the raw lens 520 may have a slope (e.g., a predetermined slope). For example, the raw lens 520 may be the first part P1 of the lens 100 of FIG. 14. The raw lens 520 may be formed of glass, a polymer, or alkali-free glass.

Referring to FIG. 23, an induction heating element 530 is placed over a region in which to form a groove H. The induction heating element 530 may include a hollow heating tube 535 and an external induction coil 531 wound on the outer circumferential surface of the hollow heating tube 535. Multiple grooves H can be formed by moving a single induction heating element 530 from one place to another or by using multiple induction heating elements 530.

Referring to FIG. 24, the induction heating element 530 is heated and is placed in contact with the top surface of the raw lens 520 where the groove H is to be formed. That is, a part of the heating tube 535 with the external induction coil 531 wound thereon is heated using the external induction coil 531. The heating tube 535 may be heated to a temperature of about 700° C. to about 1200° C., but the present disclosure is not limited thereto. The heated heating tube 535 is placed in contact with the region of the raw lens 520 in which to form the groove H. The heated heating tube 535 may be placed in contact with the region of the raw lens 520 in which to form the groove H, for about 0.1 seconds to about 1 second.

Equation 1 is provided below:

σf = (E · α · ΔT) / (1 − ν)     (Equation 1)

where σf is the tensile stress, E is Young's modulus, α is the thermal expansion coefficient, ΔT is the temperature gradient, and ν is Poisson's ratio.

Referring to Equation 1 above, the tensile stress σf is proportional to the thermal expansion coefficient α, and the greater the tensile stress σf is, the more easily a peeling phenomenon occurs. In one embodiment, when the raw lens 520 is formed of glass, the thermal expansion coefficient α may be about 70 to about 80 (×10⁻⁷/° C.), and the region of the raw lens 520 in which to form the groove H may be heated to a temperature of about 1200° C. to about 1300° C. In another embodiment, when the raw lens 520 is formed of a polymer, the thermal expansion coefficient α may be about 150 (×10⁻⁷/° C.) or greater, and the region of the raw lens 520 in which to form the groove H may be heated to a temperature of about 800° C. to about 1000° C. In yet another embodiment, when the raw lens 520 is formed of alkali-free glass, the thermal expansion coefficient α may be about 30 to about 40 (×10⁻⁷/° C.), and the region of the raw lens 520 in which to form the groove H may be heated to a temperature of about 1400° C. to about 1500° C. However, the present disclosure is not limited to these embodiments. The temperature to which the region of the raw lens 520 in which to form the groove H is heated may vary depending on the diameter of the groove H, the material of the raw lens 520, and the temperature of the heating tube 535.
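
For a rough sense of the magnitudes involved, Equation 1 can be evaluated directly. The Python sketch below does so for a glass raw lens; the thermal expansion coefficient follows the range given above, while the Young's modulus and Poisson's ratio values are assumptions chosen only for illustration and are not taken from the disclosure.

```python
# Minimal sketch of Equation 1; E and nu are assumed values, not from the disclosure.
def thermal_stress(young_modulus_pa, expansion_coeff_per_c, delta_t_c, poisson_ratio):
    """sigma_f = E * alpha * delta_T / (1 - nu)"""
    return young_modulus_pa * expansion_coeff_per_c * delta_t_c / (1.0 - poisson_ratio)

# Glass raw lens: alpha of about 75 x 10^-7 /deg C (per the text); E ~ 70 GPa and nu ~ 0.22 assumed.
sigma = thermal_stress(70e9, 75e-7, 1200, 0.22)
print(f"approximate thermal stress: {sigma / 1e6:.0f} MPa")
```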

A part of the top surface of the raw lens 520 corresponding to the groove H and placed in contact with the heated heating tube 535 is heated, and then, the raw lens 520 is quickly cooled. The raw lens 520 may be quickly cooled to a temperature of about −200° C. to about 0° C., but the present disclosure is not limited thereto. The heating and the cooling of the raw lens 520 are performed within a relatively short period of time. For example, the heating and the cooling of the raw lens 520 may be performed within about 2 seconds, but the present disclosure is not limited thereto.

As the raw lens 520 is heated and quickly cooled, thermal shock is generated on the surface and the inside of the raw lens 520 due to a rapid change in temperature. Thus, the top of the raw lens 520 is contracted, and the bottom of the raw lens 520 is expanded. As a result, an upper portion of the raw lens 520 that is in contact with the heated heating tube 535 is peeled off from a lower portion of the raw lens 520 to correspond to the shape of the groove H.

Referring to FIG. 25, by removing the upper portion of the raw lens 520 that is peeled off, a groove H having a semicircular cross-sectional shape is formed. In one embodiment, the groove H may have a semicircular cross-sectional shape, but the present disclosure is not limited thereto. For example, the groove H may have various other cross-sectional shapes, such as a triangular or elliptical cross-sectional shape. The groove H may have a diameter of about 400 μm to about 2 mm and may have a surface roughness of about 20 nm to about 40 nm, but the present disclosure is not limited thereto.

Referring to FIG. 26, a reflective member 400, which has a concave shape, may be formed on top of the groove H along the morphology of the groove H. The reflective member 400 may be formed to be smaller in size than the pupil of the human eye, and may have a diameter of about 100 μm to about 5 mm. The reflective member 400 may be formed by depositing at least one of silver (Ag), aluminum (Al), and rhodium (Rh), but the present disclosure is not limited thereto. For example, the reflective member 400 may be formed using various other methods such as a metal inorganic synthesis method, an electrochemical method, or a metal thin film lamination method. The reflective member 400 may have a surface roughness of about 50 nm or less, but the present disclosure is not limited thereto.

FIG. 27 is a graph showing the residual stress in the groove of a lens according to an embodiment of the present disclosure.

Referring to FIG. 27, due to thermal shock being caused by heating and then quickly cooling a lens 520, a groove H of the lens 520 has relatively high residual stress in an inflection region CNA where the groove H and a top surface TF of the lens 520 adjoin each other. For example, the groove H of the lens 520 may have a residual stress of about 4 MPa to about 6 MPa at an inflection point CNP where the groove H and the top surface TF of the lens 520 meet. The residual stress in the inflection region CNA may be inversely proportional to the distance from the inflection point CNP, and may be measured as 0 at a distance (e.g., a predetermined distance) or more away from the inflection point CNP (i.e., in a region outside the inflection region CNA).

FIG. 28 is a cross-sectional view of a display device according to an embodiment of the present disclosure.

FIG. 28 illustrates a display device 200 as being implemented as an organic light-emitting diode display device.

Referring to FIG. 28, the display device 200 may include a support substrate 210, a flexible substrate 220, a pixel array layer 230, a barrier film 240, a heat dissipation film 250, a flexible film 260, a driving integrated circuit (IC) 270, and an anisotropic conductive film 280.

The support substrate 210, which is a substrate for supporting the flexible substrate 220, may be formed of plastic or glass. For example, the support substrate 210 may be formed of polyethylene terephthalate (PET). The flexible substrate 220 may be located on the top surface of the support substrate 210 and may be formed as a plastic film having flexibility. For example, the flexible substrate 220 may be formed as a polyimide film.

The pixel array layer 230 may be formed on the top surface of the flexible substrate 220. The pixel array layer 230 may be a layer having a plurality of pixels formed thereon to display an image.

The pixel array layer 230 may include a thin-film transistor layer, a light-emitting element layer, and an encapsulation layer. The thin-film transistor layer may include scan lines, data lines, and thin-film transistors. Each of the thin-film transistors includes a gate electrode, a semiconductor layer, and source and drain electrodes. In a case where a scan driver is formed directly on a substrate, the scan driver may be formed together with the thin-film transistor layer.

The light-emitting element layer is located on the thin-film transistor layer. The light-emitting element layer includes anode electrodes, an emission layer, cathode electrodes, and banks. The emission layer may include an organic light-emitting layer comprising an organic material. For example, the emission layer may include a hole injection layer, a hole transport layer, an organic light-emitting layer, an electron transport layer, and an electron injection layer. The hole injection layer and the electron injection layer may be omitted. In response to voltages being applied to the anode electrodes and the cathode electrodes, holes and electrons move to the organic light-emitting layer through the hole transport layer and the electron transport layer, respectively, and may be combined together in the organic light-emitting layer, thereby emitting light. The light-emitting element layer may be a pixel array layer where pixels are formed, and a region where the light-emitting element layer is formed may be defined as a display area in which an image is displayed. An area on the periphery of the display area may be defined as a non-display area.

The encapsulation layer is located on the light-emitting element layer. The encapsulation layer reduces or prevents the infiltration of oxygen or moisture into the light-emitting element layer. The encapsulation layer may include at least one inorganic film and at least one organic film. The barrier film 240, which is for protecting the display device 200 against oxygen or moisture, is located on the encapsulation layer.

The barrier film 240 may cover the pixel array layer 230 to thus protect the pixel array layer 230 against oxygen or moisture. That is, the barrier film 240 may be located on the pixel array layer 230.

The heat dissipation film 250 may be located on the bottom surface of the support substrate 210. The heat dissipation film 250 may include a buffer member 251 performing a buffer function to protect the display device 200 against external impact, and may also include a metal layer 252 having a high thermal conductivity so as to be able to effectively dissipate heat generated by the display device 200. The metal layer 252 may be formed of copper (Cu), aluminum (Al), or aluminum nitride (AlN). In a case where the heat dissipation film 250 includes the buffer member 251 and the metal layer 252, the buffer member 251 may be located on the bottom surface of the support substrate 210, and the metal layer 252 may be located on the bottom surface of the buffer member 251.

The flexible film 260 may be a chip-on-film (COF) for mounting the driving integrated circuit 270. The driving integrated circuit 270 may be implemented as a chip for providing driving signals to the data lines of the pixel array layer 230. One side of the flexible film 260 may be attached onto the top surface of the flexible substrate 220 via an anisotropic conductive film 280. For example, one side of the flexible film 260 may be attached onto pads provided on a part of the top surface of the flexible substrate 220 that is not covered by the barrier film 240. Because these pads are connected to the data lines of the pixel array layer 230, the driving signals of the driving integrated circuit 270 may be provided to the data lines of the pixel array layer 230 via the flexible film 260 and the pads.

FIG. 29 is a perspective view of a head-mounted display including an augmented reality providing apparatus according to various embodiments of the present disclosure.

FIG. 29 shows that augmented reality providing apparatuses according to various embodiments of the present disclosure are applicable to a head-mounted display. Referring to FIG. 29, a head-mounted display according to an embodiment of the present disclosure includes a first augmented reality providing apparatus 10a, a second augmented reality providing apparatus 10b, a support frame 20, and eyewear temples, or arms, 30a and 30b.

FIG. 29 illustrates the head-mounted display according to an embodiment of the present disclosure as being eyeglasses including the eyewear temples 30a and 30b, but the head-mounted display according to an embodiment of the present disclosure may include, in place of the eyewear temples 30a and 30b, a band that can be worn on the head. However, the present disclosure is not limited to the example of FIG. 29, and augmented reality providing apparatuses according to various embodiments of the present disclosure are applicable to various electronic devices in various manners.

While embodiments of the present invention have been described above, they are merely examples and are not intended to limit the present invention, and it will be understood by those of ordinary skill in the art that various modifications and applications not illustrated above can be made without departing from the essential characteristics of the embodiments of the present invention. For example, the respective components illustrated in the embodiments of the present invention may be practiced with modifications. Further, differences relating to such modifications and applications should be construed as being included in the scope of the invention as defined by the appended claims and their functional equivalents.

Sony Patent | Environmental map management apparatus, environmental map management method, and program https://patent.nweon.com/26737 Thu, 26 Jan 2023 15:47:20 +0000 https://patent.nweon.com/?p=26737 ...

Patent: Environmental map management apparatus, environmental map management method, and program

Patent PDF: 加入映维网会员获取

Publication Number: 20230021556

Publication Date: 2023-01-26

Assignee: Sony Interactive Entertainment Inc

Abstract

Provided are an environmental map management apparatus, an environmental map management method, and a program that can correct an environmental map to achieve an accurate association between the environmental map and a map indicated by given map data provided by a given map service. A common environmental map data storage unit (64) stores environmental map data that is generated based on sensing data acquired by a tracker and that indicates an environmental map. A pattern identification unit (68) identifies a predetermined pattern in the environmental map on the basis of the environmental map data. A corresponding element identification unit (70) identifies a linear element that appears in a map indicated by given map data provided by a given map service and that is associated with the predetermined pattern. A common environmental map data update unit (72) updates the environmental map data on the basis of a location of the linear element in the map indicated by the given map data.

Claims

1.An environmental map management apparatus comprising: an environmental map data storage unit configured to store environmental map data that is generated based on sensing data acquired by a tracker and indicates an environmental map; a pattern identification unit configured to identify a predetermined pattern in the environmental map on a basis of the environmental map data; a corresponding element identification unit configured to identify a linear element that appears in a map indicated by given map data provided by a given map service and is associated with the pattern; and an environmental map data update unit configured to update the environmental map data on a basis of a location of the linear element in the map indicated by the given map data.

2.The environmental map management apparatus according to claim 1, wherein the pattern identification unit identifies a linear portion in the environmental map on the basis of the environmental map data, and the corresponding element identification unit identifies a linear element that appears in the map indicated by the given map data and that is associated with the linear portion.

3.The environmental map management apparatus according to claim 1, wherein the environmental map data storage unit stores the environmental map data indicating the environmental map including a plurality of feature points, and the environmental map data update unit updates, based on the location of the linear element in the map indicated by the given map data, a location of at least one of the feature points in the environmental map.

4.The environmental map management apparatus according to claim 3, wherein the pattern identification unit identifies a linear set of feature points linearly disposed, from the plurality of feature points, the corresponding element identification unit identifies a linear element that appears in the map indicated by the given map data and that is associated with the linear set of feature points, and the environmental map data update unit updates, based on the location of the linear element in the map indicated by the given map data, a location of the linear set of feature points in the environmental map.

5.The environmental map management apparatus according to claim 3, wherein the environmental map data update unit updates, based on locations of two of the feature points, the locations being updated based on the location of the linear element in the map indicated by the given map data, a location in the environmental map of one of the feature points disposed between the two of the feature points in the environmental map.

6.The environmental map management apparatus according to claim 1, wherein the environmental map data storage unit stores the environmental map data indicating the environmental map including a plurality of feature points, the pattern identification unit maps the plurality of feature points on a plane orthogonal to a gravity axis, to thereby generate mapping data indicating a set of feature points on the plane, and the pattern identification unit identifies the predetermined pattern on a basis of the mapping data.

7.The environmental map management apparatus according to claim 3, wherein the environmental map data update unit updates at least one of a latitude, a longitude, an altitude, an orientation, or an elevation/depression angle of each of the feature points.

8.The environmental map management apparatus according to claim 3, wherein the environmental map data includes a plurality of pieces of key frame data each generated based on the sensing data, the key frame data includes geo-pose data indicating a geographical location and an orientation of the tracker and a plurality of pieces of feature point data indicating relative locations, with respect to the location and the orientation indicated by the geo-pose data, of the feature points different from each other, the environmental map data update unit identifies the key frame data associated with the linear element as to-be-updated key frame data, and the environmental map data update unit updates, based on the location of the linear element in the map indicated by the given map data, the geo-pose data included in the to-be-updated key frame data.

9.The environmental map management apparatus according to claim 8, wherein the environmental map data further includes key frame link data for associating two pieces of the key frame data partially overlapping with each other in terms of sensing range by the tracker with each other, the environmental map data update unit identifies intermediate key frame data that is the key frame data associated with each of two pieces of the to-be-updated key frame data through one or a plurality of different pieces of the key frame link data, and the environmental map data update unit updates, based on the geo-pose data included in each of the two pieces of the to-be-updated key frame data, the geo-pose data included in the intermediate key frame data.

10.The environmental map management apparatus according to claim 9, wherein the environmental map data update unit identifies an update amount of first geo-pose data that is the geo-pose data included in one of the pieces of the to-be-updated key frame data, the environmental map data update unit identifies an update amount of second geo-pose data that is the geo-pose data included in the other of the pieces of the to-be-updated key frame data, the environmental map data update unit identifies a first distance that is a distance from a location indicated by intermediate geo-pose data that is the geo-pose data included in the intermediate key frame data to a location indicated by the first geo-pose data, the environmental map data update unit identifies a second distance that is a distance from the location indicated by the intermediate geo-pose data to a location indicated by the second geo-pose data, and the environmental map data update unit updates the intermediate geo-pose data by an update amount that is a weighted average between the update amount of the first geo-pose data and the update amount of the second geo-pose data with weights based on the first distance and the second distance.

11.The environmental map management apparatus according to claim 8, wherein the key frame data further includes public pose data indicating the location and the orientation of the tracker expressed in metric units, and the environmental map data update unit updates, based on the location of the linear element in the map indicated by the given map data, the geo-pose data and the public pose data included in the to-be-updated key frame data.

12.The environmental map management apparatus according to claim 11, wherein the environmental map data further includes key frame link data for associating two pieces of the key frame data partially overlapping with each other in terms of sensing range by the tracker with each other, the environmental map data update unit identifies intermediate key frame data that is the key frame data associated with each of two pieces of the to-be-updated key frame data through one or a plurality of different pieces of the key frame link data, and the environmental map data update unit updates, based on the geo-pose data included in each of the two pieces of the to-be-updated key frame data, the geo-pose data and the public pose data included in the intermediate key frame data.

13.The environmental map management apparatus according to claim 12, wherein the environmental map data update unit identifies an update amount of first geo-pose data that is the geo-pose data included in one of the pieces of the to-be-updated key frame data, the environmental map data update unit identifies an update amount of second geo-pose data that is the geo-pose data included in the other of the pieces of the to-be-updated key frame data, the environmental map data update unit identifies a first distance that is a distance from a location indicated by intermediate geo-pose data that is the geo-pose data included in the intermediate key frame data to a location indicated by the first geo-pose data, the environmental map data update unit identifies a second distance that is a distance from the location indicated by the intermediate geo-pose data to a location indicated by the second geo-pose data, the environmental map data update unit updates the intermediate geo-pose data by an update amount that is a weighted average between the update amount of the first geo-pose data and the update amount of the second geo-pose data with weights based on the first distance and the second distance, and the environmental map data update unit updates the public pose data included in the intermediate key frame data by an update amount depending on the update amount of the intermediate geo-pose data included in the intermediate key frame data.

14.The environmental map management apparatus according to claim 8, wherein the geo-pose data indicates at least one of a latitude, a longitude, an altitude, an orientation, or an elevation/depression angle of the tracker.

15.An environmental map management method comprising: identifying, based on environmental map data that is generated based on sensing data acquired by a tracker and indicates an environmental map, a predetermined pattern in the environmental map; identifying a linear element that appears in a map indicated by given map data provided by a given map service and is associated with the pattern; and updating the environmental map data on a basis of a location of the linear element in the map indicated by the given map data.

16.A program for a computer, comprising: by a pattern identification unit, identifying, based on environmental map data that is generated based on sensing data acquired by a tracker and indicates an environmental map, a predetermined pattern in the environmental map; by a corresponding element identification unit, identifying a linear element that appears in a map indicated by given map data provided by a given map service and is associated with the pattern; and by an environmental map data update unit, updating the environmental map data on a basis of a location of the linear element in the map indicated by the given map data.

Description

TECHNICAL FIELD

The present invention relates to an environmental map management apparatus, an environmental map management method, and a program.

BACKGROUND ART

In recent years, user-generated content (UGC) that is content generated by a plurality of users has been garnering attention.

Further, the simultaneous localization and mapping (SLAM) technology that creates environmental maps on the basis of sensing data acquired by trackers, such as images taken by cameras included in the trackers, has been known.

As an example of UGC using the SLAM technology, PTL 1 discloses an environmental map which is generated on the basis of sensing data acquired by each of a plurality of trackers and in which locations are expressed by a common coordinate system shared by the plurality of trackers.

Further, there are various map services that provide maps (including not only general maps but also satellite maps and aeronautical charts) to users via the Internet.

CITATION LISTPatent Literature

[PTL 1]

PCT Patent Publication No. WO2019/167213

SUMMARYTechnical Problem

If a generated environmental map can accurately be associated with a map indicated by given map data provided by a given map service, a new service in which an environmental map and a map service are linked to each other can be developed. For example, a service that displays information in a map indicated by map data provided by a map service, such as anchors, local information, or store information, in a superimposed manner on the real world by using an X reality (XR) technology such as augmented reality (AR) can be realized. Further, for example, a service that reflects user operation using an XR technology in a map indicated by map data provided by a map service can be realized.

However, since an environmental map is UGC and thus generated on the basis of sensing data acquired by a plurality of individual trackers different from each other in accuracy, the environmental map is not accurate enough to be accurately associated with a map indicated by map data provided by a map service. Further, it cannot be said that the sensing accuracy of those trackers themselves is high enough to achieve an accurate association between a generated environmental map and a map indicated by map data provided by a map service. Thus, a generated environmental map cannot be successfully linked to a map service as it is.

The present invention has been made in view of such a problem and has an object to provide an environmental map management apparatus, an environmental map management method, and a program that can correct an environmental map to accurately associate the environmental map with a map indicated by given map data provided by a given map service.

Solution to Problem

In order to solve the problem described above, according to the present invention, there is provided an environmental map management apparatus including an environmental map data storage unit configured to store environmental map data that is generated based on sensing data acquired by a tracker and indicates an environmental map, a pattern identification unit configured to identify a predetermined pattern in the environmental map on the basis of the environmental map data, a corresponding element identification unit configured to identify a linear element that appears in a map indicated by given map data provided by a given map service and is associated with the pattern, and an environmental map data update unit configured to update the environmental map data on the basis of a location of the linear element in the map indicated by the given map data.
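
Purely as a structural sketch of how the four units named above could fit together, the flow might look like the following Python skeleton; the class and method names are hypothetical and are not defined by the disclosure.

```python
# Hypothetical skeleton of the apparatus described above; all names are illustrative only.
class EnvironmentalMapManagementApparatus:
    def __init__(self, map_store, pattern_identifier, element_identifier, updater):
        self.map_store = map_store                    # environmental map data storage unit
        self.pattern_identifier = pattern_identifier  # finds predetermined patterns (e.g., linear portions)
        self.element_identifier = element_identifier  # matches a pattern to a linear element in the map service data
        self.updater = updater                        # environmental map data update unit

    def correct_map(self, map_service_data):
        env_map = self.map_store.load()
        pattern = self.pattern_identifier.identify(env_map)
        element = self.element_identifier.find_matching_linear_element(pattern, map_service_data)
        corrected = self.updater.update(env_map, element.location)
        self.map_store.save(corrected)
```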

In one aspect of the present invention, the pattern identification unit identifies a linear portion in the environmental map on the basis of the environmental map data, and the corresponding element identification unit identifies a linear element that appears in the map indicated by the given map data and that is associated with the linear portion.

Further, in one aspect of the present invention, the environmental map data storage unit stores the environmental map data indicating the environmental map including a plurality of feature points, and the environmental map data update unit updates, based on the location of the linear element in the map indicated by the given map data, a location of at least one of the feature points in the environmental map.

In this aspect, the pattern identification unit may identify a linear set of feature points linearly disposed, from the plurality of feature points, the corresponding element identification unit may identify a linear element that appears in the map indicated by the given map data and that is associated with the linear set of feature points, and the environmental map data update unit may update, based on the location of the linear element in the map indicated by the given map data, a location of the linear set of feature points in the environmental map.

Further, the environmental map data update unit may update, based on locations of two of the feature points, the locations being updated based on the location of the linear element in the map indicated by the given map data, a location in the environmental map of one of the feature points disposed between the two of the feature points in the environmental map.

Further, in one aspect of the present invention, the environmental map data storage unit stores the environmental map data indicating the environmental map including a plurality of feature points, the pattern identification unit maps the plurality of feature points on a plane orthogonal to a gravity axis, to thereby generate mapping data indicating a set of feature points on the plane, and the pattern identification unit identifies the predetermined pattern on the basis of the mapping data.
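
A minimal sketch of this mapping step is shown below, assuming the gravity axis is the Y axis and that the feature points are given as an N×3 NumPy array; the linear pattern is then approximated by the principal direction of the projected points. Both assumptions are for illustration only.

```python
import numpy as np

def project_to_horizontal_plane(points_xyz, gravity_axis=1):
    """Map 3D feature points onto the plane orthogonal to the gravity axis (assumed Y here)."""
    keep = [i for i in range(3) if i != gravity_axis]
    return points_xyz[:, keep]

def principal_direction(points_2d):
    """Least-squares line through the projected points: returns (centroid, unit direction)."""
    centroid = points_2d.mean(axis=0)
    _, _, vt = np.linalg.svd(points_2d - centroid)
    return centroid, vt[0]  # first right singular vector approximates the linear pattern

# Usage sketch: feature_points is an (N, 3) array of environmental map feature points.
# centroid, direction = principal_direction(project_to_horizontal_plane(feature_points))
```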

Further, the environmental map data update unit may update at least one of a latitude, a longitude, an altitude, an orientation, or an elevation/depression angle of each of the feature points.

Further, the environmental map data may include a plurality of pieces of key frame data each generated based on the sensing data, the key frame data may include geo-pose data indicating a geographical location and an orientation of the tracker and a plurality of pieces of feature point data indicating relative locations, with respect to the location and the orientation indicated by the geo-pose data, of the feature points different from each other, the environmental map data update unit may identify the key frame data associated with the linear element as to-be-updated key frame data, and the environmental map data update unit may update, based on the location of the linear element in the map indicated by the given map data, the geo-pose data included in the to-be-updated key frame data.
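
As an illustration only, the key frame layout described here might be represented with hypothetical Python dataclasses such as the following; the field names are assumptions, not part of the disclosure.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class GeoPose:
    latitude: float
    longitude: float
    altitude: float
    orientation_deg: float            # heading of the tracker
    elevation_depression_deg: float

@dataclass
class FeaturePoint:
    dx: float  # relative location with respect to the key frame's geo-pose
    dy: float
    dz: float

@dataclass
class KeyFrame:
    geo_pose: GeoPose
    feature_points: List[FeaturePoint] = field(default_factory=list)
```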

In this aspect, the environmental map data may further include key frame link data for associating two pieces of the key frame data partially overlapping with each other in terms of sensing range by the tracker with each other, the environmental map data update unit may identify intermediate key frame data that is the key frame data associated with each of two pieces of the to-be-updated key frame data through one or a plurality of different pieces of the key frame link data, and the environmental map data update unit may update, based on the geo-pose data included in each of the two pieces of the to-be-updated key frame data, the geo-pose data included in the intermediate key frame data.
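
Identifying intermediate key frames amounts to finding key frames that lie on a link path between the two to-be-updated key frames. The breadth-first search below is one way to sketch this, assuming the key frame link data is available as pairs of key frame IDs; the function name and data layout are assumptions.

```python
from collections import deque

def intermediate_key_frames(links, start_id, goal_id):
    """Return key frame IDs on a shortest link path between two updated key frames,
    excluding the endpoints. `links` is an iterable of (id_a, id_b) pairs."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    queue, previous = deque([start_id]), {start_id: None}
    while queue:
        node = queue.popleft()
        if node == goal_id:
            break
        for neighbor in adjacency.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    if goal_id not in previous:
        return []
    path, node = [], goal_id
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[-2:0:-1]  # interior key frames only, ordered from start_id toward goal_id
```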

Moreover, the environmental map data update unit may identify an update amount of first geo-pose data that is the geo-pose data included in one of the pieces of the to-be-updated key frame data, the environmental map data update unit may identify an update amount of second geo-pose data that is the geo-pose data included in the other of the pieces of the to-be-updated key frame data, the environmental map data update unit may identify a first distance that is a distance from a location indicated by intermediate geo-pose data that is the geo-pose data included in the intermediate key frame data to a location indicated by the first geo-pose data, the environmental map data update unit may identify a second distance that is a distance from the location indicated by the intermediate geo-pose data to a location indicated by the second geo-pose data, and the environmental map data update unit may update the intermediate geo-pose data by an update amount that is a weighted average between the update amount of the first geo-pose data and the update amount of the second geo-pose data with weights based on the first distance and the second distance.
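
A sketch of this weighted-average update is given below. The disclosure states only that the weights are based on the two distances; the inverse-distance weighting used here (the nearer updated key frame dominates) is an assumption for illustration.

```python
def interpolate_update(delta_first, delta_second, dist_to_first, dist_to_second):
    """Weighted average of two geo-pose update amounts (given as equal-length tuples).
    Assumption: the weight on the first update is proportional to the distance to the
    second key frame, so the nearer updated key frame has the larger influence."""
    total = dist_to_first + dist_to_second
    if total == 0:
        return tuple((a + b) / 2 for a, b in zip(delta_first, delta_second))
    w_first = dist_to_second / total
    w_second = dist_to_first / total
    return tuple(w_first * a + w_second * b for a, b in zip(delta_first, delta_second))

# Usage sketch: latitude/longitude/altitude update amounts for the intermediate key frame.
# interpolate_update((1e-5, -2e-5, 0.3), (2e-5, -1e-5, 0.1), dist_to_first=4.0, dist_to_second=6.0)
```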

Further, the key frame data may further include public pose data indicating the location and the orientation of the tracker expressed in metric units, and the environmental map data update unit may update, based on the location of the linear element in the map indicated by the given map data, the geo-pose data and the public pose data included in the to-be-updated key frame data.

In this aspect, the environmental map data may further include key frame link data for associating two pieces of the key frame data partially overlapping with each other in terms of sensing range by the tracker with each other, the environmental map data update unit may identify intermediate key frame data that is the key frame data associated with each of two pieces of the to-be-updated key frame data through one or a plurality of different pieces of the key frame link data, and the environmental map data update unit may update, based on the geo-pose data included in each of the two pieces of the to-be-updated key frame data, the geo-pose data and the public pose data included in the intermediate key frame data.

Moreover, the environmental map data update unit may identify an update amount of first geo-pose data that is the geo-pose data included in one of the pieces of the to-be-updated key frame data, the environmental map data update unit may identify an update amount of second geo-pose data that is the geo-pose data included in the other of the pieces of the to-be-updated key frame data, the environmental map data update unit may identify a first distance that is a distance from a location indicated by intermediate geo-pose data that is the geo-pose data included in the intermediate key frame data to a location indicated by the first geo-pose data, the environmental map data update unit may identify a second distance that is a distance from the location indicated by the intermediate geo-pose data to a location indicated by the second geo-pose data, the environmental map data update unit may update the intermediate geo-pose data by an update amount that is a weighted average between the update amount of the first geo-pose data and the update amount of the second geo-pose data with weights based on the first distance and the second distance, and the environmental map data update unit may update the public pose data included in the intermediate key frame data by an update amount depending on the update amount of the intermediate geo-pose data included in the intermediate key frame data.

Further, the geo-pose data may indicate at least one of a latitude, a longitude, an altitude, an orientation, or an elevation/depression angle of the tracker.

Further, according to the present invention, there is provided an environmental map management method including a step of identifying, based on environmental map data that is generated based on sensing data acquired by a tracker and indicates an environmental map, a predetermined pattern in the environmental map, a step of identifying a linear element that appears in a map indicated by given map data provided by a given map service and is associated with the pattern, and a step of updating the environmental map data on the basis of a location of the linear element in the map indicated by the given map data.

Further, according to the present invention, there is provided a program for causing a computer to execute a procedure of identifying, based on environmental map data that is generated based on sensing data acquired by a tracker and indicates an environmental map, a predetermined pattern in the environmental map, a procedure of identifying a linear element that appears in a map indicated by given map data provided by a given map service and is associated with the pattern, and a procedure of updating the environmental map data on the basis of a location of the linear element in the map indicated by the given map data.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a configuration diagram illustrating an example of an environmental map management system according to an embodiment of the present invention.

FIG. 2A is a configuration diagram illustrating an example of a user server according to the embodiment of the present invention.

FIG. 2B is a configuration diagram illustrating an example of a tracker according to the embodiment of the present invention.

FIG. 2C is a configuration diagram illustrating an example of a common server according to the embodiment of the present invention.

FIG. 3 is a diagram illustrating an exemplary data structure of private key frame data.

FIG. 4 is a diagram illustrating an exemplary data structure of geo-key frame data.

FIG. 5 is a diagram illustrating an exemplary data structure of key frame link data.

FIG. 6 is a diagram schematically illustrating an exemplary graph representing pieces of geo-key frame data associated with each other by key frame link data.

FIG. 7 is a functional block diagram illustrating exemplary functions that are implemented in the common server according to the embodiment of the present invention.

FIG. 8A is a flow chart illustrating an exemplary flow of processing that is performed in the common server according to the embodiment of the present invention.

FIG. 8B is a flow chart illustrating the exemplary flow of processing that is performed in the common server according to the embodiment of the present invention.

DESCRIPTION OF EMBODIMENT

Now, an embodiment of the present invention is described in detail with reference to the drawings.

FIG. 1 is a configuration diagram illustrating an example of an environmental map management system 1 according to the embodiment of the present invention. As illustrated in FIG. 1, the environmental map management system 1 according to the present embodiment includes a plurality of user systems 10. Further, the user system 10 according to the present embodiment includes a user server 12 and a tracker 14. In FIG. 1, two user systems 10a and 10b are exemplified. The user system 10a includes a user server 12a and a tracker 14a. The user system 10b includes a user server 12b and a tracker 14b. Further, the environmental map management system 1 according to the present embodiment also includes a common server 16 and a map service 18.

In the present embodiment, for example, a plurality of users who use the environmental map management system 1 each manage the user’s own user system 10. Further, each user cannot access the user systems 10 managed by the other users.

The user server 12, the tracker 14, the common server 16, and the map service 18 are connected to a computer network 20 such as the Internet. Further, in the present embodiment, the user server 12, the common server 16, and the map service 18 are able to communicate with each other. Further, in the present embodiment, the user server 12 included in the user system 10 and the tracker 14 included in the user system 10 in question are able to communicate with each other.

The user server 12 according to the present embodiment is a server computer that the user of the environmental map management system 1 uses, for example. Note that the user server 12 is not required to be a server operated by the user himself/herself and may be a cloud server operated by a cloud service provider, for example.

As illustrated in FIG. 2A, the user server 12 according to the present embodiment includes a processor 30, a storage unit 32, and a communication unit 34. The processor 30 is a program control device such as a central processing unit (CPU) configured to operate according to programs installed on the user server 12, for example. The storage unit 32 is a storage element such as a read only memory (ROM) or a random access memory (RAM) or a solid-state drive, for example. The storage unit 32 stores, for example, programs that are executed by the processor 30. The communication unit 34 is a communication interface such as a network board or a wireless local area network (LAN) module.

The tracker 14 according to the present embodiment is an apparatus configured to track the user wearing the tracker 14 in question in terms of location and which direction the user is facing.

As illustrated in FIG. 2B, the tracker 14 according to the present embodiment includes a processor 40, a storage unit 42, a communication unit 44, a display unit 46, and a sensor unit 48.

The processor 40 is a program control device such as a microprocessor configured to operate according to programs installed on the tracker 14, for example. The storage unit 42 is a storage element such as a ROM or a RAM, for example. The storage unit 42 stores, for example, programs that are executed by the processor 40. The communication unit 44 is a communication interface such as a wireless LAN module.

The display unit 46 is a display such as a liquid crystal display or organic electroluminescent (EL) display disposed on the front side of the tracker 14. The display unit 46 according to the present embodiment displays left-eye images and right-eye images, thereby being capable of displaying three-dimensional images, for example. Note that the display unit 46 may not support three-dimensional image display and only support two-dimensional image display.

The sensor unit 48 includes sensors such as a camera, an inertial sensor (IMU), a geomagnetic sensor (azimuth sensor), a gyrocompass, a global positioning system (GPS) module, a depth sensor, and an altitude sensor.

The camera included in the sensor unit 48 takes images at a predetermined sampling rate, for example. The camera included in the sensor unit 48 may be capable of taking three-dimensional images or depth images.

Further, the geomagnetic sensor or the gyrocompass included in the sensor unit 48 outputs, to the processor 40, data indicating the orientation of the tracker 14, at a predetermined sampling rate.

Further, the inertial sensor included in the sensor unit 48 outputs, to the processor 40, data indicating the acceleration, the rotation amount, the movement amount, or the like of the tracker 14, at a predetermined sampling rate.

Further, the GPS module included in the sensor unit 48 outputs, to the processor 40, data indicating the latitude and the longitude of the tracker 14, at a predetermined sampling rate.

The depth sensor included in the sensor unit 48 is a depth sensor using time of flight (ToF), patterned stereo, or structured light, for example. The depth sensor in question outputs, to the processor 40, data indicating distances from the tracker 14 to surrounding objects, at a predetermined sampling rate.

Further, the altitude sensor included in the sensor unit 48 outputs, to the processor 40, data indicating the altitude of the tracker 14, at a predetermined sampling rate.

Further, the sensor unit 48 may include another sensor such as a radio frequency (RF) sensor, an ultrasonic sensor, or an event-driven sensor.

Note that the tracker 14 according to the present embodiment may include, for example, input-output ports such as High-Definition Multimedia Interface (HDMI) (registered trademark) ports, universal serial bus (USB) ports, or auxiliary (AUX) ports, headphones, or speakers.

The common server 16 according to the present embodiment is a server computer such as a cloud server that is used by all the users who use the environmental map management system 1, for example. All the user systems 10 included in the environmental map management system 1 are allowed to access the common server 16 according to the present embodiment.

As illustrated in FIG. 2C, the common server 16 according to the present embodiment includes a processor 50, a storage unit 52, and a communication unit 54. The processor 50 is a program control device such as a CPU configured to operate according to programs installed on the common server 16, for example. The storage unit 52 is a storage element such as a ROM, a RAM, or a solid-state drive, for example. The storage unit 52 stores, for example, programs that are executed by the processor 50. The communication unit 54 is a communication interface such as a network board or a wireless LAN module.

The map service 18 according to the present embodiment includes server computers such as cloud servers managed by an existing map service provider, for example, and provides given map data to the users. The map data provided by the map service 18 is not limited to map data indicating a general map. The map service 18 may provide map data indicating an aeronautical chart (aerial image) or a satellite map (satellite image). A map indicated by the map data provided by the map service 18 according to the present embodiment is a three-dimensional map that expresses the latitude, the longitude, the altitude, and the directions, for example. Note that the map indicated by the map data provided by the map service 18 may be a two-dimensional map that expresses the latitude, the longitude, and the directions, for example.

In the present embodiment, sensing data acquired by the tracker 14 included in the user system 10 is transmitted to the user server 12 included in the user system 10 in question. In the present embodiment, for example, sensing data acquired by the processor 40 on the basis of sensing by the sensor unit 48 is transmitted to the user server 12.

The sensing data to be transmitted may include images taken by the camera included in the sensor unit 48 of the tracker 14, for example.

Further, the sensing data to be transmitted may include depth data measured by the camera or the depth sensor included in the sensor unit 48 of the tracker 14.

Further, the sensing data to be transmitted may include data indicating the orientation of the tracker 14 measured by the geomagnetic sensor or the gyrocompass included in the sensor unit 48 of the tracker 14.

Further, the sensing data to be transmitted may include data indicating the acceleration, the rotation amount, the movement amount, or the like of the tracker 14 measured by the inertial sensor included in the sensor unit 48.

Further, the sensing data to be transmitted may include data indicating the latitude and the longitude of the tracker 14 measured by the GPS module included in the sensor unit 48.

Further, the sensing data to be transmitted may include data indicating the altitude measured by the altitude sensor included in the sensor unit 48.

Further, the sensing data to be transmitted may include a set of feature points (key frame).

Note that, to enhance the accuracy of the latitude and the longitude indicated by sensing data, the tracker 14 may utilize location estimation services using hotspots, the identifiers of 5G mobile edge servers, or the like.

Then, the user server 12 executes, on the basis of the sensing data transmitted from the tracker 14, simultaneous localization and mapping (SLAM) processing including generation of private key frame data having the data structure exemplified in FIG. 3. Here, the SLAM processing to be executed may include, for example, relocalization processing, loop closing processing, three-dimensional (3D) mesh processing, or object recognition processing.

Further, the SLAM processing may include plane detection/3D mesh segmentation processing. The plane detection/3D mesh segmentation processing refers to processing of detecting a continuous plane such as the ground or walls and dividing the 3D mesh of the whole scene into individual 3D meshes such as the ground, buildings, and trees. Moreover, the SLAM processing may include 3D mesh optimization processing. The 3D mesh optimization processing refers to processing of removing portions possibly corresponding to moving bodies or errors due to noise or the like from a 3D mesh, reducing the number of polygons, or smoothing the surface of the mesh. In addition, the SLAM processing may include texture generation processing. The texture generation processing refers to processing of generating a texture image corresponding to a 3D mesh, on the basis of the colors of the vertices of the mesh.
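As an illustration of the plane detection step described above, the following is a minimal sketch that separates a dominant plane such as the ground from the rest of a scene point cloud. It assumes the Open3D library, a point cloud file name, and parameter values that are not part of the embodiment; the mesh segmentation, optimization, and texture generation steps would follow along similar lines.

```python
import open3d as o3d

# Illustrative sketch only: the file name and parameter values are assumptions,
# not identifiers used by the embodiment described above.
pcd = o3d.io.read_point_cloud("scene.ply")

# RANSAC plane fit: detects a continuous plane such as the ground or a wall.
plane_model, inlier_idx = pcd.segment_plane(distance_threshold=0.02,
                                            ransac_n=3,
                                            num_iterations=1000)
ground = pcd.select_by_index(inlier_idx)                  # points on the plane
remainder = pcd.select_by_index(inlier_idx, invert=True)  # buildings, trees, etc.
```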

Further, the user server 12 may execute environmental map generation processing based on the sensing data transmitted from the tracker 14.

Then, the user server 12 uploads the generated private key frame data onto the common server 16. Here, private key frame data on a private space such as the user's home may be excluded from uploading onto the common server 16.

FIG. 3 is a diagram illustrating an exemplary data structure of private key frame data according to the present embodiment. As illustrated in FIG. 3, the private key frame data includes a private key frame identification (ID), private pose data, and a plurality of pieces of feature point data (feature point data (1), feature point data (2), etc.). The private key frame data according to the present embodiment is data that is associated with sensing data to be transmitted from the tracker 14 to the user server 12, for example.

A private key frame ID included in private key frame data is identification information regarding the private key frame data.

Private pose data included in private key frame data is data indicating the location and the orientation of the tracker 14 which have been detected when sensing associated with the private key frame data in question has been performed. The location and the orientation indicated by private pose data are expressed by a coordinate system unique to the user system 10 that generates the private key frame data in question.

Feature point data included in private key frame data is data indicating the attributes of feature points such as the locations of the feature points identified on the basis of sensing data acquired by the tracker 14. In the present embodiment, private key frame data includes a plurality of pieces of feature point data associated with feature points different from each other. Here, for example, private key frame data includes as many pieces of feature point data as feature points identified by the tracker 14 performing sensing once. Feature point data included in private key frame data includes, for example, the three-dimensional coordinate values (for example, an X-coordinate value, a Y-coordinate value, and a Z-coordinate value) of the relative location of a feature point corresponding to the feature point data in question with respect to the location and the orientation indicated by private pose data included in the private key frame data in question. Further, feature point data includes color information indicating the colors of the surroundings of a feature point corresponding to the feature point data in question.

Note that private key frame data may include, in addition to the data described above, sensing data transmitted from the tracker 14, for example.
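A minimal sketch of the private key frame data structure of FIG. 3, written as Python dataclasses; the class and field names are illustrative assumptions, not identifiers used by the embodiment.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class FeaturePoint:
    # Relative 3D location (X, Y, Z) of the feature point with respect to the
    # location and orientation indicated by the pose of the enclosing key frame.
    position: Tuple[float, float, float]
    # Color information of the surroundings of the feature point (e.g., RGB).
    color: Tuple[int, int, int]

@dataclass
class PrivateKeyFrame:
    private_key_frame_id: str
    # Location and orientation of the tracker, expressed in the coordinate
    # system unique to the user system that generated this key frame.
    private_pose: Tuple[float, ...]
    feature_points: List[FeaturePoint] = field(default_factory=list)
```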

The common server 16 generates, on the basis of private key frame data uploaded from the user server 12, common environmental map data indicating an environmental map shared by the plurality of users.

As described below, common environmental map data according to the present embodiment includes geo-key frame data and key frame link data.

Geo-key frame data is data that is associated with private key frame data to be transmitted from the user server 12 to the common server 16. Thus, through private key frame data, geo-key frame data is associated with sensing data to be transmitted from the tracker 14 to the user server 12.

FIG. 4 is a diagram illustrating an exemplary data structure of geo-key frame data according to the present embodiment. As illustrated in FIG. 4, the geo-key frame data includes a geo-key frame ID, geo-pose data, public pose data, and a plurality of pieces of feature point data (feature point data (1), feature point data (2), etc.).

A geo-key frame ID included in geo-key frame data is identification information regarding the geo-key frame data.

Geo-pose data included in geo-key frame data is data indicating the geographical location and orientation of the tracker 14 which are detected when sensing data to be associated with the geo-key frame data in question is generated, for example. The geo-pose data is data indicating the latitude, the longitude, the altitude, the orientation, and the elevation/depression angle of the tracker 14 which are detected when sensing data to be associated with the geo-key frame data in question is generated, for example.

In generating new geo-key frame data, values indicating the latitude, the longitude, the altitude, the orientation, and the elevation/depression angle roughly estimated on the basis of sensing data included in private key frame data may be set to geo-pose data.

Note that geo-pose data may not indicate all the latitude, longitude, altitude, orientation, and elevation/depression angle of the tracker 14. Geo-pose data may indicate at least one of the latitude, the longitude, the altitude, the orientation, or the elevation/depression angle of the tracker 14.

Further, with the use of an image recognition technology or the like, a predetermined landmark may be detected from an image taken by the tracker 14. Then, with reference to the map provided by the map service 18, the geographical location of the landmark in question, such as the latitude and the longitude, may be identified. Then, a value indicating the geographical location identified in such a way may be set to geo-pose data.

Note that, in generating new geo-key frame data, the value of geo-pose data is not necessarily required to be set.

To feature point data included in geo-key frame data, the value of feature point data included in private key frame data associated with the geo-key frame data in question is set. Thus, feature point data included in geo-key frame data includes, for example, the three-dimensional coordinate values of the relative location of a feature point corresponding to the feature point data in question with respect to the location and the orientation indicated by geo-pose data included in the geo-key frame data in question.

Public pose data included in geo-key frame data is data indicating the location and the orientation of the tracker 14 which are detected when sensing data to be associated with the geo-key frame data in question is generated. Unlike private pose data, the location and the orientation indicated by public pose data are expressed by a coordinate system common to the plurality of user systems 10. In the following, a coordinate system common to the plurality of user systems 10 is referred to as a “common coordinate system.” In public pose data, locations are expressed in metric units, for example.

Here, for example, as described in PTL 1, a set of feature points (point cloud) that is the union of feature points indicated by geo-key frame data registered in the common server 16 in advance and a feature point indicated by feature point data included in new geo-key frame data may be collated with each other. Then, on the basis of the result of collation, the value of public pose data included in the new geo-key frame data may be determined.

Note that the geo-key frame data according to the present embodiment may include sensing data included in private key frame data associated with the geo-key frame data in question.
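Continuing the same illustrative sketch, geo-key frame data could be represented as follows; the names are again assumptions, and the geo-pose may be left unset when the frame is first created.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class GeoPose:
    latitude: float
    longitude: float
    altitude: float
    heading_deg: float             # orientation (azimuth)
    elevation_deg: float           # elevation/depression angle

@dataclass
class GeoKeyFrame:
    geo_key_frame_id: str
    geo_pose: Optional[GeoPose]    # may be None when the frame is first generated
    # Pose in the common coordinate system, with locations in metric units.
    public_pose: Tuple[float, ...]
    feature_points: List["FeaturePoint"]  # same values as in the private key frame
```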

In the present embodiment, in generating new geo-key frame data, key frame link data for associating two pieces of geo-key frame data corresponding to partially overlapping sensing ranges with each other is also generated. Here, for example, a right rectangular pyramid-shaped region of predetermined shape and size may be identified, the region having, as its vertex, the location of the tracker 14 detected when the sensing data to be associated with the geo-key frame data is generated, and having a base orthogonal to a line that passes through the vertex and extends along the orientation of the tracker 14 in question. Then, the right rectangular pyramid-shaped region identified in such a way may be identified as the sensing range corresponding to the geo-key frame data in question.

FIG. 5 is a diagram illustrating an exemplary data structure of key frame link data according to the present embodiment.

As illustrated in FIG. 5, the key frame link data includes a link source ID, a link destination ID, and delta pose data.

In the present embodiment, for example, when new geo-key frame data has been generated, other geo-key frame data partially overlapping with the geo-key frame data in question in terms of sensing range is retrieved. Here, for example, other geo-key frame data corresponding to a feature point partially overlapping with a feature point indicated by the geo-key frame data may be retrieved. Then, new key frame link data having the geo-key frame ID of the retrieved geo-key frame data as its link source ID and the geo-key frame ID of the generated geo-key frame data as its link destination ID is generated.

Then, the relative value of the public pose data of the geo-key frame data identified by the link destination ID, the relative value using the value of the public pose data of the geo-key frame data identified by the link source ID as a reference, is identified. Then, the identified value is set as the value of the delta pose data of the new key frame link data in question.
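A minimal sketch of how new key frame link data could be assembled, assuming the public poses are available as 4×4 homogeneous transforms in the common coordinate system; the function and field names are illustrative.

```python
import numpy as np

def make_key_frame_link(link_source_id, link_destination_id, T_source, T_destination):
    """Delta pose of the link destination relative to the link source, with both
    poses given as 4x4 homogeneous transforms in the common coordinate system."""
    delta_pose = np.linalg.inv(T_source) @ T_destination
    return {
        "link_source_id": link_source_id,
        "link_destination_id": link_destination_id,
        "delta_pose": delta_pose,
    }
```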

FIG. 6 is a diagram schematically illustrating an exemplary graph representing pieces of geo-key frame data associated with each other by key frame link data. In the example of FIG. 6, geo-key frame data is expressed by nodes (K(1) to K(n) and K(m)) and key frame link data is expressed by the links.

In such a way, on the basis of private key frame data uploaded from the various user systems 10 onto the common server 16, a plurality of pieces of geo-key frame data associated with each other are generated. With a set of feature points (point cloud) indicated by the plurality of pieces of geo-key frame data accumulated in the common server 16 as described above, an environmental map which is user-generated content (UGC) and in which the location of each feature point is expressed by the common coordinate system is constructed. In the following, an environmental map constructed as described above is referred to as a “common environmental map.” Note that a common environmental map may be referred to in the SLAM processing that is executed in the user server 12.

If a generated common environmental map can be accurately associated with the map indicated by the map data provided by the map service 18, a new service in which a common environmental map and the map service 18 are linked to each other can be developed. For example, a service that displays information in the map indicated by the map data provided by the map service 18, such as anchors, local information, or store information, in a superimposed manner on the real world by using an XR technology such as AR can be realized. Further, for example, a service that reflects user operation using an XR technology in the map indicated by the map data provided by the map service 18 can be realized.

However, since a common environmental map generated as described above is generated on the basis of sensing data acquired by the plurality of individual trackers 14, the common environmental map is not accurate enough to be accurately associated with the map indicated by the map data provided by the map service 18. Further, it cannot be said that the sensing accuracy of those trackers 14 themselves is high enough to achieve an accurate association between a generated common environmental map and the map indicated by the map data provided by the map service 18. Thus, a generated common environmental map cannot be successfully linked to the map service 18 as it is.

Accordingly, the present embodiment makes it possible to correct a common environmental map to achieve an accurate association between the common environmental map and the map indicated by the given map data provided by the given map service 18, as described below.

Now, the functions of the common server 16 according to the present embodiment and processing that is executed in the common server 16 are further described.

FIG. 7 is a functional block diagram illustrating exemplary functions that are implemented in the common server 16 according to the present embodiment. Note that, in the common server 16 according to the present embodiment, not all the functions illustrated in FIG. 7 are required to be implemented and functions other than the functions illustrated in FIG. 7 may be implemented.

As illustrated in FIG. 7, the common server 16 functionally includes, for example, a private key frame data reception unit 60, a common environmental map data generation unit 62, a common environmental map data storage unit 64, a map service access unit 66, a pattern identification unit 68, a corresponding element identification unit 70, and a common environmental map data update unit 72. The private key frame data reception unit 60 and the map service access unit 66 are implemented by using the communication unit 54 as their main parts. The common environmental map data generation unit 62, the pattern identification unit 68, the corresponding element identification unit 70, and the common environmental map data update unit 72 are implemented by using the processor 50 as their main parts. The common environmental map data storage unit 64 is implemented by using the storage unit 52 as its main part.

The functions described above may be implemented by the processor 50 executing a program that is installed on the common server 16 which is a computer and that includes instructions corresponding to the functions described above. This program may be supplied to the common server 16 through a computer readable information storage medium such as an optical disc, a magnetic disk, a magnetic tape, a magneto-optical disk, or a flash memory, or through the Internet or the like.

The private key frame data reception unit 60 of the present embodiment receives private key frame data from the user server 12, for example.

The common environmental map data generation unit 62 of the present embodiment generates, on the basis of sensing data acquired by the tracker 14, common environmental map data indicating a common environmental map that is an environmental map shared by the plurality of users, for example. As described above, a common environmental map includes a plurality of feature points. In the present embodiment, for example, common environmental map data is generated on the basis of private key frame data received by the private key frame data reception unit 60. The common environmental map data according to the present embodiment includes a plurality of pieces of geo-key frame data and a plurality of pieces of key frame link data. Then, the common environmental map data generation unit 62 causes the common environmental map data storage unit 64 to store therein the common environmental map data generated as described above.

The common environmental map data storage unit 64 of the present embodiment stores the common environmental map data described above, for example.

The map service access unit 66 of the present embodiment accesses the given map service 18, thereby acquiring the given map data to be linked to a common environmental map, for example. In the following, the map indicated by the given map data provided by the given map service 18 to be linked to a common environmental map is referred to as a “corresponding map.”

The pattern identification unit 68 of the present embodiment identifies, on the basis of common environmental map data stored in the common environmental map data storage unit 64, a predetermined pattern in an environmental map indicated by the common environmental map data in question, for example. The pattern identification unit 68 may identify, on the basis of common environmental map data stored in the common environmental map data storage unit 64, a linear portion in an environmental map indicated by the common environmental map data in question. In the following, a linear portion identified as described above is referred to as a “linear portion of interest.” Note that a linear portion according to the present embodiment is not limited to a straight portion representing a straight road or the like. The linear portion according to the present embodiment may be a curved portion representing a curve or the like.

For example, the pattern identification unit 68 may generate a set of feature points (point cloud) on the basis of a plurality of pieces of geo-key frame data stored in the common environmental map data storage unit 64. For example, a set of feature points that is the union of the feature points indicated by the respective pieces of geo-key frame data may be generated. Then, the pattern identification unit 68 may identify, by using the Hough transform or the like, a linearly disposed set of feature points from the set of feature points generated as described above. Then, the pattern identification unit 68 may identify a line formed by the identified linear set of feature points as a linear portion of interest.
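A minimal sketch of this step, assuming the feature points have already been projected onto a horizontal plane (as described further below) and using OpenCV's probabilistic Hough transform on a rasterized occupancy image; the cell size and Hough thresholds are illustrative assumptions.

```python
import numpy as np
import cv2

def find_linear_feature_sets(points_xy, cell_size=0.25):
    """Rasterize 2D-projected feature points and detect linearly arranged sets."""
    pts = np.asarray(points_xy, dtype=float)
    mins = pts.min(axis=0)
    ij = np.floor((pts - mins) / cell_size).astype(int)
    img = np.zeros(tuple(ij.max(axis=0) + 1), dtype=np.uint8)
    img[ij[:, 0], ij[:, 1]] = 255

    # Each detected segment (x1, y1, x2, y2) corresponds to a candidate
    # linear portion of interest, expressed in grid coordinates.
    lines = cv2.HoughLinesP(img, rho=1, theta=np.pi / 180, threshold=30,
                            minLineLength=20, maxLineGap=5)
    return [] if lines is None else lines[:, 0]
```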

Further, for example, with the use of a pattern matching technology, sets of feature points corresponding to buildings may be extracted from the generated set of feature points. Then, a linear blank space in the common environmental map between the sets of feature points corresponding to the buildings extracted as described above may be identified as a linear portion of interest.

Moreover, in a case where the geo-key frame data includes an image taken by the tracker 14, a road region in the common environmental map may be identified on the basis of the image in question by using deep learning or the like. Then, the road region identified as described above may be identified as a linear portion of interest.

In addition, for example, a trail along which the user is estimated to have walked in the common environmental map may be identified as a linear portion of interest. For example, a linear region identified on the basis of the geo-key frame data as having been passed by the tracker 14 may be identified as a linear portion of interest.

Here, for example, a linear region that has been passed by a predetermined number or more of the trackers 14 may be identified as a linear portion of interest.

Further, the pattern identification unit 68 may map a plurality of feature points included in an environmental map stored in the common environmental map data storage unit 64 on a plane orthogonal to the gravity axis, to thereby generate mapping data indicating a set of feature points on the plane in question. For example, the pattern identification unit 68 may map a set of feature points generated on the basis of a plurality of pieces of geo-key frame data stored in the common environmental map data storage unit 64 on a horizontal plane (ground) orthogonal to the gravity axis, to thereby generate mapping data indicating a set of feature points on the plane.

Then, the pattern identification unit 68 may identify a predetermined pattern on the basis of the mapping data generated as described above. For example, a linearly disposed set of feature points may be identified from the set of feature points indicated by the mapping data. Then, a line formed by the identified linear set of feature points may be identified as a linear portion of interest. Alternatively, sets of feature points corresponding to buildings may be extracted from the set of feature points indicated by the mapping data. Then, a linear blank space in the common environmental map between the sets of feature points corresponding to the buildings extracted as described above may be identified as a linear portion of interest. Such processing makes it much easier to associate a corresponding element described below with a predetermined pattern such as a linear set of feature points, in a case where the corresponding map is a two-dimensional map.
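The mapping onto a plane orthogonal to the gravity axis can be sketched as a simple projection; the gravity axis is assumed to be given as a 3D vector, and the helper-vector construction is an illustrative choice.

```python
import numpy as np

def project_to_horizontal_plane(points_3d, gravity_axis):
    """Map 3D feature points onto the plane orthogonal to the gravity axis,
    returning 2D coordinates in that plane."""
    g = np.asarray(gravity_axis, dtype=float)
    g = g / np.linalg.norm(g)
    # Two unit vectors spanning the plane orthogonal to gravity.
    helper = np.array([1.0, 0.0, 0.0]) if abs(g[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(g, helper)
    u = u / np.linalg.norm(u)
    v = np.cross(g, u)
    pts = np.asarray(points_3d, dtype=float)
    return np.stack([pts @ u, pts @ v], axis=1)   # mapping data: N x 2 coordinates
```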

The corresponding element identification unit 70 of the present embodiment identifies a linear element that appears in the corresponding map and that is associated with the predetermined pattern described above, for example. The corresponding element identification unit 70 may identify, by shape matching or pattern matching, a linear element that appears in the corresponding map and that is associated with a linear portion of interest. In the following, a linear element identified as described above is referred to as a “corresponding element.” For example, a road that is associated with a linear portion of interest identified by the pattern identification unit 68 and appears in the corresponding map is identified as a corresponding element. Here, for example, the corresponding element identification unit 70 may identify a corresponding element associated with a linear set of feature points identified by the pattern identification unit 68. Note that a corresponding element according to the present embodiment is not limited to a straight element representing a straight road or the like, like the linear portion described above. The corresponding element according to the present embodiment may be a curved element representing a curve or the like.

Further, the corresponding map may be a map in which road information indicating where roads run is expressed. Moreover, the pattern identification unit 68 may estimate, on the basis of common environmental map data stored in the common environmental map data storage unit 64, a road in an environmental map indicated by the common environmental map data in question. Then, the corresponding element identification unit 70 may identify a road associated with the road in the environmental map and expressed by road information in the corresponding map as a corresponding element.

The common environmental map data update unit 72 of the present embodiment updates common environmental map data stored in the common environmental map data storage unit 64, on the basis of the location of a corresponding element in the corresponding map, the corresponding element being identified by the corresponding element identification unit 70, for example.

The common environmental map data update unit 72 may update the location of at least one feature point in a common environmental map on the basis of the location of a corresponding element in the corresponding map, the corresponding element being identified by the corresponding element identification unit 70. Here, the common environmental map data update unit 72 may update at least one of the latitude, the longitude, the altitude, the orientation, or the elevation/depression angle of the feature point.

Further, the common environmental map data update unit 72 may update, on the basis of the location of a corresponding element in the corresponding map, the corresponding element being identified by the corresponding element identification unit 70, the location in a common environmental map of a linear set of feature points identified by the pattern identification unit 68.

In the present embodiment, the corresponding element identification unit 70 may identify corresponding elements in geo-key frame data units and the common environmental map data update unit 72 may update common environmental map data in geo-key frame data units.

For example, the corresponding element identification unit 70 may identify geo-key frame data corresponding to a sensing range that covers whole or part of a linear portion of interest as to-be-updated geo-key frame data to be updated. Then, the corresponding element identification unit 70 may narrow down, on the basis of geo-pose data included in the to-be-updated geo-key frame data, a region, in the corresponding map, to be subjected to matching with the geo-key frame data in question. Here, for example, the region to be subjected to matching may be narrowed down to a region of 10 meters square around a location indicated by the geo-pose data.

Then, in the region in the corresponding map that has been narrowed down, matching may be performed with the linear portion of interest in the sensing range corresponding to the to-be-updated geo-key frame data to identify a corresponding element corresponding to the linear portion of interest in question.

Then, the common environmental map data update unit 72 may update the geo-pose data included in the to-be-updated geo-key frame data, on the basis of the location of the corresponding element in the corresponding map. For example, the common environmental map data update unit 72 may update the geo-pose data included in the to-be-updated geo-key frame data in such a manner that the location of the linear portion of interest is matched with the location of the corresponding element in the corresponding map.

Here, for example, the common environmental map data update unit 72 may update the geo-pose data included in the to-be-updated geo-key frame data to a value determined by using an optimization method such as the least squares method. More specifically, for example, the common environmental map data update unit 72 may identify a plurality of points on the linear portion of interest in the sensing range corresponding to the to-be-updated geo-key frame data. Then, the common environmental map data update unit 72 may identify, on the basis of the geo-pose data included in the geo-key frame data, the positions of those points when mapped into the corresponding map. Then, the common environmental map data update unit 72 may determine the geo-pose data value that minimizes the sum of the squared distances from the points mapped into the corresponding map to the corresponding element associated with the linear portion of interest. Then, the common environmental map data update unit 72 may update the geo-pose data included in the geo-key frame data to the determined value. With the update of the geo-pose data, the location of at least one feature point in the common environmental map, such as the location of a linear set of feature points in the common environmental map, is updated as a result.
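A minimal sketch of the least-squares step, under the simplifying assumption that the corresponding element is a straight line given by a point and a unit direction in the corresponding map and that the geo-pose correction can be expressed as a 2D shift plus rotation; SciPy's general-purpose solver stands in here for whatever optimizer is actually used.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_geo_pose_correction(sampled_points, line_point, line_direction):
    """Find the 2D shift (dx, dy) and rotation dtheta of the points sampled on
    the linear portion of interest that minimizes the sum of squared distances
    to the corresponding element, modeled here as a straight line."""
    d = np.asarray(line_direction, dtype=float)
    d = d / np.linalg.norm(d)
    normal = np.array([-d[1], d[0]])              # unit normal of the line
    pts = np.asarray(sampled_points, dtype=float)

    def residuals(params):
        dx, dy, dtheta = params
        c, s = np.cos(dtheta), np.sin(dtheta)
        rot = np.array([[c, -s], [s, c]])
        moved = pts @ rot.T + np.array([dx, dy])
        return (moved - np.asarray(line_point, dtype=float)) @ normal

    result = least_squares(residuals, x0=np.zeros(3))
    return result.x                               # correction applied to the geo-pose
```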

Further, to stay consistent with the update of the geo-pose data included in the to-be-updated geo-key frame data, the common environmental map data update unit 72 also updates the public pose data included in the to-be-updated geo-key frame data in question. In updating the public pose data, processing of converting geo-pose data values expressed by the latitude, the longitude, the altitude, and the like to public pose data values expressed in metric units is executed. This conversion must use an expression that takes into account the fact that the length of one degree of longitude varies with the latitude. Note that the orientation of the tracker 14 indicated by public pose data is not required to be updated on the basis of an orientation update amount indicated by geo-pose data and may instead be updated in such a manner that a specific direction (for example, the north) in the corresponding map and the direction of a predetermined axis (for example, an X axis or a Y axis) indicated by the public pose data are matched with each other. Further, the common environmental map data update unit 72 also updates delta pose data included in key frame link data having the geo-key frame ID of the to-be-updated geo-key frame data in question as its link source ID or link destination ID.
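The latitude dependence mentioned above can be illustrated with a rough spherical-Earth conversion; the constant and the function name are assumptions for illustration only.

```python
import math

METERS_PER_DEGREE_LATITUDE = 111_320.0  # rough spherical-Earth approximation

def latlon_delta_to_meters(dlat_deg, dlon_deg, reference_latitude_deg):
    """Convert a latitude/longitude offset to metric east/north offsets.
    The length of one degree of longitude shrinks with the cosine of the
    latitude, which is the dependency the conversion must account for."""
    north = dlat_deg * METERS_PER_DEGREE_LATITUDE
    east = dlon_deg * METERS_PER_DEGREE_LATITUDE * math.cos(math.radians(reference_latitude_deg))
    return east, north
```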

Note that it is possible that as a result of the matching described above, no corresponding element associated with the linear portion of interest in the sensing range corresponding to the to-be-updated geo-key frame data can be identified. Further, it is also possible that a plurality of corresponding elements associated with the linear portion of interest in the sensing range corresponding to the to-be-updated geo-key frame data are identified. Further, even if a corresponding element corresponding to the linear portion of interest in the sensing range corresponding to the to-be-updated geo-key frame data is identified, the matching reliability is low in some cases. In the cases as described above, the common environmental map data update unit 72 may not update the geo-pose data included in the to-be-updated geo-key frame data.

Further, in the present embodiment, for example, the common environmental map data update unit 72 updates, on the basis of the locations of two feature points updated as described above, the location in a common environmental map of a feature point disposed between the two feature points in the environmental map.

Here, for example, the common environmental map data update unit 72 may identify intermediate geo-key frame data that is geo-key frame data associated with each of two pieces of to-be-updated geo-key frame data through one or a plurality of different pieces of key frame link data.

Then, the common environmental map data update unit 72 may update geo-pose data included in the intermediate geo-key frame data, on the basis of geo-pose data included in each of the two pieces of to-be-updated geo-key frame data in question.

Here, for example, it is assumed that the node K(1) and the node K(n) illustrated in FIG. 6 are nodes corresponding to to-be-updated geo-key frame data. In this case, the node K(2) to the node K(n-1) are nodes corresponding to intermediate geo-key frame data and the node K(m) is not a node corresponding to intermediate geo-key frame data.

In the following, geo-pose data included in the to-be-updated geo-key frame data corresponding to the node K(1) is referred to as “first geo-pose data.” Further, geo-pose data included in the to-be-updated geo-key frame data corresponding to the node K(n) is referred to as “second geo-pose data.”

Then, the common environmental map data update unit 72 identifies an update amount dGP1 of the first geo-pose data and an update amount dGPn of the second geo-pose data.

For example, dGP1 = GP1′ − GP1 holds, where GP1 is the value of the first geo-pose data before the update and GP1′ is the value of the first geo-pose data after the update. Further, dGPn = GPn′ − GPn holds, where GPn is the value of the second geo-pose data before the update and GPn′ is the value of the second geo-pose data after the update. Here, for example, the update amount of at least one of the latitude, the longitude, the altitude, the orientation, or the elevation/depression angle may be identified.

Then, the common environmental map data update unit 72 identifies a first distance that is the distance from a location indicated by geo-pose data included in the intermediate geo-key frame data (hereinafter referred to as “intermediate geo-pose data”) to a location indicated by the first geo-pose data. Further, the common environmental map data update unit 72 identifies a second distance that is the distance from the location indicated by the intermediate geo-pose data to a location indicated by the second geo-pose data.

FIG. 6 illustrates a first distance d(1, 3) and a second distance d(3, n) for the intermediate geo-key frame data corresponding to the node K(3).

Then, the common environmental map data update unit 72 identifies an update amount that is the weighted average of the update amount dGP1 and the update amount dGPn, with weights based on the first distance and the second distance. Here, for example, the update amount dGP3 of the intermediate geo-key frame data corresponding to the node K(3) is identified by calculating the expression (d(3, n)×dGP1 + d(1, 3)×dGPn)/(d(3, n) + d(1, 3)).

Then, the common environmental map data update unit 72 updates the intermediate geo-pose data by the update amount identified as described above. For example, it is assumed that the value of the intermediate geo-pose data included in the intermediate geo-key frame data corresponding to the node K(3), before the update, is GP3. In this case, the intermediate geo-pose data is updated so that its updated value GP3′ equals GP3 + dGP3.
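The distance-weighted interpolation above can be written compactly as follows; the function name is illustrative, and the update amounts may be applied per component (latitude, longitude, altitude, orientation, elevation/depression angle).

```python
def interpolated_update(d_to_first, d_to_second, dGP_first, dGP_second):
    """Weighted average matching (d(3, n)*dGP1 + d(1, 3)*dGPn) / (d(3, n) + d(1, 3)),
    where d_to_first = d(1, 3) and d_to_second = d(3, n) for the node K(3)."""
    return (d_to_second * dGP_first + d_to_first * dGP_second) / (d_to_first + d_to_second)

# Example for the node K(3):  GP3_new = GP3 + interpolated_update(d13, d3n, dGP1, dGPn)
```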

The common environmental map data update unit 72 updates, not only the intermediate geo-key frame data corresponding to the node K(3), but also intermediate geo-pose data included in the other intermediate geo-key frame data.

Further, to stay consistent with the update of the geo-pose data included in the intermediate geo-key frame data, the common environmental map data update unit 72 also updates the public pose data included in the intermediate geo-key frame data in question. Here, the public pose data included in the intermediate geo-key frame data may be updated by an update amount depending on the update amount of the intermediate geo-pose data included in the intermediate geo-key frame data in question. Also in this case, as described above, in updating the public pose data, processing of converting geo-pose data values expressed by the latitude, the longitude, the altitude, and the like to public pose data values expressed in metric units is executed. Further, as described above, this conversion must use an expression that takes into account the fact that the length of one degree of longitude varies with the latitude. Note that, as described above, the orientation of the tracker 14 indicated by public pose data is not required to be updated on the basis of an orientation update amount indicated by geo-pose data and may instead be updated in such a manner that a specific direction (for example, the north) in the corresponding map and the direction of a predetermined axis (for example, the X axis or the Y axis) indicated by the public pose data are matched with each other. Further, the common environmental map data update unit 72 also updates delta pose data included in key frame link data having the geo-key frame ID of the intermediate geo-key frame data in question as its link source ID or link destination ID.

Here, an exemplary flow of common environmental map data update processing that is performed in the common server 16 according to the present embodiment is described with reference to the flow charts of FIG. 8A and FIG. 8B.

First, the map service access unit 66 acquires the corresponding map from the map service 18 (S101).

Then, the pattern identification unit 68 identifies linear portions of interest in a common environmental map indicated by common environmental map data stored in the common environmental map data storage unit 64 (S102). Here, a plurality of linear portions of interest may be identified.

Then, the corresponding element identification unit 70 selects one linear portion of interest on which the processing in S104 to S106 has not been executed, among the linear portions of interest identified in the processing in S102 (S103).

Then, the corresponding element identification unit 70 identifies a corresponding element that appears in the corresponding map acquired in the processing in S101 and that is associated with the linear portion of interest selected in the processing in S103 (S104).

Then, the common environmental map data update unit 72 identifies, as to-be-updated geo-key frame data, geo-key frame data corresponding to a sensing range that covers whole or part of the linear portion of interest selected in the processing in S103 (S105). Here, a plurality of pieces of geo-key frame data may be identified as to-be-updated geo-key frame data.

Then, the common environmental map data update unit 72 updates, on the basis of the location in the corresponding map of the corresponding element identified in the processing in S104, one or the plurality of pieces of to-be-updated geo-key frame data identified in the processing in S105 (S106). In the processing in S106, for example, geo-pose data included in the to-be-updated geo-key frame data is updated.

Then, the corresponding element identification unit 70 confirms whether or not the processing in S104 to S106 has been executed on all the linear portions of interest identified in the processing in S102 (S107).

In a case where it is confirmed that the processing in S104 to S106 has not been executed on all the linear portions of interest identified in the processing in S102 (S107: N), the processing returns to S103.

In a case where it is confirmed that the processing in S104 to S106 has been executed on all the linear portions of interest identified in the processing in S102 (S107: Y), the common environmental map data update unit 72 confirms whether or not a plurality of pieces of to-be-updated geo-key frame data have been updated in the processing in S101 to S107 (S108).

In a case where it is confirmed that not a plurality of pieces of to-be-updated geo-key frame data have been updated (S108: N), the processing of this processing example ends.

Meanwhile, in a case where it is confirmed that a plurality of pieces of to-be-updated geo-key frame data have been updated (S108: Y), the common environmental map data update unit 72 selects one pair of pieces of to-be-updated geo-key frame data on which the processing in S110 to S112 has not been executed, among the pairs of pieces of to-be-updated geo-key frame data that have been updated in the processing in S101 to S107 (S109).

Then, the common environmental map data update unit 72 identifies, as intermediate geo-key frame data, geo-key frame data associated with each of the pair of pieces of to-be-updated geo-key frame data that have been selected in the processing in S109 through one or a plurality of different pieces of key frame link data (S110). Here, a plurality of pieces of geo-key frame data may be identified as intermediate geo-key frame data.

Then, the common environmental map data update unit 72 confirms whether or not one or a plurality of pieces of intermediate geo-key frame data have been identified in the processing in S110 (S111). There may be a case where no geo-key frame data is associated with both pieces of the pair of to-be-updated geo-key frame data selected in the processing in S109 through one or a plurality of different pieces of key frame link data. In such a case, it is confirmed in the processing in S111 that no intermediate geo-key frame data has been identified.

In a case where it is confirmed in the processing in S111 that intermediate geo-key frame data has been identified (S111: Y), the common environmental map data update unit 72 updates one or the plurality of pieces of intermediate geo-key frame data that have been identified in the processing in S110 (S112). In the processing in S112, for example, geo-pose data included in the intermediate geo-key frame data is updated.

Then, the common environmental map data update unit 72 confirms whether or not the processing in S110 to S112 has been executed on all the pairs of to-be-updated geo-key frames that have been updated in the processing in S101 to S107 (S113).

Also in a case where it is confirmed in the processing in S111 that no intermediate geo-key frame data has been identified (S111: N), it is confirmed whether or not the processing in S110 to S112 has been executed on all the pairs of to-be-updated geo-key frames that have been updated in the processing in S101 to S107 (S113).

In a case where it is confirmed that the processing in S110 to S112 has not been executed on all the pairs of to-be-updated geo-key frames that have been updated in the processing in S101 to S107 (S113: N), the processing returns to S109.

Meanwhile, in a case where it is confirmed that the processing in S110 to S112 has been executed on all the pairs of to-be-updated geo-key frames that have been updated in the processing in S101 to S107 (S113: Y), the processing of this processing example ends.

When n pieces of to-be-updated geo-key frame data have been updated in the processing in S101 to S107, the processing in S109 to S112 is repeatedly executed n×(n−1)/2 times.
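The overall flow of S101 to S113 can be summarized with the following outline; every helper function here is a placeholder name for the processing described above, not an identifier from the embodiment.

```python
from itertools import combinations

def update_common_environmental_map(common_map, corresponding_map):
    updated_frames = []
    for portion in identify_linear_portions_of_interest(common_map):          # S102, S103
        element = identify_corresponding_element(corresponding_map, portion)  # S104
        if element is None:
            continue  # no reliable match: skip the update for this portion
        for frame in geo_key_frames_covering(common_map, portion):            # S105
            update_geo_pose(frame, element)                                   # S106
            updated_frames.append(frame)

    # S109 to S112 run once per pair of updated frames: n*(n-1)/2 iterations.
    for first, second in combinations(updated_frames, 2):
        for middle in intermediate_geo_key_frames(common_map, first, second):  # S110, S111
            apply_interpolated_update(middle, first, second)                    # S112
```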

As described above, in the present embodiment, common environmental map data is corrected in such a manner that a predetermined pattern in the common environmental map is matched with the location of a linear element corresponding to the pattern in question in the corresponding map. Thus, according to the present embodiment, a common environmental map can be corrected to be accurately associated with the map indicated by the given map data provided by the given map service 18.

Further, with matching between the map at the macro level provided by the map service 18 and a common environmental map generated at the micro level as in the present embodiment, improvements in collection efficiency and accuracy of common environmental maps are expected.

Expected application examples of the present embodiment include various fields such as autonomous driving, factory automation (FA), drone autopilot, and remote control or remote instruction of equipment.

Note that the present invention is not limited to the embodiment described above.

Further, the above concrete character strings and numerical values and the concrete character strings and numerical values in the drawings are illustrative, and the present invention is not limited to those character strings and numerical values.

The article “Sony Patent | Environmental map management apparatus, environmental map management method, and program” was first published on Nweon Patent.

Meta Patent | Talbot pattern illuminator and display based thereon https://patent.nweon.com/26727 Thu, 26 Jan 2023 15:42:49 +0000 https://patent.nweon.com/?p=26727 ...

The article “Meta Patent | Talbot pattern illuminator and display based thereon” was first published on Nweon Patent.

Patent: Talbot pattern illuminator and display based thereon

Patent PDF: Available to Nweon (映维网) members

Publication Number: 20230021670

Publication Date: 2023-01-26

Assignee: Facebook Technologies

Abstract

An illuminator for a display panel includes a light source for providing a light beam and a lightguide coupled to the light source for receiving and propagating the light beam along a substrate of the display panel. The lightguide includes an array of out-coupling gratings that runs parallel to an array of pixels of the display panel for out-coupling portions of the light beam from the lightguide such that the out-coupled light beam portions propagate through the substrate and produce an array of optical power density peaks at the array of pixels due to the Talbot effect. A period of the array of peaks is an integer multiple of a pitch of the array of pixels.

Claims

1.A display device comprising: a display panel comprising an array of pixels on a substrate, and an illuminator for illuminating the display panel, the illuminator comprising a light source for providing a light beam, and a lightguide coupled to the light source for receiving and propagating the light beam along the substrate, the lightguide comprising a first array of out-coupling gratings; wherein the first array runs parallel to the array of pixels for out-coupling portions of the light beam from the lightguide such that the out-coupled light beam portions propagate through the substrate and produce an array of optical power density peaks at the array of pixels due to Talbot effect, wherein a period of the array of optical power density peaks is M times p, where p is a pitch of the array of pixels, and M is an integer ≥1.

2.The display device of claim 1, wherein a first pitch T1 of the first array of out-coupling gratings is M times p, and wherein a distance D from a plane comprising the first array of out-coupling gratings to a plane comprising the array of pixels is D = K(T1)²/(Nλ), where K and N are integers ≥1 and λ is a wavelength of the light beam in the substrate.

3.The display device of claim 2, wherein the first array of out-coupling gratings is disposed at a surface of the illuminator facing the substrate, wherein the distance D is equal to a thickness of the substrate.

4.The display device of claim 1, wherein gratings of the first array of out-coupling gratings are configured to focus or defocus the out-coupled portions of the light beam.

5.The display device of claim 1, wherein the lightguide comprises a first plate for propagating at least a portion of the light beam therein by a series of total internal reflections between opposed parallel surfaces of the first plate.

6.The display device of claim 5, wherein the lightguide further comprises an array of redirecting gratings for redirecting portions of the light beam for spreading the light beam within the first plate.

7.The display device of claim 5, wherein gratings of the array of out-coupling gratings comprise volume hologram gratings.

8.The display device of claim 5, wherein the lightguide further comprises a second plate for propagating at least a portion of the light beam therein by a series of total internal reflections between opposed parallel surfaces of the second plate, wherein the first and second plates are optically coupled together along their parallel surfaces.

9.The display device of claim 5, further comprising a tiltable reflector in an optical path between the light source and the first plate, wherein the tiltable reflector is configured to couple the light beam into the first plate at an angle variable by tilting the tiltable reflector, whereby in operation, positions of the optical power density peaks at the array of pixels are adjustable relative to pixels of the array of pixels.

10.The display device of claim 9, further comprising a controller operably coupled to the tiltable reflector for tilting the tiltable reflector to shift the array of optical power density peaks at the array of pixels by an integer multiple of the pitch p of the array of pixels.

11.The display device of claim 5, wherein the light source is configured to provide first, second, and third light beam components at first, second, and third wavelengths respectively, the lightguide further comprising second and third arrays of out-coupling gratings optically coupled to the first plate, wherein the first, second, and third arrays of out-coupling gratings run parallel to the array of pixels at different distances therefrom for wavelength-selective out-coupling of portions of the first, second, and third light beam components respectively, for illuminating the array of pixels through the substrate.

12.The display device of claim 1, wherein the lightguide comprises: an optical dispatching circuit coupled to the light source for receiving and splitting the light beam into a plurality of sub-beams; and a first array of linear waveguides coupled to the optical dispatching circuit for receiving the sub-beams from the optical dispatching circuit, wherein the linear waveguides run parallel to one another to propagate the sub-beams along the array of pixels, wherein the out-coupling gratings of the first array are optically coupled to linear waveguides of the first array of linear waveguides.

13.The display device of claim 12, wherein: the light source is configured to provide first, second, and third components of the light beam for carrying light at first, second, and third wavelengths respectively; the optical dispatching circuit is configured for receiving and splitting each one of the first, second, and third light beam components into a plurality of sub-beams; and the first array of linear waveguides is configured for receiving sub-beams of the first light beam component; the lightguide further comprising: second and third arrays of linear waveguides coupled to the optical dispatching circuit for receiving sub-beams of the second and third light beam components, respectively, from the optical dispatching circuit, wherein the linear waveguides of the second and third arrays are running parallel to one another to propagate the sub-beams along the array of pixels; and second and third arrays of out-coupling gratings optically coupled to the second and third arrays of linear waveguides, respectively, for out-coupling portions of the second and third light beam components, respectively, for illuminating the array of pixels through the substrate.

14.The display device of claim 13, wherein the lightguide further comprises a color-selective reflector in an optical path between the first, second, and third arrays of out-coupling gratings and the substrate of the display panel, wherein the color-selective reflector is configured to provide different optical path lengths for the first, second, and third light beam components.

15.An illuminator comprising: a light source for providing a light beam; a plate for propagating at least a portion of the light beam therein by a series of total internal reflections between opposed parallel surfaces of the plate; a tiltable reflector disposed in an optical path between the light source and plate and configured to couple the light beam into the plate at a variable in-coupling angle; and a first array of out-coupling gratings optically coupled to the plate for out-coupling portions of the light beam at an out-coupling angle depending on the in-coupling angle such that the light beam portions form an array of optical power density peaks due to Talbot effect at a Talbot plane spaced apart from the plate, wherein positions of the peaks at the Talbot plane depend on the out-coupling angle of the light beam portions.

16.The illuminator of claim 15, wherein the light source is configured to provide first, second, and third components of the light beam for carrying light at first, second, and third wavelengths respectively, the plate further comprising second and third arrays of out-coupling gratings optically coupled to the plate, wherein the first, second, and third arrays of out-coupling gratings run parallel to one another for wavelength-selective out-coupling of portions of the first, second, and third light beam components respectively, to form an array of optical power density peaks due to Talbot effect at the Talbot plane at the first, second, and third wavelengths, respectively, wherein positions of the optical power density peaks at the first, second, and third wavelengths depend on the out-coupling angle of the light beam portions of the first, second, and third light beam components respectively.

17.The illuminator of claim 16, wherein the first, second, and third arrays of out-coupling gratings comprise volume gratings, wherein the volume gratings of the first, second, and third arrays of out-coupling gratings are disposed at different depths in the plate.

18.An illuminator comprising: a light source for providing a light beam; and a lightguide comprising: an optical dispatching circuit coupled to the light source for receiving and splitting the light beam into a plurality of sub-beams; a first array of linear waveguides coupled to the optical dispatching circuit for receiving the sub-beams from the optical dispatching circuit, wherein the linear waveguides run parallel to one another to propagate the sub-beams therein; and a first array of out-coupling gratings optically coupled to linear waveguides of the first array of linear waveguides for out-coupling portions of the sub-beams to form an array of optical power density peaks due to Talbot effect at a Talbot plane spaced apart from the first array of out-coupling gratings.

19.The illuminator of claim 18, wherein: the light source is configured to provide first, second, and third components of the light beam for carrying light at first, second, and third wavelengths, respectively; the optical dispatching circuit is configured for receiving and splitting each one of the first, second, and third light beam components into a plurality of sub-beams; and the first array of linear waveguides is configured for receiving sub-beams of the first light beam component; the lightguide further comprising: second and third arrays of linear waveguides coupled to the optical dispatching circuit for receiving sub-beams of the second and third light beam components, respectively, from the optical dispatching circuit, wherein the linear waveguides of the second and third arrays are running parallel one another to propagate the sub-beams; and second and third arrays of out-coupling gratings optically coupled to the second and third arrays of linear waveguides, respectively, for out-coupling portions of the second and third light beam components, respectively, such that the portions of the second and third light beam components form arrays of optical power density peaks due to Talbot effect at the Talbot plane.

20.The illuminator of claim 19, wherein the lightguide further comprises a color-selective reflector in an optical path between the first, second, and third arrays of out-coupling gratings and the Talbot plane, wherein the color-selective reflector is configured to provide different optical path lengths for the first, second, and third light beam components to the Talbot plane.

Description

TECHNICAL FIELD

The present disclosure relates to display devices and illuminators suitable for use in display devices.

BACKGROUND

Visual displays are used to provide information to viewer(s) including still images, video, data, etc. Visual displays have applications in diverse fields including entertainment, education, engineering, science, professional training, advertising, to name just a few examples. Some visual displays, such as TV sets, display images to several users, and some visual display systems are intended for individual users. Visual displays are viewed either directly, or by means of special glasses that may include optical shutters, as well as special varifocal lenses.

An artificial reality system generally includes a near-eye display (e.g., a headset or a pair of glasses) configured to present content to a user. A near-eye display may display virtual objects or combine images of real objects with virtual objects, as in virtual reality (VR), augmented reality (AR), or mixed reality (MR) applications. For example, in an AR system, a user may view images of virtual objects (e.g., computer-generated images (CGIs)) superimposed onto surrounding environment.

It is desirable to reduce size and weight of a head-mounted display. Lightweight and compact near-eye displays reduce the strain on user’s head and neck, and are generally more comfortable to wear. Oftentimes, an optics block of a wearable display is the bulkiest and heaviest module of the display, especially when the optics block includes bulk optics such as refractive lenses and cube beamsplitters. Compact planar optical components, such as waveguides, gratings, Fresnel lenses, etc., may be used to reduce the size and weight of the optics block. However, compact planar optics may have low efficiency, image distortions, ghosting, residual coloring, rainbow effects, etc., which hinders their use in wearable optical display systems.

BRIEF DESCRIPTION OF THE DRAWINGS

Exemplary embodiments will now be described in conjunction with the drawings, in which:

FIG. 1A is a schematic cross-sectional view of a display device of this disclosure;

FIG. 1B is a magnified cross-sectional view of a pixel array of the display device of FIG. 1A superimposed with peaky Talbot optical power density distribution of the illuminating light at the pixel array;

FIG. 2 is a computed map of an optical power density distribution across a display panel substrate of the display device of FIG. 1A;

FIG. 3 is a side cross-sectional view of a light-guiding plate embodiment of an illuminator of the display device of FIG. 1A;

FIG. 4 is a side cross-sectional view of an embodiment of the illuminator of FIG. 3 using a tiltable microelectromechanical (MEMS) reflector for lateral adjustment of peak positions of the Talbot optical power density distribution on a pixel array;

FIG. 5 is a schematic frontal view of a pixel array illuminated with the illuminator of FIG. 4, showing focal spots of different colors shifted by tilting the MEMS reflector;

FIG. 6 is a side cross-sectional view of an embodiment of the illuminator of FIG. 3 with focusing out-coupling gratings;

FIG. 7 is a computed map of an optical power density distribution across a display panel substrate illuminated with the illuminator of FIG. 6;

FIG. 8 is an example angular distribution of illumination energy of the illuminator of FIG. 6 with 18 micrometers wide focusing gratings having 20 micrometers focal length;

FIG. 9 is a display panel illumination map example showing an insufficient pupil replication density;

FIG. 10 is an embodiment of a light-guiding plate with a buried partial reflector for a higher pupil replication density;

FIG. 11A is a reference illumination map of a light-guiding plate without the buried reflector;

FIG. 11B is an illumination map of a light-guiding plate with the buried reflector;

FIG. 12 is a schematic plan view of a photonic integrated circuit (PIC) embodiment of a Talbot illuminator of this disclosure;

FIG. 13A is a top schematic view of a multi-color PIC illuminator implementation with surface-relief gratings on linear waveguides;

FIG. 13B is a top schematic view of a portion of the PIC light source of FIG. 13A superimposed with a single RGB pixel of a display panel;

FIG. 13C is a three-dimensional schematic view of linear waveguides of the PIC light source of FIG. 13A;

FIG. 14 is a cross-sectional exploded view of an embodiment of a multi-color PIC illuminator with a dichroic mirror;

FIG. 15 is a plan schematic view of a near-eye display using a display device and/or an illuminator of this disclosure; and

FIG. 16 is a view of a head-mounted display of this disclosure using a display device and/or an illuminator of this disclosure.

DETAILED DESCRIPTION

While the present teachings are described in conjunction with various embodiments and examples, it is not intended that the present teachings be limited to such embodiments. On the contrary, the present teachings encompass various alternatives and equivalents, as will be appreciated by those of skill in the art. All statements herein reciting principles, aspects, and embodiments of this disclosure, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.

As used herein, the terms “first”, “second”, and so forth are not intended to imply sequential ordering, but rather are intended to distinguish one element from another, unless explicitly stated. Similarly, sequential ordering of method steps does not imply a sequential order of their execution, unless explicitly stated.

In a visual display including a panel of transmissive pixels coupled to an illuminator, the efficiency of light utilization depends on a ratio of a geometrical area occupied by pixels to a total area of the display panel. For miniature displays often used in near-eye and/or head-mounted displays, the ratio can be lower than 50%. The efficient illuminator utilization can be further hindered by color filters on the display panel which on average transmit no more than 30% of incoming light. On top of that, there may exist a 50% polarization loss for polarization-based display panels such as liquid crystal (LC) display panels. All these factors considerably reduce the light utilization and overall wall plug efficiency of the display, which is undesirable.

In accordance with this disclosure, the light utilization and the wall plug efficiency of a backlit display may be improved by providing an illuminator that generates an array of light points matching the locations of the transmissive pixels of the display panel. Since the illuminating light is concentrated in areas of pixels, and accordingly inter-pixel areas receive less light, the overall wall plug efficiency of the display can be improved. The array of light points, or peaks in a lateral distribution of optical power density, may be provided by utilizing Talbot effect that re-generates a periodic optical power density distribution at an illuminator at a distance from the illuminator equal to a Talbot length. Talbot light patterns may be created separately for red, green, and blue illuminating light, obviating a need for a color filter array in the display panel.

In accordance with the present disclosure, there is provided a display device comprising a display panel including an array of pixels on a substrate and an illuminator for illuminating the display panel. The illuminator includes a light source for providing a light beam and a lightguide coupled to the light source for receiving and propagating the light beam along the substrate. The lightguide includes a first array of out-coupling gratings. The first array runs parallel to the array of pixels for out-coupling portions of the light beam from the lightguide such that the out-coupled light beam portions propagate through the substrate and produce an array of optical power density peaks at the array of pixels due to Talbot effect. A period of the array of optical power density peaks is M times p, where p is a pitch of the array of pixels, and M is an integer ≥1. A first pitch T1 of the first array of out-coupling gratings may be M times p. A distance D from a plane comprising the first array of out-coupling gratings to a plane comprising the array of pixels may be defined as D = K(T1)²/(Nλ), where K and N are integers ≥1, and λ is a wavelength of the light beam in the substrate. In embodiments where the first array of out-coupling gratings is disposed at a surface of the illuminator joining the substrate, the distance D may be equal to a thickness of the substrate. Gratings of the first array of out-coupling gratings may be configured to focus or defocus the out-coupled portions of the light beam.

In some embodiments, the lightguide comprises a first plate for propagating at least a portion of the light beam in the lightguide by a series of total internal reflections between opposed parallel surfaces of the first plate. The lightguide may include an array of redirecting gratings for redirecting portions of the light beam for spreading the light beam within the first plate. The gratings may include volume hologram gratings, for example. The lightguide may further include a second plate for propagating at least a portion of the light beam therein by a series of total internal reflections between opposed parallel surfaces of the second plate. The first and second plates may be optically coupled together along their parallel surfaces.

The display device may further include a tiltable reflector in an optical path between the light source and the first plate. The tiltable reflector may be configured to couple the light beam into the first plate at an angle variable by tilting the tiltable reflector. Positions of the optical power density peaks at the array of pixels may be adjusted relative to pixels of the array of pixels by tilting the tiltable reflector. A controller may be operably coupled to the tiltable reflector for tilting the tiltable reflector to shift the array of optical power density peaks at the array of pixels by an integer multiple of the pitch p of the array of pixels.

The light source may be configured to provide first, second, and third light beam components at first, second, and third wavelengths respectively. For such embodiments, the lightguide may further include second and third arrays of out-coupling gratings optically coupled to the first plate. The first, second, and third arrays of out-coupling gratings may run parallel to the array of pixels at different distances from the array of pixels, for wavelength-selective out-coupling of portions of the first, second, and third light beam components respectively for illuminating the array of pixels through the substrate.

In some embodiments, the lightguide includes an optical dispatching circuit coupled to the light source for receiving and splitting the light beam into a plurality of sub-beams, and a first array of linear waveguides coupled to the optical dispatching circuit for receiving the sub-beams from the optical dispatching circuit. The linear waveguides may run parallel to one another to propagate the sub-beams along the array of pixels. The out-coupling gratings of the first array may be optically coupled to linear waveguides of the first array of linear waveguides. In embodiments where the light source is configured to provide first, second, and third components of the light beam for carrying light at first, second, and third wavelengths respectively, the optical dispatching circuit may be configured for receiving and splitting each one of the first, second, and third light beam components into a plurality of sub-beams. The first array of linear waveguides may be configured for receiving sub-beams of the first light beam component. The lightguide may further include second and third arrays of linear waveguides coupled to the optical dispatching circuit for receiving sub-beams of the second and third light beam components, respectively, from the optical dispatching circuit. The linear waveguides of the second and third arrays may run parallel one another to propagate the sub-beams along the array of pixels. The lightguide may further include second and third arrays of out-coupling gratings optically coupled to the second and third arrays of linear waveguides, respectively, for out-coupling portions of the second and third light beam components, respectively, for illuminating the array of pixels through the substrate. The lightguide may further include a color-selective reflector in an optical path between the first, second, and third arrays of out-coupling gratings and the substrate of the display panel. The color-selective reflector may be configured to provide different optical path lengths for the first, second, and third light beam components, to make sure that Talbot fringes for all wavelengths are on a same plane corresponding to the plane of the array of pixels.

In accordance with the present disclosure, there is provided an illuminator comprising a light source for providing a light beam. A plate may be configured for propagating at least a portion of the light beam therein by a series of total internal reflections between opposed parallel surfaces of the plate. A tiltable reflector may be disposed in an optical path between the light source and plate and configured to couple the light beam into the plate at a variable in-coupling angle. A first array of out-coupling gratings may be optically coupled to the plate for out-coupling portions of the light beam at an out-coupling angle depending on the in-coupling angle such that the light beam portions form an array of optical power density peaks due to Talbot effect at a Talbot plane spaced apart from the plate, positions of the peaks at the Talbot plane depending on the out-coupling angle of the light beam portions.

In embodiments where the light source is configured to provide first, second, and third components of the light beam for carrying light at first, second, and third wavelengths respectively, the plate may further include second and third arrays of out-coupling gratings optically coupled to the plate. The first, second, and third arrays of out-coupling gratings run parallel to one another for wavelength-selective out-coupling of portions of the first, second, and third light beam components respectively, to form an array of optical power density peaks due to Talbot effect at the Talbot plane at the first, second, and third wavelengths, respectively. Positions of the optical power density peaks at the first, second, and third wavelengths depend on the out-coupling angle of the light beam portions of the first, second, and third light beam components respectively. The first, second, and third arrays of out-coupling gratings may include volume gratings in some embodiments. The volume gratings of the first, second, and third arrays of out-coupling gratings may be disposed at different depth levels in the plate.

In accordance with the present disclosure, there is further provided an illuminator comprising a light source for providing a light beam, and a lightguide. The lightguide includes an optical dispatching circuit coupled to the light source for receiving and splitting the light beam into a plurality of sub-beams, and a first array of linear waveguides coupled to the optical dispatching circuit for receiving the sub-beams from the optical dispatching circuit. The linear waveguides run parallel to one another to propagate the sub-beams in the linear waveguides. The lightguide further includes a first array of out-coupling gratings optically coupled to linear waveguides of the first array of linear waveguides for out-coupling portions of the sub-beams to form an array of optical power density peaks due to Talbot effect at a Talbot plane spaced apart from the first array of out-coupling gratings.

In embodiments where the light source is configured to provide first, second, and third components of the light beam for carrying light at first, second, and third wavelengths, respectively, the optical dispatching circuit may be configured for receiving and splitting each one of the first, second, and third light beam components into a plurality of sub-beams. The first array of linear waveguides may be configured for receiving sub-beams of the first light beam component. The lightguide may further include second and third arrays of linear waveguides coupled to the optical dispatching circuit for receiving sub-beams of the second and third light beam components, respectively, from the optical dispatching circuit. The linear waveguides of the second and third arrays may run parallel one another to propagate the sub-beams. The lightguide may further include second and third arrays of out-coupling gratings optically coupled to the second and third arrays of linear waveguides, respectively, for out-coupling portions of the second and third light beam components, respectively, such that the portions of the second and third light beam components form arrays of optical power density peaks due to Talbot effect at the Talbot plane. The lightguide may further include a color-selective reflector in an optical path between the first, second, and third arrays of out-coupling gratings and the Talbot plane. The color-selective reflector may be configured to provide different optical path lengths for the first, second, and third light beam components to the Talbot plane.

Referring now to FIGS. 1A and 1B, a display device 100 (FIG. 1A) includes a display panel 102 and an illuminator 104 for illuminating the display panel 102. The display panel 102 includes an array of pixels 106 supported by a substrate 108. By way of a non-limiting example, the display panel 102 may be a liquid crystal (LC) panel including a thin layer of LC fluid between a pair of substrates, one of the substrates carrying an array of electrodes defining transmissive LC pixels. The illuminator 104 includes a light source 110 that provides a light beam 112, and a lightguide 114 coupled to the light source 110. The lightguide 114 receives and propagates the light beam 112 along the substrate 108, and may spread the beam along and across the lightguide 114, i.e. it may spread the beam in XY plane.

The lightguide 114 includes an array of out-coupling gratings 116 running parallel to the array of pixels 106. In operation, the out-coupling gratings 116 out-couple portions 118 of the light beam 112 from the lightguide 114 such that the out-coupled light beam portions 118 propagate through the substrate 108 and produce an array of optical power density peaks 120 (FIG. 1B) at the array of pixels 106 due to Talbot effect. Positions of the optical power density peaks 120 are coordinated with positions of the pixels 106, such that most of the illuminating light propagates through the pixels 106 and does not get blocked by opaque inter-pixel areas 107, improving the overall light throughput and, consequently, wall plug efficiency of the display device 100. One peak 120 may be provided per one pixel 106 as shown. In some embodiments, a distance between the peaks 120 may be equal to M times p, where p is a pitch of the array of pixels, and M is an integer ≥1. For example, in embodiments where Talbot pattern is produced at several wavelengths of the illuminating light, one peak 120 at a wavelength of a particular color channel may be provided per color sub-pixel of the pixel array, several sub-pixels forming one RGB pixel. Such structures will be considered further below.

The light beam portions 118 propagate towards an ocular lens 122 that collimates individual light beam portions 118 and redirects them towards an eyebox 124 of the display device 100. The function of the ocular lens 122 is to form an image in angular domain in the eyebox 124 from an image in linear domain displayed by the display panel 102.

Referring to FIG. 2, a Talbot fringe pattern 200 in the substrate 108 (FIG. 1A) of the display panel 102 is illustrated as an optical power density map. The Talbot fringe pattern 200 originates at a first plane 201 disposed parallel to the XY plane in FIG. 1A. The out-coupling gratings 116 are disposed in the first plane 201. Light propagates from left to right in FIG. 2, forming arrays of optical power density peaks at various distances from the first plane 201. The optical power density distribution at the first plane 201 is repeated at a second plane 202 separated from the first plane 201 by a Talbot pattern period, which is equal to 0.5 mm in this example. The array of pixels 106 may be located at the second plane 202. For embodiments where the array of out-coupling gratings 116 is disposed at a surface of the illuminator joining the substrate 108 as shown in FIG. 1A, the Talbot pattern period may be simply equal to a thickness of the substrate. More generally, a distance D between a plane of the out-coupling gratings and a plane of the pixels may include only a fraction of the Talbot pattern, or several such patterns, according to the following Eq. (1)

D = K(T1)²/(Nλ),   (1)

where K and N are integers ≥1, and where λ is a wavelength of the light beam in the substrate. In Eq. (1) above, K is the number of repetitions of the Talbot pattern, and N defines sub-planes of Talbot peaks with a higher pitch. For example, at a middle plane 203 separated from the first 201 and second 202 planes by 0.25 mm, the pitch is doubled.
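To make Eq. (1) concrete, the short Python sketch below evaluates the Talbot re-imaging distance. The grating pitch T1 and the in-substrate wavelength used here are assumed illustrative values, not figures taken from the disclosure; with these assumptions the sketch reproduces a pattern period of roughly 0.5 mm and a doubled-pitch sub-plane at roughly 0.25 mm, consistent with FIG. 2.

```python
# Minimal sketch: evaluating Eq. (1), D = K * T1^2 / (N * lambda).
# T1 and the in-substrate wavelength below are assumed, illustrative values.

def talbot_distance_um(T1_um: float, wavelength_um: float, K: int = 1, N: int = 1) -> float:
    """Distance D between the out-coupling grating plane and the pixel plane."""
    return K * T1_um ** 2 / (N * wavelength_um)

T1 = 13.2                   # um, assumed grating/pixel pitch
wl_substrate = 0.52 / 1.5   # um, assumed: 520 nm light in a substrate of index ~1.5

D_full = talbot_distance_um(T1, wl_substrate)        # full Talbot period
D_half = talbot_distance_um(T1, wl_substrate, N=2)   # N = 2 sub-plane with doubled pitch
print(f"D (K=1, N=1) ~ {D_full:.0f} um")   # ~0.5 mm, as in FIG. 2
print(f"D (K=1, N=2) ~ {D_half:.0f} um")   # ~0.25 mm, the doubled-pitch middle plane
```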

Referring to FIG. 3, a display device 300 is an embodiment of the display device 100 of FIG. 1. The display device 300 of FIG. 3 uses a light-guiding plate to expand the illuminating light along and across a display panel, i.e. in the XY plane. In the example shown, the display device 300 includes a display panel 302 coupled to an illuminator 304. The display panel 302 includes an array of pixels 306 separated by inter-pixel gaps or black grid 307 and supported by a transparent substrate 308. The display panel 302 may be e.g. an LC panel including a thin layer of LC fluid between a pair of substrates, an electrode pattern on one of the substrates defining transmissive pixels in the LC fluid layer. The illuminator 304 includes a light source 310 that provides a light beam 312, and a lightguide 314 coupled to the light source 310 by an in-coupling grating 335. The lightguide 314 receives and propagates the light beam 312 along the substrate 308, spreading the light beam 312 along and across the lightguide 314, i.e. in X- and Y-directions.

The lightguide 314 includes a transparent plano-parallel plate 334 having opposed parallel surfaces 331 and 332. In operation, the plate 334 receives the light beam 312 from the light source 310 and propagates the light beam 312 within by a series of total internal reflections, or TIRs, from the opposed parallel surfaces 331 and 332, as illustrated in FIG. 3. An array of out-coupling gratings 316, e.g. volume hologram gratings including, without limitation, polarization volume hologram (PVH) gratings, out-couples portions 318 of the light beam 312 from the plate 334. The light beam portions 318 propagate through the substrate 308, forming an array of optical power density peaks at the array of pixels 306 of the display panel 302 due to Talbot effect, similarly to what has been explained above with reference to FIGS. 1A, 1B, and FIG. 2. The lightguide 314 may further include redirecting gratings 337 for redirecting portions 318 of the light beam 312 for spreading the light beam 312 within the plate 334 in X- and Y-directions.

Turning to FIG. 4, a display device 400 is an embodiment of the display device 100 of FIG. 1. The display device 400 is similar to the display device 300 of FIG. 3 in that it also uses a light-guiding plate to expand the illuminating light along and across a display panel. In the example shown, the display device 400 includes the display panel 302 coupled to an illuminator 404. The illuminator 404 uses the light source 310 to provide the light beam 312, which is in-coupled by the in-coupling grating 335 into the plate 334 propagating the light beam 312 by a series of TIRs from the opposed surfaces 331 and 332 of the plate 334. The out-coupling gratings 316 out-couple the portions 318 of the light beam 312 to propagate through the substrate 308. The light beam portions 318 form a Talbot pattern of optical power density at the array of pixels 306 of the display panel 302. The Talbot pattern has a plurality of peaks 420. The redirecting gratings 337 facilitate spreading the light beam 312 along and across the plate 334.

What makes the display device 400 different from the display device 300 of FIG. 3 is that the illuminator 404 of FIG. 4 includes a tiltable reflector 440 in an optical path between the light source 310 and the plate 334. The tiltable reflector 440, e.g. a microelectromechanical system (MEMS) reflector, is configured to couple the light beam 312 into the plate 334 by redirecting the light beam 312 to the in-coupling grating 335 at an angle variable, or adjustable, by tilting the tiltable reflector 440. As the in-coupling angle of the light beam 312 varies, so does the out-coupling angle of the light beam portions 318. This is illustrated in FIG. 4 by solid arrows 418 representing a nominal out-coupling angle of the light beam portions 318, and dashed arrows 418* representing the light beam portions 318 out-coupled at a different angle defined by the current tilting angle of the tiltable reflector 440. The adjustable out-coupling angle of the light beam portions 318 makes positions of peaks 420 of the optical power density distribution at the plane of the array of pixels 306 adjustable relative to the pixels 306. For example, when the light beam portions 318 exit at the tilted angle as indicated by the arrows 418*, the peaks 420 shift to positions indicated at 420*.

The adjustability of positions of the peaks 420 can be used to precisely center the peaks 420 on the pixels 306. This may be done e.g. during calibration of the manufactured display unit to increase the portion of light propagated through the display panel 302, thereby improving the wall plug efficiency of the display unit. The display device 400 may further include a controller 450 operably coupled to the tiltable reflector 440. The controller 450 may provide a control signal to tilt the tiltable reflector 440, which causes the optical power density distribution at the array of pixels 306 to shift as required.

When the controller 450 shifts the peaks 420 of the optical power density distribution by an entire pitch of the pixels 306, the optical throughput of the light beam components 318 through the display panel 302 reaches a maximum value again. It is to be noted that at an optical power density distribution shifted by an integer multiple of the pitch of the array of pixels 306, the overall direction of the light propagated through the display panel 302 changes by small discrete amounts. Therefore, by tilting the tiltable reflector 440, one may steer an output pupil of the display device to a required location, by performing a plurality of non-zero steps. The required beam location may correspond, for example, to a location of a user’s eye pupil determined by an eye tracking system of the display device. The pupil steering enables one to further improve the overall light utilization and wall plug efficiency of the display unit.
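The small-angle geometry behind this peak shifting and pupil steering can be sketched as follows. The pixel pitch, the distance D between the grating plane and the pixel plane, and the relation between mirror tilt and out-coupling angle are all assumptions for illustration, not values from the disclosure.

```python
import math

# Geometric sketch (assumed values): the out-coupling angle change needed to slide
# the Talbot peaks by an integer number of pixel pitches. For small angles, a peak
# a distance D away shifts laterally by about D * d_theta.

def angle_for_shift_deg(shift_um: float, D_um: float) -> float:
    """Out-coupling angle change (degrees) that moves the peaks by shift_um
    at a pixel plane D_um away from the grating plane."""
    return math.degrees(math.atan2(shift_um, D_um))

pixel_pitch = 13.2   # um, assumed
D = 500.0            # um, assumed substrate thickness / Talbot distance

for m in (1, 2, 3):  # shift by 1, 2, 3 pixel pitches
    print(f"shift by {m} pitch(es): ~{angle_for_shift_deg(m * pixel_pitch, D):.2f} deg "
          "change of the out-coupling angle")
# Note: the MEMS mirror tilt that produces this out-coupling angle change also depends
# on the in-coupling grating and plate geometry, which are not modeled here.
```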

In some embodiments, the light source 310 may be configured to provide light beam components for individual color channels such as red (R), green (G), and blue (B) color channels. The light source 310 may provide first, second, and third beam components at first (e.g. red), second (e.g. green), and third (e.g. blue) wavelengths respectively. The first, second, and third beam components may be combined into a single light beam by using a wavelength division multiplexor (WDM), which may include a set of dichroic mirrors, for example. In embodiments where the light beam includes beam components at different wavelengths, the lightguide may include first, second and third arrays of wavelength-selective out-coupling gratings optically coupled to the light-guiding plate. The first, second, and third arrays of out-coupling gratings may extend or run parallel to the array of pixels for wavelength-selective out-coupling of portions of the first, second, and third light beam components respectively, for illuminating the array of pixels through the substrate. Since the distance D between a plane of the out-coupling gratings and a plane of the pixels defined by Eq. (1) above includes a wavelength of the light beam, the first, second, and third arrays of out-coupling gratings would need to be disposed at different distances from the plane of the array of pixels, i.e. at different depths in the light-guiding plate, to ensure that sharp Talbot peaks are formed at the same plane for different color channels.

The first, second, and third arrays of out-coupling gratings may be offset from one another, so as to form arrays of laterally offset Talbot peaks for illuminating sub-pixels of different color channels. For example, referring to FIG. 5, an array of pixels 500 may be illuminated at a moment of time T=0 ms with a first array of red Talbot peaks (“R”, grey-shaded circles), a second array of green Talbot peaks (“G”, white circles), and a third array of blue Talbot peaks (“B”, black circles), as illustrated with the top row of pixels. At T=0 ms, the pixels 500 of the array are assigned roles of R, G, and B sub-pixels, in going from left to right. In other words, the transmission values of these pixels are set according to the relative strength of R, G, and B color channels for the particular RGB pixels. At T=3 ms, the roles shift by one pixel, i.e. the pixels 500 of the array are assigned roles of G, B, and R sub-pixels, in going from left to right. At T=6 ms, the roles shift by one more pixel, i.e. the pixels 500 of the array are assigned roles of B, R, and G sub-pixels, in going from left to right. Further shifts may be possible. This enables one to provide a horizontal spatial resolution of one sub-pixel and not three sub-pixels forming an RGB pixel, improving the overall spatial resolution of RGB pixels by a factor of three. The same technique can of course be applied to vertically shifted pixels. Also, it is to be noted that the time interval of 3 ms in FIG. 5 is meant only as an example; other time intervals are of course possible. Shifting illuminating beam components by an integer multiple of a pixel pitch allows a very flexible assignment of R, G, and B sub-pixels to individual pixels of the array of pixels 500, enabling one to improve the overall achievable throughput and spatial resolution.
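The time-sequential role assignment of FIG. 5 can be sketched as a simple modulo rotation. The 3 ms step is taken from the figure; the nine-pixel row below is purely illustrative.

```python
# Sketch of the sub-pixel role rotation of FIG. 5: at each step the illuminating
# R/G/B Talbot peaks are shifted by one pixel pitch, so the color role assigned to
# each physical pixel advances by one.

ROLES = ["R", "G", "B"]

def roles_at_step(num_pixels: int, step: int) -> list[str]:
    """Color role of each pixel in a row after `step` one-pitch shifts."""
    return [ROLES[(i + step) % 3] for i in range(num_pixels)]

for step, t_ms in enumerate((0, 3, 6)):
    print(f"T = {t_ms} ms: {' '.join(roles_at_step(9, step))}")
# T = 0 ms: R G B R G B R G B
# T = 3 ms: G B R G B R G B R
# T = 6 ms: B R G B R G B R G
```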

Referring now to FIG. 6, a display device 600 is an embodiment of the display device 100 of FIG. 1. The display device 600 is also similar to the display device 300 of FIG. 3 in that it uses the light-guiding plate 334 to expand the illuminating light over a display panel. Only a portion of the substrate 308 of the display panel is shown in FIG. 6 for simplicity.

The display device 600 includes the light source 310 to provide the light beam 312, which is in-coupled by the in-coupling grating 335 into the plate 334 of a lightguide 614 propagating the light beam 312 by a series of TIRs from the opposed surfaces 331 and 332 of the plate 334. Out-coupling gratings 616 of the lightguide 614 are configured to focus the out-coupled portions 318 of the light beam 312 at focal points 618 disposed on a same focal plane 605 disposed at a non-zero distance from the plate 334. The distance to the focal plane 605 is equal to a focal length f of the gratings 616. To provide the focusing (narrowing the beam) or defocusing (widening the beam) capability, the out-coupling gratings 616 may include volume gratings, e.g. polarization volume gratings (PVH), with curved fringes to provide the focusing function. Focusing the light beam portions 318 at the focal plane 605 away from the lightguide 614 enables one to increase, by the focal length f, the thickness of the substrate of a display panel illuminated with the lightguide 614 with Talbot optical power density distribution.

The latter point is illustrated in FIG. 7, which shows a map of optical power density in a substrate of a display panel illuminated with the lightguide 614 of FIG. 6, e.g. in the display panel substrate 308. The out-coupled beam portions 318 (FIG. 6) are focused at a focal plane 705 (FIG. 7) corresponding to the focal plane 605 in FIG. 6, which is 0.1 mm away from a first plane 701 of the out-coupling gratings 616, within the substrate 308 of the display panel 600. The area of the Talbot fringe pattern 700 of FIG. 7 between the first plane 701 and a second plane 702 where the pixel array is located is similar to the Talbot fringe pattern 200 of FIG. 2. Focusing forward i.e. downstream in the optical path allows one to use a thicker substrate of a display panel, while reducing the spot size and increasing the divergence of the light beams at the second focal plane. An additional bonus is that the increased divergence of the output light beam upstream of the ocular lens increases the exit pupil size of the display at the eyebox, e.g. the eyebox 124 of the display device downstream the ocular lens 122 (FIG. 1). The exit pupil size may be increased to be larger than a typical eye pupil size, i.e. it may overfill the eye pupil. A light cone of at least +/−15 degrees may be required to overfill the eye pupil, depending on focal length of the ocular lens of the display device. As an illustration, FIG. 8 shows an angular distribution of optical power density in arbitrary units vs. the beam angle of the light beam exiting a pixel of the pixel array with the light beam portions out-coupled and focused by an array of 18 micrometers wide volume hologram gratings with 20 micrometers focal length.
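As a rough cross-check of the FIG. 8 example, a purely geometric cone estimate from the quoted 18 micrometer grating width and 20 micrometer focal length gives a half-angle of roughly 24 degrees, above the +/-15 degrees mentioned as sufficient to overfill the eye pupil. The sketch below ignores diffraction and is only a sanity check, not a model of the grating.

```python
import math

# Rough geometric estimate (ignoring diffraction) of the light-cone half-angle
# produced by an 18 um wide focusing grating with a 20 um focal length,
# the parameters quoted for FIG. 8.

grating_width_um = 18.0
focal_length_um = 20.0

half_angle_deg = math.degrees(math.atan2(grating_width_um / 2, focal_length_um))
print(f"half-angle ~ {half_angle_deg:.1f} deg")   # ~24 deg, comfortably above 15 deg
```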

A pupil-replicating lightguide used as an illuminator for a display panel needs to provide a homogeneous illumination of the display panel. Referring to FIG. 9, a 30×50 mm miniature display panel is illuminated with a plano-parallel light guiding plate equipped with a 2D array of out-coupling gratings. Although the pitch of the gratings is very tight and may be as tight as the pitch of the display pixels, the illumination pattern remains highly non-uniform. The non-uniformity is represented by a hexagonal array of round light spots seen in FIG. 9, and is caused not by the insufficient pitch of the out-coupling gratings but by too large a step between TIRs of the illuminating light beam in the light-guiding plate.

The step may be reduced by using a thinner light-guiding plate and/or by providing several such plates optically coupled into a stack by common partially reflective surface(s) interleaved between the plates. Referring to FIG. 10 for an example, a light-guiding plate 1000 includes a buried partial reflector 1010, e.g. a 50% reflector, at the middle of the thickness of the light-guiding plate 1000. In practical terms, the light-guiding plate 1000 may be made of a pair of thinner plates 1001 and 1002 optically coupled together by the partial reflector 1010 along one of their respective parallel light-guiding surfaces, parallel to the XY plane.

In operation, a light beam 1012 is in-coupled by an in-coupling grating 1035 into the first plate 1001, and is split at a point 1009 into two beams, a first beam 1021 and a second beam 1022, which partially propagate in the respective plates 1001 and 1002, and partially cross over into the other plate as they propagate further. The net result of such beam propagation is that the step S that each of the beams 1021 and 1022 makes along the X-direction is halved, which doubles the density of the illuminating light spots, eventually merging them into a continuous illumination pattern.
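A minimal geometric sketch of this step halving is given below; the plate thickness and the TIR propagation angle are assumed values, since neither is given in the disclosure.

```python
import math

# Geometric sketch (assumed numbers): the along-plate step S between successive
# bounces off the same surface of a plate of thickness t, for a ray guided at
# internal angle theta from the surface normal, is S = 2 * t * tan(theta).
# Burying a 50% reflector at mid-thickness effectively halves t for each of the
# two split beams, doubling the density of out-coupled light spots.

def bounce_step_um(thickness_um: float, theta_deg: float) -> float:
    return 2.0 * thickness_um * math.tan(math.radians(theta_deg))

t_plate = 500.0   # um, assumed plate thickness
theta = 60.0      # deg, assumed TIR propagation angle

print(f"single plate:        step ~ {bounce_step_um(t_plate, theta):.0f} um")
print(f"with buried mirror:  step ~ {bounce_step_um(t_plate / 2, theta):.0f} um")
# ~1732 um vs ~866 um: the sparse spot pattern of FIG. 11A merges toward the
# continuous bar of FIG. 11B as the step shrinks relative to the beam size.
```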

The latter point is illustrated in FIGS. 11A and 11B. Referring first to FIG. 11A, a reference illumination pattern 1100A includes a series of light spots 1102, each spot 1102 corresponding to an area where the light beam reflects from one of the parallel surfaces of the plate. The excessive plate thickness in comparison with the light beam diameter causes the separate illumination spots 1102 to appear. Turning to FIG. 11B, an illumination pattern 1100B corresponds to the case of a light-guiding plate having a 50% reflector buried at the middle of its thickness. The illumination pattern 1100B includes a continuous bar 1104, which consists of densely overlapping light spots. This example shows the beam expansion in one direction only. Two-directional beam expansion shows a similar trend, with a continuous illumination of the entire display panel. The illuminating light beam should have a sufficiently broad spectrum and, accordingly, a sufficiently small coherence length to wash out any optical interference between neighboring light spots. By way of a non-limiting example, at a spectral width of a superluminescent light-emitting diode (SLED) or a pulsed laser diode of about 5 nm, the coherence length is about 60 micrometers. As long as the lightguide plate thickness is much larger than the coherence length, the interference-caused speckle pattern is washed out.
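The quoted coherence length follows from the standard estimate L_c ≈ λ²/Δλ. The sketch below assumes a 550 nm center wavelength, which is not stated in the disclosure, to reproduce the roughly 60 micrometer figure for a 5 nm spectral width.

```python
# Standard coherence-length estimate L_c ~ lambda^2 / d_lambda, used here to
# reproduce the "about 60 micrometers" figure quoted for a ~5 nm wide SLED.
# The 550 nm center wavelength is an assumption.

def coherence_length_um(center_nm: float, width_nm: float) -> float:
    return (center_nm ** 2) / width_nm / 1000.0   # nm^2 / nm -> nm, then to um

print(f"L_c ~ {coherence_length_um(550.0, 5.0):.0f} um")   # ~60 um
```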

Referring now to FIG. 12, an illuminator 1204 includes a light source 1210 for providing a light beam 1212 to a lightguide 1234. The lightguide 1234 includes an optical dispatching circuit 1241 coupled to the light source 1210. The optical dispatching circuit 1241 is based on linear waveguides and is configured to receive and split the light beam 1212 into a plurality of sub-beams propagating in individual linear waveguides. Herein, the term “linear waveguide” denotes a waveguide that bounds the light propagation in two dimensions, like a light wire. A linear waveguide may be straight, curved, etc.; in other words, the term “linear” does not mean a straight waveguide section. One example of a linear waveguide is a ridge-type waveguide.

To split the light beam 1212 into a plurality of sub-beams, the optical dispatching circuit 1241 may include a binary tree of 1×2 waveguide splitters 1244 coupled to one another by linear waveguides 1245. Other configurations of the optical dispatching circuit 1241 are possible, e.g. they may be based on a tree of Mach-Zehnder interferometers, and may include separate waveguide trees for light source components at different wavelengths, e.g. wavelengths of different color channels.
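A minimal sketch of the power bookkeeping in such a binary splitter tree is given below, assuming ideal loss-free 1×2 splitters; real splitters add excess loss that is ignored here, and the input power and tree depth are arbitrary illustrative numbers.

```python
# Bookkeeping sketch of the binary 1x2 splitter tree of the dispatching circuit:
# each ideal (loss-free) splitter halves the power, so a tree of depth d feeds
# 2**d linear waveguides with equal power.

def splitter_tree_outputs(input_power_mw: float, depth: int) -> list[float]:
    """Power delivered to each of the 2**depth output waveguides."""
    n_out = 2 ** depth
    return [input_power_mw / n_out] * n_out

outputs = splitter_tree_outputs(10.0, depth=5)   # e.g. 5 splitting stages
print(f"{len(outputs)} waveguides, {outputs[0]:.3f} mW each")   # 32 waveguides, ~0.31 mW each
```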

The lightguide 1234 further includes an array of linear waveguides 1242 coupled to the optical dispatching circuit 1241 for receiving the sub-beams from the optical dispatching circuit 1241. The linear waveguides 1242 run parallel to one another to propagate the sub-beams in them. The lightguide 1234 further includes an array of out-coupling gratings 1216 optically coupled to linear waveguides 1242 of the array of linear waveguides for out-coupling portions of the sub-beams propagating in the linear waveguides 1242. The out-coupling gratings 1216 are disposed parallel to the XY plane as shown, and perform a same or similar function as the out-coupling gratings 116 of the lightguide 114 of the illuminator 104 of FIG. 1A. Specifically, the out-coupling gratings 1216 out-couple the sub-beam portions from the respective linear waveguides 1242 to propagate through a substrate of a display panel and form arrays of optical power density peaks due to Talbot effect at a Talbot plane spaced apart from the array of out-coupling gratings 1216, as has been explained above with reference to FIGS. 1A, 1B, and FIG. 2.

FIGS. 13A to 13C illustrate one possible implementation of the lightguide 1234 of FIG. 12. Referring first to FIGS. 13A and 13B, a photonic integrated circuit (PIC) illuminator 1304 includes a substrate 1306 and an array of linear waveguides 1307 supported by the substrate 1306 and running along the array of pixels of a display panel. In the PIC illuminator 1304 shown in FIG. 13A, the linear waveguides 1307 include an array of “red waveguides” 1307R for conveying light at a red wavelength, an array of “green waveguides” 1307G for conveying light at a green wavelength, and an array of “blue waveguides” 1307B for conveying light at a blue wavelength. Light 1308 at different wavelengths may be generated by a multi-wavelength light source 1310 and distributed among different waveguides 1307R, 1307G, and 1307B by an optical dispatching circuit 1319, which is a part of the PIC. The function of the dispatching circuit 1319 is to expand the light along Y-direction and to reroute the light into the array of linear waveguides 1307. One row of pixels of the display panel may be disposed across all the linear waveguides 1307R, 1307G, and 1307B of red, green, and blue color channels respectively, the linear waveguides extending vertically in FIG. 13A. A row of pixels is outlined with dashed rectangle 1313 in FIG. 13A.

FIG. 13B is a magnified view of three color channel waveguides under a single pixel 1303 of the display panel. Each of the three color sub-pixels corresponds to one of the red (R), green (G), and blue (B) color channels of the image. More than three color sub-pixels may be provided, e.g. in an RGGB scheme. Light portions may be out-coupled, or redirected, from the ridge waveguides 1307R, 1307G, and 1307B by the respective gratings 1312R, 1312G, and 1312B shown in FIG. 13C, forming corresponding arrays of gratings for each color channel. The gratings 1312R, 1312G, and 1312B may be chirped for focusing the out-coupled light beam in a direction along the waveguides, i.e. vertically in FIGS. 13A and 13B, along the X-axis. Additionally, the grating grooves can be curved to focus light in the horizontal direction in FIGS. 13A and 13B, i.e. along the Y-axis. In the example of FIG. 13C, gratings 1312R, 1312G, and 1312B are formed in linear waveguides 1307R, 1307G, and 1307B respectively, although in some embodiments the arrays of gratings may be formed separately and optically coupled to the array of linear waveguides 1307.

For focusing the out-coupled light beams in the horizontal direction in FIG. 13B, 1D microlenses 1318 may be provided as shown. Herein, the term “1D microlenses” denotes lenses that focus light predominantly in one dimension, e.g. cylindrical lenses. 2D lenses, i.e. lenses focusing light in two orthogonal planes, may be provided instead of 1D lenses. The array of microlenses 1318 disposed in an optical path between the gratings 1312R, 1312G, and 1312B and the pixels 1303R, 1303G, and 1303B may be used to at least partially focus the light redirected by the gratings 1312R, 1312G, and 1312B for propagation through corresponding sub-pixels 1303R, 1303G, and 1303B. The configuration is shown in FIG. 13B for one white pixel 1303. The white pixel configuration may be repeated for each white pixel of the display panel.

Turning to FIG. 14, an illuminator 1404 includes the elements of the illuminator 1204 of FIG. 12. The waveguide structures of the optical dispatching circuit 1241, including the waveguide splitters and coupling linear waveguides and the array of straight linear waveguides 1242, are formed in a core layer 1452 supported by a substrate 1454. The illuminator 1404 further includes a color-selective reflector 1456 in an optical path of the light beam 1212 between the arrays of out-coupling gratings 1216 formed in the core layer 1452 on one hand, and a substrate 1408 of the display panel on the other. The color-selective reflector 1456 is configured to provide different optical path lengths for the light beam components at different wavelengths. To that end, the color-selective reflector 1456 may include first 1461, second 1462, and third 1463 reflectors supported by a reflector substrate 1464 at different depths (i.e. different Z-coordinates) within the reflector substrate 1464. The first 1461 and second 1462 reflectors may be dichroic reflectors. The first reflector 1461 reflects light at a first wavelength and transmits light at second and third wavelengths; the second reflector 1462 transmits light at the first and third wavelengths, and reflects the light at the second wavelength. The third reflector may be a 100% mirror reflecting light at all wavelengths, or may also be a dichroic mirror that only reflects light at the third wavelength, to reduce color channel crosstalk.

In operation, the light beam 1212 carries first 1271, second 1272, and third 1273 beam components for carrying light at first, second, and third wavelengths, respectively. For example, the first 1271, second 1272, and third 1273 beam components may be at red, green, and blue wavelengths respectively. The first beam component 1271 is reflected by the first reflector 1461, with the remaining beam components 1272 and 1273 being transmitted through. The second beam component 1272 is reflected by the second reflector 1462, with the third beam component 1273 being transmitted through. Finally, the third beam component 1273 is reflected by the third reflector 1463. As a result of the split propagation, different beam components will propagate different distances before they reach the substrate 1408 of the display panel. The different distances may be selected to compensate for the different distances to Talbot plane for light at different wavelengths, as defined by Eq. (1) above, causing the peaky Talbot patterns to overlap at the pixel plane of the display panel.
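The sketch below illustrates why the deeper reflectors serve the shorter wavelengths: from Eq. (1), D scales as 1/λ, so blue light re-images farthest from the gratings and needs the longest folded path to reach the same pixel plane. The wavelengths, substrate index, and grating pitch are assumed illustrative values, not figures from the disclosure.

```python
# Sketch of the path-length compensation performed by the color-selective reflector:
# Eq. (1) gives D = K*T1^2/(N*lambda), so shorter wavelengths re-image farther away
# and must travel a longer folded path to reach the same pixel plane.

n_substrate = 1.5   # assumed refractive index
T1 = 13.2           # um, assumed grating pitch

def talbot_D_um(wavelength_vac_nm: float) -> float:
    wl_um = wavelength_vac_nm / 1000.0 / n_substrate   # wavelength inside the medium
    return T1 ** 2 / wl_um                             # K = N = 1

D = {c: talbot_D_um(wl) for c, wl in (("red", 638), ("green", 520), ("blue", 455))}
for color, d in D.items():
    print(f"{color:5s}: D ~ {d:.0f} um, extra path vs red ~ {d - D['red']:.0f} um")
# Blue re-images farthest, so its reflector sits deepest in the reflector substrate.
```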

Turning to FIG. 15, a near-eye display 1500 includes a frame 1501 having a form factor of a pair of eyeglasses. The frame 1501 supports, for each eye: a display panel 1510 for providing an image in linear domain, an ocular lens 1520 for converting the image in linear domain into an image in angular domain for observation by an eye placed into an eyebox 1512, and a display panel illuminator 1530, including any illuminator of this disclosure. The display panel illuminator 1530 illuminates the pixel array of the display panel 1510 with a Talbot illumination pattern coordinated with the pixel array pattern as explained herein. A plurality of eyebox illuminators 1506 illuminate the eyebox 1512, and an eye-tracking camera 1504 takes live images of the user’s eyes. The eye illuminators 1506 may be supported by the display panel 1510.

The purpose of the eye-tracking cameras 1504 is to determine position and/or orientation of both eyes of the user. Once the position and orientation of the user’s eyes are known, the exit pupil of the back-illuminated displays may be adjusted to send the display light to the eye pupils, e.g. by relying on tiltable reflectors built into the display panel illuminator of FIG. 4. Dynamic exit pupil steering improves the overall light utilization efficiency of the near-eye display 1500. Another use of the eye tracking data is for calibration or image correction purposes. Furthermore, the imagery displayed by the near-eye display 1500 may be dynamically adjusted to account for the user’s gaze, for a better fidelity of immersion of the user into the displayed augmented reality scenery, and/or to provide specific functions of interaction with the augmented reality.

The eyebox illuminators 1506 illuminate the eyes at the corresponding eyeboxes 1512, for the eye-tracking cameras to obtain the images of the eyes, as well as to provide reference reflections, i.e. glints. The glints may function as reference points in the captured eye images, facilitating determination of the eye gaze direction from the positions of the eye pupil images relative to the glint images. To avoid distracting the user with illuminating light, the latter may be made invisible to the user. For example, infrared light may be used.

Turning to FIG. 16, an HMD 1600 is another example of a wearable display system that may use illuminators and/or display devices of this disclosure. The function of the HMD 1600 is to augment views of a physical, real-world environment with computer-generated imagery, and/or to generate the entirely virtual 3D imagery. The HMD 1600 may include a front body 1602 and a band 1604. The front body 1602 is configured for placement in front of eyes of a user in a reliable and comfortable manner, and the band 1604 may be stretched to secure the front body 1602 on the user’s head. A display system 1680 may be disposed in the front body 1602 for presenting AR/VR imagery to the user. Sides 1606 of the front body 1602 may be opaque or transparent.

In some embodiments, the front body 1602 includes locators 1608 and an inertial measurement unit (IMU) 1610 for tracking acceleration of the HMD 1600, and position sensors 1612 for tracking position of the HMD 1600. The IMU 1610 is an electronic device that generates data indicating a position of the HMD 1600 based on measurement signals received from one or more of position sensors 1612, which generate one or more measurement signals in response to motion of the HMD 1600. Examples of position sensors 1612 include: one or more accelerometers, one or more gyroscopes, one or more magnetometers, another suitable type of sensor that detects motion, a type of sensor used for error correction of the IMU 1610, or some combination thereof. The position sensors 1612 may be located external to the IMU 1610, internal to the IMU 1610, or some combination thereof.

The locators 1608 are traced by an external imaging device of a virtual reality system, such that the virtual reality system can track the location and orientation of the entire HMD 1600. Information generated by the IMU 1610 and the position sensors 1612 may be compared with the position and orientation obtained by tracking the locators 1608, for improved tracking accuracy of position and orientation of the HMD 1600. Accurate position and orientation is important for presenting appropriate virtual scenery to the user as the latter moves and turns in 3D space.

The HMD 1600 may further include a depth camera assembly (DCA) 1611, which captures data describing depth information of a local area surrounding some or all of the HMD 1600. To that end, the DCA 1611 may include a laser radar (LIDAR), or a similar device. The depth information may be compared with the information from the IMU 1610, for better accuracy of determination of position and orientation of the HMD 1600 in 3D space.

The HMD 1600 may further include an eye tracking system 1614 for determining orientation and position of user’s eyes in real time. The obtained position and orientation of the eyes also allows the HMD 1600 to provide the display exit pupil steering, and/or to determine the gaze direction of the user and to adjust the image generated by the display system 1680 accordingly. In one embodiment, the vergence, that is, the convergence angle of the user’s eyes gaze, is determined. The determined gaze direction and vergence angle may also be used for real-time compensation of visual artifacts dependent on the angle of view and eye position. Furthermore, the determined vergence and gaze angles may be used for interaction with the user, highlighting objects, bringing objects to the foreground, creating additional objects or pointers, etc. An audio system may also be provided including e.g. a set of small speakers built into the front body 1602.

Embodiments of the present disclosure may include, or be implemented in conjunction with, an artificial reality system. An artificial reality system adjusts sensory information about outside world obtained through the senses such as visual information, audio, touch (somatosensation) information, acceleration, balance, etc., in some manner before presentation to a user. By way of non-limiting examples, artificial reality may include virtual reality (VR), augmented reality (AR), mixed reality (MR), hybrid reality, or some combination and/or derivatives thereof. Artificial reality content may include entirely generated content or generated content combined with captured (e.g., real-world) content. The artificial reality content may include video, audio, somatic or haptic feedback, or some combination thereof. Any of this content may be presented in a single channel or in multiple channels, such as in a stereo video that produces a three-dimensional effect to the viewer. Furthermore, in some embodiments, artificial reality may also be associated with applications, products, accessories, services, or some combination thereof, that are used to, for example, create content in artificial reality and/or are otherwise used in (e.g., perform activities in) artificial reality. The artificial reality system that provides the artificial reality content may be implemented on various platforms, including a wearable display such as an HMD connected to a host computer system, a standalone HMD, a near-eye display having a form factor of eyeglasses, a mobile device or computing system, or any other hardware platform capable of providing artificial reality content to one or more viewers.

The present disclosure is not to be limited in scope by the specific embodiments described herein. Indeed, other various embodiments and modifications, in addition to those described herein, will be apparent to those of ordinary skill in the art from the foregoing description and accompanying drawings. Thus, such other embodiments and modifications are intended to fall within the scope of the present disclosure. Further, although the present disclosure has been described herein in the context of a particular implementation in a particular environment for a particular purpose, those of ordinary skill in the art will recognize that its usefulness is not limited thereto and that the present disclosure may be beneficially implemented in any number of environments for any number of purposes. Accordingly, the claims set forth below should be construed in view of the full breadth and spirit of the present disclosure as described herein.

The article “Meta Patent | Talbot pattern illuminator and display based thereon” was first published on Nweon Patent

Meta Patent | Systems, devices, and methods of manipulating audio data based on microphone orientation https://patent.nweon.com/26725 Thu, 26 Jan 2023 15:33:09 +0000 https://patent.nweon.com/?p=26725 ...

Patent: Systems, devices, and methods of manipulating audio data based on microphone orientation

Patent PDF: Available to Nweon members

Publication Number: 20230021918

Publication Date: 2023-01-26

Assignee: Meta Platforms Technologies

Abstract

An electronic device includes a microphone array configured to capture audio data, one or more sensors configured to detect an orientation of the microphone array, digital signal processing (DSP) logic, and an interface. The DSP logic processes, based on the orientation of the microphone array detected by the one or more sensors, the audio data captured by the microphone array to form audio input data. The interface is configured to transmit the audio input data over a communications channel to be output by another electronic device.

Claims

What is claimed is:

1.An electronic device comprising: a display device repositionable relative to a base; a microphone array configured to capture audio data, the microphone array fixedly attached to the display device; one or more sensors configured to detect an orientation of the display device relative to the base; and digital signal processing (DSP) logic configured to: generate a virtual directional microphone associated with one or more individual microphones of the microphone array based on the orientation of the display device relative to the base; and process the audio data to form audio input data, the audio data including audio signals received from the one or more individual microphones that correspond to the virtual directional microphone.

2.The electronic device of claim 1, further comprising: one or more audio output devices physically incorporated in the base; and driver logic configured to drive the audio output devices to output audible sound data based on the audio input data.

3.The electronic device of claim 1, wherein to process the audio data to form the audio input data, the DSP logic is configured to drop signals received from one or more individual microphones of the microphone array based on the orientation of the display device relative to the base.

4.The electronic device of claim 1, wherein to process the audio data to form the audio input data, the DSP logic is configured to constrain a search space to signals received from one or more individual microphones of the microphone array based on the orientation of the display device relative to the base.

5.The electronic device of claim 1, wherein to process the audio data to form the audio input data, the DSP logic is configured to process the audio data using a set of audio capture parameters corresponding to the orientation of the display device relative to the base.

6.The electronic device of claim 5, wherein to determine the set of audio capture parameters corresponding to the orientation of the display device relative to the base, the DSP logic is configured to select the set of audio capture parameters corresponding to the orientation of the display device relative to the base from a plurality of sets of audio capture parameters using a lookup table.

7.The electronic device of claim 5, wherein to determine the set of audio capture parameters corresponding to the orientation of the display device relative to the base, the DSP logic is configured to apply a finite element solution that determines the set of audio capture parameters corresponding to the orientation of the display device relative to the base.

8.The electronic device of claim 5, wherein to determine the set of audio capture parameters corresponding to the orientation of the display device relative to the base, the DSP logic is configured to apply an artificial intelligence model or a machine learning model trained with a mapping of rotation angles of the microphone array to respective sets of audio capture parameter sets to predict the set of audio capture parameters corresponding to the orientation of the display device relative to the base.

9.The electronic device of claim 1, wherein: the DSP logic is configured to determine horizontal coordinate angles of the microphone array based on the orientation of the display device relative to the base, and to process the audio data to form the audio input data, the DSP logic is configured to process the audio data using the horizontal coordinate angles of the microphone array.

10.The electronic device of claim 1, wherein the one or more sensors include one or more of an accelerometer, a position encoder, a gyroscope, or a motion sensor.

11.A method comprising: receiving, by one or more processors, audio data captured by a microphone array attached to a display device, the display device repositionable relative to a base; generating, by the one or more processors, a virtual directional microphone associated with one or more individual microphones of the microphone array based on an orientation of the display device relative to the base; and processing, by the one or more processors, the audio data to form audio input data, the audio data including audio signals received from the one or more individual microphones that correspond to the virtual directional microphone.

12.The method of claim 11, further comprising: driving one or more audio output devices physically incorporated into the base to output audible sound data based on the audio input data.

13.The method of claim 11, wherein processing the audio data to form the audio input data comprises dropping, by the DSP logic, signals received from one or more individual microphones of the microphone array based on the orientation of the display device relative to the base.

14.The method of claim 11, wherein processing the audio data to form the audio input data comprises constraining, by the DSP logic, a search space to signals received from one or more individual microphones of the microphone array based on the orientation of the display device relative to the base.

15.The method of claim 11, wherein processing the audio data to form the audio input data comprises processing, by the DSP logic, the audio data using a set of audio capture parameters corresponding to the orientation of the display device relative to the base.

16.The method of claim 15, wherein determining the set of audio capture parameters corresponding to the orientation of the display device relative to the base comprises selecting, by the driver logic, the set of audio capture parameters corresponding to the orientation of the display device relative to the base from a plurality of sets of audio capture parameters using a lookup table.

17.The method of claim 15, wherein determining the set of audio capture parameters corresponding to the orientation of the display device relative to the base comprises applying, by the driver logic, a finite element solution that determines the set of audio capture parameters corresponding to the orientation of the display device relative to the base.

18.The method of claim 11, further comprising: determining, by the DSP logic, horizontal coordinate angles of the microphone array based on the orientation of the display device relative to the base, wherein processing the audio data to form the audio input data comprises processing, by the DSP logic, the audio data using the horizontal coordinate angles of the microphone array.

19.A non-transitory computer-readable storage medium storing one or more programs configured for execution by one or more processors of an electronic device having a microphone array and a speaker array, the microphone array attached to a display device repositionable relative to a base, the one or more programs including instructions that when executed by the one or more processors cause the electronic device to: receive audio data captured by the microphone array; generate a virtual directional microphone associated with one or more individual microphones of the microphone array based on an orientation of the display device relative to the base; and process the audio data to form audio input data using signals received from the one or more individual microphones that correspond to the virtual directional microphone.

20.An electronic device comprising: a display device repositionable relative to a base; a microphone array configured to capture audio data, the microphone array fixedly attached to the display device; one or more sensors configured to detect an orientation of the display device relative to the base; and digital signal processing (DSP) logic configured to determine horizontal coordinate angles of the microphone array based on the orientation of the display device relative to the base, and process the audio data using the horizontal coordinate angles of the microphone array to form audio input data.

Description

CROSS REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. patent application Ser. No. 16/897,018, filed Jun. 9, 2020, entitled, “SYSTEMS, DEVICES, AND METHODS OF MANIPULATING AUDIO DATA BASED ON MICROPHONE ORIENTATION,” which is incorporated by reference herein in its entirety.

TECHNICAL FIELD

This disclosure generally relates to communication systems, and more particularly, to video communication systems with audio communication capabilities.

BACKGROUND

Video telephony technology, including video conferencing, video chat tools and services, etc., is becoming an increasingly popular way for friends, families, colleagues, and other groups of people to communicate with each other. Camera hardware, such as webcam hardware, is increasingly being added to various end-user devices, such as smartphones, head-mounted devices (HMDs), tablet computers, laptop computers, network-connected televisions (or so-called “smart TVs”), digital displays (e.g., computer displays), whether as integrated hardware or as add-on hardware. The widespread addition of camera hardware to connected devices makes it easier to video conference with others using any of a number of online video telephony services. In addition, video telephony services are increasingly incorporating audio communication hardware that is becoming more and more sophisticated, such as multiple loudspeakers with frequency band-specific output capabilities, multiple microphones arrayed to provide high-precision audio capture capabilities, etc.

SUMMARY

In general, this disclosure describes telephonic systems with audio and/or video capabilities that are configured to customize audio input parameters and/or audio output parameters based on the current orientation of the microphone array that captures the audio signals for the telephonic session. In some instances, the microphone array is fixedly attached to a display device of a video telephonic system, and the display device is repositionable relative to a second portion of the telephonic system, such as a base that includes audio output devices, such as speakers. In some examples, the telephonic systems of this disclosure set equalization parameters for audio data being or to be output by one or more speakers of the telephonic system based on the orientation of the display device of the conferencing system.

In some examples, the telephonic systems of this disclosure set digital signal processing (DSP) parameters for audio data being or to be input via a microphone array of the telephonic system based on the orientation of the display device of the conferencing system. In some examples, the telephonic systems of this disclosure set echo cancellation parameters for audio data being or to be input via the microphone array of the conferencing system based on the orientation of the display device of the telephonic system.

Telephonic systems of this disclosure may implement one, some, or all of the functionalities described above in various use case scenarios consistent with this disclosure. Moreover, the audiovisual telephonic systems of this disclosure may dynamically update one or more of the audio-related parameters listed above in response to detecting positional and/or orientation changes of the microphone array (e.g., the display device as a proxy for the positional and/or orientation changes of the microphone array).

In one example, an electronic device includes a microphone array configured to capture audio data, one or more sensors configured to detect an orientation of the microphone array, digital signal processing (DSP) logic, and an interface. The DSP logic processes, based on the orientation of the microphone array detected by the one or more sensors, the audio data captured by the microphone array to form audio input data. The interface is configured to transmit the audio input data over a communications channel to be output by another electronic device.

In another example, a method includes capturing, by a microphone array, audio data. The method also includes detecting, by one or more sensors, an orientation of the microphone array. Additionally, the method includes processing, by digital signal processing (DSP) logic, based on the orientation of the microphone array detected by the one or more sensors, the audio data captured by the microphone array to form audio input data. The method further includes transmitting, by an input/output interface, the audio input data over a communications channel.

In another example, an electronic device includes means to capture audio data, means to detect an orientation of the microphone array, digital signal processing (DSP) means, and interface means. The DSP means forms, based on the orientation of the microphone array, audio input data. The interface means transmits the audio input data over a communications channel to be output by another electronic device.

In another example, a non-transitory computer-readable storage medium stores one or more programs configured for execution by one or more processors of an electronic device having a microphone array, an interface, one or more sensors, and a speaker array. The one or more programs include instructions, which when executed by the one or more processors, cause the electronic device to capture, using the microphone array, audio data. The instructions also cause the electronic device to detect, using one or more sensors, an orientation of the microphone array. Additionally, the instructions cause the electronic device to process, based on the orientation of the microphone array detected by the one or more sensors, the audio data captured by the microphone array to form audio input data. The instructions further cause the electronic device to transmit, via the interface, the audio input data over a communications channel.

The techniques and system configurations of this disclosure provide one or more technical improvements in the technology area of video telephony. As one example, the configurations of this disclosure may improve data precision by reducing audio-video offset caused by a static microphone configuration and/or static speaker configuration while the display (and thereby the camera) hardware are moved to different positions and/or orientations. As another example, configurations of this disclosure may reduce computing resource and/or bandwidth expenditure by constraining search spaces among the microphone array’s inputs based on the position/orientation of the display, thereby reducing the amount of audio data to be processed and/or transmitted over a network connection.

The configurations of this disclosure may be advantageous in a number of scenarios. For example, the configurations of this disclosure may be advantageous in scenarios in which multiple participants participate in a conferencing session from a single location with a shared device. As another example, the configurations of this disclosure may be advantageous in scenarios in which there is ambient noise that is not germane to the content of the conference.

As another example still, the configurations of this disclosure may be advantageous in scenarios in which the display is tilted in such a way as to point one or more microphones of the microphone array at or substantially at one or more speakers of the conferencing device. The audiovisual telephonic systems of this disclosure may provide high-quality communication experiences by modifying audio parameters “on the fly” without disrupting the communication session, while accommodating display device manipulation by the local participant(s). Accordingly, the techniques of the disclosure provide specific technical improvements to the computer-related and network-related field of video telephony.

The details of one or more examples of the techniques of this disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the techniques will be apparent from the description and drawings, and from the claims.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1A is an illustration depicting an example video telephony system engaged in an audiovisual communication session, in accordance with the techniques of the disclosure.

FIG. 1B is an illustration depicting further details of a telephonic system of FIG. 1A and its surrounding environment.

FIG. 2 is a block diagram illustrating an example of a telephonic system that implements one or more of the display position-based audio data manipulation techniques of the disclosure.

FIG. 3 is a flowchart illustrating an example of a display position-based audio rendering process that the telephonic system of FIGS. 1A-2 may perform, in accordance with aspects of this disclosure.

FIG. 4 is a flowchart illustrating an example of a display position-based audio capture process that the telephonic system of FIGS. 1A-2 may perform, in accordance with aspects of this disclosure.

FIG. 5 is a flowchart illustrating an example of a display position-based echo cancellation process that the telephonic system of FIGS. 1A-2 may perform, in accordance with aspects of this disclosure.

Like reference characters refer to like elements throughout the drawings and description.

DETAILED DESCRIPTION

Video telephony services, such as multi-use communication packages that include conferencing components, transport video data and audio data between two or more participants, enabling real-time or substantially real-time communications between participants who are not at the same physical location. Video telephony services are becoming increasingly ubiquitous as a communication medium in private sector enterprises, for educational and professional training/instruction, and for government-to-citizen information dissemination. With video telephony services being used more commonly and for increasingly important types of communication, the focus on data precision and service reliability is also becoming more acute.

This disclosure is directed to configurations for telephonic systems, such as video telecommunication hardware, that improve the precision with which audio data of audiovisual communication sessions are rendered for playback to the local participant(s). Additionally, the configurations of this disclosure enable video telephonic systems to constrain audio data at the local input stage and/or at the pre-transmission stage dynamically, thereby easing bandwidth requirements in these scenarios. In this way, the configurations of this disclosure provide technical improvements with respect to data precision, compute resource expenditure, and bandwidth consumption in the computing-related and network-related technical field of video telephony.

For example, an electronic device may include a device including a microphone array and one or more audio output devices. In some implementations, the electronic device may additionally include a display device, and the microphone array may be fixedly attached to the display device, e.g., attached to a bezel of the display device or encased within a portion of the display device. The display device may be repositionable (e.g., slidable and/or rotatable) relative to the one or more audio output devices. For instance, the one or more audio output devices may include one or more speakers housed in a base of the telephonic system, and the display device may be movably coupled to the base. The base may, during use of the telephonic system, remain substantially stationary relative to the environment (e.g., room) in which the telephonic system is being used. The display device may be manually moved by a user or may be moved under control of the telephonic system. In any case, repositioning the display device may result in the microphone array being repositioned relative to the one or more audio output devices. Additionally, repositioning the display device may result in the microphone array being repositioned relative to the environment (e.g., room) in which the telephonic system is being used.

The repositioning of the microphone array may affect receipt of audio signals by the microphone array, e.g., both audio signals from in-room participants in the telephonic session and audio signals output by the one or more audio output devices of the telephonic system. In accordance with examples of this disclosure, the telephonic system may be configured to detect an orientation of the microphone array (e.g., relative to the base) and control one or more audio processing parameters based on the detected orientation. As used herein, “orientation” refers to the position, angle, and/or pose of the microphone array and/or display relative to each other. For example, the telephonic system may be configured to set equalization parameters for audio data being or to be output by one or more audio output devices of the telephonic system based on the orientation of the microphone array of the telephonic system.

In some examples, the telephonic system may be configured to set digital signal processing (DSP) parameters for audio data being or to be input via the microphone array based on the orientation of the microphone array of the video telephonic system. In some examples, the telephonic system may be configured to set echo cancellation parameters for audio data being or to be input via the microphone array based on the orientation of the microphone array of the telephonic system. In this way, the telephonic system may be configured to at least partially compensate for changing orientation of the microphone array with respect to the environment and/or audio output devices of the telephonic system.

While described primarily in the context of video telephony technology in this disclosure as an example, it will be appreciated that the techniques of this disclosure may be implemented in other types of systems as well. For example, the configurations of this disclosure may be implemented in artificial reality systems. Artificial reality is a form of reality that has been adjusted in some manner before presentation to a user, and may include one or more of virtual reality (VR), augmented reality (AR), mixed reality (MR), hybrid reality, or some combination and/or derivative thereof. Artificial reality systems that incorporate the audio data manipulation techniques of this disclosure may update audio data captured and/or rendered for playback via a head-mounted device (HMD) or other devices incorporating display, microphone, and/or speaker hardware combined with hardware configured to display artificial reality content in visual form.

FIG. 1A is an illustration depicting an example video telephony system 10 having audiovisual telephonic systems 12A, 12B engaged in a telephonic session. In the example of FIG. 1A, audiovisual telephonic systems 12A and 12B are engaged in a video conferencing session, and both of audiovisual telephonic systems 12A, 12B include video input and output capabilities. In other examples, aspects of this disclosure may be applied in the context of audio telephony, such as standalone audio conferencing or combined audio/video conferencing, and may be applied seamlessly across switches between the two (e.g., if video capabilities are temporarily disabled due to bandwidth issues, etc.).

Audiovisual telephonic systems 12A, 12B of FIG. 1A are shown for purposes of example, and may represent any of a variety of devices with audio and/or audio/video telephonic capabilities, such as a mobile computing device, laptop, tablet computer, smartphone, server, stand-alone tabletop device, wearable device (e.g., smart glasses, an artificial reality HMD, or a smart watch) or dedicated video conferencing equipment. As described herein, at least one of audiovisual telephonic systems 12A, 12B is configured to set one of audio rendering parameters, audio capture parameters, or echo cancellation parameters based on the orientation of display devices 18A and 18B.

In the example of FIG. 1A, video telephony system 10 includes a first audiovisual telephonic system 12A connected to a second audiovisual telephonic system 12B over a communications channel 16. Each audiovisual telephonic system 12A, 12B includes one of display devices 18A and 18B and image capture systems 20A-20B. Each of image capture systems 20 is equipped with image capture capabilities (often supplemented with, and sometimes incorporating, one or more microphones providing voice capture capabilities). Each of image capture systems 20 includes camera hardware configured to capture still images and moving pictures of the surrounding environment.

Video telephony system 10 may in some cases be in communication, via a network, with one or more compute nodes (not shown) that correspond to computing resources in any form. Each of the compute nodes may be a physical computing device or may be a component of a cloud computing system, server farm, and/or server cluster (or portion thereof) that provides services to client devices and other devices or systems. Accordingly, any such compute nodes may represent physical computing devices, virtual computing devices, virtual machines, containers, and/or other virtualized computing device. The compute nodes may receive, process, and output video to perform techniques described herein. The compute nodes may be located at or otherwise supported by various high-capacity computing clusters, telecommunication clusters, or storage systems, such as systems housed by data centers, network operations centers, or internet exchanges.

In the example shown in FIG. 1A, participants 30A and 30B share and use audiovisual telephonic system 12A to communicate over communications channel 16 with participant 30C operating audiovisual telephonic system 12B. Audiovisual telephonic system 12A includes display device 18A and image capture system 20A, while audiovisual telephonic system 12B includes display device 18B and image capture system 20B. In various implementations, image capture system 20A and display device 18A may be included in a single device or may be separated into separate devices.

Display devices 18 and image capture systems 20 are configured to operate as video communication equipment for audiovisual telephonic systems 12A, 12B. That is, participants 30A and 30C may communicate with one another in an audio and/or video conferencing session over communications channel 16 using display devices 18 and image capture systems 20. Image capture systems 20A and 20B capture still and/or moving pictures of participants 30A-30C, respectively. Computing hardware and network interface hardware of audiovisual telephonic systems 12A and 12B process and transmit the captured images substantially in real time over communications channel 16.

Communications channel 16 may be implemented over a private network (e.g., a local area network or LAN), a public network (e.g., the Internet), a private connection implemented on public network infrastructure (e.g., a virtual private network or VPN tunnel implemented over an Internet connection), other type of packet-switched network, etc. Network interface hardware and computing hardware of the audiovisual telephonic systems 12A and 12B receive and process the images (e.g., video streams) transmitted over communications channel 16. Display devices 18 are configured to output image data (e.g., still images and/or video feeds) to participants 30, using the image data received over communications channel 16 and processed locally for rendering and output.

In this way, audiovisual telephonic systems 12A and 12B, by way of image capture systems 20 and display devices 18, enable participants 30 to engage in a video conferencing session. While the video conferencing session implemented over video telephony system 10 is illustrated as including two actively communicating devices in FIG. 1A as one non-limiting example, it will be appreciated that the systems and techniques of this disclosure are scalable, in that video conferencing sessions of this disclosure may accommodate three or more participating devices in some scenarios. The systems and techniques of this disclosure are also compatible with video conferencing sessions with in-session variance in terms of the number of participants, such as video conferencing sessions in which one or more participants are added and removed throughout the lifetime of the session.

In the example of FIG. 1A, display device 18A outputs display content 24 to participants 30A, 30B. Display content 24 represents a still frame of a moving video sequence output to participants 30A, 30B as part of the video conferencing session presently in progress. Display content 24 includes a visual representation of participant 30C, who is a complementing participant to participant 30A in the video telephonic session. In some examples, display content 24 may also include a video feedthrough to provide an indication of how the image data captured by image capture system 20A appears to other users in the video telephonic session, such as to participant 30C via display device 18B. As such, a video feedthrough, if included in display content 24, would provide participants 30A, 30B with a low-to-zero time-lagged representation of the image data attributed to the surroundings of audiovisual telephonic system 12A and displayed to other participants in the video conferencing session.

Audiovisual telephonic systems 12A and 12B may provide privacy settings that facilitate operators of the audiovisual telephonic systems 12A (e.g., participants 30A and 30C, etc.) to individually specify (e.g., by opting out, by not opting in) whether the audiovisual telephonic systems 12A and 12B, or any associated online system, may receive, collect, log, or store particular objects or information associated with the participant for any purpose. For example, privacy settings may allow the participant 30A to specify whether particular video capture devices, audio capture devices, applications or processes may access, store, or use particular objects or information associated with participants 30A and 30B. The privacy settings may allow participants 30A and 30C to opt in or opt out of having objects or information accessed, stored, or used by specific devices, applications, or processes for users of respective audiovisual telephonic systems 12A and 12B. Before accessing, storing, or using such objects or information, an online system associated with audiovisual telephonic systems 12A and 12B may prompt the participants 30A and 30C to provide privacy settings specifying which applications or processes, if any, may access, store, or use the object or information prior to allowing any such action. For example, participant 30A or participant 30C may specify privacy settings that audio and visual data should not be stored by audiovisual telephonic systems 12A and 12B and/or any associated online service, and/or audiovisual telephonic systems 12A and 12B and/or any associated online service should not store any metadata (e.g., time of the communication, who participated in the communication, duration of the communication, etc.) and/or text messages associated with use of audiovisual telephonic systems 12A and 12B.

Audiovisual telephonic systems 12A, 12B also enable audio communication between participants 30A-30C, alone, or substantially in synchrony (e.g., with low-to-zero offset) with the video feeds described above. Each of audiovisual telephonic systems 12A, 12B incorporate audio capture hardware to capture audio communications provided by the local participant(s), and audio output hardware to play back audio communications received over communications channel 16. As shown in FIG. 1A, audiovisual telephonic system 12A includes (or is communicatively coupled to) each of microphone array 22 and speaker array 26. Audiovisual telephonic system 12B may also include or be coupled to corresponding microphone hardware and/or speaker hardware, but these devices are not explicitly shown or numbered in FIG. 1A for ease of illustration based on the illustrated perspective of audiovisual telephonic system 12B.

Microphone array 22 represents a data-input component that includes multiple microphones configured to capture audio data from the surrounding environment of audiovisual telephonic system 12A. In the particular example of FIG. 1A, microphone array 22 is constructed as a cluster of individual microphones disposed on the surface of a substantially spherical ball, which, in turn, is connected to the rest of audiovisual telephonic system 12A via a so-called “gooseneck” mount or stand. In other examples, the individual microphones of microphone array 22 may be integrated into the periphery of display device 18A, such as along the top width edge of display device 18A.

In some examples, microphone array 22 may represent a four-microphone array, with at least three of the four individual microphones being mounted fixedly to a top edge or panel of display device 18A, and with the four individual microphones of microphone array 22 being arranged in the general shape of a truncated pyramid array. In other examples, the individual microphones of microphone array 22 may be positioned on/within/near the remaining components of audiovisual telephonic system 12A in other ways. In any event, the relative positions of the individual microphones of microphone array 22 with respect to one another are fixed, regardless of the position or orientation of display device 18A. Additionally, in some examples, relative positions of the individual microphones of microphone array 22 are fixed relative to a component of audiovisual telephonic system 12A, e.g., are fixed relative to display device 18A. For instance, microphone array 22 may be fixedly attached to a portion of display device 18A, such as a bezel of display device 18A.

In some examples, microphone array 22 may capture not only audio data, but additional metadata describing various attributes of the captured audio data, as well. For instance, microphone array 22 may capture a combination of audio data and directional data. In these examples, microphone array 22 may be collectively configured to capture a three-dimensional sound field in the immediate vicinity of audiovisual telephonic system 12A.

Whether captured directly by microphone array 22 or indirectly extrapolated from the collective audio signals (e.g. via audio beamforming, etc.) by digital signal processing (DSP) logic of audiovisual telephonic system 12A, audiovisual telephonic system 12A may associate directionality information with the audio data captured by each individual microphone of microphone array 22. As such, audiovisual telephonic system 12A may attach directionality information, whether determined indirectly by the DSP logic or received directly from microphone array 22, to one or more audio signals received from microphone array 22. In other words, audiovisual telephonic system 12A may process the various audio signals captured by microphone array 22 to be one-dimensional, or to have two-dimensional diversity, or to have three-dimensional diversity, depending on which individual microphones of microphone array 22 detect sound inputs of a threshold acoustic energy (e.g., sound intensity or loudness) at a given time.
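
As a rough illustration of the threshold-energy idea described above, the following Python sketch (not taken from the patent) computes per-microphone frame energy and flags which individual microphones of microphone array 22 currently detect sound above a threshold; the array size, frame length, and threshold value are assumptions chosen only for the example.

```python
import numpy as np

def active_microphones(frames: np.ndarray, energy_threshold: float = 1e-3):
    """Return indices of microphones whose frame energy exceeds a threshold.

    frames: array of shape (num_mics, num_samples), one row per individual
    microphone of the array for a single capture frame.
    """
    # Mean-square value is used here as a simple proxy for acoustic energy.
    energy = np.mean(frames.astype(np.float64) ** 2, axis=1)
    return np.flatnonzero(energy >= energy_threshold), energy

# Example: hypothetical four-microphone array, 10 ms frame at 16 kHz;
# the third microphone faces the talker and sees the strongest signal.
rng = np.random.default_rng(0)
frame = rng.normal(0.0, [[0.001], [0.002], [0.05], [0.001]], size=(4, 160))
active, energy = active_microphones(frame)
print("per-mic energy:", np.round(energy, 6), "active mics:", active)
```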

As discussed in greater detail below, display device 18A may be rotated about one or more of an X axis (pitch), Y axis (yaw), or Z axis (roll), thereby changing the directionality (or directional diversity) with respect to the audio signals captured by the various microphones of microphone array 22. Display device 18A may, in some examples, also be moved translationally, such as by sliding along side panels and/or top and bottom panels that enable translational movement. As used herein, rotational and/or translational movement of display device 18A refer to orientation and/or position changes of display device with respect to an otherwise stationary component of audiovisual telephonic system 12A, such as base 34. The DSP logic or other audio processing hardware of audiovisual telephonic system 12A may encode or transcode the audio data and packetize the encoded/transcoded data for transmission over a packet-switched network, such as over communications channel 16.

Audiovisual telephonic system 12A also includes speaker array 26, as shown in FIG. 1A. One or more speakers of speaker array 26 may be included within other components of audiovisual telephonic system 12A, in various examples. In the particular example of FIG. 1A, all of the speakers of speaker array 26 are physically incorporated into another component (in this case, the base 34) of audiovisual telephonic system 12A. Speaker array 26 may include various types of speakers, such as piezoelectric speakers that are commonly incorporated into computing devices. In various examples in accordance with aspects of this disclosure, speaker array 26 may include cone drivers and passive radiators. In some examples that include passive radiators, the passive radiators may be horizontally opposed, and move out of phase with each other to help dampen/cancel vibrations due to low frequencies output by the passive radiators.

Speaker array 26 may, in some examples, include separate speakers with the same audio output capabilities, such as a pair or an array of full-range speakers. In other examples, speaker array 26 may include at least two speakers with different audio output capabilities, such as two or more of subwoofers, woofers, mid-range speakers, or tweeters. Speaker array 26 may incorporate speakers with different types of connectivity capabilities, such as wired speakers, or wireless speakers, or both.

Audiovisual telephonic system 12A may include driver logic configured to drive one or more of the speakers of speaker array 26 to render audio data to participants 30A, 30B. The driver logic of audiovisual telephonic system 12A may provide speaker feeds to one or more of the individual speakers of speaker array 26, and the receiving speakers may render the audio data provided in the feeds as audible sound data. The driver logic of audiovisual telephonic system 12A may configure the speaker feeds on a multi-channel basis based on a geometry according to which the speakers of speaker array 26 are arranged.

In this way, audiovisual telephonic system 12A may leverage microphone array 22 and speaker array 26 to assist participants 30A, 30B in participating in the video conferencing session shown in FIG. 1A over communications channel 16. Audiovisual telephonic system 12A uses microphone array 22 to enable participants 30A, 30B to provide audio data (spoken words/sounds, background music/audio, etc.) to accompany the video feed captured by image capture system 20A. Correspondingly, audiovisual telephonic system 12A uses speaker array 26 to render audio data that accompanies the moving/still image data shown in display content 24.

FIG. 1B is an illustration depicting further details of audiovisual telephonic system 12A of FIG. 1A and its surrounding environment. The relative positions of participants 30A and 30B with respect to each other and with respect to audiovisual telephonic system 12A are different in FIG. 1B as compared to FIG. 1A. Audiovisual telephonic system 12A is configured according to aspects of this disclosure to manipulate audio input data and audio output data to accommodate these positional changes, as described below in greater detail.

Although described with respect to the design illustrated in FIGS. 1A and 1B, the configurations of this disclosure are also applicable to other designs of audiovisual telephonic systems 12, such as may be for smart speaker-type applications, or any other device in which a portion of the device that includes or is fixedly attached to the microphone array is movable relative to the respective speaker or speaker array. For example, the configurations of this disclosure may be applicable to laptop computer designs in which the speaker(s) output audio data via a slot near the hinge connecting the monitor to the keyboard and in which the microphone is positioned above the display portion of the monitor.

Microphone array 22 captures audio input data 14 which, in the particular use case scenario shown in FIG. 1B, includes speech input provided by participant 30A. Audio input data 14 might be augmented by ambient sound(s) captured by microphone array 22. For example, microphone array 22 may detect speech or movement-related sounds (footsteps, etc.) emitted by participant 30B, and/or sounds emitted by other components of audiovisual telephonic system 12A, and/or other background noise that occurs within audible range of microphone array 22. In some non-limiting examples, microphone array 22 may represent a four-microphone array, with at least three of the four individual microphones being mounted fixedly to a top edge or panel of display device 18A. In one such example, the four individual microphones of microphone array 22 may be arranged in the general shape of a truncated pyramid array.

Speaker array 26 renders audio output data 28 at the physical location of audiovisual telephonic system 12A. Audio output data 28 may include (or in some cases, consist entirely of) audio data received by audiovisual telephonic system 12A over communications channel 16 as part of the active video conferencing session with audiovisual telephonic system 12B. For instance, audio output data 28 may include audio data that accompanies the video feed that is rendered for display in the form of display content 24. In some instances, even if the video feed is interrupted, causing display content 24 to reflect a freeze frame or default picture, audiovisual telephonic system 12A may continue to drive speaker array 26 to render audio output data 28, thereby maintaining the audio feed of the currently active video conferencing session.

As shown in FIG. 1B, display device 18A is mounted on base 34 by way of stand 32, thereby providing audiovisual telephonic system 12A with upright display capabilities. It will be appreciated that stand 32, base 34, and other components of audiovisual telephonic system 12A are not drawn to scale for all possible use case scenarios in accordance with this disclosure, and that the aspect ratio shown in FIG. 1B represents only one of many different aspect ratios that are compatible with the configurations of this disclosure. In another example, stand 32 and base 34 may be substantially integrated, and have little to no difference in width/circumference.

Stand 32 may be equipped with mount hardware (e.g., at the interface of stand 32 and display device 18A and/or at the interface of stand 32 and base 34) with one or more degrees of freedom with respect to movement capabilities. The degrees of freedom may include rotational capabilities around the X axis (providing pitch or “tilt” movement), the Y axis (providing yaw or “swivel” capabilities), and/or Z axis (providing roll capabilities), and/or translational capabilities along the X axis, Y axis, and/or Z axis.

Participants 30A and 30B may adjust the position and orientation of display device 18A using the degrees of freedom provided by the mount described above. For instance, one of participants 30A or 30B may temporarily position display device 18A in such a way that display content 24 is visible to him/her. At the particular time instance shown in FIG. 1B, display device 18A is positioned for participant 30A to view display content 24 in a convenient way. In other examples, positioning of display device 18A may be powered, and may be controlled by audiovisual telephonic system 12A based on one or more parameters, e.g., to position display device 18A and microphone array 22 toward a currently speaking participant of participants 30A and 30B.

Audiovisual telephonic system 12A is configured according to aspects of this disclosure to modify audio input data 14 before transmission over communications channel 16 and/or to drive speaker array 26 to render audio output data 28 in a modified way in response to the position/orientation of display device 18A. According to some examples of this disclosure, DSP logic of audiovisual telephonic system 12A may modify one or more of the capture, the selection, or the processing of individual audio signals of audio input data 14 based on the position/orientation of display device 18A. For example, the DSP logic of audiovisual telephonic system 12A may modify audio input data 14 in a way that fully or partially compensates for an angular offset to the horizon caused by rotation angle(s) of the mount of stand 32.

In examples in which microphone array 22 is not configured to capture directional information at the sound source, the DSP logic of audiovisual telephonic system 12A may be configured to implement a virtual directional microphone with a first direction toward a sound source (participant 30A in this instance). Because of the physical attachment of microphone array 22 to display device 18A, any changes to the relative position/orientation of display device 18A with respect to the sound source at the location of participant 30A may also change the relative position/orientation of one or more of the individual microphones of microphone array 22 with respect to the sound source at the location of participant 30A.

If the DSP logic of audiovisual telephonic system 12A detects a change in the relative orientation/position of display device 18A (e.g., based on information received directly or indirectly from sensing hardware of audiovisual telephonic system 12A), the DSP logic may modify the direction of the virtual directional microphone to compensate for the detected change in the rotation angle of display device 18A. In some examples, the DSP logic may use the data describing the rotation angle of display device 18A to constrain the search space to which to direct the virtual microphone (e.g., in the direction of participant 30A). For instance, the DSP logic may constrain the search space to a range of vertical angles with respect to the horizontal, where the range of vertical angles is based on expected locations of the head of participant 30A, and thus, the expected locations from which speech may originate.
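
The following is a minimal sketch, under assumed sign conventions and an assumed head-elevation range, of how a rotation angle of display device 18A could be used to constrain the elevation search space for the virtual directional microphone; none of the numeric values come from the patent.

```python
import numpy as np

def constrained_elevation_range(display_tilt_deg: float,
                                head_range_deg=(-10.0, 25.0)):
    """Map an expected talker-elevation range (measured from the horizontal, in
    the room frame) into the microphone array's own frame.

    Because the microphone array is fixed to the display, tilting the display
    back by t degrees shifts the array frame by -t relative to the room; the
    head-elevation range and sign convention here are illustrative assumptions.
    """
    lo, hi = head_range_deg
    return lo - display_tilt_deg, hi - display_tilt_deg

def candidate_elevations(display_tilt_deg: float, step_deg: float = 5.0):
    """Discrete elevation angles (array frame) the beam search may scan."""
    lo, hi = constrained_elevation_range(display_tilt_deg)
    return np.arange(lo, hi + step_deg, step_deg)

# Example: a 15-degree backward tilt offsets the scanned elevation angles.
print(candidate_elevations(15.0))
```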

In other examples, the DSP logic may drop or disregard signals received from those individual microphones of microphone array 22 that are positioned such that they detect audio data originating primarily from sound sources other than the physical location of participant 30A. For example, the DSP logic may drop or disregard signals received from those individual microphones of microphone array 22 that detect sounds emanating from the location of participant 30A only as ambient sound, or do not detect sounds emanating from the location of participant 30A at all.

According to some of the configurations of this disclosure, driver logic of audiovisual telephonic system 12A may adjust the driver signals provided to speaker array 26 based on the relative orientation (e.g., based on rotation angle) of display device 18A with respect to base 34 or another stationary component of audiovisual telephonic system 12A. Again, in this example, speaker array 26 is physically affixed to (or encased within) base 34. For at least some positions and/or orientations of display device 18A, the display device 18A may at least partially occlude the direct path of soundwaves from speaker array 26 to the listener (in this case, participant 30A and potentially participant 30B).

As such, display device 18A (e.g., a back of display device 18A) may act as a reflective, dispersive, and/or absorptive surface that interacts with sound output by speaker array 26 and affects the sound heard by participant 30A and/or 30B. As the orientation of display device 18A changes, the interaction between display device 18A and the sound output by speaker array 26 may change. The driver logic of audiovisual telephonic system 12A may compensate for changing surfaces (e.g., the surfaces of display device 18A) located between speaker array 26 and the listener (in this case, participant 30A and potentially participant 30B).

For example, the driver logic of audiovisual telephonic system 12A may compensate for audio quality changes (e.g., frequency, amplitude, and/or phase changes) occurring due to a reflective, dispersive, and/or absorptive back surface of the display device 18A being between speaker array 26 and the listener(s). In some use cases, the driver logic of audiovisual telephonic system 12A may additionally or alternatively adjust the driver signals such that speaker array 26 renders audio output data 28 in a way that targets participant 30A (and in this particular example, participant 30B as well).

For example, the driver logic of audiovisual telephonic system 12A may map the relative position/orientation of display device 18A (e.g., with reference to base 34) to a set of equalization parameters, and drive speaker array 26 to render audio output data 28 according to the set of equalization parameters that map to the relative position/orientation of display device 18A. To map an equalization parameter set to the relative position/orientation angle of display device 18A, the driver logic of audiovisual telephonic system 12A may select the parameter set from a superset of available equalization parameters.

Speaker array 26 (or a subset of the speakers thereof) may in turn render audio output data 28 according to the set of equalization parameters. In some examples, to map the rotation angle of display device 18A to the appropriate set of equalization parameters, the driver logic of audiovisual telephonic system 12A utilizes a lookup table that provides a one-to-one or many-to-one mapping of different rotation angles to respective (predetermined) sets of equalization parameters.
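
A minimal sketch of the lookup-table approach, assuming a hypothetical table of per-band equalization gains keyed by rotation angle (the angles, band count, and gain values are placeholders, not values from the patent):

```python
import numpy as np

# Hypothetical table: rotation angle of the display (degrees) mapped to
# per-band equalization gains (dB) for the speaker array.
EQ_TABLE = {
    -15.0: [0.0, 0.0, 0.0, 0.0],
    0.0: [0.0, -1.0, -2.0, -1.0],
    15.0: [1.0, -2.0, -4.0, -2.0],
    30.0: [2.0, -3.0, -6.0, -3.0],
}

def eq_parameters_for_angle(rotation_deg: float):
    """Select the stored equalization parameter set whose table entry is
    closest to the measured rotation angle (nearest-neighbor lookup)."""
    angles = np.array(sorted(EQ_TABLE))
    nearest = float(angles[np.argmin(np.abs(angles - rotation_deg))])
    return np.array(EQ_TABLE[nearest])

print(eq_parameters_for_angle(12.0))  # returns the gains stored for 15 degrees
```

A many-to-one mapping can be obtained the same way by quantizing the measured angle before the lookup.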

In other examples, to map the rotation angle of display device 18A to the appropriate set of equalization parameters, the driver logic of audiovisual telephonic system 12A applies a finite element solution or a specific function that determines the equalization parameter set for a given rotation angle of display device 18A. In other examples still, to map the rotation angle of display device 18A to the appropriate set of equalization parameters, the driver logic of audiovisual telephonic system 12A may apply an artificial intelligence (AI) or machine learning (ML) model trained using a mapping of rotation angles to respective equalization parameter sets to predict the equalization parameter set that suits the present rotation angle of display device 18A.
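
For the learned-mapping alternative, the sketch below fits a small polynomial regression from rotation angle to per-band gains using hypothetical training pairs; any trained AI/ML model could be substituted, and the data here is purely illustrative.

```python
import numpy as np

# Hypothetical training data: rotation angles and the per-band EQ gains (dB)
# measured or tuned for those angles.
train_angles = np.array([-15.0, 0.0, 15.0, 30.0])
train_gains = np.array([[0.0,  0.0,  0.0,  0.0],
                        [0.0, -1.0, -2.0, -1.0],
                        [1.0, -2.0, -4.0, -2.0],
                        [2.0, -3.0, -6.0, -3.0]])

def features(angles) -> np.ndarray:
    a = np.asarray(angles, dtype=float).reshape(-1, 1)
    return np.hstack([np.ones_like(a), a, a ** 2])   # [1, angle, angle^2]

# Least-squares fit standing in for "training" the model.
coeffs, *_ = np.linalg.lstsq(features(train_angles), train_gains, rcond=None)

def predict_eq(rotation_deg: float) -> np.ndarray:
    """Predict per-band EQ gains for an arbitrary (unseen) rotation angle."""
    return (features([rotation_deg]) @ coeffs)[0]

print(np.round(predict_eq(22.5), 2))
```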

In this way, the driver logic of audiovisual telephonic system 12A may drive speaker array 26 to render audio output data 28 in a way that is customized to the present position and orientation of display device 18A. In some instances, the driver logic of audiovisual telephonic system 12A may compensate for effects caused by factors external to speaker array 26, such as effects on the output of one or more of the individual speakers caused by repositioning or rotation of display device 18A with respect to base 34.

According to some examples of this disclosure, DSP logic of audiovisual telephonic system 12A may edit the capture parameters and/or the preprocessing parameters of audio input data 14 prior to transmission over communications channel 16 as part of the active video conferencing session. For example, the DSP logic of audiovisual telephonic system 12A may manipulate audio input data 14 to compensate for the angular offset of microphone array 22 to the horizon (e.g. as shown by the azimuth and/or the altitude of microphone array 22).

In the example shown in FIGS. 1A & 1B, the DSP logic of audiovisual telephonic system 12A may determine the azimuth and/or altitude (collectively, the “horizontal coordinate angles”) of microphone array 22 based on the orientation of display device 18A (e.g., based on the rotation angle of display device 18A, to which microphone array 22 is fixedly attached, with respect to stand 32 and/or base 34). That is, the DSP logic of audiovisual telephonic system 12A may leverage the physical attachment of microphone array 22 to display device 18A in the illustrated designs to extrapolate the horizontal coordinate angles of microphone array 22 from tracked and known orientation information for display device 18A.
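
One way the extrapolation could look, as a sketch: derive the azimuth and altitude of the microphone array's boresight from the tracked pitch and yaw of display device 18A. The axis conventions and rotation order below are assumptions for illustration.

```python
import numpy as np

def mic_boresight_angles(pitch_deg: float, yaw_deg: float):
    """Estimate the azimuth and altitude ("horizontal coordinate angles") of
    the microphone array from the display orientation.

    Assumes the array boresight points straight out of the display face when
    pitch = yaw = 0, with +Y up and +Z toward the room; pitch tilts the display
    up, yaw swivels it. These conventions are illustrative, not the patent's.
    """
    p, y = np.radians(pitch_deg), np.radians(yaw_deg)
    # Boresight after tilting up by pitch and then swiveling by yaw.
    direction = np.array([np.sin(y) * np.cos(p), np.sin(p), np.cos(y) * np.cos(p)])
    azimuth = np.degrees(np.arctan2(direction[0], direction[2]))
    altitude = np.degrees(np.arcsin(np.clip(direction[1], -1.0, 1.0)))
    return azimuth, altitude

# Example: 15 degrees of tilt and -20 degrees of swivel.
print(mic_boresight_angles(pitch_deg=15.0, yaw_deg=-20.0))  # approx. (-20.0, 15.0)
```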

In examples in which the DSP logic of audiovisual telephonic system 12A does not receive directional information directly from microphone array 22 or associated hardware, the DSP logic may be configured to generate a virtual directional microphone with a particular directionality (e.g., facing toward the sound source at the current location of participant 30A). For instance, the DSP logic of audiovisual telephonic system 12A may constrain the search space with respect to audio input data 14 only to those individual microphone(s) that are optimally suited to capture input data from the sound source without ambient sounds (zero-energy ambient sound data) or with minimal ambient sounds (low-energy or negligible-energy ambient sound data).

Based on microphone array 22 being affixed to display device 18A in the device designs illustrated in FIGS. 1A & 1B, the DSP logic of audiovisual telephonic system 12A may estimate or determine the relative position of microphone array 22 or the various individual microphones of microphone array 22 based on the rotation angle and/or translational position of display device 18A. The DSP logic of audiovisual telephonic system 12A may dynamically update the search space constraints with respect to microphone array 22 in response to detecting any changes in the orientation or rotation angles of display device 18A.

That is, the DSP logic of audiovisual telephonic system 12A may modify the direction of the virtual directional microphone based on the detected changes in the orientation of display device 18A in real time (e.g., with no lag time) or substantially in real time (e.g., with little or negligible lag time). By dynamically modifying the direction of the virtual directional microphone to track rotational angle changes with respect to display device 18A, the DSP logic of audiovisual telephonic system 12A may compensate for changes in the rotation angle(s) of display device 18A with respect to conditioning or preprocessing audio input data 14 before transmission over communications channel 16 as part of the active video conferencing session.
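
A minimal delay-and-sum sketch of re-steering a virtual directional microphone when the display (and therefore the array frame) rotates; the microphone coordinates, sample rate, and the simple "room elevation minus display tilt" correction are assumptions, not details from the patent.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def steering_delays(mic_positions: np.ndarray, azimuth_deg: float,
                    elevation_deg: float, sample_rate: int = 16000):
    """Per-microphone delays (samples) steering a delay-and-sum beam toward a
    direction expressed in the array frame.

    mic_positions: (num_mics, 3) coordinates in meters, fixed relative to the
    display (and therefore to the microphone array).
    """
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    look = np.array([np.cos(el) * np.sin(az), np.sin(el), np.cos(el) * np.cos(az)])
    # Plane-wave model: delay is the projection of each mic onto the look direction.
    delays_sec = mic_positions @ look / SPEED_OF_SOUND
    delays_sec -= delays_sec.min()          # make all delays non-negative
    return delays_sec * sample_rate

# Example: keep the beam aimed at a talker at 0 degrees elevation in the room
# while the display tilts back by 10 degrees (array-frame elevation = 0 - 10).
mics = np.array([[-0.10, 0.0, 0.0], [0.10, 0.0, 0.0],
                 [0.0, 0.05, 0.0], [0.0, -0.05, 0.02]])
display_tilt_deg = 10.0
print(np.round(steering_delays(mics, azimuth_deg=0.0,
                               elevation_deg=0.0 - display_tilt_deg), 3))
```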

According to some examples of this disclosure, audiovisual telephonic system 12A may incorporate acoustic echo cancellation logic. The acoustic echo cancellation logic may be implemented as part of other processing circuitry of audiovisual telephonic system 12A, or as part of the DSP logic that implements the manipulation of audio input data 14 described above, or may represent dedicated hardware or firmware unit(s) of audiovisual telephonic system 12A.

The acoustic echo cancellation logic of audiovisual telephonic system 12A directs an adaptive filter algorithm to search for coherence among signals. The acoustic echo cancellation logic of audiovisual telephonic system 12A detects or predicts one or more effects that audio output data 28 may have on audio input data 14. The acoustic echo cancellation logic of audiovisual telephonic system 12A manipulates the capture and/or preprocessing of audio input data 14 prior to transmission over communications channel 16 as part of the active video conferencing session based on these detected or predicted effects.

Again, according to the device designs illustrated in FIGS. 1A and 1B, speaker array 26 is either encased in or otherwise physically (and proximately) coupled to base 34. Based on the distance and relative positioning between display device 18A (which substantially incorporates microphone array 22 or to which microphone array 22 is fixedly attached) and base 34, the echo/feedback effects of audio output data 28 on audio input data 14 may vary. As such, the position and orientation (e.g., rotation angles) of display device 18A affects the direct path and echo paths between the speaker array 26 and microphone array 22. Based on these design properties of audiovisual telephonic system 12A, the echo cancellation logic may initiate, adapt, or readapt an adaptive filter based on the rotation angle of display device 18A. The adaptive filter can be implemented in digital logic and is configured to detect coherence among audio signals and reduce or eliminate redundancies based on any detected coherences.

The acoustic echo cancellation logic of audiovisual telephonic system 12A may also effectuate changes to the adaptive filter dynamically, such as in (relatively short turnaround or substantially immediate) response to detecting changes in the position or orientation of display device 18A. That is, based on the device design of audiovisual telephonic system 12A shown in FIGS. 1A & 1B, the acoustic echo cancellation logic may predict audio signal coherence based on changes in the rotation angle of display device 18A.

Because the rotation angle of display device 18A affects the distance and relative angular information between each individual speaker of speaker array 26 and each individual microphone of microphone array 22, the acoustic echo cancellation logic may map the rotation angle(s) of display device 18A to a set of echo cancellation parameters that compensate for any feedback that audio output data 28 may cause with respect to audio input data 14, in view of the present relative positioning between speaker array 26 and microphone array 22. As long as display device 18A is positioned/oriented statically in a particular detected way, the acoustic echo cancellation logic applies the corresponding set of echo cancellation parameters to configure the adaptive filter.

A given set of echo cancellation parameters may determine how the adaptive filter constrains (if at all) the search space for identifying coherence timings, for coherence thresholds with respect to audio signal similarity, etc. While described herein as implementing acoustic echo cancellation as an example, it will be appreciated that audiovisual telephonic system 12A may compensate for feedback or loopback effects of audio output data 28 with respect to audio input data 14 in other ways, such as by implementing acoustic echo suppression logic. In some examples, audiovisual telephonic system 12A may implement other refinement techniques with respect to audio input data 14, such as active noise cancellation (ANC) to cancel out persistent noises, such as those emanating from ambient devices (air conditioners, etc.) or from other components of audiovisual telephonic system 12A itself (CPU cooling fans, etc.).

Various techniques of this disclosure are described above as being performed in response to detecting positional and/or orientational data (or changes thereto) with respect to display device 18A. In various examples, audiovisual telephonic system 12A may be equipped with various components and/or sensor hardware for determining the orientation (and changes thereto) of display device 18A about one or more of the X, Y, or Z axes, with the aid of the mount hardware at the interface of display device 18A and stand 32. The sensor hardware may include one or more of an accelerometer, a position encoder, a gyroscope, a motion sensor, etc. (and may be supplemented by additional repurposing of microphone array 22 and/or image capture system 20A). One or more components of audiovisual telephonic system 12A are configured to analyze sensor data generated by and received from the sensor hardware to determine the current orientation of display device 18A.

FIG. 2 is a block diagram illustrating an example of a telephonic system that implements one or more of the display position-based audio data manipulation techniques of the disclosure. While a number of different devices may be configured to perform the techniques described herein, FIG. 2 is described with reference to the non-limiting example of audiovisual telephonic system 12A of FIGS. 1A & 1B. In the example shown in FIG. 2, audiovisual telephonic system 12A includes memory 42 and processing circuitry 44 communicatively connected to memory 42. In some examples, memory 42 and processing circuitry 44 may be collocated to form a portion of an integrated circuit, or may be integrated into a single hardware unit, such as a system on a chip (SoC).

Processing circuitry 44 may include, be, or be part of one or more of a multi-core processor, a controller, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), processing circuitry (e.g., fixed function circuitry, programmable circuitry, or any combination of fixed function circuitry and programmable circuitry) or equivalent discrete logic circuitry or integrated logic circuitry. Memory 42 may include any form of memory for storing data and executable software instructions, such as random-access memory (RAM), read only memory (ROM), programmable read only memory (PROM), erasable programmable read only memory (EPROM), electronically erasable programmable read only memory (EEPROM), and flash memory.

Memory 42 and processing circuitry 44 provide a computing platform for executing operating system 36. In turn, operating system 36 provides a multitasking operating environment for executing one or more software components installed on audiovisual telephonic system 12A. Software components supported by the multitasking operating environment provided by operating system 36 represent executable software instructions that may take the form of one or more software applications, software packages, software libraries, hardware drivers, and/or Application Program Interfaces (APIs). For instance, software components installed on audiovisual telephonic system 12A may display configuration menus on display device 18A for eliciting configuration information.

Processing circuitry 44 may connect via input/output (I/O) interface 40 to external systems and devices, such as to display device 18A, image capture system 20A, microphone array 22, speaker array 26, and the like. I/O interface 40 may also incorporate network interface hardware, such as one or more wired and/or wireless network interface controllers (NICs) for communicating via communication channel 16, which may represent a packet-switched network.

Telephonic application 38 implements functionalities that enable participation in a communication session over communication channel 16 using audiovisual telephonic system 12A as end-user hardware. Telephonic application 38 includes functionality to provide and present a communication session between two or more participants 30. For example, telephonic application 38 receives an inbound stream of audio data and video data from audiovisual telephonic system 12B and presents, via I/O interface 40, audio output data 28 and corresponding video output data to participant 30A via speaker array 26 and display device 18A, respectively. Similarly, telephonic application 38 captures audio input data 14 using microphone array 22 and image data using image capture system 20A, and transmits audio/video data processed therefrom to audiovisual telephonic system 12B for presenting to participant 30C. Telephonic application 38 may include, for example, one or more software packages, software libraries, hardware drivers, and/or Application Program Interfaces (APIs) for implementing the video conferencing session.

Telephonic application 38 may process image data received via I/O interface 40 from image capture system 20A and audio input data 14 received from microphone array 22, and may relay the processed video and audio feeds over communications channel 16 to other end-user hardware devices connected to the in-progress conferencing session (which, in the example of FIG. 1A, is a video conferencing session). Additionally, telephonic application 38 may process video and audio feeds received over communications channel 16 as part of the video conferencing session, and may enable other components of audiovisual telephonic system 12A to output the processed video data via display device 18A and the processed audio data via speaker array 26 (as audio output data 28) using I/O interface 40 as an intermediate relay.

Audiovisual telephonic system 12A may include a rendering engine configured to construct visual content to be output by display device 18A, using video data received over communications channel 16 and processed by telephonic application 38. In some examples, the rendering engine constructs content to include multiple video feeds, as in the case of picture-in-picture embodiments of display content 24. In the example of FIGS. 1A & 1B, the rendering engine constructs display content 24 to include the video stream reflecting video data received from video presence device 18B over communications channel 16. In other examples, the rendering engine may overlay data of a second video stream (in the form of a video feedthrough) reflecting video data received locally from image capture system 20A. In some examples, the rendering engine may construct display content 24 to include sections representing three or more video feeds, such as individual video feeds of two or more remote participants.

As shown in FIG. 2, audiovisual telephonic system 12A includes sensor hardware 58. Sensor hardware 58 may incorporate one or more types of sensors, such as one or more of an accelerometer, a position encoder, a gyroscope, a motion sensor, and the like. Various components of audiovisual telephonic system 12A may use data generated by sensor hardware 58 to determine the current orientation (and changes from prior positions/orientations) of display device 18A. As microphone array 22 is fixedly attached to display device 18A, components of audiovisual telephonic system 12A may use data from sensor hardware 58 to determine a current orientation of microphone array 22. Sensor hardware 58 may perform other sensing-related functionalities as well, in addition to monitoring the position and orientation of display device 18A.

In the example shown in FIG. 2, audiovisual telephonic system 12A includes driver logic 46, DSP logic 48, and acoustic echo cancellation logic 50. Any of driver logic 46, DSP logic 48, or acoustic echo cancellation logic 50 may be implemented in hardware or as hardware implementing software. One or more of driver logic 46, DSP logic 48, or acoustic echo cancellation logic 50 may be implemented in integrated circuitry, such as by being collocated with processing circuitry 44 and memory 42, or in another integrated circuit by being collocated with different memory and processing hardware.

Driver logic 46 may modify driver signals provided via I/O interface 40 to speaker array 26 based on the orientation of display device 18A, as determined using data obtained from sensor hardware 58. For example, driver logic 46 may use a mapping of the rotation angle of display device 18A to a particular parameter set available from equalization parameters 52. Equalization parameters 52 may include one or more of amplitude (e.g., expressed as a function of frequency), a high pass filter, a low pass filter, notch filters, a Q factor of one or more filters, a filter amplitude, a phase, etc.

In turn, driver logic 46 may drive speaker array 26 according to the parameter set selected from equalization parameters 52 based on the mapping to the present relative orientation/position of display device 18A with respect to other stationary components of audiovisual telephonic system 12A, such as base 34. In this way, driver logic 46 may use equalization parameters 52 to drive speaker array 26 such that audio output data 28 is rendered in a customized way with respect to the present position and orientation of display device 18A.
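
One way to realize such a mapping is a lookup keyed by quantized rotation angle, as in the Python sketch below. The angle grid, field names, and parameter values are assumptions chosen for illustration; the disclosure does not specify concrete equalization values.

    # Equalization parameter sets keyed by quantized rotation angle. The angle
    # grid, field names, and values below are placeholders for illustration.
    EQUALIZATION_PARAMETERS = {
        0:   {"gain_db": 0.0, "low_shelf_hz": 120, "high_shelf_hz": 8000, "q": 0.7},
        45:  {"gain_db": 1.5, "low_shelf_hz": 150, "high_shelf_hz": 7000, "q": 0.8},
        90:  {"gain_db": 3.0, "low_shelf_hz": 180, "high_shelf_hz": 6000, "q": 0.9},
        135: {"gain_db": 1.5, "low_shelf_hz": 150, "high_shelf_hz": 7000, "q": 0.8},
    }

    def select_equalization(rotation_deg):
        """Return the parameter set for the grid angle nearest the detected angle."""
        nearest = min(EQUALIZATION_PARAMETERS, key=lambda a: abs(a - rotation_deg))
        return EQUALIZATION_PARAMETERS[nearest]

    # Example: a display tilted to 50 degrees maps to the 45-degree parameter set.
    params = select_equalization(50)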

DSP logic 48 may select parameter sets from audio capture parameters 54 to customize the capture parameters and/or the preprocessing of audio input data 14 prior to transmission over communications channel 16 as part of the active video conferencing session. While referred to herein as “capture” parameters, it will be appreciated that audio capture parameters 54 may also include data that DSP logic 48 can use to configure the preprocessing of audio input data 14 prior to transmission over communications channel 16 with respect to the active conferencing session (e.g., a video conferencing session, as shown in the example of FIG. 1A). For example, DSP logic 48 may select, from audio capture parameters 54, a parameter set that, when applied, compensates for the horizontal coordinate angles of microphone array 22, as indicated by the orientation of display device 18A as detected using data received from sensor hardware 58.

In various examples, DSP logic 48 may be configured to generate a virtual directional microphone with a particular directionality, based on the parameter set selected from audio capture parameters 54. DSP logic 48 may extrapolate the relative positions of the various individual microphones of microphone array 22 based on the rotation angle of display device 18A, as detected using data obtained from sensor hardware 58. In various examples, DSP logic 48 may dynamically update the parameter set selection from audio capture parameters 54 in response to sensor hardware 58 indicating any changes in the position/orientation or rotation angles of display device 18A, to which microphone array 22 is fixedly attached in some examples.
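
The extrapolation of microphone positions can be pictured as rotating each microphone's fixed offset in the display frame about the mount point by the detected rotation angle, as in the following Python sketch. The offsets and hinge location are placeholder values.

    import numpy as np

    # Microphone offsets in the display frame and the mount (hinge) location in
    # the base frame are placeholder values chosen for illustration.
    MIC_OFFSETS = np.array([[-0.12, 0.18], [0.00, 0.18], [0.12, 0.18]])  # meters
    HINGE = np.array([0.00, 0.05])                                       # meters

    def microphone_positions(rotation_deg):
        """Base-frame position of each microphone for a given display rotation."""
        theta = np.radians(rotation_deg)
        rot = np.array([[np.cos(theta), -np.sin(theta)],
                        [np.sin(theta),  np.cos(theta)]])
        return HINGE + MIC_OFFSETS @ rot.T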

Acoustic echo cancellation logic 50 directs an adaptive filter algorithm to search for coherence among signals received via I/O interface 40 from microphone array 22. Acoustic echo cancellation logic 50 detects or predicts one or more effects that audio output data 28 may have on audio input data 14. Based on these detected or predicted effects, acoustic echo cancellation logic 50 selects parameter sets from echo cancellation parameters 56, and configures the adaptive filter using the parameter set selected for the present echo detection/prediction information. Acoustic echo cancellation logic 50 may also readapt the adaptive filter dynamically, such as in response to sensor hardware 58 providing data indicating changes in the position or orientation of display device 18A.

Acoustic echo cancellation logic 50 may map some or all possible rotation angles of display device 18A to respective parameter sets included in echo cancellation parameters 56. Each parameter set may compensate for feedback or interference that audio output data 28 causes with respect to audio input data 14 at a given rotation angle of display device 18A as detected using sensor hardware 58. Acoustic echo cancellation logic 50 may apply a given set of echo cancellation parameters to configure the adaptive filter to constrain the search space for identifying coherence timings, for coherence thresholds with respect to audio signal similarity, etc.
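
The disclosure does not name a specific adaptation algorithm; a normalized least-mean-squares (NLMS) filter is one common choice and is used in the Python sketch below, which assumes the angle-mapped parameter set supplies the filter length and step size. The parameter values are placeholders.

    import numpy as np

    # Filter length and step size keyed by quantized rotation angle; the values
    # are placeholders, not taken from the disclosure.
    ECHO_PARAMS = {0: {"taps": 256, "mu": 0.4}, 90: {"taps": 512, "mu": 0.2}}

    def cancel_echo(mic, far_end, rotation_deg, eps=1e-8):
        """Subtract an NLMS estimate of the far-end echo from the mic signal."""
        p = ECHO_PARAMS[min(ECHO_PARAMS, key=lambda a: abs(a - rotation_deg))]
        taps, mu = p["taps"], p["mu"]
        w = np.zeros(taps)                    # adaptive filter weights
        out = np.zeros(len(mic))
        for n in range(taps, len(mic)):
            x = far_end[n - taps:n][::-1]     # most recent reference samples
            echo_hat = np.dot(w, x)
            e = mic[n] - echo_hat             # echo-cancelled (error) sample
            w += (mu / (np.dot(x, x) + eps)) * e * x
            out[n] = e
        return out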

In some examples, one or more of equalization parameters 52, audio capture parameters 54, or echo cancellation parameters 56 may be stored locally at audiovisual telephonic system 12A. In these examples, audiovisual telephonic system 12A may include one or more storage devices configured to store information within audiovisual telephonic system 12A during operation. The storage device(s) of audiovisual telephonic system 12A, in some examples, are described as a computer-readable storage medium and/or as one or more computer-readable storage devices, such as a non-transitory computer-readable storage medium, and various computer-readable storage devices.

The storage device(s) of audiovisual telephonic system 12A may be configured to store larger amounts of information than volatile memory, and may further be configured for long-term storage of information. In some examples, the storage device(s) of audiovisual telephonic system 12A include non-volatile storage elements, such as solid state drives (SSDs), magnetic hard discs, optical discs, floppy discs, flash memories, or forms of electrically programmable memories (EPROM) or electrically erasable and programmable (EEPROM) memories.

Audiovisual telephonic system 12A may also include capabilities to receive, access, and write to various types of removable non-volatile storage devices, such as USB flash drives, SD cards, and the like.

In some examples, one or more of equalization parameters 52, audio capture parameters 54, or echo cancellation parameters 56 may be stored at an external (e.g., remote) device, such as a real or virtual server to which audiovisual telephonic system 12A is communicatively coupled via network interface card hardware of I/O interface 40. In these examples, one or more of driver logic 46, DSP logic 48, or acoustic echo cancellation logic 50 may access and download parameter information on an as-needed basis over a packet-switched network via network interface hardware of I/O interface 40. The real or virtual server may be hosted at a data center, server farm, server cluster, or other high storage capacity facility.

FIG. 3 is a flowchart illustrating an example of a display position-based audio rendering process that audiovisual telephonic system 12A may perform, in accordance with aspects of this disclosure. Driver logic 46 and/or other components of audiovisual telephonic system 12A may perform process 60 to optimize signal processing so that audio data received over communications channel 16 as part of the active conferencing session (e.g., a video conferencing session with audio accompaniment, as in the example of FIG. 1A) is rendered to participants 30A and/or 30B with enhanced rendering properties.

For instance, process 60 may enable driver logic 46 to modify audio output data 28, such as by amplifying the output of certain speakers of speaker array 26, damping the output of other speakers of speaker array 26, directing audio output 28 in an optimal direction as determined from the orientation of display device 18A, or in one or more other ways. In some examples, driver logic 46 may compensate for occlusion of audio output data 28, such as by compensating for occlusion occurring due to a reflective, dispersive, and/or absorptive back surface of the display device 18A being positioned between speaker array 26 and the listener(s).

Process 60 may begin when I/O interface 40 receives audio data of a video conferencing session over communications channel 16 (62). While described with respect to audio data of a conferencing session as an example, it will be appreciated that process 60 may be applied to any audio data to be rendered by speaker array 26 as part of audio output data 28, such as music or podcast data played using audiovisual telephonic system 12A, while compensating for the presence and position of display device 18A. Audio data received as part of the active video conferencing session is also referred to herein as a “downlink signal” from the perspective of audiovisual telephonic system 12A. Using data received from sensor hardware 58, driver logic 46 may detect the orientation of display device 18A (64). For example, an accelerometer of sensor hardware 58 may provide data indicating the relative orientation of display device 18A in comparison to stationary components of audiovisual telephonic system 12A (e.g., base 34), whether by stasis (remaining in the last detected position and orientation) or by movement (changing orientation from the last detected position-orientation combination).
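
For a display at rest, the tilt can be estimated from the direction of gravity in the accelerometer reading, as in the Python sketch below. The axis convention (y along the display's vertical edge, z out of the screen) is an assumption; sensor frames vary by device.

    import math

    def tilt_angle_deg(ax, ay, az):
        """Angle between the display's vertical edge and gravity, in degrees.
        Assumes y runs along the display's vertical edge and z points out of
        the screen; real device sensor frames may differ."""
        return math.degrees(math.atan2(az, ay))

    # Example: gravity mostly along +y with a small +z component -> roughly 10 degrees.
    tilt = tilt_angle_deg(0.0, 9.6, 1.7)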

Driver logic 46 may select an equalization parameter set from equalization parameters 52 based on the orientation of display device 18A as determined from the data received from sensor hardware 58 (66). In various examples, driver logic 46 may use one or more of a lookup table, a finite element solution, a specific function, an AI model trained with equalization parameter set-to-position/orientation mappings, or a ML model trained with equalization parameter set-to-position/orientation mappings to select, from equalization parameters 52, the equalization parameter set that corresponds to the most recently detected position-orientation combination of display device 18A.

Driver logic 46 may drive speaker array 26 based on the selected equalization parameter set (68). For example, driver logic 46 may generate driver signals that modify the downlink signal to optimize spectral, level, and directional response from one or more speakers of speaker array 26. Speaker array 26 may render audio output data 28 based on the driver signals received from driver logic 46 and generated based on the selected equalization parameter set (72). The driver signals may compensate for various effects (e.g., shadowing) or may energize different subsets of the speakers of speaker array 26, depending on the position and orientation of display device 18A.

FIG. 4 is a flowchart illustrating an example of a display position-based audio capture process that audiovisual telephonic system 12A may perform, in accordance with aspects of this disclosure. DSP logic 48 and/or other components of audiovisual telephonic system 12A may perform process 70 to optimize signal processing so that audio data relayed using I/O interface 40 over communications channel 16 as part of the active video conferencing session is rendered to participant 30C with relevant audio signals amplified (having greater acoustic energy) and ambient sounds damped (having little to no acoustic energy).

For instance, process 70 may enable DSP logic 48 to modify the microphone response of one or more individual microphones of microphone array 22 to account for changes in the microphone path(s) from the active talker (e.g., participant 30A) to the relevant individual microphones of microphone array 22. For example, DSP logic 48 may compensate for increased shadowing of individual microphones of microphone array 22 by applying a corresponding equalization filter, or may select audio input signals from particular subsets of the individual microphones of microphone array 22 based on the physical configuration of microphone array 22 and the position-orientation combination of display device 18A as detected by sensor hardware 58.

Process 70 may begin when, using data received from sensor hardware 58, DSP logic 48 detects the orientation of microphone array 22 (74). For example, an accelerometer of sensor hardware 58 may provide data indicating the orientation of display device 18A, whether by stasis (remaining in the last detected position and orientation) or by movement (changing orientation from the last detected position-orientation combination). Because microphone array 22 is fixedly attached to display device 18A in the implementations shown in FIGS. 1A and 1B, DSP logic 48 may determine the orientation of microphone array 22 based on the orientation of display device 18A. For example, DSP logic 48 may determine the orientation of display device 18A in a relative sense with respect to base 34, which does not move with display device 18A if display device 18A is rotated and moved translationally using the mounting hardware that couples display device 18A to stand 32.

DSP logic 48 may set one or more audio capture parameters based on the orientation of microphone array 22 as determined from the data received from sensor hardware 58 (76). In various examples, DSP logic 48 may use one or more of a lookup table, a finite element solution, a specific function, an AI model trained with audio capture parameter-to-position/orientation mappings, or a ML model trained with audio capture parameter-to-position/orientation mappings to select, from audio capture parameters 54, the particular audio capture parameter(s) corresponding to the most recently detected position-orientation combination of display device 18A.

DSP logic 48 may capture and/or preprocess the raw input signals detected by microphone array 22 to form audio input data 14 according to the audio capture parameter(s) set at step 76 (78). In some examples, DSP logic 48 may generate a virtual directional microphone using digital logic, such as by constraining a signal search space to signals received via only particular individual microphones of microphone array 22. In some examples, DSP logic 48 may preprocess the raw signals received from microphone array 22 based on the audio capture parameter(s), such as by amplifying signals from certain individual microphones (e.g., via electrical gain or boost), and/or by damping signals from other individual microphones (reducing or entirely eliminating the acoustic energies of these signals) prior to transmission over communications channel 16.
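
A minimal form of this per-microphone amplification and damping is a gain-weighted mix of the raw channels, as sketched below in Python; the gain values are placeholders.

    import numpy as np

    def preprocess_uplink(channels, gains):
        """channels: (num_mics, num_samples) raw signals; gains: linear gain per mic."""
        gains = np.asarray(gains, dtype=float)[:, None]
        mixed = (channels * gains).sum(axis=0)
        active = max(np.count_nonzero(gains), 1)   # avoid dividing by zero
        return mixed / active

    # Example: mute the left microphone, keep the center, boost the right.
    # uplink = preprocess_uplink(raw_channels, gains=[0.0, 1.0, 1.6])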

In turn, DSP logic 48 may transmit audio input data 14 using network card hardware of I/O interface 40 over communications channel 16 as part of the active video conferencing session (82). The preprocessed version of audio input data 14 transmitted as part of the video conferencing session is also referred to herein as an “uplink signal” from the perspective of audiovisual telephonic system 12A. Process 70 illustrates one of multiple aspects of this disclosure by which audiovisual telephonic system 12A integrates sensor information with signal processing modules that modify (and potentially optimize) audio data on the uplink channel, and with other processing circuitry used to obtain information from the sensor signals.

FIG. 5 is a flowchart illustrating an example of a display position-based echo cancellation process that audiovisual telephonic system 12A may perform, in accordance with aspects of this disclosure. Acoustic echo cancellation logic 50 and/or other components of audiovisual telephonic system 12A may perform process 80 to compensate for echo path interference with respect to audio input data 14 so that audio data relayed using I/O interface 40 over communications channel 16 as part of the active video conferencing session is rendered to participant 30C with reduced feedback or no feedback from audio output data 28. Echo path changes (e.g., changes stemming from changes of the relative physical positioning between microphone array 22 and speaker array 26 and/or the surrounding environment of audiovisual telephonic system 12A) may degrade the precision of audio input data 14 in a variety of ways. Acoustic echo cancellation logic 50 may optimize the convergence of audio signals received from microphone array 22 to compensate for or emphasize various conditions, such as double talk, single talk, volume levels, ambient noise conditions, etc.

Process 80 may begin when, using data received from sensor hardware 58, acoustic echo cancellation logic 50 may detect the orientation of display device 18A (84). For example, an accelerometer of sensor hardware 58 may provide data indicating the orientation of display device 18A, whether by stasis (remaining in the last detected position and orientation) or by movement (changing orientation from the last detected position-orientation combination). Acoustic echo cancellation logic 50 may determine the relative position between microphone array 22 and speaker array 26 based on the position-orientation combination of display device 18A as detected by sensor hardware 58 (86).

Acoustic echo cancellation logic 50 may configure an adaptive filter based on the relative position determined between microphone array 22 and speaker array 26 (88). Acoustic echo cancellation logic 50 may use the adaptive filter configured based on the relative position between microphone array 22 and speaker array 26 to perform acoustic echo cancellation on audio input data 14 (92). In turn, network interface hardware of I/O interface 40 may transmit the echo-cancelled version of audio input data 14 over communications channel 16 as part of the active video conferencing session (94).
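
One possible way, offered here only as an assumption, for the relative speaker-microphone position to feed the adaptive filter is to convert the direct-path distance into a sample delay and pre-align the far-end reference, as in the Python sketch below.

    import numpy as np

    SPEED_OF_SOUND = 343.0   # m/s
    SAMPLE_RATE = 16000      # Hz

    def direct_path_delay_samples(mic_pos, speaker_pos):
        """Propagation delay (in samples) over the speaker-to-microphone distance."""
        dist = float(np.linalg.norm(np.asarray(mic_pos) - np.asarray(speaker_pos)))
        return int(round(dist / SPEED_OF_SOUND * SAMPLE_RATE))

    def align_reference(far_end, delay_samples):
        """Shift the far-end reference by the direct-path delay before adaptation."""
        if delay_samples <= 0:
            return far_end
        return np.concatenate([np.zeros(delay_samples), far_end[:-delay_samples]])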

In performing any of processes 60, 70, or 80, audiovisual telephonic system 12A invokes sensor hardware 58 to detect the physical configuration of aspects of the overall device, such as the tilt of display device 18A, of side panels, or of other parts of audiovisual telephonic system 12A or its peripherals. In turn, sensor hardware 58 directly or indirectly provides this information to one or more of the signal processing logic modules shown in FIG. 2. Sensor hardware 58 and the various signal processing logic modules enable device configuration awareness. In turn, the signal processing logic modules use the sensor information and device configuration awareness to generate signal processing that optimizes audio data received by or originating from audiovisual telephonic system 12A.

In various examples, the signal processing logic modules modify (e.g., amplify, filter, direct towards a particular “sweet spot”, etc.) uplink or downlink audio data to improve data precision and (in cases of signal pruning) reduce bandwidth consumption. The various techniques discussed with respect to FIGS. 3-5 integrate sensor information and processing modules, including signal processing modifying audio data on the uplink and downlink channels, and other processing logic used to obtain information from the sensor signals.

The techniques described in this disclosure may be implemented, at least in part, in hardware, software, firmware or any combination thereof. For example, various aspects of the described techniques may be implemented within one or more processors, including one or more microprocessors, DSPs, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), processing circuitry (e.g., fixed function circuitry, programmable circuitry, or any combination of fixed function circuitry and programmable circuitry) or equivalent discrete logic circuitry or integrated logic circuitry, as well as any combinations of such components. The term “processor” or “processing circuitry” may generally refer to any of the foregoing logic circuitry, alone or in combination with other logic circuitry, or any other equivalent circuitry. A control unit comprising hardware may also perform one or more of the techniques of this disclosure.

Such hardware, software, and firmware may be implemented within the same device or within separate devices to support the various operations and functions described in this disclosure. In addition, any of the described units, modules or components may be implemented together or separately as discrete but interoperable logic devices. Depiction of different features as modules or units is intended to highlight different functional aspects and does not necessarily imply that such modules or units must be realized by separate hardware or software components. Rather, functionality associated with one or more modules or units may be performed by separate hardware or software components or integrated within common or separate hardware or software components.

As described by way of various examples herein, the techniques of the disclosure may include or be implemented in conjunction with a video communications system. The techniques described in this disclosure may also be embodied or encoded in a computer-readable medium, such as a computer-readable storage medium, containing instructions. Instructions embedded or encoded in a computer-readable storage medium may cause a programmable processor, or other processor, to perform the method, e.g., when the instructions are executed. Computer readable storage media may include random access memory (RAM), read only memory (ROM), programmable read only memory (PROM), erasable programmable read only memory (EPROM), electronically erasable programmable read only memory (EEPROM), flash memory, a hard disk, a CD-ROM, a floppy disk, a cassette, magnetic media, optical media, or other computer readable media.

As described by way of various examples herein, the techniques of the disclosure may include or be implemented in conjunction with an artificial reality system. As described, artificial reality is a form of reality that has been adjusted in some manner before presentation to a user, which may include, e.g., a virtual reality (VR), an augmented reality (AR), a mixed reality (MR), a hybrid reality, or some combination and/or derivatives thereof. Artificial reality content may include completely generated content or generated content combined with captured content (e.g., real-world photographs). The artificial reality content may include video, audio, haptic feedback, or some combination thereof, and any of which may be presented in a single channel or in multiple channels (such as stereo video that produces a three-dimensional effect to the viewer).

Additionally, in some examples, artificial reality may be associated with applications, products, accessories, services, or some combination thereof, that are, e.g., used to create content in an artificial reality and/or used in (e.g., perform activities in) an artificial reality. The artificial reality system that provides the artificial reality content may be implemented on various platforms, including a head-mounted display (HMD) connected to a host computer system, a standalone HMD, a video conferencing system, a mobile device or computing system, or any other hardware platform capable of providing artificial reality content to one or more viewers.

The article “Meta Patent | Systems, devices, and methods of manipulating audio data based on microphone orientation” was first published on Nweon Patent.

Niantic Patent | Low latency datagram-responsive computer network protocol https://patent.nweon.com/26751 Thu, 26 Jan 2023 15:19:46 +0000 https://patent.nweon.com/?p=26751 ...


Patent: Low latency datagram-responsive computer network protocol

Patent PDF: Join 映维网 (Nweon) membership to access

Publication Number: 20230022262

Publication Date: 2023-01-26

Assignee: Niantic

Abstract

Systems and methods for providing a shared augmented reality environment are provided. In particular, the latency of communication is reduced by using a peer-to-peer protocol to determine where to send datagrams. Datagrams describe actions that occur within the shared augmented reality environment, and the processing of datagrams is split between an intermediary node of a communications network (e.g., a cell tower) and a server. As a result, the intermediary node may provide updates to a local state of a client device when a datagram is labelled peer-to-peer, and otherwise provides updates to the master state on the server. This may reduce the latency of communication and allow users of the location-based parallel reality game to see actions occur more quickly in the shared augmented reality environment.

Claims

What is claimed is:

1.A method, comprising: receiving, at a network node, a datagram from a sending client device that is connected to a shared augmented reality environment, the datagram including a payload and a flag; determining whether the datagram is peer-to-peer based on the flag; responsive to determining that the datagram is peer-to-peer, sending the datagram to one or more other client devices connected to the shared augmented reality environment to update a local state of the shared augmented reality environment at the one or more other client devices in view of the payload; and responsive to determining that the datagram is not peer-to-peer, sending the datagram to a server to update a master state of the shared augmented reality environment at the server in view of the payload.

2.The method of claim 1, wherein the flag is included in a header portion of the datagram.

3.The method of claim 1, wherein determining whether the datagram is peer-to-peer comprises: determining, based on the flag, that the datagram should be sent peer-to-peer; identifying a specific other client device based on a header of the datagram; determining whether the specific other client device is currently connected to the network node; and responsive to the specific client device currently being connected to the network node, determining that the datagram is peer-to-peer.

4.The method of claim 3, wherein the datagram is sent to the server responsive to determining the specific other client device is not currently connected to the network node.

5.The method of claim 1, wherein the local state of the shared augmented reality environment is updated in view of the payload with a latency between 1 millisecond and 20 milliseconds.

6.The method of claim 1, wherein sending the datagram to one or more other client devices comprises: identifying client devices currently connected to the network node and the shared augmented reality environment; and sending the datagram to the identified client devices to update each client device’s local state of the shared augmented reality environment in view of the payload.

7.The method of claim 1, wherein, responsive to determining that the datagram is peer-to-peer, the datagram is also sent to the server.

8.The method of claim 1, wherein determining whether the datagram is peer-to-peer comprises: comparing an indicator in a header portion of the datagram to a list of indicators maintained by the network node; and determining the datagram is peer-to-peer responsive to the indicator being included in the list of indicators.

9.The method of claim 8, wherein the indicator identifies at least one of: an AR session, a user, a device, or a game account.

10.A non-transitory computer-readable storage medium comprising instructions executable by a processor, the instructions comprising: instructions for receiving, at a network node, a datagram from a sending client device that is connected to a shared augmented reality environment, the datagram including a payload and a flag; instructions for determining whether the datagram is peer-to-peer based on the flag; responsive to determining that the datagram is peer-to-peer, instructions for sending the datagram to one or more other client devices connected to the shared augmented reality environment to update a local state of the shared augmented reality environment at the one or more other client devices in view of the payload; and responsive to determining that the datagram is not peer-to-peer, instructions for sending the datagram to a server to update a master state of the shared augmented reality environment at the server in view of the payload.

11.The non-transitory computer-readable storage medium of claim 10, wherein the flag is included in a header portion of the datagram.

12.The non-transitory computer-readable storage medium of claim 10, wherein the instructions for determining whether the datagram is peer-to-peer comprise: instructions for determining, based on the flag, that the datagram should be sent peer-to-peer; instructions for identifying a specific other client device based on a header of the datagram; instructions for determining whether the specific other client device is currently connected to the network node; and responsive to the specific client device currently being connected to the network node, instructions for determining that the datagram is peer-to-peer.

13.The non-transitory computer-readable storage medium of claim 12, wherein the datagram is sent to the server responsive to determining the specific other client device is not currently connected to the network node.

14.The non-transitory computer-readable storage medium of claim 10, wherein the local state of the shared augmented reality environment is updated in view of the payload with a latency between 1 millisecond and 20 milliseconds.

15.The non-transitory computer-readable storage medium of claim 10, wherein the instructions for sending the datagram to one or more other client devices comprise: instructions for identifying client devices currently connected to the network node and the shared augmented reality environment; and instructions for sending the datagram to the identified client devices to update each client device’s local state of the shared augmented reality environment in view of the payload.

16.The non-transitory computer-readable storage medium of claim 10, wherein, responsive to determining that the datagram is peer-to-peer, the datagram is also sent to the server.

17.The non-transitory computer-readable storage medium of claim 10, wherein the instructions for determining whether the datagram is peer-to-peer comprise: instructions for comparing an indicator in a header portion of the datagram to a list of indicators maintained by the network node; and instructions for determining the datagram is peer-to-peer responsive to the indicator being included in the list of indicators.

18.The non-transitory computer-readable storage medium of claim 10, wherein the indicator identifies at least one of: an AR session, a user, a device, or a game account.

19.A network node comprising: a local data store storing a list of client devices connected to the network node; and a routing module configured to perform operations comprising: receiving a datagram addressed to a client device, wherein the datagram includes a payload and a flag; determining whether the client device is in the list; determining, responsive to the client device being in the list, whether the datagram is peer-to-peer based on the flag; responsive to determining that the datagram is peer-to-peer, sending the datagram to the client device to update a local state of a shared augmented reality environment at the client device in view of the payload; and responsive to determining that the datagram is not peer-to-peer, sending the datagram to a server to update a master state of the shared augmented reality environment at the server in view of the payload.

20.The network node of claim 19, wherein the datagram further includes an indicator of whether the datagram is peer-to-peer in a header portion of the datagram, and wherein determining that the datagram is peer-to-peer is further responsive to the indicator.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. Nonprovisional application Ser. No. 17/325,137, filed May 19, 2021, which claims the benefit of U.S. Nonprovisional application Ser. No. 16/450,904, filed Jun. 24, 2019, which claims the benefit of Provisional Application No. 62/690,578, filed Jun. 27, 2018, which are incorporated by reference.

FIELD

The present disclosure relates to computer network protocols, and in particular to protocols for providing low-latency wireless communication between devices within physical proximity of each other.

BACKGROUND

Computer networks are interconnected sets of computing devices that exchange data, such as the Internet. Communication protocols such as the User Datagram Protocol (UDP) define systems of rules for exchanging data using computer networks. UDP adheres to a connectionless communication model without guaranteed delivery, ordering, or non-duplicity of datagrams. A datagram is a basic unit for communication and includes a header and a payload. The header is metadata specifying aspects of the datagram, such as a source port, a destination port, a length of the datagram, and a checksum of the datagram. The payload is the data communicated by the datagram. Computing devices communicating using UDP transmit datagrams to one another via the computer network.
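
The UDP header layout described above (four 16-bit fields) can be packed and parsed compactly; the Python sketch below leaves the checksum at zero for brevity.

    import struct

    # Four 16-bit big-endian fields: source port, destination port, length, checksum.
    UDP_HEADER = struct.Struct("!HHHH")   # 8 bytes

    def build_datagram(src_port, dst_port, payload, checksum=0):
        length = UDP_HEADER.size + len(payload)
        return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload

    def parse_datagram(data):
        src, dst, length, checksum = UDP_HEADER.unpack_from(data)
        return src, dst, length, checksum, data[UDP_HEADER.size:length]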

Connectionless communication protocols such as UDP generally have lower overhead and latency than connection-oriented communication protocols like the Transmission Control Protocol (TCP), which establish connections between computing devices before transmitting data. However, existing connectionless communication protocols are inadequate for data transfers that require less latency than is accommodated by the existing art. For example, an augmented reality (AR) environment streaming at 60 frames per second (FPS) may require latency an order of magnitude lower than provided by current techniques. In such an AR environment, the frames are spaced at approximately sixteen millisecond intervals, while current network protocols typically provide latency of approximately one hundred milliseconds (or more).

As such, with existing techniques, a user does not interact with the current state of the AR environment, only a recent state. A user using a client to interact with the AR environment over a computer network may interact with an old state of AR positional data. For example, in an AR game, a player may see an AR object at an old location (e.g., where the object was 100 milliseconds previously), while the AR positional data in fact has a new location for the object (e.g. the AR object has been moved by another player). This latency in communication between the client and a server hosting or coordinating the AR game may lead to a frustrating user experience. This problem may be particularly acute where more than one user is participating in the AR game because the latency may cause a noticeable delay between the actions of one player showing up in other players’ views of the AR environment.

SUMMARY

Augmented reality (AR) systems supplement views of the real world with computer-generated content. Incorporating AR into a parallel-reality game may improve the integration between the real and virtual worlds. AR may also increase interactivity between players by providing opportunities for them to participate in shared gaming experiences in which they interact. For example, in a tank battle game, players might navigate virtual tanks around a real-world location, attempting to destroy each other’s tanks. The movement of the tanks may be limited by real-world geography (e.g., the tanks move more slowly through rivers, move more quickly on roads, cannot move through walls, etc.).

Existing AR session techniques involve a server maintaining a master state and periodically synchronizing the local state of the environment at client devices to the master state via a network (e.g., the internet). However, synchronizing a device’s local state may take a significant amount of time (e.g., ~100s of milliseconds), which is detrimental to the gaming experience. The player is, in effect, interacting with a past game state rather than the current game state. This problem may be particularly acute where more than one user is participating in the AR game because the latency causes a noticeable delay between the actions of one player showing up in other players’ views. For example, if one player moves an AR object in the world, other players may not see it has moved until one hundred milliseconds (or more) later, which is a human-perceptible delay. As such, another player may try to interact with the object in its previous location and be frustrated when the game corrects for the latency (e.g., by declining to implement the action requested by the player or initially implementing the action and then revoking it when the player’s client device next synchronizes with the server).

This and other problems may be addressed by processing datagrams at an intermediary node (e.g., a cell tower). Latency may be reduced using a peer-to-peer (P2P) protocol that exchanges game updates between clients connected to the same edge node without routing the updates via the game server. For example, using these approaches, latency may be reduced to ~10 milliseconds or less. Furthermore, this may increase bandwidth availability, enabling a greater number of players to share a common AR experience.

In one embodiment, a method for updating a game state of a shared AR environment includes receiving, at an intermediary node, a datagram addressed to a target client device. The datagram includes data regarding an action that occurred in the shared augmented reality environment and an indicator of whether the datagram is P2P. The intermediary node, such as a cell tower, determines whether the datagram is peer-to-peer based on the indicator. If the datagram is P2P, the intermediary node sends the datagram to the target client device to update a local state of the shared AR environment at the target client device in view of the action that occurred. Otherwise, if the datagram is not P2P, the intermediary node sends the datagram to a server to update a master state of the shared AR environment at the server in view of the action that occurred. In further embodiments, the intermediary node also sends some or all of the datagrams it sends to the target client device to the server to update the master shared AR state.
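
The routing decision in this embodiment reduces to a short branch at the intermediary node, as in the Python sketch below. The Datagram fields and the two forwarding callbacks are assumed names for illustration, not an interface defined by the disclosure.

    from dataclasses import dataclass

    @dataclass
    class Datagram:
        target_client_id: str   # identifies the target client device
        p2p: bool               # the P2P indicator carried in the header
        payload: bytes          # data about the action in the shared AR environment

    def route(datagram, connected_clients, forward_to_client, forward_to_server):
        """Routing decision at the intermediary node (e.g., a cell tower)."""
        if datagram.p2p and datagram.target_client_id in connected_clients:
            # Update the target client's local AR state without the server hop.
            forward_to_client(datagram.target_client_id, datagram)
            # Some embodiments also copy the datagram to the server so the
            # master state stays current; that step is omitted here.
        else:
            # Fall back to the server, which updates the master AR state.
            forward_to_server(datagram)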

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 illustrates a computer network in which the disclosed protocol may be used, according to one embodiment.

FIG. 2 illustrates a datagram configured according to the disclosed protocol, according to one embodiment.

FIG. 3 is a block diagram illustrating the cell tower of FIG. 1, according to one embodiment.

FIG. 4 illustrates a process for using a low-latency datagram-responsive computer network protocol, according to one embodiment.

FIG. 5 is a high-level block diagram illustrating an example computer suitable for use within the computer network shown in FIG. 1, according to an embodiment.

DETAILED DESCRIPTION

The figures and the following description describe certain embodiments by way of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods may be employed without departing from the principles described. Reference will now be made to several embodiments, examples of which are illustrated in the accompanying figures. Wherever practicable similar or like reference numbers are used in the figures to indicate similar or like functionality. Where elements share a common numeral followed by a different letter, the elements are similar or identical. The numeral alone refers to any one or any combination of such elements.

Example Computing Environment

As disclosed herein, a datagram-responsive computer network protocol (“the disclosed protocol”) may lower computer network latency (e.g., in one embodiment, the latency is ~10 milliseconds). FIG. 1 illustrates a computer network that communicates using the disclosed protocol, according to one embodiment. The figure illustrates a simplified example using block figures for purposes of clarity. The computer network includes two clients 110, a server 120, and a cell tower 130. In other embodiments, the computer network may include fewer, additional, or other components, such as additional clients 110, servers 120, cell towers 130, or other network nodes. For example, the computer network may be a local area network (LAN) using one or more WiFi routers as network nodes rather than a cell tower 130.

A client 110 is a computing device such as a personal computer, laptop, tablet computer, smartphone, or so on. Clients 110 can communicate using the disclosed protocol. The server 120 is similarly a computing device capable of communication via the disclosed protocol. Clients 110 may communicate with the server 120 using the disclosed protocol, or in some embodiments may use a different protocol. For example, clients 110 may communicate with one another using the disclosed protocol but with the server 120 using TCP. In an embodiment, each client 110 includes a local AR module and the server 120 includes a master AR module. Each local AR module communicates AR data to local AR modules upon other clients 110 and/or the master AR module upon the server 120.

The cell tower 130 is a network node that serves as an intermediary node for end nodes such as clients 110. As described above, in other embodiments the computer network may include other network nodes replacing or in addition to a cell tower 130 but enabling similar communication. The cell tower 130 increases the range over which messages may be communicated. For example, a client 110A may send a message to a cell tower 130 which proceeds to transmit the message to a client 110B, where client 110A would not have been able to communicate with client 110B without the cell tower 130.

In an embodiment, client 110 communications may be routed through the server 120 or peer-to-peer (P2P). Communications routed through the server 120 may go from a first client 110A to the server 120 via the cell tower 130 and then back through the cell tower 130 to a second client 110B. In contrast, P2P communication may go from the first client 110A to the cell tower 130 and then directly to the second client 110B. Note that in some cases, the communications may pass through other intermediary devices, such as signal boosters. As used herein, a communication is considered P2P if it is routed to the target client 110B without passing through the server 120. For example, a message (e.g., a datagram) may be sent P2P if the target client 110B is connected to the same cell tower 130 as the sending client 110A and routed via the server 120 otherwise. In another embodiment, clients 110 communicate entirely using P2P. Furthermore, in some embodiments, UDP hole punching may be used to establish a connection among two or more clients 110.

In one embodiment, the clients 110 use a coordination service (e.g., hosted at the server and communicated with via TCP) to synchronize IP addresses. The clients 110 can then communicate (e.g., via UDP) using public facing IP addresses or a local area network (LAN). For example, a first client 110A might send a request via TCP to the coordination service to join a local AR shared environment. The coordination service may provide the first client 110A with the IP address of a second client 110B connected to the AR environment (e.g., via the same cell tower 130). The coordination service may also provide the first client’s IP address to the second client 110B or the first client 110A may provide it directly using the second client’s IP address (as provided by the coordination service). In some embodiments, the coordination service may prompt the second client 110B to approve the first client 110A (e.g., by requesting user confirmation or checking a list of approved clients 110 to connect with the second client 110B) before the second client’s IP address is provided.
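
The address exchange described here can be sketched as a TCP request to the coordination service followed by a direct UDP send to the returned peer address, as in the Python example below. The JSON request format, reply fields, and any host names or ports are placeholders; the disclosure does not define a wire format for the coordination service.

    import json
    import socket

    def fetch_peer_address(coord_host, coord_port, session_id):
        """Ask the coordination service (over TCP) for a peer's public address."""
        with socket.create_connection((coord_host, coord_port)) as tcp:
            tcp.sendall(json.dumps({"join": session_id}).encode())
            reply = json.loads(tcp.recv(4096).decode())
        return reply["peer_ip"], reply["peer_port"]

    def send_p2p(payload, peer_ip, peer_port):
        """Send a datagram directly to the peer over UDP."""
        udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            udp.sendto(payload, (peer_ip, peer_port))
        finally:
            udp.close()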

FIG. 2 illustrates a datagram 200 configured according to the disclosed protocol, according to one embodiment. The datagram 200 includes a payload 202, which as described above is the content of the datagram 200. The datagram 200 also includes a header 204, a portion of which is a P2P flag 206, also known as an indicator. The header may be similar to a UDP header plus a P2P flag 206, or may contain different or additional metadata in addition to the P2P flag. The P2P flag is used to determine whether the datagram 200 is sent to the server 120 or is sent P2P to another client 110. In other embodiments the P2P flag 206 is replaced with one or more other indicators within the header providing similar functionality.

The cell tower 130 receives a datagram 200 from client 110A and determines how to route the datagram based on the P2P flag 206. In one embodiment, the P2P flag 206 may be set by the sending client 110A to indicate that the datagram 200 should be sent P2P if possible. The cell tower 130 analyzes the datagram 200 and, assuming the P2P flag 206 indicates the datagram 200 should be sent P2P, determines whether the target client 110B is currently connected to the cell tower (e.g., by comparing an identifier of the target client 110B to a list of currently connected clients). If the target client 110B is connected to the cell tower 130, the datagram 200 is sent to it without going via the server 120. In contrast, if the target client 110B is not connected to the cell tower 130, the datagram 200 is sent to the server 120 to be sent on to the target client 110B (e.g., via a second cell tower 130 to which it is currently connected). For example, the server 120 might maintain a database or other list of which cell towers 130 are currently or have been recently connected to which client devices 110. In some embodiments, the cell tower 130 may send the datagram 200 to both the target client 110B and the server 120.

In another embodiment, the P2P flag 206 may be an identifier of an AR session, a user, a device, a game account, or the like. The cell tower 130 maintains a list of P2P flags 206 for which the datagram 200 should be sent P2P (or P2P if possible). The cell tower 130 analyzes the datagram 200 to determine whether it should be sent via the server 120 or P2P. If the P2P flag 206 includes an identifier on the list, the datagram 200 is a P2P message and the cell tower 130 sends the datagram 200 to the target client 110. For example, if the header 204 of the datagram 200 indicates the destination port is that of client 110B, the cell tower 130 sends the datagram 200 to client 110B. In contrast, if the P2P flag 206 indicates the datagram 200 is not a P2P message, the cell tower 130 sends the datagram 200 to the server 120. Alternatively, the list may indicate P2P flags 206 for messages that are not to be sent P2P, in which case the default behavior if the P2P flag 206 is not on the list is to send the corresponding datagram P2P to the target client (e.g., client 110B).

Example Intermediary Node

FIG. 3 is a block diagram illustrating one embodiment of an intermediary node. In the embodiment shown, the intermediary node is a cell tower 130 that includes a routing module 310, a data ingest module 320, an AR environment module 330, a map processing module 340, an authority check module 350, and a local data store 360. The cell tower 130 also includes hardware and firmware or software (not shown) for establishing connections to the server 120 and clients 110 for exchanging data. For example, the cell tower 130 may connect to the server 120 via a fiberoptic or other wired internet connection and clients 110 using a wireless connection (e.g., 4G or 5G). In other embodiments, the cell tower 130 may include different or additional components. In addition, the functions may be distributed among the elements in a different manner than described.

The routing module 310 receives data packets and sends those packets to one or more recipient devices. In one embodiment, the routing module 310 receives datagrams 200 from clients 110 and uses the method described with reference to FIG. 4 to determine where to send the received datagrams. The routing module 310 may also receive data packets from the server addressed to either particular clients 110 or all clients that are connected to the cell tower. The routing module 310 forwards the data packets to the clients 110 to which they are addressed.

The data ingest module 320 receives data from one or more sources that the cell tower 130 uses to provide a shared AR experience to players via the connected clients 110. In one embodiment, the data ingest module 320 receives real-time or substantially real-time information about real-world conditions (e.g., from third party services). For example, the data ingest module 320 might periodically (e.g., hourly) receive weather data from a weather service indicating weather conditions in the geographic area surrounding the cell tower. As another example, the data ingest module 320 might retrieve opening hours for a park, museum, or other public space. As yet another example, the data ingest module 320 may receive traffic data indicating how many vehicles are travelling on roads in the geographic area surrounding the cell tower 130. Such information about real-world conditions may be used so that the virtual world more closely reflects the current state of the real world.
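
A rough sketch of such an ingest loop is given below; the fetch callables stand in for hypothetical third-party weather and traffic services, and the hourly interval and dictionary layout are illustrative assumptions.

```python
import time

def ingest_loop(fetch_weather, fetch_traffic, local_store, interval_seconds=3600):
    """Periodically pull real-world condition data and cache it locally.

    fetch_weather / fetch_traffic: placeholder callables for external services.
    local_store: dict-like stand-in for the tower's local data store.
    """
    while True:
        local_store["current_conditions"] = {
            "weather": fetch_weather(),   # e.g., {"condition": "snow", "temp_c": -3}
            "traffic": fetch_traffic(),   # e.g., vehicle counts per road segment
            "fetched_at": time.time(),
        }
        time.sleep(interval_seconds)
```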

The AR environment module 330 manages AR environments in which players in the geographic area surrounding the cell tower 130 may engage in shared AR experiences. In one embodiment, a client 110 connects to the cell tower 130 while executing an AR game and the AR environment module 330 connects the client to an AR environment for the game. All players of the game who connect to the cell tower 130 may share a single AR environment or players may be divided among multiple AR environments. For example, there may be a maximum number of players in a particular AR environment (e.g., ten, twenty, one hundred, etc.). Where there are multiple AR environments, newly connecting clients 110 may be placed in a session randomly or the client may provide a user interface (UI) to enable the player to select which session to join. Thus, a player may elect to engage in an AR environment with friends. In some embodiments, players may establish private AR environments that are access protected (e.g., requiring a password or code to join).
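
The placement logic described above might look roughly like the following sketch, where the per-environment player cap, the container layout, and the helper name are all illustrative assumptions.

```python
import random

MAX_PLAYERS = 20  # illustrative cap; the text mentions e.g. ten, twenty, one hundred

def assign_environment(client_id, environments, requested=None):
    """Place a newly connected client into an AR environment.

    environments: dict mapping environment ids to sets of connected client ids.
    requested: environment the player picked through the UI, if any.
    """
    # Honor an explicit selection when the chosen environment has room.
    if requested is not None and len(environments.get(requested, set())) < MAX_PLAYERS:
        environments.setdefault(requested, set()).add(client_id)
        return requested

    # Otherwise pick a random environment with capacity, creating one if needed.
    open_envs = [e for e, players in environments.items() if len(players) < MAX_PLAYERS]
    env = random.choice(open_envs) if open_envs else f"env-{len(environments) + 1}"
    environments.setdefault(env, set()).add(client_id)
    return env
```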

In various embodiments, to enable AR objects (e.g., creatures, vehicles, etc.) to appear to interact with real world features (e.g., to jump over obstacles rather than going through them), the AR environment module 330 provides connected clients 110 with map data representing the real world in the proximity of the client (e.g., stored in the local data store 360). The AR environment module 330 may receive location data for a client 110 (e.g., a GPS location) and provide map data for the geographic area surrounding the client (e.g., within a threshold distance of the client’s current position).
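
One simple way to select "nearby" map data is a radius filter around the client's reported location, as in the sketch below; the tile representation, the 500 m threshold, and the haversine filter are illustrative choices rather than details of the described system.

```python
import math

def nearby_map_data(client_lat, client_lon, tiles, threshold_m=500.0):
    """Return map data stored for locations within threshold_m of the client.

    tiles: iterable of (lat, lon, data) tuples held in the local data store
    (an assumed representation).
    """
    earth_radius_m = 6371000.0

    def haversine(lat1, lon1, lat2, lon2):
        # Great-circle distance between two latitude/longitude pairs, in meters.
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        dphi = math.radians(lat2 - lat1)
        dlmb = math.radians(lon2 - lon1)
        a = (math.sin(dphi / 2) ** 2
             + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
        return 2 * earth_radius_m * math.asin(math.sqrt(a))

    return [data for lat, lon, data in tiles
            if haversine(client_lat, client_lon, lat, lon) <= threshold_m]
```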

The received map data can include one or more representations of the real world. For example, the map data can include a point cloud model, a plane matching model, a line matching model, a geographic information system (GIS) model, a building recognition model, a landscape recognition model, etc. The map data may also include more than one representation of a given type at different levels of detail. For example, the map data may include two or more point cloud models, each including a different number of points.

The client 110 may compare the map data to data collected by one or more sensors to refine the client’s location. For example, by mapping the images being captured by a camera on the client 110 to a point cloud model, the client’s location and orientation may be accurately determined (e.g., to within one centimeter and 0.1 degrees). The client 110 provides the determined location and orientation back to the AR environment module 330 along with any actions taken by the player (e.g., shooting, selecting a virtual item to interact with, dropping a virtual item, etc.). Thus, the AR environment module 330 can update the status of the game for all players engaged in the AR environment.

The map processing module 340 updates map data based on current conditions (e.g., data from the data ingest module 320). Because the real world is not static, the map data in the local data store 360 may not represent current real-world conditions. For example, the same park trail in Vermont may look very different in different seasons. In summer, the trail might be clear and the surrounding trees covered in foliage. In contrast, in winter, the trail may be blocked by drifts of snow and the trees may be bare. The map processing module 340 may transform the map data to approximate such changes.

In one embodiment, the map processing module 340 retrieves current condition data to identify a transformation and applies that transformation to the map data. The transformations for different conditions may be defined by heuristic rules, take the form of trained machine-learning models, or use a combination of both approaches. For example, the map processing module 340 might receive current weather condition data, select a transformation for the current weather conditions, and apply that transformation to the map data. Alternatively, the map processing module 340 may pre-calculate the transformed maps and store them (e.g., in the local data store 360). In this case, when a client 110 connects to the cell tower, the map processing module determines the current conditions, selects the appropriate pre-calculated version of the map data, and provides that version to the client.
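
Either strategy, transforming on demand or serving a pre-calculated variant, can be combined as in the sketch below; the container names, the single "weather" condition key, and the caching behavior are assumptions for illustration.

```python
def map_for_conditions(region_id, current_conditions, precomputed_maps,
                       transforms, base_maps):
    """Return map data adjusted for current conditions.

    precomputed_maps: cache keyed by (region_id, condition).
    transforms: dict of condition -> callable (heuristic rule or trained model).
    base_maps: untransformed map data per region.
    All container layouts are illustrative assumptions.
    """
    condition = current_conditions.get("weather", "default")

    cached = precomputed_maps.get((region_id, condition))
    if cached is not None:
        return cached

    base = base_maps[region_id]
    transform = transforms.get(condition)
    transformed = transform(base) if transform else base
    precomputed_maps[(region_id, condition)] = transformed  # reuse for later clients
    return transformed
```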

The authority check module 350 maintains synchronization between game states of different clients 110. In one embodiment, the authority check module 350 confirms that game actions received from clients 110 are consistent with the game state maintained by the AR environment module 330. For example, if two players both try to pick up the same in-game item, the authority check module 350 determines which player receives the item (e.g., based on timestamps associated with the requests). As described, the use of a P2P protocol and local processing at the cell tower may significantly reduce the latency of a player’s actions being seen at other players’ clients 110. Therefore, the likelihood (and number) of instances of such conflicts arising and being resolved by the authority check module 350 is reduced, which may improve the AR experience.
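
A timestamp-based resolution of the item-pickup example might look like the following sketch; the request layout and field names are assumptions, and only the earliest-request-wins rule mentioned above is modeled.

```python
def resolve_pickup(requests, item_id, item_available):
    """Return the player who receives the contested item, or None.

    requests: list of dicts like {"player": ..., "item": ..., "timestamp": ...}
    received by the tower (an assumed structure). The earliest request wins
    while the item is still available.
    """
    if not item_available:
        return None
    contenders = [r for r in requests if r["item"] == item_id]
    if not contenders:
        return None
    winner = min(contenders, key=lambda r: r["timestamp"])
    return winner["player"]
```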

The authority check module 350 may also maintain synchronization between its copy of the state of the AR environment (the intermediate node state) and a master state maintained by the server 120. In one embodiment, the authority check module 350 periodically (e.g., every 1 to 10 seconds) receives global updates regarding the state of the AR environment from the server 120. The authority check module 350 compares these updates to the intermediate node state and resolves any discrepancies. For example, if a player’s request to pick up an item was initially approved by the authority check module 350 but a game update from the server 120 indicates the item was picked up by another player (or otherwise made unavailable) before the player attempted to pick it up, the authority check module 350 might send an update to the player’s client 110 indicating the item should be removed from the player’s inventory.
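
The reconciliation step can be sketched as below, modeling only item ownership; the state layout and the notification callback are illustrative assumptions rather than the actual server/tower interface.

```python
def reconcile_with_master(local_state, master_update, notify_client):
    """Align the tower's local AR-environment state with a periodic server update.

    local_state / master_update: dicts with an "item_owners" mapping of
    item_id -> owner (an assumed layout). notify_client is a placeholder
    callback for pushing corrections to affected clients.
    """
    corrections = []
    for item_id, master_owner in master_update.get("item_owners", {}).items():
        local_owner = local_state.get("item_owners", {}).get(item_id)
        if local_owner is not None and local_owner != master_owner:
            # The server's master state wins: revoke the locally approved pickup
            # and tell the affected client to remove the item from its inventory.
            local_state["item_owners"][item_id] = master_owner
            notify_client(local_owner, {"remove_item": item_id})
            corrections.append(item_id)
    return corrections
```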

This process is particularly valuable for clients 110 located close to a boundary between the coverage areas of different cell towers 130. In this case, players connected to different cell towers 130 may both be able to interact with the same virtual element. Thus, each individual cell tower 130 might initially approve conflicting interactions with the element, but the server 120 would detect the conflict and send updates to resolve the conflict (e.g., instructing one of the cell towers to revoke its initial approval of the action and update its local state accordingly).

The local data store 360 is one or more non-transitory computer-readable media configured to store data used by the cell tower. In one embodiment, the stored data may include map data, current conditions data, a list of currently (or recently) connected clients 110, a local copy of the game state for the geographic region, etc. Although the local data store 360 is shown as a single entity, the data may be split across multiple storage media. Furthermore, some of the data may be stored elsewhere in the communication network and accessed remotely. For example, the cell tower 130 may access current condition data remotely (e.g., from a third-party server) as needed.

Example Method

FIG. 4 illustrates a process for using a low-latency datagram-responsive computer network protocol, according to one embodiment. A cell tower 130 receives 405 a datagram 200 addressed to a client 110A, the target client device. In other embodiments, the cell tower 130 may be another type of intermediary node that performs the same operations as the cell tower 130 of this embodiment. The datagram 200 may have been sent to the cell tower 130 from another client device, such as client 110B. The datagram 200 also describes an action that occurred in a shared AR environment, such as one associated with a parallel-reality game in which players’ locations in the real world correlate with their positions in the game world.

The cell tower 130 analyzes 410 the datagram 200 to determine whether the datagram is P2P based on its P2P flag 206. If the datagram 200 is P2P, the cell tower 130 sends 415 the datagram 200 to the client 110A to update a local state of the shared AR environment at the client 110A to show the action. If the datagram 200 is not P2P, the cell tower 130 sends 420 the datagram 200 to the server 120 to update a master state of the shared AR environment to show the action and its effects on the AR environment. In some embodiments, the cell tower 130 also sends some or all of the P2P datagrams to the server 120 after sending them to the client 110A. Thus, the clients 110 may synchronize their local states based on the P2P message while the server 120 maintains the master state that may be used to resolve discrepancies between local states of different clients. By bypassing the server 120 for P2P datagrams, the cell tower 130 may improve the latency of actions that occur in the AR environment when the datagram does not need processing by the server 120 before the local state of the AR environment is updated. In various embodiments, the local state is updated with a latency between 1 millisecond and 10 milliseconds, 1 millisecond and 15 milliseconds, 1 millisecond and 20 milliseconds, or 1 millisecond and 30 milliseconds.

In some embodiments, the cell tower 130 follows multiple steps to determine if a datagram 200 is P2P. The cell tower 130 analyzes the indicator, or P2P flag 206, to determine if the datagram 200 should be sent P2P. The cell tower 130 then determines whether client 110A is currently connected to the cell tower 130. If so, the cell tower 130 determines that the datagram 200 is P2P and can be sent straight to the client 110A instead of the server 120. If the cell tower 130 is not currently connected to the client 110A, then the cell tower 130 sends the datagram 200 to the server 120, even if the P2P flag 206 indicates that the datagram is P2P.

The process in FIG. 4 may be further described in relation to an example shared AR environment that is incorporated into a parallel reality game where players throw balls of light at one another, which other players dodge or catch in real-time. When a sending player, associated with client 110B in this example, throws a ball of light at a target player, associated with client 110A, the client 110B creates a datagram 200 describing the action (e.g., throwing the ball of light). The action is between players and should occur quickly in the shared AR environment, so the client 110B indicates on the datagram 200 that the datagram 200 is P2P. The client 110B sends the datagram 200 to the cell tower 130, which determines what to do with the datagram 200. Since the datagram is P2P, the cell tower 130 sends the datagram 200 to client 110A instead of the server 120. Client 110A receives the datagram 200 and integrates the data from the payload 202 into the local state of the shared AR environment (e.g., shows the target player that the sending player threw a ball of light at them). By not sending the datagram 200 to the server 120, the latency is reduced. With a low enough latency, the action may appear in the target player’s local presentation as though it happened in real-time, allowing the game play to continue more quickly. This may also allow players to experience a sense of direct interaction with each other. For example, in a virtual catch game, one player could throw a virtual ball and witness another player catch the virtual ball by placing their client 110 in the trajectory of the ball.

The cell tower 130 may send the datagram 200, or a copy of the datagram 200, to the server 120 after sending the datagram 200 to the client 110A. This way, the master state of the shared AR environment is updated to show that the sending player threw a ball of light at the target player. Sending the datagram 200 to the server 120 may provide a way to resolve conflicts between actions performed by different players that are closer together in time than the latency. Additionally, the server 120 may handle sending information from the datagram 200 to other cell towers when a client 110 is connected to another cell tower (e.g., when the client 110 switches to a neighboring cell tower, a player messages another player with a client 110 connected to a different cell tower, etc.). In some embodiments, the cell tower 130 may determine a group of clients 110 that are currently connected to the cell tower 130 and the shared AR environment (e.g., all clients that are connected to the cell tower and the AR environment) and send the datagram to the group of clients 110 so that players associated with those clients 110 can see the action occur quickly, seemingly in real-time (e.g., with latency of less than 10 milliseconds).

FIG. 5 is a high-level block diagram illustrating an example computer 500 suitable for use within the computer network shown in FIG. 1, according to an embodiment. The example computer 500 includes at least one processor 502 coupled to a chipset 504. The chipset 504 includes a memory controller hub 520 and an input/output (I/O) controller hub 522. A memory 506 and a graphics adapter 512 are coupled to the memory controller hub 520, and a display 518 is coupled to the graphics adapter 512. A storage device 508, keyboard 510, pointing device 514, and network adapter 516 are coupled to the I/O controller hub 522. Other embodiments of the computer 500 have different architectures.

In the embodiment shown in FIG. 5, the storage device 508 is a non-transitory computer-readable storage medium such as a hard drive, compact disk read-only memory (CD-ROM), DVD, or a solid-state memory device. The memory 506 holds instructions and data used by the processor 502. The pointing device 514 is a mouse, track ball, touch-screen, or other type of pointing device, and is used in combination with the keyboard 510 (which may be an on-screen keyboard) to input data into the computer system 500. The graphics adapter 512 displays images and other information on the display 518. The network adapter 516 couples the computer system 500 to one or more computer networks.

The types of computers used by the entities of FIG. 1 can vary depending upon the embodiment and the processing power required by the entity. For example, a server 120 might include a distributed database system comprising multiple blade servers working together to provide the functionality described. Furthermore, the computers can lack some of the components described above, such as keyboards 510, graphics adapters 512, and displays 518.

Those skilled in the art can make numerous uses and modifications of and departures from the apparatus and techniques disclosed herein without departing from the described concepts. For example, components or features illustrated or described in the present disclosure are not limited to the illustrated or described locations, settings, or contexts. Examples of apparatuses in accordance with the present disclosure can include all, fewer, or different components than those described with reference to one or more of the preceding figures. The present disclosure is therefore not to be limited to specific implementations described herein, but rather is to be accorded the broadest scope possible consistent with the appended claims, and equivalents thereof.

The article "Niantic Patent | Low latency datagram-responsive computer network protocol" was first published on Nweon Patent.

MagicLeap Patent | Virtual or augmented reality headsets having adjustable interpupillary distance https://patent.nweon.com/26711 Thu, 26 Jan 2023 15:17:38 +0000 https://patent.nweon.com/?p=26711 ...

The article "MagicLeap Patent | Virtual or augmented reality headsets having adjustable interpupillary distance" was first published on Nweon Patent.

Patent: Virtual or augmented reality headsets having adjustable interpupillary distance

Patent PDF: Join Nweon membership to access

Publication Number: 20230022317

Publication Date: 2023-01-26

Assignee: Magic Leap

Abstract

A virtual or augmented reality headset is provided having a frame, a pair of virtual or augmented reality eyepieces, and an interpupillary distance adjustment mechanism. The frame includes opposing arm members and a bridge positioned intermediate the opposing arm members. The adjustment mechanism is coupled to the virtual or augmented reality eyepieces and operable to simultaneously move the eyepieces to adjust the interpupillary distance of the eyepieces.

Claims

1.A virtual or augmented reality headset, comprising: a frame including opposing arm members and a bridge portion positioned intermediate the opposing arm members; and a pair of virtual or augmented reality eyepieces each having an optical center, the pair of virtual or augmented reality eyepieces movably coupled to the frame to enable adjustment of an interpupillary distance between the optical centers, wherein the frame includes at least two rails on each of the opposing sides of the frame vertically offset from each other to guide a respective one of the virtual or augmented reality eyepieces, and wherein, for each of the opposing sides of the frame, the at least two rails and the arm member form a fork structure.

2.The headset of claim 1, further comprising: an adjustment mechanism coupled to both of the pair of virtual or augmented reality eyepieces and operable to simultaneously move the pair of virtual or augmented reality eyepieces to adjust the interpupillary distance.

3.The headset of claim 2, wherein the adjustment mechanism includes a linear actuator physically coupled to the pair of virtual or augmented reality eyepieces.

4.The headset of claim 3, wherein the linear actuator is physically coupled to the pair of virtual or augmented reality eyepieces by a plurality of links which are arranged such that movement of the linear actuator in a first direction causes the plurality of links to increase the interpupillary distance between the optical centers of the pair of virtual or augmented reality eyepieces and movement of the linear actuator in a second direction opposite the first direction causes the plurality of links to decrease the interpupillary distance between the optical centers of the pair of virtual or augmented reality eyepieces.

5.The headset of claim 4, further comprising: a motor electro-mechanically coupled to the linear actuator; and an electronic controller electrically coupled to the motor to control movement of the virtual or augmented reality eyepieces via the motor, the linear actuator and the plurality of links.

6.The headset of claim 3 wherein the adjustment mechanism is coupled to the bridge portion and the linear actuator includes a user manipulable control for selectively adjusting a position of each of the virtual or augmented reality eyepieces simultaneously.

7.The headset of claim 1 wherein each virtual or augmented reality eyepiece is arcuate and includes a medial end and a lateral end, the medial end positioned proximate the bridge of the frame and the lateral end positioned proximate a temple region of a respective one of the opposing arm members.

8.The headset of claim 7 wherein the frame includes a respective arcuate profile on each of opposing sides of the frame to at least partially nest with a respective one of the virtual or augmented reality eyepieces when the virtual or augmented reality eyepieces are in a narrowest configuration in which the interpupillary distance is at a minimum.

9.The headset of claim 1 wherein, for each of the opposing sides of the frame, the two rails are located proximate the bridge portion to guide a medial end of the respective virtual or augmented reality eyepiece and support the respective virtual or augmented reality eyepiece in a cantilevered manner.

10.The headset of claim 1 wherein the fork structure supports the respective one of the virtual or augmented reality eyepieces.

11.A virtual or augmented reality headset, comprising: a frame including opposing arm members and a bridge positioned intermediate the opposing arm members; and a pair of virtual or augmented reality eyepieces each having an optical center, the pair of virtual or augmented reality eyepieces movably coupled to the frame to enable adjustment of an interpupillary distance between the optical centers, wherein the frame includes at least two rails on each of the opposing sides of the frame vertically offset from each other to guide a respective one of the virtual or augmented reality eyepieces, and wherein, for each of the opposing sides of the frame, the two rails and a portion of the bridge form a fork structure configured to support the respective one of the virtual or augmented reality eyepieces.

12.The headset of claim 11, further comprising: an adjustment mechanism coupled to both of the pair of virtual or augmented reality eyepieces and operable to simultaneously move the pair of virtual or augmented reality eyepieces to adjust the interpupillary distance.

13.The headset of claim 12, wherein the adjustment mechanism includes a linear actuator physically coupled to the pair of virtual or augmented reality eyepieces.

14.The headset of claim 13, wherein the linear actuator is physically coupled to the pair of virtual or augmented reality eyepieces by a plurality of links which are arranged such that movement of the linear actuator in a first direction causes the plurality of links to increase the interpupillary distance between the optical centers of the pair of virtual or augmented reality eyepieces and movement of the linear actuator in a second direction opposite the first direction causes the plurality of links to decrease the interpupillary distance between the optical centers of the pair of virtual or augmented reality eyepieces.

15.The headset of claim 14, further comprising: a motor electro-mechanically coupled to the linear actuator; and an electronic controller electrically coupled to the motor to control movement of the virtual or augmented reality eyepieces via the motor, the linear actuator and the plurality of links.

16.The headset of claim 13 wherein the adjustment mechanism is coupled to the bridge portion and the linear actuator includes a user manipulable control for selectively adjusting a position of each of the virtual or augmented reality eyepieces simultaneously.

17.The headset of claim 10 wherein each virtual or augmented reality eyepiece is arcuate and includes a medial end and a lateral end, the medial end positioned proximate the bridge of the frame and the lateral end positioned proximate a temple region of a respective one of the opposing arm members.

18.The headset of claim 17 wherein the frame includes a respective arcuate profile on each of opposing sides of the frame to at least partially nest with a respective one of the virtual or augmented reality eyepieces when the virtual or augmented reality eyepieces are in a narrowest configuration in which the interpupillary distance is at a minimum.

19.The headset of claim 10 wherein, for each of the opposing sides of the frame, a lower one of the two rails extends away from the bridge portion and is coupled to a lower lateral end of the respective one of the virtual or augmented reality eyepieces.

20.The headset of claim 10 wherein the fork structure supports the respective one of the virtual or augmented reality eyepieces.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 16/863,485, filed Apr. 30, 2020, which is a continuation of U.S. patent application Ser. No. 15/914,811, filed Mar. 7, 2018, now U.S. Pat. No. 10,649,219, which is a continuation of U.S. patent application Ser. No. 15/229,001, filed Aug. 4, 2016, now U.S. Pat. No. 10,073,272, which is a continuation of U.S. patent application Ser. No. 14/516,180, filed Oct. 16, 2014, now U.S. Pat. No. 9,470,906, which claims the benefit of U.S. Provisional Patent Application No. 61/891,801, filed Oct. 16, 2013, the entire contents of which are hereby incorporated by reference in their entireties.

BACKGROUNDTechnical Field

This disclosure generally relates to virtual or augmented reality headsets, and more particularly to virtual or augmented reality headsets wherein the interpupillary distance of the eyepieces is adjustable.

Description of the Related Art

Virtual or augmented reality headsets have long been proven invaluable for many applications, spanning the fields of scientific visualization, medicine and military training, engineering design and prototyping, tele-manipulation and tele-presence, and personal entertainment systems. In virtual reality systems, computer-generated virtual scenes are generally provided on an opaque display. In mixed and augmented reality systems, computer-generated virtual scenes or objects are combined with the views of a real-world scene on a see-through display. In many virtual or augmented reality headsets, virtual or augmented scenes are displayed on separate eyepieces. The interpupillary distance between the optical centers of such eyepieces is often fixed, and corrections that may be needed to adjust for variations among users having different interpupillary distances are made via software to provide corrective display adjustments. In some instances, the interpupillary distance between the optical centers of eyepieces may be mechanically adjustable; however, in such instances, adjustment devices can suffer from various drawbacks. For example, the adjustment mechanisms may be overly complex, bulky, lack precision and/or include a limited range of motion.

BRIEF SUMMARY

Embodiments described herein provide virtual or augmented reality headsets with robust and efficient form factors that enable simultaneous movement of viewer eyepieces along one or more linear rails to provide interpupillary distance adjustment.

A virtual or augmented reality headset may be summarized as including a frame, a pair of virtual or augmented reality eyepieces, and an adjustment mechanism coupled to both of the pair of virtual or augmented reality eyepieces. The frame may include opposing arm members, a bridge positioned intermediate the opposing arm members, and a plurality of linear rails. At least one linear rail may be provided at each of opposing sides of the frame defined by a central reference plane. The pair of virtual or augmented reality eyepieces each have an optical center and may be movably coupled to the plurality of linear rails of the frame to enable adjustment of an interpupillary distance between the optical centers. The adjustment mechanism may be operable to simultaneously move the pair of virtual or augmented reality eyepieces in adjustment directions aligned with the plurality of linear rails to adjust the interpupillary distance.

The virtual or augmented reality eyepieces may be movable between a narrowest configuration and a widest configuration, and a difference between the interpupillary distance in the widest configuration and the interpupillary distance in the narrowest configuration may be between about 20 mm and about 24 mm.

The adjustment mechanism may be coupled to the bridge of the frame and may include a manipulable actuator coupled to the virtual or augmented reality eyepieces for selectively adjusting a linear position of each of the virtual or augmented reality eyepieces simultaneously. The frame may further include a lock to selectively fix the virtual or augmented reality eyepieces in a selected linear position along the plurality of linear rails.

The adjustment mechanism may include a manipulable actuator manually operable by a user and one or more links physically may couple the manipulable actuator to the virtual or augmented reality eyepieces. The headset may further include a selectively removable cover that is selectively positionable to alternatively prevent access to the manipulable actuator and to provide access to the manipulable actuator by the user. The manipulable actuator may be constrained to translate back and forth in directions perpendicular to the adjustment directions aligned with the plurality of linear rails, and movement of the manipulable actuator in one direction may move the virtual or augmented reality eyepieces toward an expanded configuration while movement of the manipulable actuator in the opposite direction may move the virtual or augmented reality eyepieces toward a collapsed configuration. The manipulable actuator may be accessible to the user while the headset is worn.

The adjustment mechanism may include one or more linear actuators, such as, for example, a piezoelectric linear actuator or a motor-driven lead screw.

The bridge of the frame may include a nosepiece to engage a nose of the user and support the virtual or augmented reality eyepieces in front of the user’s eyes. The nosepiece may be removably coupleable to a base portion of the bridge to selectively lock the virtual or augmented reality eyepieces in a selected position.

Each virtual or augmented reality eyepiece may be arcuate and may include a medial end and a lateral end. The medial end may be positioned proximate the bridge of the frame and the lateral end may be positioned proximate a temple region of a respective one of the opposing arm members. The frame may include a respective arcuate profile on each of opposing sides of the central reference plane to at least partially nest with a respective one of the virtual or augmented reality eyepieces when the virtual or augmented reality eyepieces are in a narrowest configuration in which the interpupillary distance is at a minimum. The plurality of linear rails may include at least two linear rails on each of opposing sides of the frame to guide a respective one of the virtual or augmented reality eyepieces, and wherein, for each of the opposing sides of the frame, a first one of the linear rails may be located proximate the bridge to guide the medial end of the respective virtual or augmented reality eyepiece and a second one of the linear rails may be located proximate the temple region to guide the lateral end of the respective virtual or augmented reality eyepiece. Each of the virtual or augmented reality eyepieces may be coupled to at least two linear rails that are offset fore and aft from each other.

The plurality of linear rails of the frame may include at least two linear rails on each of opposing sides of the frame to guide a respective one of the virtual or augmented reality eyepieces, and wherein, for each of the opposing sides of the frame, the two linear rails may be located proximate the bridge to guide a medial end of the respective virtual or augmented reality eyepiece and support the respective virtual or augmented reality eyepiece in a cantilevered manner.

The plurality of linear rails of the frame may include at least two linear rails on each of opposing sides of the frame vertically offset from each other to guide a respective one of the virtual or augmented reality eyepieces. For each of the opposing sides of the frame, the at least two linear rails and the arm member may form a fork structure. For each of the opposing sides of the frame, the two linear rails and a portion of the bridge may form a fork structure that supports the respective one of the virtual or augmented reality eyepieces.

Each of the virtual or augmented reality eyepieces may be supported by a single respective linear rail underlying the eyepiece and supported in space only by a connection to the single respective linear rail. In other instances, each of the virtual or augmented reality eyepieces may be supported by a single respective linear rail positioned above a horizontal plane defined by the optical centers of the pair of virtual or augmented reality eyepieces and supported in space only by a connection to the single respective linear rail.

The bridge and the plurality of rails of the frame may be integrally formed as a single-piece. The bridge, the opposing arm members and the plurality of rails of the frame may be integrally formed as a single-piece.

The frame may further include a central frame portion comprising the bridge, and the opposing arm members may be hingedly connected to the central frame portion.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

FIG. 1 is a perspective view of a headset according to one embodiment.

FIG. 2 is a top plan view of a portion of the headset of FIG. 1 shown in a collapsed configuration.

FIG. 3 is a top plan view of a portion of the headset of FIG. 1 shown in an expanded configuration.

FIG. 4 is a top plan view of the headset of FIG. 1 shown in the collapsed configuration.

FIG. 5 is a front elevational view of the headset of FIG. 1 shown in the collapsed configuration.

FIG. 6 is a side elevational view of the headset of FIG. 1 shown in the collapsed configuration.

FIG. 7 is a perspective view of a headset according to another embodiment.

FIG. 8 is a front elevational view of the headset of FIG. 7 shown in a collapsed configuration.

FIG. 9 is a front elevational view of the headset of FIG. 7 shown in an expanded configuration.

FIG. 10 is a top plan view of the headset of FIG. 7 shown in the collapsed configuration.

FIG. 11 is a front elevational view of the headset of FIG. 7 shown in the collapsed configuration.

FIG. 12 is a side elevational view of the headset of FIG. 7 shown in the collapsed configuration.

FIG. 13 is a perspective view of a headset according to another embodiment.

FIG. 14 is a front elevational view of the headset of FIG. 13 shown in a collapsed configuration.

FIG. 15 is a front elevational view of the headset of FIG. 13 shown in an expanded configuration.

FIG. 16 is a top plan view of the headset of FIG. 13 shown in the collapsed configuration.

FIG. 17 is a front elevational view of the headset of FIG. 13 shown in the collapsed configuration.

FIG. 18 is a side elevational view of the headset of FIG. 13 shown in the collapsed configuration.

FIG. 19 is a perspective view of a headset according to another embodiment.

FIG. 20 is a front perspective view of a portion of the headset of FIG. 19 shown in a collapsed configuration.

FIG. 21 is a front perspective view of a portion of the headset of FIG. 19 shown in an expanded configuration.

FIG. 22 is a top plan view of the headset of FIG. 19 shown in the collapsed configuration.

FIG. 23 is a front elevational view of the headset of FIG. 19 shown in the collapsed configuration.

FIG. 24 is a side elevational view of the headset of FIG. 19 shown in the collapsed configuration.

FIG. 25 is a perspective view of a portion of a headset shown in an expanded configuration according to another embodiment.

FIG. 26 is an enlarged perspective view of a portion of the headset of FIG. 25 showing an adjustable mechanism.

FIG. 27 is a partial cutaway perspective view of the headset of FIG. 25.

FIG. 28 is a front elevational view of a headset shown in an expanded configuration according to yet another embodiment.

DETAILED DESCRIPTION

In the following description, certain specific details are set forth in order to provide a thorough understanding of various disclosed embodiments. However, one skilled in the relevant art will recognize that embodiments may be practiced without one or more of these specific details, or with other methods, components, materials, etc. In other instances, well-known structures associated with virtual and augmented reality systems have not been shown or described in detail to avoid unnecessarily obscuring descriptions of the embodiments.

Unless the context requires otherwise, throughout the specification and claims which follow, the word “comprise” and variations thereof, such as “comprises” and “comprising,” are to be construed in an open, inclusive sense, that is, as “including, but not limited to.”

Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

FIGS. 1 through 6 show one example embodiment of a virtual or augmented reality headset 10. The headset 10 includes a frame 12 and a pair of virtual or augmented reality eyepieces 30a, 30b supported by the frame 12. The frame 12 has opposing arm members 14a, 14b, a bridge 16 positioned intermediate the opposing arm members 14a, 14b, and a plurality of linear rails 18a, 18b, 20a, 20b. More particularly, two linear rails 18a, 18b, 20a, 20b are provided at each of opposing sides 22, 24 of the frame 12 defined by a central reference plane 26.

The pair of virtual or augmented reality eyepieces 30a, 30b each has an optical center 32a, 32b, a distance between which defines an interpupillary distance IPD. The eyepieces 30a, 30b are movably coupled to the plurality of linear rails 18a, 18b, 20a, 20b to enable adjustment of the interpupillary distance IPD as desired to correspond to or more closely correspond to an actual interpupillary distance between the pupils of a wearer.

The headset 10 further includes an adjustment mechanism 34 coupled to both of the pair of virtual or augmented reality eyepieces 30a, 30b. The adjustment mechanism 34 is operable to simultaneously move the eyepieces 30a, 30b in adjustment directions 42, 44 aligned with the linear rails 18a, 18b, 20a, 20b to adjust the interpupillary distance IPD. The virtual or augmented reality eyepieces 30a, 30b are movable between a fully collapsed or narrowest configuration (FIGS. 1, 2 and 4-6) and a fully expanded or widest configuration (FIG. 3). The frame 12, eyepieces 30a, 30b, and rails 18a, 18b, 20a, 20b are configured relative to each other such that a difference between the interpupillary distance IPD in the fully expanded or widest configuration and the interpupillary distance IPD in the fully collapsed or narrowest configuration is between about 20 mm and about 24 mm. As such, each individual eyepiece 30a, 30b may be adjusted a distance between about 10 mm and 12 mm. It is appreciated, however, that in some embodiments, more or less adjustment may be provided.
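
Expressed numerically (assuming, consistent with the per-eyepiece figure above, that the two eyepieces move by equal amounts in opposite directions relative to the central reference plane 26, with $d$ denoting the travel of each eyepiece):

$$\Delta\mathrm{IPD} = 2d \quad\Longrightarrow\quad d = \frac{\Delta\mathrm{IPD}}{2} \in \left[\frac{20\ \text{mm}}{2},\ \frac{24\ \text{mm}}{2}\right] = [10\ \text{mm},\ 12\ \text{mm}].$$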

A nosepiece 36 may be provided at the bridge 16 of the frame 12 to engage a nose of the user and support the virtual or augmented reality eyepieces 30a, 30b in front of the user’s eyes during use. The nosepiece 36 may be integrally formed as a portion of the bridge 16, fixedly secured to the bridge 16, or removably coupled to the bridge 16. In some embodiments, the nosepiece 36 may be removably coupleable to a base portion of the bridge 16 and impede the travel of the adjustment mechanism 34 to lock the virtual or augmented reality eyepieces 30a, 30b in a selected position. In other instances, a lock may be provided on each eyepiece 30a, 30b, to clamp to a respective one of the linear rails 18a, 18b, 20a, 20b, or vice versa. In this manner, a user may selectively unlock the eyepieces 30a, 30b for adjustment, adjust the eyepieces 30a, 30b transversely to a new interpupillary distance IPD, and lock the eyepieces 30a, 30b in place at the new interpupillary distance IPD. The lock may include, for example, one or more clamps, set screws, clips or other fasteners to impede movement of the adjustment mechanism 34 and/or eyepieces 30a, 30b, or otherwise lock the same. The lock may be spring-biased toward a locked position.

With continued reference to FIGS. 1 through 6, each virtual or augmented reality eyepiece 30a, 30b may be arcuate and include a medial end and a lateral end. The medial end may be positioned proximate the bridge 16 of the frame 12 and the lateral end may be positioned proximate a temple region of a respective one of the opposing arm members 14a, 14b. The frame 12 may include a respective arcuate profile on each of opposing sides 22, 24 of the central reference plane 26 to at least partially nest with a respective one of the virtual or augmented reality eyepieces 30a, 30b when the virtual or augmented reality eyepieces 30a, 30b are in the fully collapsed or narrowest configuration (FIGS. 1, 2 and 5-6) in which the interpupillary distance IPD is at a minimum.

The headset 10 may include a pair of linear rails 18a, 20a and 18b, 20b on each of opposing sides 22, 24 of the frame 12 to guide a respective one of the virtual or augmented reality eyepieces 30a, 30b. In addition, for each of the opposing sides 22, 24 of the frame 12, a first one of the linear rails 18a, 18b may be located proximate the bridge 16 to guide the medial end of the respective virtual or augmented reality eyepiece 30a, 30b and a second one of the linear rails 20a, 20b may be located proximate the temple region to guide the lateral end of the respective virtual or augmented reality eyepiece 30a, 30b. In this manner, each of the virtual or augmented reality eyepieces 30a, 30b may be coupled to at least two linear rails 18a, 20a and 18b, 20b that are offset fore and aft from each other. The linear rails may be protruding rods or telescoping elements that project from a side of the frame 12. In some instances, the rails 18a, 18b, 20a, 20b may be substantially or completely concealed from view when in the fully collapsed or narrowest configuration and/or when in the fully expanded or widest configuration.

As can be appreciated from the embodiment shown in FIGS. 1 through 6, the eyepieces 30a, 30b, may be generally arc-shaped and may move transversely along the linear rails 18a, 18b, 20a, 20b between an extreme medial position nearer the central plane 26 and an extreme lateral position farther from the central plane 26. The eyepieces 30a, 30b may be located at any position between the extreme end positions and secured in place with a lock or other fastening mechanism or fixation method.

FIGS. 7 through 12 show another example embodiment of a virtual or augmented reality headset 110. The headset 110 includes a frame 112 and a pair of virtual or augmented reality eyepieces 130a, 130b supported by the frame 112. The frame 112 has opposing arm members 114a, 114b, a bridge 116 positioned intermediate the opposing arm members 114a, 114b, and a plurality of linear rails 118a, 118b, 120a, 120b. More particularly, two linear rails 118a, 118b, 120a, 120b are provided at each of opposing sides 122, 124 of the frame 112 defined by a central reference plane 126. As shown in FIGS. 7 through 12, the linear rails 118a, 118b, 120a, 120b may transition to curvilinear rails or rail portions beyond the adjustability range of the eyepieces 130a, 130b.

Again, the pair of virtual or augmented reality eyepieces 130a, 130b each have an optical center 132a, 132b, a distance between which defines an interpupillary distance IPD. The eyepieces 130a, 130b are movably coupled to the plurality of linear rails 118a, 118b, 120a, 120b to enable adjustment of the interpupillary distance IPD as desired to correspond to or more closely correspond to an actual interpupillary distance between the pupils of a wearer.

The headset 110 further includes an adjustment mechanism 134 coupled to both of the pair of virtual or augmented reality eyepieces 130a, 130b. The adjustment mechanism 134 is operable to simultaneously move the eyepieces 130a, 130b in adjustment directions 142, 144 aligned with the linear rails 118a, 118b, 120a, 120b to adjust the interpupillary distance IPD. The virtual or augmented reality eyepieces 130a, 130b are movable between a fully collapsed or narrowest configuration (FIGS. 7, 8 and 10-12) and a fully expanded or widest configuration (FIG. 9). The frame 112, eyepieces 130a, 130b, and rails 118a, 118b, 120a, 120b are configured relative to each other such that a difference between the interpupillary distance IPD in the fully expanded or widest configuration and the interpupillary distance IPD in the fully collapsed or narrowest configuration is between about 20 mm and about 24 mm. As such, each individual eyepiece 130a, 130b may be adjusted a distance between about 10 mm and 12 mm. It is appreciated, however, that in some embodiments, more or less adjustment may be provided.

Again, a nosepiece 136 may be provided at the bridge 116 of the frame 112 to engage a nose of the user and support the virtual or augmented reality eyepieces 130a, 130b in front of the user’s eyes during use. The nosepiece 136 may be integrally formed as a portion of the bridge 116, fixedly secured to the bridge 116, or removably coupled to the bridge 116. In some embodiments, the nosepiece 136 may be removably coupleable to a base portion of the bridge 116 and impede the travel of the adjustment mechanism 134 to lock the virtual or augmented reality eyepieces 130a, 130b in a selected position. In other instances, a lock may be provided on each eyepiece 130a, 130b, to clamp to a respective one of the linear rails 118a, 118b, 120a, 120b, or vice versa. In this manner, a user may selectively unlock the eyepieces 130a, 130b for adjustment, adjust the eyepieces 130a, 130b transversely to a new interpupillary distance IPD, and lock the eyepieces 130a, 130b in place at the new interpupillary distance IPD. The lock may include, for example, one or more clamps, set screws, clips or other fasteners to impede movement of the adjustment mechanism 134 and/or eyepieces 130a, 130b, or otherwise fix the same in place. The lock may be spring-biased toward a locked position.

With continued reference to FIGS. 7 through 12, each virtual or augmented reality eyepiece 130a, 130b may be arcuate and include a medial end and a lateral end. The medial end may be positioned proximate the bridge 116 of the frame 112 and the lateral end may be positioned proximate a temple region of a respective one of the opposing arm members 114a, 114b. The frame 112 may include a respective arcuate profile on each of opposing sides 122, 124 of the central reference plane 126 that generally reflects that of the respective virtual or augmented reality eyepieces 130a, 130b.

The headset 110 may include a pair of linear rails 118a, 120a and 118b, 120b on each of opposing sides 122, 124 of the frame 112 to guide a respective one of the virtual or augmented reality eyepieces 130a, 130b. In addition, for each of the opposing sides 122, 124 of the frame 112, a first one of the linear rails 118a, 118b may be located proximate the bridge 116 at an upper region of the headset 110 to guide an upper portion of the medial end of the respective virtual or augmented reality eyepiece 130a, 130b and a second one of the linear rails 120a, 120b may be located proximate the bridge 116 at a lower region of the headset 110 to guide a lower portion of the medial end of the respective virtual or augmented reality eyepiece 130a, 130b. In this manner, at least two linear rails 118a, 120a and 118b, 120b may be provided on each of opposing sides 122, 124 of the frame 112 to guide a respective one of the virtual or augmented reality eyepieces 130a, 130b. The two linear rails 118a, 120a and 118b, 120b on each side 122, 124 may be located proximate the bridge 116 to guide the medial end of the respective virtual or augmented reality eyepiece 130a, 130b and support the eyepiece in a cantilevered manner. The two linear rails 118a, 120a and 118b, 120b on each of opposing sides 122, 124 of the frame 112 may be vertically offset from each other and may form a fork structure with a respective arm member 114a, 114b of the frame 112. The eyepieces 130a, 130b may be received within the tines of the fork structure. In an alternate embodiment, the two linear rails 118a, 120a and 118b, 120b on each of opposing sides 122, 124 of the frame 112 and a portion of the bridge 116 may form a fork structure oriented away from the central plane 126 to support the eyepieces 130a, 130b.

As can be appreciated from the embodiment shown in FIGS. 7 through 12, the eyepieces 130a, 130b, may be generally arc-shaped and may move transversely along the linear rails 118a, 118b, 120a, 120b between an extreme medial position nearer the central plane 126 and an extreme lateral position farther from the central plane 126. The eyepieces 130a, 130b may be located at any position between the extreme end positions and secured in place with a lock or other fastening mechanism or fixation method.

FIGS. 13 through 18 show another example embodiment of a virtual or augmented reality headset 210. The headset 210 includes a frame 212 and a pair of virtual or augmented reality eyepieces 230a, 230b supported by the frame 212. The frame 212 has opposing arm members 214a, 214b, a bridge 216 positioned intermediate the opposing arm members 214a, 214b, and a plurality of linear rails 220a, 220b. More particularly, a single linear rail 220a, 220b is provided at each of opposing sides 222, 224 of the frame 212 defined by a central reference plane 226. As shown in FIGS. 13 through 18, the linear rails 220a, 220b may transition to curvilinear rails or rail portions beyond the adjustability range of the eyepieces 230a, 230b.

Again, the pair of virtual or augmented reality eyepieces 230a, 230b each have an optical center 232a, 232b, a distance between which defines an interpupillary distance IPD. The eyepieces 230a, 230b are movably coupled to the plurality of linear rails 220a, 220b to enable adjustment of the interpupillary distance IPD as desired to correspond to or more closely correspond to an actual interpupillary distance between the pupils of a wearer.

The headset 210 further includes an adjustment mechanism 234 coupled to both of the pair of virtual or augmented reality eyepieces 230a, 230b. The adjustment mechanism 234 is operable to simultaneously move the eyepieces 230a, 230b in adjustment directions 242, 244 aligned with the linear rails 220a, 220b to adjust the interpupillary distance IPD. The virtual or augmented reality eyepieces 230a, 230b are movable between a fully collapsed or narrowest configuration (FIGS. 13, 14 and 16-18) and a fully expanded or widest configuration (FIG. 15). The frame 212, eyepieces 230a, 230b, and rails 220a, 220b are configured relative to each other such that a difference between the interpupillary distance IPD in the fully expanded or widest configuration and the interpupillary distance IPD in the fully collapsed or narrowest configuration is between about 20 mm and about 24 mm. As such, each individual eyepiece 230a, 230b may be adjusted a distance between about 10 mm and 12 mm. It is appreciated, however, that in some embodiments, more or less adjustment may be provided.

Again, a nosepiece 236 may be provided at the bridge 216 of the frame 212 to engage a nose of the user and support the virtual or augmented reality eyepieces 230a, 230b in front of the user’s eyes during use. The nosepiece 236 may be integrally formed as a portion of the bridge 216, fixedly secured to the bridge 216 or removably coupled to the bridge 216. In some embodiments, the nosepiece 236 may be removably coupleable to a base portion of the bridge 216 and impede the travel of the adjustment mechanism 234 to lock the virtual or augmented reality eyepieces 230a, 230b in a selected position. In other instances, a lock may be provided on each eyepiece 230a, 230b, to clamp to a respective one of the linear rails 220a, 220b, or vice versa. In this manner, a user may selectively unlock the eyepieces 230a, 230b for adjustment, adjust the eyepieces 230a, 230b transversely to a new interpupillary distance IPD, and lock the eyepieces 230a, 230b in place at the new interpupillary distance IPD. The lock may include, for example, one or more clamps, set screws, clips or other fasteners to impede movement of the adjustment mechanism 234 and/or eyepieces 230a, 230b, or otherwise fix the same in place. The lock may be spring-biased toward a locked position.

With continued reference to FIGS. 13 through 18, each virtual or augmented reality eyepiece 230a, 230b may be arcuate and include a medial end and a lateral end. The medial end may be positioned proximate the bridge 216 of the frame 212 and the lateral end may be positioned proximate a temple region of a respective one of the opposing arm members 214a, 214b. The frame 212 may include a respective arcuate profile on each of opposing sides 222, 224 of the central reference plane 226 that generally transitions with that of the respective eyepieces 230a, 230b.

The headset 210 includes a single linear rail 220a, 220b on each of opposing sides 222, 224 of the frame 212 to guide a respective one of the virtual or augmented reality eyepieces 230a, 230b. The linear rail 220a, 220b of each side 222, 224 may be located remote from the bridge 216 and may underlie the respective eyepiece 230a, 230b to guide a lower portion of the eyepiece 230a, 230b only.

As can be appreciated from the embodiment shown in FIGS. 13 through 18, the eyepieces 230a, 230b, may be generally arc-shaped and may move transversely along the linear rails 220a, 220b between an extreme medial position nearer the central plane 226 and an extreme lateral position farther from the central plane 226. The eyepieces 230a, 230b may be located at any position between the extreme end positions and secured in place with a lock or other fastening mechanism or fixation method.

FIGS. 19 through 24 show yet another example embodiment of a virtual or augmented reality headset 310. The headset 310 includes a frame 312 and a pair of virtual or augmented reality eyepieces 330a, 330b supported by the frame 312. The frame 312 has opposing arm members 314a, 314b, a bridge 316 positioned intermediate the opposing arm members 314a, 314b, and a plurality of linear rails 318a, 318b. More particularly, a single linear rail 318a, 318b is provided at each of opposing sides 322, 324 of the frame 312 defined by a central reference plane 326. As shown in FIGS. 19 through 24, the linear rails 318a, 318b may be concealed or substantially concealed within the eyepieces 330a, 330b.

Again, the pair of virtual or augmented reality eyepieces 330a, 330b each have an optical center 332a, 332b, a distance between which defines an interpupillary distance IPD. The eyepieces 330a, 330b are movably coupled to the plurality of linear rails 318a, 318b to enable adjustment of the interpupillary distance IPD as desired to correspond to or more closely correspond to an actual interpupillary distance between the pupils of a wearer.

The headset 310 further includes an adjustment mechanism 334 coupled to both of the pair of virtual or augmented reality eyepieces 330a, 330b. The adjustment mechanism 334 is operable to simultaneously move the eyepieces 330a, 330b in adjustment directions 342, 344 aligned with the linear rails 318a, 318b to adjust the interpupillary distance IPD. The virtual or augmented reality eyepieces 330a, 330b are movable between a fully collapsed or narrowest configuration (FIGS. 19, 20 and 22-24) and a fully expanded or widest configuration (FIG. 21). The frame 312, eyepieces 330a, 330b, and rails 318a, 318b are configured relative to each other such that a difference between the interpupillary distance IPD in the fully expanded or widest configuration and the interpupillary distance IPD in the fully collapsed or narrowest configuration is between about 20 mm and about 24 mm. As such, each individual eyepiece 330a, 330b may be adjusted a distance between about 10 mm and 12 mm. It is appreciated, however, that in some embodiments, more or less adjustment may be provided.

Again, a nosepiece 336 may be provided at the bridge 316 of the frame 312 to engage a nose of the user and support the virtual or augmented reality eyepieces 330a, 330b in front of the user’s eyes during use. The nosepiece 336 may be integrally formed as a portion of the bridge 316, fixedly secured to the bridge 316 or removably coupled to the bridge 316. In some embodiments, the nosepiece 336 may be removably coupleable to a base portion of the bridge 316 and impede the travel of the adjustment mechanism 334 to lock the virtual or augmented reality eyepieces 330a, 330b in a selected position. In other instances, a lock may be provided on each eyepiece 330a, 330b, to clamp to a respective one of the linear rails 318a, 318b, or vice versa. In this manner, a user may selectively unlock the eyepieces 330a, 330b for adjustment, adjust the eyepieces 330a, 330b transversely to a new interpupillary distance IPD, and lock the eyepieces 330a, 330b in place at the new interpupillary distance IPD. The lock may include, for example, one or more clamps, set screws, clips or other fasteners to impede movement of the adjustment mechanism 334 and/or eyepieces 330a, 330b, or otherwise fix the same in place. The lock may be spring-biased toward a locked position.

With continued reference to FIGS. 19 through 24, each virtual or augmented reality eyepiece 330a, 330b may include a straight-line construction with flared lateral ends. A medial end of each eyepiece 330a, 330b may be positioned proximate the bridge 316 of the frame 312 and the lateral end may be positioned proximate a temple region of a respective one of the opposing arm members 314a, 314b. The frame 312 may include a respective straight-line construction on each of opposing sides 322, 324 of the central reference plane 326 that generally mimics that of the respective eyepieces 330a, 330b.

The headset 310 includes a single linear rail 318a, 318b on each of opposing sides 322, 324 of the frame 312 to guide a respective one of the virtual or augmented reality eyepieces 330a, 330b. The linear rail 318a, 318b of each side 322, 324 may be located above a horizontal plane defined by the optical centers of the eyepiece 330a, 330b to guide an upper portion of the eyepiece 330a, 330b only. The eyepiece 330a, 330b may hang from the rails 318a, 318b.

As can be appreciated from the embodiment shown in FIGS. 19 through 24, the eyepieces 330a, 330b, may have a generally straight-lined construction and may move transversely along the linear rails 318a, 318b between an extreme medial position nearer the central plane 326 and an extreme lateral position farther from the central plane 326. The eyepieces 330a, 330b may be located at any position between the extreme end positions and secured in place with a lock or other fastening mechanism or fixation method.

FIGS. 25 through 27 show another example embodiment of a virtual or augmented reality headset 410. The headset 410 includes a frame 412 and a pair of virtual or augmented reality eyepieces 430a, 430b supported by the frame 412. The frame has opposing arm members 414a, 414b, a bridge 416 positioned intermediate the opposing arm members 414a, 414b, and an adjustment mechanism 434. The adjustment mechanism 434 includes a rotary dial 436 with a coaxial output shaft or pin 438 that extends axially and rotatably couples to the bridge 416. A pair of gear pinions 440a, 440b are mounted to the output pin 438 and are positioned at opposite sides of the rotary dial 436. Gear pinion 440b may be a mirror image of gear pinion 440a, simply reflected across a plane that bisects the rotary dial 436 and is perpendicular to a rotational axis thereof. Each of the gear pinions 440a, 440b is sized and shaped to releasably and simultaneously engage a respective gear rack 442a, 442b. Each of the gear racks 442a, 442b is coupleable to a respective virtual or augmented reality eyepiece 430a, 430b.

With continued reference to FIGS. 25 through 27, each of the opposing arm members 414a, 414b includes a respective guide pin 444 coupled thereto. The guide pins 444 are positioned proximate the temple region and, more particularly, between the temple and ear regions of a wearer. Each of the guide pins 444 extends through respective arm member apertures 446 and is received by a respective eyepiece aperture 448. A cylindrical projection 450 extends inwardly from each eyepiece aperture 448. The cylindrical projection 450 is sized and shaped to be slideably received by the respective arm member apertures 446 when the headset 410 is in a collapsed or narrowest configuration. In some embodiments, the arm member apertures 446 may include a counterbore or a countersink to allow the guide pin 444 head to sit at least flush with an interior surface of the opposing arm members 414a, 414b to substantially or completely conceal the guide pin 444 from view when in the fully collapsed or narrowest configuration and/or when in the fully expanded or widest configuration. Further, in some embodiments, the opposing arm member apertures 446 may include bushings coupled thereto in order to reduce wear and friction and to guide or constrain the motion of the headset 410. The bushings may be lubricated or unlubricated.

With continued reference to FIGS. 25 through 27, rotation of the gear pinions 440a, 440b via the rotary dial 436 in a clockwise direction causes the gear pinions 440a, 440b to engage the respective gear racks 442a, 442b. Such engagement moves the virtual or augmented reality eyepieces 430a, 430b approximately equal distances simultaneously and outwardly relative to the rotary dial 436. At the temporal or lateral end, moreover, the guide pins 444 assist in guiding the virtual or augmented reality eyepieces 430a, 430b as they move outwardly relative to the opposing arm members 414a, 414b. Conversely, counterclockwise rotation of the rotary dial 436 causes the gear racks 442a, 442b to move approximately equal distances simultaneously and inwardly relative to the rotary dial 436. Similarly, the guide pins 444 assist in guiding the virtual or augmented reality eyepieces 430a, 430b as they move inwardly relative to the opposing arm members 414a, 414b.
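
As a rough, hedged illustration of the rack-and-pinion relationship described above (not taken from the patent), the sketch below assumes a hypothetical pinion pitch radius: each rack translates by the pitch radius times the dial rotation angle, and because the two eyepieces move apart simultaneously the interpupillary distance changes by twice that amount.

```python
import math

def ipd_change_mm(pitch_radius_mm: float, dial_rotation_deg: float) -> float:
    """IPD change for a given dial rotation.

    Each rack travels (pitch radius * rotation angle in radians); the two eyepieces
    move outward together, so the IPD changes by twice that amount.
    """
    theta = math.radians(dial_rotation_deg)
    return 2.0 * pitch_radius_mm * theta

# Hypothetical pinion with a 3 mm pitch radius: a 95.5 degree turn gives ~10 mm of IPD change.
print(round(ipd_change_mm(3.0, 95.5), 2))
```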

By manipulating the adjustment mechanism 434 to move the virtual or augmented reality eyepieces 430a, 430b inwardly or outwardly, the interpupillary distance IPD can be conveniently controlled by a user. By way of example, in the illustrated embodiment of the headset 410, the gear racks 442a, 442b are sized and shaped to allow movement of the virtual or augmented reality eyepieces 430a, 430b relative to the rotary dial 436 such that a difference between the interpupillary distance IPD in the fully expanded or widest configuration (FIGS. 25-27) and the interpupillary distance IPD in the fully collapsed or narrowest configuration is between 10 mm and about 12 mm. It is appreciated, however, that in some embodiments, more or less adjustment may be provided.

To allow the user access to the rotary dial 436, the bridge 416 includes a recess 452 through which a portion of the rotary dial 436 protrudes outwardly. The user may rotate the rotary dial 436 to adjust the interpupillary distance IPD until the optimal interpupillary distance IPD for the user is determined. Once the optimal interpupillary distance IPD is set, each of the virtual or augmented reality eyepieces 430a, 430b can be locked in place with a lock. The lock may include, for example, one or more clamps, set screws, clips or other fasteners to impede movement of the adjustment mechanism 434 and/or eyepieces 430a, 430b, or otherwise lock the same. The lock may be spring-biased toward a locked position.

The adjustment mechanism 434 may further include a cover 453 to releasably attach to the recess 452 in the bridge 416. The cover 453 may substantially seal the adjustment mechanism 434 from the environment, such as water or moisture ingress, and may also selectively control access to the rotary dial 436 during use. In some embodiments, the cover 453 may include male connectors that can snap into place when matingly received by a female connector located in the recess 452 of the bridge 416. In other embodiments, the cover 453 may include any number of posts or pegs that may extend outwardly. The posts or pegs may be received by holes or detents in the recess 452 of the bridge 416 to releasably secure the cover 453 to the bridge 416. As can be appreciated from the foregoing, other mechanisms may be used to releasably attach the cover 453 to the headset 410.

FIG. 28 shows another example embodiment of a virtual or augmented reality headset 510 in an expanded or widest configuration. The headset includes a frame 512 and a pair of virtual or augmented reality eyepieces 530a, 530b supported by the frame 512. The frame has opposing arm members 514a, 514b, a bridge 516 positioned intermediate the opposing arm members 514a, 514b, and a plurality of linear rails 518a, 518b. More particularly, a single linear rail 518a, 518b is provided at each of opposing sides 522, 524 of the frame 512 defined by a central reference plane 526.

Again, the pair of virtual or augmented reality eyepieces 530a, 530b each has an optical center 532a, 532b, a distance between which defines an interpupillary distance IPD. The eyepieces 530a, 530b are movably coupled to the plurality of linear rails 518a, 518b to enable adjustment of the interpupillary distance IPD as desired to correspond to or more closely correspond to an actual interpupillary distance IPD between the pupils of a wearer.

The headset 510 further includes an adjustment mechanism 534 coupled to both of the pair of virtual or augmented reality eyepieces 530a, 530b. The adjustment mechanism illustrated in FIG. 28 includes a linear actuator device 560 to convert rotary motion into linear motion, such as a lead screw, jackscrew, ball screw, roller screw, or other types of devices that may mechanically convert rotary motion into linear motion. By way of example, FIG. 28 illustrates a lead screw with some of the hardware, such as a control knob, nuts, etc., removed for clarity. The linear actuator device 560 is coupled to a pair of links 562a, 562b. The links 562a, 562b are angularly spaced apart relative to each other and about the central reference plane 526. At a lower end, the links 562a, 562b are coupled to the respective linear rails 518a, 518b.

The adjustment mechanism 534 allows the user to manipulate the interpupillary distance IPD by moving the virtual or augmented reality eyepieces 530a, 530b inwardly or outwardly relative to the adjustment mechanism 534. The user can rotate the control knob of the linear actuator device 560 in a clockwise direction, which causes a linear extension of the linear actuator device 560 shaft. This linear extension causes an increase in the angular displacement of the links 562a, 562b relative to one another, resulting in an outward linear translation of the respective rails 518a, 518b and the virtual or augmented reality eyepieces 530a, 530b. Conversely, the user can rotate the control knob of the linear actuator device 560 in a counterclockwise direction to cause an inward movement of the virtual or augmented reality eyepieces 530a, 530b in a similar manner.
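
The relationship between link swing and lateral translation described above can be sketched with a simplified scissor-like geometry (this is an illustration only; the patent gives no dimensions, and the link length and angles below are hypothetical assumptions): each link contributes a lateral offset of link length times the sine of its angle from the central reference plane, so the IPD changes by twice the difference between the two angular positions.

```python
import math

def ipd_change_from_link_angles(link_length_mm: float,
                                angle_before_deg: float,
                                angle_after_deg: float) -> float:
    """IPD change when both links swing from angle_before to angle_after.

    Each link's lateral offset from the central reference plane is
    link_length * sin(angle); both sides move, so the IPD change is twice the difference.
    """
    a0 = math.radians(angle_before_deg)
    a1 = math.radians(angle_after_deg)
    return 2.0 * link_length_mm * (math.sin(a1) - math.sin(a0))

# Hypothetical 20 mm links swinging from 10 to 27 degrees give ~11.2 mm of additional IPD.
print(round(ipd_change_from_link_angles(20.0, 10.0, 27.0), 2))
```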

The adjustment mechanism 534 can be substantially or completely concealed from view by housing the adjustment mechanism 534 within the bridge 516. The bridge 516 may further include a recess to allow a portion of the control knob to protrude outwardly. A cover may releasably attach to the recess in the bridge 516. The cover may substantially seal the adjustment mechanism 534 from the environment, such as water or moisture ingress, and may also selectively control access to the control knob during use. In some embodiments, the cover may include male connectors that can snap into place when matingly received by a female connector located in the recess of the bridge 516. In other embodiments, the cover may include any number of posts or pegs that may extend outwardly. The posts or pegs may be received by holes or detents in the recess of the bridge 516 to releasably secure the cover to the bridge 516. As can be appreciated from the foregoing, other mechanisms may be used to releasably attach the cover to the headset 510.

In some embodiments, the adjustment mechanisms described herein may be controlled electro-mechanically. One or more motors may be electro-mechanically coupled to the linear actuator device, such as a lead screw, jack screw, ball screw, roller screw, etc. The rotary motion of the motors may be converted into linear motion through the linear actuator device to cause inward or outward movement of the virtual or augmented eyepieces. The motors may be servo motors, stepper motors, or other types of electric motors. To control movement of the virtual or augmented eyepieces, the motors may be electrically coupled to an electronic controller. The electronic controller may include a microcontroller and a motor driver to control and drive the motors. Moreover, the microcontroller may comprise a microprocessor, memory, and a plurality of peripheral devices to form a system on a chip that may be applicable for a wide variety of applications.
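
For the electro-mechanical variant, a minimal sketch of the step count a controller might command is shown below. The lead, steps-per-revolution and microstepping values are hypothetical, and the sketch assumes, purely for illustration, that one screw revolution changes the IPD by the screw lead; a real mechanism would fold in its own gear or linkage ratio.

```python
def steps_for_ipd_change(delta_ipd_mm: float,
                         lead_mm_per_rev: float,
                         steps_per_rev: int = 200,
                         microsteps: int = 16) -> int:
    """Number of stepper pulses needed for a desired IPD change.

    Hypothetical assumption: one screw revolution changes the IPD by the screw lead.
    """
    revolutions = delta_ipd_mm / lead_mm_per_rev
    return round(revolutions * steps_per_rev * microsteps)

# Hypothetical 2 mm lead, 200 steps/rev, 16x microstepping: a 1.5 mm IPD change needs 2400 pulses.
print(steps_for_ipd_change(1.5, 2.0))
```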

In some embodiments, the adjustment mechanism may include one or more piezoelectric motors. The one or more piezoelectric motors may include piezoelectric linear actuators, which may be coupled to the virtual or augmented reality eyepieces to cause inward or outward movement of the virtual or augmented reality eyepieces. To control movement of the virtual or augmented eyepieces, the piezoelectric motors may be electrically coupled to an electronic controller. The electronic controller may include a microcontroller and a piezoelectric motor driver to control and drive the piezoelectric motor. Moreover, the microcontroller may comprise a microprocessor, memory, and a plurality of peripheral devices to form a system on a chip that may be applicable for a wide variety of applications.

Moreover, the various embodiments described above can be combined to provide further embodiments. These and other changes can be made to the embodiments in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the claims to the specific embodiments disclosed in the specification and the claims, but should be construed to include all possible embodiments along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.

Snap Patent | System and method for dynamic images virtualisation https://patent.nweon.com/26749 Thu, 26 Jan 2023 15:16:35 +0000 https://patent.nweon.com/?p=26749 ...

Patent: System and method for dynamic images virtualisation

Patent PDF: Available with Nweon (映维网) membership

Publication Number: 20230022344

Publication Date: 2023-01-26

Assignee: Snap Inc

Abstract

A dynamic image virtualization system and method configured to utilize an AI model in order to conduct a reduced latency real-time prediction process upon at least one input image, wherein said prediction process is designated to create free-viewpoint 3D extrapolated output dynamic images tailored in advance to the preferences or needs of a user and comprising more visual data than the at least one input image.

Claims

1.A dynamic images virtualization system, comprising: (i) a controller configured to perform digital image processing by using an AI model trained to perform data fetching prediction upon at least one input image in order to produce extrapolated output dynamic images, wherein said at least one input image is generated by a static 2D computer generated imagery (CGI); and (ii) at least one display means configured to present said extrapolated output dynamic images to at least one user, wherein the at least one input image is generated offline and the extrapolated output dynamic images are free viewpoint 3D images and wherein said data fetching prediction process has a reduced latency and results in the production of extrapolated output dynamic images that comprise novel images as well as novel multi directional and image scenery parameters in comparison with the at least one input image.

2.The system of claim 1, wherein at least one input image is subdivided into multiple image tiles.

3.The system of claim 1, wherein the reduced-latency prediction process is conducted using a content-delivery-network (CDN).

4.The system of claim 1, wherein the reduced latency prediction process is configured to produce extrapolated output dynamic images by calculating and generating subsequent future tiles that are based on the at least one input image, wherein said extrapolated output dynamic images comprise novel images as well as novel multi directional and image scenery parameters in comparison with the at least one input image.

5.The system of claim 4, wherein each tile includes an array of visual data.

6.The system of claim 5, wherein the array of visual data of each tile is compressed.

7.The system of claim 4, wherein each tile is a multi-resolution tile.

8.The system of claim 4, wherein each tile is a multi-view compressed tile.

9.The system of claim 4, wherein each tile is temporally compressed.

10.The system of claim 4, wherein each tile is combined with at least one other tile to create a larger tile comprising the visual data of said combined tiles.

11.The system of claim 4, wherein the extrapolated output dynamic images comprise an unrestricted stack of overlay layers and resolution pyramids.

12.The system of claim 1, wherein the extrapolated output dynamic images provide an input to the AI model that was trained to conduct image quality enhancement using DNN.

13.The system of claim 1, wherein further image quality enhancement is performed upon the extrapolated output dynamic images using Super resolution (SP) technique.

14.The system of claim 1, wherein the digital image processing performs streaming of object-centric volumetric content presented to the at least one user using the at least one display means.

15.The system of claim 1, wherein the digital image processing performs streaming of view-centric volumetric content presented to the at least one user using the at least one display means.

16.The system of claim 1, wherein the extrapolated output dynamic images are presented using unstructured light-field technology.

17.The system of claim 1, wherein the extrapolated output dynamic images are presented using billboard based quad rendering.

18.The system of claim 1, wherein the at least one input image is created and then displayed as extrapolated output dynamic images by using a view-dependent reconstruction of a virtual camera.

19.The system of claim 1, wherein the extrapolated output dynamic images display a virtualized architectural space or structure.

20.The system of claim 1, wherein the extrapolated output dynamic images display at least one virtualized visual effect.

21.-44. (canceled)

Description

FIELD OF THE INVENTION

The present invention relates to image virtualization systems in general and in particular to a low-latency image virtualization system used to produce dynamic images and comprising a prediction ability.

BACKGROUND OF THE INVENTION

Images virtualization systems may be used for many purposes, for example, they may be used to visualize objects or surroundings from different perspectives or to provide an immersive sensation enabling a user to explore environments or objects of interest. In order to achieve these abilities, a visualization system preferably needs to provide a constant operation with a minimal latency while preferably using minimal computational requirements and resources. For example, a virtualization system that is configured to provide an immersive experience, for example, by using augmented reality (AR) or mixed reality (MR), is required to provide real-time monitoring of a user’s bearings with minimum response delay. These abilities are hard to reach in an uncontrolled network environment.

Known virtualization systems are challenged by limited computational resources and as a result, the visual quality of the 3D content displayed to a user of such a system is relatively poor compared to the quality of feature films or computer games.

One reason for the above-mentioned difficulties is the fact that image files may be very large and can typically span from several megabytes to several gigabytes in size and as a result, may be impractical to distribute over a network with limited bandwidth. Even when there is no real-time requirement, the time used for transferring image files may be too long to be of a practical use.

Several approaches disclosed in prior art Publications and the drawbacks they pose are disclosed below:

Images cloud rendering—Cloud rendering may pose various drawbacks. One such drawback relates to the fact that attempts to offload rendering resources into cloud computing systems turned out to be sensitive to latency resulting from disruptions in network communication. The fact that costs associated with cloud computing grow linearly with an increase in the number of customers consuming its content makes the use of such systems challenging from a business model perspective.

Procedural real-time rendering on customer device—Such an approach may be limited in its visual quality results due to limited local computing resources and may also require a long start-up time which has the potential to increase latency and affect a desired real-time operation.

Point cloud streaming—This technology can only stream low visual quality images due to the fact that it supports Lambertian surfaces only. The scalability of this technology may be limited where large complex volumetric topology is involved.

As previously mentioned, several Publications disclose image virtualization systems. For example, Publication US 2006/0061584 A1 discloses a method, system and device for distributing in real time data related to three-dimensional computer-modeled image scenes over a network. Said publication discloses the use of mipmap textures technology in order to reduce image size and efficiently render the data over the network.

Publication US 2006/0061584 A1 does not disclose applying a prediction process that results in creating extrapolated output dynamic images that comprise more visual data than the input image/s. Moreover, Publication US 2006/0061584 A1 does not disclose a prediction process that creates the further visual data by using an AI model of any sort. The use of AI in the current application enables a prediction process that provides a reduced latency real-time prediction and, in turn, enables the creation of extrapolated output dynamic images tailored in advance to the preferences or needs of a user.

SUMMARY OF THE INVENTION

The present invention provides a dynamic images virtualization system that comprises low-latency virtualization abilities and can be used to produce and display enhanced-quality dynamic images comprising broadened visual data by using AI models.

The present invention uses AI models to conduct a low-latency prediction process in real-time while requiring relatively low computing resources.

The invention is further implemented by using AI in order to enhance the image quality of said extrapolated output dynamic images, hence providing an efficient rendering technique to compress and decode visual data while displaying a real-time stream of high-quality images to a user.

The present invention suggests using a virtualization system to create extrapolated real-time output dynamic images tailored in advance to the preferences or needs of a user while requiring a modest amount of computing resources.

The following embodiments and aspects thereof are described and illustrated in conjunction with systems, devices and methods which are meant to be exemplary and illustrative, not limiting in scope. In various embodiments, one or more of the above-described problems have been reduced or eliminated, while other embodiments are directed to other advantages or improvements.

According to one aspect, there is provided a dynamic images virtualization system, comprising a controller configured to perform digital image processing upon at least one input image and produce extrapolated output dynamic images and at least one display means configured to present said extrapolated output dynamic images to at least one user.

According to some embodiments, said digital image processing comprises a reduced-latency prediction process that results in extrapolated output dynamic images comprising more visual data than the at least one input image.

According to some embodiments, at least one input image is subdivided into multiple image tiles.

According to some embodiments, an AI model is trained to perform a data fetching prediction in order to conduct the reduced-latency prediction process that produces the extrapolated output dynamic images.

According to some embodiments, the reduced-latency prediction process is conducted using a content-delivery-network (CDN).

According to some embodiments, the reduced latency prediction process is configured to produce extrapolated output dynamic images by calculating and suggesting subsequent future tiles that are based on the at least one input image and comprise further visual data than the at least one input image.

According to some embodiments, each tile includes an array of visual data that may be compressed.

According to some embodiments, each tile is a multi-resolution tile, a multi-view compressed tile or temporally compressed tile.

According to some embodiments, each tile is combined with at least one other tile to create a larger tile comprising the visual data of said combined tiles.

According to some embodiments, the extrapolated output dynamic images comprise an unlimited stack of overlay layers and resolution pyramids.

According to some embodiments, the extrapolated output dynamic images provide an input to an AI model that was trained to conduct image quality enhancement using DNN.

According to some embodiments, further image quality enhancement is performed upon the extrapolated output dynamic images using the SP technique.

According to some embodiments, the digital image processing performs streaming of object-centric volumetric content presented to the at least one user using the at least one display means.

According to some embodiments, the digital image processing performs streaming of view-centric volumetric content presented to the at least one user using the at least one display means.

According to some embodiments, the extrapolated output dynamic images are presented using unstructured light-field technology.

According to some embodiments, the extrapolated output dynamic images are presented using billboard based quad rendering.

According to some embodiments, the at least one input image is created and then displayed as extrapolated output dynamic images by using a view-dependent reconstruction of a virtual camera.

According to some embodiments, the at least one input image is captured using a hardware camera.

According to some embodiments, the at least one input image is created using computer generated imagery.

According to some embodiments, the at least one input image is a 2D image and the extrapolated output dynamic images are 3D images.

According to some embodiments, the extrapolated output dynamic images display a virtualized architectural space or structure.

According to some embodiments, the extrapolated output dynamic images display at least one virtualized visual effect.

According to some embodiments, the bearings of the at least one user are captured by at least one sensor and relayed to and analyzed by the controller.

According to some embodiments, the digital image processing uses multiple layers of caching.

According to some embodiments, the extrapolated output dynamic images can be relayed using a wireless network or a wired network.

According to some embodiments, the extrapolated output dynamic images are conveyed using remote streaming.

According to some embodiments, the at least one display means is a mobile cellular device or a head-mounted display (HMD).

According to some embodiments, the processed input images are protected using authentication or verification algorithms.

According to a second aspect, there is provided a method for using a dynamic images virtualization system comprising the steps of capturing or creating at least one input image, applying compression upon the at least one input image, hence relatively reducing the size of each image tile, creating a data-set and its associated metadata, applying reduced-latency prediction based on the created data-set, applying decompression by restoring compressed image tiles and extracting encrypted data, creating extrapolated output dynamic images and presenting the extrapolated output dynamic images to a user.

According to some embodiments, the reduced latency prediction process is configured to produce extrapolated output dynamic images by calculating and suggesting subsequent future tiles that are based on the at least one input image and comprise further visual data than the at least one input image.

According to some embodiments, data regarding the bearings of the user is acquired and used during the dynamic image virtualization method.

According to some embodiments, Artificial intelligence (AI) techniques are used to process and analyze the captured input images.

According to some embodiments, compressed image tiles are distributed using content delivery network (CDN).

According to some embodiments, deep neural network (DNN) is applied in order to execute a fetching reduced-latency prediction process.

According to some embodiments, a controlled access on demand process is used to regulate the rendering of image tiles undergoing the fetching reduced-latency prediction process.

According to some embodiments, 3D images created after decompression of image tiles are converted into 2D extrapolated output dynamic images.

According to some embodiments, the extrapolated output dynamic images undergo quality enhancement processes performed by artificial intelligence (AI) trained model.

According to some embodiments, the extrapolated output dynamic images undergo image repair techniques in order to repair possible image defects.

According to a third aspect, there is provided a method for data processing using a dynamic images virtualization system comprising the steps of parsing metadata containing an array of statically defined data-structures, initializing the visual scene and camera, gathering data in order to present a user with tiles that represent the current position of the camera, extracting current and future subsequent probable tiles to be fetched and ultimately used for constructing extrapolated output dynamic images, updating texture atlases in accordance with extracted data, constructing extrapolated output dynamic images, applying image refinement techniques in order to improve the extrapolated output dynamic images presented to the user, predicting future positions of the camera using prediction techniques and gathering future tiles data based on future positions of the camera.

According to some embodiments, input images can be restored by creating extrapolated output images comprising unlimited stack of overlay layers and resolution pyramids.

According to some embodiments, each image tile comprises low frequency data.

According to some embodiments, each image tile is compressed using temporal compression.

According to some embodiments, input images are compressed using multi view compression.

According to a fourth aspect, there is provided a method for data compression using a dynamic images virtualization system comprising the steps of capturing or creating at least one input image, subdividing each captured input image into image tiles and applying compression techniques, hence relatively reducing the size of each image tile.

BRIEF DESCRIPTION OF THE FIGURES

Some embodiments of the invention are described herein with reference to the accompanying figures. The description, together with the figures, makes apparent to a person having ordinary skill in the art how some embodiments may be practiced. The figures are for the purpose of illustrative description and no attempt is made to show structural details of an embodiment in more detail than is necessary for a fundamental understanding of the invention. For the sake of clarity, some objects depicted in the figures are not to scale.

In the Figures:

FIG. 1 constitutes a schematic perspective view of a dynamic images virtualization system, according to some embodiments of the invention.

FIG. 2 constitutes a flowchart diagram illustrating a method for conducting a dynamic image virtualization using the dynamic image virtualization system, according to some embodiments of the invention.

FIG. 3 constitutes a flowchart diagram illustrating possible sub-operations previously disclosed in FIG. 2, according to some embodiments of the invention.

FIG. 4 constitutes a structure diagram illustrating various sub-operations of data-set structure of the various compression methods used during the operation of a dynamic images virtualization system, according to some embodiments of the invention.

FIG. 5 constitutes a flowchart diagram illustrating possible further sub-operations partly disclosed in FIG. 2 and FIG. 3, according to some embodiments of the invention.

DETAILED DESCRIPTION OF SOME EMBODIMENTS

In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be understood by those skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, and components, modules, units and/or circuits have not been described in detail so as not to obscure the invention. Some features or elements described with respect to one embodiment may be combined with features or elements described with respect to other embodiments. For the sake of clarity, discussion of same or similar features or elements may not be repeated.

Although embodiments of the invention are not limited in this regard, discussions utilizing terms such as, for example, “processing,” “computing,” “calculating,” “determining,” “establishing”, “analyzing”, “checking”, “setting”, “receiving”, or the like, may refer to operation(s) and/or process(es) of a controller, a computer, a computing platform, a computing system, or other electronic computing device, that manipulates and/or transforms data represented as physical (e.g., electronic) quantities within the computer’s registers and/or memories into other data similarly represented as physical quantities within the computer’s registers and/or memories or other information non-transitory storage medium that may store instructions to perform operations and/or processes.

Unless explicitly stated, the method embodiments described herein are not constrained to a particular order or sequence. Additionally, some of the described method embodiments or elements thereof can occur or be performed simultaneously, at the same point in time, or concurrently.

The term “Controller”, as used herein, refers to any type of computing platform that may be provisioned with a memory device, a Central Processing Unit (CPU) or microprocessors, and several input/output (I/O) ports, for example, a general-purpose computer such as a personal computer, laptop, tablet, mobile cellular phone or a cloud computing system.

The term “Artificial Intelligence” or “AI”, as used herein, refers to any computer model that can mimic cognitive functions such as learning and problem-solving. AI can further include specific fields such as artificial neural networks (ANN) and deep neural networks (DNN) that are inspired by biological neural networks.

The term “Content-Delivery-Network” or “CDN”, as used herein, refers to a geographically distributed network of servers and their data centers wherein said distribution provides a caching layer with low-latency data access.

The term “Unstructured Light Fields”, as used herein, refers to a faithful reproduction of 3D scenes by densely sampling light rays from multiple directions in an unstructured manner.

The term “Billboard Based Quad Rendering”, as used herein, refers to a technique of rendering 2D textured quadrilateral elements in a 3D world.

The term “Volumetric Content”, as used herein, refers to a video technique that captures three-dimensional images. This type of videography acquires data that can be viewed on ordinary screens as well as 3D and VR devices. The viewer may experience the volumetric content in real-time.

The term “Virtual Camera”, as used herein, refers to a computer-generated camera used to capture and present images in a virtual world. Virtual camera can capture and display objects or surroundings from multiple angles/distances as well as capture and display a user point of view (POV).

The term “Computer Generated Imagery (CGI)”, as used herein, refers to the application of computer graphics to create virtualized images wherein images created using CGI can be in any field such as, for example, art, media, computer games, simulations and marketing, etc. The CGI may be either dynamic or static and may comprise 2D, 3D or higher-dimensional images.

The term “Reduced Latency Prediction Process”, as used herein, refers to a process wherein probable image tiles are fetched and prepared to be presented to a user in accordance with a forecast based on calculating the likelihood that said tiles represent a future image of interest to said user. This process may result in reduced latency associated with image rendering.

The term “Extrapolated Output Dynamic Images”, as used herein, refers to a constant flow of images that comprise extended visual data with regard to the captured input images forming the base upon which said extrapolated output images are fetched and relayed.

The term “Multi-View Compression” (MVC or MVC 3D), as used herein, refers to a compression method which is based on the similarity of images acquired from various viewpoints of a scene by multiple video cameras. For example, dynamic images (such as, for example, stereoscopic 3D video) that are captured simultaneously using multiple cameras that capture images from various angles and create a single video stream may be compressed using this technology. According to some embodiments, free viewpoint dynamic images or multi-view 3D video may also be compressed using this technology, which results in images being efficiently reduced in size and rendered along the rendering pipeline.

The term “Temporal Compression”, as used herein, refers to compression of a sequence of image tiles along a timeline. For example, the temporal correlation that often exists between consecutive video frames, which may display objects or image features moving from one location to another, may be exploited using temporal tile compression in order to reduce the overall size in bytes of the video frames as well as the time required for images to be rendered along the rendering pipeline.

Reference is made to FIG. 1, which constitutes a schematic perspective view of a dynamic images virtualization system 10 according to some embodiments of the invention. As shown, dynamic images virtualization system 10 comprises a controller 100 configured to execute digital image processing and to control various devices forming the dynamic images virtualization system 10. According to some embodiments, at least one display means 200 is configured to display extrapolated output dynamic images produced by the controller 100 to at least one user 20. According to some embodiments, controller 100 may be a separate device or may be integrated into or form a part of the display means 200. According to some embodiments, display means 200 comprises image capturing component 202 that can be, for example, a camera or any other kind of image capturing sensor.

According to some embodiments, display means 200 is a head-mounted display (HMD) configured to produce images to be perceived by the user 20 associated with it. According to some embodiments, display means 200 may be an off-the-shelf component such as, for example, a head-mounted display (HMD) from manufacturers such as HTC or Oculus (e.g., HTC Vive®, Oculus Rift®, Oculus Quest®, etc.), Magic Leap (e.g., Magic Leap One) or Microsoft (e.g., HoloLens). According to some embodiments, display means 200 is an off-the-shelf mobile cellular device, a laptop or a tablet configured to be held and viewed by the at least one user 20.

According to some embodiments, display means 200 may comprise various sensors 204, such as, for example, motion sensors, accelerometer, etc. and the data recorded by said sensors may be conveyed and relayed to the controller 100 for analysis.

According to some embodiments, both controller 100 and display means 200 comprise either wired or wireless communication means (not shown) enabling a constant data transfer from display means 200 to controller 100 and vice versa.

Reference is made to FIG. 2 which constitutes a flowchart diagram illustrating a method for conducting a dynamic image virtualization using the dynamic image virtualization system 10, according to some embodiments of the invention. In operation 302, the method may include capturing at least one input image. According to some embodiments, input image/s may be captured by a hardware sensor that can be, for example, a camera, or alternatively, the input image/s may be captured by a virtual camera. According to some embodiments, the captured input image/s may be created using computer generated imagery (CGI). According to some embodiments, the captured input images are protected by authentication and verification algorithms to ensure exposure to an authorized user 20 only.

In operation 304, the method may include compressing the captured input image/s using various compression techniques and protocols. According to some embodiments, the captured input image/s is subdivided into independent tiles, which are then loaded into the rendering pipeline and in turn conveyed to user 20. According to some embodiments, the size of each tile is relatively reduced and requires a shorter time to be transferred over the network.
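
A minimal sketch of the tile subdivision described above is shown below; the tile size and the edge-padding strategy are hypothetical choices, since the patent does not prescribe either.

```python
import numpy as np

def subdivide_into_tiles(image: np.ndarray, tile_size: int):
    """Split an H x W x C image into independent square tiles.

    Edge tiles are padded so every tile has the same shape; each tile is keyed by its
    (row, column) grid position so it can be fetched and rendered independently.
    """
    h, w = image.shape[:2]
    pad_h = (-h) % tile_size
    pad_w = (-w) % tile_size
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="edge")
    tiles = {}
    for row in range(0, padded.shape[0], tile_size):
        for col in range(0, padded.shape[1], tile_size):
            tiles[(row // tile_size, col // tile_size)] = padded[row:row + tile_size,
                                                                 col:col + tile_size]
    return tiles

# Example: a 1080 x 1920 RGB frame split into 256-pixel tiles gives a 5 x 8 grid.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
print(len(subdivide_into_tiles(frame, 256)))   # 40 tiles
```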

According to some embodiments, the captured input images that are subdivided into independent tiles may be compressed to create 2D/3D output images comprising an unlimited stack of overlay layers and resolution pyramids. According to some embodiments, each image may include an array of compressed visual data such as, for example, color data (RGB), depth channel, transparency, motion vectors, normal maps, reflection/refraction vectors, etc.

According to some embodiments, each tile may be compressed using various tile compression techniques and protocols, such as quantization of YUV, ETC or DXT. According to some embodiments, image tile compression using the aforementioned techniques may help in minimizing the size in bytes of a graphics file without degrading the quality of said image to an unacceptable level. Image tile compression may also reduce the time required for images to be rendered along the rendering pipeline.
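
As one hedged illustration of "quantization of YUV" (this is not the patent's codec), the sketch below converts an RGB tile to BT.601 YUV and coarsely quantizes the chroma channels; the number of quantization levels is an arbitrary assumption.

```python
import numpy as np

def rgb_to_yuv(rgb: np.ndarray) -> np.ndarray:
    """Convert a float RGB tile (values in [0, 1]) to BT.601 YUV."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return np.stack([y, u, v], axis=-1)

def quantize_chroma(yuv: np.ndarray, levels: int = 16) -> np.ndarray:
    """Coarsely quantize the U and V channels while keeping luma at full precision."""
    out = yuv.copy()
    step = 1.0 / levels
    out[..., 1:] = np.round(out[..., 1:] / step) * step
    return out

# Example: quantizing the chroma of a random 256 x 256 tile to 16 levels per channel.
tile = np.random.rand(256, 256, 3).astype(np.float32)
compressed = quantize_chroma(rgb_to_yuv(tile))
print(compressed.shape)
```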

According to some embodiments, each tile may be compressed using multi-view compression MVC (also known as MVC 3D), which is based on the similarity of images acquired from various viewpoints, for example, images acquired from a moving scene that changes along a timeline or from a stationary scene captured from various angles. According to some embodiments, each tile may also be compressed using temporal compression of a sequence of image tiles along a timeline.
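
A minimal sketch of temporal compression of a tile sequence is shown below: the first tile is kept as a key frame and later tiles are stored as differences from their predecessors. This is only an illustrative delta scheme, not the specific temporal or multi-view codec the patent may use.

```python
import numpy as np

def temporal_delta_encode(tile_sequence):
    """Delta-encode a sequence of identically shaped tiles along a timeline.

    The first tile is stored as a key frame; every subsequent tile is stored as the
    (usually sparse) difference from its predecessor.
    """
    key = tile_sequence[0].astype(np.int16)
    deltas = [t.astype(np.int16) - p.astype(np.int16)
              for p, t in zip(tile_sequence[:-1], tile_sequence[1:])]
    return key, deltas

def temporal_delta_decode(key, deltas):
    """Rebuild the original tile sequence from the key frame and the stored deltas."""
    frames = [key]
    for d in deltas:
        frames.append(frames[-1] + d)
    return [f.astype(np.uint8) for f in frames]

# Example round trip over three consecutive 64 x 64 tiles.
seq = [np.full((64, 64, 3), v, dtype=np.uint8) for v in (10, 12, 15)]
key, deltas = temporal_delta_encode(seq)
restored = temporal_delta_decode(key, deltas)
print(all(np.array_equal(a, b) for a, b in zip(seq, restored)))   # True
```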

In operation 306, the method may include creating a data-set and its associated metadata. The metadata may contain an array of statically defined data-structures which define capturing and rendering properties such as, for example, data-structure of visual data, scale of the dataset in real-world units, available levels of details, proxy objects, resolution, compression, streaming parameters, deep neural network (DNN) weights for the current dataset, etc. According to some embodiments, an iterative process of rendering may start from the moment spatial relations between the virtual or hardware camera and the data-set orientation are defined.
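
The metadata of operation 306 might be organized along the lines of the sketch below; every field name here is a hypothetical stand-in, since the patent only lists the kinds of properties the metadata may describe rather than a concrete schema.

```python
import json

# Hypothetical schema; field names and values are illustrative assumptions only.
dataset_metadata = {
    "visual_data_layout": ["rgb", "depth", "alpha", "motion_vectors", "normal_map"],
    "real_world_scale_m": 0.001,          # dataset units expressed in metres
    "levels_of_detail": [0, 1, 2, 3],
    "proxy_objects": ["bounding_box"],
    "tile_resolution_px": 256,
    "compression": {"tiles": "dxt", "chroma": "yuv_quantized", "temporal": True},
    "streaming": {"protocol": "https", "cdn": True, "max_parallel_fetches": 8},
    "dnn_weights_uri": "weights/current_dataset.bin",
}

print(json.dumps(dataset_metadata, indent=2))
```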

According to some embodiments, the bearings of user 20 may be captured by sensors 204, such as motion sensors, accelerometer, etc. and the captured data may be conveyed and relayed to the controller 100 for analysis. According to some embodiments, said bearings analysis may comprise part of the data-set created in operation 306 and may be used in the execution of the prediction disclosed hereinafter in operation 308.

In operation 308, the method may include applying prediction based on the data-set created in operation 306. According to some embodiments, the prediction operation 308 may improve user experience as well as the exploitation of system’s resources by reducing the latency associated with a digital image processing.

According to some embodiments, the prediction process may apply a calculation regarding the likelihood that user 20 will be interested in viewing certain images’ properties, such as, certain angles or viewpoints, a 3D reconstruction of the captured input image/s, a fly-over or navigation view, visual effects or any other visual aspect that can be predicted upon the captured input image/s.

According to some embodiments, the prediction operation 308 may be conducted using a data fetching prediction process, for example, the prediction process may prefetch resources and predictable data even before the user 20 decides or makes any kind of operation implying what he is interested in viewing next. According to some embodiments, the data fetching prediction process can be accomplished by training a model using artificial intelligence (AI), such as artificial neural network (ANN) or deep neural network (DNN), etc. and suggesting probable tiles in accordance with the AI model results.

The use of AI can reduce latency by applying machine-learning in order to accurately predict the user 20 preferences and provide him with real-time output dynamic images. While suggesting probable tiles, the prediction operation 308 enables the creation of extrapolated output dynamic images that comprise more visual data than the captured input image/s (further disclosed in operation 310).

According to some embodiments, the data fetching prediction process can be conducted using content-delivery-network (CDN). The use of CDN may reduce latency by providing local cache from distributed servers and applying optimization processes regarding data rendering.

According to some embodiments, a quality enhancement of the extrapolated output dynamic images operation 312 may be conducted using a quality enhancement process accomplished by training a model using artificial intelligence (AI) such as, for example, artificial neural network (ANN) or deep neural network (DNN) and by using techniques such as DeepPrior or other super resolution solutions based on DNN, with or without the usage of Generative Adversarial Networks (GAN), in accordance with the AI model results.

The use of AI can enhance the output dynamic images quality that may have been reduced as a result of compression operation 304 or as a result of other operations disclosed in FIG. 2. According to some embodiments, applying machine-learning to a certain region of an image tile or to an entire image tile may fix or improve the overall visual quality of the output dynamic images presented to user 20.

According to some embodiments, the compression 304 and decompression 309 operations used to produce the extrapolated output dynamic image can be accomplished by training a model using artificial intelligence (AI) such as, for example, artificial neural network (ANN) or deep neural network (DNN).

The use of AI, ML, ANN or DNN during the phase of compression, by applying machine-learning analysis to the entire dataset or its parts, may serve to compress or decompress tile data on the basis of common semantics identified during changing conditions. According to some embodiments, said semantics are obtainable by machine learning training in order to enable and operate a super resolution technique.

According to some embodiments, the prediction process may use the user 20 bearings’ data gathered from sensors 204 and analyzed by controller 100 in order to present extrapolated output dynamic images in accordance with the position and movement of user 20, for example, a motion sensor may sense that a user 20 is turning its head in a certain direction, and relay said reading to controller 100 which in turn, according to said sensed movement, may apply a calculation regarding the likelihood that the user is headed to or interested in viewing images from that particular direction. According to some embodiments, a data fetching prediction process may then be applied and result in presenting user 20 with probable tiles forming desired extrapolated output dynamic images showing, for example, said particular direction.
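
A hedged sketch of how sensor bearings could drive tile prefetching is shown below. It uses a simple constant-rate extrapolation of head yaw as a stand-in for the trained AI model described above, and the tile layout (a ring of fixed-width tiles) is a hypothetical assumption.

```python
import math

def predict_yaw_deg(yaw_deg: float, yaw_rate_deg_s: float, horizon_s: float) -> float:
    """Extrapolate the head yaw a short horizon into the future (constant-rate model)."""
    return yaw_deg + yaw_rate_deg_s * horizon_s

def tiles_to_prefetch(predicted_yaw_deg: float, tile_span_deg: float,
                      n_tiles: int, spread: int = 1):
    """Map a predicted viewing direction to the ring of tiles most likely to be needed."""
    center = int(round((predicted_yaw_deg % 360.0) / tile_span_deg)) % n_tiles
    return [(center + offset) % n_tiles for offset in range(-spread, spread + 1)]

# Example: the sensors report a 40 deg/s head turn; with 30-degree tiles on a 12-tile ring,
# tiles around index 2 should be fetched ~0.25 s ahead of time.
yaw_future = predict_yaw_deg(50.0, 40.0, 0.25)
print(tiles_to_prefetch(yaw_future, tile_span_deg=30.0, n_tiles=12))   # [1, 2, 3]
```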

According to some embodiments, the prediction operation 308 may further use cache memory in order to provide a quick access data resource which in turn contributes to reducing latency. According to some embodiments, a replicated or distributed multi-tier cache memory architecture that comprises multiple layers may be used in order to further improve computing efficiency and reduce latency associated with digital image processing.
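
A toy two-tier cache along the lines described above might look as follows; the tier sizes and the fetch function are hypothetical stand-ins for a CDN request, and a production cache would also bound and persist the warm tier.

```python
from collections import OrderedDict

class TwoTierTileCache:
    """Toy two-tier cache: a small in-memory LRU backed by a larger warm tier.

    A miss in both tiers falls through to the supplied fetch function (e.g. a CDN request).
    """

    def __init__(self, fetch_fn, memory_capacity=64):
        self.fetch_fn = fetch_fn
        self.memory = OrderedDict()          # hot tier (LRU)
        self.disk = {}                       # warm tier (unbounded here for simplicity)
        self.memory_capacity = memory_capacity

    def get(self, tile_id):
        if tile_id in self.memory:
            self.memory.move_to_end(tile_id)
            return self.memory[tile_id]
        if tile_id in self.disk:
            tile = self.disk[tile_id]
        else:
            tile = self.fetch_fn(tile_id)    # slow path: remote fetch
            self.disk[tile_id] = tile
        self.memory[tile_id] = tile
        if len(self.memory) > self.memory_capacity:
            self.memory.popitem(last=False)  # evict least recently used
        return tile

# Example with a stand-in fetch function.
cache = TwoTierTileCache(fetch_fn=lambda tid: f"tile-bytes-{tid}")
print(cache.get((2, 3)))     # remote fetch
print(cache.get((2, 3)))     # served from the hot tier
```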

In operation 309, a decompression process may be applied in order to restore compressed image tiles and extract encrypted data. According to some embodiments, the compression 304 and decompression 309 operations of tiles used to produce the extrapolated output dynamic image can be accomplished by training a model using artificial intelligence (AI) such as, for example, artificial neural network (ANN) or deep neural network (DNN).

In operation 310, the method may include creating extrapolated output dynamic images that comprise more visual data than the input image/s captured in operation 302. According to some embodiments, the extrapolated output dynamic images are created in real-time, meaning the user 20 experiences a constant dynamic sensation of movement in an environment, or a constant dynamic sensation of viewing object/s from different perspectives. For example, user 20 may experience a real-time sensation of observing a commercial product from various angles/viewpoints/distances; alternatively, the user 20 may experience, in real-time, a sensation of movement in a certain architectural structure or any kind of surroundings.

According to some embodiments, the reduced-latency prediction operation 308 enables the creation of extrapolated output dynamic images in operation 310 by predicting and suggesting subsequent probable tiles. According to some embodiments, a probable tile may be an image or a part of an image that the user is probably interested in seeing in the near future; such a tile can be, for example, an image or a part of an image of another angle or view-point of an object or surroundings.

According to some embodiments, tile rendering process is applied in order to reduce the amount of memory and system resources needed to produce the extrapolated output dynamic images.

According to some embodiments, each tile may include an array of classified visual data that may contribute to an optimized efficiency in data locating and, as a consequence, may contribute to reducing latency. According to some embodiments, the array of classified visual data forming each tile is compressed using various compression protocols in order to save processing resources. According to some embodiments, each tile is a multi-resolution tile.

According to some embodiments, the captured input image/s is a 2D image/s that, after going through operations 302-309, is converted in operation 310 into 3D extrapolated output dynamic images that are presented to user 20. According to some embodiments, the captured input image/s is a 3D image/s that, after going through operations 302-309, is converted in operation 310 into 2D extrapolated output dynamic images that are presented to user 20.

According to some embodiments, the extrapolated output dynamic images may display at least one virtualized visual effect. Such visual effect can be, for example, a virtual character used for presentation or any other purpose. Another possible implementation of a visual effect is applying a visual effect upon a real object or surrounding captured in an input image/s, for example, a real object can be decorated with virtualized visual effects such as smoke, sparkling light, accessories or any other visual effect according to changing need and desires of the user 20 or the operators of the dynamic images virtualization system 10.

According to some embodiments, each dynamic output image/s comprises a pyramid of resolutions produced in one of the following ways: Scalable video coding (SVC), Laplacian pyramid or any other multi-resolution approach.
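
A minimal sketch of a resolution pyramid is shown below using repeated 2x average downsampling; this illustrates only the resolution stack, whereas a Laplacian pyramid would additionally store the per-level difference images and SVC would operate on the encoded bitstream. The level count and divisibility assumption are hypothetical simplifications.

```python
import numpy as np

def resolution_pyramid(image: np.ndarray, levels: int):
    """Build a simple pyramid of resolutions by repeated 2x average downsampling.

    Level 0 is the full-resolution tile; each further level halves both dimensions
    (dimensions are assumed divisible by 2**levels for brevity).
    """
    pyramid = [image.astype(np.float32)]
    for _ in range(levels):
        prev = pyramid[-1]
        h, w = prev.shape[0] // 2, prev.shape[1] // 2
        down = prev.reshape(h, 2, w, 2, -1).mean(axis=(1, 3))
        pyramid.append(down)
    return pyramid

# Example: a 256 x 256 tile produces levels of 256, 128, 64 and 32 pixels per side.
tile = np.random.rand(256, 256, 3)
print([level.shape[:2] for level in resolution_pyramid(tile, 3)])
```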

In operation 312, the method may include presenting the extrapolated output dynamic images to a user 20. According to some embodiments, the extrapolated output dynamic images are presented to the user 20 by using a view-dependent reconstruction of a virtual camera, for example, the system may present the extrapolated output dynamic images to a user 20 from his own point of view, meaning the user 20 may view the extrapolated output dynamic images as if he is observing it using his own eyes.

According to some embodiments, the extrapolated output dynamic images are presented as a view-dependent reconstruction of a virtual camera allowing users 20 to freely change the camera’s perspective in a virtual world and observe an object or environment from different angles, distances etc.

According to some embodiments, the extrapolated output dynamic images are presented using unstructured light-field technology, with projectors that may be used to capture light rays from various directions. According to some embodiments, image capturing means 202 may be configured to capture light-field images with no need for external projectors. According to some embodiments, the extrapolated output dynamic images are presented using billboard based quad rendering.

According to some embodiments, a free navigation mode may allow the user 20 to move a desired view point from one location to another, giving the impression that the camera is physically moving from one point to another. In yet another example, a fly-over perspective view may be achieved by providing an unlimited multiple upper viewpoint of certain object or surroundings. According to some embodiments, the aforementioned examples and many more may be achieved in real-time while user 20 experiences minimal latency by creating an extrapolated output dynamic images that comprise more visual data than the originally captured input image/s as disclosed in operation 310.

According to some embodiments, a remote streaming process is used to convey the input images created in operation 306, wherein said remote streaming can be performed using any known streaming protocol. According to some embodiments, the digital image processing performed by the controller 100 may include streaming of an object-centric volumetric content and presenting it to user 20 wherein the presented object can be any physical or virtual object. According to some embodiments, the digital image processing performed by controller 100 may apply streaming of a view-centric volumetric content presented to user 20 wherein the presented view can be any environment or surroundings, either outdoor or indoor, realistic or stylized, such as an architectural structure, landscape etc.

According to some embodiments, the streaming of input images can be relayed by either wire or wireless communication in accordance with various needs or constraints.

Reference is made to FIG. 3 which constitutes a flowchart diagram illustrating possible sub-operations of operations 306-312 previously disclosed in FIG. 2 from an algorithmic point of view, according to some embodiments of the invention. In operation 402 the method may include downloading and parsing the metadata created in operation 306 disclosed in FIG. 2. In operation 404 the method may include the setup of the visual scene that will be presented to user 20 through a reconstruction of a virtual camera as previously disclosed in operations 310-312 of FIG. 2.

According to some embodiments, the visual scene may also be presented to user 20 through a hardware camera that captures a physical representation of an actual scene or object. In operation 406 the method may include gathering statistics and other valuable data in order to present user 20 with tiles that represent the current position of the virtual or hardware camera. In operation 408 the method may include extracting from the cache memory current and future probable tiles to be fetched and presented to user 20 as part of the prediction process 308 previously disclosed in FIG. 2. In operation 410 the method may include updating the texture atlases in accordance with the extracted data disclosed in operation 408. In operation 412 the method may include the construction of the extrapolated output dynamic images previously disclosed in operation 310 of FIG. 2.

According to some embodiments, the extrapolated output dynamic images constructed in operation 412 may be 2D or 3D images. In operation 414 the method may include applying any sort of image refinement filter, technique or protocol in order to improve the extrapolated output dynamic images presented to user 20. In operation 416 the method may include a prediction of future positions of the virtual or hardware camera using the prediction stages and techniques previously disclosed in operation 308 of FIG. 2. In operation 418 the method may include gathering statistics and valuable data regarding the fetching and presentation of future predictable tiles based on an estimation of future camera positions.

According to some embodiments, such future predictable tiles may be an image or part of an image of a view/object-dependent or virtual/hardware camera that reflects the point of view of user 20. According to some embodiments, the statistics and other valuable data gathered in operation 418 may be relayed using feedback loop 420 in order to provide feedback to the statistics gathering operation 406.
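
For illustration only, the per-frame loop of operations 402-420 might be organized as in the sketch below; every component name (parse_metadata, cache, atlas, predict_camera, etc.) is a hypothetical stand-in rather than an element disclosed by the embodiments.

```python
from dataclasses import dataclass

@dataclass
class ViewStats:
    """Statistics relayed back to tile selection via the feedback loop (operations 406/418/420)."""
    tiles_requested: int = 0
    tiles_served_from_cache: int = 0

def run_client_loop(metadata_url, parse_metadata, setup_scene, camera, cache, atlas,
                    build_output_image, refine, predict_camera, num_frames):
    """Per-frame loop mirroring FIG. 3; every callable/object here is a hypothetical component."""
    metadata = parse_metadata(metadata_url)              # operation 402: download and parse metadata
    scene = setup_scene(metadata)                        # operation 404: set up the visual scene
    stats = ViewStats()                                  # operation 406: statistics for the current view
    for _ in range(num_frames):
        pose = camera.current_pose()
        tiles = cache.fetch_tiles(scene, pose, stats)    # operation 408: current + probable tiles
        atlas.update(tiles)                              # operation 410: refresh the texture atlases
        frame = build_output_image(atlas, pose)          # operation 412: extrapolated output image
        frame = refine(frame)                            # operation 414: optional refinement filter
        future_pose = predict_camera(camera, metadata)   # operation 416: predict the next camera pose
        cache.prefetch(scene, future_pose, stats)        # operations 418/420: prefetch + feedback stats
        yield frame
```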

Reference is made to FIG. 4, which constitutes a structure diagram illustrating various sub-operations of the data-set structure of the various compression and packaging methods which contribute to reduced latency and are used during the operation of the dynamic images virtualization system 10, according to some embodiments of the invention. A plurality of cameras (for example, camera 1 to camera N) are configured to capture images that will eventually be presented as part of a frame sequence (for example, frames 1 to Z) along a timeline. According to some embodiments, said plurality of cameras can be either hardware or virtual cameras. According to some embodiments, each captured image is subdivided into independent tiles, for example, tiles 502a to 516c, that can be later combined to form a larger image.

According to some embodiments, a few compression techniques may be applied to said tiles, as disclosed below:

a.) According to some embodiments, each captured image that has been subdivided into independent tiles can be restored by compressing said tiles in order to create output images comprising an unlimited stack of overlay layers and resolution pyramids; for example, tile 502a, which may be a 10*10 pixel tile, and tile 502b, which may be a 50*50 pixel tile, may be compressed in order to eventually form an output image composed of said tiles.

According to some embodiments, each tile 502a to 516c may further include an array of compressed visual data such as, for example, color data (RGB), a depth bitmap, an alpha transparency bitmap, motion vectors, normal maps, a reflection/refraction bitmap, etc. (an illustrative sketch of such a per-tile payload follows this list). According to some embodiments, each tile 502a to 516c may be combined with other tiles to create a larger tile comprising visual data derived from several individual tiles. According to some embodiments, in order to reduce the amount of data being rendered, the tiles may comprise only low-frequency data, while an algorithm may be used to complete and compensate for the missing visual data, thereby restoring the actual captured image.

b.) According to some embodiments, each tile 502a to 516c may be compressed using temporal compression; for example, tiles comprising a dynamic images sequence along a timeline may exploit the temporal correlation that exists between consecutive video frames, in which displayed objects or image features move from one location to another, and may be compressed using temporal tile compression in order to reduce their size in bytes as well as the time required for images to be rendered along the rendering pipeline (a toy sketch also follows this list).

c.) According to some embodiments, multi-view compression (MVC or MVC 3D) may be applied by using multiple cameras 1 to N to simultaneously acquire various viewpoints of a scene. For example, tiles created by subdividing the input dynamic images captured simultaneously from various angles using multiple cameras 1 to N may be compressed using this technology to create a single dynamic images stream. Due to the extensive raw bit rate of multi-view video, efficient compression techniques are essential so that images can be efficiently rendered along the rendering pipeline. According to some embodiments, MVC compression may be conducted using artificial intelligence (AI) such as, for example, a deep neural network (DNN) or any other AI model.

According to some embodiments, tiles subdivided from captured free viewpoint dynamic images or from multi-view 3D video may also be compressed using this technology. According to some embodiments, the aforementioned compression techniques may be combined with each other to achieve a greater extent of data compression.
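
For illustration only, the per-tile payload described in item a.) above might be modelled as in the following sketch; the field names, array shapes and the merge helper are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class Tile:
    """One independently decodable tile of a captured frame (e.g. 10x10 or 50x50 pixels)."""
    camera_id: int
    frame_index: int
    x: int                                        # top-left corner within the full image, in pixels
    y: int
    color: np.ndarray                             # H x W x 3 RGB samples
    depth: Optional[np.ndarray] = None            # H x W depth bitmap
    alpha: Optional[np.ndarray] = None            # H x W transparency bitmap
    motion_vectors: Optional[np.ndarray] = None   # H x W x 2 per-pixel motion
    normal_map: Optional[np.ndarray] = None       # H x W x 3 surface normals
    reflection: Optional[np.ndarray] = None       # reflection/refraction bitmap

def merge_tiles(tiles, full_height, full_width):
    """Recombine colour data from independent tiles into a larger image."""
    canvas = np.zeros((full_height, full_width, 3), dtype=np.float32)
    for t in tiles:
        h, w = t.color.shape[:2]
        canvas[t.y:t.y + h, t.x:t.x + w] = t.color
    return canvas
```

Likewise, a toy sketch of the temporal idea in item b.): a tile is stored as the zlib-compressed residual against the co-located tile of the previous frame. The choice of simple lossless residual coding is an assumption, since the embodiments do not fix a particular codec.

```python
import numpy as np
import zlib

def temporal_encode(prev_tile, curr_tile):
    """Store only the residual of the current tile against the previous frame's co-located tile."""
    residual = curr_tile.astype(np.int16) - prev_tile.astype(np.int16)
    return zlib.compress(residual.tobytes())  # residuals of static content compress very well

def temporal_decode(prev_tile, payload, shape):
    """Reconstruct the current uint8 tile from the previous tile plus the decoded residual."""
    residual = np.frombuffer(zlib.decompress(payload), dtype=np.int16).reshape(shape)
    return (prev_tile.astype(np.int16) + residual).astype(np.uint8)
```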

Reference is made to FIG. 5, which constitutes a flowchart diagram illustrating possible sub-operations of operations 302-312 previously disclosed in FIG. 2, according to some embodiments of the invention. In operation 602 the method may include acquiring input image/s captured by a camera, which can be, for example, a hardware camera or a virtual camera. According to some embodiments, the captured input image/s may also be created using computer-generated imagery (CGI). According to some embodiments, during operation 602, further data may be acquired, for example, real-time monitoring of the bearings of user 20.

In operations 604 and 606 the acquired input image/s may be processed and analyzed by artificial intelligence (AI) such as, for example, a deep neural network (DNN) in order to, according to some embodiments, subdivide said input image/s into multiple image tiles. According to some embodiments, said tiles may exhibit different resolutions and different dimensions, for example, 10*10 pixels, 50*50 pixels and so on, and may be of various sizes, from a few kilobytes to a few gigabytes.
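
For illustration only, a minimal sketch of the subdivision step of operations 604-606 with a fixed tile size; the AI-driven selection of tile boundaries and resolutions described above is not modelled here.

```python
import numpy as np

def subdivide(image, tile_size=50):
    """Split an H x W (x C) image into a dict of tiles keyed by their grid coordinates."""
    tiles = {}
    h, w = image.shape[:2]
    for row, top in enumerate(range(0, h, tile_size)):
        for col, left in enumerate(range(0, w, tile_size)):
            tiles[(row, col)] = image[top:top + tile_size, left:left + tile_size]
    return tiles

# Example: a 1080x1920 frame becomes a 22x39 grid of (at most) 50x50-pixel tiles.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
assert len(subdivide(frame)) == 22 * 39
```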

In operation 608 the tiles produced in operation 606 may be compressed using various compression techniques and protocols. According to some embodiments, the size of each compressed tile is relatively reduced such that each tile requires a modest amount of computing resources to be rendered along the rendering pipeline. Said compression techniques may include pyramids of resolutions, temporal compression or multi-view compression, as detailed above. According to some embodiments, the aforementioned compression techniques may be combined to achieve a greater extent of data compression (as further detailed in the description of FIG. 4).

In operation 610 the compressed tiles may be stored, for example, in an available physical memory or in a remote server as part of a cloud computing network. In operation 612 the compressed tiles may be distributed to a content delivery network (CDN). According to some embodiments, the use of a CDN may reduce latency by providing a local cache from distributed servers and applying optimization processes to data rendering. According to some embodiments, security measures, such as a controlled access-on-demand process, may be used in order to regulate which compressed tiles proceed to the rendering of operation 614. According to some embodiments, said compressed tiles may be protected by a verification algorithm to ensure exposure to an authorized user 20 only.
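
For illustration only, the controlled-access check mentioned above could resemble the following HMAC-based sketch; the signing scheme and all parameter names are assumptions, not the verification algorithm of the embodiments.

```python
import hmac
import hashlib

def sign_tile_request(tile_id: str, user_token: str, secret: bytes) -> str:
    """Issue the signature an authorized client attaches to its tile request."""
    message = f"{tile_id}:{user_token}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def is_authorized(tile_id: str, user_token: str, signature: str, secret: bytes) -> bool:
    """Server-side check before a compressed tile is released for rendering (operation 614)."""
    expected = sign_tile_request(tile_id, user_token, secret)
    return hmac.compare_digest(expected, signature)
```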

In operation 614, artificial intelligence (AI) such as, for example, a deep neural network (DNN) may be applied in order to execute a fetching prediction process that may prefetch resources and predictable data in order to create the selected further tiles of operation 616. According to some embodiments, said fetching prediction process may be used to create extrapolated output dynamic image/s that comprise extended visual data with regard to the input images captured in operation 602. According to some embodiments, said fetching prediction process enables probable image tiles to be fetched and prepared to be presented to a user in accordance with a forecast based on calculating the likelihood that said tiles represent a future image that user 20 has an interest in.

According to some embodiments, said fetching prediction process may result in reduced latency associated with image rendering. According to some embodiments, the device location and bearings may have an influence on said fetching prediction process. According to some embodiments, sensors, such as, for example, motion sensors, accelerometers, etc., may record the bearings of user 20, and said bearings analysis may be used in the execution of said fetching prediction process.
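
For illustration only, a minimal sketch of one fetching-prediction step: the next camera position is extrapolated from recent bearings with a constant-velocity model, and candidate tiles are ranked by how likely they are to enter the predicted view. Both the motion model and the scoring are illustrative assumptions standing in for the AI-based predictor described above.

```python
import numpy as np

def predict_position(positions, dt=1.0):
    """Constant-velocity extrapolation of the next camera position from recent samples."""
    positions = np.asarray(positions, dtype=np.float32)
    velocity = (positions[-1] - positions[-2]) / dt
    return positions[-1] + velocity * dt

def rank_tiles_for_prefetch(tile_centers, predicted_pos, view_dir, budget=32):
    """Score tiles by proximity and alignment with the (unit-length) predicted viewing direction."""
    scores = {}
    for tile_id, center in tile_centers.items():
        offset = np.asarray(center, dtype=np.float32) - predicted_pos
        distance = np.linalg.norm(offset) + 1e-6
        alignment = float(np.dot(offset / distance, view_dir))  # 1.0 = straight ahead
        scores[tile_id] = alignment / distance                  # near, in-view tiles score highest
    return sorted(scores, key=scores.get, reverse=True)[:budget]
```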

In operation 618 a decompression process may be applied in order to restore the compressed tiles and extract encrypted data. In operation 620, 3D dynamic images may be created from said decompressed tiles. In operation 622 said 3D dynamic images are processed into 2D dynamic images. In operation 624 a quality enhancement process may be accomplished in order to produce output dynamic images by training a model using artificial intelligence (AI) such as, for example, an artificial neural network (ANN) or a deep neural network (DNN), and in accordance with the AI model results. According to some embodiments, said quality enhancement process is performed by a super resolution algorithm.

According to some embodiments, and as mentioned above, quality enhancement of the extrapolated output dynamic images produced as a result of the methods described in FIG. 2 and FIG. 5 may be conducted using a quality enhancement process accomplished by training a model using artificial intelligence (AI) such as, for example, an artificial neural network (ANN) or a deep neural network (DNN), and in accordance with the AI model results.

According to some embodiments, the use of AI can enhance quality by applying various technologies to the output dynamic images produced as a result of the methods described in FIG. 2 and FIG. 5. For example, super resolution (SR) may be used for upscaling and/or improving the details of said output dynamic images. According to some embodiments, low-resolution output dynamic images may be upscaled to a higher resolution using said AI model, wherein the further details in the high-resolution output dynamic images are filled in where the details are essentially unknown. According to some embodiments, a mathematical function takes a low-resolution image that lacks details and applies a prediction of the missing details/features in said image; by doing so, the mathematical function may produce details that were potentially never recorded in the original input image, but that nevertheless may serve to enhance the image quality.
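
For illustration only, a minimal PyTorch sketch in the spirit of the super-resolution description above: bicubic interpolation supplies the upscaled base image and a small convolutional network predicts the missing detail on top of it. The three-layer architecture is an arbitrary stand-in for whatever model is actually trained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySR(nn.Module):
    """Upscale by `scale` and let a small CNN predict a residual of missing detail."""
    def __init__(self, scale=2, channels=3, hidden=32):
        super().__init__()
        self.scale = scale
        self.body = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=3, padding=1),
        )

    def forward(self, low_res):
        base = F.interpolate(low_res, scale_factor=self.scale,
                             mode="bicubic", align_corners=False)
        return base + self.body(base)  # predicted detail is added on top of the upscaled base

# Usage: a batch of 64x64 tiles upscaled to 128x128.
model = TinySR(scale=2)
tiles = torch.rand(8, 3, 64, 64)
high_res = model(tiles)  # shape (8, 3, 128, 128)
```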

According to some embodiments, an image repair technique such as inpainting may be executed upon the output dynamic images produced as a result of the methods described in FIG. 2 and FIG. 5 in order to repair image defects by retouching to remove unwanted elements. According to some embodiments, training an inpainting AI model can be executed by cutting out sections of an image and training the AI model to replace the missing parts based on prior knowledge and a prediction process.
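
For illustration only, the cut-out training strategy described above might prepare samples as in the following sketch; the hole sizes and counts are arbitrary illustrative choices, and the masked image plus mask become the model input while the original image is the reconstruction target.

```python
import numpy as np

def make_inpainting_sample(image, num_holes=3, hole_size=32, rng=None):
    """Return (masked_image, mask, target) for one training sample; `image` is H x W x C."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    mask = np.ones((h, w), dtype=np.float32)          # 1 = keep, 0 = hole to be filled
    for _ in range(num_holes):
        top = int(rng.integers(0, h - hole_size))     # assumes h, w > hole_size
        left = int(rng.integers(0, w - hole_size))
        mask[top:top + hole_size, left:left + hole_size] = 0.0
    masked = image.astype(np.float32) * mask[..., None]
    return masked, mask, image
```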

Although the present invention has been described with reference to specific embodiments, this description is not meant to be construed in a limited sense. Various modifications of the disclosed embodiments, as well as alternative embodiments of the invention will become apparent to persons skilled in the art upon reference to the description of the invention. It is, therefore, contemplated that the appended claims will cover such modifications that fall within the scope of the invention.

LG Patent | Method for signaling deblocking filter parameter information in video or image coding system https://patent.nweon.com/26755 Thu, 26 Jan 2023 15:16:21 +0000 https://patent.nweon.com/?p=26755

Patent: Method for signaling deblocking filter parameter information in video or image coding system

Patent PDF: 加入映维网会员获取

Publication Number: 20230022350

Publication Date: 2023-01-26

Assignee: LG Electronics Inc

Abstract

A method by which a decoding device decodes an image, according to the present document, comprises the steps of: obtaining image information from a bitstream; generating a reconstructed picture of the current picture; deriving deblocking filter parameter information related to deblocking filter parameters on the basis of the image information; and performing deblocking filtering on the reconstructed picture on the basis of the deblocking filter parameters so as to generate a modified reconstructed picture.

Claims

1.-15. (canceled)

16.An image decoding method performed by a decoding apparatus, the method comprising: obtaining image information including information related to a deblocking filter through a bitstream; generating a reconstructed picture of a current picture; and generating a modified reconstructed picture based on the information related to the deblocking filter and the reconstructed picture, wherein the information related to the deblocking filter includes a flag information specifying whether deblocking filter information is present in a Picture Header (PH), and deblocking filter parameter information, and wherein the flag information specifying whether the deblocking filter information is present in the PH is parsed prior to the deblocking filter parameter information.

17.The method of claim 16, wherein the image information further includes a Picture Parameter Set (PPS), and wherein the information related to the deblocking filter is included in the PPS.

18.The method of claim 17, wherein the information related to the deblocking filter further includes deblocking filter control present flag information related to whether deblocking filter control syntax elements are present in the PPS, wherein based on the deblocking filter control present flag information, deblocking filter disabled flag information and the flag information specifying whether the deblocking filter information is present in the PH are included in the information related to the deblocking filter, and wherein the deblocking filter disabled flag information specifies whether the deblocking filter is disabled.

19.The method of claim 18, wherein the deblocking filter parameter information is parsed based on a value of the deblocking filter disabled flag information.

20.The method of claim 16, wherein the deblocking filter parameter information includes deblocking filter parameter information for a luma sample and deblocking filter parameter information for a chroma sample.

21.The method of claim 16, wherein based on a value of the flag information specifying whether the deblocking filter information is present in the PH being equal to 1, the deblocking filter parameter information is present in the PH.

22.The method of claim 21, wherein the deblocking filter parameter information present in the PH includes deblocking filter parameter information for a luma sample and deblocking filter parameter information for a chroma sample.

23.The method of claim 16, wherein based on a value of the flag information specifying whether the deblocking filter information is present in the PH being equal to 0, the deblocking filter parameter information is present in a Slice Header (SH).

24.The method of claim 23, wherein the deblocking filter parameter information present in the SH includes deblocking filter parameter information for a luma sample and deblocking filter parameter information for a chroma sample.

25.The method of claim 16, wherein the step of generating the modified reconstructed picture comprises: deriving a deblocking filter parameter based on the deblocking filter parameter information, and applying the deblocking filter on the reconstructed picture based on the deblocking filter parameter.

26.An image encoding method performed by an encoding apparatus, the method comprising: generating a reconstructed picture of a current picture; generating a modified reconstructed picture based on a deblocking filter and the reconstructed picture, generating information related to the deblocking filter, encoding image information including the information related to the deblocking filter, wherein information related to the deblocking filter includes a flag information specifying whether deblocking filter information is present in a Picture Header (PH), and deblocking filter parameter information, and wherein the flag information specifying whether the deblocking filter information is present in the PH is signaled prior to the deblocking filter parameter information.

27.The method of claim 26, wherein the image information further includes a Picture Parameter Set (PPS), and wherein the information related to the deblocking filter is included in the PPS.

28.The method of claim 26, wherein based on a value of the flag information specifying whether the deblocking filter information is present in the PH being equal to 1, the deblocking filter parameter information is present in the PH, and wherein the deblocking filter parameter information present in the PH includes deblocking filter parameter information for a luma sample and deblocking filter parameter information for a chroma sample.

29.The method of claim 26, wherein based on a value of the flag information specifying whether the deblocking filter information is present in the PH being equal to 0, the deblocking filter parameter information is present in a Slice Header (SH), and wherein the deblocking filter parameter information present in the SH includes deblocking filter parameter information for a luma sample and deblocking filter parameter information for a chroma sample.

30.Non-transitory computer readable digital storage medium storing a bitstream generated by the method of claim 26.

31.A transmission method of data for an image, the transmission method comprising: obtaining a bitstream generated based on generating a reconstructed picture of a current picture, generating a modified reconstructed picture based on a deblocking filter and the reconstructed picture, generating information related to the deblocking filter, encoding image information including the information related to the deblocking filter to generate the bitstream; and transmitting the data comprising the bitstream, wherein information related to the deblocking filter includes a flag information specifying whether deblocking filter information is present in a Picture Header (PH), and deblocking filter parameter information, and wherein the flag information specifying whether the deblocking filter information is present in the PH is configured to be parsed prior to the deblocking filter parameter information.

Description

BACKGROUND OF THE DISCLOSURE

Field of the Disclosure

The present disclosure relates to video or image coding technology and, more particularly, to a method for signaling deblocking filter parameter information in a video or image coding system.

Related Art

The demands for high-resolution, high-quality images and video, such as ultra-high definition (UHD) images and video of 4K or 8K or more, have recently been increasing in various fields. As image and video data become high resolution and high quality, the amount of information or the number of bits to be transmitted increases relative to existing image and video data. Accordingly, if image data is transmitted using a medium such as an existing wired or wireless broadband line, or if image and video data are stored using an existing storage medium, transmission costs and storage costs increase.

Furthermore, interest in and demand for immersive media, such as virtual reality (VR), artificial reality (AR) content or holograms, have recently been increasing. The broadcasting of images and video having image characteristics different from those of real images, such as game images, is also increasing.

Accordingly, there is a need for a high-efficiency image and video compression technology in order to effectively compress and transmit or store and playback information of high-resolution and high-quality images and video having such various characteristics.

SUMMARY OF THE DISCLOSURE

Technical Objects

One embodiment of the present disclosure may provide a method and an apparatus for improving video/image coding efficiency.

One embodiment of the present disclosure may provide a method and an apparatus for improving video/image quality.

One embodiment of the present disclosure may provide a method and an apparatus for efficiently signaling deblocking filter parameter information.

Technical Solutions

According to one embodiment of the present disclosure, a video/image decoding method performed by a decoding apparatus is provided. The method may comprise obtaining image information from a bitstream, generating a reconstructed picture of a current picture, deriving deblocking filter parameter information related to deblocking filter parameters based on the image information, and generating a modified reconstructed picture by performing deblocking filtering on the reconstructed picture based on the deblocking filter parameters, wherein the image information may include a sequence parameter set (SPS), a picture parameter set (PPS), a picture header (PH), and a slice header (SH); at least one of the PPS or the SPS includes first flag information indicating whether the deblocking filtering is enabled and second flag information related to whether first deblocking filter parameter information exists in at least one of the PPS or the SPS; and the deblocking filter parameters may be derived based on the first flag information and the second flag information.
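
For illustration only, the flag hierarchy implied by the claims and this summary can be sketched as below: a control-present flag gates the disabled flag and the in-PH flag, and the in-PH flag decides whether the luma/chroma deblocking parameters are taken from the picture header or the slice header. All syntax-element and field names here are hypothetical stand-ins, not the actual VVC syntax.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DeblockingParams:
    luma_beta_offset: int = 0     # hypothetical luma parameters
    luma_tc_offset: int = 0
    chroma_beta_offset: int = 0   # hypothetical chroma parameters
    chroma_tc_offset: int = 0

def parse_deblocking_info(pps: dict, ph: dict, sh: dict) -> Optional[DeblockingParams]:
    """Sketch of the decision logic: PPS-level flags choose between PH- and SH-level parameters."""
    if not pps.get("deblocking_filter_control_present_flag", 0):
        return DeblockingParams()        # assumption: defaults apply when no control syntax is present
    if pps.get("deblocking_filter_disabled_flag", 0):
        return None                      # deblocking filtering is disabled
    # The in-PH flag is parsed before any deblocking filter parameter information.
    if pps.get("deblocking_info_in_ph_flag", 0):
        source = ph                      # parameters signalled in the picture header
    else:
        source = sh                      # parameters signalled in the slice header
    return DeblockingParams(
        luma_beta_offset=source.get("luma_beta_offset", 0),
        luma_tc_offset=source.get("luma_tc_offset", 0),
        chroma_beta_offset=source.get("chroma_beta_offset", 0),
        chroma_tc_offset=source.get("chroma_tc_offset", 0),
    )
```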

According to one embodiment of the present disclosure, a decoding apparatus for performing video/image decoding is provided. The apparatus may comprise an entropy decoder obtaining image information from a bitstream, an adder generating a reconstructed picture of the current picture, and a filter deriving deblocking filter parameter information related to deblocking filter parameters based on the image information and generating a modified reconstructed picture by performing deblocking filtering on the reconstructed picture based on the deblocking filter parameters, wherein the image information may include a sequence parameter set (SPS), a picture parameter set (PPS), a picture header (PH), and a slice header (SH); at least one of the PPS or the SPS includes first flag information indicating whether the deblocking filtering is enabled and second flag information related to whether first deblocking filter parameter information exists in at least one of the PPS or the SPS; and the deblocking filter parameters may be derived based on the first flag information and the second flag information.

According to one embodiment of the present disclosure, a video/image encoding method performed by an encoding apparatus is provided. The method may comprise generating a reconstructed picture of a current picture, deriving deblocking filter parameter information related to deblocking filter parameters for the reconstructed picture, generating a modified reconstructed picture by performing deblocking filtering on the reconstructed picture based on the deblocking filter parameters, and encoding image information including the deblocking filter parameter information, wherein the image information may include a sequence parameter set (SPS), a picture parameter set (PPS), a picture header (PH), and a slice header (SH); at least one of the PPS or the SPS includes first flag information indicating whether the deblocking filtering is enabled and second flag information related to whether first deblocking filter parameter information exists in at least one of the PPS or the SPS; and the deblocking filter parameters may be derived based on the first flag information and the second flag information.

According to one embodiment of the present disclosure, an encoding apparatus for performing video/image encoding is provided. The apparatus may comprise an adder generating a reconstructed picture of a current picture, a filter deriving deblocking filter parameter information related to deblocking filter parameters for the reconstructed picture and generating a modified reconstructed picture by performing deblocking filtering on the reconstructed picture based on the deblocking filter parameters, and an entropy encoder encoding image information including the deblocking filter parameter information, wherein the image information may include a sequence parameter set (SPS), a picture parameter set (PPS), a picture header (PH), and a slice header (SH); at least one of the PPS or the SPS includes first flag information indicating whether the deblocking filtering is enabled and second flag information related to whether first deblocking filter parameter information exists in at least one of the PPS or the SPS; and the deblocking filter parameters may be derived based on the first flag information and the second flag information.

According to one embodiment of the present disclosure, there is provided a computer-readable digital storage medium in which encoded video/image information generated according to the video/image encoding method disclosed in at least one of the embodiments of the present disclosure is stored.

According to an embodiment of the present disclosure, there is provided a computer-readable digital storage medium in which encoded information or encoded video/image information causing the decoding apparatus to perform the video/image decoding method disclosed in at least one of the embodiments of the present disclosure is stored.

Effects of the Disclosure

According to one embodiment of the present disclosure, the overall video/image coding efficiency may be improved.

According to one embodiment of the present disclosure, video/image quality may be improved.

According to one embodiment of the present disclosure, signaling of deblocking filter parameter information may be performed efficiently by efficiently arranging syntax elements based on a syntax structure related to deblocking filtering.

The technical effects of the present disclosure are not limited to the technical effects described above, and other technical effects not mentioned herein may be clearly understood by those skilled in the art to which the present disclosure belongs from the descriptions of the appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 schematically shows an example of a video/image coding system applicable to embodiments of the present disclosure.

FIG. 2 is a diagram schematically illustrating a configuration of a video/image encoding apparatus to which embodiments of the present disclosure are applicable.

FIG. 3 is a diagram schematically illustrating a configuration of a video/image decoding apparatus to which embodiments of the present disclosure are applicable.

FIG. 4 illustrates a layered structure of a coded video/image.

FIG. 5 illustrates a deblocking filtering-based video/image encoding method to which embodiments of the present disclosure may be applied, and FIG. 6 illustrates a filter within an encoding apparatus.

FIG. 7 illustrates a deblocking filtering-based video/image decoding method to which embodiments of the present disclosure may be applied, and FIG. 8 illustrates a filter within a decoding apparatus.

FIG. 9 illustrates one embodiment of a method for performing deblocking filtering.

FIG. 10 illustrates a conventional deblocking filtering process.

FIG. 11 illustrates a deblocking filtering process according to one embodiment of the present disclosure.

FIGS. 12 and 13 illustrate a deblocking filtering-based video/image encoding method according to an embodiment(s) of the present disclosure and one example of related components.

FIGS. 14 and 15 illustrate a deblocking filtering-based video/image decoding method according to an embodiment(s) of the present disclosure and one example of related components.

FIG. 16 illustrates an example of a contents streaming system to which the embodiments of the present disclosure may be applied.

DESCRIPTION OF EXEMPLARY EMBODIMENTS

This document may be modified in various ways and may have various embodiments, and specific embodiments will be illustrated in the drawings and described in detail. However, this is not intended to limit this document to the specific embodiments. Terms commonly used in this specification are used to describe a specific embodiment and are not used to limit the technical spirit of this document. An expression in the singular includes plural expressions unless the context clearly indicates otherwise. A term such as “include” or “have” in this specification should be understood to indicate the existence of a characteristic, number, step, operation, element, part, or a combination of them described in the specification, and not to exclude the existence or the possibility of the addition of one or more other characteristics, numbers, steps, operations, elements, parts or a combination of them.

Meanwhile, the elements in the drawings described in this document are illustrated independently for convenience of description of their different characteristic functions. This does not mean that each element is implemented as separate hardware or separate software. For example, at least two of the elements may be combined to form a single element, or a single element may be divided into a plurality of elements. An embodiment in which elements are combined and/or separated is also included in the scope of rights of this document as long as it does not deviate from the essence of this document.

Hereinafter, preferred embodiments of this document are described more specifically with reference to the accompanying drawings. Hereinafter, in the drawings, the same reference numeral is used for the same element, and a redundant description of the same element may be omitted.

FIG. 1 schematically illustrates an example of a video/image coding system to which embodiments of this document may be applied.

Referring to FIG. 1, a video/image coding system may include a first device (a source device) and a second device (a receiving device). The source device may deliver encoded video/image information or data in the form of a file or streaming to the receiving device via a digital storage medium or network.

The source device may include a video source, an encoding apparatus, and a transmitter. The receiving device may include a receiver, a decoding apparatus, and a renderer. The encoding apparatus may be called a video/image encoding apparatus, and the decoding apparatus may be called a video/image decoding apparatus. The transmitter may be included in the encoding apparatus. The receiver may be included in the decoding apparatus. The renderer may include a display, and the display may be configured as a separate device or an external component.

The video source may acquire video/image through a process of capturing, synthesizing, or generating the video/image. The video source may include a video/image capture device and/or a video/image generating device. The video/image capture device may include, for example, one or more cameras, video/image archives including previously captured video/images, and the like. The video/image generating device may include, for example, computers, tablets and smartphones, and may (electronically) generate video/images. For example, a virtual video/image may be generated through a computer or the like. In this case, the video/image capturing process may be replaced by a process of generating related data.

The encoding apparatus may encode input video/image. The encoding apparatus may perform a series of procedures such as prediction, transform, and quantization for compression and coding efficiency. The encoded data (encoded video/image information) may be output in the form of a bitstream.

The transmitter may transmit the encoded image/image information or data output in the form of a bitstream to the receiver of the receiving device through a digital storage medium or a network in the form of a file or streaming. The digital storage medium may include various storage mediums such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, and the like. The transmitter may include an element for generating a media file through a predetermined file format and may include an element for transmission through a broadcast/communication network. The receiver may receive/extract the bitstream and transmit the received bitstream to the decoding apparatus.

The decoding apparatus may decode the video/image by performing a series of procedures such as dequantization, inverse transform, and prediction corresponding to the operation of the encoding apparatus.

The renderer may render the decoded video/image. The rendered video/image may be displayed through the display.

This document relates to video/image coding. For example, the methods/embodiments disclosed in this document may be applied to a method disclosed in the versatile video coding (VVC) standard. Further, the methods/embodiments disclosed in this document may be applied to a method disclosed in the essential video coding (EVC) standard, the AOMedia Video 1 (AV1) standard, the 2nd generation of audio video coding standard (AVS2), or the next-generation video/image coding standard (e.g., H.267 or H.268, etc.).

The present disclosure provides various embodiments related to video/image coding, and unless otherwise explicitly stated, the embodiments may b