Patent: Optimizing Head Mounted Displays For Augmented Reality
Publication Number: 20200184728
Publication Date: 2020-06-11
Applicants: Intel
Abstract
While many augmented reality systems provide “see-through” transparent or translucent displays upon which to project virtual objects, many virtual reality systems instead employ opaque, enclosed screens. Indeed, eliminating the user’s perception of the real world may be integral to some successful virtual reality experiences. Thus, head mounted displays designed exclusively for virtual reality experiences may not be easily repurposed to capture significant portions of the augmented reality market. Various of the disclosed embodiments facilitate the repurposing of a virtual reality device for augmented reality use. Particularly, by anticipating user head motion, embodiments may facilitate scene renderings better aligned with user expectations than naive renderings generated within the enclosed field of view. In some embodiments, the system may use procedural mapping methods to generate a virtual model of the environment. The system may then use this model to supplement the anticipatory rendering.
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of and priority to U.S. Provisional Patent Application No. 62/279,604, filed Jan. 15, 2016, as well as U.S. Provisional Patent Application No. 62/279,615, filed Jan. 15, 2016, each of which is incorporated by reference herein in its entirety for all purposes. This application also incorporates herein by reference in their entireties for all purposes U.S. Provisional Patent Application No. 62/080,400, filed Nov. 16, 2014, U.S. Provisional Patent Application No. 62/080,983, filed Nov. 17, 2014, U.S. Provisional Patent Application No. 62/121,486, filed Feb. 26, 2015, as well as U.S. Non-Provisional application Ser. No. 15/054,082, filed Feb. 25, 2016.
TECHNICAL FIELD
[0002] Various of the disclosed embodiments relate to optimizations and improvements for head mounted displays.
BACKGROUND
[0003] Head Mounted Displays (HMDs) are becoming increasingly popular for augmented reality (AR) and virtual reality (VR) applications. While many AR systems provide “see-through” transparent or translucent displays upon which to project virtual objects, many VR systems instead employ opaque, enclosed screens. These enclosed screens may completely obscure the user’s field of view of the real world. Indeed, eliminating the user’s perception of the real world may be integral to a successful VR experience.
[0004] HMDs designed exclusively for VR experiences may fail to capture significant portions of the AR market. For instance, despite possibly including functionality for capturing and presenting images of the user’s real-world field of view, VR headsets may still not readily lend themselves to being repurposed for AR applications. Accordingly, it may be desirable to allow users to repurpose a VR HMD for use as an AR device. Alternatively, one may simply wish to design an AR device that does not present a transparent or translucent view of the real world to the user. Such HMDs may already include a camera and/or pose estimation system as part of their original functionality, e.g., as described in U.S. Provisional Patent Application 62/080,400 and U.S. Provisional Patent Application 62/080,983. For example, an immersive VR experience may rely upon an inertial measurement unit (IMU), electromagnetic transponders, laser-based range-finder systems, depth-data based localization with a previously captured environment model, etc. to determine the location and orientation of the HMD, and consequently, the user’s head. Accordingly, the disclosed embodiments provide AR functionality for opaque, “non-see-through” HMDs (generally referred to herein as VR HMDs), which may include, e.g., an RGB or RGBD camera.
BRIEF DESCRIPTION OF THE DRAWINGS
[0005] Various of the disclosed embodiments may be better understood by referring to the following Detailed Description in conjunction with the accompanying drawings, in which like reference numerals indicate identical or functionally similar elements:
[0006] FIG. 1 is a conceptual diagram illustrating an overview of environment data capture, model creation, and model application as may occur in some embodiments;
[0007] FIG. 2 is an image of an example tablet device implementing a portion of an AR system as may be used in some embodiments;
[0008] FIG. 3 is a block diagram of various components appearing in a system as may be implemented in some embodiments;
[0010] FIG. 4 is a perspective view of an example mapping and AR device as may be used in some embodiments;
[0010] FIG. 5 is a flow diagram generally depicting an overview of various steps in a mapping and tracking process as may be implemented in some embodiments;
[0011] FIG. 6 is a conceptual diagram illustrating a transform representation of a pose as may be used in some embodiments;
[0012] FIG. 7 is a conceptual block diagram of the relations between various concepts relevant to some embodiments;
[0013] FIG. 8 is a series of inputs, configurations, and outputs as may be applied to a Pose Search Algorithm (PSA) for Mapping, Standard Tracking, and Global Localization, as may occur in some embodiments;
[0014] FIG. 9 is a flow diagram generally depicting various steps in a Mapping process to create a model of an environment (e.g., a Truncated Signed Distance Function (TSDF)-based representation) as may be implemented in some embodiments;
[0015] FIG. 10 is a block diagram of a dynamic Bayesian network as may be used in accordance with some embodiments;
[0016] FIG. 11 is a flow diagram generally depicting a summary of an Estimation Maximization algorithm (e.g., for tracking) as may be implemented in some embodiments;
[0017] FIG. 12 is a graphical depiction of an example iterative convergence procedure during Estimation Maximization as may be applied in some embodiments;
[0018] FIG. 13 is a pseudocode listing reflecting one possible Estimation Maximization algorithm as may be implemented in some embodiments;
[0019] FIG. 14 is a graphical depiction of an example Scaling Series algorithm in a hypothetical two-dimensional universe to facilitate understanding of a higher-dimensional algorithm as may be implemented in some embodiments;
[0020] FIG. 15 is a flow diagram describing the operations of an example Scaling Series algorithm implemented in some embodiments;
[0021] FIG. 16 is a pseudocode listing reflecting one possible Scaling Series algorithm implementation as may be implemented in some embodiments;
[0022] FIG. 17 is an idealized two-dimensional representation of a Likelihood Field Integer (LFI) data structure corresponding to a higher-dimensional structure in some embodiments;
[0023] FIG. 18 is an idealized two-dimensional representation of a Likelihood Field Float (LFF) data structure corresponding to a higher-dimensional structure in some embodiments;
[0024] FIG. 19 is an example HMD configuration which may be used in some embodiments;
[0025] FIG. 20 is a view of a user wearing an HMD in a real-world environment as may occur in various embodiments;
[0026] FIG. 21 is a timing diagram of various operations in a rendering prediction process as may be performed in some embodiments;
[0027] FIG. 22 is a perspective view comparing a user’s real world change in pose with the projection upon a virtual camera within the HMD as may occur in some embodiments;
[0028] FIG. 23 is a perspective view illustrating the generation of a transformed predicted RGBD frame as may occur in some embodiments;
[0029] FIG. 24 is a flow diagram illustrating aspects of an example rendering prediction process as may occur in various embodiments;
[0030] FIG. 25 is a perspective view illustrating a border constraint as may be applied in some embodiments;
[0031] FIG. 26 is a plurality of field-of-view transformations performed by the user relative to objects in a real-world environment as may occur in various embodiments;
[0032] FIG. 27 is an example orientation transformation illustrating the pixel/vertex skipping that may be applied by the system in some embodiments following pixel/vertex stretching;
[0033] FIG. 28 is a flow diagram depicting various example anticipatory rendering operations as may be performed in some embodiments; and
[0034] FIG. 29 is a block diagram of a computer system as may be used to implement features of some of the embodiments.
[0035] While the flow and sequence diagrams presented herein show an organization designed to make them more comprehensible by a human reader, those skilled in the art will appreciate that actual data structures used to store this information may differ from what is shown, in that they, for example, may be organized in a different manner; may contain more or less information than shown; may be compressed and/or encrypted; etc.
[0036] The headings provided herein are for convenience only and do not necessarily affect the scope or meaning of the embodiments. Further, the drawings have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures may be expanded or reduced to help improve the understanding of the embodiments. Similarly, some components and/or operations may be separated into different blocks or combined into a single block for the purposes of discussion of some of the embodiments. Moreover, while the various embodiments are amenable to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and are described in detail below. The intention, however, is not to limit the particular embodiments described. On the contrary, the embodiments are intended to cover all modifications, equivalents, and alternatives falling within the scope of the disclosed embodiments.
DETAILED DESCRIPTION
[0037] Various of the disclosed embodiments relate to optimizations and improvements for head-mounted displays. Some of the embodiments may be enabled by recently developed technology, e.g., the high fidelity and more efficient systems and methods presented in U.S. Provisional Patent Application No. 62/080,400 and U.S. Provisional Patent Application No. 62/080,983. Accurate mapping and localization may facilitate commercial and social interactions that would otherwise be unfeasible.
1. Example AR System Overview–Example System Topology
[0038] Various of the disclosed embodiments include systems and methods which provide or facilitate augmented reality, and in some instances virtual reality, experiences. Augmented reality may include any application presenting both virtual and real-world objects in a user’s field of view as the user interacts with the real world. For example, the user may hold a tablet, headpiece, head-mounted display, or other device capable of capturing an image and presenting it on a screen, of rendering an image in the user’s field of view (e.g., projecting images upon a transparency between the user and the real-world environment), or of projecting an image upon a user’s eyes (e.g., upon a contact lens); more generally, augmented reality encompasses any situation wherein virtual images may be presented to a user in a real-world context. These virtual objects may exist persistently in space and time in a fashion analogous to real objects. For example, as the user scans a room, a virtual object may reappear in the user’s field of view in a position and orientation similar to a real-world object.
[0039] FIG. 1 is a conceptual diagram illustrating an overview of environment data capture, model creation, and model application as may be relevant to some embodiments. Initially 100a, a user 110 may scan a capture device 105a (illustrated here as a device similar to that depicted in FIG. 4 and discussed in greater detail herein) about an environment 150. The capture device 105a may include a depth sensor and may additionally include a camera for capturing photographic images (e.g., suitable devices for various embodiments include a Kinect® sensor, a Senz3D® sensor, an ASUS Xtion PRO® sensor, etc.). Generally, a “camera” as referenced herein refers to a device able to capture depth and/or photographic images. As the user 110 moves the capture device 105a, the capture device 105a may acquire a plurality of depth frames 115a, 115b, 115c using the depth sensor. Each depth frame may provide depth values for each point in the field of view of the capture device 105a. This raw data may be recorded on the capture device 105a in a data log (including, e.g., depth, RGB, and IMU data) as the user walks through and/or scans the environment 150. The data log may be a file stored on the capture device 105a. The capture device 105a may capture both shape and color information into a form suitable for storage in the log. In some embodiments, the capture device 105a may transmit the captured data directly to a remote system 125 (e.g., a laptop computer, a server, or one or more virtual servers in the “cloud”) across a network 120 (though depicted here as communicating across a network, one will recognize that a portable memory, e.g., a USB memory stick, may also be used). In some embodiments, the data may be transmitted in lieu of being stored locally on the capture device 105a. The remote system 125 may be at the same location as or a different location from the user 110. An application running on the capture device 105a, or on a remote system 125 in communication with the capture device 105a via a network 120, may integrate 160 the frames in the data log to form a three-dimensional internal model representation 130 (e.g., one or more vertex meshes, represented here in a top-down view 100b). This integration, also referred to as “mapping” herein, may be performed on the capture device 105a, on the remote system 125, or on a combination of the two. The capture device 105a may also acquire a photographic image with each depth frame, e.g., to generate textures for the map as described herein.
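To make the data-log concept above concrete, the following is a minimal Python sketch of a capture loop that accumulates depth, RGB, and IMU samples into a log file for later integration. The `sensor` object and its `read_depth`, `read_color`, and `read_imu` methods are hypothetical placeholders rather than the API of any particular capture device, and a real implementation would likely use a compact binary format rather than JSON.

```python
import json
import time

def record_data_log(sensor, log_path, duration_s=60.0):
    """Append synchronized depth/RGB/IMU samples to a newline-delimited log file."""
    start = time.time()
    with open(log_path, "w") as log:
        while time.time() - start < duration_s:
            frame = {
                "timestamp": time.time(),
                # The hypothetical sensor is assumed to return JSON-serializable
                # values (e.g., nested lists for the depth and RGB images).
                "depth": sensor.read_depth(),   # per-pixel depth values for this frame
                "rgb": sensor.read_color(),     # photographic image, e.g., for texturing
                "imu": sensor.read_imu(),       # e.g., accelerometer and gyroscope sample
            }
            log.write(json.dumps(frame) + "\n")
```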
[0040] An augmented reality (AR) device 105b (which may be the same as the capture device 105a) may then use 170 the model 130 in conjunction with incoming depth frame data to present an augmented reality experience 100c. For example, a user (perhaps the same user as user 110) may hold the AR device 105b in view of the environment 150. As real-time RGB images are captured of the environment 150 and displayed on the AR device 105b, the AR system may supplement the images with virtual elements (the real-time images may be converted to a textured mesh in some embodiments, as described herein). For example, here a virtual piece of furniture 135 appears behind a real-world sofa. Similarly, a virtual character 140 is presented in the scene as though it were standing in the real-world environment (rotating the device to the right and downward may bring the character fully into view). The AR device 105b may have more than one camera (e.g., to provide a stereoscopic experience), and the AR system may modify each separate camera image mutatis mutandis (though the capture device 105a, e.g., may have had only one camera).
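As an illustration of the occlusion behavior described above (e.g., the virtual piece of furniture appearing behind the real-world sofa), the following Python/NumPy sketch composites a rendered virtual layer over the live camera image using a per-pixel depth comparison. The array names, shapes, and the use of NumPy are illustrative assumptions and are not drawn from the disclosure.

```python
import numpy as np

def composite_with_occlusion(rgb_real, depth_real, rgb_virtual, depth_virtual):
    """Overlay virtual content only where it is closer than the real geometry.

    rgb_real, rgb_virtual: (H, W, 3) color images.
    depth_real, depth_virtual: (H, W) depth maps in the same units (e.g., meters),
    with depth_virtual set to np.inf wherever no virtual content was rendered.
    """
    virtual_in_front = depth_virtual < depth_real   # pixels where the virtual object is closer
    out = rgb_real.copy()
    out[virtual_in_front] = rgb_virtual[virtual_in_front]
    return out
```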
[0041] The model 130 may also be used in a standalone capacity, e.g., for creating a virtual world mimicking the real-world environment, or for performing measurements of the real-world environment independent of any augmented reality application. Though depicted here in a home environment, one will recognize that the same systems and methods may be applied in other settings, e.g., an office or industrial environments, inside an animal body, etc.
[0042] In order to display virtual objects (such as the virtual piece of furniture 135 and the virtual character 140) faithfully to the user, some embodiments establish: (a) how the camera(s) on the AR device 105b are positioned with respect to the model 130, or to an object, or to some static reference coordinate system (referred to herein as “world coordinates”). Some embodiments also establish (b) the 3D shape of the surroundings in order to perform various graphics processing applications, e.g., to properly depict occlusions (of virtual objects by real objects, or vice versa), to render shadows properly (e.g., as depicted for the virtual piece of furniture 135 in FIG. 1), to perform an Artificial Intelligence operation, etc. Problem (a) is also referred to as camera localization or pose estimation, i.e., determining the position and orientation of the camera in 3D space.
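For intuition on problem (a), the sketch below shows how an estimated pose, expressed here as a 4x4 world-to-camera transform, together with pinhole camera intrinsics can place a point anchored in world coordinates into the current image. The function name and intrinsic values are arbitrary placeholders; they do not describe any particular device, nor the pose-estimation methods (e.g., the Pose Search Algorithm) referenced elsewhere herein.

```python
import numpy as np

def project_world_point(p_world, T_world_to_cam, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Project a 3D point given in world coordinates into pixel coordinates.

    p_world: (3,) point in world coordinates.
    T_world_to_cam: (4, 4) rigid transform mapping world coordinates into the
    camera frame (i.e., the inverse of the estimated camera-to-world pose).
    Returns (u, v) pixel coordinates and the point's depth in the camera frame.
    """
    x, y, z, _ = T_world_to_cam @ np.append(p_world, 1.0)  # homogeneous transform
    u = fx * x / z + cx                                     # pinhole projection
    v = fy * y / z + cy
    return u, v, z
```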
[0043] Various of the disclosed embodiments employ superior methods for resolving how the camera (eyes) is positioned with respect to the model or some static reference coordinate system (“world coordinates”). These embodiments provide superior accuracy of localization, which mitigates virtual object jitter and misplacement, undesirable artifacts that may destroy the user’s illusion of a virtual object being positioned in real space. Whereas prior art devices often rely exclusively on special markers to avoid these issues, those markers need to be embedded in the environment and are thus often cumbersome to use. Such markers may also restrict the scope of AR functions which may be performed.
[0044] In contrast to previous AR solutions, many of the disclosed embodiments provide, e.g.: operation in real time; operation without user intervention; display of virtual objects in the correct location and without jitter; no modification of the environment or other cumbersome preparations; occlusions and shadows on-the-fly; presentation to the user in an easy-to-use package (e.g., a smart phone, tablet, or goggles); production at consumer-friendly prices; etc. One will recognize that some embodiments may present only some or none of these features.
[0045] As an example, FIG. 2 is a recreation of a photograph of an embodiment in operation, wherein a virtual television playing a home video is depicted atop a real-world piece of furniture in an AR device 205. The television does not actually exist in the real world, but a user viewing their surroundings with the AR device 205 may not be able to distinguish between the real and virtual objects around them.
[0046] FIG. 3 is a block diagram of various components appearing in a mapping and AR system as may be implemented in some embodiments (though the mapping and AR systems may exist separately in some embodiments). These operational components may consist of the following sub-systems: mapping 310; pose estimation/tracking 325; rendering 315; planning/interaction 330; networking/sensor communication 320; and calibration 335. Though depicted here as components of a single overall system 305, one will recognize that the subcomponents may be separated into separate computer systems (e.g., servers in a “cloud” network), processing functions, and/or devices. For example, one system may comprise a capture device. A second system may receive the depth frames and position information from the capture device and implement a mapping component 310 to generate a model. A third system may then implement the remaining components. One will readily recognize alternative divisions of functionality. Additionally, some embodiments may be directed exclusively to the functions and/or structures associated with one or more of these modules.
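Purely as an illustration of one possible division of functionality among the sub-systems named above, the skeleton below defines minimal interfaces for a mapping service, a tracker, and a renderer that could run on separate computer systems. The class and method names are hypothetical and are not taken from the disclosure.

```python
class MappingService:
    """Integrates logged depth frames into a 3D model (e.g., one or more vertex meshes)."""
    def integrate(self, data_log):
        raise NotImplementedError

class Tracker:
    """Estimates the camera pose relative to the model's world coordinates."""
    def estimate_pose(self, model, depth_frame):
        raise NotImplementedError

class Renderer:
    """Composites virtual content over the live image using the estimated pose."""
    def render(self, model, pose, rgb_frame, depth_frame):
        raise NotImplementedError
```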
[0047] Similarly, though tracking is discussed herein with reference to a user device to facilitate explanation, one will recognize that some embodiments may implement applications using data captured and processed using the disclosed techniques in alternate form factors. As just one example, depth or other sensors may be placed about a user’s house, and a device for projecting images onto a contact lens may be provided. Data captured using the disclosed techniques may then be used to produce an AR experience for the user by projecting the appropriate image onto the contact lens. Third-party devices may capture the depth frames of a user’s environment for mapping, while the user’s personal device performs the AR functions. Accordingly, though components may be discussed together herein to facilitate understanding, one will understand that the described functionality may appear across different functional divisions and form factors.
2. Example Combined Capture and Augmented Reality Device
[0048] FIG. 4 is a perspective view of an example mapping and application device 400 as may be used in some embodiments. Various embodiments may be implemented using consumer-grade off-the-shelf components. In some embodiments, the AR device consists of a tablet to which an RGBD camera and, optionally, an IMU have been attached. As depicted, the example device comprises a tablet personal computer 405, with the panel opposite the display attached to a USB hub 410, an RGBD camera 415, and an Inertial Measurement Unit (IMU) 420. Though the IMU 420 and camera 415 are here depicted as separate from the tablet’s 405 form factor, one will readily recognize variations wherein the IMU 420, camera 415, and tablet personal computer 405 comprise a single form factor. A touch-screen display 430 (not shown) may be provided on the opposing surface of the tablet. Though shown here separately from the display device, the camera and IMU may be available in embeddable form and thus could be fitted inside a tablet in some embodiments. Similarly, where a headset display (e.g., a virtual or augmented reality system) is used, the depth sensor, camera, and/or IMU may be integrated into the headset. Hence, the device can take on multiple forms, e.g., a tablet, a head-mounted system (AR/VR helmet or goggles), a stand-alone device, or a smart phone. Various of the disclosed embodiments, or aspects thereof, may be implemented in software, hardware, and/or firmware (e.g., a system on a chip, an FPGA, etc.).
……
……
……