Magic Leap Patent | Methods and systems for creating virtual and augmented reality

Patent: Methods and systems for creating virtual and augmented reality

Publication Number: 10203762

Publication Date: 2019-02-12

Applicants: Magic Leap

Abstract

Configurations are disclosed for presenting virtual reality and augmented reality experiences to users. The system may comprise an image capturing device to capture one or more images, the one or more images corresponding to a field of the view of a user of a head-mounted augmented reality device, and a processor communicatively coupled to the image capturing device to extract a set of map points from the set of images, to identify a set of sparse points and a set of dense points from the extracted set of map points, and to perform a normalization on the set of map points.

Background

Modern computing and display technologies have facilitated the development of systems for so-called “virtual reality” or “augmented reality” experiences, wherein digitally reproduced images or portions thereof are presented to a user in a manner wherein they seem to be, or may be perceived as, real. A virtual reality, or “VR”, scenario typically involves presentation of digital or virtual image information without transparency to other actual real-world visual input; an augmented reality, or “AR”, scenario typically involves presentation of digital or virtual image information as an augmentation to visualization of the actual world around the user. For example, an augmented reality scene may allow a user of AR technology to see one or more virtual objects super-imposed on or amidst real world objects (e.g., a real-world park-like setting featuring people, trees, buildings in the background, etc.).

The human visual perception system is very complex, and producing a VR or AR technology that facilitates a comfortable, natural-feeling, rich presentation of virtual image elements amongst other virtual or real-world imagery elements is challenging. Traditional stereoscopic wearable glasses generally feature two displays that are configured to display images with slightly different element presentation such that a three-dimensional perspective is perceived by the human visual system. Such configurations have been found to be uncomfortable for many users due to a mismatch between vergence and accommodation which must be overcome to perceive the images in three dimensions. Indeed, some users are not able to tolerate stereoscopic configurations.

Although a few optical configurations (e.g., head-mounted glasses) are available (e.g., Google Glass®, Oculus Rift®, etc.), none of these configurations is optimally suited for presenting a rich, binocular, three-dimensional augmented reality experience in a manner that will be comfortable and maximally useful to the user, in part because prior systems fail to address some of the fundamental aspects of the human perception system, including the photoreceptors of the retina and their interoperation with the brain to produce the perception of visualization to the user.

The human eye is an exceedingly complex organ, and typically comprises a cornea, an iris, a lens, macula, retina, and optic nerve pathways to the brain. The macula is the center of the retina, which is utilized to see moderate detail. At the center of the macula is a portion of the retina that is referred to as the “fovea”, which is utilized for seeing the finest details of a scene, and which contains more photoreceptors (approximately 120 cones per visual degree) than any other portion of the retina.

The human visual system is not a passive sensor type of system; it actively scans the environment. In a manner somewhat akin to use of a flatbed scanner to capture an image, or use of a finger to read Braille from a paper, the photoreceptors of the eye fire in response to changes in stimulation, rather than constantly responding to a constant state of stimulation. Thus, motion is required to present photoreceptor information to the brain.

Indeed, experiments with substances such as cobra venom, which has been utilized to paralyze the muscles of the eye, have shown that a human subject will experience blindness if positioned with eyes open, viewing a static scene with venom-induced paralysis of the eyes. In other words, without changes in stimulation, the photoreceptors do not provide input to the brain and blindness is experienced. It is believed that this is at least one reason that the eyes of normal humans have been observed to move back and forth, or dither, in side-to-side motion, also known as “microsaccades”.

As noted above, the fovea of the retina contains the greatest density of photoreceptors. While it is typically perceived that humans have high-resolution visualization capabilities throughout a field of view, in actuality humans have only a small high-resolution center that is mechanically swept around almost constantly, along with a persistent memory of the high-resolution information recently captured with the fovea. In a somewhat similar manner, the focal distance control mechanism of the eye (e.g., ciliary muscles operatively coupled to the crystalline lens in a manner wherein ciliary relaxation causes taut ciliary connective fibers to flatten out the lens for more distant focal lengths; ciliary contraction causes loose ciliary connective fibers, which allow the lens to assume a more rounded geometry for more close-in focal lengths) dithers back and forth by approximately 1/4 to 1/2 diopter to cyclically induce a small amount of “dioptric blur” on both the close side and far side of the targeted focal length. This is utilized by the accommodation control circuits of the brain as cyclical negative feedback that helps to constantly correct course and keep the retinal image of a fixated object approximately in focus.
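To give a feel for the size of this dither, the following minimal Python sketch converts a dioptric offset into focal distances, using the standard relation that optical power in diopters is the reciprocal of focal distance in meters. The function name `dither_range_m` is hypothetical, and the sketch assumes the dither is applied symmetrically on the near and far side of the target, which the text does not state explicitly.

```python
import math


def dither_range_m(target_distance_m: float, dither_diopters: float):
    """Return (near_m, far_m): the focal distances reached when accommodation
    is offset by +/- dither_diopters around a target distance.

    Assumption (not from the source): the offset is symmetric in diopters.
    """
    target_power = 1.0 / target_distance_m      # diopters = 1 / meters
    near_power = target_power + dither_diopters  # more power, closer focus
    far_power = target_power - dither_diopters   # less power, farther focus
    near_m = 1.0 / near_power
    # Zero or negative power means focus at or beyond optical infinity.
    far_m = math.inf if far_power <= 0 else 1.0 / far_power
    return near_m, far_m


if __name__ == "__main__":
    near, far = dither_range_m(target_distance_m=2.0, dither_diopters=0.25)
    print(f"near side: {near:.2f} m, far side: {far:.2f} m")
```

For a target at 2 m, a 1/4 diopter offset sweeps focus between roughly 1.33 m and 4 m, which shows why a small dioptric change corresponds to a large change in distance for far targets.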

The visualization center of the brain also gains valuable perception information from the motion of both eyes and components thereof relative to each other. Vergence movements (e.g., rolling movements of the pupils toward or away from each other to converge the lines of sight of the eyes to fixate upon an object) of the two eyes relative to each other are closely associated with focusing (or “accommodation”) of the lenses of the eyes. Under normal conditions, changing the focus of the lenses of the eyes, or accommodating the eyes, to focus upon an object at a different distance will automatically cause a matching change in vergence to the same distance, under a relationship known as the “accommodation-vergence reflex.” Likewise, a change in vergence will trigger a matching change in accommodation, under normal conditions. Working against this reflex (as is the case with most conventional stereoscopic AR or VR configurations) is known to produce eye fatigue, headaches, or other forms of discomfort in users.

Movement of the head, which houses the eyes, also has a key impact upon visualization of objects. Humans tend to move their heads to visualize the world around them, and are often in a fairly constant state of repositioning and reorienting the head relative to an object of interest. Further, most people prefer to move their heads when their eye gaze needs to move more than about 20 degrees off center to focus on a particular object (e.g., people do not typically like to look at things “from the corner of the eye”). Humans also typically scan or move their heads in relation to sounds, to improve audio signal capture and utilize the geometry of the ears relative to the head. The human visual system gains powerful depth cues from what is called “head motion parallax”, which is related to the relative motion of objects at different distances as a function of head motion and eye vergence distance. In other words, if a person moves his head from side to side and maintains fixation on an object, items farther out from that object will move in the same direction as the head, and items in front of that object will move opposite the head motion. These may be very salient cues for where objects are spatially located in the environment relative to the person. Head motion also is utilized to look around objects, of course.
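The direction rule for head motion parallax can be checked with a small geometric sketch. It assumes a simplified top-down 2-D scene, not taken from the source: the fixated object and the other item both start straight ahead of the viewer, the head translates sideways without the scene moving, and the eyes keep fixating the object. The function name `relative_bearing_rad` is hypothetical.

```python
import math


def relative_bearing_rad(head_shift_m: float,
                         fixation_distance_m: float,
                         item_distance_m: float) -> float:
    """Bearing of an item relative to the line of sight after a sideways
    head shift, with fixation held on an object straight ahead.

    Positive head_shift_m means the head moved to the right. A positive
    result means the item appears to the right of the fixated object,
    a negative result means it appears to the left.
    """
    # After the shift, both points sit at lateral offset -head_shift_m
    # relative to the head, at their respective depths.
    item_bearing = math.atan2(-head_shift_m, item_distance_m)
    gaze_bearing = math.atan2(-head_shift_m, fixation_distance_m)
    return item_bearing - gaze_bearing


if __name__ == "__main__":
    shift = 0.10  # head moves 10 cm to the right
    far_item = relative_bearing_rad(shift, 2.0, 4.0)
    near_item = relative_bearing_rad(shift, 2.0, 1.0)
    print(f"farther item: {math.degrees(far_item):+.2f} deg")
    print(f"nearer item:  {math.degrees(near_item):+.2f} deg")
```

With the head moving right, the farther item comes out with a positive bearing (it shifts right, with the head) and the nearer item with a negative bearing (it shifts left, against the head), matching the description above.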

Further, head and eye motion are coordinated with the “vestibulo-ocular reflex”, which stabilizes image information relative to the retina during head rotations, thus keeping the object image information approximately centered on the retina. In response to a head rotation, the eyes are reflexively and proportionately rotated in the opposite direction to maintain stable fixation on an object. As a result of this compensatory relationship, many humans can read a book while shaking their head back and forth. Interestingly, if the book is panned back and forth at the same speed with the head approximately stationary, the same generally is not true: the person is not likely to be able to read the moving book. The vestibulo-ocular reflex is one of head and eye motion coordination, and is generally not developed for hand motion. This paradigm may be important for AR systems, because head motions of the user may be associated relatively directly with eye motions, and an ideal system preferably will be ready to work with this relationship.

Indeed, given these various relationships, when placing digital content (e.g., 3-D content such as a virtual chandelier object presented to augment a real-world view of a room; or 2-D content such as a planar/flat virtual oil painting object presented to augment a real-world view of a room), design choices may be made to control behavior of the objects. For example, a 2-D oil painting object may be head-centric, in which case the object moves around along with the user’s head (e.g., as in a Google Glass® approach). In another example, an object may be world-centric, in which case it may be presented as though it is part of the real world coordinate system, such that the user may move his head or eyes without moving the position of the object relative to the real world.

Thus, when placing virtual content into the augmented reality world presented with an AR system, choices are made as to whether the object should be presented as world-centric, body-centric, head-centric, or eye-centric. In world-centric approaches, the virtual object stays in position in the real world so that the user may move his body, head, and eyes around it without changing its position relative to the real world objects surrounding it, such as a real world wall. In body-centric approaches, a virtual element may be fixed relative to the user’s torso, so that the user can move his head or eyes without moving the object, but the object is slaved to torso movements. In head-centric approaches, the displayed object (and/or display itself) may be moved along with head movements, as described above in reference to Google Glass®. In eye-centric approaches, as in a “foveated display” configuration, as is described below, content is slewed around as a function of the eye position.
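The four anchoring choices can be contrasted with a deliberately reduced sketch. It is an illustration under stated assumptions and not the method of the patent: only yaw (left/right rotation) is modeled, translation is ignored, head yaw is measured relative to the torso, and eye yaw is measured relative to the head. The names `Anchor` and `bearing_in_head_frame` are hypothetical.

```python
from enum import Enum


class Anchor(Enum):
    WORLD = "world"  # fixed relative to the real world
    BODY = "body"    # fixed relative to the torso
    HEAD = "head"    # fixed relative to the head (and display)
    EYE = "eye"      # follows the gaze direction


def bearing_in_head_frame(anchor: Anchor,
                          offset_deg: float,
                          torso_yaw_deg: float,
                          head_yaw_deg: float,
                          eye_yaw_deg: float) -> float:
    """Where (in degrees, relative to the head's forward axis) an object
    should be drawn, given its offset within its own anchor frame.

    Yaw-only simplification: torso_yaw is relative to the world,
    head_yaw relative to the torso, eye_yaw relative to the head.
    """
    if anchor is Anchor.WORLD:
        # Undo both torso and head rotation so the object stays put.
        return offset_deg - (torso_yaw_deg + head_yaw_deg)
    if anchor is Anchor.BODY:
        # Undo only head rotation; the object turns with the torso.
        return offset_deg - head_yaw_deg
    if anchor is Anchor.HEAD:
        # The object rides along with the head.
        return offset_deg
    if anchor is Anchor.EYE:
        # The object is slewed to follow the eye position.
        return offset_deg + eye_yaw_deg
    raise ValueError(f"unknown anchor: {anchor}")


if __name__ == "__main__":
    # User turns the torso 10 deg, the head a further 20 deg,
    # and the eyes 5 deg within the head.
    for anchor in Anchor:
        drawn = bearing_in_head_frame(anchor, 0.0, 10.0, 20.0, 5.0)
        print(f"{anchor.value:>5}: draw at {drawn:+.1f} deg")
```

In this toy model a world-centric object must be redrawn whenever the torso or head turns, which is one way to see why accurate, low-latency head pose measurement matters most for world-centric content.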

With world-centric configurations, it may be desirable to have inputs such as accurate head pose measurement, accurate representation and/or measurement of real world objects and geometries around the user, low-latency dynamic rendering in the augmented reality display as a function of head pose, and a generally low-latency display.

The U.S. patent applications listed above present systems and techniques to work with the visual configuration of a typical human to address various challenges in virtual reality and augmented reality applications. The design of these virtual reality and/or AR systems presents numerous challenges, including the speed of the system in delivering virtual content, quality of virtual content, eye relief of the user, size and portability of the system, and other system and optical challenges.

The systems and techniques described herein are configured to work with the visual configuration of the typical human to address these challenges.

Summary

Embodiments of the present invention are directed to devices, systems and methods for facilitating virtual reality and/or augmented reality interaction for one or more users. In one aspect, a system for displaying virtual content is disclosed.

In one aspect, an augmented reality system comprises an image capturing device to capture one or more images, the one or more images corresponding to a field of the view of a user of a head-mounted augmented reality device, and a processor communicatively coupled to the image capturing device to extract a set of map points from the set of images, to identify a set of sparse points and a set of dense points from the extracted set of map points, and to perform a normalization on the set of map points.
