
Apple Patent | Methods for interacting with objects in an environment

Patent: Methods for interacting with objects in an environment


Publication Number: 20220229524

Publication Date: 2022-07-21

Applicant: Apple

Abstract

In some embodiments, an electronic device selectively performs operations in response to user inputs depending on whether the inputs are preceded by detecting a ready state. In some embodiments, an electronic device processes user inputs based on an attention zone associated with the user. In some embodiments, an electronic device enhances interactions with user interface elements at different distances and/or angles with respect to a gaze of a user. In some embodiments, an electronic device enhances interactions with user interface elements for mixed direct and indirect interaction modes. In some embodiments, an electronic device manages inputs from two of the user’s hands and/or presents visual indications of user inputs. In some embodiments, an electronic device enhances interactions with user interface elements in a three-dimensional environment using visual indications of such interactions. In some embodiments, an electronic device redirects a selection input from one user interface element to another.

Claims

  1. A method comprising: at an electronic device in communication with a display generation component and one or more input devices: displaying, via the display generation component, a user interface that includes a user interface element; while displaying the user interface element, detecting, via the one or more input devices, an input from a predefined portion of a user of the electronic device; and in response to detecting the input from the predefined portion of the user of the electronic device: in accordance with a determination that a pose of the predefined portion of the user prior to detecting the input satisfies one or more criteria, performing a respective operation in accordance with the input from the predefined portion of the user of the electronic device; and in accordance with a determination that the pose of the predefined portion of the user prior to detecting the input does not satisfy the one or more criteria, forgoing performing the respective operation in accordance with the input from the predefined portion of the user of the electronic device.

  2. The method of claim 1, further comprising: while the pose of the predefined portion of the user does not satisfy the one or more criteria, displaying the user interface element with a visual characteristic having a first value and displaying a second user interface element included in the user interface with the visual characteristic having a second value; and while the pose of the predefined portion of the user satisfies the one or more criteria, updating the visual characteristic of a user interface element toward which an input focus is directed, including: in accordance with a determination that the input focus is directed to the user interface element, updating the user interface element to be displayed with the visual characteristic having a third value; and in accordance with a determination that the input focus is directed to the second user interface element, updating the second user interface element to be displayed with the visual characteristic having a fourth value.

  3. The method of claim 2, wherein: the input focus is directed to the user interface element in accordance with a determination that the predefined portion of the user is within a threshold distance of a location corresponding to the user interface element, and the input focus is directed to the second user interface element in accordance with a determination that the predefined portion of the user is within the threshold distance of the second user interface element.

  4. The method of claim 2, wherein: the input focus is directed to the user interface element in accordance with a determination that a gaze of the user is directed to the user interface element, and the input focus is directed to the second user interface element in accordance with a determination that the gaze of the user is directed to the second user interface element.

  5. The method of claim 2, wherein updating the visual characteristic of a user interface element toward which an input focus is directed includes: in accordance with a determination that the predefined portion of the user is less than a threshold distance from a location corresponding to the user interface element, the visual characteristic of the user interface element toward which the input focus is directed is updated in accordance with a determination that the pose of the predefined portion of the user satisfies a first set of one or more criteria; and in accordance with a determination that the predefined portion of the user is more than the threshold distance from the location corresponding to the user interface element, the visual characteristic of the user interface element toward which the input focus is directed is updated in accordance with a determination that the pose of the predefined portion of the user satisfies a second set of one or more criteria, different from the first set of one or more criteria.

  6. The method of claim 1, wherein the pose of the predefined portion of the user satisfying the one or more criteria includes: in accordance with a determination that the predefined portion of the user is less than a threshold distance from a location corresponding to the user interface element, the pose of the predefined portion of the user satisfying a first set of one or more criteria; and in accordance with a determination that the predefined portion of the user is more than the threshold distance from the location corresponding to the user interface element, the pose of the predefined portion of the user satisfying a second set of one or more criteria, different from the first set of one or more criteria.

  7. The method of claim 1, wherein the pose of the predefined portion of the user satisfying the one or more criteria includes: in accordance with a determination that the predefined portion of the user is holding an input device of the one or more input devices, the pose of the predefined portion of the user satisfying a first set of one or more criteria, and in accordance with a determination that the predefined portion of the user is not holding the input device, the pose of the predefined portion of the user satisfying a second set of one or more criteria.

  8. The method of claim 1, wherein the pose of the predefined portion of the user satisfying the one or more criteria includes: in accordance with a determination that the predefined portion of the user is less than a threshold distance from a location corresponding to the user interface element, the pose of the predefined portion of the user satisfying a first set of one or more criteria; and in accordance with a determination that the predefined portion of the user is more than the threshold distance from the location corresponding to the user interface element, the pose of the predefined portion of the user satisfying the first set of one or more criteria.

  9. The method of claim 1, wherein: in accordance with a determination that the predefined portion of the user, during the input, is more than a threshold distance away from a location corresponding to the user interface element, the one or more criteria include a criterion that is satisfied when an attention of the user is directed towards the user interface element, and in accordance with a determination that the predefined portion of the user, during the input, is less than the threshold distance away from the location corresponding to the user interface element, the one or more criteria do not include a requirement that the attention of the user is directed towards the user interface element in order for the one or more criteria to be met.

  10. The method of claim 1, further comprising: in response to detecting that a gaze of the user is directed to a first region of the user interface, visually de-emphasizing, via the display generation component, a second region of the user interface relative to the first region of the user interface; and in response to detecting that the gaze of the user is directed to the second region of the user interface, visually de-emphasizing, via the display generation component, the first region of the user interface relative to the second region of the user interface.

  11. The method of claim 10, wherein the user interface is accessible by the electronic device and a second electronic device, the method further comprising: in accordance with an indication that a gaze of a second user of the second electronic device is directed to the first region of the user interface, forgoing visually de-emphasizing, via the display generation component, the second region of the user interface relative to the first region of the user interface; and in accordance with an indication that the gaze of the second user of the second electronic device is directed to the second region of the user interface, forgoing visually de-emphasizing, via the display generation component, the first region of the user interface relative to the second region of the user interface.

  12. The method of claim 1, wherein detecting the input from the predefined portion of the user of the electronic device includes detecting, via a hand tracking device, a pinch gesture performed by the predefined portion of the user.

  13. The method of claim 1, wherein detecting the input from the predefined portion of the user of the electronic device includes detecting, via a hand tracking device, a press gesture performed by the predefined portion of the user.

  14. The method of claim 1, wherein detecting the input from the predefined portion of the user of the electronic device includes detecting lateral movement of the predefined portion of the user relative to a location corresponding to the user interface element.

  15. The method of claim 1, further comprising: prior to determining that the pose of the predefined portion of the user prior to detecting the input satisfies the one or more criteria: detecting, via an eye tracking device, that a gaze of the user is directed to the user interface element; and in response to detecting that the gaze of the user is directed to the user interface element, displaying, via the display generation component, a first indication that the gaze of the user is directed to the user interface element.

  16. The method of claim 15, further comprising: prior to detecting the input from the predefined portion of the user of the electronic device, while the pose of the predefined portion of the user prior to detecting the input satisfies the one or more criteria: displaying, via the display generation component, a second indication that the pose of the predefined portion of the user prior to detecting the input satisfies the one or more criteria, wherein the first indication is different from the second indication.

  17. The method of claim 1, further comprising: while displaying the user interface element, detecting, via the one or more input devices, a second input from a second predefined portion of the user of the electronic device; and in response to detecting the second input from the second predefined portion of the user of the electronic device: in accordance with a determination that a pose of the second predefined portion of the user prior to detecting the second input satisfies one or more second criteria, performing a second respective operation in accordance with the second input from the second predefined portion of the user of the electronic device; and in accordance with a determination that the pose of the second predefined portion of the user prior to detecting the second input does not satisfy the one or more second criteria, forgoing performing the second respective operation in accordance with the second input from the second predefined portion of the user of the electronic device.

  18. The method of claim 1, wherein the user interface is accessible by the electronic device and a second electronic device, the method further comprising: prior to detecting that the pose of the predefined portion of the user prior to detecting the input satisfies the one or more criteria, displaying the user interface element with a visual characteristic having a first value; while the pose of the predefined portion of the user prior to detecting the input satisfies the one or more criteria, displaying the user interface element with the visual characteristic having a second value, different from the first value; and while a pose of a predefined portion of a second user of the second electronic device satisfies the one or more criteria while displaying the user interface element with the visual characteristic having the first value, maintaining display of the user interface element with the visual characteristic having the first value.

  19. The method of claim 18, further comprising: in response to detecting the input from the predefined portion of the user of the electronic device, displaying the user interface element with the visual characteristic having a third value; and in response to an indication of an input from the predefined portion of the second user of the second electronic device, displaying the user interface element with the visual characteristic having the third value.

  20. An electronic device, comprising: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: displaying, via a display generation component, a user interface that includes a user interface element; while displaying the user interface element, detecting, via one or more input devices, an input from a predefined portion of a user of the electronic device; and in response to detecting the input from the predefined portion of the user of the electronic device: in accordance with a determination that a pose of the predefined portion of the user prior to detecting the input satisfies one or more criteria, performing a respective operation in accordance with the input from the predefined portion of the user of the electronic device; and in accordance with a determination that the pose of the predefined portion of the user prior to detecting the input does not satisfy the one or more criteria, forgoing performing the respective operation in accordance with the input from the predefined portion of the user of the electronic device.

  21. A non-transitory computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device, cause the electronic device to perform a method comprising: displaying, via a display generation component, a user interface that includes a user interface element; while displaying the user interface element, detecting, via one or more input devices, an input from a predefined portion of a user of the electronic device; and in response to detecting the input from the predefined portion of the user of the electronic device: in accordance with a determination that a pose of the predefined portion of the user prior to detecting the input satisfies one or more criteria, performing a respective operation in accordance with the input from the predefined portion of the user of the electronic device; and in accordance with a determination that the pose of the predefined portion of the user prior to detecting the input does not satisfy the one or more criteria, forgoing performing the respective operation in accordance with the input from the predefined portion of the user of the electronic device.

22-200. (canceled)

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of U.S. Provisional Application No. 63/139,566, filed Jan. 20, 2021, and U.S. Provisional Application No. 63/261,559, filed Sep. 23, 2021, the contents of which are incorporated herein by reference in their entireties for all purposes.

TECHNICAL FIELD

[0002] This relates generally to computer systems with a display generation component and one or more input devices that present graphical user interfaces, including but not limited to electronic devices that present interactive user interface elements via the display generation component.

BACKGROUND

[0003] The development of computer systems for augmented reality has increased significantly in recent years. Example augmented reality environments include at least some virtual elements that replace or augment the physical world. Input devices, such as cameras, controllers, joysticks, touch-sensitive surfaces, and touch-screen displays for computer systems and other electronic computing devices, are used to interact with virtual/augmented reality environments. Example virtual elements include virtual objects, such as digital images, video, text, icons, and control elements such as buttons and other graphics.

[0004] But methods and interfaces for interacting with environments that include at least some virtual elements (e.g., applications, augmented reality environments, mixed reality environments, and virtual reality environments) are cumbersome, inefficient, and limited. For example, systems that provide insufficient feedback for performing actions associated with virtual objects, systems that require a series of inputs to achieve a desired outcome in an augmented reality environment, and systems in which manipulation of virtual objects is complex, tedious, and error-prone create a significant cognitive burden on a user and detract from the experience with the virtual/augmented reality environment. In addition, these methods take longer than necessary, thereby wasting energy. This latter consideration is particularly important in battery-operated devices.

SUMMARY

[0005] Accordingly, there is a need for computer systems with improved methods and interfaces for providing computer generated experiences to users that make interaction with the computer systems more efficient and intuitive for a user. Such methods and interfaces optionally complement or replace conventional methods for providing computer generated reality experiences to users. Such methods and interfaces reduce the number, extent, and/or nature of the inputs from a user by helping the user to understand the connection between provided inputs and device responses to the inputs, thereby creating a more efficient human-machine interface.

[0006] The above deficiencies and other problems associated with user interfaces for computer systems with a display generation component and one or more input devices are reduced or eliminated by the disclosed systems. In some embodiments, the computer system is a desktop computer with an associated display. In some embodiments, the computer system is a portable device (e.g., a notebook computer, tablet computer, or handheld device). In some embodiments, the computer system is a personal electronic device (e.g., a wearable electronic device, such as a watch, or a head-mounted device). In some embodiments, the computer system has a touchpad. In some embodiments, the computer system has one or more cameras. In some embodiments, the computer system has a touch-sensitive display (also known as a “touch screen” or “touch-screen display”). In some embodiments, the computer system has one or more eye-tracking components. In some embodiments, the computer system has one or more hand-tracking components. In some embodiments, the computer system has one or more output devices in addition to the display generation component, the output devices including one or more tactile output generators and one or more audio output devices. In some embodiments, the computer system has a graphical user interface (GUI), one or more processors, memory, and one or more modules, programs, or sets of instructions stored in the memory for performing multiple functions. In some embodiments, the user interacts with the GUI through stylus and/or finger contacts and gestures on the touch-sensitive surface, movement of the user’s eyes and hand in space relative to the GUI or the user’s body as captured by cameras and other movement sensors, and voice inputs as captured by one or more audio input devices. In some embodiments, the functions performed through the interactions optionally include image editing, drawing, presenting, word processing, spreadsheet making, game playing, telephoning, video conferencing, e-mailing, instant messaging, workout support, digital photographing, digital videoing, web browsing, digital music playing, note taking, and/or digital video playing. Executable instructions for performing these functions are, optionally, included in a non-transitory computer readable storage medium or other computer program product configured for execution by one or more processors.

[0007] There is a need for electronic devices with improved methods and interfaces for interacting with objects in a three-dimensional environment. Such methods and interfaces may complement or replace conventional methods for interacting with objects in a three-dimensional environment. Such methods and interfaces reduce the number, extent, and/or the nature of the inputs from a user and produce a more efficient human-machine interface.

[0008] In some embodiments, an electronic device performs or does not perform an operation in response to a user input depending on whether the user input is preceded by detecting a ready state of the user. In some embodiments, an electronic device processes user inputs based on an attention zone associated with the user. In some embodiments, an electronic device enhances interactions with user interface elements at different distances and/or angles with respect to a gaze of a user in a three-dimensional environment. In some embodiments, an electronic device enhances interactions with user interface elements for mixed direct and indirect interaction modes. In some embodiments, an electronic device manages inputs from two of the user’s hands. In some embodiments, an electronic device presents visual indications of user inputs. In some embodiments, an electronic device enhances interactions with user interface elements in a three-dimensional environment using visual indications of such interactions. In some embodiments, an electronic device redirects an input from one user interface element to another in accordance with movement included in the input.
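
The ready-state gating described in the preceding paragraph can be sketched as a few lines of Swift. Everything in the sketch is an illustrative assumption rather than Apple's implementation or API: the HandPose and HandInput types, the choice of "pinch ready" versus "point ready" poses, and the 0.3 m threshold separating direct from indirect interaction are invented for the example.

    import Foundation

    // Hypothetical hand poses that a hand-tracking device might report.
    enum HandPose { case pinchReady, pointReady, relaxed }

    // Hypothetical summary of a detected input and its context.
    struct HandInput {
        var poseBeforeInput: HandPose   // pose detected just before the input began
        var distanceToElement: Double   // metres from the hand to the targeted element
        var gazeOnElement: Bool         // whether the user's attention is on the element
    }

    // Direct interaction (hand near the element) is gated on a pointing ready state,
    // while indirect interaction (hand far from the element) is gated on a pinch-ready
    // pose plus the user's attention on the element; otherwise the input is ignored.
    // The 0.3 m threshold is an arbitrary illustrative value.
    func shouldPerformOperation(for input: HandInput, directThreshold: Double = 0.3) -> Bool {
        if input.distanceToElement < directThreshold {
            return input.poseBeforeInput == .pointReady
        } else {
            return input.poseBeforeInput == .pinchReady && input.gazeOnElement
        }
    }

    let pinchFromAcrossTheRoom = HandInput(poseBeforeInput: .pinchReady,
                                           distanceToElement: 1.2,
                                           gazeOnElement: true)
    print(shouldPerformOperation(for: pinchFromAcrossTheRoom))   // true

In the described embodiments the criteria can also depend on whether the hand is holding an input device (claims 7-8); the sketch omits that branch for brevity.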

[0009] Note that the various embodiments described above can be combined with any other embodiments described herein. The features and advantages described in the specification are not all inclusive and, in particular, many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

[0010] For a better understanding of the various described embodiments, reference should be made to the Description of Embodiments below, in conjunction with the following drawings in which like reference numerals refer to corresponding parts throughout the figures.

[0011] FIG. 1 is a block diagram illustrating an operating environment of a computer system for providing CGR experiences in accordance with some embodiments.

[0012] FIG. 2 is a block diagram illustrating a controller of a computer system that is configured to manage and coordinate a CGR experience for the user in accordance with some embodiments.

[0013] FIG. 3 is a block diagram illustrating a display generation component of a computer system that is configured to provide a visual component of the CGR experience to the user in accordance with some embodiments.

[0014] FIG. 4 is a block diagram illustrating a hand tracking unit of a computer system that is configured to capture gesture inputs of the user in accordance with some embodiments.

[0015] FIG. 5 is a block diagram illustrating an eye tracking unit of a computer system that is configured to capture gaze inputs of the user in accordance with some embodiments.

[0016] FIG. 6A is a flowchart illustrating a glint-assisted gaze tracking pipeline in accordance with some embodiments.

[0017] FIG. 6B illustrates an exemplary environment of an electronic device providing a CGR experience in accordance with some embodiments.

[0018] FIGS. 7A-7C illustrate exemplary ways in which electronic devices perform or do not perform an operation in response to a user input depending on whether the user input is preceded by detecting a ready state of the user in accordance with some embodiments.

[0019] FIGS. 8A-8K is a flowchart illustrating a method of performing or not performing an operation in response to a user input depending on whether the user input is preceded by detecting a ready state of the user in accordance with some embodiments.

[0020] FIGS. 9A-9C illustrate exemplary ways in which an electronic device processes user inputs based on an attention zone associated with the user in accordance with some embodiments.

[0021] FIGS. 10A-10H is a flowchart illustrating a method of processing user inputs based on an attention zone associated with the user in accordance with some embodiments.

[0022] FIGS. 11A-11C illustrate examples of how an electronic device enhances interactions with user interface elements at different distances and/or angles with respect to a gaze of a user in a three-dimensional environment in accordance with some embodiments.

[0023] FIGS. 12A-12F is a flowchart illustrating a method of enhancing interactions with user interface elements at different distances and/or angles with respect to a gaze of a user in a three-dimensional environment in accordance with some embodiments.

[0024] FIGS. 13A-13C illustrate examples of how an electronic device enhances interactions with user interface elements for mixed direct and indirect interaction modes in accordance with some embodiments.

[0025] FIGS. 14A-14H is a flowchart illustrating a method of enhancing interactions with user interface elements for mixed direct and indirect interaction modes in accordance with some embodiments.

[0026] FIGS. 15A-15E illustrate exemplary ways in which an electronic device manages inputs from two of the user’s hands according to some embodiments.

[0027] FIGS. 16A-16I is a flowchart illustrating a method of managing inputs from two of the user’s hands according to some embodiments.

[0028] FIGS. 17A-17E illustrate various ways in which an electronic device presents visual indications of user inputs according to some embodiments.

[0029] FIGS. 18A-18O is a flowchart illustrating a method of presenting visual indications of user inputs according to some embodiments.

[0030] FIGS. 19A-19D illustrate examples of how an electronic device enhances interactions with user interface elements in a three-dimensional environment using visual indications of such interactions in accordance with some embodiments.

[0031] FIGS. 20A-20F is a flowchart illustrating a method of enhancing interactions with user interface elements in a three-dimensional environment using visual indications of such interactions in accordance with some embodiments.

[0032] FIGS. 21A-21E illustrate examples of how an electronic device redirects an input from one user interface element to another in response to detecting movement included in the input in accordance with some embodiments.

[0033] FIGS. 22A-22K is a flowchart illustrating a method of redirecting an input from one user interface element to another in response to detecting movement included in the input in accordance with some embodiments.

DESCRIPTION OF EMBODIMENTS

[0034] The present disclosure relates to user interfaces for providing a computer generated reality (CGR) experience to a user, in accordance with some embodiments.

[0035] The systems, methods, and GUIs described herein provide improved ways for an electronic device to interact with and manipulate objects in a three-dimensional environment. The three-dimensional environment optionally includes one or more virtual objects, one or more representations of real objects (e.g., displayed as photorealistic (e.g., “pass-through”) representations of the real objects or visible to the user through a transparent portion of the display generation component) that are in the physical environment of the electronic device, and/or representations of users in the three-dimensional environment.

[0036] In some embodiments, an electronic device automatically updates the orientation of a virtual object in a three-dimensional environment based on a viewpoint of a user in the three-dimensional environment. In some embodiments, the electronic device moves the virtual object in accordance with a user input and, in response to termination of the user input, displays the object at an updated location. In some embodiments, the electronic device automatically updates the orientation of the virtual object at the updated location (e.g., and/or as the virtual object moves to the updated location) so that the virtual object is oriented towards a viewpoint of the user in the three-dimensional environment (e.g., throughout and/or at the end of its movement). Automatically updating the orientation of the virtual object in the three-dimensional environment enables the user to view and interact with the virtual object more naturally and efficiently, without requiring the user to adjust the orientation of the object manually.

[0037] In some embodiments, an electronic device automatically updates the orientation of a virtual object in a three-dimensional environment based on viewpoints of a plurality of users in the three-dimensional environment. In some embodiments, the electronic device moves the virtual object in accordance with a user input and, in response to termination of the user input, displays the object at an updated location. In some embodiments, the electronic device automatically updates the orientation of the virtual object at the updated location (e.g., and/or as the virtual object moves to the updated location) so that the virtual object is oriented towards viewpoints of a plurality of users in the three-dimensional environment (e.g., throughout and/or at the end of its movement). Automatically updating the orientation of the virtual object in the three-dimensional environment enables the users to view and interact with the virtual object more naturally and efficiently, without requiring the users to adjust the orientation of the object manually.
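
The orientation update described in the two preceding paragraphs reduces to "turn the object toward the viewpoint(s)". The Swift sketch below is illustrative only: the planar (x, z) simplification, the type and function names, and the use of a simple average over viewpoints are assumptions, not the specific method used by the device.

    import Foundation

    // Planar (x, z) positions, for illustration only.
    struct Point { var x: Double; var z: Double }

    // Yaw angle (rotation about the vertical axis) that turns an object at
    // `objectPosition` toward the average of one or more viewpoints. With a single
    // viewpoint the object faces that user; with several, it faces a point between
    // them. Assumes a non-empty `viewpoints` array.
    func facingYaw(objectPosition: Point, viewpoints: [Point]) -> Double {
        let meanX = viewpoints.map(\.x).reduce(0, +) / Double(viewpoints.count)
        let meanZ = viewpoints.map(\.z).reduce(0, +) / Double(viewpoints.count)
        return atan2(meanX - objectPosition.x, meanZ - objectPosition.z)
    }

    let yaw = facingYaw(objectPosition: Point(x: 0, z: 0),
                        viewpoints: [Point(x: 1, z: 1), Point(x: -1, z: 1)])
    print(yaw)   // 0.0 – the object faces the midpoint between the two viewpoints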

[0038] In some embodiments, the electronic device modifies an appearance of a real object that is between a virtual object and the viewpoint of a user in a three-dimensional environment. The electronic device optionally blurs, darkens, or otherwise modifies a portion of a real object (e.g., displayed as a photorealistic (e.g., “pass-through”) representation of the real object or visible to the user through a transparent portion of the display generation component) that is in between a viewpoint of a user and a virtual object in the three-dimensional environment. In some embodiments, the electronic device modifies a portion of the real object that is within a threshold distance (e.g., 5, 10, 30, 50, 100, etc. centimeters) of a boundary of the virtual object without modifying a portion of the real object that is more than the threshold distance from the boundary of the virtual object. Modifying the appearance of the real object allows the user to more naturally and efficiently view and interact with the virtual object. Moreover, modifying the appearance of the real object reduces cognitive burden on the user.
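
One way to picture the threshold behavior described above is as a distance-based de-emphasis factor. The linear falloff, the 0.3 m default, and the function name in this Swift sketch are assumptions for illustration, not the rendering logic actually used.

    import Foundation

    // De-emphasis factor for a point on a real object, based on its distance to the
    // virtual object's boundary: full effect at the boundary, fading to zero at the
    // threshold, and no modification beyond it.
    func deEmphasisAmount(distanceToBoundary: Double, threshold: Double = 0.3) -> Double {
        guard distanceToBoundary < threshold else { return 0 }   // farther portions stay unmodified
        return 1 - (distanceToBoundary / threshold)              // 1 at the boundary, 0 at the threshold
    }

    print(deEmphasisAmount(distanceToBoundary: 0.05))   // 0.8333... – strongly blurred/darkened
    print(deEmphasisAmount(distanceToBoundary: 0.50))   // 0.0 – outside the threshold, untouched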

[0039] In some embodiments, the electronic device automatically selects a location for a user in a three-dimensional environment that includes one or more virtual objects and/or other users. In some embodiments, a user gains access to a three-dimensional environment that already includes one or more other users and one or more virtual objects. In some embodiments, the electronic device automatically selects a location with which to associate the user (e.g., a location at which to place the viewpoint of the user) based on the locations and orientations of the virtual objects and other users in the three-dimensional environment. In some embodiments, the electronic device selects a location for the user to enable the user to view the other users and the virtual objects in the three-dimensional environment without blocking other users’ views of the users and the virtual objects. Automatically placing the user in the three-dimensional environment based on the locations and orientations of the virtual objects and other users in the three-dimensional environment enables the user to efficiently view and interact with the virtual objects and other users in the three-dimensional environment, without requiring the user manually select a location in the three-dimensional environment with which to be associated.
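
One plausible placement heuristic along these lines is sketched below in Swift. The candidate-spot list, the "maximize clearance" rule, and all names are assumptions made for the example; the patented placement logic also accounts for orientations and lines of sight, which this sketch does not.

    import Foundation

    // Planar (x, z) positions of users, virtual objects, and candidate spots.
    struct Spot { var x: Double; var z: Double }

    func distance(_ a: Spot, _ b: Spot) -> Double {
        ((a.x - b.x) * (a.x - b.x) + (a.z - b.z) * (a.z - b.z)).squareRoot()
    }

    // Among predefined candidate spots, choose the one with the most clearance from
    // the users and virtual objects already in the environment, so the new viewpoint
    // neither overlaps nor crowds them.
    func placementSpot(candidates: [Spot], occupied: [Spot]) -> Spot? {
        candidates.max(by: { a, b in
            let clearanceA = occupied.map { distance(a, $0) }.min() ?? .infinity
            let clearanceB = occupied.map { distance(b, $0) }.min() ?? .infinity
            return clearanceA < clearanceB
        })
    }

    if let spot = placementSpot(candidates: [Spot(x: 0, z: 2), Spot(x: 2, z: 0)],
                                occupied: [Spot(x: 0, z: 0), Spot(x: 0, z: 1)]) {
        print(spot)   // Spot(x: 2.0, z: 0.0) – the candidate farthest from existing users
    }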

[0040] In some embodiments, the electronic device redirects an input from one user interface element to another in accordance with movement included in the input. In some embodiments, the electronic device presents a plurality of interactive user interface elements and receives, via one or more input devices, an input directed to a first user interface element of the plurality of user interface elements. In some embodiments, after detecting a portion of the input (e.g., without detecting the entire input), the electronic device detects a movement portion of the input corresponding to a request to redirect the input to a second user interface element. In response, in some embodiments, the electronic device directs the input to the second user interface element. In some embodiments, in response to movement that satisfies one or more criteria (e.g., based on speed, duration, distance, etc.), the electronic device cancels the input instead of redirecting the input. Enabling the user to redirect or cancel an input after providing a portion of the input enables the user to efficiently interact with the electronic device with fewer inputs (e.g., to undo unintended actions and/or to direct the input to a different user interface element).
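
The redirect-or-cancel decision described in this paragraph can be expressed as a small classifier over movement metrics. In the Swift sketch below, every threshold and every name is an illustrative assumption; the actual criteria may combine speed, duration, distance, and other factors in different ways.

    import Foundation

    enum InputResolution { case stayOnFirstElement, redirectToSecondElement, cancel }

    // Hypothetical movement metrics measured after part of an input has been detected.
    struct Movement { var distance: Double; var speed: Double }   // metres, metres/second

    // Small movement keeps the input on the original element, a larger movement toward
    // another element redirects it, and a long or fast movement cancels the input.
    func resolve(_ movement: Movement,
                 redirectDistance: Double = 0.02,
                 cancelDistance: Double = 0.20,
                 cancelSpeed: Double = 1.0) -> InputResolution {
        if movement.distance >= cancelDistance || movement.speed >= cancelSpeed {
            return .cancel
        }
        if movement.distance >= redirectDistance {
            return .redirectToSecondElement
        }
        return .stayOnFirstElement
    }

    print(resolve(Movement(distance: 0.05, speed: 0.2)))   // redirectToSecondElement
    print(resolve(Movement(distance: 0.30, speed: 0.2)))   // cancel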

[0041] FIGS. 1-6 provide a description of example computer systems for providing CGR experiences to users (such as described below with respect to methods 800, 1000, 1200, 1400, 1600, 1800, 2000, and 2200). In some embodiments, as shown in FIG. 1, the CGR experience is provided to the user via an operating environment 100 that includes a computer system 101. The computer system 101 includes a controller 110 (e.g., processors of a portable electronic device or a remote server), a display generation component 120 (e.g., a head-mounted device (HMD), a display, a projector, a touch-screen, etc.), one or more input devices 125 (e.g., an eye tracking device 130, a hand tracking device 140, other input devices 150), one or more output devices 155 (e.g., speakers 160, tactile output generators 170, and other output devices 180), one or more sensors 190 (e.g., image sensors, light sensors, depth sensors, tactile sensors, orientation sensors, proximity sensors, temperature sensors, location sensors, motion sensors, velocity sensors, etc.), and optionally one or more peripheral devices 195 (e.g., home appliances, wearable devices, etc.). In some embodiments, one or more of the input devices 125, output devices 155, sensors 190, and peripheral devices 195 are integrated with the display generation component 120 (e.g., in a head-mounted device or a handheld device).
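
The composition of operating environment 100 can be summarized schematically as follows. The Swift types are purely illustrative stand-ins for the hardware and firmware components listed above and do not model any real behaviour.

    import Foundation

    struct Controller { var isRemote: Bool }
    enum DisplayGenerationComponent { case headMounted, handheld, projector, touchScreen }

    // Schematic model of operating environment 100: a controller, a display generation
    // component, and collections of input devices, output devices, sensors, and peripherals.
    struct OperatingEnvironment {
        var controller: Controller
        var display: DisplayGenerationComponent
        var inputDevices: [String]    // e.g. eye tracking, hand tracking
        var outputDevices: [String]   // e.g. speakers, tactile output generators
        var sensors: [String]
        var peripherals: [String]
    }

    let environment = OperatingEnvironment(
        controller: Controller(isRemote: false),
        display: .headMounted,
        inputDevices: ["eye tracking", "hand tracking"],
        outputDevices: ["speakers", "tactile output generators"],
        sensors: ["image", "depth", "motion"],
        peripherals: [])
    print(environment.display)   // headMounted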

[0042] The processes described below enhance the operability of the devices and make the user-device interfaces more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the device) through various techniques, including by providing improved visual feedback to the user, reducing the number of inputs needed to perform an operation, providing additional control options without cluttering the user interface with additional displayed controls, performing an operation when a set of conditions has been met without requiring further user input, and/or additional techniques. These techniques also reduce power usage and improve battery life of the device by enabling the user to use the device more quickly and efficiently.

[0043] When describing a CGR experience, various terms are used to differentially refer to several related but distinct environments that the user may sense and/or with which a user may interact (e.g., with inputs detected by a computer system 101 generating the CGR experience that cause the computer system generating the CGR experience to generate audio, visual, and/or tactile feedback corresponding to various inputs provided to the computer system 101). The following is a subset of these terms:

[0044] Physical environment: A physical environment refers to a physical world that people can sense and/or interact with without aid of electronic systems. Physical environments, such as a physical park, include physical articles, such as physical trees, physical buildings, and physical people. People can directly sense and/or interact with the physical environment, such as through sight, touch, hearing, taste, and smell.

[0045] Computer-generated reality: In contrast, a computer-generated reality (CGR) environment refers to a wholly or partially simulated environment that people sense and/or interact with via an electronic system. In CGR, a subset of a person’s physical motions, or representations thereof, are tracked, and, in response, one or more characteristics of one or more virtual objects simulated in the CGR environment are adjusted in a manner that comports with at least one law of physics. For example, a CGR system may detect a person’s head turning and, in response, adjust graphical content and an acoustic field presented to the person in a manner similar to how such views and sounds would change in a physical environment. In some situations (e.g., for accessibility reasons), adjustments to characteristic(s) of virtual object(s) in a CGR environment may be made in response to representations of physical motions (e.g., vocal commands). A person may sense and/or interact with a CGR object using any one of their senses, including sight, sound, touch, taste, and smell. For example, a person may sense and/or interact with audio objects that create a 3D or spatial audio environment that provides the perception of point audio sources in 3D space. In another example, audio objects may enable audio transparency, which selectively incorporates ambient sounds from the physical environment with or without computer-generated audio. In some CGR environments, a person may sense and/or interact only with audio objects.

[0046] Examples of CGR include virtual reality and mixed reality.

[0047] Virtual reality: A virtual reality (VR) environment refers to a simulated environment that is designed to be based entirely on computer-generated sensory inputs for one or more senses. A VR environment comprises a plurality of virtual objects with which a person may sense and/or interact. For example, computer-generated imagery of trees, buildings, and avatars representing people are examples of virtual objects. A person may sense and/or interact with virtual objects in the VR environment through a simulation of the person’s presence within the computer-generated environment, and/or through a simulation of a subset of the person’s physical movements within the computer-generated environment.

[0048] Mixed reality: In contrast to a VR environment, which is designed to be based entirely on computer-generated sensory inputs, a mixed reality (MR) environment refers to a simulated environment that is designed to incorporate sensory inputs from the physical environment, or a representation thereof, in addition to including computer-generated sensory inputs (e.g., virtual objects). On a virtuality continuum, a mixed reality environment is anywhere between, but not including, a wholly physical environment at one end and a virtual reality environment at the other end. In some MR environments, computer-generated sensory inputs may respond to changes in sensory inputs from the physical environment. Also, some electronic systems for presenting an MR environment may track location and/or orientation with respect to the physical environment to enable virtual objects to interact with real objects (that is, physical articles from the physical environment or representations thereof). For example, a system may account for movements so that a virtual tree appears stationary with respect to the physical ground.

[0049] Examples of mixed realities include augmented reality and augmented virtuality.

[0050] Augmented reality: An augmented reality (AR) environment refers to a simulated environment in which one or more virtual objects are superimposed over a physical environment, or a representation thereof. For example, an electronic system for presenting an AR environment may have a transparent or translucent display through which a person may directly view the physical environment. The system may be configured to present virtual objects on the transparent or translucent display, so that a person, using the system, perceives the virtual objects superimposed over the physical environment. Alternatively, a system may have an opaque display and one or more imaging sensors that capture images or video of the physical environment, which are representations of the physical environment. The system composites the images or video with virtual objects, and presents the composition on the opaque display. A person, using the system, indirectly views the physical environment by way of the images or video of the physical environment, and perceives the virtual objects superimposed over the physical environment. As used herein, a video of the physical environment shown on an opaque display is called “pass-through video,” meaning a system uses one or more image sensor(s) to capture images of the physical environment, and uses those images in presenting the AR environment on the opaque display. Further alternatively, a system may have a projection system that projects virtual objects into the physical environment, for example, as a hologram or on a physical surface, so that a person, using the system, perceives the virtual objects superimposed over the physical environment. An augmented reality environment also refers to a simulated environment in which a representation of a physical environment is transformed by computer-generated sensory information. For example, in providing pass-through video, a system may transform one or more sensor images to impose a select perspective (e.g., viewpoint) different than the perspective captured by the imaging sensors. As another example, a representation of a physical environment may be transformed by graphically modifying (e.g., enlarging) portions thereof, such that the modified portion may be representative but not photorealistic versions of the originally captured images. As a further example, a representation of a physical environment may be transformed by graphically eliminating or obfuscating portions thereof.

[0051] Augmented virtuality: An augmented virtuality (AV) environment refers to a simulated environment in which a virtual or computer generated environment incorporates one or more sensory inputs from the physical environment. The sensory inputs may be representations of one or more characteristics of the physical environment. For example, an AV park may have virtual trees and virtual buildings, but people with faces photorealistically reproduced from images taken of physical people. As another example, a virtual object may adopt a shape or color of a physical article imaged by one or more imaging sensors. As a further example, a virtual object may adopt shadows consistent with the position of the sun in the physical environment.

[0052] Hardware: There are many different types of electronic systems that enable a person to sense and/or interact with various CGR environments. Examples include head mounted systems, projection-based systems, heads-up displays (HUDs), vehicle windshields having integrated display capability, windows having integrated display capability, displays formed as lenses designed to be placed on a person’s eyes (e.g., similar to contact lenses), headphones/earphones, speaker arrays, input systems (e.g., wearable or handheld controllers with or without haptic feedback), smartphones, tablets, and desktop/laptop computers. A head mounted system may have one or more speaker(s) and an integrated opaque display. Alternatively, a head mounted system may be configured to accept an external opaque display (e.g., a smartphone). The head mounted system may incorporate one or more imaging sensors to capture images or video of the physical environment, and/or one or more microphones to capture audio of the physical environment. Rather than an opaque display, a head mounted system may have a transparent or translucent display. The transparent or translucent display may have a medium through which light representative of images is directed to a person’s eyes. The display may utilize digital light projection, OLEDs, LEDs, uLEDs, liquid crystal on silicon, laser scanning light source, or any combination of these technologies. The medium may be an optical waveguide, a hologram medium, an optical combiner, an optical reflector, or any combination thereof. In one embodiment, the transparent or translucent display may be configured to become opaque selectively. Projection-based systems may employ retinal projection technology that projects graphical images onto a person’s retina. Projection systems also may be configured to project virtual objects into the physical environment, for example, as a hologram or on a physical surface. In some embodiments, the controller 110 is configured to manage and coordinate a CGR experience for the user. In some embodiments, the controller 110 includes a suitable combination of software, firmware, and/or hardware. The controller 110 is described in greater detail below with respect to FIG. 2. In some embodiments, the controller 110 is a computing device that is local or remote relative to the scene 105 (e.g., a physical environment). For example, the controller 110 is a local server located within the scene 105. In another example, the controller 110 is a remote server located outside of the scene 105 (e.g., a cloud server, central server, etc.). In some embodiments, the controller 110 is communicatively coupled with the display generation component 120 (e.g., an HMD, a display, a projector, a touch-screen, etc.) via one or more wired or wireless communication channels 144 (e.g., BLUETOOTH, IEEE 802.11x, IEEE 802.16x, IEEE 802.3x, etc.). In another example, the controller 110 is included within the enclosure (e.g., a physical housing) of the display generation component 120 (e.g., an HMD, or a portable electronic device that includes a display and one or more processors, etc.), one or more of the input devices 125, one or more of the output devices 155, one or more of the sensors 190, and/or one or more of the peripheral devices 195, or share the same physical enclosure or support structure with one or more of the above.

[0053] In some embodiments, the display generation component 120 is configured to provide the CGR experience (e.g., at least a visual component of the CGR experience) to the user. In some embodiments, the display generation component 120 includes a suitable combination of software, firmware, and/or hardware. The display generation component 120 is described in greater detail below with respect to FIG. 3. In some embodiments, the functionalities of the controller 110 are provided by and/or combined with the display generation component 120.

[0054] According to some embodiments, the display generation component 120 provides a CGR experience to the user while the user is virtually and/or physically present within the scene 105.

[0055] In some embodiments, the display generation component is worn on a part of the user’s body (e.g., on his/her head, on his/her hand, etc.). As such, the display generation component 120 includes one or more CGR displays provided to display the CGR content. For example, in various embodiments, the display generation component 120 encloses the field-of-view of the user. In some embodiments, the display generation component 120 is a handheld device (such as a smartphone or tablet) configured to present CGR content, and the user holds the device with a display directed towards the field-of-view of the user and a camera directed towards the scene 105. In some embodiments, the handheld device is optionally placed within an enclosure that is worn on the head of the user. In some embodiments, the handheld device is optionally placed on a support (e.g., a tripod) in front of the user. In some embodiments, the display generation component 120 is a CGR chamber, enclosure, or room configured to present CGR content in which the user does not wear or hold the display generation component 120. Many user interfaces described with reference to one type of hardware for displaying CGR content (e.g., a handheld device or a device on a tripod) could be implemented on another type of hardware for displaying CGR content (e.g., an HMD or other wearable computing device). For example, a user interface showing interactions with CGR content triggered based on interactions that happen in a space in front of a handheld or tripod mounted device could similarly be implemented with an HMD where the interactions happen in a space in front of the HMD and the responses of the CGR content are displayed via the HMD. Similarly, a user interface showing interactions with CGR content triggered based on movement of a handheld or tripod mounted device relative to the physical environment (e.g., the scene 105 or a part of the user’s body (e.g., the user’s eye(s), head, or hand)) could similarly be implemented with an HMD where the movement is caused by movement of the HMD relative to the physical environment (e.g., the scene 105 or a part of the user’s body (e.g., the user’s eye(s), head, or hand)).

[0056] While pertinent features of the operating environment 100 are shown in FIG. 1, those of ordinary skill in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity and so as not to obscure more pertinent aspects of the example embodiments disclosed herein.

[0057] FIG. 2 is a block diagram of an example of the controller 110 in accordance with some embodiments. While certain specific features are illustrated, those skilled in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity, and so as not to obscure more pertinent aspects of the embodiments disclosed herein. To that end, as a non-limiting example, in some embodiments, the controller 110 includes one or more processing units 202 (e.g., microprocessors, application-specific integrated-circuits (ASICs), field-programmable gate arrays (FPGAs), graphics processing units (GPUs), central processing units (CPUs), processing cores, and/or the like), one or more input/output (I/O) devices 206, one or more communication interfaces 208 (e.g., universal serial bus (USB), FIREWIRE, THUNDERBOLT, IEEE 802.3x, IEEE 802.11x, IEEE 802.16x, global system for mobile communications (GSM), code division multiple access (CDMA), time division multiple access (TDMA), global positioning system (GPS), infrared (IR), BLUETOOTH, ZIGBEE, and/or the like type interface), one or more programming (e.g., I/O) interfaces 210, a memory 220, and one or more communication buses 204 for interconnecting these and various other components.

[0058] In some embodiments, the one or more communication buses 204 include circuitry that interconnects and controls communications between system components. In some embodiments, the one or more I/O devices 206 include at least one of a keyboard, a mouse, a touchpad, a joystick, one or more microphones, one or more speakers, one or more image sensors, one or more displays, and/or the like.

[0059] The memory 220 includes high-speed random-access memory, such as dynamic random-access memory (DRAM), static random-access memory (SRAM), double-data-rate random-access memory (DDR RAM), or other random-access solid-state memory devices. In some embodiments, the memory 220 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 220 optionally includes one or more storage devices remotely located from the one or more processing units 202. The memory 220 comprises a non-transitory computer readable storage medium. In some embodiments, the memory 220 or the non-transitory computer readable storage medium of the memory 220 stores the following programs, modules and data structures, or a subset thereof including an optional operating system 230 and a CGR experience module 240.

[0060] The operating system 230 includes instructions for handling various basic system services and for performing hardware dependent tasks. In some embodiments, the CGR experience module 240 is configured to manage and coordinate one or more CGR experiences for one or more users (e.g., a single CGR experience for one or more users, or multiple CGR experiences for respective groups of one or more users). To that end, in various embodiments, the CGR experience module 240 includes a data obtaining unit 242, a tracking unit 244, a coordination unit 246, and a data transmitting unit 248.

[0061] In some embodiments, the data obtaining unit 242 is configured to obtain data (e.g., presentation data, interaction data, sensor data, location data, etc.) from at least the display generation component 120 of FIG. 1, and optionally one or more of the input devices 125, output devices 155, sensors 190, and/or peripheral devices 195. To that end, in various embodiments, the data obtaining unit 242 includes instructions and/or logic therefor, and heuristics and metadata therefor.

[0062] In some embodiments, the tracking unit 244 is configured to map the scene 105 and to track the position/location of at least the display generation component 120 with respect to the scene 105 of FIG. 1, and optionally, to one or more of the input devices 125, output devices 155, sensors 190, and/or peripheral devices 195. To that end, in various embodiments, the tracking unit 244 includes instructions and/or logic therefor, and heuristics and metadata therefor. In some embodiments, the tracking unit 244 includes hand tracking unit 243 and/or eye tracking unit 245. In some embodiments, the hand tracking unit 243 is configured to track the position/location of one or more portions of the user’s hands, and/or motions of one or more portions of the user’s hands with respect to the scene 105 of FIG. 1, relative to the display generation component 120, and/or relative to a coordinate system defined relative to the user’s hand. The hand tracking unit 243 is described in greater detail below with respect to FIG. 4. In some embodiments, the eye tracking unit 245 is configured to track the position and movement of the user’s gaze (or more broadly, the user’s eyes, face, or head) with respect to the scene 105 (e.g., with respect to the physical environment and/or to the user (e.g., the user’s hand)) or with respect to the CGR content displayed via the display generation component 120. The eye tracking unit 245 is described in greater detail below with respect to FIG. 5.

[0063] In some embodiments, the coordination unit 246 is configured to manage and coordinate the CGR experience presented to the user by the display generation component 120, and optionally, by one or more of the output devices 155 and/or peripheral devices 195. To that end, in various embodiments, the coordination unit 246 includes instructions and/or logic therefor, and heuristics and metadata therefor.

[0064] In some embodiments, the data transmitting unit 248 is configured to transmit data (e.g., presentation data, location data, etc.) to at least the display generation component 120, and optionally, to one or more of the input devices 125, output devices 155, sensors 190, and/or peripheral devices 195. To that end, in various embodiments, the data transmitting unit 248 includes instructions and/or logic therefor, and heuristics and metadata therefor.

[0065] Although the data obtaining unit 242, the tracking unit 244 (e.g., including the hand tracking unit 243 and the eye tracking unit 245), the coordination unit 246, and the data transmitting unit 248 are shown as residing on a single device (e.g., the controller 110), it should be understood that in other embodiments, any combination of the data obtaining unit 242, the tracking unit 244 (e.g., including the hand tracking unit 243 and the eye tracking unit 245), the coordination unit 246, and the data transmitting unit 248 may be located in separate computing devices.

[0066] Moreover, FIG. 2 is intended more as a functional description of the various features that may be present in a particular implementation as opposed to a structural schematic of the embodiments described herein. As recognized by those of ordinary skill in the art, items shown separately could be combined and some items could be separated. For example, some functional modules shown separately in FIG. 2 could be implemented in a single module and the various functions of single functional blocks could be implemented by one or more functional blocks in various embodiments. The actual number of modules and the division of particular functions and how features are allocated among them will vary from one implementation to another and, in some embodiments, depends in part on the particular combination of hardware, software, and/or firmware chosen for a particular implementation.

[0067] FIG. 3 is a block diagram of an example of the display generation component 120 in accordance with some embodiments. While certain specific features are illustrated, those skilled in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity, and so as not to obscure more pertinent aspects of the embodiments disclosed herein. To that end, as a non-limiting example, in some embodiments the HMD 120 includes one or more processing units 302 (e.g., microprocessors, ASICs, FPGAs, GPUs, CPUs, processing cores, and/or the like), one or more input/output (I/O) devices and sensors 306, one or more communication interfaces 308 (e.g., USB, FIREWIRE, THUNDERBOLT, IEEE 802.3x, IEEE 802.11x, IEEE 802.16x, GSM, CDMA, TDMA, GPS, IR, BLUETOOTH, ZIGBEE, and/or the like type interface), one or more programming (e.g., I/O) interfaces 310, one or more CGR displays 312, one or more optional interior- and/or exterior-facing image sensors 314, a memory 320, and one or more communication buses 304 for interconnecting these and various other components.

[0068] In some embodiments, the one or more communication buses 304 include circuitry that interconnects and controls communications between system components. In some embodiments, the one or more I/O devices and sensors 306 include at least one of an inertial measurement unit (IMU), an accelerometer, a gyroscope, a thermometer, one or more physiological sensors (e.g., blood pressure monitor, heart rate monitor, blood oxygen sensor, blood glucose sensor, etc.), one or more microphones, one or more speakers, a haptics engine, one or more depth sensors (e.g., structured light, time-of-flight, or the like), and/or the like.

[0069] In some embodiments, the one or more CGR displays 312 are configured to provide the CGR experience to the user. In some embodiments, the one or more CGR displays 312 correspond to holographic, digital light processing (DLP), liquid-crystal display (LCD), liquid-crystal on silicon (LCoS), organic light-emitting field-effect transistor (OLET), organic light-emitting diode (OLED), surface-conduction electron-emitter display (SED), field-emission display (FED), quantum-dot light-emitting diode (QD-LED), micro-electro-mechanical system (MEMS), and/or the like display types. In some embodiments, the one or more CGR displays 312 correspond to diffractive, reflective, polarized, holographic, etc. waveguide displays. For example, the HMD 120 includes a single CGR display. In another example, the HMD 120 includes a CGR display for each eye of the user. In some embodiments, the one or more CGR displays 312 are capable of presenting MR and VR content. In some embodiments, the one or more CGR displays 312 are capable of presenting MR or VR content.

[0070] In some embodiments, the one or more image sensors 314 are configured to obtain image data that corresponds to at least a portion of the face of the user that includes the eyes of the user (and may be referred to as an eye-tracking camera). In some embodiments, the one or more image sensors 314 are configured to obtain image data that corresponds to at least a portion of the user’s hand(s) and optionally arm(s) of the user (and may be referred to as a hand-tracking camera). In some embodiments, the one or more image sensors 314 are configured to be forward-facing so as to obtain image data that corresponds to the scene as would be viewed by the user if the HMD 120 was not present (and may be referred to as a scene camera). The one or more optional image sensors 314 can include one or more RGB cameras (e.g., with a complementary metal-oxide-semiconductor (CMOS) image sensor or a charge-coupled device (CCD) image sensor), one or more infrared (IR) cameras, one or more event-based cameras, and/or the like.

[0071] The memory 320 includes high-speed random-access memory, such as DRAM, SRAM, DDR RAM, or other random-access solid-state memory devices. In some embodiments, the memory 320 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 320 optionally includes one or more storage devices remotely located from the one or more processing units 302. The memory 320 comprises a non-transitory computer readable storage medium. In some embodiments, the memory 320 or the non-transitory computer readable storage medium of the memory 320 stores the following programs, modules and data structures, or a subset thereof including an optional operating system 330 and a CGR presentation module 340.

[0072] The operating system 330 includes instructions for handling various basic system services and for performing hardware dependent tasks. In some embodiments, the CGR presentation module 340 is configured to present CGR content to the user via the one or more CGR displays 312. To that end, in various embodiments, the CGR presentation module 340 includes a data obtaining unit 342, a CGR presenting unit 344, a CGR map generating unit 346, and a data transmitting unit 348.

[0073] In some embodiments, the data obtaining unit 342 is configured to obtain data (e.g., presentation data, interaction data, sensor data, location data, etc.) from at least the controller 110 of FIG. 1. To that end, in various embodiments, the data obtaining unit 342 includes instructions and/or logic therefor, and heuristics and metadata therefor.

[0074] In some embodiments, the CGR presenting unit 344 is configured to present CGR content via the one or more CGR displays 312. To that end, in various embodiments, the CGR presenting unit 344 includes instructions and/or logic therefor, and heuristics and metadata therefor.

[0075] In some embodiments, the CGR map generating unit 346 is configured to generate a CGR map (e.g., a 3D map of the mixed reality scene or a map of the physical environment into which computer generated objects can be placed to generate the computer generated reality) based on media content data. To that end, in various embodiments, the CGR map generating unit 346 includes instructions and/or logic therefor, and heuristics and metadata therefor.

[0076] In some embodiments, the data transmitting unit 348 is configured to transmit data (e.g., presentation data, location data, etc.) to at least the controller 110, and optionally one or more of the input devices 125, output devices 155, sensors 190, and/or peripheral devices 195. To that end, in various embodiments, the data transmitting unit 348 includes instructions and/or logic therefor, and heuristics and metadata therefor.

[0077] Although the data obtaining unit 342, the CGR presenting unit 344, the CGR map generating unit 346, and the data transmitting unit 348 are shown as residing on a single device (e.g., the display generation component 120 of FIG. 1), it should be understood that in other embodiments, any combination of the data obtaining unit 342, the CGR presenting unit 344, the CGR map generating unit 346, and the data transmitting unit 348 may be located in separate computing devices.

[0078] Moreover, FIG. 3 is intended more as a functional description of the various features that could be present in a particular implementation as opposed to a structural schematic of the embodiments described herein. As recognized by those of ordinary skill in the art, items shown separately could be combined and some items could be separated. For example, some functional modules shown separately in FIG. 3 could be implemented in a single module and the various functions of single functional blocks could be implemented by one or more functional blocks in various embodiments. The actual number of modules and the division of particular functions and how features are allocated among them will vary from one implementation to another and, in some embodiments, depends in part on the particular combination of hardware, software, and/or firmware chosen for a particular implementation.

[0079] FIG. 4 is a schematic, pictorial illustration of an example embodiment of the hand tracking device 140. In some embodiments, hand tracking device 140 (FIG. 1) is controlled by hand tracking unit 243 (FIG. 2) to track the position/location of one or more portions of the user’s hands, and/or motions of one or more portions of the user’s hands with respect to the scene 105 of FIG. 1 (e.g., with respect to a portion of the physical environment surrounding the user, with respect to the display generation component 120, or with respect to a portion of the user (e.g., the user’s face, eyes, or head)), and/or relative to a coordinate system defined relative to the user’s hand. In some embodiments, the hand tracking device 140 is part of the display generation component 120 (e.g., embedded in or attached to a head-mounted device). In some embodiments, the hand tracking device 140 is separate from the display generation component 120 (e.g., located in separate housings or attached to separate physical support structures).

[0080] In some embodiments, the hand tracking device 140 includes image sensors 404 (e.g., one or more IR cameras, 3D cameras, depth cameras, and/or color cameras, etc.) that capture three-dimensional scene information that includes at least a hand 406 of a human user. The image sensors 404 capture the hand images with sufficient resolution to enable the fingers and their respective positions to be distinguished. The image sensors 404 typically capture images of other parts of the user’s body as well, or possibly all of the body, and may have either zoom capabilities or a dedicated sensor with enhanced magnification to capture images of the hand with the desired resolution. In some embodiments, the image sensors 404 also capture 2D color video images of the hand 406 and other elements of the scene. In some embodiments, the image sensors 404 are used in conjunction with other image sensors to capture the physical environment of the scene 105, or serve as the image sensors that capture the physical environment of the scene 105. In some embodiments, the image sensors 404 are positioned relative to the user or the user’s environment such that a field of view of the image sensors or a portion thereof is used to define an interaction space in which hand movement captured by the image sensors is treated as input to the controller 110.

[0081] In some embodiments, the image sensors 404 output a sequence of frames containing 3D map data (and possibly color image data, as well) to the controller 110, which extracts high-level information from the map data. This high-level information is typically provided via an Application Program Interface (API) to an application running on the controller, which drives the display generation component 120 accordingly. For example, the user may interact with software running on the controller 110 by moving his hand 406 and changing his hand posture.

[0082] In some embodiments, the image sensors 404 project a pattern of spots onto a scene containing the hand 406 and capture an image of the projected pattern. In some embodiments, the controller 110 computes the 3D coordinates of points in the scene (including points on the surface of the user’s hand) by triangulation, based on transverse shifts of the spots in the pattern. This approach is advantageous in that it does not require the user to hold or wear any sort of beacon, sensor, or other marker. It gives the depth coordinates of points in the scene relative to a predetermined reference plane, at a certain distance from the image sensors 404. In the present disclosure, the image sensors 404 are assumed to define an orthogonal set of x, y, z axes, so that depth coordinates of points in the scene correspond to z components measured by the image sensors. Alternatively, the hand tracking device 140 may use other methods of 3D mapping, such as stereoscopic imaging or time-of-flight measurements, based on single or multiple cameras or other types of sensors.
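
By way of illustration only, the following Python sketch shows the standard structured-light relationship between a spot's transverse shift and its depth relative to a reference plane; the baseline, focal length, and reference-depth values are hypothetical and are not taken from this disclosure.

```python
# Minimal sketch (not the disclosed method): depth from the transverse shift of a
# projected spot, using a conventional structured-light triangulation model.
# baseline_m, focal_px, and reference_depth_m are hypothetical parameters.

def depth_from_spot_shift(shift_px: float,
                          baseline_m: float = 0.05,
                          focal_px: float = 600.0,
                          reference_depth_m: float = 1.0) -> float:
    """Estimate depth (z) of a spot from its shift relative to the reference plane.

    A spot observed exactly on the reference plane has zero shift; spots closer to
    or farther from the sensor shift laterally in proportion to the inverse-depth
    difference: shift = focal * baseline * (1/z - 1/z_ref).
    """
    inverse_depth = shift_px / (focal_px * baseline_m) + 1.0 / reference_depth_m
    return 1.0 / inverse_depth

if __name__ == "__main__":
    for shift in (-5.0, 0.0, 5.0, 15.0):
        print(f"shift {shift:+.1f}px -> depth {depth_from_spot_shift(shift):.3f} m")
```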

[0083] In some embodiments, the hand tracking device 140 captures and processes a temporal sequence of depth maps containing the user’s hand, while the user moves his hand (e.g., whole hand or one or more fingers). Software running on a processor in the image sensors 404 and/or the controller 110 processes the 3D map data to extract patch descriptors of the hand in these depth maps. The software matches these descriptors to patch descriptors stored in a database 408, based on a prior learning process, in order to estimate the pose of the hand in each frame. The pose typically includes 3D locations of the user’s hand joints and finger tips.
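
By way of illustration only, the sketch below shows one generic way such descriptor-to-database matching could be organized; the descriptor dimensionality, the 21-joint pose layout, and the use of plain Euclidean nearest-neighbor search are assumptions rather than details of the described system.

```python
# Minimal sketch, assuming a database of (descriptor, joint_positions) pairs built
# in a prior learning process. A real implementation would use learned descriptors
# and an approximate nearest-neighbor index; plain Euclidean distance stands in here.
import numpy as np

def estimate_pose(patch_descriptors: np.ndarray,
                  db_descriptors: np.ndarray,
                  db_poses: np.ndarray) -> np.ndarray:
    """Return an estimated hand pose (joint locations) for one depth frame.

    patch_descriptors: (P, D) descriptors extracted from the current depth map.
    db_descriptors:    (N, D) descriptors stored in the database.
    db_poses:          (N, J, 3) 3D joint locations associated with each entry.
    """
    # Match each patch to its nearest database entry.
    dists = np.linalg.norm(
        patch_descriptors[:, None, :] - db_descriptors[None, :, :], axis=-1)
    nearest = dists.argmin(axis=1)                # index of best match per patch
    # Average the poses of the matched entries as a crude per-frame estimate.
    return db_poses[nearest].mean(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    db_desc = rng.normal(size=(100, 16))
    db_pose = rng.normal(size=(100, 21, 3))       # 21 hypothetical hand joints
    frame_desc = db_desc[:5] + 0.01 * rng.normal(size=(5, 16))
    print(estimate_pose(frame_desc, db_desc, db_pose).shape)  # (21, 3)
```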

[0084] The software may also analyze the trajectory of the hands and/or fingers over multiple frames in the sequence in order to identify gestures. The pose estimation functions described herein may be interleaved with motion tracking functions, so that patch-based pose estimation is performed only once in every two (or more) frames, while tracking is used to find changes in the pose that occur over the remaining frames. The pose, motion and gesture information are provided via the above-mentioned API to an application program running on the controller 110. This program may, for example, move and modify images presented on the display generation component 120, or perform other functions, in response to the pose and/or gesture information.
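
By way of illustration only, the following sketch interleaves an expensive per-frame pose estimator with a cheaper tracker as described above; estimate_pose_from_patches and track_pose are hypothetical callables standing in for the patch-based estimation and motion-tracking stages.

```python
# Minimal sketch of interleaving full (patch-based) pose estimation with cheaper
# frame-to-frame tracking, running the full estimator only every Nth frame.
from typing import Callable, Iterable, List

def run_pipeline(frames: Iterable,
                 estimate_pose_from_patches: Callable,
                 track_pose: Callable,
                 full_every_n: int = 2) -> List:
    """Return one pose per frame, alternating full estimation and tracking."""
    poses = []
    last_pose = None
    for i, frame in enumerate(frames):
        if last_pose is None or i % full_every_n == 0:
            last_pose = estimate_pose_from_patches(frame)   # expensive, database-backed
        else:
            last_pose = track_pose(frame, last_pose)         # cheap incremental update
        poses.append(last_pose)
    return poses

if __name__ == "__main__":
    full = lambda f: ("full", f)
    track = lambda f, prev: ("tracked", f)
    print(run_pipeline(range(6), full, track, full_every_n=2))
```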

[0085] In some embodiments, the software may be downloaded to the controller 110 in electronic form, over a network, for example, or it may alternatively be provided on tangible, non-transitory media, such as optical, magnetic, or electronic memory media. In some embodiments, the database 408 is likewise stored in a memory associated with the controller 110. Alternatively or additionally, some or all of the described functions of the computer may be implemented in dedicated hardware, such as a custom or semi-custom integrated circuit or a programmable digital signal processor (DSP). Although the controller 110 is shown in FIG. 4, by way of example, as a separate unit from the image sensors 404, some or all of the processing functions of the controller may be performed by a suitable microprocessor and software or by dedicated circuitry within the housing of the hand tracking device 140 or otherwise associated with the image sensors 404. In some embodiments, at least some of these processing functions may be carried out by a suitable processor that is integrated with the display generation component 120 (e.g., in a television set, a handheld device, or head-mounted device, for example) or with any other suitable computerized device, such as a game console or media player. The sensing functions of image sensors 404 may likewise be integrated into the computer or other computerized apparatus that is to be controlled by the sensor output.

[0086] FIG. 4 further includes a schematic representation of a depth map 410 captured by the image sensors 404, in accordance with some embodiments. The depth map, as explained above, comprises a matrix of pixels having respective depth values. The pixels 412 corresponding to the hand 406 have been segmented out from the background and the wrist in this map. The brightness of each pixel within the depth map 410 corresponds inversely to its depth value, i.e., the measured z distance from the image sensors 404, with the shade of gray growing darker with increasing depth. The controller 110 processes these depth values in order to identify and segment a component of the image (i.e., a group of neighboring pixels) having characteristics of a human hand. These characteristics may include, for example, overall size, shape, and motion from frame to frame of the sequence of depth maps.
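
By way of illustration only, the sketch below segments a hand-like component from a synthetic depth map by growing a region of smoothly varying depth around a seed pixel; the seed location and depth-continuity threshold are hypothetical, and a real implementation would additionally use the size, shape, and motion cues noted above.

```python
# Minimal sketch: segment a connected, depth-continuous component around a seed
# pixel in a depth map. The depth values and seed are synthetic test data.
import numpy as np
from collections import deque

def segment_hand(depth: np.ndarray, seed: tuple, max_jump: float = 0.02) -> np.ndarray:
    """Return a boolean mask of pixels connected to `seed` with smooth depth."""
    h, w = depth.shape
    mask = np.zeros((h, w), dtype=bool)
    queue = deque([seed])
    mask[seed] = True
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx]:
                # Grow the region only where depth changes smoothly (same surface).
                if abs(depth[ny, nx] - depth[y, x]) < max_jump:
                    mask[ny, nx] = True
                    queue.append((ny, nx))
    return mask

if __name__ == "__main__":
    depth = np.full((40, 40), 2.0)       # background 2 m away
    depth[10:30, 10:25] = 0.5            # hand-like blob 0.5 m away
    mask = segment_hand(depth, seed=(20, 15))
    print(mask.sum(), "pixels segmented")  # 20*15 = 300
```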

[0087] FIG. 4 also schematically illustrates a hand skeleton 414 that controller 110 ultimately extracts from the depth map 410 of the hand 406, in accordance with some embodiments. In FIG. 4, the skeleton 414 is superimposed on a hand background 416 that has been segmented from the original depth map. In some embodiments, key feature points of the hand (e.g., points corresponding to knuckles, finger tips, center of the palm, end of the hand connecting to wrist, etc.) and optionally on the wrist or arm connected to the hand are identified and located on the hand skeleton 414. In some embodiments, location and movements of these key feature points over multiple image frames are used by the controller 110 to determine the hand gestures performed by the hand or the current state of the hand, in accordance with some embodiments.

[0088] FIG. 5 illustrates an example embodiment of the eye tracking device 130 (FIG. 1). In some embodiments, the eye tracking device 130 is controlled by the eye tracking unit 245 (FIG. 2) to track the position and movement of the user’s gaze with respect to the scene 105 or with respect to the CGR content displayed via the display generation component 120. In some embodiments, the eye tracking device 130 is integrated with the display generation component 120. For example, in some embodiments, when the display generation component 120 is a head-mounted device such as a headset, helmet, goggles, or glasses, or a handheld device placed in a wearable frame, the head-mounted device includes both a component that generates the CGR content for viewing by the user and a component for tracking the gaze of the user relative to the CGR content. In some embodiments, the eye tracking device 130 is separate from the display generation component 120. For example, when the display generation component is a handheld device or a CGR chamber, the eye tracking device 130 is optionally a separate device from the handheld device or CGR chamber. In some embodiments, the eye tracking device 130 is a head-mounted device or part of a head-mounted device. In some embodiments, the head-mounted eye-tracking device 130 is optionally used in conjunction with a display generation component that is also head-mounted, or a display generation component that is not head-mounted. In some embodiments, the eye tracking device 130 is not a head-mounted device, and is optionally used in conjunction with a head-mounted display generation component. In some embodiments, the eye tracking device 130 is not a head-mounted device, and is optionally part of a non-head-mounted display generation component.

[0089] In some embodiments, the display generation component 120 uses a display mechanism (e.g., left and right near-eye display panels) for displaying frames including left and right images in front of a user’s eyes to thus provide 3D virtual views to the user. For example, a head-mounted display generation component may include left and right optical lenses (referred to herein as eye lenses) located between the display and the user’s eyes. In some embodiments, the display generation component may include or be coupled to one or more external video cameras that capture video of the user’s environment for display. In some embodiments, a head-mounted display generation component may have a transparent or semi-transparent display through which a user may view the physical environment directly and display virtual objects on the transparent or semi-transparent display. In some embodiments, the display generation component projects virtual objects into the physical environment. The virtual objects may be projected, for example, on a physical surface or as a holograph, so that an individual, using the system, observes the virtual objects superimposed over the physical environment. In such cases, separate display panels and image frames for the left and right eyes may not be necessary.

[0090] As shown in FIG. 5, in some embodiments, a gaze tracking device 130 includes at least one eye tracking camera (e.g., infrared (IR) or near-IR (NIR) cameras), and illumination sources (e.g., IR or NIR light sources such as an array or ring of LEDs) that emit light (e.g., IR or NIR light) towards the user’s eyes. The eye tracking cameras may be pointed towards the user’s eyes to receive reflected IR or NIR light from the light sources directly from the eyes, or alternatively may be pointed towards “hot” mirrors located between the user’s eyes and the display panels that reflect IR or NIR light from the eyes to the eye tracking cameras while allowing visible light to pass. The gaze tracking device 130 optionally captures images of the user’s eyes (e.g., as a video stream captured at 60-120 frames per second (fps)), analyzes the images to generate gaze tracking information, and communicates the gaze tracking information to the controller 110. In some embodiments, two eyes of the user are separately tracked by respective eye tracking cameras and illumination sources. In some embodiments, only one eye of the user is tracked by a respective eye tracking camera and illumination sources.

[0091] In some embodiments, the eye tracking device 130 is calibrated using a device-specific calibration process to determine parameters of the eye tracking device for the specific operating environment 100, for example the 3D geometric relationship and parameters of the LEDs, cameras, hot mirrors (if present), eye lenses, and display screen. The device-specific calibration process may be performed at the factory or another facility prior to delivery of the AR/VR equipment to the end user. The device-specific calibration process may be an automated calibration process or a manual calibration process. A user-specific calibration process may include an estimation of a specific user’s eye parameters, for example the pupil location, fovea location, optical axis, visual axis, eye spacing, etc. Once the device-specific and user-specific parameters are determined for the eye tracking device 130, images captured by the eye tracking cameras can be processed using a glint-assisted method to determine the current visual axis and point of gaze of the user with respect to the display, in accordance with some embodiments.
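
By way of illustration only, one common glint-assisted approximation maps the pupil-center-to-glint vector to a point of gaze on the display through a per-user calibration; the affine calibration matrix in the sketch below is hypothetical and stands in for the result of the user-specific calibration process described above, not for the disclosed method itself.

```python
# Minimal sketch: map a pupil-glint vector to display coordinates with a 2x3
# affine calibration. The calibration values are hypothetical.
import numpy as np

def point_of_gaze(pupil_px: np.ndarray, glint_px: np.ndarray,
                  calib: np.ndarray) -> np.ndarray:
    """Map a pupil-glint vector (2,) to display coordinates using a 2x3 affine map."""
    v = pupil_px - glint_px                      # pupil-glint vector in camera pixels
    return calib @ np.append(v, 1.0)             # affine: A @ [vx, vy, 1]

if __name__ == "__main__":
    calib = np.array([[40.0, 0.0, 960.0],        # hypothetical, from calibration
                      [0.0, 40.0, 540.0]])
    pupil = np.array([322.0, 240.0])
    glint = np.array([320.0, 238.0])
    print(point_of_gaze(pupil, glint, calib))    # -> [1040.  620.]
```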

[0092] As shown in FIG. 5, the eye tracking device 130 (e.g., 130A or 130B) includes eye lens(es) 520, and a gaze tracking system that includes at least one eye tracking camera 540 (e.g., infrared (IR) or near-IR (NIR) cameras) positioned on a side of the user’s face for which eye tracking is performed, and an illumination source 530 (e.g., IR or NIR light sources such as an array or ring of NIR light-emitting diodes (LEDs)) that emit light (e.g., IR or NIR light) towards the user’s eye(s) 592. The eye tracking cameras 540 may be pointed towards mirrors 550 located between the user’s eye(s) 592 and a display 510 (e.g., a left or right display panel of a head-mounted display, or a display of a handheld device, a projector, etc.) that reflect IR or NIR light from the eye(s) 592 while allowing visible light to pass (e.g., as shown in the top portion of FIG. 5), or alternatively may be pointed towards the user’s eye(s) 592 to receive reflected IR or NIR light from the eye(s) 592 (e.g., as shown in the bottom portion of FIG. 5).

[0093] In some embodiments, the controller 110 renders AR or VR frames 562 (e.g., left and right frames for left and right display panels) and provides the frames 562 to the display 510. The controller 110 uses gaze tracking input 542 from the eye tracking cameras 540 for various purposes, for example in processing the frames 562 for display. The controller 110 optionally estimates the user’s point of gaze on the display 510 based on the gaze tracking input 542 obtained from the eye tracking cameras 540 using the glint-assisted methods or other suitable methods. The point of gaze estimated from the gaze tracking input 542 is optionally used to determine the direction in which the user is currently looking.

[0094] The following describes several possible use cases for the user’s current gaze direction, and is not intended to be limiting. As an example use case, the controller 110 may render virtual content differently based on the determined direction of the user’s gaze. For example, the controller 110 may generate virtual content at a higher resolution in a foveal region determined from the user’s current gaze direction than in peripheral regions. As another example, the controller may position or move virtual content in the view based at least in part on the user’s current gaze direction. As another example, the controller may display particular virtual content in the view based at least in part on the user’s current gaze direction. As another example use case in AR applications, the controller 110 may direct external cameras for capturing the physical environments of the CGR experience to focus in the determined direction. The autofocus mechanism of the external cameras may then focus on an object or surface in the environment that the user is currently looking at on the display 510. As another example use case, the eye lenses 520 may be focusable lenses, and the gaze tracking information is used by the controller to adjust the focus of the eye lenses 520 so that the virtual object that the user is currently looking at has the proper vergence to match the convergence of the user’s eyes 592. The controller 110 may leverage the gaze tracking information to direct the eye lenses 520 to adjust focus so that close objects that the user is looking at appear at the right distance.
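
By way of illustration only, the sketch below selects a per-tile render scale from the angular distance between a screen tile and the current gaze direction, in the spirit of the foveated-rendering use case above; the eccentricity thresholds and scale factors are hypothetical tuning values.

```python
# Minimal sketch: choose a resolution scale per screen tile based on angular
# distance (eccentricity) from the current gaze direction, in degrees.
import math

def render_scale(tile_center_deg: tuple, gaze_deg: tuple) -> float:
    """Return a resolution scale (1.0 = full) for a tile given gaze eccentricity."""
    ecc = math.hypot(tile_center_deg[0] - gaze_deg[0],
                     tile_center_deg[1] - gaze_deg[1])
    if ecc < 5.0:      # foveal region: render at full resolution
        return 1.0
    if ecc < 20.0:     # near periphery: half resolution
        return 0.5
    return 0.25        # far periphery: quarter resolution

if __name__ == "__main__":
    gaze = (2.0, -1.0)
    for tile in ((0.0, 0.0), (10.0, 5.0), (30.0, 0.0)):
        print(tile, "->", render_scale(tile, gaze))
```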

[0095] In some embodiments, the eye tracking device is part of a head-mounted device that includes a display (e.g., display 510), two eye lenses (e.g., eye lens(es) 520), eye tracking cameras (e.g., eye tracking camera(s) 540), and light sources (e.g., light sources 530 (e.g., IR or NIR LEDs)) mounted in a wearable housing. The light sources emit light (e.g., IR or NIR light) towards the user’s eye(s) 592. In some embodiments, the light sources may be arranged in rings or circles around each of the lenses as shown in FIG. 5. In some embodiments, eight light sources 530 (e.g., LEDs) are arranged around each lens 520 as an example. However, more or fewer light sources 530 may be used, and other arrangements and locations of light sources 530 may be used.

[0096] In some embodiments, the display 510 emits light in the visible light range and does not emit light in the IR or NIR range, and thus does not introduce noise in the gaze tracking system. Note that the location and angle of eye tracking camera(s) 540 is given by way of example, and is not intended to be limiting. In some embodiments, a single eye tracking camera 540 is located on each side of the user’s face. In some embodiments, two or more NIR cameras 540 may be used on each side of the user’s face. In some embodiments, a camera 540 with a wider field of view (FOV) and a camera 540 with a narrower FOV may be used on each side of the user’s face. In some embodiments, a camera 540 that operates at one wavelength (e.g., 850 nm) and a camera 540 that operates at a different wavelength (e.g., 940 nm) may be used on each side of the user’s face.

[0097] Embodiments of the gaze tracking system as illustrated in FIG. 5 may, for example, be used in computer-generated reality, virtual reality, and/or mixed reality applications to provide computer-generated reality, virtual reality, augmented reality, and/or augmented virtuality experiences to the user.

[0098] FIG. 6A illustrates a glint-assisted gaze tracking pipeline, in accordance with some embodiments. In some embodiments, the gaze tracking pipeline is implemented by a glint-assisted gaze tracking system (e.g., eye tracking device 130 as illustrated in FIGS. 1 and 5). The glint-assisted gaze tracking system may maintain a tracking state. Initially, the tracking state is off or “NO”. When in the tracking state, the glint-assisted gaze tracking system uses prior information from the previous frame when analyzing the current frame to track the pupil contour and glints in the current frame. When not in the tracking state, the glint-assisted gaze tracking system attempts to detect the pupil and glints in the current frame and, if successful, initializes the tracking state to “YES” and continues with the next frame in the tracking state.

[0099] As shown in FIG. 6A, the gaze tracking cameras may capture left and right images of the user’s left and right eyes. The captured images are then input to a gaze tracking pipeline for processing beginning at 610. As indicated by the arrow returning to element 600, the gaze tracking system may continue to capture images of the user’s eyes, for example at a rate of 60 to 120 frames per second. In some embodiments, each set of captured images may be input to the pipeline for processing. However, in some embodiments or under some conditions, not all captured frames are processed by the pipeline.

[0100] At 610, for the current captured images, if the tracking state is YES, then the method proceeds to element 640. At 610, if the tracking state is NO, then as indicated at 620 the images are analyzed to detect the user’s pupils and glints in the images. At 630, if the pupils and glints are successfully detected, then the method proceeds to element 640. Otherwise, the method returns to element 610 to process next images of the user’s eyes.

[0101] At 640, if proceeding from element 610, the current frames are analyzed to track the pupils and glints based in part on prior information from the previous frames. At 640, if proceeding from element 630, the tracking state is initialized based on the detected pupils and glints in the current frames. Results of processing at element 640 are checked to verify that the results of tracking or detection can be trusted. For example, results may be checked to determine if the pupil and a sufficient number of glints to perform gaze estimation are successfully tracked or detected in the current frames. At 650, if the results cannot be trusted, then the tracking state is set to NO and the method returns to element 610 to process next images of the user’s eyes. At 650, if the results are trusted, then the method proceeds to element 670. At 670, the tracking state is set to YES (if not already YES), and the pupil and glint information is passed to element 680 to estimate the user’s point of gaze.
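
By way of illustration only, the following sketch mirrors the control flow of FIG. 6A as a loop over captured frames; detect_pupil_and_glints, track_pupil_and_glints, results_trustworthy, and estimate_gaze are hypothetical hooks standing in for the image-processing stages at elements 620, 640, 650, and 680.

```python
# Minimal sketch of the FIG. 6A tracking-state loop; the callables are hypothetical.
def gaze_tracking_loop(capture_frames, detect_pupil_and_glints,
                       track_pupil_and_glints, results_trustworthy, estimate_gaze):
    tracking = False            # tracking state starts at "NO"
    previous = None
    for frames in capture_frames():              # e.g., left/right images at 60-120 fps
        if tracking:
            result = track_pupil_and_glints(frames, previous)   # element 640 (from 610)
        else:
            result = detect_pupil_and_glints(frames)            # element 620
            if result is None:                                  # element 630: not found
                continue                                        # back to element 610
        if not results_trustworthy(result):                     # element 650
            tracking = False
            continue
        tracking = True                                         # element 670
        previous = result
        estimate_gaze(result)                                   # element 680

if __name__ == "__main__":
    import random
    random.seed(1)
    frames = lambda: iter(range(8))
    detect = lambda f: f if random.random() > 0.3 else None
    track = lambda f, prev: f
    trust = lambda r: random.random() > 0.2
    gaze_tracking_loop(frames, detect, track, trust, print)
```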

[0102] FIG. 6A is intended to serve as one example of eye tracking technology that may be used in a particular implementation. As recognized by those of ordinary skill in the art, other eye tracking technologies that currently exist or are developed in the future may be used in place of or in combination with the glint-assisted eye tracking technology described herein in the computer system 101 for providing CGR experiences to users, in accordance with various embodiments.

[0103] FIG. 6B illustrates an exemplary environment of electronic devices 101a and 101b providing a CGR experience in accordance with some embodiments. In FIG. 6B, real world environment 602 includes electronic devices 101a and 101b, users 608a and 608b, and a real world object (e.g., table 604). As shown in FIG. 6B, electronic devices 101a and 101b are optionally mounted on tripods or otherwise secured in real world environment 602 such that one or more hands of users 608a and 608b are free (e.g., users 608a and 608b are optionally not holding devices 101a and 101b with one or more hands). As described above, devices 101a and 101b optionally have one or more groups of sensors positioned on different sides of devices 101a and 101b, respectively. For example, devices 101a and 101b optionally include sensor groups 612-1a and 612-1b and sensor groups 612-2a and 612-2b located on the “back” and “front” sides of devices 101a and 101b, respectively (e.g., which are able to capture information from the respective sides of devices 101a and 101b). As used herein, the front sides of devices 101a and 101b are the sides facing users 608a and 608b, and the back sides of devices 101a and 101b are the sides facing away from users 608a and 608b.

[0104] In some embodiments, sensor groups 612-2a and 612-2b include eye tracking units (e.g., eye tracking unit 245 described above with reference to FIG. 2) that include one or more sensors for tracking the eyes and/or gaze of the user such that the eye tracking units are able to “look” at users 608a and 608b and track the eye(s) of users 608a and 608b in the manners previously described. In some embodiments, the eye tracking units of devices 101a and 101b are able to capture the movements, orientation, and/or gaze of the eyes of users 608a and 608b and treat the movements, orientation, and/or gaze as inputs.

[0105] In some embodiments, sensor groups 612-1a and 612-1b include hand tracking units (e.g., hand tracking unit 243 described above with reference to FIG. 2) that are able to track one or more hands of users 608a and 608b that are held on the “back” side of devices 101a and 101b, as shown in FIG. 6B. In some embodiments, the hand tracking units are optionally included in sensor groups 612-2a and 612-2b such that users 608a and 608b are able to additionally or alternatively hold one or more hands on the “front” side of devices 101a and 101b while devices 101a and 101b track the position of the one or more hands. As described above, the hand tracking units of devices 101a and 101b are able to capture the movements, positions, and/or gestures of the one or more hands of users 608a and 608b and treat the movements, positions, and/or gestures as inputs.

[0106] In some embodiments, sensor groups 612-1a and 612-1b optionally include one or more sensors configured to capture images of real world environment 602, including table 604 (e.g., such as image sensors 404 described above with reference to FIG. 4). As described above, devices 101a and 101b are able to capture images of portions (e.g., some or all) of real world environment 602 and present the captured portions of real world environment 602 to the user via one or more display generation components of devices 101a and 101b (e.g., the displays of devices 101a and 101b, which are optionally located on the side of devices 101a and 101b that are facing the user, opposite of the side of devices 101a and 101b that are facing the captured portions of real world environment 602).

[0107] In some embodiments, the captured portions of real world environment 602 are used to provide a CGR experience to the user, for example, a mixed reality environment in which one or more virtual objects are superimposed over representations of real world environment 602.

[0108] Thus, the description herein describes some embodiments of three-dimensional environments (e.g., CGR environments) that include representations of real world objects and representations of virtual objects. For example, a three-dimensional environment optionally includes a representation of a table that exists in the physical environment, which is captured and displayed in the three-dimensional environment (e.g., actively via cameras and displays of an electronic device, or passively via a transparent or translucent display of the electronic device). As described previously, the three-dimensional environment is optionally a mixed reality system in which the three-dimensional environment is based on the physical environment that is captured by one or more sensors of the device and displayed via a display generation component. As a mixed reality system, the device is optionally able to selectively display portions and/or objects of the physical environment such that the respective portions and/or objects of the physical environment appear as if they exist in the three-dimensional environment displayed by the electronic device. Similarly, the device is optionally able to display virtual objects in the three-dimensional environment to appear as if the virtual objects exist in the real world (e.g., physical environment) by placing the virtual objects at respective locations in the three-dimensional environment that have corresponding locations in the real world. For example, the device optionally displays a vase such that it appears as if a real vase is placed on top of a table in the physical environment. In some embodiments, each location in the three-dimensional environment has a corresponding location in the physical environment. Thus, when the device is described as displaying a virtual object at a respective location with respect to a physical object (e.g., such as a location at or near the hand of the user, or at or near a physical table), the device displays the virtual object at a particular location in the three-dimensional environment such that it appears as if the virtual object is at or near the physical object in the physical world (e.g., the virtual object is displayed at a location in the three-dimensional environment that corresponds to a location in the physical environment at which the virtual object would be displayed if it were a real object at that particular location).

[0109] In some embodiments, real world objects that exist in the physical environment that are displayed in the three-dimensional environment can interact with virtual objects that exist only in the three-dimensional environment. For example, a three-dimensional environment can include a table and a vase placed on top of the table, with the table being a view of (or a representation of) a physical table in the physical environment, and the vase being a virtual object.

[0110] Similarly, a user is optionally able to interact with virtual objects in the three-dimensional environment using one or more hands as though the virtual objects were real objects in the physical environment. For example, as described above, one or more sensors of the device optionally capture one or more of the hands of the user and display representations of the hands of the user in the three-dimensional environment (e.g., in a manner similar to displaying a real world object in the three-dimensional environment described above), or, in some embodiments, the hands of the user are visible via the display generation component via the ability to see the physical environment through the user interface, due to the transparency/translucency of a portion of the display generation component that is displaying the user interface, projection of the user interface onto a transparent/translucent surface, or projection of the user interface onto the user’s eye or into a field of view of the user’s eye. Thus, in some embodiments, the hands of the user are displayed at a respective location in the three-dimensional environment and are treated as though they were objects in the three-dimensional environment that are able to interact with the virtual objects in the three-dimensional environment as though they were real physical objects in the physical environment. In some embodiments, a user is able to move his or her hands to cause the representations of the hands in the three-dimensional environment to move in conjunction with the movement of the user’s hand.

[0111] In some of the embodiments described below, the device is optionally able to determine the “effective” distance between physical objects in the physical world and virtual objects in the three-dimensional environment, for example, for the purpose of determining whether a physical object is interacting with a virtual object (e.g., whether a hand is touching, grabbing, holding, etc. a virtual object or within a threshold distance from a virtual object). For example, the device determines the distance between the hands of the user and virtual objects when determining whether the user is interacting with virtual objects and/or how the user is interacting with virtual objects. In some embodiments, the device determines the distance between the hands of the user and a virtual object by determining the distance between the location of the hands in the three-dimensional environment and the location of the virtual object of interest in the three-dimensional environment. For example, the one or more hands of the user are located at a particular position in the physical world, which the device optionally captures and displays at a particular corresponding position in the three-dimensional environment (e.g., the position in the three-dimensional environment at which the hands would be displayed if the hands were virtual, rather than physical, hands). The position of the hands in the three-dimensional environment is optionally compared against the position of the virtual object of interest in the three-dimensional environment to determine the distance between the one or more hands of the user and the virtual object. In some embodiments, the device optionally determines a distance between a physical object and a virtual object by comparing positions in the physical world (e.g., as opposed to comparing positions in the three-dimensional environment). For example, when determining the distance between one or more hands of the user and a virtual object, the device optionally determines the corresponding location in the physical world of the virtual object (e.g., the position at which the virtual object would be located in the physical world if it were a physical object rather than a virtual object), and then determines the distance between the corresponding physical position and the one or more hands of the user. In some embodiments, the same techniques are optionally used to determine the distance between any physical object and any virtual object. Thus, as described herein, when determining whether a physical object is in contact with a virtual object or whether a physical object is within a threshold distance of a virtual object, the device optionally performs any of the techniques described above to map the location of the physical object to the three-dimensional environment and/or map the location of the virtual object to the physical world.
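
By way of illustration only, the sketch below computes such an “effective” distance by mapping the hand’s physical position into the three-dimensional environment with a rigid transform and measuring the distance to the virtual object in that shared space; the 4x4 transform and the 3-centimeter touch threshold are hypothetical.

```python
# Minimal sketch: measure hand-to-virtual-object distance in a shared coordinate
# space. physical_to_virtual is a hypothetical 4x4 homogeneous transform obtained
# from tracking; it maps physical-world coordinates into environment coordinates.
import numpy as np

def effective_distance(hand_pos_physical: np.ndarray,
                       object_pos_virtual: np.ndarray,
                       physical_to_virtual: np.ndarray) -> float:
    """Distance between a physical hand and a virtual object in environment space."""
    hand_h = np.append(hand_pos_physical, 1.0)
    hand_in_env = (physical_to_virtual @ hand_h)[:3]
    return float(np.linalg.norm(hand_in_env - object_pos_virtual))

def is_interacting(hand_pos_physical, object_pos_virtual, physical_to_virtual,
                   threshold_m: float = 0.03) -> bool:
    """True if the hand is within a (hypothetical) touch threshold of the object."""
    return effective_distance(hand_pos_physical, object_pos_virtual,
                              physical_to_virtual) < threshold_m

if __name__ == "__main__":
    transform = np.eye(4)                     # identity: spaces already aligned
    hand = np.array([0.10, 0.00, 0.50])
    vase = np.array([0.11, 0.00, 0.51])
    print(is_interacting(hand, vase, transform))   # True (about 1.4 cm apart)
```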

[0112] In some embodiments, the same or similar technique is used to determine where and what the gaze of the user is directed to and/or where and at what a physical stylus held by a user is pointed. For example, if the gaze of the user is directed to a particular position in the physical environment, the device optionally determines the corresponding position in the three-dimensional environment and if a virtual object is located at that corresponding virtual position, the device optionally determines that the gaze of the user is directed to that virtual object. Similarly, the device is optionally able to determine, based on the orientation of a physical stylus, to where in the physical world the stylus is pointing. In some embodiments, based on this determination, the device determines the corresponding virtual position in the three-dimensional environment that corresponds to the location in the physical world to which the stylus is pointing, and optionally determines that the stylus is pointing at the corresponding virtual position in the three-dimensional environment.
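
By way of illustration only, the sketch below resolves what a gaze ray or stylus ray is directed to by intersecting the ray with per-object bounding spheres and returning the nearest hit; the bounding-sphere representation and object list are simplifying assumptions, not details of the disclosed system.

```python
# Minimal sketch: nearest ray/bounding-sphere intersection for gaze or stylus rays.
import numpy as np

def targeted_object(ray_origin, ray_dir, objects):
    """objects: list of (name, center (3,), radius). Returns nearest hit name or None."""
    d = np.asarray(ray_dir, dtype=float)
    d = d / np.linalg.norm(d)
    best = (None, np.inf)
    for name, center, radius in objects:
        oc = np.asarray(center, dtype=float) - np.asarray(ray_origin, dtype=float)
        t = float(np.dot(oc, d))                  # closest approach along the ray
        if t < 0:
            continue                              # object is behind the origin
        miss_sq = float(np.dot(oc, oc)) - t * t   # squared distance from ray to center
        if miss_sq <= radius * radius and t < best[1]:
            best = (name, t)
    return best[0]

if __name__ == "__main__":
    objs = [("vase", (0.0, 0.0, 2.0), 0.15), ("button", (0.5, 0.0, 1.0), 0.10)]
    print(targeted_object((0, 0, 0), (0, 0, 1), objs))        # vase
    print(targeted_object((0, 0, 0), (0.45, 0, 1.0), objs))   # button
```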

[0113] Similarly, the embodiments described herein may refer to the location of the user (e.g., the user of the device) and/or the location of the device in the three-dimensional environment. In some embodiments, the user of the device is holding, wearing, or otherwise located at or near the electronic device. Thus, in some embodiments, the location of the device is used as a proxy for the location of the user. In some embodiments, the location of the device and/or user in the physical environment corresponds to a respective location in the three-dimensional environment. In some embodiments, the respective location is the location from which the “camera” or “view” of the three-dimensional environment extends. For example, the location of the device would be the location in the physical environment (and its corresponding location in the three-dimensional environment) from which, if a user were to stand at that location facing the respective portion of the physical environment displayed by the display generation component, the user would see the objects in the physical environment in the same position, orientation, and/or size as they are displayed by the display generation component of the device (e.g., in absolute terms and/or relative to each other). Similarly, if the virtual objects displayed in the three-dimensional environment were physical objects in the physical environment (e.g., placed at the same location in the physical environment as they are in the three-dimensional environment, and having the same size and orientation in the physical environment as in the three-dimensional environment), the location of the device and/or user is the position at which the user would see the virtual objects in the physical environment in the same position, orientation, and/or size as they are displayed by the display generation component of the device (e.g., in absolute terms and/or relative to each other and the real world objects).

[0114] In the present disclosure, various input methods are described with respect to interactions with a computer system. When an example is provided using one input device or input method and another example is provided using another input device or input method, it is to be understood that each example may be compatible with and optionally utilizes the input device or input method described with respect to another example. Similarly, various output methods are described with respect to interactions with a computer system. When an example is provided using one output device or output method and another example is provided using another output device or output method, it is to be understood that each example may be compatible with and optionally utilizes the output device or output method described with respect to another example. Similarly, various methods are described with respect to interactions with a virtual environment or a mixed reality environment through a computer system. When an example is provided using interactions with a virtual environment and another example is provided using mixed reality environment, it is to be understood that each example may be compatible with and optionally utilizes the methods described with respect to another example. As such, the present disclosure discloses embodiments that are combinations of the features of multiple examples, without exhaustively listing all features of an embodiment in the description of each example embodiment.

[0115] In addition, in methods described herein where one or more steps are contingent upon one or more conditions having been met, it should be understood that the described method can be repeated in multiple repetitions so that over the course of the repetitions all of the conditions upon which steps in the method are contingent have been met in different repetitions of the method. For example, if a method requires performing a first step if a condition is satisfied, and a second step if the condition is not satisfied, then a person of ordinary skill would appreciate that the claimed steps are repeated until the condition has been both satisfied and not satisfied, in no particular order. Thus, a method described with one or more steps that are contingent upon one or more conditions having been met could be rewritten as a method that is repeated until each of the conditions described in the method has been met. This, however, is not required of system or computer readable medium claims where the system or computer readable medium contains instructions for performing the contingent operations based on the satisfaction of the corresponding one or more conditions and thus is capable of determining whether the contingency has or has not been satisfied without explicitly repeating steps of a method until all of the conditions upon which steps in the method are contingent have been met. A person having ordinary skill in the art would also understand that, similar to a method with contingent steps, a system or computer readable storage medium can repeat the steps of a method as many times as are needed to ensure that all of the contingent steps have been performed.

User Interfaces and Associated Processes

[0116] Attention is now directed towards embodiments of user interfaces (“UI”) and associated processes that may be implemented on a computer system, such as a portable multifunction device or a head-mounted device, with a display generation component, one or more input devices, and (optionally) one or more cameras.

[0117] FIGS. 7A-7C illustrate exemplary ways in which electronic devices 101a or 101b perform or do not perform an operation in response to a user input depending on whether the user input is preceded by detecting a ready state of the user in accordance with some embodiments.

[0118] FIG. 7A illustrates electronic devices 101a and 101b displaying, via display generation components 120a and 120b, a three-dimensional environment. It should be understood that, in some embodiments, electronic devices 101a and/or 101b utilize one or more techniques described with reference to FIGS. 7A-7C in a two-dimensional environment or user interface without departing from the scope of the disclosure. As described above with reference to FIGS. 1-6, the electronic devices 101a and 101b optionally include display generation components 120a and 120b (e.g., touch screens) and a plurality of image sensors 314a and 314b. The image sensors optionally include one or more of a visible light camera, an infrared camera, a depth sensor, or any other sensor the electronic device 101a and/or 101b would be able to use to capture one or more images of a user or a part of the user while the user interacts with the electronic devices 101a and/or 101b. In some embodiments, display generation components 120a and 120b are touch screens that are able to detect gestures and movements of a user’s hand. In some embodiments, the user interfaces described below could also be implemented on a head-mounted display that includes a display generation component that displays the user interface to the user, and sensors to detect the physical environment and/or movements of the user’s hands (e.g., external sensors facing outwards from the user), and/or gaze of the user (e.g., internal sensors facing inwards towards the face of the user).

[0119] FIG. 7A illustrates two electronic devices 101a and 101b displaying a three-dimensional environment that includes a representation 704 of a table in the physical environment of the electronic devices 101a and 101b (e.g., such as table 604 in FIG. 6B), a selectable option 707, and a scrollable user interface element 705. The electronic devices 101a and 101b present the three-dimensional environment from different viewpoints in the three-dimensional environment because they are associated with different user viewpoints in the three-dimensional environment. In some embodiments, the representation 704 of the table is a photorealistic representation displayed by display generation components 120a and/or 120b (e.g., digital pass-through). In some embodiments, the representation 704 of the table is a view of the table through a transparent portion of display generation components 120a and/or 120b (e.g., physical pass-through). In FIG. 7A, the gaze 701a of the user of the first electronic device 101a is directed to the scrollable user interface element 705 and the scrollable user interface element 705 is within an attention zone 703 of the user of the first electronic device 101a. In some embodiments, the attention zone 703 is similar to the attention zones described in more detail below with reference to FIGS. 9A-10H.

[0120] In some embodiments, the first electronic device 101a displays objects (e.g., the representation of the table 704 and/or option 707) in the three-dimensional environment that are not in the attention zone 703 with a blurred and/or dimmed appearance (e.g., a de-emphasized appearance). In some embodiments, the second electronic device 101b blurs and/or dims (e.g., de-emphasizes) portions of the three-dimensional environment based on the attention zone of the user of the second electronic device 101b, which is optionally different from the attention zone of the user of the first electronic device 101a. Thus, in some embodiments, the attention zones and blurring of objects outside of the attention zones are not synced between the electronic devices 101a and 101b. Rather, in some embodiments, the attention zones associated with the electronic devices 101a and 101b are independent from each other.
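
By way of illustration only, the sketch below treats the attention zone as a cone around the gaze direction and returns blur/dim amounts that are zero inside the cone and grow with angular distance outside it; the 25-degree half-angle and the falloff rate are hypothetical tuning values.

```python
# Minimal sketch: de-emphasize (blur/dim) objects outside a cone-shaped attention
# zone around the gaze direction. Angles are in degrees; thresholds are hypothetical.
import math

def deemphasis(viewpoint, gaze_dir, object_pos, zone_half_angle_deg=25.0):
    """Return (blur, dim) amounts: 0 inside the attention zone, stronger outside."""
    to_obj = [o - v for o, v in zip(object_pos, viewpoint)]
    norm = math.sqrt(sum(c * c for c in to_obj)) or 1.0
    gnorm = math.sqrt(sum(c * c for c in gaze_dir)) or 1.0
    cos_angle = sum(a * b for a, b in zip(to_obj, gaze_dir)) / (norm * gnorm)
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    if angle <= zone_half_angle_deg:
        return 0.0, 0.0                       # inside the zone: full emphasis
    # Outside the zone: scale de-emphasis with angular distance (capped at 1.0).
    amount = min(1.0, (angle - zone_half_angle_deg) / 60.0)
    return amount, amount

if __name__ == "__main__":
    viewpoint, gaze = (0, 0, 0), (0, 0, 1)
    print(deemphasis(viewpoint, gaze, (0.1, 0, 2.0)))   # inside the zone: (0.0, 0.0)
    print(deemphasis(viewpoint, gaze, (2.0, 0, 1.0)))   # outside: noticeably de-emphasized
```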

[0121] In FIG. 7A, the hand 709 of the user of the first electronic device 101a is in an inactive hand state (e.g., hand state A). For example, the hand 709 is in a hand shape that does not correspond to a ready state or an input as described in more detail below. Because the hand 709 is in the inactive hand state, the first electronic device 101a displays the scrollable user interface element 705 without indicating that an input will be or is being directed to the scrollable user interface element 705. Likewise, electronic device 101b also displays the scrollable user interface element 705 without indicating that an input will be or is being directed to the scrollable user interface element 705.

[0122] In some embodiments, the electronic device 101a displays an indication that the gaze 701a of the user is on the user interface element 705 while the user’s hand 709 is in the inactive state. For example, the electronic device 101a optionally changes a color, size, and/or position of the scrollable user interface element 705 in a manner different from the way in which the electronic device 101a updates the scrollable user interface element 705 in response to detecting the ready state of the user, which will be described below. In some embodiments, the electronic device 101a indicates the gaze 701a of the user on user interface element 705 by displaying a visual indication separate from updating the appearance of the scrollable user interface element 705. In some embodiments, the second electronic device 101b forgoes displaying an indication of the gaze of the user of the first electronic device 101a. In some embodiments, the second electronic device 101b displays an indication to indicate the location of the gaze of the user of the second electronic device 101b.

[0123] In FIG. 7B, the first electronic device 101a detects a ready state of the user while the gaze 701b of the user is directed to the scrollable user interface element 705. In some embodiments, the ready state of the user is detected in response to detecting the hand 709 of the user in a direct ready state hand state (e.g., hand state D). In some embodiments, the ready state of the user is detected in response to detecting the hand 711 of the user in an indirect ready state hand state (e.g., hand state B).

[0124] In some embodiments, the hand 709 of the user of the first electronic device 101a is in the direct ready state when the hand 709 is within a predetermined threshold distance (e.g., 0.5, 1, 2, 3, 4, 5, 10, 15, 20, 30, etc. centimeters) of the scrollable user interface element 705, the scrollable user interface element 705 is within the attention zone 703 of the user, and/or the hand 709 is in a pointing hand shape (e.g., a hand shape in which one or more fingers are curled towards the palm and one or more fingers are extended towards the scrollable user interface element 705). In some embodiments, the scrollable user interface element 705 does not have to be in the attention zone 703 for the ready state criteria to be met for a direct input. In some embodiments, the gaze 701b of the user does not have to be directed to the scrollable user interface element 705 for the ready state criteria to be met for a direct input.

[0125] In some embodiments, the hand 711 of the user of the electronic device 101a is in the indirect ready state when the hand 711 is further than the predetermined threshold distance (e.g., 0.5, 1, 2, 3, 4, 5, 10, 15, 20, 30, etc. centimeters) from the scrollable user interface element 705, the gaze 701b of the user is directed to the scrollable user interface element 705, and the hand 711 is in a pre-pinch hand shape (e.g., a hand shape in which the thumb is within a threshold distance (e.g., 0.1, 0.5, 1, 2, 3, etc. centimeters) of another finger on the hand without touching the other finger on the hand). In some embodiments, the ready state criteria for indirect inputs are satisfied when the scrollable user interface element 705 is within the attention zone 703 of the user even if the gaze 701b is not directed to the user interface element 705. In some embodiments, the electronic device 101a resolves ambiguities in determining the location of the user’s gaze 701b as described below with reference to FIGS. 11A-12F.
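
The direct and indirect ready-state criteria described in the two preceding paragraphs can be summarized in a small classifier. The Swift sketch below is a simplified reading of those criteria, not the device's algorithm; the threshold value, hand-shape cases, and field names are illustrative assumptions.

```swift
// Sketch: classify a hand's ready state as direct or indirect based on
// distance to the element, hand shape, gaze, and attention zone.

enum HandShape { case pointing, prePinch, pinch, relaxed }
enum ReadyState { case direct, indirect, none }

struct HandSample {
    var shape: HandShape
    var distanceToElement: Double   // meters from the fingertip to the UI element
}

struct GazeSample {
    var isOnElement: Bool           // gaze resolved onto the element
    var elementInAttentionZone: Bool
}

func readyState(hand: HandSample, gaze: GazeSample,
                directThreshold: Double = 0.15) -> ReadyState {
    if hand.distanceToElement <= directThreshold {
        // Direct ready state: close to the element in a pointing shape.
        // (Some embodiments do not require gaze or the attention zone here.)
        return hand.shape == .pointing ? .direct : .none
    } else {
        // Indirect ready state: farther away, pre-pinch shape, and the user's
        // gaze (or attention zone) must include the element.
        let attentionOK = gaze.isOnElement || gaze.elementInAttentionZone
        return (hand.shape == .prePinch && attentionOK) ? .indirect : .none
    }
}

// Example: a pre-pinch hand 40 cm away while the user looks at the element.
let state = readyState(hand: HandSample(shape: .prePinch, distanceToElement: 0.4),
                       gaze: GazeSample(isOnElement: true, elementInAttentionZone: true))
print(state)   // indirect
```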

[0126] In some embodiments, the hand shapes that satisfy the criteria for a direct ready state (e.g., with hand 709) are the same as the hand shapes that satisfy the criteria for an indirect ready state (e.g., with hand 711). For example, both a pointing hand shape and a pre-pinch hand shape satisfy the criteria for direct and indirect ready states. In some embodiments, the hand shapes that satisfy the criteria for a direct ready state (e.g., with hand 709) are different from the hand shapes that satisfy the criteria for an indirect ready state (e.g., with hand 711). For example, a pointing hand shape is required for a direct ready state but a pre-pinch hand shape is required for an indirect ready state.

[0127] In some embodiments, the electronic device 101a (and/or 101b) is in communication with one or more input devices, such as a stylus or trackpad. In some embodiments, the criteria for entering the ready state with an input device are different from the criteria for entering the ready state without one of these input devices. For example, the ready state criteria for these input devices do not require detecting the hand shapes described above for the direct and indirect ready states without a stylus or trackpad. For example, the ready state criteria when the user is using a stylus to provide input to device 101a and/or 101b require that the user is holding the stylus and the ready state criteria when the user is using a trackpad to provide input to device 101a and/or 101b require that the hand of the user is resting on the trackpad.
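
A minimal sketch of how the ready-state test might branch when a hardware input device is involved, per the paragraph above: holding a stylus or resting a hand on a trackpad stands in for the bare-hand shapes. The enum cases and field names are assumptions for illustration.

```swift
// Sketch: the ready-state criteria differ depending on the input source.

enum InputSource {
    case bareHand(isPointingOrPrePinch: Bool)
    case stylus(isHeld: Bool)
    case trackpad(handResting: Bool)
}

func readyStateSatisfied(_ source: InputSource) -> Bool {
    switch source {
    case .bareHand(let shapeOK):   return shapeOK    // pointing / pre-pinch shape required
    case .stylus(let held):        return held       // simply holding the stylus
    case .trackpad(let resting):   return resting    // hand resting on the surface
    }
}

print(readyStateSatisfied(.stylus(isHeld: true)))          // true
print(readyStateSatisfied(.trackpad(handResting: false)))  // false
```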

[0128] In some embodiments, each hand of the user (e.g., a left hand and a right hand) has an independently associated ready state (e.g., each hand must independently satisfy its ready state criteria before devices 101a and/or 101b will respond to inputs provided by each respective hand). In some embodiments, the criteria for the ready state of each hand are different from each other (e.g., different hand shapes required for each hand, only allowing indirect or direct ready states for one or both hands). In some embodiments, the visual indication of the ready state for each hand is different. For example, if the color of the scrollable user interface element 705 changes to indicate the ready state being detected by device 101a and/or 101b, the color of the scrollable user interface element 705 could be a first color (e.g., blue) for the ready state of the right hand and could be a second color (e.g., green) for the ready state of the left hand.
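
The per-hand behavior described above could be modeled by tracking readiness separately for each hand and choosing an indication color by hand, as in this hedged sketch; the colors and types are illustrative, and the "first ready hand wins" rule is an assumption of the example.

```swift
// Sketch: each hand has its own ready state, and the highlight color
// indicating the ready state can differ by hand.

enum Chirality { case left, right }

struct HandReadiness {
    var chirality: Chirality
    var isReady: Bool
}

func highlightColor(for hands: [HandReadiness]) -> String? {
    // Only a hand that has independently satisfied its ready-state criteria
    // contributes an indication; the first ready hand wins in this sketch.
    guard let ready = hands.first(where: { $0.isReady }) else { return nil }
    return ready.chirality == .right ? "blue" : "green"
}

print(highlightColor(for: [HandReadiness(chirality: .left, isReady: true),
                           HandReadiness(chirality: .right, isReady: false)]) ?? "none") // green
```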

[0129] In some embodiments, in response to detecting the ready state of the user, the electronic device 101a becomes ready to detect input provided by the user (e.g., by the user’s hand(s)) and updates display of the scrollable user interface element 705 to indicate that further input will be directed to the scrollable user interface element 705. For example, as shown in FIG. 7B, the scrollable user interface element 705 is updated at electronic device 101a by increasing the thickness of a line around the boundary of the scrollable user interface element 705. In some embodiments, the electronic device 101a updates the appearance of the scrollable user interface element 705 in a different or additional manner, such as by changing the color of the background of the scrollable user interface element 705, displaying highlighting around the scrollable user interface element 705, updating the size of the scrollable user interface element 705, updating a position in the three-dimensional environment of the scrollable user interface element 705 (e.g., displaying the scrollable user interface element 705 closer to the viewpoint of the user in the three-dimensional environment), etc. In some embodiments, the second electronic device 101b does not update the appearance of the scrollable user interface element 705 to indicate the ready state of the user of the first electronic device 101a.

[0130] In some embodiments, the way in which the electronic device 101a updates the scrollable user interface element 705 in response to detecting the ready state is the same regardless of whether the ready state is a direct ready state (e.g., with hand 709) or an indirect ready state (e.g., with hand 711). In some embodiments, the way in which the electronic device 101a updates the scrollable user interface element 705 in response to detecting the ready state is different depending on whether the ready state is a direct ready state (e.g., with hand 709) or an indirect ready state (e.g., with hand 711). For example, if the electronic device 101a updates the color of the scrollable user interface element 705 in response to detecting the ready state, the electronic device 101a uses a first color (e.g., blue) in response to a direct ready state (e.g., with hand 709) and uses a second color (e.g., green) in response to an indirect ready state (e.g., with hand 711).

[0131] In some embodiments, after detecting the ready state to the scrollable user interface element 705, the electronic device 101a updates the target of the ready state based on an indication of the user’s focus. For example, the electronic device 101a directs the indirect ready state (e.g., with hand 711) to the selectable option 707 (e.g., and removes the ready state from scrollable user interface element 705) in response to detecting the location of the gaze 701b move from the scrollable user interface element 705 to the selectable option 707. As another example, the electronic device 101a directs the direct ready state (e.g., with hand 709) to the selectable option 707 (e.g., and removes the ready state from scrollable user interface element 705) in response to detecting the hand 709 move from being within the threshold distance (e.g., 0.5, 1, 2, 3, 4, 5, 10, 15, 30, etc. centimeters) of the scrollable user interface element 705 to being within the threshold distance of the selectable option 707.
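
Below is a sketch of the retargeting behavior in this paragraph, under the assumption that an indirect ready state follows gaze while a direct ready state follows hand proximity; the element identifiers and threshold are illustrative.

```swift
// Sketch: after a ready state is established, the targeted element follows the
// user's focus: gaze for indirect, hand proximity for direct.

enum ReadyKind { case direct, indirect }

struct Element { let id: String; var distanceToHand: Double; var hasGaze: Bool }

func retarget(kind: ReadyKind, elements: [Element],
              directThreshold: Double = 0.15) -> String? {
    switch kind {
    case .indirect:
        // Indirect ready state moves with the gaze.
        return elements.first(where: { $0.hasGaze })?.id
    case .direct:
        // Direct ready state moves with the hand, to the nearest element
        // inside the direct-interaction threshold.
        return elements.filter { $0.distanceToHand <= directThreshold }
                       .min(by: { $0.distanceToHand < $1.distanceToHand })?.id
    }
}

let elements = [Element(id: "scrollArea705", distanceToHand: 0.4, hasGaze: false),
                Element(id: "option707",     distanceToHand: 0.05, hasGaze: true)]
print(retarget(kind: .direct,   elements: elements) ?? "none")  // option707
print(retarget(kind: .indirect, elements: elements) ?? "none")  // option707
```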

[0132] In FIG. 7B, device 101b detects that the user of the second electronic device 101b directs their gaze 701c to the selectable option 707 while the hand 715 of the user is in the inactive state (e.g., hand state A). Because the electronic device 101b does not detect the ready state of the user, the electronic device 101b forgoes updating the selectable option 707 to indicate the ready state of the user. In some embodiments, as described above, the electronic device 101b updates the appearance of the selectable option 707 to indicate that the gaze 701c of the user is directed to the selectable option 707 in a manner that is different from the manner in which the electronic device 101b updates user interface elements to indicate the ready state.

[0133] In some embodiments, the electronic devices 101a and 101b only perform operations in response to inputs when the ready state was detected prior to detecting the input. FIG. 7C illustrates the users of the electronic devices 101a and 101b providing inputs to the electronic devices 101a and 101b, respectively. In FIG. 7B, the first electronic device 101a detected the ready state of the user, whereas the second electronic device 101b did not detect the ready state, as previously described. Thus, in FIG. 7C, the first electronic device 101a performs an operation in response to detecting the user input, whereas the second electronic device 101b forgoes performing an operation in response to detecting the user input.

[0134] In particular, in FIG. 7C, the first electronic device 101a detects a scrolling input directed to scrollable user interface element 705. FIG. 7C illustrates a direct scrolling input provided by hand 709 and/or an indirect scrolling input provided by hand 711. The direct scrolling input includes detecting hand 709 within a direct input threshold (e.g., 0.05, 0.1, 0.2, 0.3, 0.5, 1, etc. centimeters) of, or touching, the scrollable user interface element 705 while the hand 709 is in the pointing hand shape (e.g., hand state E) and the hand 709 moves in a direction in which the scrollable user interface element 705 is scrollable (e.g., vertical motion or horizontal motion). The indirect scrolling input includes detecting hand 711 further than the direct input ready state threshold (e.g., 0.5, 1, 2, 3, 4, 5, 10, 15, 30, etc. centimeters) and/or further than the direct input threshold (e.g., 0.05, 0.1, 0.2, 0.3, 0.5, 1, etc. centimeters) from the scrollable user interface element 705, detecting the hand 711 in a pinch hand shape (e.g., a hand shape in which the thumb touches another finger on the hand 711, hand state C), and movement of the hand 711 in a direction in which the scrollable user interface element 705 is scrollable (e.g., vertical motion or horizontal motion), while detecting the gaze 701b of the user on the scrollable user interface element 705.
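
The two scroll-input forms just described can be contrasted in a short sketch. This is not the device's recognizer; the thresholds, hand-shape cases, and the single-axis motion field are simplifying assumptions.

```swift
// Sketch: a direct scroll (finger touching or nearly touching the element while
// moving) versus an indirect scroll (pinched hand moving while the gaze rests
// on the element).

enum HandShape { case pointing, pinch, other }

struct ScrollSample {
    var shape: HandShape
    var distanceToElement: Double   // meters
    var verticalMotion: Double      // meters of hand travel this frame
    var gazeOnElement: Bool
}

func scrollDelta(_ s: ScrollSample,
                 directTouchThreshold: Double = 0.005) -> Double? {
    let isDirect = s.distanceToElement <= directTouchThreshold && s.shape == .pointing
    let isIndirect = s.distanceToElement > directTouchThreshold && s.shape == .pinch
                     && s.gazeOnElement
    guard isDirect || isIndirect else { return nil }
    // Scroll the content in accordance with the hand's movement.
    return s.verticalMotion
}

let direct = ScrollSample(shape: .pointing, distanceToElement: 0.002,
                          verticalMotion: 0.03, gazeOnElement: false)
print(scrollDelta(direct) ?? 0)   // 0.03
```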

[0135] In some embodiments, the electronic device 101a requires that the scrollable user interface element 705 is within the attention zone 703 of the user for the scrolling input to be detected. In some embodiments, the electronic device 101a does not require the scrollable user interface element 705 to be within the attention zone 703 of the user for the scrolling input to be detected. In some embodiments, the electronic device 101a requires the gaze 701b of the user to be directed to the scrollable user interface element 705 for the scrolling input to be detected. In some embodiments, the electronic device 101a does not require the gaze 701b of the user to be directed to the scrollable user interface element 705 for the scrolling input to be detected. In some embodiments, the electronic device 101a requires the gaze 701b of the user to be directed to the scrollable user interface element 705 for indirect scrolling inputs but not for direct scrolling inputs.

[0136] In response to detecting the scrolling input, the first electronic device 101a scrolls the content in the scrollable user interface element 705 in accordance with the movement of hand 709 or hand 711, as shown in FIG. 7C. In some embodiments, the first electronic device 101a transmits an indication of the scrolling to the second electronic device 101b (e.g., via a server) and, in response, the second electronic device 101b scrolls the scrollable user interface element 705 the same way in which the first electronic device 101a scrolls the scrollable user interface element 705. For example, the scrollable user interface element 705 in the three-dimensional environment has now been scrolled, and therefore the electronic devices that display viewpoints of the three-dimensional environment (e.g., including electronic devices other than those that detected the input for scrolling the scrollable user interface element 705) that include the scrollable user interface element 705 reflect the scrolled state of the user interface element. In some embodiments, if the ready state of the user shown in FIG. 7B had not been detected prior to detecting the input illustrated in FIG. 7C, the electronic devices 101a and 101b would forgo scrolling the scrollable user interface element 705 in response to the inputs illustrated in FIG. 7C.
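
One plausible way to realize the synchronization described above is to share the scrolled state (the result of the input) rather than the input itself, so every device viewing the environment renders the same offset. The class and listener mechanism below are assumptions standing in for the actual networking.

```swift
// Sketch: the device that detects the scroll updates a shared offset, and
// every other device viewing the environment reflects that new offset.

final class SharedScrollState {
    private(set) var offset: Double = 0
    private var listeners: [(Double) -> Void] = []

    func addListener(_ listener: @escaping (Double) -> Void) { listeners.append(listener) }

    // Called on the device that detected the scroll input.
    func applyLocalScroll(delta: Double) {
        offset += delta
        listeners.forEach { $0(offset) }   // stand-in for sending the new state over the network
    }
}

let shared = SharedScrollState()
shared.addListener { print("device 101b now shows offset \($0)") }
shared.applyLocalScroll(delta: 0.03)       // both devices reflect the scrolled state
```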

[0137] Therefore, in some embodiments, the results of user inputs are synchronized between the first electronic device 101a and the second electronic device 101b. For example, if the second electronic device 101b were to detect selection of the selectable option 707, both the first and second electronic devices 101a and 101b would update the appearance (e.g., color, style, size, position, etc.) of the selectable option 707 while the selection input is being detected and perform the operation in accordance with the selection.

[0138] Thus, because the electronic device 101a detected the ready state of the user in FIG. 7B before detecting the input in FIG. 7C, the electronic device 101a scrolls the scrollable user interface element 705 in response to the input. In some embodiments, the electronic devices 101a and 101b forgo performing actions in response to inputs that were detected without first detecting the ready state.

[0139] For example, in FIG. 7C, the user of the second electronic device 101b provides an indirect selection input with hand 715 directed to selectable option 707. In some embodiments, detecting the selection input includes detecting the hand 715 of the user making a pinch gesture (e.g., hand state C) while the gaze 701c of the user is directed to the selectable option 707. Because the second electronic device 101b did not detect the ready state (e.g., in FIG. 7B) prior to detecting the input in FIG. 7C, the second electronic device 101b forgoes selecting the option 707 and forgoes performing an action in accordance with the selection of option 707. In some embodiments, although the second electronic device 101b detects the same input (e.g., an indirect input) as the first electronic device 101a in FIG. 7C, the second electronic device 101b does not perform an operation in response to the input because the ready state was not detected before the input was detected. In some embodiments, if the second electronic device 101b had detected a direct input without having first detected the ready state, the second electronic device 101b would also forgo performing an action in response to the direct input because the ready state was not detected before the input was detected.
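
The gating behavior illustrated by FIGS. 7B-7C reduces, in this example, to a small state machine: an input only produces an operation if a ready state was detected beforehand. The sketch below is an illustrative reduction, not the devices' logic.

```swift
// Sketch: inputs are ignored unless a ready state was detected first.

enum Phase { case idle, ready }

struct InputGate {
    var phase: Phase = .idle

    mutating func readyStateDetected() { phase = .ready }

    mutating func handleSelection(perform: () -> Void) {
        guard phase == .ready else { return }   // forgo the operation otherwise
        perform()
        phase = .idle
    }
}

var deviceA = InputGate()
deviceA.readyStateDetected()                       // FIG. 7B: ready state detected
deviceA.handleSelection { print("scrolled") }      // FIG. 7C: operation performed

var deviceB = InputGate()
deviceB.handleSelection { print("selected 707") }  // no prior ready state: nothing happens
```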

[0140] FIGS. 8A-8K is a flowchart illustrating a method 800 of performing or not performing an operation in response to a user input depending on whether the user input is preceded by detecting a ready state of the user in accordance with some embodiments. In some embodiments, the method 800 is performed at a computer system (e.g., computer system 101 in FIG. 1 such as a tablet, smartphone, wearable computer, or head mounted device) including a display generation component (e.g., display generation component 120 in FIGS. 1, 3, and 4) (e.g., a heads-up display, a display, a touchscreen, a projector, etc.) and one or more cameras (e.g., a camera (e.g., color sensors, infrared sensors, and other depth-sensing cameras) that points downward at a user’s hand or a camera that points forward from the user’s head). In some embodiments, the method 800 is governed by instructions that are stored in a non-transitory computer-readable storage medium and that are executed by one or more processors of a computer system, such as the one or more processors 202 of computer system 101 (e.g., control unit 110 in FIG. 1A). Some operations in method 800 are, optionally, combined and/or the order of some operations is, optionally, changed.

[0141] In some embodiments, method 800 is performed at an electronic device 101a or 101b in communication with a display generation component and one or more input devices (e.g., a mobile device (e.g., a tablet, a smartphone, a media player, or a wearable device), or a computer). In some embodiments, the display generation component is a display integrated with the electronic device (optionally a touch screen display), an external display such as a monitor, projector, or television, or a hardware component (optionally integrated or external) for projecting a user interface or causing a user interface to be visible to one or more users, etc. In some embodiments, the one or more input devices include an electronic device or component capable of receiving a user input (e.g., capturing a user input, detecting a user input, etc.) and transmitting information associated with the user input to the electronic device. Examples of input devices include a touch screen, mouse (e.g., external), trackpad (optionally integrated or external), touchpad (optionally integrated or external), remote control device (e.g., external), another mobile device (e.g., separate from the electronic device), a handheld device (e.g., external), a controller (e.g., external), a camera, a depth sensor, an eye tracking device, and/or a motion sensor (e.g., a hand tracking device, a hand motion sensor), etc. In some embodiments, the electronic device is in communication with a hand tracking device (e.g., one or more cameras, depth sensors, proximity sensors, touch sensors (e.g., a touch screen, trackpad)). In some embodiments, the hand tracking device is a wearable device, such as a smart glove. In some embodiments, the hand tracking device is a handheld input device, such as a remote control or stylus.

[0142] In some embodiments, such as in FIG. 7A the electronic device 101a displays (802a), via the display generation component, a user interface that includes a user interface element (e.g., 705). In some embodiments, the user interface element is an interactive user interface element and, in response to detecting an input directed towards the user interface element, the electronic device performs an action associated with the user interface element. For example, the user interface element is a selectable option that, when selected, causes the electronic device to perform an action, such as displaying a respective user interface, changing a setting of the electronic device, or initiating playback of content. As another example, the user interface element is a container (e.g., a window) in which a user interface/content is displayed and, in response to detecting selection of the user interface element followed by a movement input, the electronic device updates the position of the user interface element in accordance with the movement input. In some embodiments, the user interface and/or user interface element are displayed in a three-dimensional environment (e.g., the user interface is the three-dimensional environment and/or is displayed within a three-dimensional environment) that is generated, displayed, or otherwise caused to be viewable by the device (e.g., a computer-generated reality (CGR) environment such as a virtual reality (VR) environment, a mixed reality (MR) environment, or an augmented reality (AR) environment, etc.).

[0143] In some embodiments, such as in FIG. 7C, while displaying the user interface element (e.g., 705), the electronic device 101a detects (802b), via the one or more input devices, an input from a predefined portion (e.g., 709) (e.g., hand, arm, head, eyes, etc.) of a user of the electronic device 101a. In some embodiments, detecting the input includes detecting, via the hand tracking device, that the user performs a predetermined gesture with their hand optionally while the gaze of the user is directed towards the user interface element. For example, the predetermined gesture is a pinch gesture that includes touching a thumb to another finger (e.g., index, middle, ring, little finger) on the same hand as the thumb while looking at the user interface element. In some embodiments, the input is a direct or indirect interaction with the user interface element, such as described with reference to methods 1000, 1200, 1400, 1600, 1800 and/or 2000.

[0144] In some embodiments, in response to detecting the input from the predefined portion of the user of the electronic device (802c), in accordance with a determination that a pose (e.g., position, orientation, hand shape) of the predefined portion (e.g., 709) of the user prior to detecting the input satisfies one or more criteria, the electronic device performs (802d) a respective operation in accordance with the input from the predefined portion (e.g., 709) of the user of the electronic device 101a, such as in FIG. 7C. In some embodiments, the pose of the physical feature of the user is an orientation and/or shape of the hand of the user. For example, the pose satisfies the one or more criteria if the electronic device detects that the hand of the user is oriented with the user’s palm facing away from the user’s torso while in a pre-pinch hand shape in which the thumb of the user is within a threshold distance (e.g., 0.5, 1, 2, etc. centimeters) of another finger (e.g., index, middle, ring, little finger) on the hand of the thumb. As another example, the one or more criteria are satisfied when the hand is in a pointing hand shape in which one or more fingers are extended and one or more other fingers are curled towards the user’s palm. Input by the hand of the user subsequent to the detection of the pose is optionally recognized as directed to the user interface element, and the device optionally performs the respective operation in accordance with that subsequent input by the hand. In some embodiments, the respective operation includes scrolling a user interface, selecting an option, activating a setting, or navigating to a new user interface. In some embodiments, in response to detecting an input that includes selection followed by movement of the portion of the user after detecting the predetermined pose, the electronic device scrolls a user interface. For example, the electronic device detects the user’s gaze directed to the user interface while first detecting a pointing hand shape, followed by movement of the user’s hand away from the torso of the user and in a direction in which the user interface is scrollable and, in response to the sequence of inputs, scrolls the user interface. As another example, in response to detecting the user’s gaze on an option to activate a setting of the electronic device while detecting the pre-pinch hand shape followed by a pinch hand shape, the electronic device activates the setting on the electronic device.
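
The pre-pinch and pinch shapes referenced in these criteria are commonly distinguished by the distance between the thumb tip and another fingertip. The sketch below assumes illustrative joint positions and distance thresholds; it is not the device's hand-shape classifier.

```swift
// Sketch: the thumb tip near, but not touching, another fingertip is treated
// as a pre-pinch; touching is treated as a pinch.

struct Point3 { var x, y, z: Double }

func distance(_ a: Point3, _ b: Point3) -> Double {
    let dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z
    return (dx * dx + dy * dy + dz * dz).squareRoot()
}

enum PinchPhase { case none, prePinch, pinch }

func pinchPhase(thumbTip: Point3, indexTip: Point3,
                touchDistance: Double = 0.005,
                prePinchDistance: Double = 0.02) -> PinchPhase {
    let d = distance(thumbTip, indexTip)
    if d <= touchDistance { return .pinch }
    if d <= prePinchDistance { return .prePinch }
    return .none
}

print(pinchPhase(thumbTip: Point3(x: 0, y: 0, z: 0),
                 indexTip: Point3(x: 0.01, y: 0, z: 0)))   // prePinch
```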

[0145] In some embodiments, such as in FIG. 7C, in response to detecting the input from the predefined portion (e.g., 715) of the user of the electronic device 101b (802c), in accordance with a determination that the pose of the predefined portion (e.g., 715) of the user prior to detecting the input does not satisfy the one or more criteria, such as in FIG. 7B, the electronic device 101b forgoes (802e) performing the respective operation in accordance with the input from the predefined portion (e.g., 715) of the user of the electronic device 101b, such as in FIG. 7C. In some embodiments, even if the pose satisfies the one or more criteria, the electronic device forgoes performing the respective operation in response to detecting that, while the pose and the input were detected, the gaze of the user was not directed towards the user interface element. In some embodiments, in accordance with a determination that the gaze of the user is directed towards the user interface element while the pose and the input are detected, the electronic device performs the respective operation in accordance with the input.

[0146] The above-described manner of performing or not performing the first operation depending on whether or not the pose of the predefined portion of the user prior to detecting the input satisfies one or more criteria provides an efficient way of reducing accidental user inputs, which simplifies the interaction between the user and the electronic device and enhances the operability of the electronic device and makes the user-device interface more efficient, which additionally reduces power usage and improves battery life of the electronic device by enabling the user to use the electronic device more quickly and efficiently, while reducing errors in usage and by reducing the likelihood that the electronic device performs an operation that was not intended and will be subsequently reversed.

[0147] In some embodiments, such as in FIG. 7A, while the pose of the predefined portion (e.g., 709) of the user does not satisfy the one or more criteria (e.g., prior to detecting the input from the predefined portion of the user), the electronic device 101a displays (804a) the user interface element (e.g., 705) with a visual characteristic (e.g., size, color, position, translucency) having a first value and displaying a second user interface element (e.g., 707) included in the user interface with the visual characteristic (e.g., size, color, position, translucency) having a second value. In some embodiments, displaying the user interface element with the visual characteristic having the first value and displaying the second user interface element with the visual characteristic having the second value indicates that the input focus is not directed to the user interface element or the second user interface element and/or that the electronic device will not direct input from the predefined portion of the user to the user interface element or the second user interface element.

[0148] In some embodiments, such as in FIG. 7B, while the pose of the predefined portion (e.g., 709) of the user satisfies the one or more criteria, the electronic device 101a updates (804b) the visual characteristic of a user interface element (e.g., 705) toward which an input focus is directed, including (e.g., prior to detecting the input from the predefined portion of the user), in accordance with a determination that an input focus is directed to the user interface element (e.g., 705), the electronic device 101a updates (804c) the user interface element (e.g., 705) to be displayed with the visual characteristic (e.g., size, color, translucency) having a third value (e.g., different from the first value, while maintaining display of the second user interface element with the visual characteristic having the second value). In some embodiments, the input focus is directed to the user interface element in accordance with a determination that the gaze of the user is directed towards the user interface element, optionally including disambiguation techniques according to method 1200. In some embodiments, the input focus is directed to the user interface element in accordance with a determination that the predefined portion of the user is within a threshold distance (e.g., 0.5, 1, 2, 3, 4, 5, 10, 30, 50, etc. centimeters) of the user interface element (e.g., a threshold distance for a direct input). For example, before the predefined portion of the user satisfies the one or more criteria, the electronic device displays the user interface element in a first color and, in response to detecting that the predefined portion of the user satisfies the one or more criteria and the input focus is directed to the user interface element, the electronic device displays the user interface element in a second color different from the first color to indicate that input from the predefined portion of the user will be directed to the user interface element.

[0149] In some embodiments, while the pose of the predefined portion (e.g., 705) of the user satisfies the one or more criteria, such as in FIG. 7B, the electronic device 101a updates (804b) the visual characteristic of a user interface element toward which an input focus is directed (e.g., in the way in which the electronic device 101a updates user interface element 705 in FIG. 7B), including (e.g., prior to detecting the input from the predefined portion of the user), in accordance with a determination that the input focus is directed to the second user interface element, the electronic device 101a updates (804d) the second user interface element to be displayed with the visual characteristic having a fourth value (e.g., updating the appearance of user interface element 707 in FIG. 7B if user interface element 707 has the input focus instead of user interface element 705 having the input focus as is the case in FIG. 7B) (e.g., different from the second value, while maintaining display of the user interface element with the visual characteristic having the first value). In some embodiments, the input focus is directed to the second user interface element in accordance with a determination that the gaze of the user is directed towards the second user interface element, optionally including disambiguation techniques according to method 1200. In some embodiments, the input focus is directed to the second user interface element in accordance with a determination that the predefined portion of the user is within a threshold distance (e.g., 0.5, 1, 2, 3, 4, 5, 10, 50, etc. centimeters) of the second user interface element (e.g., a threshold distance for a direct input). For example, before the predefined portion of the user satisfies the one or more criteria, the electronic device displays the second user interface element in a first color and, in response to detecting that the predefined portion of the user satisfies the one or more criteria and the input focus is directed to the second user interface element, the electronic device displays the second user interface element in a second color different from the first color to indicate that input will be directed to the user interface element.
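
Paragraphs [0147]-[0149] together describe updating only the focused element's visual characteristic while the ready-state pose is held. A minimal sketch of that behavior, using a color string as the visual characteristic and assuming illustrative element identifiers:

```swift
// Sketch: while the ready-state pose is held, only the element with input
// focus gets its visual characteristic changed; the others keep the idle value.

struct FocusableElement {
    let id: String
    var color: String = "gray"      // idle value
}

func updateAppearance(elements: inout [FocusableElement],
                      poseSatisfiesCriteria: Bool,
                      focusedID: String?) {
    for i in elements.indices {
        let focused = poseSatisfiesCriteria && elements[i].id == focusedID
        elements[i].color = focused ? "highlighted" : "gray"
    }
}

var elements = [FocusableElement(id: "scrollArea705"), FocusableElement(id: "option707")]
updateAppearance(elements: &elements, poseSatisfiesCriteria: true, focusedID: "scrollArea705")
print(elements.map { "\($0.id): \($0.color)" })
// ["scrollArea705: highlighted", "option707: gray"]
```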

[0150] The above-described manner of updating the visual characteristic of the user interface element to which input focus is directed in response to detecting that the predefined portion of the user satisfies the one or more criteria provides an efficient way of indicating to the user which user interface element input will be directed towards, which simplifies the interaction between the user and the electronic device and enhances the operability of the electronic device and makes the user-device interface more efficient, which additionally reduces power usage and improves battery life of the electronic device by enabling the user to use the electronic device more quickly and efficiently, while reducing errors in usage.

[0151] In some embodiments, such as in FIG. 7B, the input focus is directed to the user interface element (e.g., 705) in accordance with a determination that the predefined portion (e.g., 709) of the user is within a threshold distance (e.g., 0.5, 1, 2, 3, 4, 5, 10, 50, etc. centimeters) of a location corresponding to the user interface element (e.g., 705) (806a) (e.g., and not within the threshold distance of the second user interface element). In some embodiments, the threshold distance is associated with a direct input, such as described with reference to methods 800, 1000, 1200, 1400, 1600, 1800 and/or 2000. For example, the input focus is directed to the user interface element in response to detecting the finger of the user’s hand in the pointing hand shape within the threshold distance of the user interface element.

[0152] In some embodiments, the input focus is directed to the second user interface element (e.g., 707) in FIG. 7B in accordance with a determination that the predefined portion (e.g., 709) of the user is within the threshold distance (e.g., 0.5, 1, 2, 3, 4, 5, 10, 50, etc. centimeters) of the second user interface element (806b) (e.g., and not within the threshold distance of the user interface element; such as if the user’s hand 709 were within the threshold distance of user interface element 707 instead of user interface element 705 in FIG. 7B, for example). In some embodiments, the threshold distance is associated with a direct input, such as described with reference to methods 800, 1000, 1200, 1400, 1600, 1800 and/or 2000. For example, the input focus is directed to the second user interface element in response to detecting the finger of the user’s hand in the pointing hand shape within the threshold distance of the second user interface element.

[0153] The above-described manner of directing the input focus based on which user interface element the predefined portion of the user is within the threshold distance of provides an efficient way of directing user input when providing inputs using the predefined portion of the user, which simplifies the interaction between the user and the electronic device and enhances the operability of the electronic device and makes the user-device interface more efficient, which additionally reduces power usage and improves battery life of the electronic device by enabling the user to use the electronic device more quickly and efficiently, while reducing errors in usage.

[0154] In some embodiments, such as in FIG. 7B, the input focus is directed to the user interface element (e.g., 705) in accordance with a determination that a gaze (e.g., 701b) of the user is directed to the user interface element (e.g., 705) (808a) (e.g., and the predefined portion of the user is not within the threshold distance of the user interface element and/or any interactive user interface element). In some embodiments, determining that the gaze of the user is directed to the user interface element includes one or more disambiguation techniques according to method 1200. For example, the electronic device directs the input focus to the user interface element for indirect input in response to detecting the gaze of the user directed to the user interface element.

[0155] In some embodiments, the input focus is directed to the second user interface element (e.g., 707) in FIG. 7B in accordance with a determination that the gaze of the user is directed to the second user interface element (e.g., 707) (808b) (e.g., and the predefined portion of the user is not within a threshold distance of the second user interface element and/or any interactable user interface element). For example, if the gaze of the user was directed to user interface element 707 in FIG. 7B instead of user interface element 705, the input focus would be directed to user interface element 707. In some embodiments, determining that the gaze of the user is directed to the second user interface element includes one or more disambiguation techniques according to method 1200. For example, the electronic device directs the input focus to the second user interface element for indirect input in response to detecting the gaze of the user directed to the second user interface element.
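
Combining the two focus rules above (hand proximity for direct interaction, gaze otherwise) might look like the following sketch; the threshold and identifiers are assumptions.

```swift
// Sketch: a hand within the direct threshold of an element gives that element
// input focus; otherwise focus follows the user's gaze.

struct Candidate { let id: String; var handDistance: Double; var hasGaze: Bool }

func inputFocus(_ candidates: [Candidate], directThreshold: Double = 0.15) -> String? {
    // Direct proximity wins when available.
    if let near = candidates.filter({ $0.handDistance <= directThreshold })
                            .min(by: { $0.handDistance < $1.handDistance }) {
        return near.id
    }
    // Otherwise, focus goes to the element the user is looking at.
    return candidates.first(where: { $0.hasGaze })?.id
}

print(inputFocus([Candidate(id: "705", handDistance: 0.5, hasGaze: true),
                  Candidate(id: "707", handDistance: 0.6, hasGaze: false)]) ?? "none") // 705
```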

[0156] The above-described manner of directing the input focus to the user interface element at which the user is looking provides an efficient way of directing user inputs without the use of additional input devices (e.g., other than an eye tracking device and hand tracking device), which simplifies the interaction between the user and the electronic device and enhances the operability of the electronic device and makes the user-device interface more efficient, which additionally reduces power usage and improves battery life of the electronic device by enabling the user to use the electronic device more quickly and efficiently, while reducing errors in usage.

[0157] In some embodiments, such as in FIG. 7B, updating the visual characteristic of a user interface element (e.g., 705) toward which an input focus is directed includes (810a), in accordance with a determination that the predefined portion (e.g., 709) of the user is less than a threshold distance (e.g., 1, 2, 3, 4, 5, 10, 15, 30, etc. centimeters) from a location corresponding to the user interface element (e.g., 705), the visual characteristic of the user interface element (e.g., 705) toward which the input focus is directed is updated in accordance with a determination that the pose of the predefined portion (e.g., 709) of the user satisfies a first set of one or more criteria (810b), such as in FIG. 7B (and, optionally, the visual characteristic of the user interface element toward which the input focus is directed is not updated in accordance with a determination that the pose of the predefined portion of the user does not satisfy the first set of one or more criteria) (e.g., associated with direct inputs such as described with reference to methods 800, 1000, 1200, 1400, 1600, 1800 and/or 2000). For example, while the hand of the user is within the direct input threshold distance of the user interface element, the first set of one or more criteria include detecting a pointing hand shape (e.g., a shape in which a finger is extending out from an otherwise closed hand).

[0158] In some embodiments, such as in FIG. 7B, updating the visual characteristic of a user interface element (e.g., 705) toward which an input focus is directed includes (810a), in accordance with a determination that the predefined portion (e.g., 711) of the user is more than the threshold distance (e.g., 1, 2, 3, 4, 5, 10, 15, 30, etc. centimeters) from the location corresponding to the user interface element (e.g., 705), the visual characteristic of the user interface element (e.g., 705) toward which the input focus is directed is updated in accordance with a determination that the pose of the predefined portion (e.g., 711) of the user satisfies a second set of one or more criteria (e.g., associated with indirect inputs such as described with reference to methods 800, 1000, 1200, 1400, 1600, 1800 and/or 2000), different from the first set of one or more criteria (810c), such as in FIG. 7B (and, optionally, the visual characteristic of the user interface element toward which the input focus is directed is not updated in accordance with a determination that the pose of the predefined portion of the user does not satisfy the second set of one or more criteria). For example, while the hand of the user is more than the direct input threshold from the user interface element, the second set of one or more criteria include detecting a pre-pinch hand shape instead of detecting the pointing hand shape. In some embodiments, the hand shapes that satisfy the one or more first criteria are different from the hand shapes that satisfy the one or more second criteria. In some embodiments, the one or more criteria are not satisfied when the predefined portion of the user is greater than the threshold distance from the location corresponding to the user interface element and the pose of the predefined portion of the user satisfies the first set of one or more criteria without satisfying the second set of one or more criteria. In some embodiments, the one or more criteria are not satisfied when the predefined portion of the user is less than the threshold distance from the location corresponding to the user interface element and the pose of the predefined portion of the user satisfies the second set of one or more criteria without satisfying the first set of one or more criteria.

[0159] The above-described manner of using different criteria to evaluate the predefined portion of the user depending on whether the predefined portion of the user is within the threshold distance of a location corresponding to the user interface element provides efficient and intuitive ways of interacting with the user interface element that are tailored to whether the input is a direct or indirect input, which simplifies the interaction between the user and the electronic device and enhances the operability of the electronic device and makes the user-device interface more efficient, which additionally reduces power usage and improves battery life of the electronic device by enabling the user to use the electronic device more quickly and efficiently, while reducing errors in usage.

[0160] In some embodiments, such as in FIG. 7B, the pose of the predefined portion (e.g., 709) of the user satisfying the one or more criteria includes (812a), in accordance with a determination that the predefined portion (e.g., 709) of the user is less than a threshold distance (e.g., 1, 2, 3, 4, 5, 10, 15, 30, etc. centimeters) from a location corresponding to the user interface element (e.g., 705), the pose of the predefined portion (e.g., 709) of the user satisfying a first set of one or more criteria (812b) (e.g., associated with direct inputs such as described with reference to methods 800, 1000, 1200, 1400, 1600, 1800 and/or 2000). For example, while the hand of the user is within the direct input threshold distance of the user interface element, the first set of one or more criteria include detecting a pointing hand shape (e.g., a shape in which a finger is extending out from an otherwise closed hand).

[0161] In some embodiments, such as in FIG. 7B, the pose of the predefined portion (e.g., 711) of the user satisfying the one or more criteria includes (812a), in accordance with a determination that the predefined portion (e.g., 711) of the user is more than the threshold distance (e.g., 1, 2, 3, 4, 5, 10, 15, 30, etc. centimeters) from the location corresponding to the user interface element (e.g., 705), the pose of the predefined portion (e.g., 711) of the user satisfying a second set of one or more criteria (e.g., associated with indirect inputs such as described with reference to methods 800, 1000, 1200, 1400, 1600, 1800 and/or 2000), different from the first set of one or more criteria (812c). For example, while the hand of the user is more than the direct input threshold from the user interface element, the second set of one or more criteria include detecting a pre-pinch hand shape. In some embodiments, the hand shapes that satisfy the one or more first criteria are different from the hand shapes that satisfy the one or more second criteria. In some embodiments, the one or more criteria are not satisfied when the predefined portion of the user is greater than the threshold distance from the location corresponding to the user interface element and the pose of the predefined portion of the user satisfies the first set of one or more criteria without satisfying the second set of one or more criteria. In some embodiments, the one or more criteria are not satisfied when the predefined portion of the user is less than the threshold distance from the location corresponding to the user interface element and the pose of the predefined portion of the user satisfies the second set of one or more criteria without satisfying the first set of one or more criteria.

[0162] The above-described manner of using different criteria to evaluate the predefined portion of the user depending on whether the predefined portion of the user is within the threshold distance of a location corresponding to the user interface element provides efficient and intuitive ways of interacting with the user interface element that are tailored to whether the input is a direct or indirect input, which simplifies the interaction between the user and the electronic device and enhances the operability of the electronic device and makes the user-device interface more efficient, which additionally reduces power usage and improves battery life of the electronic device by enabling the user to use the electronic device more quickly and efficiently, while reducing errors in usage.

[0163] In some embodiments, the pose of the predefined portion of the user satisfying the one or more criteria such as in FIG. 7B includes (814a), in accordance with a determination that the predefined portion of the user is holding (e.g., or interacting with, or touching) an input device (e.g., stylus, remote control, trackpad) of the one or more input devices, the pose of the predefined portion of the user satisfying a first set of one or more criteria (814b) (e.g., if the hand 709 of the user in FIG. 7B were holding an input device). In some embodiments, the predefined portion of the user is the user’s hand. In some embodiments, the first set of one or more criteria are satisfied when the user is holding a stylus or controller in their hand within a predefined region of the three-dimensional environment, and/or with a predefined orientation relative to the user interface element and/or relative to the torso of the user. In some embodiments, the first set of one or more criteria are satisfied when the user is holding a remote control within a predefined region of the three-dimensional environment, with a predefined orientation relative to the user interface element and/or relative to the torso of the user, and/or while a finger or thumb of the user is resting on a respective component (e.g., a button, trackpad, touchpad, etc.) of the remote control. In some embodiments, the first set of one or more criteria are satisfied when the user is holding or interacting with a trackpad and the predefined portion of the user is in contact with the touch-sensitive surface of the trackpad (e.g., without pressing into the trackpad, as would be done to make a selection).

[0164] In some embodiments, such as in FIG. 7B, the pose of the predefined portion (e.g., 709) of the user satisfying the one or more criteria includes (814a), in accordance with a determination that the predefined portion (e.g., 709) of the user is not holding the input device, the pose of the predefined portion (e.g., 709) of the user satisfying a second set of one or more criteria (814c) (e.g., different from the first set of one or more criteria). In some embodiments, while the user of the electronic device is not holding, touching, or interacting with the input device, the second set of one or more criteria are satisfied when the pose of the user is a predefined pose (e.g., a pose including a pre-pinch or pointing hand shape), such as previously described instead of holding the stylus or controller in their hand. In some embodiments, the pose of the predefined portion of the user does not satisfy the one or more criteria when the predefined portion of the user is holding an input device and the second set of one or more criteria are satisfied and the first set of one or more criteria are not satisfied. In some embodiments, the pose of the predefined portion of the user does not satisfy the one or more criteria when the predefined portion of the user is not holding an input device and the first set of one or more criteria are satisfied and the second set of one or more criteria are not satisfied.

[0165] The above-described manner of evaluating the predefined portion of the user according to different criteria depending on whether or not the user is holding the input device provides efficient ways of switching between accepting input using the input device and input that does not use the input device (e.g., an input device other than eye tracking and/or hand tracking devices) which simplifies the interaction between the user and the electronic device and enhances the operability of the electronic device and makes the user-device interface more efficient, which additionally reduces power usage and improves battery life of the electronic device by enabling the user to use the electronic device more quickly and efficiently.

[0166] In some embodiments, such as in FIG. 7B, the pose of the predefined portion (e.g., 709) of the user satisfying the one or more criteria includes (816a), in accordance with a determination that the predefined portion (e.g., 709) of the user is less than a threshold distance (e.g., 0.5, 1, 2, 3, 4, 5, 10, 15, 30, 50, etc. centimeters, corresponding to direct inputs) from a location corresponding to the user interface element (e.g., 705), the pose of the predefined portion (e.g., 709) of the user satisfying a first set of one or more criteria (816b). For example, while the hand of the user is within the direct input threshold distance of the user interface element, the first set of one or more criteria include detecting a pointing hand shape and/or a pre-pinch hand shape.

[0167] In some embodiments, such as in FIG. 7B, the pose of the predefined portion (e.g., 711) of the user satisfying the one or more criteria includes (816a), in accordance with a determination that the predefined portion (e.g., 711) of the user is more than the threshold distance (e.g., 0.5, 1, 2, 3, 4, 5, 10, 15, 30, 50, etc. centimeters, corresponding to indirect inputs) from the location corresponding to the user interface element (e.g., 705), the pose of the predefined portion (e.g., 711) of the user satisfying the first set of one or more criteria (816c).

[0168] For example, while the hand of the user is more than the direct input threshold from the user interface element, the first set of one or more criteria includes detecting a pre-pinch hand shape and/or a pointing hand shape, the same hand shapes used to satisfy the one or more criteria when the hand of the user is within the direct input threshold of the user interface element. In some embodiments, the hand shapes that satisfy the one or more first criteria are the same regardless of whether the predefined portion of the user is greater than or less than the threshold distance from the location corresponding to the user interface element.

[0169] The above-described manner of evaluating the pose of the predefined portion of the user against the first set of one or more criteria irrespective of the distance between the predefined portion of the user and the location corresponding to the user interface element provides an efficient and consistent way of detecting user inputs provided with the predefined portion of the user, which simplifies the interaction between the user and the electronic device and enhances the operability of the electronic device and makes the user-device interface more efficient, which additionally reduces power usage and improves battery life of the electronic device by enabling the user to use the electronic device more quickly and efficiently, while reducing errors in usage.

[0170] In some embodiments, such as in FIG. 7C, in accordance with a determination that the predefined portion (e.g., 711) of the user, during the respective input, is more than a threshold distance (e.g., 0.5, 1, 2, 3, 4, 5, 10, 15, 30, 50, etc. centimeters, corresponding to indirect input) away from a location corresponding to the user interface element (e.g., 705) (e.g., the input is an indirect input), the one or more criteria include a criterion that is satisfied when an attention of the user is directed towards the user interface element (e.g., 705) (818a) (e.g., and the criterion is not satisfied when the attention of the user is not directed towards the user interface element) (e.g., the gaze of the user is within a threshold distance of the user interface element, the user interface element is within the attention zone of the user, etc., such as described with reference to method 1000). In some embodiments, the electronic device determines which user interface element an indirect input is directed to based on the attention of the user, so it is not possible to provide an indirect input to a respective user interface element without directing the user attention to the respective user interface element.

[0171] In some embodiments, such as in FIG. 7C, in accordance with a determination that the predefined portion (e.g., 709) of the user, during the respective input, is less than the threshold distance (e.g., 0.5, 1, 2, 3, 4, 5, 10, 15, 30, 50, etc. centimeters, corresponding to direct input) away from the location corresponding to the user interface element (e.g., 705) (e.g., the input is a direct input), the one or more criteria do not include a requirement that the attention of the user is directed towards the user interface element (e.g., 705) in order for the one or more criteria to be met (818b) (e.g., it is possible for the one or more criteria to be satisfied without the attention of the user being directed towards the user interface element). In some embodiments, the electronic device determines the target of a direct input based on the location of the predefined portion of the user relative to the user interface elements in the user interface and directs the input to the user interface element closest to the predefined portion of the user irrespective of whether or not the user’s attention is directed to that user interface element.
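
A compact sketch of the asymmetry described in paragraphs [0170]-[0171]: the attention criterion is only added for indirect (far) inputs, while direct (near) inputs can satisfy the criteria without it. The threshold value is illustrative.

```swift
// Sketch: attention toward the element is required only for indirect inputs.

func criteriaMet(distanceToElement: Double,
                 attentionOnElement: Bool,
                 poseOK: Bool,
                 directThreshold: Double = 0.15) -> Bool {
    guard poseOK else { return false }
    if distanceToElement <= directThreshold {
        return true                    // direct: no attention requirement
    }
    return attentionOnElement          // indirect: attention required
}

print(criteriaMet(distanceToElement: 0.05, attentionOnElement: false, poseOK: true))  // true
print(criteriaMet(distanceToElement: 0.60, attentionOnElement: false, poseOK: true))  // false
```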

[0172] The above-described manner of requiring the attention of the user to satisfy the one or more criteria while the predefined portion of the user is more than the threshold distance from the user interface element and not requiring the attention of the user to satisfy the one or more criteria while the predefined portion of the user is less than the threshold distance from the user interface element provides an efficient way of enabling the user to look at other portions of the user interface element while providing direct inputs, thus saving the user time while using the electronic device and reduces user errors while providing indirect inputs, which simplifies the interaction between the user and the electronic device and enhances the operability of the electronic device and makes the user-device interface more efficient, which additionally reduces power usage and improves battery life of the electronic device by enabling the user to use the electronic device more quickly and efficiently.

[0173] In some embodiments, in response to detecting that a gaze (e.g., 701a) of the user is directed to a first region (e.g., 703) of the user interface, such as in FIG. 7A the electronic device 101a visually de-emphasizes (820a) (e.g., blur, dim, darken, and/or desaturate), via the display generation component, a second region of the user interface relative to the first region (e.g., 705) of the user interface. In some embodiments, the electronic device modifies display of the second region of the user interface and/or modifies display of the first region of the user interface to achieve visual de-emphasis of the second region of the user interface relative to the first region of the user interface.

[0174] In some embodiments, such as in FIG. 7B, in response to detecting that the gaze 701c of the user is directed to the second region (e.g., 702) of the user interface, the electronic device 101b visually de-emphasizes (820b) (e.g., blur, dim, darken, and/or desaturate), via the display generation component, the first region of the user interface relative to the second region (e.g., 702) of the user interface. In some embodiments, the electronic device modifies display of the first region of the user interface and/or modifies display of the second region of the user interface to achieve visual de-emphasis of the first region of the user interface relative to the second region of the user interface. In some embodiments, the first and/or second regions of the user interface include one or more virtual objects (e.g., application user interfaces, items of content, representations of other users, files, control elements, etc.) and/or one or more physical objects (e.g., pass-through video including photorealistic representations of real objects, true pass-through wherein a view of the real object is visible through a transparent portion of the display generation component) that are de-emphasized when the regions of the user interface are de-emphasized.

[0175] The above-described manner of visually de-emphasizing the region other than the region to which the gaze of the user is directed provides an efficient way of reducing visual clutter while the user views a respective region of the user interface, which simplifies the interaction between the user and the electronic device and enhances the operability of the electronic device and makes the user-device interface more efficient, which additionally reduces power usage and improves battery life of the electronic device by enabling the user to use the electronic device more quickly and efficiently.

[0176] In some embodiments, such as in FIG. 7A, the user interface is accessible by the electronic device 101a and a second electronic device 101b (822a) (e.g., the electronic device and second electronic device are in communication (e.g., via a wired or wireless network connection)). In some embodiments, the electronic device and the second electronic device are remotely located from each other. In some embodiments, the electronic device and second electronic device are collocated (e.g., in the same room, building, etc.). In some embodiments, the electronic device and the second electronic device present the three-dimensional environment in a co-presence session in which representations of the users of both devices are associated with unique locations in the three-dimensional environment and each electronic device displays the three-dimensional environment from the perspective of the representation of the respective user.

[0177] In some embodiments, such as in FIG. 7B, in accordance with an indication that a gaze 701c of a second user of the second electronic device 101b is directed to the first region 702 of the user interface, the electronic device 101a forgoes (822b) visually de-emphasizing (e.g., blur, dim, darken, and/or desaturate), via the display generation component, the second region of the user interface relative to the first region of the user interface. In some embodiments, the second electronic device visually de-emphasizes the second region of the user interface in accordance with the determination that the gaze of the second user is directed to the first region of the user interface. In some embodiments, in accordance with a determination that the gaze of the user of the electronic device is directed to the first region of the user interface, the second electronic device forgoes visually de-emphasizing the second region of the user interface relative to the first region of the user interface.

[0178] In some embodiments, such as in FIG. 7B, in accordance with an indication that the gaze of the second user of the second electronic device 101b is directed to the second region (e.g., 703) of the user interface, the electronic device 101a forgoes (822c) visually de-emphasizing (e.g., blur, dim, darken, and/or desaturate), via the display generation component, the first region of the user interface relative to the second region of the user interface. In some embodiments, the second electronic device visually de-emphasizes the first region of the user interface in accordance with the determination that the gaze of the second user is directed to the second region of the user interface. In some embodiments, in accordance with a determination that the gaze of the user of the electronic device is directed to the second region of the user interface, the second electronic device forgoes visually de-emphasizing the first region of the user interface relative to the second region of the user interface.
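
To make the multi-device behavior of paragraphs [0174]-[0178] concrete, the following is a minimal Python sketch of a purely local de-emphasis decision: each device dims the region its own user is not looking at and ignores gaze indications received from the other device. The GazeUpdate structure, the region labels, and the dimming value are illustrative assumptions rather than details from the disclosure.

    from dataclasses import dataclass

    @dataclass
    class GazeUpdate:
        device_id: str  # which device reported the gaze
        region: str     # "first" or "second"

    def region_emphasis(local_device_id, updates):
        """Return an emphasis value per region (1.0 = normal, lower = de-emphasized).

        Only the local user's gaze drives de-emphasis; gaze indications received
        from the second electronic device are ignored for this purpose.
        """
        emphasis = {"first": 1.0, "second": 1.0}
        for update in updates:
            if update.device_id != local_device_id:
                continue  # forgo de-emphasizing based on the remote user's gaze
            other = "second" if update.region == "first" else "first"
            emphasis[other] = 0.4  # e.g., dim/blur the region the user is not viewing
        return emphasis

    # Device "A" dims the second region when its user looks at the first region,
    # even though device "B"'s user is looking at the second region.
    print(region_emphasis("A", [GazeUpdate("A", "first"), GazeUpdate("B", "second")]))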

[0179] The above-described manner of forgoing visually de-emphasizing regions of the user interface based on the gaze of the user of the second electronic device provides an efficient way of enabling the users to concurrently look at different regions of the user interface, which simplifies the interaction between the user and the electronic device and enhances the operability of the electronic device and makes the user-device interface more efficient, which additionally reduces power usage and improves battery life of the electronic device by enabling the user to use the electronic device more quickly and efficiently.

[0180] In some embodiments, such as in FIG. 7C, detecting the input from the predefined portion (e.g., 709) of the user of the electronic device 101a includes detecting, via a hand tracking device, a pinch (e.g., pinch, pinch and hold, pinch and drag, double pinch, pluck, release without velocity, toss with velocity) gesture performed by the predefined portion (e.g., 709) of the user (824a). In some embodiments, detecting the pinch gesture includes detecting the user move their thumb toward and/or within a predefined distance of another finger (e.g., index, middle, ring, little finger) on the hand of the thumb. In some embodiments, detecting the pose satisfying the one or more criteria includes detecting the user is in a ready state, such as a pre-pinch hand shape in which the thumb is within a threshold distance (e.g., 1, 2, 3, 4, 5, etc. centimeters) of the other finger.
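
As an illustration of the pinch and pre-pinch poses described above, the following Python sketch distinguishes a ready-state (pre-pinch) hand shape from a completed pinch using tracked fingertip positions. The use of the index finger and the specific distance thresholds are assumptions chosen for the example, not values taken from the disclosure.

    import math

    PRE_PINCH_MAX_CM = 3.0   # illustrative ready-state threshold (e.g., 1-5 cm)
    PINCH_CONTACT_CM = 0.5   # illustrative contact distance for a completed pinch

    def is_pre_pinch(thumb_tip, index_tip):
        """Ready-state pose: thumb near, but not yet touching, another finger."""
        d = math.dist(thumb_tip, index_tip)  # Euclidean distance in centimeters
        return PINCH_CONTACT_CM < d <= PRE_PINCH_MAX_CM

    def is_pinch(thumb_tip, index_tip):
        """Pinch gesture: thumb within contact distance of the other finger."""
        return math.dist(thumb_tip, index_tip) <= PINCH_CONTACT_CM

    # A pinch only drives an operation if the ready-state pose was seen first.
    ready_seen = is_pre_pinch((0.0, 0.0, 0.0), (2.0, 0.0, 0.0))
    pinched = is_pinch((0.0, 0.0, 0.0), (0.3, 0.0, 0.0))
    print("perform operation" if ready_seen and pinched else "ignore input")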

[0181] The above-described manner of detecting an input including a pinch gesture provides an efficient way of accepting user inputs based on hand gestures without requiring the user to physically touch and/or manipulate an input device with their hands, which simplifies the interaction between the user and the electronic device and enhances the operability of the electronic device and makes the user-device interface more efficient, which additionally reduces power usage and improves battery life of the electronic device by enabling the user to use the electronic device more quickly and efficiently.

[0182] In some embodiments, such as in FIG. 7C, detecting the input from the predefined portion (e.g., 709) of the user of the electronic device 101a includes detecting, via a hand tracking device, a press (e.g., tap, press and hold, press and drag, flick) gesture performed by the predefined portion (e.g., 709) of the user (826a). In some embodiments, detecting the press gesture includes detecting the predefined portion of the user pressing a location corresponding to a user interface element displayed in the user interface (e.g., such as described with reference to methods 1400, 1600 and/or 2000), such as the user interface element or a virtual trackpad or other visual indication according to method 1800. In some embodiments, prior to detecting the input including the press gesture, the electronic device detects the pose of the predefined portion of the user that satisfies the one or more criteria including detecting the user in a ready state, such as the hand of the user being in a pointing hand shape with one or more fingers extended and one or more fingers curled towards the palm. In some embodiments, the press gesture includes moving the finger, hand, or arm of the user while the hand is in the pointing hand shape.
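
A similar sketch for the press gesture is shown below; it assumes a hand tracker that reports whether the index finger is extended and the other fingers are curled, plus the fingertip position. The HandPose structure and the 1-centimeter press threshold are illustrative assumptions.

    import math
    from dataclasses import dataclass

    @dataclass
    class HandPose:
        index_extended: bool
        other_fingers_curled: bool
        index_tip: tuple  # (x, y, z) position in meters

    def is_pointing(pose):
        """Ready-state pose for a press: one finger extended, the others curled."""
        return pose.index_extended and pose.other_fingers_curled

    def is_press(pose, element_center, press_threshold_m=0.01):
        """Press gesture: while in the pointing shape, the fingertip reaches the
        location corresponding to the user interface element."""
        return is_pointing(pose) and math.dist(pose.index_tip, element_center) <= press_threshold_m

    pose = HandPose(index_extended=True, other_fingers_curled=True, index_tip=(0.10, 0.00, 0.005))
    print(is_press(pose, (0.10, 0.00, 0.00)))  # True: fingertip within ~1 cm of the element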

[0183] The above-described manner of detecting an input including a press gesture provides an efficient way of accepting user inputs based on hand gestures without requiring the user to physically touch and/or manipulate an input device with their hands, which simplifies the interaction between the user and the electronic device and enhances the operability of the electronic device and makes the user-device interface more efficient, which additionally reduces power usage and improves battery life of the electronic device by enabling the user to use the electronic device more quickly and efficiently.

[0184] In some embodiments, such as in FIG. 7C, detecting the input from the predefined portion (e.g., 709) of the user of the electronic device 101a includes detecting lateral movement of the predefined portion (e.g., 709) of the user relative to a location corresponding to the user interface element (e.g., 705) (828a) (e.g., such as described with reference to method 1800). In some embodiments, lateral movement includes movement that includes a component normal to a straight line path between the predefined portion of the user and the location corresponding to the user interface element. For example, if the user interface element is in front of the predefined portion of the user and the user moves the predefined portion of the user left, right, up, or down, the movement is a lateral movement. For example, the input is one of a press and drag, pinch and drag, or toss (with velocity) input.
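
The lateral component of a hand movement can be computed by projecting the movement onto the line from the hand to the user interface element and keeping the remainder, as in the hedged sketch below (the coordinates, units, and use of NumPy are assumptions for illustration).

    import numpy as np

    def lateral_component(hand_pos, hand_delta, element_pos):
        """Split a hand movement into components along and normal to the line
        from the hand to the element; the normal part is the lateral movement."""
        to_element = np.asarray(element_pos, float) - np.asarray(hand_pos, float)
        direction = to_element / np.linalg.norm(to_element)
        delta = np.asarray(hand_delta, float)
        along = np.dot(delta, direction) * direction   # toward/away from the element
        lateral = delta - along                        # e.g., left/right/up/down
        return lateral

    # Element directly in front of the hand; a 5 cm movement to the left is
    # entirely lateral, so the along-axis component is zero.
    print(lateral_component((0, 0, 0), (-0.05, 0.0, 0.0), (0, 0, 0.3)))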

[0185] The above-described manner of detecting an input including lateral movement of the predefined portion of the user relative to the user interface element provides an efficient way of providing directional input to the electronic device with the predefined portion of the user, which simplifies the interaction between the user and the electronic device and enhances the operability of the electronic device and makes the user-device interface more efficient, which additionally reduces power usage and improves battery life of the electronic device by enabling the user to use the electronic device more quickly and efficiently.

[0186] In some embodiments, such as in FIG. 7A, prior to determining that the pose of the predefined portion (e.g., 709) of the user prior to detecting the input satisfies the one or more criteria (830a), the electronic device 101a detects (830b), via an eye tracking device, that a gaze (e.g., 701a) of the user is directed to the user interface element (e.g., 705) (e.g., according to one or more disambiguation techniques of method 1200).

[0187] In some embodiments, prior to determining that the pose of the predefined portion (e.g., 709) of the user prior to detecting the input satisfies the one or more criteria (830a), such as in FIG. 7A, in response to detecting, that the gaze (e.g., 701a) of the user is directed to the user interface element (e.g., 705), the electronic device 101a displays (830c), via the display generation component, a first indication that the gaze (e.g., 701a) of the user is directed to the user interface element (e.g., 705). In some embodiments, the first indication is highlighting overlaid on or displayed around the user interface element. In some embodiments, the first indication is a change in color or change in location (e.g., towards the user) of the user interface element. In some embodiments, the first indication is a symbol or icon displayed overlaid on or proximate to the user interface element.

[0188] The above-described manner of displaying the first indication that the gaze of the user is directed to the user interface element provides an efficient way of communicating to the user that the input focus is based on the location at which the user is looking, which simplifies the interaction between the user and the electronic device and enhances the operability of the electronic device and makes the user-device interface more efficient, which additionally reduces power usage and improves battery life of the electronic device by enabling the user to use the electronic device more quickly and efficiently.

[0189] In some embodiments, such as in FIG. 7B, prior to detecting the input from the predefined portion (e.g., 709) of the user of the electronic device 101a, while the pose of the predefined portion (e.g., 709) of the user prior to detecting the input satisfies the one or more criteria (832a) (e.g., and while the gaze of the user is directed to the user interface element (e.g., according to one or more disambiguation techniques of method 1200)), the electronic device 101a displays (832b), via the display generation component, a second indication that the pose of the predefined portion (e.g., 709) of the user prior to detecting the input satisfies the one or more criteria, such as in FIG. 7B, wherein the first indication is different from the second indication. In some embodiments, displaying the second indication includes modifying a visual characteristic (e.g., color, size, position, translucency) of the user interface element at which the user is looking. For example, the second indication is the electronic device moving the user interface element towards the user in the three-dimensional environment. In some embodiments, the second indication is displayed overlaid on or proximate to the user interface element at which the user is looking. In some embodiments, the second indication is an icon or image displayed at a location in the user interface independent of the location to which the user’s gaze is directed.

[0190] The above-described manner of displaying an indication that the pose of the user satisfies one or more criteria that is different from the indication of the location of the user’s gaze provides an efficient way of indicating to the user that the electronic device is ready to accept further input from the predefined portion of the user, which simplifies the interaction between the user and the electronic device and enhances the operability of the electronic device and makes the user-device interface more efficient, which additionally reduces power usage and improves battery life of the electronic device by enabling the user to use the electronic device more quickly and efficiently.

[0191] In some embodiments, such as in FIG. 7C, while displaying the user interface element (e.g., 705), the electronic device 101a detects (834a), via the one or more input devices, a second input from a second predefined portion (e.g., 717) (e.g., a second hand) of the user of the electronic device 101a.

[0192] In some embodiments, in response to detecting the second input from the second predefined portion (e.g., 717) of the user of the electronic device (834b), in accordance with a determination that a pose (e.g., position, orientation, hand shape) of the second predefined portion (e.g., 711) of the user prior to detecting the second input satisfies one or more second criteria, such as in FIG. 7B, the electronic device 101a performs (834c) a second respective operation in accordance with the second input from the second predefined portion (e.g., 711) of the user of the electronic device 101a. In some embodiments, the one or more second criteria differ from the one or more criteria in that a different predefined portion of the user performs the pose, but otherwise the one or more criteria and the one or more second criteria are the same. For example, the one or more criteria require that the right hand of the user is in a ready state such as a pre-pinch or pointing hand shape and the one or more second criteria require that the left hand of the user is in a ready state such as the pre-pinch or pointing hand shape. In some embodiments, the one or more criteria are different from the one or more second criteria. For example, a first subset of poses satisfy the one or more criteria for the right hand of the user and a second, different subset of poses satisfy the one or more criteria for the left hand of the user.

[0193] In some embodiments, such as in FIG. 7C, in response to detecting the second input from the second predefined portion (e.g., 715) of the user of the electronic device 101b (834b), in accordance with a determination that the pose of the second predefined portion (e.g., 721) of the user prior to detecting the second input does not satisfy the one or more second criteria, such as in FIG. 7B, the electronic device forgoes (834d) performing the second respective operation in accordance with the second input from the second predefined portion (e.g., 715) of the user of the electronic device 101b, such as in FIG. 7C. In some embodiments, the electronic device is able to detect inputs from the predefined portion of the user and/or the second predefined portion of the user independently of each other. In some embodiments, in order to perform an action in accordance with an input provided by the left hand of the user, the left hand of the user must have a pose that satisfies the one or more criteria prior to providing the input and in order to perform an action in accordance with an input provided by the right hand of the user, the right hand of the user must have a pose that satisfies the second one or more criteria. In some embodiments, in response to detecting the pose of the predefined portion of the user that satisfies one or more criteria followed by an input provided by the second predefined portion of the user without the second predefined portion of the user satisfying the second one or more criteria first, the electronic device forgoes performing an action in accordance with the input of the second predefined portion of the user. In some embodiments, in response to detecting the pose of the second predefined portion of the user that satisfies the second one or more criteria followed by an input provided by the predefined portion of the user without the predefined portion of the user satisfying the one or more criteria first, the electronic device forgoes performing an action in accordance with the input of the predefined portion of the user.
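
One way to realize this per-hand independence is to keep a separate ready-state flag for each hand and to gate each hand's inputs only on its own flag, as in the following illustrative sketch (the hand labels and the callback-style interface are assumptions, not part of the disclosure).

    class TwoHandInputGate:
        """Track ready state per hand; each hand's inputs are gated independently."""

        def __init__(self):
            self._ready = {"left": False, "right": False}

        def update_pose(self, hand, satisfies_criteria):
            self._ready[hand] = satisfies_criteria

        def handle_input(self, hand, perform_operation):
            """Perform the operation only if this same hand was in its ready state."""
            if self._ready[hand]:
                perform_operation()
                return True
            return False  # forgo: the other hand's ready state does not count

    gate = TwoHandInputGate()
    gate.update_pose("right", True)                       # right hand in a ready-state pose
    gate.handle_input("left", lambda: print("scroll"))    # ignored: left hand was not ready
    gate.handle_input("right", lambda: print("scroll"))   # prints "scroll"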

[0194] The above-described manner of accepting inputs from the second predefined portion of the user independent from the predefined portion of the user provides an efficient way of increasing the rate at which the user is able to provide inputs to the electronic device, which simplifies the interaction between the user and the electronic device and enhances the operability of the electronic device and makes the user-device interface more efficient, which additionally reduces power usage and improves battery life of the electronic device by enabling the user to use the electronic device more quickly and efficiently.

[0195] In some embodiments, such as in FIGS. 7A-7C, the user interface is accessible by the electronic device 101a and a second electronic device 101b (836a) (e.g., the electronic device and second electronic device are in communication (e.g., via a wired or wireless network connection). In some embodiments, the electronic device and the second electronic device are remotely located from each other. In some embodiments, the electronic device and second electronic device are collocated (e.g., in the same room, building, etc.). In some embodiments, the electronic device and the second electronic device present the three-dimensional environment in a co-presence session in which representations of the users of both devices are associated with unique locations in the three-dimensional environment and each electronic device displays the three-dimensional environment from the perspective of the representation of the respective user.

[0196] In some embodiments, such as in FIG. 7A, prior to detecting that the pose of the predefined portion (e.g., 709) of the user prior to detecting the input satisfies the one or more criteria, the electronic device 101a displays (836b) the user interface element (e.g., 705) with a visual characteristic (e.g., size, color, translucency, position) having a first value.

[0197] In some embodiments, such as in FIG. 7B, while the pose of the predefined portion (e.g., 709) of the user prior to detecting the input satisfies the one or more criteria, the electronic device 101a displays (836c) the user interface element (e.g., 705) with the visual characteristic (e.g., size, color, translucency, position) having a second value, different from the first value. In some embodiments, the electronic device updates the visual appearance of the user interface element in response to detecting that the pose of the predefined portion of the user satisfies the one or more criteria. In some embodiments, the electronic device only updates the appearance of the user interface element to which the user’s attention is directed (e.g., according to the gaze of the user or an attention zone of the user according to method 1000). In some embodiments, the second electronic device maintains display of the user interface element with the visual characteristic having the first value in response to the predefined portion of the user satisfying the one or more criteria.

[0198] In some embodiments, while (optionally, in response to an indication that) a pose of a predefined portion of a second user of the second electronic device 101b satisfies the one or more criteria while displaying the user interface element with the visual characteristic having the first value, the electronic device 101a maintains (836d) display of the user interface element with the visual characteristic having the first value, similar to how electronic device 101b maintains display of user interface element (e.g., 705) while the portion (e.g., 709) of the user of the first electronic device 101a satisfies the one or more criteria in FIG. 7B. In some embodiments, in response to detecting that the pose of the predefined portion of the user of the second electronic device satisfies the one or more criteria, the second electronic device updates the user interface element to be displayed with the visual characteristic having the second value, similar to how both electronic devices 101a and 101b scroll user interface element (e.g., 705) in response to the input detected by electronic device 101a (e.g., via hand 709 or 711) in FIG. 7C. In some embodiments, in response to an indication that the pose of the user of the electronic device satisfies the one or more criteria while displaying the user interface element with the visual characteristic having the first value, the second electronic device maintains display of the user interface element with the visual characteristic having the first value. In some embodiments, in accordance with a determination that the pose of the user of the electronic device satisfies the one or more criteria and an indication that the pose of the user of the second electronic device satisfies the one or more criteria, the electronic device displays the user interface element with the visual characteristic having a third value.

[0199] The above-described manner of not synchronizing the updating of the visual characteristic of the user interface element across the electronic devices provides an efficient way of indicating the portions of the user interface with which the user is interacting without causing confusion by also indicating portions of the user interface with which other users are interacting, which simplifies the interaction between the user and the electronic device and enhances the operability of the electronic device and makes the user-device interface more efficient, which additionally reduces power usage and improves battery life of the electronic device by enabling the user to use the electronic device more quickly and efficiently.

[0200] In some embodiments, in response to detecting the input from the predefined portion (e.g., 709 or 711) of the user of the electronic device, the electronic device 101a displays (836a) the user interface element (e.g., 705) with the visual characteristic having a third value, such as in FIG. 7C (e.g., the third value is different from the first value and the second value). In some embodiments, in response to the input, the electronic device and second electronic device perform the respective operation in accordance with the input.

[0201] In some embodiments, in response to an indication of an input from the predefined portion of the second user of the second electronic device (e.g., after the second electronic device detects that the predefined portion of the user of the second electronic device satisfies the one or more criteria), the electronic device 101a displays (836b) the user interface element with the visual characteristic having the third value, such as if electronic device 101b were to display user interface element (e.g., 705) in the same manner in which electronic device 101a displays the user interface element (e.g., 705) in response to electronic device 101a detecting the user input from the hand (e.g., 709 or 711) of the user of the electronic device 101a. In some embodiments, in response to the input from the second electronic device, the electronic device and the second electronic device perform the respective operation in accordance with the input. In some embodiments, the electronic device displays an indication that the user of the second electronic device has provided an input directed to the user interface element, but does not present an indication of a hover state of the user interface element.
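
The device-local hover feedback and the shared activation state described in paragraphs [0196]-[0201] can be summarized as a small display-state decision made on each device, sketched below under the assumption of three appearance states corresponding to the first, second, and third values of the visual characteristic.

    def element_appearance(local_hover, remote_hover, local_activated, remote_activated):
        """Pick a display state for a shared user interface element on one device.

        Hover (ready-state) feedback is local-only; activation is shown on every
        device regardless of which device detected the input.
        """
        if local_activated or remote_activated:
            return "activated"   # third value of the visual characteristic
        if local_hover:
            return "hovered"     # second value, shown only to the local user
        return "idle"            # first value

    print(element_appearance(local_hover=False, remote_hover=True,
                             local_activated=False, remote_activated=False))  # idle
    print(element_appearance(local_hover=False, remote_hover=False,
                             local_activated=False, remote_activated=True))   # activated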

[0202] The above-described manner of updating the user interface element in response to an input irrespective of the device at which the input was detected provides an efficient way of indicating the current interaction state of a user interface element displayed by both devices, which simplifies the interaction between the user and the electronic device and enhances the operability of the electronic device and makes the user-device interface more efficient (e.g., by clearly indicating which portions of the user interface other users are interacting with), which additionally reduces power usage and improves battery life of the electronic device by enabling the user to use the electronic device more quickly and efficiently, and avoids errors caused by changes to the interaction status of the user interface element that would subsequently require correction.

[0203] FIGS. 9A-9C illustrate exemplary ways in which an electronic device 101a processes user inputs based on an attention zone associated with the user in accordance with some embodiments.

[0204] FIG. 9A illustrates an electronic device 101a displaying, via display generation component 120a, a three-dimensional environment. It should be understood that, in some embodiments, electronic device 101a utilizes one or more techniques described with reference to FIGS. 9A-9C in a two-dimensional environment or user interface without departing from the scope of the disclosure. As described above with reference to FIGS. 1-6, the electronic device optionally includes display generation component 120a (e.g., a touch screen) and a plurality of image sensors 314a. The image sensors optionally include one or more of a visible light camera, an infrared camera, a depth sensor, or any other sensor the electronic device 101a would be able to use to capture one or more images of a user or a part of the user while the user interacts with the electronic device 101a. In some embodiments, display generation component 120a is a touch screen that is able to detect gestures and movements of a user’s hand. In some embodiments, the user interfaces described below could also be implemented on a head-mounted display that includes a display generation component that displays the user interface to the user, and sensors to detect the physical environment and/or movements of the user’s hands (e.g., external sensors facing outwards from the user), and/or gaze of the user (e.g., internal sensors facing inwards towards the face of the user).

[0205] FIG. 9A illustrates the electronic device 101a presenting a first selectable option 903, a second selectable option 905, and a representation 904 of a table in the physical environment of the electronic device 101a via display generation component 120a (e.g., such as table 604 in FIG. 6B). In some embodiments, the representation 904 of the table is a photorealistic image of the table generated by the display generation component 120a (e.g., passthrough video or digital passthrough). In some embodiments, the representation 904 of the table is a view of the table through a transparent portion of the display generation component 120a (e.g., true or actual passthrough). In some embodiments, the electronic device 101a displays the three-dimensional environment from a viewpoint associated with the user of the electronic device in the three-dimensional environment.

[0206] In some embodiments, the electronic device 101a defines an attention zone 907 of the user as a cone-shaped volume in the three-dimensional environment that is based on the gaze 901a of the user. For example, the attention zone 907 is optionally a cone centered around a line defined by the gaze 901a of the user (e.g., a line passing through the location of the user’s gaze in the three-dimensional environment and the viewpoint associated with electronic device 101a) that includes a volume of the three-dimensional environment within a predetermined angle (e.g., 1, 2, 3, 5, 10, 15, etc. degrees) from the line defined by the gaze 901a of the user. Thus, in some embodiments, the two-dimensional area of the attention zone 907 increases as a function of distance from the viewpoint associated with electronic device 101a. In some embodiments, the electronic device 101a determines the user interface element to which an input is directed and/or whether to respond to an input based on the attention zone of the user.
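
A point-in-attention-zone test for a cone of this kind can be written as an angle comparison against the gaze ray, as in the following sketch; the 10-degree half-angle and the coordinate conventions are illustrative assumptions.

    import numpy as np

    def in_attention_zone(viewpoint, gaze_point, test_point, half_angle_deg=10.0):
        """Return True if test_point lies inside a cone whose axis runs from the
        viewpoint through the gaze point and whose half-angle is half_angle_deg."""
        axis = np.asarray(gaze_point, float) - np.asarray(viewpoint, float)
        to_point = np.asarray(test_point, float) - np.asarray(viewpoint, float)
        axis /= np.linalg.norm(axis)
        to_point /= np.linalg.norm(to_point)
        angle = np.degrees(np.arccos(np.clip(np.dot(axis, to_point), -1.0, 1.0)))
        return angle <= half_angle_deg

    viewpoint = (0.0, 0.0, 0.0)
    gaze_point = (0.0, 0.0, 2.0)  # user looking straight ahead
    print(in_attention_zone(viewpoint, gaze_point, (0.2, 0.0, 2.0)))  # ~5.7 degrees off axis: True
    print(in_attention_zone(viewpoint, gaze_point, (1.0, 0.0, 2.0)))  # ~26.6 degrees off axis: False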

[0207] As shown in FIG. 9A, the first selectable option 903 is within the attention zone 907 of the user and the second selectable option 905 is outside of the attention zone of the user. As shown in FIG. 9A, it is possible for the selectable option 903 to be in the attention zone 907 even if the gaze 901a of the user is not directed to selectable option 903. In some embodiments, it is possible for the selectable option 903 to be in the attention zone 907 while the gaze of the user is directed to the selectable option 903. FIG. 9A also shows the hand 909 of the user in a direct input ready state (e.g., hand state D). In some embodiments, the direct input ready state is the same as or similar to the direct input ready state(s) described above with reference to FIGS. 7A-8K. Further, in some embodiments, the direct inputs described herein share one or more characteristics of the direct inputs described with reference to methods 800, 1200, 1400, 1600, 1800, and/or 2000. For example, the hand 909 of the user is in a pointing hand shape and within a direct ready state threshold distance (e.g., 0.5, 1, 2, 3, 5, 10, 15, 30, etc. centimeters) of the first selectable option 903. FIG. 9A also shows the hand 911 of the user in a direct input ready state. In some embodiments, hand 911 is an alternative to hand 909. In some embodiments, the electronic device 101a is able to detect two hands of the user at once (e.g., according to one or more steps of method 1600). For example, hand 911 of the user is in the pointing hand shape and within the ready state threshold distance of the second selectable option 905.

[0208] In some embodiments, the electronic device 101a requires user interface elements to be within the attention zone 907 in order to accept inputs. For example, because the first selectable option 903 is within the attention zone 907 of the user, the electronic device 101a updates the first selectable option 903 to indicate that further input (e.g., from hand 909) will be directed to the first selectable option 903. As another example, because the second selectable option 905 is outside of the attention zone 907 of the user, the electronic device 101a forgoes updating the second selectable option 905 to indicate that further input (e.g., from hand 911) will be directed to the second selectable option 905. It should be appreciated that, although the gaze 901a of the user is not directed to the first selectable option 903, the electronic device 101a is still configured to direct inputs to the first selectable option 903 because the first selectable option 903 is within the attention zone 907, which is optionally broader than the gaze of the user.

[0209] In FIG. 9B, the electronic device 101a detects the hand 909 of the user making a direct selection of the first selectable option 903. In some embodiments, the direct selection includes moving the hand 909 to a location touching or within a direct selection threshold (e.g., 0.1, 0.2, 0.3, 0.5, 1, 2, etc. centimeters) of the first selectable option 903 while the hand is in the pointing hand shape. As shown in FIG. 9B, the first selectable option 903 is no longer in the attention zone 907 of the user when the input is detected. In some embodiments, the attention zone 907 moves because the gaze 901b of the user moves. In some embodiments, the attention zone 907 moves to the location illustrated in FIG. 9B after the electronic device 101a detects the ready state of hand 909 illustrated in FIG. 9A. In some embodiments, the input illustrated in FIG. 9B is detected before the attention zone 907 moves to the location illustrated in FIG. 9B. In some embodiments, the input illustrated in FIG. 9B is detected after the attention zone 907 moves to the location illustrated in FIG. 9B. Although the first selectable option 903 is no longer in the attention zone 907 of the user, in some embodiments, the electronic device 101a still updates the color of the first selectable option 903 in response to the input because the first selectable option 903 was in the attention zone 907 during the ready state, as shown in FIG. 9A. In some embodiments, in addition to updating the appearance of the first selectable option 903, the electronic device 101a performs an action in accordance with the selection of the first selectable option 903. For example, the electronic device 101a performs an operation such as activating/deactivating a setting associated with option 903, initiating playback of content associated with option 903, displaying a user interface associated with option 903, or a different operation associated with option 903.

[0210] In some embodiments, the selection input is only detected in response to detecting the hand 909 of the user moving to the location touching or within the direct selection threshold of the first selectable option 903 from the side of the first selectable option 903 visible in FIG. 9B. For example, if the user were to instead reach around the first selectable option 903 to touch the first selectable option 903 from the back side of the first selectable option 903 not visible in FIG. 9B, the electronic device 101a would optionally forgo updating the appearance of the first selectable option 903 and/or forgo performing the action in accordance with the selection.

[0211] In some embodiments, in addition to continuing to accept a press input (e.g., a selection input) that was started while the first selectable option 903 was in the attention zone 907 and continued while the first selectable option 903 was not in the attention zone 907, the electronic device 101a accepts other types of inputs that were started while the user interface element to which the input was directed was in the attention zone even if the user interface element is no longer in the attention zone when the input continues. For example, the electronic device 101a is able to continue drag inputs in which the electronic device 101a updates the position of a user interface element in response to a user input even if the drag input continues after the user interface element is outside of the attention zone (e.g., and was initiated when the user interface element was inside of the attention zone). As another example, the electronic device 101a is able to continue scrolling inputs in response to a user input even if the scrolling input continues after the user interface element is outside of the attention zone 907 (e.g., and was initiated when the user interface element was inside of the attention zone). As shown in FIGS. 9A-9B, in some embodiments, inputs are accepted even if the user interface element to which the input is directed is outside of the attention zone for a portion of the input if the user interface element was in the attention zone when the ready state was detected.

[0212] Moreover, in some embodiments, the location of the attention zone 907 remains in a respective position in the three-dimensional environment for a threshold time (e.g., 0.5, 1, 2, 3, 5, etc. seconds) after detecting movement of the gaze of the user. For example, while the gaze 901a of the user and the attention zone 907 are at the locations illustrated in FIG. 9A, the electronic device 101a detects the gaze 901b of the user move to the location illustrated in FIG. 9B. In this example, the attention zone 907 remains at the location illustrated in FIG. 9A for the threshold time before moving the attention zone 907 to the location in FIG. 9B in response to the gaze 901b of the user moving to the location illustrated in FIG. 9B. Thus, in some embodiments, inputs initiated after the gaze of the user moves that are directed to user interface elements that are within the original attention zone (e.g., the attention zone 907 in FIG. 9A) are optionally responded to by the electronic device 101a as long as those inputs were initiated within the threshold time (e.g., 0.5, 1, 2, 3, 5, etc. seconds) of the gaze of the user moving to the location in FIG. 9B. In some embodiments, the electronic device 101a does not respond to such inputs that are initiated after the threshold time of the gaze of the user moving to the location in FIG. 9B.
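
The delayed movement of the attention zone can be modeled as a grace period during which the zone stays anchored to the previous gaze target, as in the hedged sketch below; the one-second grace period and the string-valued gaze targets are assumptions for illustration.

    import time

    class AttentionZone:
        """Follow the user's gaze, but only after the gaze has dwelled elsewhere
        for a grace period, so inputs begun shortly after a gaze shift still count."""

        def __init__(self, grace_period_s=1.0):
            self.grace_period_s = grace_period_s
            self._committed_gaze = None    # gaze target the zone is currently built on
            self._pending_gaze = None
            self._pending_since = None

        def update_gaze(self, gaze_target, now=None):
            now = time.monotonic() if now is None else now
            if gaze_target != self._pending_gaze:
                self._pending_gaze, self._pending_since = gaze_target, now
            if self._committed_gaze is None or \
                    now - self._pending_since >= self.grace_period_s:
                self._committed_gaze = self._pending_gaze
            return self._committed_gaze    # center of the attention zone

    zone = AttentionZone(grace_period_s=1.0)
    print(zone.update_gaze("option A", now=0.0))   # option A
    print(zone.update_gaze("option B", now=0.5))   # still option A (within the grace period)
    print(zone.update_gaze("option B", now=1.6))   # option B (grace period has elapsed)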

[0213] In some embodiments, the electronic device 101a cancels a user input if the user moves their hand away from the user interface element to which the input is directed or does not provide further input for a threshold time (e.g., 1, 2, 3, 5, 10, etc. seconds) after the ready state was detected. For example, if the user were to move their hand 909 to the location illustrated in FIG. 9C after the electronic device 101a detected the ready state as shown in FIG. 9A, the electronic device 101a would revert the appearance of the first selectable option 903 to no longer indicate that input is being directed to the first selectable option 903 and no longer accept direct inputs from hand 909 directed to option 903 (e.g., unless and until the ready state is detected again).

[0214] As shown in FIG. 9C, the first selectable option 903 is still within the attention zone 907 of the user. The hand 909 of the user is optionally in a hand shape corresponding to the direct ready state (e.g., a pointing hand shape, hand state D). Because the hand 909 of the user has moved away from the first selectable option 903 by a threshold distance (e.g., 1, 2, 3, 5, 10, 15, 20, 30, 50, etc. centimeters) and/or to a threshold distance (e.g., 1, 2, 3, 5, 10, 15, 20, 30, 50, etc. centimeters) away from the first selectable option 903, the electronic device 101a is no longer configured to direct inputs to the first selectable option 903 from hand 909. In some embodiments, even if the user were to maintain the position of the hand 909 illustrated in FIG. 9A, the electronic device 101a would cease directing further input from the hand to the first selectable option 903 if the input were not detected within a threshold period of time (e.g., 1, 2, 3, 5, 10, etc. seconds) of the hand being positioned and having a shape as in FIG. 9A. Likewise, in some embodiments, if the user were to begin to provide additional input (e.g., in addition to satisfying the ready state criteria, beginning to provide a press input to element 903 but not yet reaching the press distance threshold required to complete the press/selection input) and then move the hand away from the first selectable option 903 by the threshold distance and/or move the hand the threshold distance from the first selectable option 903, the electronic device 101a would cancel the input. It should be appreciated, as described above with reference to FIG. 9B, that the electronic device 101a optionally does not cancel an input in response to detecting the gaze 901b of the user or the attention zone 907 of the user moving away from the first selectable option 903 if the input was started while the first selectable option 903 was in the attention zone 907 of the user.
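
The cancellation conditions described in paragraphs [0213]-[0214] amount to dropping a latched ready state when the hand moves too far from the element or when no further input arrives in time, as in the following sketch; the 15-centimeter distance and 3-second timeout are illustrative picks from the example ranges above.

    import math

    MOVE_AWAY_THRESHOLD_M = 0.15   # illustrative (e.g., 1-50 cm)
    READY_TIMEOUT_S = 3.0          # illustrative (e.g., 1-10 s)

    def still_engaged(ready_time_s, now_s, hand_pos, element_pos):
        """Return False if the latched ready state should be cancelled, either
        because the hand moved away from the element or because it timed out."""
        timed_out = (now_s - ready_time_s) > READY_TIMEOUT_S
        moved_away = math.dist(hand_pos, element_pos) > MOVE_AWAY_THRESHOLD_M
        return not (timed_out or moved_away)

    print(still_engaged(0.0, 1.0, (0.02, 0.0, 0.0), (0.0, 0.0, 0.0)))  # True
    print(still_engaged(0.0, 1.0, (0.40, 0.0, 0.0), (0.0, 0.0, 0.0)))  # False: moved away
    print(still_engaged(0.0, 5.0, (0.02, 0.0, 0.0), (0.0, 0.0, 0.0)))  # False: timed out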

[0215] Although FIGS. 9A-9C illustrate examples of determining whether to accept direct inputs directed to user interface elements based on the attention zone 907 of the user, it should be appreciated that the electronic device 101a is able to similarly determine whether to accept indirect inputs directed to user interface elements based on the attention zone 907 of the user. For example, the various results illustrated in and described with reference to FIGS. 9A-9C would optionally apply to indirect inputs (e.g., as described with reference to methods 800, 1200, 1400, 1800, etc.) as well. In some embodiments, the attention zone is not required in order to accept direct inputs but is required for indirect inputs.

[0216] FIGS. 10A-10H illustrate a flowchart of a method 1000 of processing user inputs based on an attention zone associated with the user in accordance with some embodiments. In some embodiments, the method 1000 is performed at a computer system (e.g., computer system 101 in FIG. 1, such as a tablet, smartphone, wearable computer, or head mounted device) including a display generation component (e.g., display generation component 120 in FIGS. 1, 3, and 4) (e.g., a heads-up display, a display, a touchscreen, a projector, etc.) and one or more cameras (e.g., a camera (e.g., color sensors, infrared sensors, and other depth-sensing cameras) that points downward at a user’s hand or a camera that points forward from the user’s head). In some embodiments, the method 1000 is governed by instructions that are stored in a non-transitory computer-readable storage medium and that are executed by one or more processors of a computer system, such as the one or more processors 202 of computer system 101 (e.g., control unit 110 in FIG. 1A). Some operations in method 1000 are, optionally, combined and/or the order of some operations is, optionally, changed.

[0217] In some embodiments, method 1000 is performed at an electronic device 101a in communication with a display generation component and one or more input devices (e.g., a mobile device (e.g., a tablet, a smartphone, a media player, or a wearable device), or a computer). In some embodiments, the display generation component is a display integrated with the electronic device (optionally a touch screen display), external display such as a monitor, projector, television, or a hardware component (optionally integrated or external) for projecting a user interface or causing a user interface to be visible to one or more users, etc. In some embodiments, the one or more input devices include an electronic device or component capable of receiving a user input (e.g., capturing a user input, detecting a user input, etc.) and transmitting information associated with the user input to the electronic device. Examples of input devices include a touch screen, mouse (e.g., external), trackpad (optionally integrated or external), touchpad (optionally integrated or external), remote control device (e.g., external), another mobile device (e.g., separate from the electronic device), a handheld device (e.g., external), a controller (e.g., external), a camera, a depth sensor, an eye tracking device, and/or a motion sensor (e.g., a hand tracking device, a hand motion sensor), etc. In some embodiments, the electronic device is in communication with a hand tracking device (e.g., one or more cameras, depth sensors, proximity sensors, touch sensors (e.g., a touch screen, trackpad)). In some embodiments, the hand tracking device is a wearable device, such as a smart glove. In some embodiments, the hand tracking device is a handheld input device, such as a remote control or stylus.

[0218] In some embodiments, such as in FIG. 9A, the electronic device 101a displays (1002a), via the display generation component 120a, a first user interface element (e.g., 903, 905). In some embodiments, the first user interface element is an interactive user interface element and, in response to detecting an input directed towards the first user interface element, the electronic device performs an action associated with the first user interface element. For example, the first user interface element is a selectable option that, when selected, causes the electronic device to perform an action, such as displaying a respective user interface, changing a setting of the electronic device, or initiating playback of content. As another example, the first user interface element is a container (e.g., a window) in which a user interface/content is displayed and, in response to detecting selection of the first user interface element followed by a movement input, the electronic device updates the position of the first user interface element in accordance with the movement input. In some embodiments, the user interface and/or user interface element are displayed in a three-dimensional environment (e.g., the user interface is the three-dimensional environment and/or is displayed within a three-dimensional environment) that is generated, displayed, or otherwise caused to be viewable by the device (e.g., a computer-generated reality (CGR) environment such as a virtual reality (VR) environment, a mixed reality (MR) environment, or an augmented reality (AR) environment, etc.).

[0219] In some embodiments, such as in FIG. 9B, while displaying the first user interface element (e.g., 903), the electronic device 101a detects (1002b), via the one or more input devices, a first input directed to the first user interface element (e.g., 903). In some embodiments, detecting the first user input includes detecting, via the hand tracking device, that the user performs a predetermined gesture (e.g., a pinch gesture in which the user touches a thumb to another finger (e.g., index, middle, ring, little finger) on the same hand as the thumb). In some embodiments, detecting the input includes detecting that the user performs a pointing gesture in which one or more fingers are extended and one or more fingers are curled towards the user’s palm and moves their hand a predetermined distance (e.g., 2, 5, 10, etc. centimeters) away from the torso of the user in a pressing or pushing motion. In some embodiments, the pointing gesture and pushing motion are detected while the hand of the user is within a threshold distance (e.g., 1, 2, 3, 5, 10, etc. centimeters) of the first user interface element in a three-dimensional environment. In some embodiments, the three-dimensional environment includes virtual objects and a representation of the user. In some embodiments, the three-dimensional environment includes a representation of the hands of the user, which can be a photorealistic representation of the hands, pass-through video of the hands of the user, or a view of the hands of the user through a transparent portion of the display generation component. In some embodiments, the input is a direct or indirect interaction with the user interface element, such as described with reference to methods 800, 1200, 1400, 1600, 1800 and/or 2000.

[0220] In some embodiments, in response to detecting the first input directed to the first user interface element (e.g., 903) (1002c), in accordance with a determination that the first user interface element (e.g., 903) is within an attention zone (e.g., 907) associated with a user of the electronic device 101a, such as in FIG. 9A, (e.g., when the first input was detected), the electronic device 101a performs (1002d) a first operation corresponding to the first user interface element (e.g., 903). In some embodiments, the attention zone includes a region of the three-dimensional environment within a predetermined threshold distance (e.g., 5, 10, 30, 50, 100, etc. centimeters) and/or threshold angle (e.g., 5, 10, 15, 20, 30, 45, etc. degrees) of a location in the three-dimensional environment to which the user’s gaze is directed. In some embodiments, the attention zone includes a region of the three-dimensional environment between the location in the three-dimensional environment towards which the user’s gaze is directed and one or more physical features of the user (e.g., the user’s hands, arms, shoulders, torso, etc.). In some embodiments, the attention zone is a three-dimensional region of the three-dimensional environment. For example, the attention zone is cone-shaped, with the tip of the cone corresponding to the eyes/viewpoint of the user and the base of the cone corresponding to the area of the three-dimensional environment towards which the user’s gaze is directed. In some embodiments, the first user interface element is within the attention zone associated with the user while the user’s gaze is directed towards the first user interface element and/or when the first user interface element falls within the conical volume of the attention zone. In some embodiments, the first operation is one of making a selection, activating a setting of the electronic device, initiating a process to move a virtual object within the three-dimensional environment, displaying a new user interface not currently displayed, playing an item of content, saving a file, initiating communication (e.g., phone call, e-mail, message) with another user, and/or scrolling a user interface. In some embodiments, the first input is detected by detecting a pose and/or movement of a predefined portion of the user. For example, the electronic device detects the user moving their finger to a location within a threshold distance (e.g., 0.1, 0.3, 0.5, 1, 3, 5 etc. centimeters) of the first user interface element in the three-dimensional environment with their hand/finger in a pose corresponding to the index finger of the hand pointed out with other fingers curled into the hand.

[0221] In some embodiments, such as in FIG. 9A, in response to detecting the first input directed to the first user interface element (e.g., 905) (1002c), in accordance with a determination that the first user interface element (e.g., 905) is not within the attention zone associated with the user (e.g., when the first input was detected), the electronic device 101a forgoes (1002e) performing the first operation. In some embodiments, the first user interface element is not within the attention zone associated with the user if the user’s gaze is directed towards a user interface element other than the first user interface element and/or if the first user interface element does not fall within the conical volume of the attention zone.

[0222] The above-described manner of performing or not performing the first operation depending on whether or not the first user interface element is within the attention zone associated with the user provides an efficient way of reducing accidental user inputs, which simplifies the interaction between the user and the electronic device and enhances the operability of the electronic device and makes the user-device interface more efficient, which additionally reduces power usage and improves battery life of the electronic device by enabling the user to use the electronic device more quickly and efficiently, while reducing errors in usage.

[0223] In some embodiments, the first input directed to the first user interface element (e.g., 903) is an indirect input directed to the first user interface element (e.g., 903 in FIG. 9C) (1004a). In some embodiments, an indirect input is an input provided by a predefined portion of the user (e.g., a hand, finger, arm, etc. of the user) while the predefined portion of the user is more than a threshold distance (e.g., 0.2, 1, 2, 3, 5, 10, 30, 50 etc. centimeters) from the first user interface element. In some embodiments, the indirect input is similar to the indirect inputs discussed with reference to methods 800, 1200, 1400, 1600, 1800 and/or 2000.

[0224] In some embodiments, such as in FIG. 9B, while displaying the first user interface element (e.g., 905), the electronic device 101a detects (1004b), via the one or more input devices, a second input, wherein the second input corresponds to a direct input directed toward a respective user interface element (e.g., 903). In some embodiments, the direct input is similar to direct inputs discussed with reference to methods 800, 1200, 1400, 1600, 1800 and/or 2000. In some embodiments, the direct input is provided by a predefined portion of the user (e.g., hand, finger, arm) while the predefined portion of the user is less than a threshold distance (e.g., 0.2, 1, 2, 3, 5, 10, 30, 50 etc. centimeters) away from the first user interface element. In some embodiments, detecting the direct input includes detecting the user perform a predefined gesture with their hand (e.g., a press gesture in which the user moves an extended finger to the location of a respective user interface element while the other fingers are curled towards the palm of the hand) after detecting the ready state of the hand (e.g., a pointing hand shape in which one or more fingers are extended and one or more fingers are curled towards the palm). In some embodiments, the ready state is detected according to one or more steps of method 800.

[0225] In some embodiments, such as in FIG. 9B, in response to detecting the second input, the electronic device 101a performs (1004c) an operation associated with the respective user interface element (e.g., 903) without regard to whether the respective user interface element is within the attention zone (e.g., 907) associated with the user (e.g., because it is a direct input). In some embodiments, the electronic device only performs the operation associated with the first user interface element in response to an indirect input if the indirect input is detected while the gaze of the user is directed towards the first user interface element. In some embodiments, the electronic device performs an operation associated with a user interface element in the user’s attention zone in response to a direct input regardless of whether or not the gaze of the user is directed to the user interface element when the direct input is detected.
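
The different gating of direct and indirect inputs in paragraphs [0223]-[0225] can be expressed as a single check, sketched below; the 10-centimeter direct-interaction threshold is an assumption drawn from the example ranges rather than a value specified by the disclosure.

    DIRECT_DISTANCE_M = 0.10   # illustrative direct-interaction threshold

    def should_perform(hand_to_element_m, element_in_attention_zone):
        """Gate an input depending on whether it is direct or indirect."""
        is_direct = hand_to_element_m <= DIRECT_DISTANCE_M
        if is_direct:
            return True                      # direct: attention zone not required
        return element_in_attention_zone     # indirect: target must be in the zone

    print(should_perform(0.05, element_in_attention_zone=False))  # True (direct input)
    print(should_perform(0.60, element_in_attention_zone=False))  # False (indirect, outside zone)
    print(should_perform(0.60, element_in_attention_zone=True))   # True (indirect, inside zone)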

[0226] The above-described manner of forgoing performing the second operation in response to detecting the indirect input while the gaze of the user is not directed to the first user interface element provides a way of reducing or preventing performance of operations not desired by the user, which simplifies the interaction between the user and the electronic device and enhances the operability of the electronic device and makes the user-device interface more efficient, which additionally reduces power usage and improves battery life of the electronic device by enabling the user to use the electronic device more quickly and efficiently, while reducing errors in usage.

[0227] In some embodiments, such as in FIG. 9B, the attention zone (e.g., 907) associated with the user is based on a direction (and/or location) of a gaze (e.g., 901b) of the user of the electronic device (1006a). In some embodiments, the attention zone is defined as a cone-shaped volume (e.g., extending from a point at the viewpoint of the user out into the three-dimensional environment) including a point in the three-dimensional environment at which the user is looking and the locations in the three-dimensional environment between the point at which the user is looking and the user within a predetermined threshold angle (e.g., 5, 10, 15, 20, 30, 45, etc. degrees) of the gaze of the user. In some embodiments, in addition or alternatively to being based on the user’s gaze, the attention zone is based on the orientation of a head of the user. For example, the attention zone is defined as a cone-shaped volume including locations in the three-dimensional environment within a predetermined threshold angle (e.g., 5, 10, 15, 20, 30, 45, etc. degrees) of a line normal to the face of the user. As another example, the attention zone is a cone centered around an average of a line extending from the gaze of the user and a line normal to the face of the user or a union of a cone centered around the gaze of the user and the cone centered around the line normal to the face of the user.

[0228] The above-described manner of basing the attention zone on the orientation of the gaze of the user provides an efficient way of directing user inputs based on gaze without additional inputs (e.g., to move the input focus, such as moving a cursor) which simplifies the interaction between the user and the electronic device and enhances the operability of the electronic device and makes the user-device interface more efficient, which additionally reduces power usage and improves battery life of the electronic device by enabling the user to use the electronic device more quickly and efficiently, while reducing errors in usage.

[0229] In some embodiments, while the first user interface element (e.g., 903) is within the attention zone (e.g., 907) associated with the user, such as in FIG. 9A, the electronic device 101a detects (1008a) that one or more criteria for moving the attention zone (e.g., 907) to a location at which the first user interface element (e.g., 903) is not within the attention zone are satisfied. In some embodiments, the attention zone is based on the gaze of the user and the one or more criteria are satisfied when the gaze of the user moves to a new location such that the first user interface element is no longer in the attention zone. For example, the attention zone includes regions of the user interface within 10 degrees of a line along the user’s gaze and the user’s gaze moves to a location such that the first user interface element is more than 10 degrees from the line of the user’s gaze.

[0230] In some embodiments, such as in FIG. 9B, after detecting that the one or more criteria are satisfied (1008b), the electronic device 101a detects (1008c) a second input directed to the first user interface element (e.g., 903). In some embodiments, the second input is a direct input in which the hand of the user is within a threshold distance (e.g., 0.2, 1, 2, 3, 5, 10, 30, 50, etc. centimeters) of the first user interface element.

[0231] In some embodiments, after detecting that the one or more criteria are satisfied (1008b), such as in FIG. 9B, in response to detecting the second input directed to the first user interface element (e.g., 903) (1008d), in accordance with a determination that the second input was detected within a respective time threshold (e.g., 0.01, 0.02, 0.05, 0.1, 0.2, 0.3, 0.5, 1 etc. seconds) of the one or more criteria being satisfied, the electronic device 101a performs (1008e) a second operation corresponding to the first user interface element (e.g., 903). In some embodiments, the attention zone of the user does not move until the time threshold (e.g., 0.01, 0.02, 0.05, 0.1, 0.2, 0.3, 0.5, 1 etc. seconds) has passed since the one or more criteria were satisfied.

[0232] In some embodiments, after detecting that the one or more criteria are satisfied (1008b), such as in FIG. 9B, in response to detecting the second input directed to the first user interface element (e.g., 903) (1008d), in accordance with a determination that the second input was detected after the respective time threshold (e.g., 0.01, 0.02, 0.05, 0.1, 0.2, 0.3, 0.5, 1 etc. seconds) of the one or more criteria being satisfied, the electronic device 101a forgoes (1008f) performing the second operation. In some embodiments, once the time threshold (e.g., 0.01, 0.02, 0.05, 0.1, 0.2, 0.3, 0.5, 1 etc. seconds) has passed since the one or more criteria for moving the attention zone were satisfied, the electronic device updates the position of the attention zone associated with the user (e.g., based on the new gaze location of the user). In some embodiments, the electronic device moves the attention zone gradually over the time threshold and initiates the movement with or without a time delay after detecting the user’s gaze move. In some embodiments, the electronic device forgoes performing the second operation in response to an input detected while the first user interface element is not in the attention zone of the user.

[0233] The above-described manner of performing the second operation in response to the second input received within the time threshold of the one or more criteria for moving the attention zone being satisfied provides an efficient way of accepting user inputs without requiring the user to maintain their gaze for the duration of the input and avoiding accidental inputs by preventing activations of the user interface element after the attention zone has moved once the predetermined time threshold has passed, which simplifies the interaction between the user and the electronic device and enhances the operability of the electronic device and makes the user-device interface more efficient, which additionally reduces power usage and improves battery life of the electronic device by enabling the user to use the electronic device more quickly and efficiently, while reducing errors in usage.

……
……
……
