
Apple Patent | Method, Device, And System For Delivering Recommendations

Patent: Method, Device, And System For Delivering Recommendations

Publication Number: 20200082576

Publication Date: 20200312

Applicants: Apple

Abstract

An electronic device: obtains pass-through image data characterizing a field of view captured by an image sensor; determines whether a recognized subject in the pass-through image data satisfies a confidence score threshold associated with a user-specific recommendation profile; generates one or more computer-generated reality (CGR) content items associated with the recognized subject in response to determining that the recognized subject in the pass-through image data satisfies the confidence score threshold; and composites the pass-through image data with the one or more CGR content items, where the one or more CGR content items are proximate to the recognized subject in the field of view.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority to U.S. Provisional Patent App. No. 62/729,960 filed on Sep. 11, 2018, which is hereby incorporated by reference in its entirety.

TECHNICAL FIELD

[0002] This relates generally to delivering recommendations, including but not limited to, electronic devices that enable the delivery of optimal recommendations in computer-generated reality environments.

BACKGROUND

[0003] A physical environment refers to a physical world that people can sense and/or interact with without aid of electronic systems. Physical environments, such as a physical park, include physical articles, such as physical trees, physical buildings, and physical people. People can directly sense and/or interact with the physical environment, such as through sight, touch, hearing, taste, and smell.

[0004] In contrast, a computer-generated reality (CGR) environment refers to a wholly or partially simulated environment that people sense and/or interact with via an electronic system. In CGR, a subset of a person’s physical motions, or representations thereof, are tracked, and, in response, one or more characteristics of one or more virtual objects simulated in the CGR environment are adjusted in a manner that comports with at least one law of physics. For example, a CGR system may detect a person’s head turning and, in response, adjust graphical content and an acoustic field presented to the person in a manner similar to how such views and sounds would change in a physical environment. In some situations (e.g., for accessibility reasons), adjustments to characteristic(s) of virtual object(s) in a CGR environment may be made in response to representations of physical motions (e.g., vocal commands).

[0005] A person may sense and/or interact with a CGR object using any one of their senses, including sight, sound, touch, taste, and smell. For example, a person may sense and/or interact with audio objects that create a 3D or spatial audio environment that provides the perception of point audio sources in 3D space. In another example, audio objects may enable audio transparency, which selectively incorporates ambient sounds from the physical environment with or without computer-generated audio. In some CGR environments, a person may sense and/or interact only with audio objects.

[0006] Examples of CGR include virtual reality and mixed reality.

[0007] A virtual reality (VR) environment refers to a simulated environment that is designed to be based entirely on computer-generated sensory inputs for one or more senses. A VR environment comprises a plurality of virtual objects with which a person may sense and/or interact. For example, computer-generated imagery of trees, buildings, and avatars representing people are examples of virtual objects. A person may sense and/or interact with virtual objects in the VR environment through a simulation of the person’s presence within the computer-generated environment, and/or through a simulation of a subset of the person’s physical movements within the computer-generated environment.

[0008] In contrast to a VR environment, which is designed to be based entirely on computer-generated sensory inputs, a mixed reality (MR) environment refers to a simulated environment that is designed to incorporate sensory inputs from the physical environment, or a representation thereof, in addition to including computer-generated sensory inputs (e.g., virtual objects). On a virtuality continuum, a mixed reality environment is anywhere between, but not including, a wholly physical environment at one end and a virtual reality environment at the other end.

[0009] In some MR environments, computer-generated sensory inputs may respond to changes in sensory inputs from the physical environment. Also, some electronic systems for presenting an MR environment may track location and/or orientation with respect to the physical environment to enable virtual objects to interact with real objects (that is, physical articles from the physical environment or representations thereof). For example, a system may account for movements so that a virtual tree appears stationary with respect to the physical ground.

[0010] Examples of mixed realities include augmented reality and augmented virtuality.

[0011] An augmented reality (AR) environment refers to a simulated environment in which one or more virtual objects are superimposed over a physical environment, or a representation thereof. For example, an electronic system for presenting an AR environment may have a transparent or translucent display through which a person may directly view the physical environment. The system may be configured to present virtual objects on the transparent or translucent display, so that a person, using the system, perceives the virtual objects superimposed over the physical environment. Alternatively, a system may have an opaque display and one or more imaging sensors that capture images or video of the physical environment, which are representations of the physical environment. The system composites the images or video with virtual objects, and presents the composition on the opaque display. A person, using the system, indirectly views the physical environment by way of the images or video of the physical environment, and perceives the virtual objects superimposed over the physical environment. As used herein, a video of the physical environment shown on an opaque display is called “pass-through video,” meaning a system uses one or more image sensor(s) to capture images of the physical environment, and uses those images in presenting the AR environment on the opaque display. Further alternatively, a system may have a projection system that projects virtual objects into the physical environment, for example, as a hologram or on a physical surface, so that a person, using the system, perceives the virtual objects superimposed over the physical environment.

[0012] An augmented reality environment also refers to a simulated environment in which a representation of a physical environment is transformed by computer-generated sensory information. For example, in providing pass-through video, a system may transform one or more sensor images to impose a select perspective (e.g., viewpoint) different than the perspective captured by the imaging sensors. As another example, a representation of a physical environment may be transformed by graphically modifying (e.g., enlarging) portions thereof, such that the modified portion may be representative but not photorealistic versions of the originally captured images. As a further example, a representation of a physical environment may be transformed by graphically eliminating or obfuscating portions thereof.

[0013] An augmented virtuality (AV) environment refers to a simulated environment in which a virtual or computer generated environment incorporates one or more sensory inputs from the physical environment. The sensory inputs may be representations of one or more characteristics of the physical environment. For example, an AV park may have virtual trees and virtual buildings, but people with faces photorealistically reproduced from images taken of physical people. As another example, a virtual object may adopt a shape or color of a physical article imaged by one or more imaging sensors. As a further example, a virtual object may adopt shadows consistent with the position of the sun in the physical environment.

[0014] There are many different types of electronic systems that enable a person to sense and/or interact with various CGR environments. Examples include smartphones, tablets, desktop/laptop computers, head-mounted systems, projection-based systems, heads-up displays (HUDs), vehicle windshields having integrated display capability, windows having integrated display capability, displays formed as lenses designed to be placed on a person’s eyes (e.g., similar to contact lenses), headphones/earphones, speaker arrays, input systems (e.g., wearable or handheld controllers with or without haptic feedback and/or cameras having hand tracking and/or other body pose estimation abilities).

[0015] A head-mounted system may have one or more speaker(s) and an integrated opaque display. Alternatively, a head-mounted system may be a head-mounted enclosure (HME) configured to accept an external opaque display (e.g., a smartphone). The head-mounted system may incorporate one or more imaging sensors to capture images or video of the physical environment, and/or one or more microphones to capture audio of the physical environment. Rather than an opaque display, a head-mounted system may have a transparent or translucent display. The transparent or translucent display may have a medium through which light representative of images is directed to a person’s eyes. The display may utilize digital light projection, OLEDs, LEDs, uLEDs, liquid crystal on silicon, laser scanning light source, or any combination of these technologies. The medium may be an optical waveguide, a hologram medium, an optical combiner, an optical reflector, or any combination thereof. In one implementation, the transparent or translucent display may be configured to become opaque selectively. Projection-based systems may employ retinal projection technology that projects graphical images onto a person’s retina. Projection systems also may be configured to project virtual objects into the physical environment, for example, as a hologram or on a physical surface.

[0016] CGR technology has the potential to be an integral part of a user’s everyday life. Devices that implement CGR can provide information to the user pertaining to many aspects, from navigation, to weather, to architecture, to games, and much more. However, the information provided to the user can be overwhelming and may not pertain to the user’s interests.

SUMMARY

[0017] In accordance with some embodiments, a method is performed at an electronic device with one or more processors and a non-transitory memory. The method includes obtaining pass-through image data characterizing a field of view captured by an image sensor. The method also includes determining whether a recognized subject in the pass-through image data satisfies a confidence score threshold associated with a user-specific recommendation profile. The method further includes generating one or more computer-generated reality (CGR) content items associated with the recognized subject in response to determining that the recognized subject in the pass-through image data satisfies the confidence score threshold. The method additionally includes compositing the pass-through image data with the one or more CGR content items, where the one or more CGR content items are proximate to the recognized subject in the field of view.
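
The flow in paragraph [0017] can be sketched in a few lines of code. The Python below is a minimal, illustrative sketch only; the names (RecognizedSubject, recommendation_threshold, generate_cgr_items, and so on) and the dictionary-based frame are assumptions made for readability, not identifiers from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RecognizedSubject:
    label: str                       # e.g. "cupcake"
    confidence: float                # recognition confidence score in [0, 1]
    bbox: Tuple[int, int, int, int]  # (x, y, w, h) within the field of view


@dataclass
class CGRContentItem:
    text: str
    anchor: Tuple[int, int]          # render position, proximate to the subject


def recommendation_threshold(profile: Dict[str, float], label: str) -> float:
    """Confidence score threshold taken from a user-specific recommendation profile."""
    return profile.get(label, profile.get("default", 0.8))


def generate_cgr_items(subject: RecognizedSubject) -> List[CGRContentItem]:
    # Placeholder content generation; a real system would query a
    # recommendation engine keyed on the recognized subject.
    x, y, w, _ = subject.bbox
    return [CGRContentItem(text=f"Info about {subject.label}", anchor=(x + w, y))]


def composite(frame: dict, items: List[CGRContentItem]) -> dict:
    # Composite the pass-through image data with the CGR content items so
    # they appear proximate to the recognized subject in the field of view.
    frame.setdefault("overlays", []).extend(items)
    return frame


def process_frame(frame: dict, subjects: List[RecognizedSubject], profile: Dict[str, float]) -> dict:
    items: List[CGRContentItem] = []
    for subject in subjects:
        if subject.confidence >= recommendation_threshold(profile, subject.label):
            items.extend(generate_cgr_items(subject))
    return composite(frame, items)


if __name__ == "__main__":
    frame = {"pixels": None}  # stands in for the pass-through image data
    subjects = [RecognizedSubject("cupcake", 0.93, (120, 80, 40, 40))]
    profile = {"cupcake": 0.85, "default": 0.8}
    print(process_frame(frame, subjects, profile))
```

In this sketch the user-specific recommendation profile is reduced to a per-label threshold lookup; the profile described in the disclosure would also carry the user context used elsewhere in the embodiments.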

[0018] In accordance with some embodiments, a method is performed at an electronic device with one or more processors and a non-transitory memory. The method includes obtaining a first set of subjects associated with a first pose of the device. The method also includes determining likelihood estimate values for each of the first set of subjects based on user context and the first pose. The method further includes determining whether at least one likelihood estimate value for at least one respective subject in the first set of subjects exceeds a confidence threshold. The method additionally includes generating recommended content or actions associated with the at least one respective subject using at least one classifier associated with the at least one respective subject and the user context in response to determining that the at least one likelihood estimate value exceeds the confidence threshold.
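
Paragraph [0018] combines user context and pose into a per-subject likelihood estimate before any content is generated. The sketch below shows one plausible way to do this; the product-of-scores estimator, the 0.5 threshold, and the toy classifiers are assumptions for illustration, not the patent's actual method.

```python
from typing import Callable, Dict, List


def likelihood_estimate(subject: str, user_context: Dict[str, float], pose_weight: float) -> float:
    """Combine a user-context affinity with how central the subject is to the current pose."""
    affinity = user_context.get(subject, 0.0)  # interest score in [0, 1]
    return affinity * pose_weight


def recommend(subjects: List[str],
              user_context: Dict[str, float],
              pose_weights: Dict[str, float],
              classifiers: Dict[str, Callable[[Dict[str, float]], str]],
              confidence_threshold: float = 0.5) -> List[str]:
    recommendations: List[str] = []
    for subject in subjects:
        score = likelihood_estimate(subject, user_context, pose_weights.get(subject, 0.0))
        if score > confidence_threshold:
            # Generate recommended content using the classifier associated with
            # this subject together with the user context.
            recommendations.append(classifiers[subject](user_context))
    return recommendations


if __name__ == "__main__":
    subjects = ["frame", "flower", "vase"]
    user_context = {"flower": 0.9, "frame": 0.3, "vase": 0.2}  # e.g. the user is a botanist
    pose_weights = {"flower": 0.8, "frame": 0.6, "vase": 0.6}  # how central each subject is to the pose
    classifiers = {s: (lambda ctx, s=s: f"content about the {s}") for s in subjects}
    print(recommend(subjects, user_context, pose_weights, classifiers))  # -> ['content about the flower']
```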

[0019] In accordance with some embodiments, an electronic device includes a display, one or more input devices, one or more processors, non-transitory memory, and one or more programs; the one or more programs are stored in the non-transitory memory and configured to be executed by the one or more processors and the one or more programs include instructions for performing or causing performance of the operations of any of the methods described herein. In accordance with some embodiments, a non-transitory computer readable storage medium has stored therein instructions which when executed by one or more processors of an electronic device with a display and one or more input devices, cause the device to perform or cause performance of the operations of any of the methods described herein. In accordance with some embodiments, an electronic device includes: a display, one or more input devices; and means for performing or causing performance of the operations of any of the methods described herein. In accordance with some embodiments, an information processing apparatus, for use in an electronic device with a display and one or more input devices, includes means for performing or causing performance of the operations of any of the methods described herein.

BRIEF DESCRIPTION OF THE DRAWINGS

[0020] For a better understanding of the various described embodiments, reference should be made to the Description of Embodiments below, in conjunction with the following drawings in which like reference numerals refer to corresponding parts throughout the figures.

[0021] FIG. 1 is a block diagram of an exemplary operating environment in accordance with some implementations.

[0022] FIGS. 2A-2G illustrate example user interfaces for rendering user-specific computer-generated reality (CGR) content items in accordance with some embodiments.

[0023] FIG. 3 illustrates an example abstract block diagram for generating user-specific CGR content in accordance with some embodiments.

[0024] FIGS. 4A-4C illustrate example user interfaces for recommending user-specific CGR content items based on updated user context and/or poses in accordance with some embodiments.

[0025] FIG. 5 illustrates an example abstract block diagram for delivering optimal recommendations in a CGR environment in accordance with some embodiments.

[0026] FIG. 6 illustrates a flow diagram of a method of rendering user-specific CGR content items in accordance with some embodiments.

[0027] FIG. 7 illustrates a flow diagram of a method of generating recommended CGR content in accordance with some embodiments.

[0028] FIG. 8 is a block diagram of a computing device in accordance with some embodiments.

DESCRIPTION OF EMBODIMENTS

[0029] Numerous details are described in order to provide a thorough understanding of the example implementations shown in the drawings. However, the drawings merely show some example aspects of the present disclosure and are therefore not to be considered limiting. Those of ordinary skill in the art will appreciate that other effective aspects and/or variants do not include all of the specific details described herein. Moreover, well-known systems, methods, components, devices and circuits have not been described in exhaustive detail so as not to obscure more pertinent aspects of the example implementations described herein.

[0030] In embodiments described below, pass-through image data characterizing a field of view captured by an image sensor is composited with one or more computer-generated reality (CGR) content items. The one or more CGR content items are associated with a recognized subject in the pass-through image data, and the recognized subject in the pass-through image data satisfies a confidence score threshold. In the composited image, the one or more CGR content items are placed proximate to the recognized subject in the field of view. Accordingly, the embodiments described below provide a seamless integration of user-specific content. The user-specific content is generated and displayed to a user based on likelihoods of user interests. For example, a cupcake recipe or nutritional information for a cupcake is generated and displayed to the user when a cupcake is recognized within the user’s field of view. As such, the recommended CGR content items generated according to various embodiments described herein allow the user to remain immersed in their experience without having to manually enter search queries or indicate preferences. The seamless integration also reduces power usage and improves battery life of the device by enabling the user to use the device more quickly and efficiently.

[0031] In embodiments described below, a set of subjects associated with a pose of a device is obtained, and likelihood estimate values for each of the set of subjects are determined based on user context and the pose. Recommended content or actions associated with at least one respective subject in the set of subjects are generated. The recommended content or actions are generated using at least one classifier associated with the at least one respective subject in response to determining that at least one likelihood estimate value for the at least one respective subject in the set of subjects exceeds a confidence threshold. As such, the embodiments described below provide a process for generating recommended CGR content based on how likely a user is to be interested in a subject. The content recommendation according to various embodiments described herein thus provides a seamless user experience that requires less time and fewer user inputs when looking for information or a next action. This also reduces power usage and improves battery life of the device by enabling the user to use the device more quickly and efficiently.

[0032] FIG. 1 is a block diagram of an exemplary operating environment 100 in accordance with some implementations. While pertinent features are shown, those of ordinary skill in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity and so as not to obscure more pertinent aspects of the example implementations disclosed herein. To that end, as a non-limiting example, the operating environment 100 includes a controller 102 and a CGR device 104. In the example of FIG. 1, the CGR device 104 is worn by a user 10.

[0033] In some implementations, the CGR device 104 corresponds to a tablet or mobile phone. In various implementations, the CGR device 104 corresponds to a head-mounted system, such as a head-mounted device (HMD) or a head-mounted enclosure (HME) having a tablet or mobile phone inserted therein. In some implementations, the CGR device 104 is configured to present CGR content to a user. In some implementations, the CGR device 104 includes a suitable combination of software, firmware, and/or hardware.

[0034] According to some implementations, the CGR device 104 presents, via a display 122, CGR content to the user while the user is virtually and/or physically present within a scene 106. In some implementations, the CGR device 104 is configured to present virtual content (e.g., the virtual cylinder 109) and to enable video pass-through of the scene 106 (e.g., including a representation 117 of the table 107) on a display. In some implementations, the CGR device 104 is configured to present virtual content and to enable optical see-through of the scene 106.

[0035] In some implementations, the user holds the CGR device 104 in his/her hand(s). In some implementations, the user wears the CGR device 104 on his/her head. As such, the CGR device 104 includes one or more CGR displays provided to display the CGR content. For example, the CGR device 104 encloses the field-of-view of the user. In some implementations, the CGR device 104 is replaced with a CGR chamber, enclosure, or room configured to present CGR content in which the user does not wear the CGR device 104.

[0036] In some implementations, the controller 102 is configured to manage and coordinate presentation of CGR content for the user. In some implementations, the controller 102 includes a suitable combination of software, firmware, and/or hardware. In some implementations, the controller 102 is a computing device that is local or remote relative to the scene 106. For example, the controller 102 is a local server located within the scene 106. In another example, the controller 102 is a remote server located outside of the scene 106 (e.g., a cloud server, central server, etc.). In some implementations, the controller 102 is communicatively coupled with the CGR device 104 via one or more wired or wireless communication channels 144 (e.g., BLUETOOTH, IEEE 802.11x, IEEE 802.16x, IEEE 802.3x, etc.). In some implementations, the functionalities of the controller 102 are provided by and/or combined with the CGR device 104.

[0037] As illustrated in FIG. 1, the CGR device 104 presents a representation of the scene 106. In some implementations, the representation of the scene 106 is generated by the controller 102 and/or the CGR device 104. In some implementations, the representation of the scene 106 includes a virtual scene that is a simulated replacement of the scene 106. In other words, in some implementations, the representation of the scene 106 is simulated by the controller 102 and/or the CGR device 104. In such implementations, the representation of the scene 106 is different from the scene 106 where the CGR device 104 is located. In some implementations, the representation of the scene 106 includes an augmented scene that is a modified version of the scene 106 (e.g., including the virtual cylinder 109). For example, in some implementations, the controller 102 and/or the CGR device 104 modify (e.g., augment) the scene 106 in order to generate the representation of the scene 106. In some implementations, the controller 102 and/or the CGR device 104 generate the representation of the scene 106 by simulating a replica of the scene 106. In some implementations, the controller 102 and/or the CGR device 104 generate the representation of the scene 106 by removing and/or adding items from the simulated replica of the scene 106.

[0038] FIGS. 2A-2G illustrate exemplary user interfaces for rendering user-specific computer-generated reality (CGR) content in accordance with some embodiments. The user interfaces in these figures are used to illustrate the processes described below, including the process in FIG. 5. In some embodiments, the device detects inputs via an input device that is separate from the display (e.g., a head mounted device (HMD) with voice activated commands, a laptop with a separate touchpad and display, or a desktop with a separate mouse and display).

[0039] As shown in FIG. 2A, the device 104 displays a media capture/interaction interface 202. According to some embodiments, the media capture/interaction interface 202 displays a scene with subjects in a field of view of an image sensor. The image data (or pass-through image data) representing the scene is captured by the image sensor. In some embodiments, the pass-through image data includes a preview image, a surface image (e.g., planar surface), depth mappings, anchor coordinates (e.g., for depth mappings), and/or the like. In some embodiments, the pass-through image data includes not only visual content, but also audio content, 3D renderings, timestamps (of the actual frame displayed), a header file (e.g., camera settings such as contrast, saturation, white balance, etc.), and/or metadata.
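
The list of pass-through components in paragraph [0039] maps naturally onto a small container type. The sketch below is one hypothetical way to bundle them; every field name here is an assumption for illustration and is not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class CameraHeader:
    contrast: float = 1.0
    saturation: float = 1.0
    white_balance: str = "auto"


@dataclass
class PassThroughFrame:
    preview_image: Any = None                            # decoded preview frame
    surfaces: List[Any] = field(default_factory=list)    # detected planar surfaces
    depth_map: Any = None                                # per-pixel depth estimates
    anchors: List[Tuple[float, float, float]] = field(default_factory=list)  # anchor coordinates for depth mappings
    audio: Optional[bytes] = None                        # captured ambient audio, if any
    timestamp: float = 0.0                               # time of the actual frame displayed
    header: CameraHeader = field(default_factory=CameraHeader)  # camera settings
    metadata: Dict[str, Any] = field(default_factory=dict)


if __name__ == "__main__":
    frame = PassThroughFrame(timestamp=0.033, metadata={"source": "HMD image sensor"})
    print(frame.header.white_balance, len(frame.anchors))
```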

[0040] As explained above with reference to FIG. 1, in some embodiments, the image sensor for capturing the scene is part of the device 104 or attached to the device 104; while in some other embodiments, the image sensor is detached from the device 104, e.g., on a camera remote from the device 104. In various embodiments, the scene changes as the field of view of the image sensor changes, as will be shown below with reference to FIGS. 2C-2G. In FIG. 2A, the media capture/interaction interface 202 includes an open doorway with a door sign 210 labeled as “201”. The media capture/interaction interface 202 also shows through the open doorway a picture frame 220 and a table 230 in the room.

[0041] FIG. 2B shows a composited pass-through image data rendering with CGR content items in the media capture/interaction interface 202. According to some embodiments, the composited pass-through image data includes information, e.g., room information 212 and a floor map 214 associated with the room. The room information 212 and the floor map 214 are CGR content items generated based on the device 104 recognizing the door sign 210 and determining that the user is interested in learning more about the room and the building. In some embodiments, as indicated by the dotted line around the door sign 210, the recognized subject in the field of view is emphasized to indicate the association of the additional CGR content items 212 and 214 with the recognized subject 210. In some embodiments, the CGR content items 212 and 214 are animated (e.g., flashing, shrinking/enlarging, moving, etc.) near the recognized subject 210 to indicate the association with the recognized subject 210. In some embodiments, in addition to or in place of displaying the CGR content items 212 and 214 on the display of the device 104, audio content is played as the CGR content items, e.g., reading the door sign, the room information, and/or the floor map to the user.

[0042] FIGS. 2B-2C illustrate a sequence in which the media capture/interaction interface 202 is updated based on a change of the field of view of the image sensor. The perspective or vantage point of the image sensor changes between FIGS. 2B-2C. For example, in FIG. 2C, the doorway is no longer displayed in the media capture/interaction interface 202 indicating the user has entered the room. As a result, the CGR content items 212 and 214 associated with the door sign 210 as shown in FIG. 2B are no longer provided to the user. Instead, as the user enters the room, the user has a full view of the room. As a result, as shown in FIG. 2C, the media capture/interaction interface 202 displays three walls of the room. The media capture/interaction interface 202 also displays the picture frame 220, the table 230, a clock 240, and a dog 236 in the room. Additionally, as shown in FIG. 2C, the media capture/interaction interface 202 displays a cupcake 232 and a book 234 on the table 230.

[0043] FIGS. 2D-2E illustrate different CGR content items rendered to the user based on different user context. In FIG. 2D, the composited pass-through image data includes a CGR content item 250 associated with the cupcake 232. The CGR content item 250 is rendered adjacent to or relative to the cupcake 232. Further, the CGR content item 250 includes information associated with the cupcake 232, e.g., calories of the cupcake, and affordances including a link 252 to a recipe for the cupcake 232 and a button 254 for adding the cupcake 232 to a dietary log. The affordances 252 and 254 are provided as options to the user in order to perform an action associated with the cupcake 232, e.g., tapping on the link 252 to view the recipe for the cupcake 232 or clicking the button 254 to add the cupcake 232 to a dietary log.

[0044] The CGR content item 250 shown in FIG. 2D is generated based on a determination that the user is interested in the cupcake 232, and a recommendation is made to provide information regarding the cupcake 232. In comparison, FIG. 2E illustrates a different CGR content item 256, which overlays the cupcake 232. While the user is still interested in the cupcake 232, the CGR content item 256 is generated based on a different user context, e.g., the user has a dietary restriction, etc.

[0045] FIG. 2F illustrates a CGR content item 260 proximate to the recognized subject (e.g., the table 230), where the CGR content item 260 is generated in response to detecting gaze proximate to a region 262 containing at least part of the recognized subject 230. In FIG. 2F, the device 104 detects the region 262 proximate to the gaze, which includes part of the table 230, part of the cupcake 232 on the table 230, and part of the book 234 on the table 230. The device 104 recognizes the table 230 by using a subset of the pass-through image data corresponding to the region 262 and applying a table classifier to the subset of image data.

[0046] In some embodiments, the table classifier is selected based on weights assigned to a cluster of classifiers. In some embodiments, the classifiers correspond to entries in a library of objects/subjects, e.g., shapes, numbers, animals, foods, plants, people, dogs, squares, flowers, shapes, lighting, or the like. Using one or more classifiers, a subject can be recognized in the image data. During the subject recognition, weights are assigned to different classifiers and one or more classifiers can be selected based on the weight associated with each classifier. The selected classifier(s) can then be used for recognizing a subject in the image data.

[0047] For example, based on the gaze proximate to the region 262, weights are assigned to the table classifier, a cupcake classifier, and a book classifier. As the gaze settles on the table surface, the weight assigned to the table classifier increases, while the weights assigned to the cupcake classifier and the book classifier decrease. Based on the weights assigned to the classifiers, the table classifier is selected for identifying the table subject 230 proximate to the gaze region 262. Having recognized the table 230, the device 104 renders the CGR content 260, such as recommendations of a chair which may match the style of the table 230, adjacent to the table 230.
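
Paragraphs [0046]-[0047] describe selecting a classifier from a cluster by adjusting per-classifier weights as the gaze dwells on a subject. The sketch below illustrates this idea with a simple, assumed update rule (shift weight toward the dwelled-on classifier, decay the rest, renormalize); the rule and the rate are hypothetical and not taken from the patent.

```python
from typing import Dict


def update_weights(weights: Dict[str, float], dwelled_on: str, rate: float = 0.2) -> Dict[str, float]:
    """Shift weight toward the classifier whose subject the gaze settles on, then renormalize."""
    updated = {
        name: (w + rate * (1.0 - w)) if name == dwelled_on else w * (1.0 - rate)
        for name, w in weights.items()
    }
    total = sum(updated.values())
    return {name: w / total for name, w in updated.items()}


if __name__ == "__main__":
    # Classifiers assigned to the subjects near the gaze region 262.
    weights = {"table": 1 / 3, "cupcake": 1 / 3, "book": 1 / 3}
    for _ in range(3):                       # the gaze keeps settling on the table surface
        weights = update_weights(weights, "table")
    selected = max(weights, key=weights.get)
    print(weights, "->", selected)           # the table classifier is selected
```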

[0048] FIG. 2G illustrates a CGR content item 270 (e.g., a hand icon in a pointing configuration) proximate to the recognized subject 234, where a gaze region 272 is within a threshold distance from the recognized subject 234. In FIG. 2G, the device 104 detects that the gaze region 272 is on a dog 236 in the field of view. However, based on user context, it is unlikely that the user is interested in seeing more information about the dog 236 displayed in the media capture/interaction interface 202, e.g., the user is afraid of animals. Further based on the user context, the device determines that the book 234 is of more interest to the user (e.g., the user recently obtained the book 234 from a library) and that the book 234 is within a threshold distance from the gaze region 272. Subsequently, the device 104 expands the gaze region 272 so that more subjects are included in the region and analyzed. The book 234 is then recognized from image data corresponding to the expanded gaze region, and the CGR content item 270 is generated and rendered above the book 234.
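
The expansion logic in paragraph [0048] can be sketched as a loop that grows the gaze region until it covers a subject the user context rates highly. The positions, interest scores, interest threshold, and 1.5x growth step below are illustrative assumptions.

```python
from typing import Dict, Tuple


def pick_subject(gaze_center: Tuple[float, float],
                 positions: Dict[str, Tuple[float, float]],  # subject -> position in the field of view
                 interest: Dict[str, float],                 # user-context interest scores in [0, 1]
                 radius: float,
                 max_radius: float,
                 min_interest: float = 0.5) -> str:
    """Grow the gaze region until it contains a subject the user context rates highly."""
    def dist(a: Tuple[float, float], b: Tuple[float, float]) -> float:
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

    while radius <= max_radius:
        in_region = [s for s, pos in positions.items() if dist(gaze_center, pos) <= radius]
        liked = [s for s in in_region if interest.get(s, 0.0) >= min_interest]
        if liked:
            return max(liked, key=lambda s: interest[s])
        radius *= 1.5  # expand the gaze region so more subjects are included and analyzed
    return ""


if __name__ == "__main__":
    positions = {"dog": (0.0, 0.0), "book": (1.0, 0.5)}
    interest = {"dog": 0.1, "book": 0.9}  # e.g. the user is afraid of animals but recently borrowed the book
    print(pick_subject(gaze_center=(0.0, 0.0), positions=positions, interest=interest,
                       radius=0.5, max_radius=3.0))  # -> "book"
```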

[0049] FIG. 2G shows that the CGR content item 270 is generated for a specific user through the likelihood estimation, where a priori information about the user as well as the current pass-through image data are inputs. This is particularly useful when the recognized subject includes multiple searchable elements and each is associated with at least one classifier. For example, in FIGS. 2A-2G, the picture frame 220 includes multiple searchable elements, including the frame itself, the vase in the picture, and the flowers in the pictured vase. In order to differentiate these searchable elements and generate CGR content items for the element that the user will most likely be interested in, content recommendations are fine-tuned as described below in greater detail with reference to FIG. 3.

[0050] FIG. 3 illustrates an abstract block diagram associated with a multi-iteration process 300 for identifying a subject that the user is most likely interested in. While pertinent features are shown, those of ordinary skill in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity and so as not to obscure more pertinent aspects of the example embodiments disclosed herein. To that end, as a non-limiting example, in FIG. 3, a gaze region 222 is proximate to the picture frame 220 in the field of view, and the picture frame 220 includes multiple searchable elements, namely the frame 310, the flower 320, and the vase 330, each of which is proximate to the gaze region. The likelihood estimate values are determined over multiple iterations.

[0051] In some embodiments, each of the likelihood estimate values is assigned an initial value, e.g., all likelihood estimate values are 0 or the likelihood estimate values are equally distributed. As shown in FIG. 3, during the first iteration, the likelihood estimate values for the frame 310, the flower 320, and the vase 330 are equally assigned to approximately 1/3, e.g., 0.33 for the frame 310, 0.33 for the flower 320, and 0.34 for the vase 330. During the second iteration, the likelihood estimate values are updated to reflect what the user is interested in at a next time step after the first iteration. Further, as will be described in detail below with reference to FIGS. 4A-4C, changes in poses and/or the user context can contribute to the changes in the likelihood estimate values. Thus, during the second iteration, the likelihood estimate value for the frame 310 is 0.25, the likelihood estimate value for the flower 320 is 0.00, and the likelihood estimate value for the vase 330 is 0.75. Likewise, during the third iteration, further changes in poses and/or the user context cause the likelihood estimate value for the frame 310 to change to 0.75, for the flower 320 to 0.00, and for the vase 330 to 0.25. While the likelihood estimate value of 0.00 for the flower 320 indicates that the user is most likely not interested in learning more about the flower 320, in some embodiments, the device needs more iteration(s) to identify the one element that the user is most interested in, e.g., the values of 0.25 and 0.75 do not exceed a confidence threshold. As shown in FIG. 3, during the fourth and final iteration, the likelihood estimate value for the frame 310 has increased to 0.90, indicating that the user is most likely interested in the frame itself, not the picture depicted in the frame.

[0052] The selection process illustrated in FIG. 3 is funnel shaped, such that over time, e.g., after the second and third iterations or a threshold amount of time, the likelihood estimate values below a threshold value (e.g., the flower with the likelihood estimate value of 0.00) are not included in the next iteration. After multiple iterations, the likelihood estimate values converge to a particular value, so that recommendations can be made for the particular subject that the user is most likely interested in.
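
A minimal sketch of this funnel-shaped selection might look like the following, under assumed thresholds (the 0.05 pruning floor and 0.85 confidence value are illustrative) and an assumed multiplicative observation update; the numbers mirror the FIG. 3 example, and the loop converges on the frame.

```python
from typing import Dict, List


def funnel_select(initial: Dict[str, float],
                  observations: List[Dict[str, float]],
                  prune_below: float = 0.05,
                  confidence: float = 0.85) -> str:
    """Iteratively reweight candidate elements, dropping unlikely ones, until one converges."""
    estimates = dict(initial)
    for obs in observations:
        # Update estimates from the latest pose / user-context evidence.
        estimates = {k: estimates[k] * obs.get(k, 0.0) for k in estimates}
        total = sum(estimates.values()) or 1.0
        estimates = {k: v / total for k, v in estimates.items()}
        # Funnel step: candidates below the floor are not carried to the next iteration.
        estimates = {k: v for k, v in estimates.items() if v >= prune_below}
        best = max(estimates, key=estimates.get)
        if estimates[best] >= confidence:
            return best
    return max(estimates, key=estimates.get)


if __name__ == "__main__":
    initial = {"frame": 0.33, "flower": 0.33, "vase": 0.34}         # first-iteration values
    observations = [{"frame": 0.25, "flower": 0.0, "vase": 0.75},   # later evidence per iteration
                    {"frame": 0.75, "vase": 0.25},
                    {"frame": 0.90, "vase": 0.10}]
    print(funnel_select(initial, observations))                     # converges to "frame"
```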

[0053] Turning to FIGS. 4A-4C, these figures illustrate exemplary user interfaces for rendering user-specific CGR content items based on user context and/or poses in accordance with some embodiments. The exemplary user interfaces are used to illustrate a recommended content generation process in FIG. 5.

[0054] For example, in FIG. 4A, the device 104 detects a gaze region 222, as indicated by the dotted line, proximate to the picture frame 220 based on a pose of the device 104. The picture frame 220 includes the frame itself, the vase in the picture, and the flowers in the pictured vase. The likelihood estimator of the device 104 determines the likelihood estimate values for each of the subjects, e.g., the likelihood estimate value for the frame, the likelihood estimate value for the vase, and the likelihood estimate value for the flowers. In some embodiments, the likelihood estimate values are determined based on both user context and the pose. In FIG. 4A, the gaze region 222a is proximate to the frame, the vase, and the flowers. Using the user context, e.g., the user is a botanist, not an artist, it is more likely that the user is interested in the flowers pictured in the frame 220. Thus, the device 104 generates recommended content 224 to provide flower information to the user.
