
Apple Patent | Object detection with instance detection and general scene understanding

Patent: Object detection with instance detection and general scene understanding


Publication Number: 20210073545

Publication Date: 20210311

Applicant: Apple

Abstract

Various implementations disclosed herein include devices, systems, and methods that determine a particular object instance in CGR environments. In some implementations, an object type of an object depicted in an image of a physical environment is identified. Then, a particular instance is determined based on the object type and the image. In some implementations, objects of the particular instance have a set of characteristics that differs from sets of characteristics associated with other instances of the object type. Then, the set of characteristics of the particular instance of the object depicted in the physical environment is obtained.

Claims

  1. A method comprising: at an electronic device having a processor: identifying an object type of an object depicted in an image of a physical environment; determining a particular instance of the object based on the object type and the image, objects of the particular instance having a set of characteristics that differs from sets of characteristics associated with other instances of the object type; and obtaining the set of characteristics of the particular instance of the object depicted in the physical environment.

  2. The method of claim 1, wherein identifying an object type of an object depicted in an image includes detecting the object type of a plurality of object types of the object in the image using a first machine learning model.

  3. The method of claim 1, wherein determining a particular instance of the object based on the object type and the image comprises using features extracted from a portion of the image depicting the object to generate a representation of the particular instance of the object; and determining the particular instance using the representation.

  4. The method of claim 3, wherein determining the particular instance using the representation comprises: querying a first database of instances of the object type using the representation; and receiving the particular instance from the first database.

  5. The method of claim 4, wherein obtaining the set of characteristics of the particular instance of the object depicted in the physical environment comprises accessing a database to receive information on materials, dimensions, physical properties, or visual properties of the particular instance of the object.

  6. The method of claim 3, wherein determining the particular instance using the representation comprises using a second machine learning model that inputs the instance type and the representation of the particular instance of the object and outputs the particular instance of the object.

  7. The method of claim 6, wherein obtaining the set of characteristics of the particular instance of the object depicted in the physical environment comprises accessing a database to receive information on materials, dimensions, physical properties, or visual properties of the particular instance of the object.

  8. The method of claim 1, further comprising: combining the set of characteristics of the particular instance of the object with a CGR environment depicting the physical environment.

  9. The method of claim 8, wherein the set of characteristics of the particular instance of the object is used to perform scene understanding of the CGR environment depicting the physical environment.

  10. The method of claim 8, wherein the set of characteristics of the particular instance of the object is used to perform scene reconstruction in the CGR environment depicting the physical environment.

  11. The method of claim 8, wherein the set of characteristics of the particular instance of the object is used to perform material detection in the CGR environment depicting the physical environment.

  12. The method of claim 8, wherein the set of characteristics of the particular instance of the object is used to perform environment texturing in the CGR environment depicting the physical environment.

  13. The method of claim 8, wherein the set of characteristics of the particular instance of the object is used to generate reflections of virtual objects in the CGR environment or reflections of real objects of the CGR environment in the virtual objects of the CGR environment depicting the physical environment.

  14. The method of claim 8, wherein the set of characteristics of the particular instance of the object is used to perform physics simulations in the CGR environment depicting the physical environment.

  15. The method of claim 8, wherein the set of characteristics of the particular instance of the object is used to reconstruct object portions in the CGR environment that are not in the image depicting the physical environment.

  16. The method of claim 8, wherein the set of characteristics of the particular instance of the object is used to determine object or plane boundaries in the CGR environment depicting the physical environment.

  17. The method of claim 8, wherein the set of characteristics of the particular instance of the object is used to effect removal of real objects from the CGR environment depicting the physical environment.

  18. A system comprising: a non-transitory computer-readable storage medium; and one or more processors coupled to the non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium comprises program instructions that, when executed on the one or more processors, cause the system to perform operations comprising: identifying an object type of an object depicted in an image of a physical environment; determining a particular instance based on the object type and the image, objects of the particular instance having a set of characteristics that differs from sets of characteristics associated with other instances of the object type; and obtaining the set of characteristics of the particular instance of the object depicted in the physical environment.

  19. A non-transitory computer-readable storage medium, storing program instructions computer-executable on a computer to perform operations comprising: identifying an object type of an object depicted in an image of a physical environment; determining a particular instance based on the object type and the image, objects of the particular instance having a set of characteristics that differs from sets of characteristics associated with other instances of the object type; and obtaining the set of characteristics of the particular instance of the object depicted in the physical environment.

  20. The non-transitory computer-readable storage medium of claim 19, wherein determining a particular instance of the object based on the object type and the image comprises using features extracted from a portion of the image depicting the object to generate a representation of the particular instance of the object; and determining the particular instance using the representation.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of U.S. Provisional Application Ser. No. 62/897,625 filed Sep. 9, 2019, which is incorporated herein in its entirety.

TECHNICAL FIELD

[0002] The present disclosure generally relates to electronic devices, and in particular, to systems, methods, and devices that provide computer generated reality (CGR) environments based on images of physical environments.

BACKGROUND

[0003] CGR environments may be created based on images of a physical environment. For example, a device may capture images of a physical environment and add virtual content amongst the physical objects in a CGR environment that is presented to a user. Existing techniques for detecting objects, identifying instances of objects, and generally understanding the physical environment may be improved with respect to efficiency and accuracy.

SUMMARY

[0004] Various implementations disclosed herein include devices, systems, and methods that provide scene understanding of the physical environment that is used to provide a CGR environment. The scene understanding is based on detecting one or more particular physical objects (e.g., brand x chair, model y) present in the physical environment. The object instance is detected by first detecting an object and its type (e.g., a chair is detected using a first neural network) and then performing instance detection guided by the object type, for example, by using a second neural network to identify the chair model number or by performing a visual search using features extracted from the current camera image, for example from within a bounding box around the chair. Implementations disclosed herein may combine object detection with instance detection in various ways. In various implementations, the instance detection of an identified object type is used to access a set of characteristics (e.g., dimensions, material properties, etc.) for instances of an identified object type. In some implementations, the CGR environment is provided based on the instance detection, for example, by combining one or more of the set of characteristics to modify the CGR environment.

[0005] In some implementations, object detection detects and identifies an object type for objects in images of a physical environment using a first machine learning model. For example, object detection detects and identifies an object type (e.g., table, couch, chair, etc.) for furniture objects in images of a physical environment of a room. Then, in some implementations, instance detection uses a second machine learning model trained for that object type (e.g., table), and inputs distinct features from images of the physical environment of the detected object to determine a precise model or particular instance of the object type (e.g., table brand model xyz1). In various implementations, objects of the particular instance have a set of characteristics that differs from sets of characteristics associated with other instances of the object type.
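The two-stage flow described above (detect the object type first, then run an instance model chosen for that type) can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: `Detection`, `detect_instances`, and the two stub models are hypothetical names, and the stubs return fixed values in place of trained machine learning models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

BoundingBox = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass(frozen=True)
class Detection:
    """Output of the first stage: an object type and where it was found."""
    object_type: str
    box: BoundingBox


# Hypothetical signatures standing in for the first and second models.
TypeDetector = Callable[[object], List[Detection]]
InstanceModel = Callable[[object, BoundingBox], str]


def detect_instances(
    image: object,
    type_detector: TypeDetector,
    instance_models: Dict[str, InstanceModel],
) -> List[Tuple[Detection, Optional[str]]]:
    """Run type detection, then the instance model registered for each type."""
    results = []
    for detection in type_detector(image):
        model = instance_models.get(detection.object_type)
        # No instance model for this type: report the type only.
        instance_id = model(image, detection.box) if model else None
        results.append((detection, instance_id))
    return results


# Stub models with fixed outputs, for illustration only.
def stub_type_detector(image: object) -> List[Detection]:
    return [Detection("table", (10, 20, 200, 100)), Detection("lamp", (0, 0, 5, 5))]


def stub_table_model(image: object, box: BoundingBox) -> str:
    return "xyz1"


results = detect_instances(None, stub_type_detector, {"table": stub_table_model})
```

Keying the second-stage models by object type keeps each instance model small and lets a type without a registered model fall through with no instance identifier.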

[0006] Some implementations use a database of instances of the identified object type (e.g., tables). The determined particular instance (or the specific brand model identifier xyz1) may be used to access (e.g., via an index) a variety of information or set of characteristics such as materials, dimensions, colors, etc. of that determined particular instance (e.g., table xyz1). In some implementations, the determined particular instance from instance detection is used to access a robust description of that particular instance (e.g., table xyz1) of the identified object type (e.g., table). In some implementations, one or more of the set of characteristics obtained for that determined particular instance are combined with the CGR environment. In some implementations, the one or more of the set of characteristics obtained for that determined particular instance (e.g., table brand model identifier xyz1) are used for scene understanding, scene reconstruction, or material detection in the CGR environment. In some implementations, the one or more of the set of characteristics obtained for that specific table model xyz1 are used to improve the quality of the CGR environment.
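As a rough sketch of the indexed lookup described above, the following uses an in-memory dictionary in place of a real database. The schema (`Characteristics`), the function name `lookup_characteristics`, and every stored value are assumptions made for illustration; the identifiers xyz1 and abc4 are the placeholder model identifiers used in this description.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Characteristics:
    """Hypothetical record of per-instance characteristics."""
    dimensions_m: Tuple[float, float, float]  # width, depth, height in meters
    materials: Tuple[str, ...]
    color: str


# Illustrative data only; a deployed system would query a real database.
INSTANCE_DB: Dict[str, Dict[str, Characteristics]] = {
    "table": {
        "xyz1": Characteristics((1.6, 0.8, 0.75), ("oak", "steel"), "natural"),
        "abc4": Characteristics((1.2, 0.6, 0.72), ("glass", "aluminum"), "clear"),
    },
}


def lookup_characteristics(
    object_type: str,
    instance_id: str,
    db: Dict[str, Dict[str, Characteristics]] = INSTANCE_DB,
) -> Optional[Characteristics]:
    """Return the characteristics for an instance, or None if it is unknown."""
    return db.get(object_type, {}).get(instance_id)


table = lookup_characteristics("table", "xyz1")
```

Returning `None` for an unknown instance lets the caller fall back to whatever it can infer from the image alone.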

[0007] In some implementations, the instance detection after object type identification is used for environment texturing (e.g., reflecting real objects in virtual objects) in the CGR environment. In some implementations, the instance detection is used for reflecting virtual objects on real objects in the CGR environment. In some implementations, the instance detection is used for determining physical properties of real objects for enhanced physics simulation (e.g., friction of a surface, bounce behavior for an object, or audio reflectivity) in the CGR environment. In some implementations, the instance detection is used for generating a high-quality scene or object reconstruction (e.g., without visual data of the entire object) using precise object or plane boundaries (e.g., dimensions) in the CGR environment (e.g., for occlusion handling and physics simulation). In some implementations, the instance detection is used for diminished reality (e.g., removing or replacing real objects) in the CGR environment. In some implementations, the instance detection is used for understanding the light situation (e.g., position, color, or direction of light sources) in the CGR environment.
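To illustrate how a looked-up physical property could feed a physics simulation such as bounce behavior, the sketch below computes the rebound height of a virtual ball dropped onto a real surface. It assumes the standard coefficient-of-restitution model (rebound speed equals restitution times impact speed, so rebound height scales with its square). The material table and its values are hypothetical and are not taken from this disclosure.

```python
from typing import Dict

# Hypothetical restitution coefficients per surface material (illustrative values).
RESTITUTION: Dict[str, float] = {"oak": 0.5, "carpet": 0.2}


def rebound_height(drop_height_m: float, restitution: float) -> float:
    """Height reached after one bounce, ignoring air resistance.

    Impact speed squared is proportional to the drop height, and the rebound
    speed is restitution * impact speed, so height scales by restitution ** 2.
    """
    if drop_height_m < 0:
        raise ValueError("drop height must be non-negative")
    if not 0.0 <= restitution <= 1.0:
        raise ValueError("restitution must be between 0 and 1")
    return restitution ** 2 * drop_height_m


# A virtual ball dropped 1 m onto a table whose top was identified as oak.
height_on_oak = rebound_height(1.0, RESTITUTION["oak"])
```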

[0008] Various implementations disclosed herein include devices, systems, and methods that determine a particular object instance in CGR environments. In some implementations, an object type (e.g., table) of an object depicted in an image of a physical environment (e.g., interior room) is identified. In some implementations, a particular instance (e.g., table brand model identifier xyz1) is determined based on the object type and distinct features of the object in the image of the physical environment. Then, the set of characteristics of the determined particular instance of the object depicted in images of the physical environment is obtained using the determined particular instance to perform a lookup in a database of instances of that object type (e.g., table). In some implementations, objects of the particular instance have a set of characteristics (e.g., dimensions, color, materials) that differs from sets of characteristics associated with other instances (e.g., table brand model identifier abc4, table brand model identifier klm11) of the object type. In some implementations, the instance detection is used in a CGR environment by combining one or more of the set of characteristics of the determined particular instance (e.g., table brand model identifier xyz1) with the CGR environment.
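One way to read "determined based on the object type and distinct features" is a nearest-neighbor search over stored feature vectors for the identified object type. The sketch below assumes cosine similarity and a fixed acceptance threshold; both are illustrative choices rather than anything this description specifies, and `query_instance` and the reference vectors are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_instance(
    representation: Sequence[float],
    index: Dict[str, Sequence[float]],
    min_similarity: float = 0.8,
) -> Optional[str]:
    """Return the best-matching instance id, or None if nothing is similar enough."""
    best_id: Optional[str] = None
    best_score = min_similarity
    for instance_id, reference in index.items():
        score = cosine_similarity(representation, reference)
        if score >= best_score:
            best_id, best_score = instance_id, score
    return best_id


# Hypothetical reference vectors for instances of the object type "table".
table_index = {"xyz1": [1.0, 0.0, 0.0], "abc4": [0.0, 1.0, 0.0]}
match = query_instance([0.9, 0.1, 0.0], table_index)
no_match = query_instance([1.0, 1.0, 1.0], table_index)
```

The threshold keeps the search from forcing a match when the depicted object is not in the index at all.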

[0009] In accordance with some implementations, a non-transitory computer readable storage medium has stored therein instructions that are computer-executable to perform or cause performance of any of the methods described herein. In accordance with some implementations, a device includes one or more processors, a non-transitory memory, and one or more programs; the one or more programs are stored in the non-transitory memory and configured to be executed by the one or more processors and the one or more programs include instructions for performing or causing performance of any of the methods described herein.

BRIEF DESCRIPTION OF THE DRAWINGS

[0010] So that the present disclosure can be understood by those of ordinary skill in the art, a more detailed description may be had by reference to aspects of some illustrative implementations, some of which are shown in the accompanying drawings.

[0011] FIG. 1 is a block diagram of an example operating environment in accordance with some implementations.

[0012] FIG. 2 is a block diagram of an example controller in accordance with some implementations.

[0013] FIG. 3 is a block diagram of an example electronic device (e.g., head-mounted device (HMD)) in accordance with some implementations.

[0014] FIGS. 4A-4D are block diagrams illustrating using a particular instance of an identified object type in CGR environments in accordance with some implementations.

[0015] FIG. 5 is a flowchart illustrating an exemplary method of modifying a CGR environment based on a particular instance of an object type in accordance with some implementations.

[0016] In accordance with common practice the various features illustrated in the drawings may not be drawn to scale. Accordingly, the dimensions of the various features may be arbitrarily expanded or reduced for clarity. In addition, some of the drawings may not depict all of the components of a given system, method or device. Finally, like reference numerals may be used to denote like features throughout the specification and figures.

DESCRIPTION

[0017] Numerous details are described in order to provide a thorough understanding of the example implementations shown in the drawings. However, the drawings merely show some example aspects of the present disclosure and are therefore not to be considered limiting. Those of ordinary skill in the art will appreciate that other effective aspects or variants do not include all of the specific details described herein. Moreover, well-known systems, methods, components, devices and circuits have not been described in exhaustive detail so as not to obscure more pertinent aspects of the example implementations described herein. While FIGS. 1-3 depict exemplary implementations involving an electronic device, other implementations may involve other types of devices including, but not limited to, watches and other wearable electronic devices, mobile devices, laptops, desktops, gaming devices, head mounted device (HMD), home automation devices, and other devices that include or use image capture devices.

[0018] FIG. 1 is a block diagram of an example operating environment 100 in accordance with some implementations. While pertinent features are shown, those of ordinary skill in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity and so as not to obscure more pertinent aspects of the example implementations disclosed herein. To that end, as a non-limiting example, the operating environment 100 includes a controller 110 and an electronic device 120, one or both of which may be in a physical environment. A physical environment refers to a physical world that people can sense and/or interact with without aid of electronic systems. Physical environments, such as a physical park, include physical articles, such as physical trees, physical buildings, and physical people. People can directly sense and/or interact with the physical environment, such as through sight, touch, hearing, taste, and smell.

[0019] In some implementations, the controller 110 is configured to manage and coordinate a computer-generated reality (CGR) environment for the user. In some implementations, the controller 110 includes a suitable combination of software, firmware, or hardware. The controller 110 is described in greater detail below with respect to FIG. 2. In some implementations, the controller 110 is a computing device that is local or remote relative to the physical environment 105.

[0020] In one example, the controller 110 is a local server located within the physical environment 105. In another example, the controller 110 is a remote server located outside of the physical environment 105 (e.g., a cloud server, central server, etc.). In some implementations, the controller 110 is communicatively coupled with the electronic device 120 via one or more wired or wireless communication channels 144 (e.g., BLUETOOTH, IEEE 802.11x, IEEE 802.16x, IEEE 802.3x, etc.).

[0021] In some implementations, the controller 110 and the electronic device 120 are configured to present the CGR environment to the user together.

[0022] In some implementations, the electronic device 120 is configured to present the CGR environment to the user. In some implementations, the electronic device 120 includes a suitable combination of software, firmware, or hardware. The electronic device 120 is described in greater detail below with respect to FIG. 3. In some implementations, the functionalities of the controller 110 are provided by or combined with the electronic device 120, for example, in the case of an electronic device that functions as a stand-alone unit.

[0023] According to some implementations, the electronic device 120 presents a CGR environment to the user while the user is present within the physical environment 105. A CGR environment refers to a wholly or partially simulated environment that people sense and/or interact with via an electronic system. In CGR, a subset of a person’s physical motions, or representations thereof, are tracked, and, in response, one or more characteristics of one or more virtual objects simulated in the CGR environment are adjusted in a manner that comports with at least one law of physics. For example, a CGR system may detect a person’s head turning and, in response, adjust graphical content and an acoustic field presented to the person in a manner similar to how such views and sounds would change in a physical environment. In some situations (e.g., for accessibility reasons), adjustments to characteristic(s) of virtual object(s) in a CGR environment may be made in response to representations of physical motions (e.g., vocal commands).

[0024] A person may sense and/or interact with a CGR object using any one of their senses, including sight, sound, touch, taste, and smell. For example, a person may sense and/or interact with audio objects that create 3D or spatial audio environment that provides the perception of point audio sources in 3D space. In another example, audio objects may enable audio transparency, which selectively incorporates ambient sounds from the physical environment with or without computer-generated audio. In some CGR environments, a person may sense and/or interact only with audio objects.

[0025] Examples of CGR include virtual reality and mixed reality. A virtual reality (VR) environment refers to a simulated environment that is designed to be based entirely on computer-generated sensory inputs for one or more senses. A VR environment comprises virtual objects with which a person may sense and/or interact. For example, computer-generated imagery of trees, buildings, and avatars representing people are examples of virtual objects. A person may sense and/or interact with virtual objects in the VR environment through a simulation of the person’s presence within the computer-generated environment, and/or through a simulation of a subset of the person’s physical movements within the computer-generated environment.

[0026] In contrast to a VR environment, which is designed to be based entirely on computer-generated sensory inputs, a mixed reality (MR) environment refers to a simulated environment that is designed to incorporate sensory inputs from the physical environment, or a representation thereof, in addition to including computer-generated sensory inputs (e.g., virtual objects). On a virtuality continuum, a mixed reality environment is anywhere between, but not including, a wholly physical environment at one end and virtual reality environment at the other end.

[0027] In some MR environments, computer-generated sensory inputs may respond to changes in sensory inputs from the physical environment. Also, some electronic systems for presenting an MR environment may track location and/or orientation with respect to the physical environment to enable virtual objects to interact with real objects (that is, physical articles from the physical environment or representations thereof). For example, a system may account for movements so that a virtual tree appears stationary with respect to the physical ground.

[0028] Examples of mixed realities include augmented reality and augmented virtuality. An augmented reality (AR) environment refers to a simulated environment in which one or more virtual objects are superimposed over a physical environment, or a representation thereof. For example, an electronic system for presenting an AR environment may have a transparent or translucent display through which a person may directly view the physical environment. The system may be configured to present virtual objects on the transparent or translucent display, so that a person, using the system, perceives the virtual objects superimposed over the physical environment. Alternatively, a system may have an opaque display and one or more imaging sensors that capture images or video of the physical environment, which are representations of the physical environment. The system composites the images or video with virtual objects, and presents the composition on the opaque display. A person, using the system, indirectly views the physical environment by way of the images or video of the physical environment, and perceives the virtual objects superimposed over the physical environment. As used herein, a video of the physical environment shown on an opaque display is called “pass-through video,” meaning a system uses one or more image sensor(s) to capture images of the physical environment, and uses those images in presenting the AR environment on the opaque display. Further alternatively, a system may have a projection system that projects virtual objects into the physical environment, for example, as a hologram or on a physical surface, so that a person, using the system, perceives the virtual objects superimposed over the physical environment.
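The compositing step for pass-through video described above can be illustrated with a per-pixel alpha blend of a rendered virtual layer over a camera frame. This is a simplified sketch assuming 8-bit RGB tuples and a per-pixel opacity mask; the function names are hypothetical, and a real system would typically perform this on a GPU.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def composite_pixel(camera: RGB, virtual: RGB, alpha: float) -> RGB:
    """Blend one virtual pixel over one camera pixel (alpha = virtual opacity)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    r, g, b = (round(alpha * v + (1.0 - alpha) * c) for c, v in zip(camera, virtual))
    return (r, g, b)


def composite_frame(
    camera: List[List[RGB]],
    virtual: List[List[RGB]],
    alpha_mask: List[List[float]],
) -> List[List[RGB]]:
    """Composite a virtual layer over a camera frame of the same size."""
    return [
        [composite_pixel(c, v, a) for c, v, a in zip(cam_row, virt_row, mask_row)]
        for cam_row, virt_row, mask_row in zip(camera, virtual, alpha_mask)
    ]


# A 1x2 frame: the first pixel shows only the camera, the second only the virtual layer.
frame = composite_frame(
    camera=[[(10, 20, 30), (10, 20, 30)]],
    virtual=[[(200, 200, 200), (200, 200, 200)]],
    alpha_mask=[[0.0, 1.0]],
)
```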

[0029] An augmented reality environment also refers to a simulated environment in which a representation of a physical environment is transformed by computer-generated sensory information. For example, in providing pass-through video, a system may transform one or more sensor images to impose a select perspective (e.g., viewpoint) different than the perspective captured by the imaging sensors. As another example, a representation of a physical environment may be transformed by graphically modifying (e.g., enlarging) portions thereof, such that the modified portion may be representative but not photorealistic versions of the originally captured images. As a further example, a representation of a physical environment may be transformed by graphically eliminating or obfuscating portions thereof.

[0030] An augmented virtuality (AV) environment refers to a simulated environment in which a virtual or computer generated environment incorporates one or more sensory inputs from the physical environment. The sensory inputs may be representations of one or more characteristics of the physical environment. For example, an AV park may have virtual trees and virtual buildings, but people with faces photorealistically reproduced from images taken of physical people. As another example, a virtual object may adopt a shape or color of a physical article imaged by one or more imaging sensors. As a further example, a virtual object may adopt shadows consistent with the position of the sun in the physical environment.

[0031] There are many different types of electronic systems that enable a person to sense and/or interact with various CGR environments. Examples include head mounted systems, projection-based systems, heads-up displays (HUDs), vehicle windshields having integrated display capability, windows having integrated display capability, displays formed as lenses designed to be placed on a person’s eyes (e.g., similar to contact lenses), headphones/earphones, speaker arrays, input systems (e.g., wearable or handheld controllers with or without haptic feedback), smartphones, tablets, and desktop/laptop computers. A head mounted system may have one or more speaker(s) and an integrated opaque display. Alternatively, a head mounted system may be configured to accept an external opaque display (e.g., a smartphone). The head mounted system may incorporate one or more imaging sensors to capture images or video of the physical environment, and/or one or more microphones to capture audio of the physical environment. Rather than an opaque display, a head mounted system may have a transparent or translucent display. The transparent or translucent display may have a medium through which light representative of images is directed to a person’s eyes. The display may utilize digital light projection, OLEDs, LEDs, uLEDs, liquid crystal on silicon, laser scanning light source, or any combination of these technologies. The medium may be an optical waveguide, a hologram medium, an optical combiner, an optical reflector, or any combination thereof. In one embodiment, the transparent or translucent display may be configured to become opaque selectively. Projection-based systems may employ retinal projection technology that projects graphical images onto a person’s retina. Projection systems also may be configured to project virtual objects into the physical environment, for example, as a hologram or on a physical surface.

[0032] FIG. 2 is a block diagram of an example of the controller 110 in accordance with some implementations. While certain specific features are illustrated, those skilled in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity, and so as not to obscure more pertinent aspects of the implementations disclosed herein. To that end, as a non-limiting example, in some implementations the controller 110 includes one or more processing units 202 (e.g., microprocessors, application-specific integrated-circuits (ASICs), field-programmable gate arrays (FPGAs), graphics processing units (GPUs), central processing units (CPUs), processing cores, or the like), one or more input/output (I/O) devices 206, one or more communication interfaces 208 (e.g., universal serial bus (USB), FIREWIRE, THUNDERBOLT, IEEE 802.3x, IEEE 802.11x, IEEE 802.16x, global system for mobile communications (GSM), code division multiple access (CDMA), time division multiple access (TDMA), global positioning system (GPS), infrared (IR), BLUETOOTH, ZIGBEE, or the like type interface), one or more programming (e.g., I/O) interfaces 210, a memory 220, and one or more communication buses 204 for interconnecting these and various other components.

[0033] In some implementations, the one or more communication buses 204 include circuitry that interconnects and controls communications between system components. In some implementations, the one or more I/O devices 206 include at least one of a keyboard, a mouse, a touchpad, a joystick, one or more microphones, one or more speakers, one or more image capture devices or other sensors, one or more displays, or the like.

[0034] The memory 220 includes high-speed random-access memory, such as dynamic random-access memory (DRAM), static random-access memory (SRAM), double-data-rate random-access memory (DDR RAM), or other random-access solid-state memory devices. In some implementations, the memory 220 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 220 optionally includes one or more storage devices remotely located from the one or more processing units 202. The memory 220 comprises a non-transitory computer readable storage medium. In some implementations, the memory 220 or the non-transitory computer readable storage medium of the memory 220 stores the following programs, modules and data structures, or a subset thereof, including an optional operating system 230 and a computer-generated reality (CGR) module 240.

……
……
……
