Patent: Techniques for viewing 3D photos and 3D videos
Publication Number: 20240007607
Publication Date: 2024-01-04
Assignee: Apple Inc
Abstract
Various implementations disclosed herein include devices, systems, and methods that determine and provide a transition (optionally including a transition effect) between different types of views of three-dimensional (3D) content. For example, an example process may include obtaining a 3D content item, providing a first view of the 3D content item within a 3D environment, determining to transition from the first view to a second view of the 3D content item based on a criterion, and providing the second view of the 3D content item within the 3D environment, where the left eye view and the right eye view of the 3D content item are based on at least one of the left eye content and the right eye content.
Claims
What is claimed is:
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This patent application is a continuation of International Application No. PCT/US2022/021243 filed Mar. 22, 2022, which claims the benefit of U.S. Provisional Application No. 63/168,320 filed Mar. 31, 2021, entitled “TECHNIQUES FOR VIEWING 3D PHOTOS AND 3D VIDEOS,” each of which is incorporated herein by this reference in its entirety.
TECHNICAL FIELD
The present disclosure generally relates to viewing three-dimensional (3D) photos and 3D videos, and particularly to systems and techniques that enable transitioning between different types of views during the display of 3D photos and 3D videos.
BACKGROUND
Various techniques are used to present three-dimensional (3D) photos and 3D videos. Some techniques store a 3D photo as a stereo image (e.g., a pair of images, one for each of the right eye and left eye). Such 3D images are viewed by simultaneously presenting the respective image of the image pair to the appropriate eye. While stereo image-based techniques provide a relatively high-quality representation of the 3D content, they are limited with respect to providing a viewing experience in which a user can change his or her viewpoint relative to the presented images. For example, such techniques would not be well suited for an extended reality (XR) viewing environment in which a 3D photo is positioned at a fixed position and orientation for the user to view from different viewpoints. Other 3D photo/3D video capturing/viewing techniques use 3D point clouds. Such techniques may store a 3D photo as a point cloud representing the 3D positions of points of the environment depicted in the 3D photo. Such point cloud-based 3D images may be viewed by using the point cloud to render views. While point cloud-based techniques may facilitate viewing environments in which a user is able to view a 3D photo from a greater variety of viewpoints, the resolution of the views is often limited and will typically be significantly less than the resolution provided by a stereo image-based technique. Thus, generally, existing techniques may not adequately enable a 3D photo/3D video viewing environment that provides both high resolution views of a 3D photo/3D video and the ability to view the 3D photo/3D video from a variety of viewpoints.
SUMMARY
Various implementations disclosed herein include devices, systems, and methods that display a three-dimensional (3D) content item (e.g., a 3D photo or 3D video) at a position within a 3D environment and provide different types of views of the 3D content item based on various criteria. For example, the type of view may differ based on the viewer/head position and/or the 3D content item location within the 3D environment. The view type transitions from a first type (e.g., a high quality stereo view in which each eye is presented a view based on stereo image pairs or a respective dense point cloud of a stereo point cloud) to a second type (e.g., a stereo view based on a single point cloud) in which each eye is presented a view based on rendering a single point cloud from a different respective viewing position. The transition may use an animation to avoid undesirable effects and appearances. There may be an additional gradual or abrupt transition to a third type, which may be more abstract, e.g., depicting sparser point cloud points.
In general, one innovative aspect of the subject matter described in this specification can be embodied in methods, at an electronic device having a processor, that include the actions of obtaining a three-dimensional (3D) content item, the 3D content item including left eye content and right eye content generated based on camera images corresponding to left and right eye viewpoints. The methods provide a first view of the 3D content item at a position within a 3D environment. The first view includes a left eye view of the 3D content item that is based on the left eye content and a right eye view of the 3D content item that is based on the right eye content. The methods determine to transition from the first view to a second view of the 3D content item based on a criterion. The methods provide the second view of the 3D content item within the 3D environment. The second view includes the left eye view of the 3D content item that is based on a shared content item and the right eye view of the 3D content item that is based on the shared content item, where the shared content item comprises at least one of the left eye content and the right eye content.
These and other embodiments can each optionally include one or more of the following features.
In some aspects, the at least one of the left eye content and the right eye content includes the left eye content without the right eye content, the right eye content without the left eye content, or content generated based on the left eye content and the right eye content.
In some aspects, the left eye content and the right eye content includes a stereo camera image or a stereo point cloud generated based on a stereo camera image.
In some aspects, the method further includes providing a transitioning effect while transitioning from the first view to the second view, wherein providing the transitioning effect includes providing animated content when transitioning between the first view and the second view. In some aspects, the electronic device obtains sensor data of a physical environment proximate to the electronic device, and providing the transitioning effect includes providing a view of the physical environment.
In some aspects, the criterion for determining to transition from the first view to the second view is based on a position of the electronic device or a user of the electronic device relative to a portion of the position of the 3D content item within the 3D environment.
In some aspects, the criterion for determining to transition from the first view to the second view is based on a change in a gaze direction of a user of the electronic device relative to the position of the 3D content item within the 3D environment.
In some aspects, the left eye content and right eye content are based on depth data, and the criterion for determining to transition from the first view to the second view is based on a change in quality of the depth data.
In some aspects, the criterion for determining to transition from the first view to the second view is based on a distance between the electronic device and the position of the 3D content item within the 3D environment.
In some aspects, the at least one of the left eye content and the right eye content includes a left eye point cloud of a stereo point cloud, a right eye point cloud of the stereo point cloud, or a combination of the left eye point cloud and right eye point cloud.
In some aspects, the left eye view of the 3D content item is not based on the right eye content, and the right eye view of the 3D content item is not based on the left eye content when providing the first view.
In some aspects, the method further includes determining to transition from the second view to a third view of the 3D content based on an additional criterion, and providing the third view of the 3D content in which the left eye view and right eye view are based on content that is sparser than the at least one of the left eye content and the right eye content.
In some aspects, the additional criterion used to determine to transition from the second view to the third view is based on a same type of criterion used to determine to transition from the first view to the second view.
In some aspects, the left eye content and right eye content are obtained based on depth data and light intensity image data.
In some aspects, the light intensity image data is based on light intensity image data for the right eye view from a first light intensity image sensor and light intensity image data for the left eye view from a second light intensity image sensor.
In some aspects, the depth data is based on depth data from a single depth sensor, depth data from a first depth sensor for the left eye and a second depth sensor for the right eye, or depth data determined from the light intensity image data.
In some aspects, the 3D environment is an extended reality (XR) environment.
In some aspects, the electronic device is a head-mounted device (HMD).
In accordance with some implementations, a device includes one or more processors, a non-transitory memory, and one or more programs; the one or more programs are stored in the non-transitory memory and configured to be executed by the one or more processors and the one or more programs include instructions for performing or causing performance of any of the methods described herein. In accordance with some implementations, a non-transitory computer readable storage medium has stored therein instructions, which, when executed by one or more processors of a device, cause the device to perform or cause performance of any of the methods described herein. In accordance with some implementations, a device includes: one or more processors, a non-transitory memory, and means for performing or causing performance of any of the methods described herein.
BRIEF DESCRIPTION OF THE DRAWINGS
So that the present disclosure can be understood by those of ordinary skill in the art, a more detailed description may be had by reference to aspects of some illustrative implementations, some of which are shown in the accompanying drawings.
FIG. 1 is a block diagram that illustrates an example of generating a first view of display data of three-dimensional (3D) content in accordance with some implementations.
FIG. 2 is a block diagram that illustrates an example of generating a second view of display data of 3D content in accordance with some implementations.
FIG. 3 is a system flow diagram of an example environment in which a system can provide transition effects between different types of views of 3D content in accordance with some implementations.
FIG. 4 is a flowchart representation of an exemplary method that provides a transition between different types of views of 3D content in accordance with some implementations.
FIG. 5 is an example device in accordance with some implementations.
In accordance with common practice the various features illustrated in the drawings may not be drawn to scale. Accordingly, the dimensions of the various features may be arbitrarily expanded or reduced for clarity. In addition, some of the drawings may not depict all of the components of a given system, method or device. Finally, like reference numerals may be used to denote like features throughout the specification and figures.
DESCRIPTION
Numerous specific details are provided herein to afford those skilled in the art a thorough understanding of the claimed subject matter. However, the claimed subject matter may be practiced without these details. In other instances, methods, apparatuses, or systems, that would be known by one of ordinary skill, have not been described in detail so as not to obscure claimed subject matter.
Various techniques may be used to capture and present 3D display data (e.g., 3D photos and 3D videos). Such 3D display data may include stereoscopic data or monoscopic data from which 3D views can be generated (e.g., a point cloud). FIGS. 1 and 2 illustrate techniques for receiving 3D content data (e.g., 3D content item 110) and presenting the 3D display data as different views, such as stereoscopic (stereo) image data (e.g., RGB images with or without depth) or point cloud data (e.g., a dense stereo point cloud, a sparse point cloud, etc.) based on two different viewpoints for each eye.
FIG. 1 illustrates an example environment 100 in which a view selection and generation system obtains a 3D content item 110 and generates a first view of 3D display data 130. For example, the 3D display data 130 may include stereoscopic display data (e.g., a stereo image pair or a stereo point cloud including a separate point cloud for each eye based on the left eye content 112 and the right eye content 114 from the 3D content item 110).
FIG. 2 illustrates an example environment 200 in which a view selection and generation system obtains a 3D content item and generates a second view of 3D display data 230. For example, the second view of 3D display data 230 may include monoscopic display data (e.g., a point cloud obtained from either the left eye content 112 or the right eye content 114, or a point cloud generated from both the left eye content 112 and the right eye content 114).
The generation of different views of 3D display data of the example environments 100 and 200 is performed on a device, such as a mobile device, desktop, laptop, or server device. The 3D display data can be stored and/or displayed on the same or another device, e.g., on a device that has left eye and right eye displays for viewing stereoscopic images, such as a head-mounted device (HMD). The display data may be displayed in real-time as it is captured and/or displayed at a later time. In some implementations, the processes of the example environments 100 and 200 are performed by hardware, firmware, software, or a combination thereof. In some implementations, the processes of the example environments 100 and 200 are performed on a processor executing code stored in a non-transitory computer-readable medium (e.g., a memory).
The 3D content item 110 may include light intensity image data, depth data, and point cloud data captured for the two different viewpoints for each eye (e.g., left eye view content 112 and right eye view content 114). In some implementations, light intensity image data, depth data, and point cloud data are concurrently captured by a device to capture the 3D content item 110 (e.g., a user wearing an HMD capturing a 3D photo or video). Each environment 100 and 200 includes a display data composition pipeline that acquires or obtains sensor data (e.g., image sensor data from image sensors) of a physical environment. The composition pipeline may acquire sensor data (e.g., light intensity data, depth data, and position information) for a plurality of image frames and generate a rendering of the 3D content (e.g., a stereo image pair or one or more point clouds) based on the sensor data. The image sensor(s) may include cameras corresponding to left and right eye viewpoints, e.g., a first light intensity camera (e.g., RGB camera) that acquires light intensity image data for the left eye viewpoint and a second light intensity camera that acquires light intensity image data for the right eye viewpoint (e.g., a sequence of RGB image frames). The image sensor(s) may similarly include a first depth camera that acquires depth image data for the left eye viewpoint and a second depth camera that acquires depth image data for the right eye viewpoint of the physical environment. Alternatively, one depth sensor is utilized for both the left eye viewpoint and the right eye viewpoint, in which case the depth data for the two viewpoints is equivalent. Alternatively, the depth data can be determined based on the light intensity image data, thus not requiring a depth sensor.
Additionally, the system may include position sensors to acquire positioning information. For the positioning information, some implementations include a visual inertial odometry (VIO) system to determine equivalent odometry information using sequential camera images (e.g., light intensity data) to estimate the distance traveled. Alternatively, some implementations include a SLAM system (e.g., position sensors). The SLAM system may include a multidimensional (e.g., 3D) laser scanning and range measuring system that is GPS-independent and that provides real-time simultaneous location and mapping. The system may generate and manage data for an accurate point cloud that results from reflections of laser scanning from objects in an environment. Movements of any of the points in the point cloud are accurately tracked over time, so that the system can maintain precise understanding of its location and orientation as it travels through an environment, using the points in the point cloud as reference points for the location. The system may further be a visual SLAM system that relies on light intensity image data to estimate the position and orientation of the image sensor(s) and/or the device. The rendering for the left eye view content 112 and right eye view content 114 may be concurrently generated during the image acquisition, or may be generated by post processing techniques.
Example environment 100 includes a view selection and generation instruction set 120 that is configured with instructions executable by a processor to acquire 3D content data (e.g., 3D content item 110) and generate different views of 3D display data (e.g., first view 3D display data 130). The first view 3D display data 130 may be generated using a stereo pair of images or a stereo pair of point clouds based on left eye view content 112 and right eye view content 114. The first view 3D display data 130 may include stereoscopic images of 3D content item 110 in a 3D environment for a left eye view 132 and a right eye view 134 having a different perspective than left eye view 132. The left eye view 132 may depict a view of 3D content item 110 within a 3D environment from the perspective of a user's left eye and may be generated based on left eye view content 112. For example, a virtual shape or geometry may be positioned within the 3D environment and textured using the left eye view content 112. Left eye view 132 may be generated by rendering a view of the textured shape or geometry from the perspective of the user's left eye. A similar process may be used to generate right eye view 134 using a collocated shape or geometry textured instead with right eye view content 114. In some implementations, the stereo images of first view 3D display data 130 may provide a pixel perfect representation (e.g., based on camera image data) of 3D content item 110 within a 3D environment.
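As an illustrative sketch only, not the patent's implementation, the per-eye texturing and rendering described above might be approximated as follows. The pinhole camera parameters, quad placement, and function names (project_points, render_eye_view) are assumptions, and an actual renderer would rasterize the texture across the geometry rather than return projected corner points.

```python
import numpy as np

def project_points(points_world, eye_pose, focal, principal):
    """Project 3D world points into an eye's image plane (simple pinhole model)."""
    rotation, position = eye_pose             # world-to-camera rotation, eye position
    cam = (points_world - position) @ rotation.T
    uv = cam[:, :2] / cam[:, 2:3]              # perspective divide
    return uv * focal + principal              # pixel coordinates

def render_eye_view(content_texture, quad_corners_world, eye_pose,
                    focal=1000.0, principal=np.array([960.0, 540.0])):
    """Render one eye's view of a textured quad placed in the 3D environment.

    Each eye's quad is textured with that eye's own content (left eye content
    for the left eye view, right eye content for the right eye view)."""
    corners_px = project_points(quad_corners_world, eye_pose, focal, principal)
    # A real renderer would rasterize content_texture across the projected quad;
    # the projected footprint stands in for that here.
    return {"texture": content_texture, "corners_px": corners_px}

# Collocated quad 2 m in front of the viewer, rendered once per eye (~64 mm IPD).
quad = np.array([[-0.5, -0.5, 2.0], [0.5, -0.5, 2.0],
                 [0.5,  0.5, 2.0], [-0.5,  0.5, 2.0]])
left_pose = (np.eye(3), np.array([-0.032, 0.0, 0.0]))
right_pose = (np.eye(3), np.array([0.032, 0.0, 0.0]))
left_eye_view = render_eye_view("left_eye_content_112", quad, left_pose)
right_eye_view = render_eye_view("right_eye_content_114", quad, right_pose)
```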
Example environment 200 includes the view selection and generation instruction set 120 that is configured with instructions executable by a processor to acquire 3D content data (e.g., 3D content item 110) and generate another view of 3D display data that is different than the first view 3D display data 130 (e.g., second view 3D display data 230). In some implementations, the second view 3D display data 230 may be generated using a single, shared content item. The second view 3D display data 230 may include a left eye view 232 that is based on a shared content item and a right eye view 234 that is based on the same shared content item, where the shared content item is at least one of the left eye view content 112 and the right eye view content 114. The second view 3D display data 230 may be generated using a single dense or sparse point cloud representation of the 3D content item 110 (e.g., point cloud data from left eye view content 112, point cloud data from right eye view content 114, or point cloud data generated from a combination of left eye view content 112 and right eye view content 114). In some implementations, the second view 3D display data 230 may include stereoscopic images of 3D content item 110 in a 3D environment for a left eye view 232 and a right eye view 234 having a different perspective than left eye view 232. The left eye view 232 and right eye view 234 may depict a view of 3D content item 110 within a 3D environment from the perspective of a user's left eye and right eye, respectively, and may both be generated using the same singular point cloud representation of 3D content item 110. For example, the singular point cloud representation of the 3D content item 110 (e.g., point cloud data from left eye view content 112, point cloud data from right eye view content 114, or point cloud data generated from a combination of left eye view content 112 and right eye view content 114) may be positioned within the 3D environment. Left eye view 232 may be generated by rendering a view of the singular point cloud from the perspective of the user's left eye and right eye view 234 may be generated by rendering a view of the same singular point cloud from the perspective of the user's right eye.
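By contrast, the second view renders one shared representation for both eyes. A minimal sketch of that idea, again with hypothetical function names and a toy point cloud rather than real captured content, might look like this:

```python
import numpy as np

def render_point_cloud_view(points, colors, eye_pose, focal=1000.0,
                            principal=np.array([960.0, 540.0])):
    """Project one eye's view of a shared point cloud (splat-style sketch)."""
    rotation, position = eye_pose
    cam = (points - position) @ rotation.T
    visible = cam[:, 2] > 0.01                      # keep points in front of the eye
    uv = cam[visible, :2] / cam[visible, 2:3] * focal + principal
    return uv, colors[visible]

# One shared point cloud (e.g., merged from the left and right eye content),
# rendered twice: once from the left eye position, once from the right.
rng = np.random.default_rng(0)
shared_cloud = rng.random((5000, 3)) * [1.0, 1.0, 0.5] + [0.0, 0.0, 2.0]
shared_colors = rng.random((5000, 3))
left_pose = (np.eye(3), np.array([-0.032, 0.0, 0.0]))
right_pose = (np.eye(3), np.array([0.032, 0.0, 0.0]))
left_uv, left_rgb = render_point_cloud_view(shared_cloud, shared_colors, left_pose)
right_uv, right_rgb = render_point_cloud_view(shared_cloud, shared_colors, right_pose)
```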
FIG. 3 illustrates a system flow diagram of an example environment 300 in which a transition effect instruction set 310 can provide transition effects between different types of views of 3D content. In some implementations, the system flow of the example environment 300 is performed on a device, such as a mobile device, desktop, laptop, or server device. The images of the example environment 300 can be displayed on the device that has a screen for displaying images and/or a screen for viewing stereoscopic images such as an HMD. In some implementations, the system flow of the example environment 300 is performed on processing logic, including hardware, firmware, software, or a combination thereof. In some implementations, the system flow of the example environment 300 is performed on a processor executing code stored in a non-transitory computer-readable medium (e.g., a memory).
An example system flow of the transition effect instruction set 310 of example environment 300 acquires a first set of image data (e.g., first view 3D display data 130), a second set of image data (e.g., the second view 3D display data 230), and/or device data 320 (e.g., image data from image source(s), ambient light data, position information, motion data, etc., from the device executing the system), and, based on transition elements criteria from the transition elements database 330, determines whether to transition between views of 3D content item 304, for example, transitioning from a stereo image to a point cloud.
In an exemplary implementation, 3D content item 304, as illustrated in FIG. 3, is a 3D photograph application window that is overlaid on a view of a physical environment (e.g., an XR environment) such that a user can view a 3D photo or 3D video within the window of the 3D content item 304. For example, view 302 represents a viewpoint of a device (e.g., a user's viewpoint wearing an HMD) of the first view 3D display data 130 (e.g., left eye view 132 and right eye view 134) that is displayed within 3D content item 304 (e.g., a stereo image). View 306 represents a viewpoint of a device of the second view 3D display data 230 (e.g., left eye view 232 and right eye view 234) that is displayed within 3D content item 304 (e.g., a point cloud). View 314 represents a viewpoint of a device of transitioning between the first and second view 3D display data 130 and 230, respectively, that is displayed within 3D content item 304 with transition effects as transition display data 312. The system flow of environment 300 for the transition effect instruction set 310 is not limited to transitioning from a first view to a second view; it can include additional views of display data and can transition back and forth between the different views of the 3D display data.
In some implementations, the transition effects may include a blurring and/or halo effect. For example, as illustrated in the 3D content item 304 of view 314, the transition includes blurring effect 316 as the 3D content item 304 is transitioning between the stereoscopic image in view 302 (e.g., the first view 3D display data 130) and view 306 (e.g., the second view 3D display data 230). Alternatively, the transition effects may include a curtain effect, a sweep, a bubble view, animation effects, etc., such that the viewer is provided a noticeable change in the views occurring when transitioning in the type of views (e.g., stereo to mono).
In some implementations, the first view 3D display data 130 may transition to the second view 3D display data 230 based on viewer movement (e.g., a user wearing an HMD). For example, the transition effect may include transitioning to a full abstract representation (e.g., second view 3D display data 230 using a single point cloud) after detecting a user movement above a particular threshold (e.g., moving his or her head quickly) to further de-emphasize depth artifacts. For example, while a user is moving quickly through an XR environment, processing capabilities of the device, such as an HMD, may produce undesirable depth artifacts (e.g., noise). However, with the transitioning elements described herein, a fully abstract representation, such as a sparse point cloud that would require less processing capabilities, would provide a more enjoyable experience. When the same user stops moving his or her head to focus on a particular object within the XR environment, the views can be transitioned back to a higher quality stereo view (e.g., first view 3D display data using stereo images or point clouds) since there is minimal head motion.
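One way to implement this motion-based criterion is a simple threshold on head angular speed with hysteresis so the view type does not flicker near the boundary. The sketch below is an assumption layered on the description above; the thresholds, view labels, and quaternion convention are not specified by the patent.

```python
import numpy as np

# Hypothetical thresholds; the patent does not give concrete values.
FAST_MOTION_RAD_PER_S = 1.2    # above this, use the abstract single point cloud view
SETTLED_RAD_PER_S = 0.3        # below this, return to the stereo view (hysteresis)

def head_angular_speed(prev_quat, curr_quat, dt):
    """Approximate angular speed (rad/s) from two unit orientation quaternions."""
    dot = min(abs(float(np.dot(prev_quat, curr_quat))), 1.0)
    return 2.0 * np.arccos(dot) / dt

def select_view_for_motion(current_view, angular_speed):
    """Switch between the stereo first view and the shared point cloud second view."""
    if current_view == "first_view_stereo" and angular_speed > FAST_MOTION_RAD_PER_S:
        return "second_view_shared_cloud"
    if current_view == "second_view_shared_cloud" and angular_speed < SETTLED_RAD_PER_S:
        return "first_view_stereo"
    return current_view
```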
In some implementations, the first view 3D display data 130 may transition to the second view 3D display data 230 based on a gaze convergence angle (or a physical distance) between the user or device and the object or set of objects that are within the current view. If the angle (or the physical distance) is too large, such as exceeding a convergence angle (or physical distance) threshold, then the system would implement a transition effect and transition from a stereo image(s) (e.g., first view 3D display data 130) to a mono or sparse point cloud image(s) (e.g., second view 3D display data 230). The angles may be determined based on a three-point triangle of a user's position, a projected 3D point of a pixel on an object for a left eye, and a projected 3D point of a pixel on an object for a right eye. As the two projected 3D points for the left and right eye views move, the angle may become smaller or larger.
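The three-point triangle described here maps directly to a small geometric computation. The sketch below assumes positions are 3D vectors in a common coordinate frame; the 5 degree threshold is a placeholder rather than a value from the patent.

```python
import numpy as np

CONVERGENCE_THRESHOLD_RAD = np.deg2rad(5.0)   # placeholder threshold

def convergence_angle(user_position, left_eye_point, right_eye_point):
    """Angle at the user's position in the triangle formed with the projected 3D
    point of a pixel for the left eye and the corresponding point for the right eye."""
    a = left_eye_point - user_position
    b = right_eye_point - user_position
    cos_angle = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))

def exceeds_convergence_threshold(user_position, left_eye_point, right_eye_point):
    """True when the angle is large enough to trigger the transition effect."""
    angle = convergence_angle(user_position, left_eye_point, right_eye_point)
    return angle > CONVERGENCE_THRESHOLD_RAD
```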
In some implementations, the first view 3D display data 130, as shown in view 302, includes a stereoscopic view of 3D content within a 3D environment based on higher quality light intensity image data and depth data. Alternatively, the first view 3D display data 130 includes a stereoscopic view of 3D content within a 3D environment based on a stereo pair of sparse or dense point clouds of 3D content (e.g., a 3D representation of a physical environment displayed in an XR environment). In some implementations, the second view 3D display data 230, as shown in view 306, includes a view of 3D content within a 3D environment based on lower quality (e.g., more noise) light intensity image data and depth data. Alternatively, the second view 3D display data 230 includes a view of 3D content within a 3D environment based on a sparse point cloud of 3D content (e.g., a 3D representation of a physical object displayed in an XR environment).
In some implementations, the first view 3D display data 130 may transition to the second view 3D display data 230 based on a change in quality of the depth data. For example, depending on the method of acquiring the depth data (e.g., via a depth sensor or estimated from RGB data), the depth quality may waver based on any number of instances (movement of the device, user, or user's head, depth sensor issues, computational issues estimating the depth data, etc.), such that the resulting stereo image (e.g., view 302) is noisy or produces several artifacts that may be undesirable. Thus, a transition to an abstract representation of view 302 (e.g., view 306) via a transition effect may be more desirable.
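A depth-quality criterion could, for example, watch the temporal noise of recent depth maps and trigger the transition when the noise rises above a threshold. The metric and threshold below are assumptions for illustration; the patent only states that degraded depth quality can trigger the transition, not how quality is measured.

```python
import numpy as np

DEPTH_NOISE_THRESHOLD_M = 0.05   # hypothetical noise limit in meters

def depth_quality_degraded(depth_frames):
    """Return True when recent depth maps look too noisy for the stereo view.

    depth_frames: list of (H, W) depth maps in meters from a short time window.
    Noise is estimated as the median per-pixel temporal standard deviation over
    pixels that were valid (non-zero) in every frame."""
    stack = np.stack(depth_frames)            # (T, H, W)
    valid = np.all(stack > 0, axis=0)
    if not valid.any():
        return True                           # no reliable depth at all
    noise = np.median(np.std(stack, axis=0)[valid])
    return noise > DEPTH_NOISE_THRESHOLD_M
```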
The implementations of the instruction sets described herein (e.g., view selection and generation instruction set 120 and transition effect instruction set 310) of example environments 100, 200, and 300, may be performed on an electronic device. In some implementations, the device is a handheld electronic device (e.g., a smartphone or a tablet). In some implementations, the device is a near-eye device such as a head worn device (e.g., a head mounted device (HMD)). The device utilizes one or more display elements to present views. For example, the device can display views that include 3D content in the context of an extended reality (XR) environment. In some implementations, the device may enclose the field-of-view of a user. In some implementations, the functionalities of the device are provided by more than one device. In some implementations, the device communicates with a separate controller or server to manage and coordinate an experience for the user. Such a controller or server may be located in or may be remote relative to the physical environment.
People may sense or interact with a physical environment or world without using an electronic device. Physical features, such as a physical object or surface, may be included within a physical environment. For instance, a physical environment may correspond to a physical city having physical buildings, roads, and vehicles. People may directly sense or interact with a physical environment through various means, such as smell, sight, taste, hearing, and touch. This can be in contrast to an extended reality (XR) environment that may refer to a partially or wholly simulated environment that people may sense or interact with using an electronic device. The XR environment may include virtual reality (VR) content, mixed reality (MR) content, augmented reality (AR) content, or the like. Using an XR system, a portion of a person's physical motions, or representations thereof, may be tracked and, in response, properties of virtual objects in the XR environment may be changed in a way that complies with at least one law of nature. For example, the XR system may detect a user's head movement and adjust auditory and graphical content presented to the user in a way that simulates how sounds and views would change in a physical environment. In other examples, the XR system may detect movement of an electronic device (e.g., a laptop, tablet, mobile phone, or the like) presenting the XR environment. Accordingly, the XR system may adjust auditory and graphical content presented to the user in a way that simulates how sounds and views would change in a physical environment. In some instances, other inputs, such as a representation of physical motion (e.g., a voice command), may cause the XR system to adjust properties of graphical content.
Numerous types of electronic systems may allow a user to sense or interact with an XR environment. A non-exhaustive list of examples includes lenses having integrated display capability to be placed on a user's eyes (e.g., contact lenses), heads-up displays (HUDs), projection-based systems, head mountable systems, windows or windshields having integrated display technology, headphones/earphones, input systems with or without haptic feedback (e.g., handheld or wearable controllers), smartphones, tablets, desktop/laptop computers, and speaker arrays. Head mountable systems may include an opaque display and one or more speakers. Other head mountable systems may be configured to receive an opaque external display, such as that of a smartphone. Head mountable systems may capture images/video of the physical environment using one or more image sensors or capture audio of the physical environment using one or more microphones. Instead of an opaque display, some head mountable systems may include a transparent or translucent display. Transparent or translucent displays may direct light representative of images to a user's eyes through a medium, such as a hologram medium, optical waveguide, an optical combiner, optical reflector, other similar technologies, or combinations thereof. Various display technologies, such as liquid crystal on silicon, LEDs, uLEDs, OLEDs, laser scanning light source, digital light projection, or combinations thereof, may be used. In some examples, the transparent or translucent display may be selectively controlled to become opaque. Projection-based systems may utilize retinal projection technology that projects images onto a user's retina or may project virtual content into the physical environment, such as onto a physical surface or as a hologram.
FIG. 4 is a flowchart representation of an exemplary method 400 that determines and provides a transition (optionally including a transition effect) between different types of views of 3D content in accordance with some implementations. In some implementations, the method 400 is performed by a device, such as a mobile device, desktop, laptop, or server device. In some implementations, the device has a screen for displaying images and/or a screen for viewing stereoscopic images such as a head-mounted display (HMD). In some implementations, the method 400 is performed by processing logic, including hardware, firmware, software, or a combination thereof. In some implementations, the method 400 is performed by a processor executing code stored in a non-transitory computer-readable medium (e.g., a memory). The content presentation process of method 400 is illustrated with examples with reference to FIG. 3.
At block 402, the method 400 obtains a 3D content item. The 3D content item may include left eye content and right eye content generated based on camera images corresponding to left and right eye viewpoints (e.g., a left eye view and a right eye view). The left eye content and the right eye content may include a stereo camera image or a stereo point cloud generated based on a stereo camera image. The left eye content and right eye content could include previously captured point clouds and/or 3D images of an object. In some implementations, the at least one of the left eye content and the right eye content includes a left eye point cloud of a stereo point cloud, a right eye point cloud of the stereo point cloud, or a combination of the left eye point cloud and right eye point cloud.
In some implementations, the left eye content and right eye content may be generated based on an RGB camera for each eye and a separate depth camera for each eye. Alternatively, the left eye content and right eye content may be generated based on RGB images for each eye (e.g., two separate RGB sensors), but only a single depth camera such that the depth data is used for both the right eye content and the left eye content. Alternatively, the left eye content and right eye content may be generated based on RGB images for each eye (e.g., two separate RGB sensors), but using just RGB images for the depth data. For example, depth data may be estimated from the light intensity image data by using a convolutional neural network to generate a depth map from the RGB data in the images.
As discussed herein with reference to FIG. 1, the left eye view 132 may depict a view of 3D content item 110 within a 3D environment from the perspective of a user's left eye and may be generated based on left eye view content 112. For example, a virtual shape or geometry may be positioned within the 3D environment and textured using the left eye view content 112. Left eye view 132 may be generated by rendering a view of the textured shape or geometry from the perspective of the user's left eye. A similar process may be used to generate right eye view 134 using a collocated shape or geometry textured instead with right eye view content 114. In some implementations, the stereo images of first view 3D display data 130 may provide a pixel perfect representation (e.g., based on camera image data) of 3D content item 110 within a 3D environment. For example, if re-projecting a point cloud, the further a user moves away, the larger the error gets if the depth is not correct. By keeping the view of the content separate for each eye, each individual reprojection is smaller, which provides better perception. However, if the depth is moved beyond a threshold, then the left eye content and the right eye content could collapse into transitions and other rendering methods.
At block 404, the method 400 provides a first view of the 3D content item at a position within a 3D environment in which a left eye view of the 3D content item is based on the left eye content and a right eye view of the 3D content item is based on the right eye content. For example, as illustrated in view 302 of FIG. 3, a stereo image (high quality image) as 3D content item 304 is displayed for a user that includes light intensity image data and depth data for both the left eye and the right eye (e.g., while a user is wearing an HMD). For example, a user is viewing a 3D photo in an application window overlaid on a view of a physical environment. In some implementations, when providing the first view, the left eye view of the 3D content item is not based on the right eye content, and the right eye view of the 3D content item is not based on the left eye content.
At block 406, the method 400 determines to transition from the first view to a second view of the 3D content item based on a criterion. For example, as illustrated in view 314 of FIG. 3, the transition instruction set 310 obtains device data 320 and data from transition elements database 330 to determine to transition from view 302 (e.g., stereo view) to view 306 (e.g., a point cloud) via a transition view 314 (e.g., via transition effects). Transitioning from the first view to a second view may depend upon a user's position or a user's head position and/or location of particular portions of the content relative to the user/head.
At block 408, the method 400 provides the second view of the 3D content item within the 3D environment, where the second view includes the left eye view of the 3D content that is based on a shared content item and the right eye view of the 3D content item that is based on the shared content item, where the shared content item includes at least one of the left eye content and the right eye content. For example, as illustrated in view 306 of FIG. 3, the 3D content item 304 is transitioned to and displayed as a point cloud. In some implementations, the at least one of the left eye content and the right eye content includes only the left eye content, only the right eye content, or content generated based on the left eye content and the right eye content.
In some implementations, the method 400 further includes providing a transitioning effect while transitioning from the first view to the second view. For example, an animation such as a curtain effect or a sweep effect, may be used to transition between the first view and the second view. For example, as illustrated in view 314 of FIG. 3, the transition effect of blurring effect 316 is displayed to the user to signify the change of views from view 302 to view 306 is occurring because of some detected criterion or change in a detected attribute (e.g., distance of the user to the content item, a change in gaze, etc.).
In some implementations, the electronic device includes sensor data that provides a third view of a physical environment and providing the transitioning effect includes providing within a portion of the second view a portion of the third view of the physical environment (e.g., the transition effect could be showing part of the real world so the user doesn't bump into something). For example, while wearing an HMD while viewing an XR environment, a user may be viewing all virtual (e.g., reconstructed) 3D content while moving around a physical environment (e.g., viewing a reconstructed room layout that includes all different furniture while in the same physical room with different locations of different real physical furniture). Thus, while the user is moving around viewing the virtual content in the XR environment, he or she may be about to run into a physical object, so based on detecting this, a transition effect instruction set could provide a transition effect showing the user a partial view of the physical world (e.g., the object the user was about to bump into), using one or more of the transition effects described herein (e.g., halo/blur effect, curtain effect, and the like).
In some implementations, the criterion for determining to transition from the first view to the second view is based on a position of the electronic device or a user of the electronic device relative to a portion of the position of the 3D content item within the 3D environment, such as based on a position of a user's head or location of particular portions of the 3D content relative to the user's head. The transition effects may be determined based on viewer movement (e.g., a user wearing an HMD while walking around a physical environment). For example, the transition effect may include transitioning to a full abstract representation (e.g., a sparse point cloud) after detecting a user movement above a particular threshold (e.g., moving his or her head quickly) to further de-emphasize depth artifacts. For example, while a user is moving quickly through an XR environment, processing capabilities of the device, such as an HMD, may produce undesirable depth artifacts (e.g., noise). However, with the transitioning elements described herein, a fully abstract representation, such as a sparse point cloud that would require less processing capabilities, would provide a more enjoyable experience. When the same user stops moving his or her head to focus on a particular object within the XR environment, the views can be transitioned back to a higher quality stereo view since there is minimal head motion.
In some implementations, the criterion for determining to transition from the first view to the second view is based on a change in a gaze direction of a user of the electronic device relative to the position of the 3D content item within the 3D environment. For example, the transition effects may be determined based on a convergence angle between the user or device and the object or set of objects that are within the current view. If the angle is too large, such as exceeding a convergence angle threshold, then the system would implement a transition effect and transition from a stereo image(s) to a mono or sparse point cloud image(s). Additionally, or alternatively, the gaze angle of a user may be used as a trigger similar to the convergence angle. The angles may be determined based on a three-point triangle of a user's position, a projected 3D point of a pixel on an object for a left eye, and a projected 3D point of a pixel on an object for a right eye. As the two projected 3D points for the left and right eye views move, the angle may become smaller or larger.
In some implementations, the criterion for determining to transition from the first view to the second view is based on a distance between the electronic device and the object (e.g., transition based on scene depth, and a threshold distance). For example, the transition effects may be determined based on a distance between the user or device and the object or set of objects that are within the current view. If the distance is too large, such as exceeding a distance threshold, then the system would implement a transition effect and transition from a stereo image(s) to a mono or sparse point cloud image(s). The distance may be determined based on a user's or device's position and the projected 3D points of a pixel on an object for a left eye and/or right eye.
In some implementations, there may be another transition to a third view of the 3D content item that may be more abstract (e.g., based on a sparser point cloud). For example, the method 400 may further include determining to transition from the second view to a third view of the 3D content based on an additional criterion, and providing the third view of the 3D content in which the left eye view and right eye view are based on content that is sparser than the at least one of the left eye content and the right eye content. For example, as illustrated in view 306 of FIG. 3, a sparse point cloud image is displayed for a user that includes an abstract representation of the objects in view 302. In some implementations, a first transition is from a high-quality stereo image (e.g., view 302) to a lower quality mono image (e.g., the point cloud of view 306), and then a second transition could include a transition to a different view of content (e.g., an even more sparse abstract representation). The additional criterion used to determine to transition from the second view to the third view may be based on a same type of criterion used to determine to transition from the first view to the second view (e.g., distance, gaze convergence angle, rate of movement of the device/user, and the like).
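Putting the first, second, and third views together, the distance-based variant of these criteria can be sketched as a small selector with a transition-effect hook. The thresholds, view labels, and the apply_transition_effect callback are hypothetical; the patent leaves the concrete values and effect choice open.

```python
# Hypothetical distance thresholds (meters).
STEREO_MAX_DISTANCE_M = 1.5     # beyond this, leave the stereo first view
SHARED_MAX_DISTANCE_M = 4.0     # beyond this, fall back to the sparse third view

def select_view_by_distance(distance_to_content):
    """Map viewer-to-content distance onto the three view types."""
    if distance_to_content <= STEREO_MAX_DISTANCE_M:
        return "first_view_stereo"            # per-eye content (block 404)
    if distance_to_content <= SHARED_MAX_DISTANCE_M:
        return "second_view_shared_cloud"     # shared content item (block 408)
    return "third_view_sparse_cloud"          # sparser abstract representation

def update_view(state, distance_to_content, apply_transition_effect):
    """Provide the appropriate view, inserting a transition effect on any change."""
    target = select_view_by_distance(distance_to_content)
    if target != state["view"]:
        apply_transition_effect(state["view"], target)   # e.g., blur, sweep, curtain
        state["view"] = target
    return state

# Example usage:
# state = {"view": "first_view_stereo"}
# update_view(state, 2.3, lambda old, new: print(f"transition {old} -> {new}"))
```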
In some implementations, the stereo point cloud includes depth data and the criterion for determining to transition from the first view to the second view is based on a change in quality of the depth data. For example, depending on the method of acquiring the depth data (e.g., via a depth sensor or estimated from RGB data), the depth quality may waver based on any number of instances (movement of the device, user, or user's head, depth sensor issues, computational issues estimating the depth data, etc.), such that the resulting stereo image (e.g., view 302) is noisy or produces several artifacts that may be undesirable. Thus, a transition to an abstract representation of view 302 (e.g., view 306) via a transition effect may be more desirable.
The stereo images (stereo point cloud representing 3D content) may provide a pixel perfect representation of 3D content within a 3D environment, but may limit the viewer's ability to move their head away from the “sweet spot” while wearing an HMD (e.g., the place where a 3D camera captured the image/video and would provide the highest image quality). In some implementations, transitioning to the second view may include indicators (e.g., transition effects), such as animation, that direct the user to view an object at a different viewing angle in order to see the object from the highest image quality (e.g., the “sweet spot”). Thus, the transition effects described herein could be used to direct a user within an XR environment to particular viewpoints in order to have a more desirable effect (e.g., better image quality, fewer artifacts, and the like).
In some implementations, in order to allow for a transition between the first, second, and third views in method 400, a capturing device may concurrently capture different representations of an environment. For example, the capture device may be configured to concurrently capture a stereo pair of images and optionally depth data, a single or stereo pair of point clouds, a lower quality stereo pair of images or single/stereo pair of point cloud(s), or a combination thereof. By having these different representations of the same subject, a viewing device may switch between views of the same subject as described above.
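A capture pipeline that records these representations side by side might bundle them in a single structure so a viewing device can switch between them at display time. The container below is a hypothetical sketch, not a format defined by the patent.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class Captured3DContentItem:
    """Concurrently captured representations of one 3D content item."""
    left_image: np.ndarray                           # left eye RGB of the stereo pair
    right_image: np.ndarray                          # right eye RGB of the stereo pair
    depth: Optional[np.ndarray] = None               # optional depth map(s)
    left_point_cloud: Optional[np.ndarray] = None    # stereo point cloud, left eye
    right_point_cloud: Optional[np.ndarray] = None   # stereo point cloud, right eye
    shared_point_cloud: Optional[np.ndarray] = None  # single/merged point cloud
    sparse_point_cloud: Optional[np.ndarray] = None  # abstract third-view cloud
```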
FIG. 5 is a block diagram of an example device 500. Device 500 illustrates an exemplary device configuration for the device used to implement environment 100 of FIG. 1. While certain specific features are illustrated, those skilled in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity, and so as not to obscure more pertinent aspects of the implementations disclosed herein. To that end, as a non-limiting example, in some implementations the device 500 includes one or more processing units 502 (e.g., microprocessors, ASICs, FPGAs, GPUs, CPUs, processing cores, and/or the like), one or more input/output (I/O) devices and sensors 506, one or more communication interfaces 508 (e.g., USB, FIREWIRE, THUNDERBOLT, IEEE 802.3x, IEEE 802.11x, IEEE 802.16x, GSM, CDMA, TDMA, GPS, IR, BLUETOOTH, ZIGBEE, SPI, I2C, and/or the like type interface), one or more programming (e.g., I/O) interfaces 510, one or more displays 512, one or more interior and/or exterior facing image sensor systems 514, a memory 520, and one or more communication buses 504 for interconnecting these and various other components.
In some implementations, the one or more communication buses 504 include circuitry that interconnects and controls communications between system components. In some implementations, the one or more I/O devices and sensors 506 include at least one of an inertial measurement unit (IMU), an accelerometer, a magnetometer, a gyroscope, a thermometer, one or more physiological sensors (e.g., blood pressure monitor, heart rate monitor, blood oxygen sensor, blood glucose sensor, etc.), one or more microphones, one or more speakers, a haptics engine, one or more depth sensors (e.g., a structured light, a time-of-flight, or the like), and/or the like.
In some implementations, the one or more displays 512 are configured to present a view of a physical environment or a graphical environment to the user. In some implementations, the one or more displays 512 correspond to holographic, digital light processing (DLP), liquid-crystal display (LCD), liquid-crystal on silicon (LCoS), organic light-emitting field-effect transistor (OLET), organic light-emitting diode (OLED), surface-conduction electron-emitter display (SED), field-emission display (FED), quantum-dot light-emitting diode (QD-LED), micro-electro-mechanical system (MEMS), and/or the like display types. In some implementations, the one or more displays 512 correspond to diffractive, reflective, polarized, holographic, etc. waveguide displays. In one example, the device 500 includes a single display. In another example, the device 500 includes a display for each eye of the user.
In some implementations, the one or more image sensor systems 514 are configured to obtain image data that corresponds to at least a portion of the physical environment 105. For example, the one or more image sensor systems 514 include one or more RGB cameras (e.g., with a complementary metal-oxide-semiconductor (CMOS) image sensor or a charge-coupled device (CCD) image sensor), monochrome cameras, IR cameras, depth cameras, event-based cameras, and/or the like. In various implementations, the one or more image sensor systems 514 further include illumination sources that emit light, such as a flash. In various implementations, the one or more image sensor systems 514 further include an on-camera image signal processor (ISP) configured to execute a plurality of processing operations on the image data.
In some implementations, the device 500 includes an eye tracking system for detecting eye position and eye movements (e.g., eye gaze detection). For example, an eye tracking system may include one or more infrared (IR) light-emitting diodes (LEDs), an eye tracking camera (e.g., near-IR (NIR) camera), and an illumination source (e.g., an NIR light source) that emits light (e.g., NIR light) towards the eyes of the user. Moreover, the illumination source of the device 500 may emit NIR light to illuminate the eyes of the user and the NIR camera may capture images of the eyes of the user. In some implementations, images captured by the eye tracking system may be analyzed to detect position and movements of the eyes of the user, or to detect other information about the eyes such as pupil dilation or pupil diameter. Moreover, the point of gaze estimated from the eye tracking images may enable gaze-based interaction with content shown on the near-eye display of the device 500.
The memory 520 includes high-speed random-access memory, such as DRAM, SRAM, DDR RAM, or other random-access solid-state memory devices. In some implementations, the memory 520 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 520 optionally includes one or more storage devices remotely located from the one or more processing units 502. The memory 520 includes a non-transitory computer readable storage medium.
In some implementations, the memory 520 or the non-transitory computer readable storage medium of the memory 520 stores an optional operating system 530 and one or more instruction set(s) 540. The operating system 530 includes procedures for handling various basic system services and for performing hardware dependent tasks. In some implementations, the instruction set(s) 540 include executable software defined by binary information stored in the form of electrical charge. In some implementations, the instruction set(s) 540 are software that is executable by the one or more processing units 502 to carry out one or more of the techniques described herein.
The instruction set(s) 540 include a view selection and generation instruction set 542 and a transition effect instruction set 544. The instruction set(s) 540 may be embodied as a single software executable or multiple software executables.
The view selection and generation instruction set 542 (e.g., view selection and generation instruction set 120 of FIGS. 1 and 2) is executable by the processing unit(s) 502 to generate one or more different views of 3D display data. For example, the view selection and generation instruction set 542 obtains a 3D content item (e.g., 3D photo or video), determines a view selection, and generates 3D display data for the determined view. For example, a 3D content item that is obtained includes different sets of image data (e.g., RGB images, depth data, and point cloud representations) such that the view selection and generation instruction set 542 analyzes the image data to generate a particular view so that a user, during execution of the application, views the 3D photo or video as an overlay on top of the 3D representation, as illustrated herein with reference to FIGS. 1-3.
The transition effect instruction set 544 (e.g., transition effect instruction set 310 of FIG. 3) is configured with instructions executable by a processor to acquire a first set of image data (e.g., first view 3D display data 130), a second set of image data (e.g., the second view 3D display data 230), and/or device data 320 (e.g., image data from image source(s), ambient light data, position information, motion data, etc., from the device executing the system), and, based on transition elements criteria from the transition elements database 330, determine whether to transition between views of 3D content item 304. For example, as discussed herein with reference to FIG. 3, the transition effect instruction set 544 can obtain two or more views of a 3D photo or video (e.g., a stereo image and a point cloud) during the execution of a photo/video viewer application program and, based on some criterion (e.g., user/head movement, etc.) from device data 320, transition between the two or more views using transition effects from the transition elements database 330.
Although the instruction set(s) 540 are shown as residing on a single device, it should be understood that in other implementations, any combination of the elements may be located in separate computing devices. Moreover, FIG. 5 is intended more as a functional description of the various features which are present in a particular implementation as opposed to a structural schematic of the implementations described herein. As recognized by those of ordinary skill in the art, items shown separately could be combined and some items could be separated. The actual number of instruction sets and how features are allocated among them may vary from one implementation to another and may depend in part on the particular combination of hardware, software, and/or firmware chosen for a particular implementation.
Those of ordinary skill in the art will appreciate that well-known systems, methods, components, devices, and circuits have not been described in exhaustive detail so as not to obscure more pertinent aspects of the example implementations described herein. Moreover, other effective aspects and/or variants do not include all of the specific details described herein. Thus, several details are described in order to provide a thorough understanding of the example aspects as shown in the drawings. Moreover, the drawings merely show some example embodiments of the present disclosure and are therefore not to be considered limiting.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any inventions or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.
Embodiments of the subject matter and the operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, e.g., one or more modules of computer program instructions, encoded on a computer storage medium for execution by, or to control the operation of, data processing apparatus. Alternatively, or additionally, the program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them. Moreover, while a computer storage medium is not a propagated signal, a computer storage medium can be a source or destination of computer program instructions encoded in an artificially generated propagated signal. The computer storage medium can also be, or be included in, one or more separate physical components or media (e.g., multiple CDs, disks, or other storage devices).
The term “data processing apparatus” encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, a system on a chip, or multiple ones, or combinations, of the foregoing. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or a combination of one or more of them. The apparatus and execution environment can realize various different computing model infrastructures, such as web services, distributed computing and grid computing infrastructures. Unless specifically stated otherwise, it is appreciated that throughout this specification discussions utilizing the terms such as “processing,” “computing,” “calculating,” “determining,” and “identifying” or the like refer to actions or processes of a computing device, such as one or more computers or a similar electronic computing device or devices, that manipulate or transform data represented as physical electronic or magnetic quantities within memories, registers, or other information storage devices, transmission devices, or display devices of the computing platform.
The system or systems discussed herein are not limited to any particular hardware architecture or configuration. A computing device can include any suitable arrangement of components that provides a result conditioned on one or more inputs. Suitable computing devices include multipurpose microprocessor-based computer systems accessing stored software that programs or configures the computing system from a general purpose computing apparatus to a specialized computing apparatus implementing one or more implementations of the present subject matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computing device.
Implementations of the methods disclosed herein may be performed in the operation of such computing devices. The order of the blocks presented in the examples above can be varied; for example, blocks can be re-ordered, combined, and/or broken into sub-blocks. Certain blocks or processes can be performed in parallel. The operations described in this specification can be implemented as operations performed by a data processing apparatus on data stored on one or more computer-readable storage devices or received from other sources.
The use of “adapted to” or “configured to” herein is meant as open and inclusive language that does not foreclose devices adapted to or configured to perform additional tasks or steps. Additionally, the use of “based on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based on” one or more recited conditions or values may, in practice, be based on additional conditions or value beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.
It will also be understood that, although the terms "first," "second," etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first node could be termed a second node, and, similarly, a second node could be termed a first node, without changing the meaning of the description, so long as all occurrences of the "first node" are renamed consistently and all occurrences of the "second node" are renamed consistently. The first node and the second node are both nodes, but they are not the same node.
The terminology used herein is for the purpose of describing particular implementations only and is not intended to be limiting of the claims. As used in the description of the implementations and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As used herein, the term “if” may be construed to mean “when” or “upon” or “in response to determining” or “in accordance with a determination” or “in response to detecting,” that a stated condition precedent is true, depending on the context. Similarly, the phrase “if it is determined [that a stated condition precedent is true]” or “if [a stated condition precedent is true]” or “when [a stated condition precedent is true]” may be construed to mean “upon determining” or “in response to determining” or “in accordance with a determination” or “upon detecting” or “in response to detecting” that the stated condition precedent is true, depending on the context.