Patent: Eye tracking based video transmission and compression
Publication Number: 20250208706
Publication Date: 2025-06-26
Assignee: Magic Leap
Abstract
A computer-implemented method includes receiving gaze information about an observer of a video stream; determining a video compression spatial map for the video stream based on the received gaze information and performance characteristics of a network connection with the observer; compressing the video stream according to the video compression spatial map; and sending the compressed video stream to the observer.
Claims
1.-12. (canceled)
13. A computer-implemented method comprising:
    receiving gaze information about an observer of a video stream;
    determining a video compression spatial map for the video stream based on the received gaze information and performance characteristics of a network connection with the observer, wherein determining of the video compression spatial map includes:
        selecting a second shape within a first shape of a region of interest, where the region of interest corresponds to a predicted eye position;
        sharing a center of the region of interest;
        scaling a size of the second shape in proportion to network latency times a maximum eye velocity; and
        selecting a video compression profile that includes (1) lower compression inside the second shape, (2) medium compression outside the second shape but inside the first shape, and (3) higher compression outside the first shape;
    compressing the video stream according to the video compression spatial map; and
    sending the compressed video stream to the observer.
14. The method of claim 13, wherein the receiving of the gaze information is a receiving of gaze information from a head-mounted or display-mounted gaze tracker.
15. The method of claim 13, wherein: the gaze information includes information about an instantaneous eye position; determining of the video compression spatial map includes: identifying the center of the region; selecting a first shape for the region of interest; and selecting a video compression profile that includes higher compression outside the first shape.
16. The method of claim 15, wherein: the video compression profile corresponds to a video quality profile, a video resolution profile, or a video color profile.
17. The method of claim 15, wherein the video compression profile increases with distance from the center of the region of interest.
18. The method of claim 15, wherein: the performance characteristics of the network connection include information about an available bandwidth; and determining of the video compression spatial map includes scaling a size of the first shape in proportion to a ratio of the available bandwidth to a bandwidth for the video stream without compression.
19. The method of claim 15, wherein: the gaze information includes information about an instantaneous eye velocity; the performance characteristics of the network connection include information about network latency; and the center of the region of interest corresponds to the instantaneous eye position plus an offset proportional to the instantaneous eye velocity times the network latency.
20. The method of claim 13, wherein the lower compression is zero compression.
21. The method of claim 13, wherein the maximum eye velocity is a human saccadic eye velocity.
22. A computer-implemented system, comprising:
    one or more processors; and
    one or more computer memory devices interoperably coupled with the one or more processors and having tangible, non-transitory, machine-readable media storing one or more instructions that, when executed by the one or more processors, perform one or more operations, comprising:
        receiving gaze information about an observer of a video stream;
        determining a video compression spatial map for the video stream based on the received gaze information and performance characteristics of a network connection with the observer, wherein determining of the video compression spatial map includes:
            selecting a second shape within a first shape of a region of interest, where the region of interest corresponds to a predicted eye position;
            sharing a center of the region of interest;
            scaling a size of the second shape in proportion to network latency times a maximum eye velocity; and
            selecting a video compression profile that includes (1) lower compression inside the second shape, (2) medium compression outside the second shape but inside the first shape, and (3) higher compression outside the first shape;
        compressing the video stream according to the video compression spatial map; and
        sending the compressed video stream to the observer.
23. The computer-implemented system of claim 22, wherein the receiving of the gaze information is a receiving of gaze information from a head-mounted or display-mounted gaze tracker.
24. The computer-implemented system of claim 22, wherein: the gaze information includes information about instantaneous eye position; determining of the video compression spatial map includes: identifying a center of a region of interest corresponding to a predicted eye position; selecting a first shape for the region of interest; and selecting a video compression profile that includes higher compression outside the first shape.
25. The computer-implemented system of claim 24, wherein: the video compression profile corresponds to a video quality profile, a video resolution profile, or a video color profile.
26. The computer-implemented system of claim 24, wherein the video compression profile increases with distance from the center of the region of interest.
27. A non-transitory, computer-readable medium storing one or more instructions executable by a computer system to perform one or more operations, comprising:
    receiving gaze information about an observer of a video stream;
    determining a video compression spatial map for the video stream based on the received gaze information and performance characteristics of a network connection with the observer, wherein determining of the video compression spatial map includes:
        selecting a second shape within a first shape of a region of interest, where the region of interest corresponds to a predicted eye position;
        sharing a center of the region of interest;
        scaling a size of the second shape in proportion to network latency times a maximum eye velocity; and
        selecting a video compression profile that includes (1) lower compression inside the second shape, (2) medium compression outside the second shape but inside the first shape, and (3) higher compression outside the first shape;
    compressing the video stream according to the video compression spatial map; and
    sending the compressed video stream to the observer.
28. The non-transitory, computer-readable medium of claim 27, wherein the receiving of the gaze information is a receiving of gaze information from a head-mounted or display-mounted gaze tracker.
29. The non-transitory, computer-readable medium of claim 27, wherein: the gaze information includes information about instantaneous eye position; determining of the video compression spatial map includes: identifying a center of a region of interest corresponding to a predicted eye position; selecting a first shape for the region of interest; and selecting a video compression profile that includes higher compression outside the first shape.
30. The non-transitory, computer-readable medium of claim 29, wherein: the video compression profile corresponds to a video quality profile, a video resolution profile, or a video color profile.
31. The non-transitory, computer-readable medium of claim 29, wherein the video compression profile increases with distance from the center of the region of interest.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation of U.S. patent application Ser. No. 18/037,709, filed on May 18, 2023, which is a National Stage Application of International Application No. PCT/US2021/050665, filed Sep. 16, 2021, which claims priority to U.S. Provisional Patent Application No. 63/115,287, filed on Nov. 18, 2020, which is incorporated by reference in its entirety.
TECHNICAL FIELD
This specification generally relates to video stream data processing and transmission.
BACKGROUND
Video streaming consumes a large amount of bandwidth, especially in three-dimensional (3D) environments. For example, in virtual reality (VR), augmented reality (AR), and/or mixed reality (MR) systems, a display device (e.g., a head-mounted device) receives video stream data from a server and displays it to a user in a spatial 3D environment. Such video stream data is typically large and consumes a significant amount of bandwidth.
To reduce bandwidth consumption, conventional methods may compress the video stream data to reduce the data size. However, conventional video compression algorithms involve trade-offs between the degree of compression and the amount of distortion introduced. For example, to significantly reduce the size of the video stream data, conventional video compression algorithms may be lossy. As a result, the received video stream data may lose information, and the quality of the received video stream may be compromised, which may result in a poor user experience.
Therefore, there is a need for an improved approach to process, compress, and transmit video stream data to reduce the bandwidth demand while still providing a satisfactory user experience.
SUMMARY
The technology described herein provides an eye-tracking-based video compression and transmission method. Specifically, it uses gaze information to determine regions of interest (ROIs), i.e., the regions the observer is watching. The video stream data is processed so that the regions being observed are displayed in higher fidelity, while regions not being observed are displayed in lower fidelity. This allows the video stream data to be compressed significantly, reducing its size and the demand for bandwidth. As a result, based on real-time gaze tracking information, the technology makes better use of the available bandwidth by concentrating visual acuity in the regions of the video images actually being watched.
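The nested-shape scheme recited in the claims can be sketched in code: an inner shape centered on a predicted gaze position, sized by network latency times a maximum (saccadic) eye velocity, inside a larger ROI, with compression increasing outward. The Python below is an illustrative sketch only; the constants, the circular shapes, and all names are assumptions chosen for illustration and are not taken from the patent.

```python
import math

# Assumed constants (illustrative, not from the patent).
SACCADE_VELOCITY_DEG_PER_S = 500.0  # assumed maximum human saccadic eye velocity
PIXELS_PER_DEGREE = 30.0            # assumed display angular resolution

def predicted_center(eye_pos, eye_vel, latency_s):
    """Predict the gaze center: instantaneous position plus velocity * latency."""
    return (eye_pos[0] + eye_vel[0] * latency_s,
            eye_pos[1] + eye_vel[1] * latency_s)

def inner_radius_px(latency_s):
    """Inner (second) shape radius proportional to latency * max eye velocity,
    so a full-speed saccade during one round trip stays inside it."""
    return SACCADE_VELOCITY_DEG_PER_S * latency_s * PIXELS_PER_DEGREE

def compression_level(pixel, center, inner_r, outer_r):
    """Three-tier map: low inside the inner shape, medium inside the ROI,
    high outside the ROI."""
    d = math.dist(pixel, center)
    if d <= inner_r:
        return "low"
    if d <= outer_r:
        return "medium"
    return "high"
```

Under these assumptions, a 20 ms latency yields an inner radius of 300 px, and each pixel of a frame can be assigned one of the three compression tiers before encoding.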
In general, a first innovative aspect of the subject matter described in this specification can be embodied in a method performed by a data processing apparatus that includes receiving gaze information about an observer of a video stream; determining a video compression spatial map for the video stream based on the received gaze information and performance characteristics of a network connection with the observer; compressing the video stream according to the video compression spatial map; and sending the compressed video stream to the observer.
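The dependent claims also tie the ROI size to network conditions: the first shape is scaled in proportion to the ratio of available bandwidth to the bandwidth the uncompressed stream would require. A hedged sketch of that scaling, with assumed names, units, and clamping behavior:

```python
# Illustrative sketch (assumed names/units, not from the patent): scale the
# ROI (first shape) radius by the fraction of the uncompressed stream's
# bandwidth the network can deliver, clamped to [0, 1] so the ROI never
# exceeds its base size.

def scaled_roi_radius(base_radius_px: float,
                      available_bw_mbps: float,
                      uncompressed_bw_mbps: float) -> float:
    """Scale the first-shape size in proportion to
    available bandwidth / uncompressed-stream bandwidth."""
    ratio = available_bw_mbps / uncompressed_bw_mbps
    return base_radius_px * min(max(ratio, 0.0), 1.0)
```

For example, if only a quarter of the uncompressed bandwidth is available, the high-fidelity region shrinks to a quarter of its base radius, preserving quality at the gaze point while staying within the link's capacity.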
The subject matter described in this specification can be implemented in particular embodiments so as to realize one or more of the following advantages. The size of the video stream is significantly reduced, because the image content outside the ROI is significantly compressed. As a result, the demand for bandwidth is significantly reduced. Furthermore, because the ROI (e.g., area/location the observer is watching) is displayed in higher fidelity, the user experience is not compromised.
Various features and advantages of the foregoing subject matter are described below with respect to the figures. Additional features and advantages are apparent from the subject matter described herein and the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is an example of tracking an observer's eye gaze and/or movement, according to an example embodiment.
FIG. 2 is a flowchart of the video stream data processing and transmission, according to an example embodiment.
FIG. 3 is a video compression spatial map, according to an example embodiment.
FIG. 4 is an architecture diagram for the video stream data processing and transmission, according to an example embodiment.
Like reference numbers and designations in the various drawings indicate like elements.