Patent: Customizing Head-Related Transfer Functions Based On Monitored Responses To Audio Content
Publication Number: 10638251
Publication Date: 2020-04-28
Applicants: Facebook
Abstract
The present disclosure relates to a method and audio system for customizing a set of head-related transfer functions (HRTFs) for a user of the audio system to account for the user’s bias in hearing. The audio system first presents, via one or more speakers on a headset, audio content to the user wearing the headset, the audio content generated using a set of HRTFs. The audio system monitors responses of the user to the audio content. The audio system customizes the set of HRTFs for the user based on at least one of the monitored responses. The audio system updates audio content using the customized set of HRTFs. The audio system presents the updated audio content to the user with the speakers on the headset.
BACKGROUND
The present disclosure generally relates to audio systems providing audio content to one or more users of an audio system, and more specifically, to audio systems monitoring user responses to audio content and customizing head-related transfer functions (HRTFs) for the user based on the monitored responses.
Headsets in an artificial reality system often include an audio system to provide audio content to users of the headsets. In the artificial reality environment, audio content can significantly improve a user’s immersive experience with the artificial reality. Conventional audio systems implemented in headsets comprise audio devices (e.g., ear buds, headphones) positioned in proximity to both ears of a user and provide audio content to the user. However, conventional audio systems generally do a poor job of providing directional content. This is because the content is presented without regard to head-related transfer functions (HRTFs) of the user, and HRTFs vary from user to user (e.g., due to different shapes of the ear).
SUMMARY
The present disclosure relates to a method and audio system for customizing a set of head-related transfer functions (HRTFs) for a user of the audio system. Audio content is generated using the set of HRTFs. The audio system presents, via one or more speakers on a headset, the audio content to the user wearing the headset.
The audio system monitors responses of the user to the audio content. The monitored responses of the user may be associated with a perceived origin direction and/or location of the audio content. In cases where the set of HRTFs used to generate the content is not fully individualized or customized to the user, a delta is present between the perceived origin direction, location, angle, solid angle, or any combination thereof, and a target presentation direction and/or location of the audio content. The audio system customizes the set of HRTFs for the user based on at least one of the monitored responses to reduce the delta. The audio system generates updated audio content using the customized set of HRTFs, and presents the updated audio content to the user with the speakers on the headset.
Embodiments according to the invention are in particular disclosed in the attached claims directed to an audio system and a method, wherein any feature mentioned in one claim category, e.g., method, can be claimed in another claim category, e.g., audio system, as well. The dependencies or references back in the attached claims are chosen for formal reasons only. However, any subject matter resulting from a deliberate reference back to any previous claims (in particular multiple dependencies) can be claimed as well, so that any combination of claims and the features thereof is disclosed and can be claimed regardless of the dependencies chosen in the attached claims. The subject matter which can be claimed comprises not only the combinations of features as set out in the attached claims but also any other combination of features in the claims, wherein each feature mentioned in the claims can be combined with any other feature or combination of other features in the claims. Furthermore, any of the embodiments and features described or depicted herein can be claimed in a separate claim and/or in any combination with any embodiment or feature described or depicted herein or with any of the features of the attached claims.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a perspective view of a user’s bias in perceiving audio content, in accordance with one or more embodiments.
FIG. 2 is a perspective view of a headset including an audio system, in accordance with one or more embodiments.
FIG. 3 is a block diagram of an audio system, in accordance with one or more embodiments.
FIG. 4 is a flowchart illustrating a process for customizing a set of HRTFs for a user based on monitored user responses, in accordance with one or more embodiments.
FIG. 5 is a system environment of a headset including the audio system 300 of FIG. 3, in accordance with one or more embodiments.
The figures depict embodiments of the present disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles, or benefits touted, of the disclosure described herein.
DETAILED DESCRIPTION
Embodiments of the invention may include or be implemented in conjunction with an artificial reality system. Artificial reality is a form of reality that has been adjusted in some manner before presentation to a user, which may include, e.g., a virtual reality, an augmented reality, a mixed reality, a hybrid reality, or some combination and/or derivatives thereof. Artificial reality content may include completely generated content or generated content combined with captured (e.g., real-world) content. The artificial reality content may include video, audio, haptic sensation, or some combination thereof, and any of which may be presented in a single channel or in multiple channels (such as stereo video that produces a three-dimensional effect to the viewer). Additionally, in some embodiments, artificial reality may also be associated with applications, products, accessories, services, or some combination thereof, that are used to, e.g., create content in an artificial reality and/or are otherwise used in (e.g., perform activities in) an artificial reality. The artificial reality system that provides the artificial reality content may be implemented on various platforms, including an eyewear device, a head-mounted display (HMD) assembly with the eyewear device as a component, a HMD connected to a host computer system, a standalone HMD, a mobile device or computing system, or any other hardware platform capable of providing artificial reality content to one or more viewers. In addition, the artificial reality system may implement multiple controller devices for receiving user input which may influence the artificial reality content provided to the user.
Overview
An audio system generates audio content according to a customized set of HRTFs for a user of the audio system. The audio system generates audio content using a set of HRTFs. The set of HRTFs may include one or more generic HRTFs, one or more customized HRTFs for the user, or some combination thereof. The audio system presents, via one or more speakers on a headset, the audio content to the user wearing the headset. The audio system monitors responses of the user to the audio content with one or more monitoring devices. The monitored responses of the user may be associated with a perceived origin direction and/or location of the audio content. In cases where the set of HRTFs used to generate the content is not fully individualized or customized to the user, a delta is present between the perceived origin direction and/or location and a target presentation direction and/or location of the audio content. The audio system customizes the set of HRTFs for the user based on at least one of the monitored responses to reduce that delta. The audio system generates subsequent audio content using the customized set of HRTFs. Customizing a set of HRTFs for the user is beneficial because it reduces instances where the user perceives a discrepancy between virtual content and the audio content presented with that virtual content.
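To make the flow above concrete, the following is a minimal sketch of the present/monitor/customize loop. All names (e.g., audio_system, monitor_response) are hypothetical; the patent does not specify an API.

```python
# Minimal sketch of the present -> monitor -> customize -> update loop.
# All names are hypothetical; the patent does not specify an API.
def run_customization_loop(audio_system, session_active):
    hrtfs = audio_system.initial_hrtfs()  # may mix generic and customized HRTFs
    while session_active():
        content = audio_system.generate_audio(hrtfs)     # render with current HRTF set
        audio_system.present(content)                    # play over the headset speakers
        response = audio_system.monitor_response()       # e.g., perceived origin direction
        hrtfs = audio_system.customize(hrtfs, response)  # reduce the perceived/target delta
```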
FIG. 1 is a perspective view of a user’s 110 hearing perception in perceiving audio content, in accordance with one or more embodiments. An audio system presents audio content to the user 110 of the audio system. In this illustrative example, the user 110 is placed at the origin of a spherical coordinate system, more specifically at the midpoint between the user’s 110 ears. The audio system generates audio content with a target presentation direction 120 having an elevation angle φ and an azimuthal angle θ according to a set of HRTFs. Accordingly, the audio system presents audio content comprising binaural acoustic signals to the ears of the user 110. Due to the user’s 110 hearing perception, the user 110 perceives the audio content as originating from a perceived origin direction 130, a vector with an elevation angle φ′ and an azimuthal angle θ′. The elevation angles are measured from the horizon plane 140 towards a pole of the spherical coordinate system. The azimuthal angles are measured in the horizon plane 140 from a reference axis. In other embodiments, a perceived origin direction may include one or more vectors, e.g., an angle of vectors describing a width of the perceived origin direction or a solid angle of vectors describing an area of the perceived origin direction. Because the HRTFs used to generate the audio content are not customized to the user 110, the user 110 may perceive the source to be more diffuse than the target presentation direction and/or location. Noticeably, there is a delta 125 between the target presentation direction 120 of the audio content and the user’s 110 perceived origin direction 130. When considering the target presentation direction 120 and the perceived origin direction 130, the delta 125 corresponds to an angular difference between the two directions. The delta 125 may be a result of the set of HRTFs used to generate the audio content not being customized to the user’s 110 hearing perception. In the case of the target presentation location 150 and the perceived origin location 160, the delta 125 may describe a distance difference between the two locations.
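As one way to make the delta 125 concrete, the sketch below computes the angular difference between the target presentation direction 120 (φ, θ) and the perceived origin direction 130 (φ′, θ′), with elevation measured from the horizon plane 140 and azimuth measured in that plane. The function names are illustrative, not taken from the patent.

```python
import numpy as np

def direction_vector(elevation_rad: float, azimuth_rad: float) -> np.ndarray:
    """Unit vector for a direction given as (elevation, azimuth), with
    elevation measured from the horizontal (horizon) plane."""
    return np.array([
        np.cos(elevation_rad) * np.cos(azimuth_rad),
        np.cos(elevation_rad) * np.sin(azimuth_rad),
        np.sin(elevation_rad),
    ])

def angular_delta(target_dir: np.ndarray, perceived_dir: np.ndarray) -> float:
    """Angle in radians between two unit direction vectors."""
    cos_delta = np.clip(np.dot(target_dir, perceived_dir), -1.0, 1.0)
    return float(np.arccos(cos_delta))

# Example: target at (20 deg elevation, 45 deg azimuth), perceived at (15, 50).
target = direction_vector(np.radians(20.0), np.radians(45.0))
perceived = direction_vector(np.radians(15.0), np.radians(50.0))
print(np.degrees(angular_delta(target, perceived)))  # the delta, in degrees
```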
The HRTFs can be tailored (e.g., using an audio system described in later figures) so as to reduce the delta 125 between the target presentation direction 120 of the audio content and the user’s 110 perceived origin direction 130. Likewise, the HRTFs can be tailored to reduce the delta 125 between a target presentation location 150 and a perceived origin location 160. In embodiments where the perceived origin direction includes an angle and/or a solid angle, the HRTFs may be tailored so as to decrease the angle and/or the solid angle. The reduction in delta (between the target presentation direction 120 and the perceived origin direction 130 and/or the target presentation location 150 and the perceived origin location 160) can be advantageous in providing audio content in artificial reality systems. For example, customizing a set of HRTFs for the user 110 may avoid situations where the user 110 perceives a discrepancy between the visual content of a virtual object and the audio content associated with that virtual object.
Headset
FIG. 2 is a perspective view of a headset 200 including an audio system, in accordance with one or more embodiments. The headset 200 presents media to a user. Examples of media presented by the headset 200 include one or more images, video, audio, or some combination thereof. The headset 200 may be an eyewear device or a head-mounted display (HMD). The headset 200 includes, among other components, a frame 205, a lens 210, a sensor device 215, and an audio system.
In embodiments as an eyewear device, the headset 200 may correct or enhance the vision of a user, protect the eye of a user, or provide images to a user. The headset 200 may be eyeglasses which correct for defects in a user’s eyesight. The headset 200 may be sunglasses which protect a user’s eye from the sun. The headset 200 may be safety glasses which protect a user’s eye from impact. The headset 200 may be a night vision device or infrared goggles to enhance a user’s vision at night. In alternative embodiments, the headset 200 may not include a lens 210 and may be a frame 205 with the audio system that provides audio content (e.g., music, radio, podcasts) to a user. In other embodiments of the headset 200 as a HMD, the headset 200 may be a HMD that produces artificial reality content for the user.
The frame 205 includes a front part that holds the lens 210 and end pieces to attach to the user. The front part of the frame 205 bridges the top of a nose of the user. The end pieces (e.g., temples) are the portions of the frame 205 that extend along the sides of the user’s head. The length of the end piece may be adjustable (e.g., adjustable temple length) to fit different users. The end piece may also include a portion that curls behind the ear of the user (e.g., temple tip, ear piece).
The lens 210 provides or transmits light to a user wearing the headset 200. The lens 210 is held by a front part of the frame 205 of the headset 200. The lens 210 may be prescription lens (e.g., single vision, bifocal and trifocal, or progressive) to help correct for defects in a user’s eyesight. The prescription lens transmits ambient light to the user wearing the headset 200. The transmitted ambient light may be altered by the prescription lens to correct for defects in the user’s eyesight. The lens 210 may be a polarized lens or a tinted lens to protect the user’s eyes from the sun. The lens 210 may be one or more waveguides as part of a waveguide display in which image light is coupled through an end or edge of the waveguide to the eye of the user. The lens 210 may include an electronic display for providing image light and may also include an optics block for magnifying image light from the electronic display. Additional detail regarding the lens 210 can be found in the detailed description of FIG. 5.
The sensor device 215 estimates a current position of the headset 200 relative to an initial position of the headset 200. The sensor device 215 may be located on a portion of the frame 205 of the headset 200. The sensor device 215 includes a position sensor and an inertial measurement unit. The sensor device 215 may also include one or more cameras placed on the frame 205 in view of, or facing, the user’s eyes. The one or more cameras of the sensor device 215 are configured to capture image data corresponding to eye positions of the user’s eyes. Additional details about the sensor device 215 can be found in the detailed description of FIG. 5.
The audio system provides audio content to a user of the headset 200. The audio system includes an audio assembly, a monitoring assembly, and a controller. The monitoring assembly contains one or more monitoring devices for monitoring responses of the user to audio content. The monitoring devices may be various sensors or input devices that monitor responses of the user. In one embodiment, the sensor device 215 is a monitoring device and tracks movement of the headset 200 as monitoring data. The monitoring assembly is described further in conjunction with FIGS. 3 & 4. The controller is also part of the audio system and manages operation of the audio assembly and the monitoring assembly.
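As an illustration of how monitoring data might map to a response, the sketch below assumes (one plausible case, not stated here as the patent’s only method) that the user turns the headset to face a perceived source, so the tracked forward axis of the headset serves as the perceived origin direction. All names are illustrative.

```python
import numpy as np

def perceived_direction_from_pose(head_rotation: np.ndarray) -> np.ndarray:
    """Treat the headset's forward axis, rotated into world coordinates by the
    IMU-tracked 3x3 head rotation matrix, as the perceived origin direction."""
    forward_local = np.array([1.0, 0.0, 0.0])  # forward axis in the headset frame
    return head_rotation @ forward_local
```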
The audio assembly provides audio content to a user of the headset 200. The audio assembly includes a plurality of speakers 220 that provide audio content in accordance with instructions from the controller. In the illustrated embodiment of FIG. 2, the speakers 220 are coupled to the end pieces of the frame 205. The speakers 220 may be placed so as to be in proximity to the user’s ear canals or inside the user’s ear canals when the user is wearing the headset 200, on another portion of the frame 205 and/or in a local area, or some combination thereof. Based on the placement of the speakers 220 relative to the user’s ears, the audio assembly may assign each speaker to the user’s right ear or left ear. When presenting audio content, the audio assembly may receive binaural acoustic signals for specific actuation of the speakers assigned to each of the user’s ears. Additional detail regarding the structure and the function of the audio assembly can be found in the detailed description of FIGS. 3 & 4.
The controller provides audio content to the audio assembly for presentation. The controller is embedded into the frame 205 of the headset 200. In other embodiments, the controller may be located elsewhere (e.g., a different portion of the frame 205 or external to the frame 205). The controller generates audio content according to a set of HRTFs and based on a target presentation direction and/or location for the audio content. The audio content provided to the audio assembly may be binaural acoustic signals that dictate actuation of the speakers to present specific content to each of the user’s ears. The functions and operations of the controller in providing audio content to the audio assembly are further described in conjunction with FIGS. 3 & 4.
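A common way to render directional audio with HRTFs, shown here only as an illustrative sketch of what the controller’s rendering step could look like, is to convolve a mono source with the left and right head-related impulse responses (HRIRs, the time-domain counterparts of HRTFs) stored for the direction nearest the target. Here, hrir_set is an assumed mapping from (elevation, azimuth) tuples to (left HRIR, right HRIR) arrays.

```python
import numpy as np

def render_binaural(mono: np.ndarray, hrir_set: dict, target_dir: tuple) -> np.ndarray:
    """Return a (num_samples, 2) binaural signal for the target (elevation, azimuth)."""
    # Nearest-neighbor lookup of the stored measurement direction.
    nearest = min(
        hrir_set,
        key=lambda d: (d[0] - target_dir[0]) ** 2 + (d[1] - target_dir[1]) ** 2,
    )
    hrir_left, hrir_right = hrir_set[nearest]
    # Convolve the mono source with each ear's impulse response.
    left = np.convolve(mono, hrir_left)[: len(mono)]
    right = np.convolve(mono, hrir_right)[: len(mono)]
    return np.stack([left, right], axis=1)
```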
The controller adjusts the set of HRTFs according to monitored responses. The controller obtains monitoring data from the monitoring assembly. From the monitoring data, the controller determines the user’s responses to audio content provided by the audio assembly. The controller customizes the set of HRTFs for the user of the headset 200 according to the monitored responses. The controller then generates updated audio content according to the customized set of HRTFs for the user. Additional detail regarding the controller and the controller’s operation with other components of the audio system can be found in the detailed description of FIGS. 3 & 4.
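One simple way such customization could work, offered as a hypothetical sketch rather than the patent’s specific method, is to record the per-direction bias between target and perceived angles and compensate later lookups by that bias.

```python
# Hypothetical sketch: customize by recording and compensating angular bias.
def record_bias(bias: dict, target_dir: tuple, perceived_dir: tuple) -> None:
    """Store the (elevation, azimuth) error observed for a target direction."""
    bias[target_dir] = (
        perceived_dir[0] - target_dir[0],  # elevation error
        perceived_dir[1] - target_dir[1],  # azimuth error
    )

def corrected_direction(bias: dict, target_dir: tuple) -> tuple:
    """Offset a target direction by the recorded bias so the perceived
    direction lands closer to the intended target."""
    d_elev, d_azim = bias.get(target_dir, (0.0, 0.0))
    return (target_dir[0] - d_elev, target_dir[1] - d_azim)
```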