Facebook Patent | Compensating For Effects Of Headset On Head Related Transfer Functions

Patent: Compensating For Effects Of Headset On Head Related Transfer Functions

Publication Number: 20200245091

Publication Date: 20200730

Applicants: Facebook

Abstract

An audio system captures audio data of test sounds through a microphone of a headset worn by a user. The test sounds are played by an external speaker, and the audio data includes audio data captured for different orientations of the headset with respect to the external speaker. A set of head-related transfer function (HRTFs) is calculated based at least in part on the audio data of the test sounds at the different orientations of the headset. A portion of the set of HRTFs is discarded to create an intermediate set of HRTFs. The discarded portion corresponding to one or more distortion regions that are based in part on wearing the headset. One or more HRTFs are generated that correspond to the discarded portion using at least some of the intermediate set of HRTFs to create an individualized set of HRTFs for the user.

CROSS REFERENCE TO RELATED APPLICATION

[0001] This application claims the benefit and priority of U.S. Provisional Application No. 62/798,813 filed Jan. 30, 2019, which is incorporated by reference herein in its entirety.

FIELD OF THE INVENTION

[0002] The present disclosure relates generally to head-related transfer functions (HRTFs) and specifically to compensating for effects of a headset on HRTFs.

BACKGROUND

[0003] Conventionally, head-related transfer functions (HRTF)s are determined in a sound dampening chamber for many different source locations (e.g., typically more than a 100) relative to a person. The determined HRTFs may then be used to provide spatialized audio content to the person. Moreover, to reduce error, it is common to determine multiple HRTFs for each source location (i.e., each speaker is generating a plurality of discrete sounds). Accordingly, for high quality spatialization of audio content it takes a relatively long time (e.g., more than an hour) to determine the HRTFs as there are multiple HRTFs determined for many different speaker locations. Additionally, the infrastructure for measuring HRTFs sufficient for quality surround sound is rather complex (e.g., sound dampening chamber, one or more speaker arrays, etc.). Accordingly, conventional approaches for obtaining HRTFs are inefficient in terms of hardware resources and/or time needed.

SUMMARY

[0004] Embodiments relate to a system and a method for obtaining an individualized set of HRTFs for a user. In one embodiment, a HRTF system determines a set of distortion regions, which are portions HRTFs where the sound is commonly distorted by the presence of a headset. The HRTF system captures audio test data for a population of test users, both with a headset on and with the headset off. The audio test data is used to determine sets of HRTFs. Analyzing and comparing sets of HRTFs of the test users with the headset and sets of HRTFs of the test users without the headset for the population of test users determines frequency-dependent and directionally-dependent regions of distorted HRTFs that are common for the population of test users.

[0005] An audio system of an artificial reality system compensates for the distortion of the set of HRTFs by accounting for the distortion regions. A user wears a headset equipped with means for capturing sounds in the user’s ear canal (i.e., a microphone). The audio system plays test sounds through an external speaker and records audio data of how the test sounds are captured in the user’s ear for different directional orientations with respect to an external speaker. For each measured direction, an initial HRTF is calculated, forming an initial set of HRTFs. The portions of the initial set of HRTFs corresponding to the distortion regions are discarded. The discarded regions are interpolated to calculate an individualized set of HRTFs that compensates for the headset distortion.

BRIEF DESCRIPTION OF THE DRAWINGS

[0006] FIG. 1A is a diagram of a sound measurement system (SMS) for obtaining audio data associated with a test user wearing a headset, in accordance with one or more embodiments.

[0007] FIG. 1B is a diagram of the SMS of FIG. 1A configured to obtain audio data associated with the test user not wearing a headset, in accordance with one or more embodiments.

[0008] FIG. 2 is a block diagram a HRTF system, in accordance with one or more embodiments.

[0009] FIG. 3 is a flowchart illustrating a process for determining a set of distortion regions, in accordance with one or more embodiments.

[0010] FIG. 4A is a diagram of an example artificial reality system for obtaining audio data associated with a user wearing a headset using an external speaker and a generated virtual space, in accordance with one or more embodiments.

[0011] FIG. 4B is a diagram of a display in which an alignment prompt and an indicator are displayed by a headset and a user’s head is not at a correct orientation, in accordance with one or more embodiments.

[0012] FIG. 4C is a diagram of the display of FIG. 4B in which the user’s head is at a correct orientation, in accordance with one or more embodiments.

[0013] FIG. 5 is a block diagram of a system environment of a system for determining individualized HRTFs for a user, in accordance with one or more embodiments.

[0014] FIG. 6 is a flowchart illustrating a process of obtaining a set of individualized HRTFs for a user,* in accordance with one or more embodiments*

[0015] FIG. 7A is a perspective view of a headset implemented as an eyewear device, in accordance with one or more embodiments.

[0016] FIG. 7B is a perspective view of a headset implemented as a HMD, in accordance with one or more embodiments.

[0017] FIG. 8 is a block diagram of a system environment that includes a headset and a console, in accordance with one or more embodiments.

[0018] The figures depict embodiments of the present disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles, or benefits touted, of the disclosure described herein.

DETAILED DESCRIPTION

[0019] Embodiments of the present disclosure may include or be implemented in conjunction with an artificial reality system. Artificial reality is a form of reality that has been adjusted in some manner before presentation to a user, which may include, e.g., a virtual reality (VR), an augmented reality (AR), a mixed reality (MR), a hybrid reality, or some combination and/or derivatives thereof. Artificial reality content may include completely generated content or generated content combined with captured (e.g., real-world) content. The artificial reality content may include video, audio, haptic feedback, or some combination thereof, and any of which may be presented in a single channel or in multiple channels (such as stereo video that produces a three-dimensional effect to the viewer). Additionally, in some embodiments, artificial reality may also be associated with applications, products, accessories, services, or some combination thereof, that are used to, e.g., create content in an artificial reality and/or are otherwise used in (e.g., perform activities in) an artificial reality. The artificial reality system that provides the artificial reality content may be implemented on various platforms, including a headset, a headset connected to a host computer system, a standalone headset, a mobile device or computing system, or any other hardware platform capable of providing artificial reality content to one or more viewers.

Overview

[0020] An HRTF system herein is used to collect audio test data to determine common portions of HRTFs that are distorted by the presence of a headset. The HRTF system captures audio test data at a test user’s ear canal in an acoustic chamber, both with the test user wearing a headset and without the headset. The audio test data is analyzed and compared to determine the effect of the presence of the headset on individualized HRTFs. The audio test data is collected for a population of test users and used to determine a set of distortion regions where the HRTFs are commonly distorted by the presence of the headset.

[0021] An audio system of a headset uses information from the HRTF system to calculate for a user a set of individualized HRTFs that compensate for the effects of the headset on the HRTFs. The user wears the headset and the audio system captures audio data of test sounds emitted from an external speaker. The external speaker may be, e.g., physically separate from the headset and audio system. The audio system calculates a set of initial HRTFs based at least in part on the audio data of the test sound at different orientations of the headset. The audio system discards a portion (based in part on at least some of the distortion regions determined by the HRTF server) of the set of initial HRTFs to create an intermediate set of HRTFs. The intermediate set of HRTFs is formed from the non-discarded HRTFs of the set of HRTFs. The discarded portion of the set of HRTFs corresponds to one or more distortion regions that are caused by the presence of the headset. The audio system generates one or more HRTFs (e.g., via interpolation) that correspond to the discarded portion of the set, which are combined with at least some of the intermediate set of HRTFs to create a set of individualized HRTFs for the user. The set of individualized HRTFs are customized to the user such that errors in the HRTFs caused by wearing the headset are mitigated, and thereby mimic actual HRTFs of the user without a headset. The audio system may use the set of individualized HRTFs to present spatialized audio content to the user. Spatialized audio content is audio that can be presented as if it is positioned at a specific point in three-dimensional space. For example, in a virtual environment, audio associated with a virtual object that is being displayed by the headset can appear to originate from the virtual object.

[0022] Note that in this manner, the audio system is effectively able to generate an individualized set of HRTFs for the user, even though the user is wearing the headset. This is much faster, easier, and cheaper than conventional methods of measuring a user’s actual HRTFs in a customized sound dampening chamber.

Example Distortion Mapping System

[0023] FIG. 1A is a diagram of a sound measurement system (SMS) 100 for obtaining audio test data associated with a test user 110 wearing a headset 120, in accordance with one or more embodiments. The sound measurement system 100 is part of an HRTF system (e.g., as described below with regard to FIG. 2). The SMS 100 includes a speaker array 130, and a binaural microphones 140a, 140b. In the illustrated embodiment, the test user 110 is wearing the headset 120 (e.g., as described in more detail in relation to FIGS. 7A and 7B). The headset 110 may be called a test headset. The SMS 100 is used for measuring audio test data to determine a set of HRTFs for a test user 110. The SMS 100 is housed in an acoustically treated chamber. In one particular embodiment, the SMS 100 is anechoic down to a frequency of approximately 500 hertz (Hz).

[0024] In some embodiments, the test user 110 is a human. In these embodiments, it is useful to collect audio test data for a large number of different people. The people can different ages, different sizes, different genders, have different hair lengths, etc. In this manner audio test data can be collected over a large population. In other embodiments, the test user 110 is a manikin. The manikin may, e.g., have physical features (e.g., ear shape, size, etc.) representative of an average person.

[0025] The speaker array 130 emits test sounds in accordance with instructions from a controller of the SMS 100. A test sound is an audible signal transmitted by a speaker that may be used to determine a HRTF. A test sound may have one or more specified characteristics, such as frequency, volume, and length of the transmission. The test sounds may include, for example, a continuous sinusoidal wave at a constant frequency, a chirp, some other audio content (e.g., music), or some combination thereof. A chirp is a signal whose frequency is swept upward or downward for a period of time. The speaker array 130 comprises a plurality of speakers, including a speaker 150, that are positioned to project sound to a target area. The target area is where the test user 110 is located during operation of the SMS 100. Each speaker of the plurality of speakers is in a different location relative to the test user 110 in the target area. Note that, while the speaker array 130 is depicted in two-dimensions in FIG. 1, it is noted that the speaker array 130 can also include speakers in other locations and/or dimensions (e.g., span three-dimensions). In some embodiments, the speakers in the speaker array 130 are positioned spanning in elevation from -66.degree. to +85.degree. with a spacing of 9.degree. -10.degree. between each speaker 150 and spans every 10.degree. in azimuth around a full sphere. That is, 36 azimuths and 17 elevations, creating a total of 612 different angles of speakers 150 with respect to the test user 110. In some embodiments, one or more speakers of the speaker array 130 may dynamically change their position (e.g., in azimuth and/or elevation) relative to the target area. Note in the above description the test user 110 is stationary (i.e., the position of the ears within the target area stays substantially constant).

[0026] The binaural microphones 140a, 140b (collectively referred to as “140”) capture the test sounds emitted by the speaker array 130. The captured test sounds are referred to as audio test data. The binaural microphones 140 are each placed in an ear canal of the test user. As illustrated, the binaural microphone 140a is placed in the ear canal of the right ear of the user, and the microphone 140b is placed in the ear canal of the left ear of the user. In some embodiments, the microphones 140 are embedded in foam earplugs that are worn by the test user 110. As discussed in detail below with regard to FIG. 2, the audio test data can be used to determine a set of HRTFS. For example, test sounds emitted by a speaker 150 of the speaker array 130 are captured by the binaural microphones 140 as audio test data. The speaker 150 has a specific location relative to the ears of the test user 110, accordingly, there is a specific HRTF for each ear that can be determined using the associated audio test data.

[0027] FIG. 1B is a diagram of the SMS 100 of FIG. 1A configured to obtain audio test data associated with the test user 110 not wearing a headset, in accordance with one or more embodiments. In the illustrated embodiments, the SMS 100 collects audio test data in the same way described above with respect to FIG. 1A, except that the test user 110 in FIG. 1B is not wearing a headset. Accordingly, the audio test data collected can be used to determine actual HRTF’s of the test user 110 that do not include distortion introduced by wearing the headset 140.

[0028] FIG. 2 is a block diagram a HRTF system 200, in accordance with one or more embodiments. The HRTF system 200 captures audio test data and determines portions of HRTFs commonly distorted by a headset. The HRTF system 200 includes a sound measurement system 210, and a system controller 240. In some embodiments some or all of the functions of the system controller 240 may be shared and/or performed by the SMS 210.

[0029] The SMS 210 captures audio test data to be used by the HRTF system 200 to determine a mapping of distortion regions. In particular, the SMS 210 is used to capture audio test data that is used to determine HRTFs of a test user. The SMS 210 includes a speaker array 220 and microphones 230. In some embodiments, the SMS 210 is the SMS 100 described in relation to FIGS. 1A and 1B. The captured audio data is stored in the HRTF data store 245.

[0030] The speaker array 220 emits test sounds in accordance with instructions from the system controller 240. The test sounds transmitted by the speaker array 130 may include, for example, a chirp (a signal whose frequency is swept upward or downward for a period of time), some other audio signal that may be used for HRTF determination, or some combination thereof. The speaker array 220 comprises one or more speakers that are positioned to project sound to a target area (i.e., location where a test user is located). In some embodiments, the speaker array 220 includes a plurality of speakers and each speaker of the plurality of speakers is in a different location relative to the test user in the target area. In some embodiments, one or more speakers of the plurality of speakers may dynamically change their position (e.g., in azimuth and/or elevation) relative to the target area. In some embodiments, one or more speakers of the plurality of speakers may change their position (e.g., in azimuth and/or elevation) relative to the test user by instructing the test user to rotate his/her head. The speaker array 130 is an embodiment of the speaker array 220.

[0031] The microphones 230 capture the test sounds emitted by the speaker array 220. The captured test sounds are referred to as audio test data. The microphones 230 include binaural microphones for each ear canal, and may include additional microphones. The additional microphones may be placed, e.g., in areas around the ears, along different portions of the headset, etc. The binaural microphones 140 are an embodiment of the microphones 230.

[0032] The system controller 240 generates control components of the HRTF system 200. The system controller 240 includes an HRTF data store 245, a HRTF module 250, and a distortion identification module 255. Some embodiments of the system controller 240 may include other components than those described herein. Similarly, the functions of components may be distributed differently than described here. For example, in some embodiments, some or all of the functionality of the HRTF module 250 may be part of the SMS 210.

[0033] The HRTF data store 245 stores data relating to the HRTF system 200. The HRTF data store 245 may store, e.g., audio test data associated with test users, HRTFs for test users wearing a headset, HRTFs for test users that are not wearing the headset, distortion mappings including sets of distortion regions for one or more test users, distortion mappings including sets of distortion regions for one or more populations of test users, parameters associated with physical characteristics of the test users, other data relating to the HRTF system 200, or some combination thereof. The parameters associated with physical characteristics of the test users may include gender, age, height, ear geometry, head geometry, and other physical characteristics that affect how audio is perceived by a user.

[0034] The HRTF module 250 generates instructions for the speaker array 220. The instructions are such that the speaker arrays 220 emits test sounds that can be captured at the microphones 230. In some embodiments, the instructions are such that each speaker of the speaker array 220 plays one or more a respective test sounds. And each test sound may have one or more of a specified length of time, a specified volume, a specified start time, a specified stop time, and a specified waveform (e.g., chirp, frequency tone, etc.). For example, the instructions may be such that one or more speakers of the speaker array 220 play, in sequence, a 1-second logarithmic sine sweep, ranging infrequency from 200 Hz to 20 kHz at a sampling frequency of 48 kHz, with a sounds level of 94 decibel of sounds pressure level (dB SPL). In some embodiments, each speaker of the speaker array 220 is associated with a different position relative to the target area, accordingly, each speaker is associated with a specific azimuth and elevation relative to the target area. In some embodiments, one or more speakers of the speaker array 220 may be associated with multiple positions. For example, the one or more speakers may change position relative to the target area. In these embodiments, the generated instructions may also control motion of some or all of speakers in the speaker array 220. In some embodiments, one or more speakers of the speaker array 220 may be associated with multiple positions. For example, the one or more speakers may change position relative to the test user by instructing the target user to rotate his/her head. In these embodiments, the generated instructions may also be presented to the test user. The HRTF module 250 provides the generated instructions to the speaker array 220 and/or the SMS 210.

[0035] The HRTF module 250 determines HRTFs for the test user using the audio test data captured via the microphones 230. In some embodiments, for each test sound played by a speaker of the speaker array 220 at a known elevation and azimuth, the microphones 230 capture audio test data of the test sound at the right ear and audio test data at the left ear (e.g., using binaural microphones as the microphones 230). The HRTF module 250 uses audio test data for the right ear and the audio test data for the left ear to determine a right-ear HRTF and a left-ear HRTF, respectfully. The right-ear and left-ear HRTFs are determined for a plurality of different directions (elevation and azimuth) that each correspond to a different location of a respective speaker in the speaker array 220.

[0036] Each set of HRTFs is calculated from captured audio test data for a particular test user. In some embodiments, the audio test data is a head-related impulse response (HRIR), where the test sound is the impulse. A HRIR relates the location of the sound source (i.e., a particular speaker in the speaker array 220) to the location of the test user’s ear canal (i.e., the location of the microphones 230). The HRTFs are determined by taking the Fourier transform of each corresponding HRIR. In some embodiments, error in the HRTFs is mitigated using free-field impulse response data. The free-field impulse response data may be deconvolved from the HRIRs to remove the individual frequency response of the speaker array 220 and the microphones 230.

[0037] The HRTFs are determined at each direction both with the test user wearing a headset 120 (e.g., as shown in FIG. 1A) and the test user not wearing a headset (e.g., as shown in FIG. 1B). For example, the HRTFs are determined at each elevation and azimuth with the test user wearing the headset 120 (as shown in FIG. 1A), then the headset 120 is removed, and the HRTFs are measured at each elevation and azimuth with the user not wearing the headset 120 (as shown in FIG. 1B). Audio test data at each speaker direction, both with and without the headset 120, may be captured for a population (e.g., hundreds, thousands, etc.) of test users. The population of test users may include individuals of differing ages, sizes, genders, hair lengths, head geometry, ear geometry, some other factor that can affect an HRTF, or some combination thereof. For each test user, there is a set of individualized HRTFs with the headset 120 and a set of individualized HRTFs without the headset 120.

[0038] The distortion identification module 255 compares one or more of the sets of HRTFs of a test user wearing a headset to one or more of the sets of HRTFs of the test user not wearing a headset. In one embodiment, the comparison involves the evaluation of the two sets of HRTFs using spectral difference error (SDE) analysis and determining discrepancies in the interaural time difference (ITD).

[0039] The SDE between the set of HRTFs without the headset and the set of HRTFs with the headset, for a particular test user, is calculated based on the formula:

SDE WO – Headset ( .OMEGA. , f ) = 20 log 1 0 HRTF WO ( .OMEGA. , f ) HRTF Headset ( .OMEGA. , f ) ( 1 ) ##EQU00001##

[0040] Where .OMEGA. is direction angle (azimuth and elevation), f is the frequency of the test sound, HRTF.sub.WO(.OMEGA., f) is the HRTF without the headset for the direction .OMEGA. and frequency f, and HRTF.sub.Headset(.OMEGA., f) is the HRTF with the headset for the direction .OMEGA. and frequency f. The SDE is calculated for each pair of HRTFs with and without the headset at a particular frequency and direction. The SDE is calculated for both ears at each frequency and direction.

[0041] In one embodiment, ITD error is also estimated by determining the time when the result of the correlation between the right and the left HRIRs reaches a maximum. For each measured test user, the ITD error may be calculated as the absolute value of the difference between the ITD of the HRTF without the headset and with the headset for each direction.

[0042] In some embodiments, a comparison of the set of HRTFs of a test user wearing a headset to the set of HRTFs of the test user not wearing a headset includes an additional subjective analysis. In one embodiment, each test user who had their HRTFs measured with and without the headset participates in a Multiple Stimuli with Hidden Reference and Anchor (MUSHRA) listening test to corroborate the results of the objective analysis. In particular, the MUSHRA test consists of a set of generalized HRTFs without the headset, a set of generalized HRTFs with the headset, the test user’s individualized set of HRTFs without the headset, and the test user’s individualized set of HRTFs with the headset, wherein the set of individualized HRTFs without the headset is the hidden reference and there is no anchor.

[0043] The distortion identification module 255 determines an average comparison across the population of test users. To determine an average comparison the SDE.sub.WO-Headset(.OMEGA., f) for each test user is averaged across the population of test users at each frequency and direction, denoted by SDE.sub.WO-Headset(.OMEGA., f):

SDE _ WO – Headset ( .OMEGA. , f ) = 1 N .PI. i N 20 log 1 0 HRTF WO i ( .OMEGA. , f ) HRTF Headset i ( .OMEGA. , f ) ( 2 ) ##EQU00002##

[0044] Where N is the total number of test users in the population of users. In alternate embodiments, SDE.sub.WO-Headset(.OMEGA., f) may be determined by alternate calculations.

[0045] In one embodiment, the determination further includes averaging across the span of frequencies measured (e.g., 0-16 kHz), denoted by SDE.sub.WO-Headset(.OMEGA.). The SDE is found to generally be higher at higher frequencies. That is, the HRTF with the headset differs more dramatically from the HRTF without the headset at higher frequencies due to the fact that at high frequencies the wavelengths are large relative to the headset’s form factor. Because of the general trend that the SDE is greater at higher frequencies, averaging across all frequencies allows for determination of particular azimuths and elevations at which the distortion due to the headset is more extreme.

[0046] The average ITD error across the population of test users, ITD.sub.WO-Headset(.OMEGA.), is calculated based on the following formula:

ITD _ WO – Headset ( .OMEGA. ) = 1 N i N | ITD WO i ( .OMEGA. ) – ITD Headset i ( .OMEGA. ) | ( 3 ) ##EQU00003##

[0047] Where N is the total number of test users in the population of test users, ITD.sub.WO.sub.i(.OMEGA.) is the maximum ITD of the HRTF without the headset at direction .OMEGA. of user i, and ITD.sub.Headset.sub.i(.OMEGA.) is the maximum ITD of the HRTF with the headset at direction .OMEGA. of user i.

[0048] The distortion identification module 255 determines a distortion mapping that identifies a set of one or more distortion regions based on portions of HRTFs commonly distorted across the population of test users. Using the SDE.sub.WO-Headset(.OMEGA.) and ITD.sub.WO-Headset(.OMEGA.), the directional dependence of the distortion of the HRTFs based on the presence of the headset can be determined. Both SDE.sub.WO-Headset(.OMEGA.) and ITD.sub.WO-Headset(.OMEGA.) can be plotted in two dimensions to determine particular azimuths and elevations where the errors are the greatest in magnitude. In one embodiment, the directions with the greatest error are determined by a particular threshold value of SDE and/or ITD. The determined directions of greatest error are the set of one or more distortion regions.

[0049] In one example, the threshold is high error in the contralateral direction greater than 4 dB of SDE. In this example, based on the SDE.sub.WO-Headset(.OMEGA.) for the left-HRTFs, regions of azimuth [-80.degree., -10.degree.] and elevation [-30.degree., 40.degree.] and regions of azimuth [-120.degree., -100.degree.] and elevation [-30.degree., 0.degree.] are above the SDE threshold. These regions are thereby determined to be the distortion regions.

[0050] In another example, the threshold is ITD.sub.WO-Headset(.OMEGA.)>50 .mu.s. In this example, directions corresponding to the regions of azimuth [-115.degree., -100.degree.] and elevation [-15.degree., 0.degree.], azimuth [-60.degree., -30.degree.] and elevation [0.degree., 30.degree.], azimuth [30.degree., 60.degree.] and elevation [0.degree., 30.degree.], and azimuth [100.degree., -115.degree.] and elevation [-15.degree., 0.degree.] are above the ITD threshold. These regions are thereby determined to be the distortion regions.

[0051] The SDE and ITD analysis and thresholds may determine different distortion regions. In particular, the ITD analysis may result in smaller distortion region than the SDE analysis. In different embodiments, the SDE and ITD analyses may be used independently from one another, or used together.

[0052] Note that the distortion mapping is based on the HRTFs determined for a population of test users. In some embodiments, the population may be a single manakin. But in other embodiments, the population may include a plurality of test users having a large cross section of different physical characteristics. Note that in some embodiments, distortion maps are determined for populations having one or more common physical characteristics (e.g., age, gender, size, etc.). In this manner, the distortion identification module 255 may determine multiple distortion mappings that are each indexed to one or more specific physical characteristics. For example, one distortion mapping could be specific to adults that identifies a first set of distortion regions, and a separate distortion map could be specific to children that may identify a second set of distortion regions that are different than the first set of distortion regions.

[0053] The HRTF system 200 may communicate with one or more headsets and/or consoles. In some embodiments, the HRTF system 200 is configured to receive a query for distortion regions from a headset and/or console. In some embodiments, the query may include parameters about a user of the headset, which is used by the distortion identification module 255 to determine a set of distortion regions. For example, the query may include specific parameters about the user, such as height, weight, age, gender, dimensions of ears, and/or type of headset being worn. The distortion identification module 255 can use one or more of the parameters to determine a set of distortion regions. That is, the distortion identification module 255 uses parameters provided by the headset and/or console to determine a set of distortion regions from audio test data captured from test users with similar characteristics. The HRTF server 200 provides the determined set of distortion regions to the requesting headset and/or console. In some embodiments, the HRTF server 200 receives information (e.g., parameter about a user, sets of individualized HRTFs, HRTFs measured while a user is wearing a headset from a headset and/or console, or some combination thereof) from a headset (e.g., via a network). The HRTF server 200 may use the information to update one or more distortion mappings.

[0054] In some embodiments, the HRTF system 200 may be remote and/or separate from the sound measurement system 210. For example, the sound measurement system 210 may be communicatively coupled with the HRTF system 200 via a network (e.g., local area network, Internet, etc.). Similarly, the HRTF system 200 may connect to other components via a network, as discussed in greater detail below in reference to FIGS. 5 and 8.

[0055] FIG. 3 is a flowchart illustrating a process 300 of obtaining a set of distortion regions, in accordance with one or more embodiments. In one embodiment, the process 300 is performed by the HRTF system 200. Other entities may perform some or all of the steps of the process 300 in other embodiments (e.g., a server, headset, other connected device). Likewise, embodiment may include different and/or additional steps or perform the steps in a different order.

[0056] The HRTF system 200 determines 310 a set of HRTFs for a test user wearing a headset and a set of HRTFs for the test user not wearing the headset. Audio test data is captured by one or more microphones that are at or near the ear canals of a test user. The audio test data is captured for test sounds played from a variety of orientations, both with the test user wearing a headset and the user not wearing the headset. The audio test data is collected at each orientation both with and without the headset such that the audio test data can be compared for the instances with the headset and the instances without the headset. In one embodiment, this is done by the processes discussed above in relation to FIG. 1A and 1B.

[0057] Note that audio test data can be captured over a population of test users that includes one or more test users from which audio test data was measured. In some embodiments, the population of test users can be one or more people. The one or more people can be further divided into subsets of the population based on different physical characteristics, such as gender, age, ear geometry, head dimensions, some other factor that may affect HRTFs for the test user, or some combination thereof. In other embodiments, a test user may be a manikin head. In some embodiments, a first manikin head may have average physical characteristics, whereas other manikins may have different physical characteristics and be similarly subdivided into subsets based on the physical characteristics.

[0058] The HRTF system 200 compares 320 the set of HRTFs for the test user wearing a headset and the set of HRTFs for the test user not wearing a headset. In one embodiment, the comparison 320 is performed using SDE analysis and/or ITD, as previously discussed in relation to the HRTF module 250 of FIG. 2 and equation (1). The comparison 320 may be repeated for a population of test users. The sets of HRTFs and corresponding audio test data can be grouped based on the physical characteristics of the population of test users.

[0059] The HRTF system 200 determines 330 a set of distortion regions based on portions of the HRTFs commonly distorted across a population of test users. In some embodiments, the population of test users is a subset of the previously discussed population of test users. In particular the distortion regions may be determined for a population of test users that is a subset of the total population of test users that meet one or more parameters based on physical characteristics. In one embodiment, the HRTF system 200 determines 330 using an average of the SDE and average of the ITD, as previously discussed in relation to the distortion identification module 255 of FIG. 2 and equations (2) and (3).

Example System for Calculating Individualized Sets of HRTFs

[0060] An audio system uses information from an HRTF system and HRTFs calculated while a user of a headset is wearing the headset to determine a set of individualized HRTFs that compensate for the effects of the headset. The audio system collects audio data for a user wearing a headset. The audio system may determine HRTFs for the user wearing the headset and/or provide the audio data to a separate system (e.g., HRTF system and/or console) for the HRTF determination. In some embodiments, the audio system requests a set of distortion regions based on the audio test data previously captured by the HRTF system, and uses the set of distortion regions to determine he individualized set of HRTFS for the user.

[0061] FIG. 4A is a diagram of an example artificial reality system 400 for obtaining audio data associated with a user 410 wearing a headset 420 using an external speaker 430 and a generated virtual space 440, in accordance with one or more embodiments. The audio data obtained by the artificial reality system 400 is distorted by the presence of the headset 420, which is used by an audio system to calculate an individualized set of HRTFs for the user 410 that compensates for the distortion. The artificial reality system 400 uses artificial reality to enable measurement of individualized HRTFs for the user 410 without the use of anechoic chamber, such as the SMSs 100, 210 previously discussed in FIGS. 1A-3.

[0062] The user 410 is an individual, distinct from the test user 110 of FIGS. 1A and 1B. The user 410 is an end-user of the artificial reality system 400. The user 410 may use the artificial reality system 400 to create a set of individualized HRTFs that compensate for distortion of the HRTFs caused by the headset 420. The user 410 wears a headset 420 and a pair of microphones 450a, 450b (collectively referred to as “450”). The headset 420 can be the same type, model, or shape as the headset 120, as described in more detail in relation to FIGS. 7A and 7B. The microphones 450 can have the same properties as the binaural microphones 140, as discussed in relation to FIG. 1A, or the or microphones 230, as discussed in relation to FIG. 2. In particular, the microphones 450 are located at or near the entrance to the ear canals of the user 410.

[0063] The external speaker 430 is a device configured to transmit sound (e.g., test sounds) to the user 410. For example, the external speaker 430 may be a smartphone, a tablet, a laptop, a speaker of a desktop computer, a smart speaker, or any other electronic device capable of playing sound. In some embodiments, the external speaker 430 is driven by the headset 420 via a wireless connection. In other embodiments, the external speaker 430 is driven by a console. In one aspect, the external speaker 430 is fixed at one position and transmits test sounds that the microphones 450 can receive for calibrating HRTFs. For example, the external speaker 430 may play test sounds that are the same as those played by the speaker array 130, 220 of the SMS 100, 210. In another aspect, the external speaker 430 provides test sounds of frequencies that the user 410 can optimally hear based on audio characterization configuration, in accordance with the image presented on the headset 420.

[0064] The virtual space 440 is generated by the artificial reality system 400 to direct the orientation of the head of the user 410 while measuring the individualized HRTFs. The user 410 views the virtual space 440 through a display of the headset 420. The term “virtual space” 440 is not intended to be limiting. In some various embodiments the virtual reality space 440 may include virtual reality, augmented reality, mixed reality, or some other form of artificial reality.

[0065] In the embodiment illustrated, the virtual reality space 440 includes an indicator 460. The indicator 460 is presented on the display of the headset 420 to direct the orientation of the head of the user 410. The indicator 460 can be light, or a marking presented on the display of the headset 420. The position of the headset 420 can be tracked through an imaging device and/or an IMU (show in in FIGS. 7A and 7B) to confirm whether the indicator 460 is aligned with the desired head orientation.

[0066] In one example, the user 410 is prompted to view the indicator 460. After confirming that the indicator 460 is aligned with the head orientation, for example based on the location of the indicator 460 displayed on the HMD 420 with respect to a crosshair, the external speaker 430 generates a test sound. For each ear a corresponding microphone 450a, 450b captures the received test sound as audio data.

[0067] After the microphones 450 successfully capture the audio data, the user 410 is prompted to direct their orientation towards a new indicator 470 at a different location in the virtual space 440. The process of capturing the audio data at indicator 460 is repeated to capture audio data at indicator 470. Indicators 460, 470 are generated at different locations in the virtual space 440 to capture audio data to be used to determine HRTFs at different head orientations of the user 410. Each indicator 460, 470 at a different location in the virtual space 440 enables the measurement of an HRTF at a different direction (elevation and azimuth). New indicators are generated and the process of capturing audio data is repeated to sufficiently span elevations and azimuths within the virtual space 440. The use of an external speaker 430 and a display of indicators 460, 470 within the virtual space 440 displayed via a headset 420 enables relatively convenient measurement the measurement of individualized HRTFs for a user 410. That is, the user 410 can perform these steps at their convenience in their own home with an artificial reality system 400, without the need for an anechoic chamber.

[0068] FIG. 4B is a diagram of a display 480 in which an alignment prompt 490 and an indicator 460 are displayed by a headset and a user’s head is not at a correct orientation, in accordance with one or more embodiments. As shown in FIG. 4B, a display 480 presents an alignment prompt 490 on a center of the display 480 or at one or more predetermined pixels of the display 480. In this embodiment, the alignment prompt 490 is a crosshair. But more generally, the alignment prompt 490 is any text and/or graphical interface that shows the user whether the user’s head is at the correct orientation relative to a displayed indicator 460. In one aspect, the alignment prompt 490 reflects a current head orientation and the indicator 460 reflects a target head orientation. The correct orientation occurs when the indicator 460 is at the center of the alignment prompt 490. In the example depicted in FIG. 4B, the indicator 460 is positioned on a top left corner of the display 480, rather than on the alignment prompt 490. Accordingly, the head orientation is not at the correct orientation. Moreover, because the indicator 460 and the alignment prompt 490 are not aligned it is apparent to the user that his/her head is not at the proper orientation.

[0069] FIG. 4C is a diagram of the display of FIG. 4B in which the user’s head is at a correct orientation, in accordance with one or more embodiments. The display 480 on FIG. 4C is substantially similar to the display 480 of FIG. 4B, except the indicator 460 is now displayed on the crosshair 490. Hence, it is determined the head orientation is properly aligned with the indicator 460 and the user’s HRTF is measured for the head orientation. That is, a test sound is played by the external speaker 430 and captured as audio data at the microphones 450. Based on the audio data, an HRTF is determined for each ear at the current orientation. The process described in relation to FIGS. 4B and 4C is repeated for a plurality of different orientations of the head of the user 410 with respect to the external speaker 430. A set of HRTFs for the user 410 comprises an HRTF at each measured head orientation.

[0070] FIG. 5 is a block diagram of a system environment 500 of a system for determining individualized HRTFs for a user, in accordance with one or more embodiments. The system environment 500 comprises an external speaker 505, the HRTF system 200, a network 510, and a headset 515. The external speaker 505, the HRTF system 200, and the headset 515 are all connected via the network 510.

[0071] The external speaker 505 is a device configured to transmit sound to the user. In one embodiment, the external speaker 505 is operated according to commands from the headset 515. In other embodiments, the external speaker 505 is operated by an external console. The external speaker 505 is fixed at one position and transmits test sounds. Test sounds transmitted by the external speaker 505 include, for example, a continuous sinusoidal wave at a constant frequency, or a chirp. In some embodiments, the external speaker 505 is the external speaker 430 of FIG. 4A.

……
……
……

更多阅读推荐......