

Patent: Individualization Of Head Related Transfer Function Templates For Presentation Of Audio Content

Publication Number: 20200336858

Publication Date: 20201022

Applicants: Facebook

Abstract

A system generates individualized HRTFs that are customized to a user of a headset. The system includes a server and an audio system. The server determines the individualized HRTFs based in part on acoustic features data (e.g., image data, anthropometric features, etc.) of the user and a template HRTF. The server provides the individualized HRTFs to the audio system. The audio system presents spatialized audio content to the user using the individualized HRTFs.

BACKGROUND

[0001] The present disclosure generally relates to binaural audio synthesis, and specifically to individualizing head-related transfer functions (HRTFs) for presentation of audio content.

[0002] A sound from a given source received at the two ears can be different, depending on the direction and location of the sound source with respect to each ear as well as on the surroundings of the room in which the sound is perceived. An HRTF characterizes the sound received at an ear of a person for a particular location (and frequency) of the sound source. A plurality of HRTFs are used to characterize how a user perceives sound. In some instances, the plurality of HRTFs form a high-dimensional data set that depends on tens of thousands of parameters to provide a listener with a percept of sound source direction.

SUMMARY

[0003] A system generates individualized HRTFs that are customized to a user of an audio system (which may, e.g., be implemented as part of a headset). The system includes a server and an audio system. The server determines the individualized HRTFs based in part on acoustic features data (e.g., image data, anthropometric features, etc.) of the user and a template HRTF. A template HRTF is an HRTF that can be customized (e.g., by adding one or more notches) such that it can be individualized to different users. The server provides the individualized HRTFs to the audio system. The audio system presents spatialized audio content to the user using the individualized HRTFs. Methods described herein may also be embodied as instructions stored on computer-readable media.

[0004] In some embodiments, a method is disclosed for execution by a server. The method comprises determining one or more individualized filters (e.g., via machine learning) based at least in part on acoustic features data of a user. One or more individualized HRTFs for the user are generated based on a template HRTF and the one or more individualized filters. The one or more individualized filters function to individualize the template HRTF (e.g., by adding one or more notches) such that it is customized to the user, thereby forming an individualized HRTF. The server provides the generated one or more individualized HRTFs to an audio system, wherein an individualized HRTF is used to generate spatialized audio content.

[0005] In some embodiments, a method is disclosed for execution by a headset. The method comprises receiving (e.g., from a server), at a headset, one or more individualized HRTFs for a user of the headset. The headset retrieves audio data associated with a target sound source direction with respect to the headset. The headset applies the one or more individualized HRTFs to the audio data to render the audio data as audio content. The headset presents, by a speaker assembly, the audio content, wherein the presented audio content is spatialized such that it appears to be originating from the target sound source direction.

BRIEF DESCRIPTION OF THE DRAWINGS

[0006] FIG. 1 is a perspective view of sound source elevation from a user’s viewpoint, in accordance with one or more embodiments.

[0007] FIG. 2 illustrates an example depiction of three HRTFs as parameterized by sound source elevation for a user, in accordance with one or more embodiments.

[0008] FIG. 3 is a schematic diagram of a high-level system environment for generating individualized HRTFs, in accordance with one or more embodiments.

[0009] FIG. 4 is a block diagram of a server, in accordance with one or more embodiments.

[0010] FIG. 5 is a flowchart illustrating a process for processing a request for one or more individualized HRTFs for a user, in accordance with one or more embodiments.

[0011] FIG. 6 is a block diagram of an audio system, in accordance with one or more embodiments.

[0012] FIG. 7 is a flowchart illustrating a process for presenting audio content on a headset using one or more individualized HRTFs, in accordance with one or more embodiments.

[0013] FIG. 8 is a system environment for a headset including an audio system, in accordance with one or more embodiments.

[0014] FIG. 9 is a perspective view of a headset including an audio system, in accordance with one or more embodiments.

[0015] The figures depict various embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.

[0016] The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.

DETAILED DESCRIPTION

Overview

[0017] A system environment is configured to generate individualized HRTFs. An HRTF characterizes the sound received at an ear of a person for a particular location of the sound source. A plurality of HRTFs are used to characterize how a user perceives sound. The HRTFs for a particular source direction relative to a person may be unique to the person based on the person’s anatomy (e.g., ear shape, shoulders, etc.), as their anatomy affects how sound arrives at the person’s ear canal.

[0018] A typical HRTF that is specific to a user includes features (e.g., notches) that act to customize the HRTF for the user. A template HRTF is an HRTF that was determined using data from some population of people, and that can then be individualized to be specific to a single user. Accordingly, a single template HRTF is customizable to provide different individualized HRTFs for different users. The template HRTF may be considered a smoothly varying continuous energy function with no individual sound source directional frequency characteristics over one or more frequency ranges (e.g., 5 kHz-10 kHz). An individualized HRTF is generated from the template HRTF by applying one or more filters to the template HRTF. For example, the filters may act to introduce one or more notches into the template HRTF. In some embodiments, for a given source direction, a notch is described by the following parameters: a frequency location, a width of a frequency band centered around the frequency location, and a value of attenuation in the frequency band at the frequency location. A notch may be viewed as the result of resonances in the acoustic energy as it arrives at the head of a listener and bounces around the head and pinna, undergoing cancellations before reaching the entrance of the ear canal. As noted above, notches can affect how a person perceives sound (e.g., from what elevation relative to the user a sound appears to originate).
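
The three notch parameters above map naturally onto a small data structure. The following Python sketch shows one hypothetical way to represent a notch and carve it into a template HRTF magnitude response; the dataclass, the Gaussian dip shape, and the flat stand-in template are illustrative assumptions, not the patent’s implementation.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Notch:
    center_hz: float  # frequency location of the notch
    width_hz: float   # width of the frequency band centered on center_hz
    depth_db: float   # value of attenuation at the frequency location


def apply_notch(freqs_hz: np.ndarray, mag_db: np.ndarray, notch: Notch) -> np.ndarray:
    """Subtract a smooth dip from a magnitude response given in dB."""
    sigma = notch.width_hz / 2.0  # Gaussian spread derived from the band width
    dip = notch.depth_db * np.exp(-0.5 * ((freqs_hz - notch.center_hz) / sigma) ** 2)
    return mag_db - dip


freqs = np.linspace(0.0, 16_000.0, 512)  # 0-16 kHz grid, as in FIG. 2
template = np.zeros_like(freqs)          # flat stand-in for one template HRTF slice
individualized = apply_notch(freqs, template, Notch(7_000.0, 1_500.0, 12.0))
```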

[0019] The system environment includes a server and an audio system (that may be fully or partially implemented as part of a headset, may be separate and external to the headset, etc.). The server may receive acoustic features data describing features of a head of a user and/or the headset. For example, the user may provide images and/or video of their head and/or ears, anthropometric features of the head and/or ears, etc. to the server system. The server determines parameter values for one or more individualized filters (e.g., add notches) based at least in part on the acoustic features data. For example, the server may utilize machine learning to identify parameter values for the one or more notch filters based on the received acoustic features data. The server generates one or more individualized HRTFs for the user based on the template HRTF and the individualized filters (e.g., determined parameter values for the one or more individualized notches). In some embodiments, the server provides the one or more individualized HRTFs to an audio system (e.g., may be part of a headset) associated with the user. The audio system may apply the one or more individualized HRTFs to audio data to render the audio data as audio content. The audio system may then present (e.g., via a speaker assembly of the audio system), the audio content. The presented audio content is spatialized audio content (i.e., appears to be originating from one or more target sound source directions).

[0020] In some embodiments, some or all of the functionality of the server is performed by the audio system. For example, the server may provide the individualized filters (e.g., parameter values for the one or more individualized notches) to the audio system on the headset, and the audio system may generate the one or more individualized HRTFs using the individualized filters and a template HRTF.

[0021] FIG. 1 is a perspective view of a user’s 110 hearing perception in perceiving audio content, in accordance with one or more embodiments. An audio system (not shown) presents audio content to the user 110 of the audio system. In this illustrative example, the user 110 is placed at an origin of a spherical coordinate system, more specifically at a midpoint between the user’s 110 ears. When the audio system in a headset provides audio content to the user 110, to facilitate an immersive experience for the user, the audio system can spatially localize audio content such that the user perceives the audio content as originating from a source direction 120 with respect to the headset. The source direction 120 may be described by an elevation angle φ 130 and an azimuthal angle θ 140. The elevation angles are measured from the horizon plane 150 towards a pole of the spherical coordinate system. The azimuthal angles are measured in the horizon plane 150 from a reference axis. In other embodiments, a perceived sound origination direction may include one or more vectors, e.g., an angle of vectors describing a width of the perceived sound origination direction or a solid angle of vectors describing an area of the perceived sound origination direction. Audio content may be further spatially localized as originating at a particular distance in the target sound source direction using the physical principle that acoustic pressure decreases as 1/r with distance r.
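
As a concrete illustration of this coordinate convention and the 1/r law, the sketch below converts an (elevation, azimuth) pair into a unit direction vector and computes a distance-dependent gain. The axis layout and the reference distance are assumptions made for illustration only.

```python
import numpy as np


def direction_vector(elevation_deg: float, azimuth_deg: float) -> np.ndarray:
    """Unit vector for elevation measured from the horizon plane and
    azimuth measured in the horizon plane from a reference axis."""
    phi, theta = np.radians(elevation_deg), np.radians(azimuth_deg)
    return np.array([
        np.cos(phi) * np.cos(theta),
        np.cos(phi) * np.sin(theta),
        np.sin(phi),
    ])


def distance_gain(r_m: float, ref_m: float = 1.0) -> float:
    """Acoustic pressure falls off as 1/r relative to a reference distance."""
    return ref_m / max(r_m, 1e-6)
```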

[0022] Two of the parameters that affect sound localization are the interaural time differences (ITD) and interaural level differences (ILD) of a user. The ITD describes the difference in arrival time of a sound between the two ears, and this parameter provides a cue to the angle or direction of the sound source from the head. For example, sound from a source located at the right side of the person will reach the right ear before it reaches the left ear of the person. The ILD describes the difference in the level or intensity of the sound between the two ears. For example, sound from a source located at the right side of the person will be louder as heard by the right ear of the person compared to the sound as heard by the left ear, due to the head occluding part of the sound waves as they travel to the left ear. ITDs and ILDs may affect lateralization of sound.
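
The patent does not prescribe how ITDs are computed, but a well-known closed-form approximation, the Woodworth spherical-head model, grounds the idea; the default head radius below is an assumed average.

```python
import numpy as np


def itd_woodworth(azimuth_deg: float, head_radius_m: float = 0.0875,
                  speed_of_sound_mps: float = 343.0) -> float:
    """Woodworth approximation of the ITD in seconds for a frontal azimuth
    between 0 and 90 degrees, assuming a rigid spherical head."""
    theta = np.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound_mps) * (theta + np.sin(theta))


print(itd_woodworth(90.0))  # roughly 0.66 ms for a source directly to one side
```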

[0023] In some embodiments the individualized HRTFs for a user are parameterized based on the sound source elevation and azimuthal angles. Thus, for a target user audio perception of a particular source direction 120 with defined values for elevation angle φ 130 and azimuthal angle θ 140, the audio content provided to the user may be modified by a set of HRTFs individualized for the user and also for the target source direction 120. Some embodiments may further spatially localize the presented audio content for a target distance in the target sound source direction as a function of distance between the user 110 and a target location that the sound is meant to be perceived as originating from.
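
One minimal realization of this (elevation, azimuth) parameterization is a keyed bank of HRTFs with nearest-direction lookup, as sketched below; a production renderer would interpolate between measured directions (and handle azimuth wrap-around) rather than snap to the closest key. All names here are assumptions.

```python
import numpy as np


def nearest_hrtf(hrtf_bank: dict, elevation_deg: float, azimuth_deg: float) -> np.ndarray:
    """Return the stored HRTF whose (elevation, azimuth) key is closest to
    the target source direction."""
    key = min(
        hrtf_bank,
        key=lambda k: (k[0] - elevation_deg) ** 2 + (k[1] - azimuth_deg) ** 2,
    )
    return hrtf_bank[key]


bank = {(0.0, 0.0): np.ones(256), (30.0, 90.0): np.ones(256) * 0.5}
hrtf = nearest_hrtf(bank, elevation_deg=25.0, azimuth_deg=80.0)
```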

Template HRTFs

[0024] A template HRTF is an HRTF that can be customized such that it can be individualized to different users. The template HRTF may be considered a smoothly varying continuous energy function with no individual sound source directional frequency characteristics, but describing the average sound source directional frequency characteristics for a group of listeners (e.g., in some cases all listeners).

[0025] In some embodiments, a template HRTF is generated from a generic HRTF over a population of users. In some embodiments, a generic HRTF corresponds to an average HRTF that is obtained over a population of users. In some embodiments, a generic HRTF corresponds to one of the HRTFs from a database of HRTFs obtained from a population of users. The criteria for selection of this one HRTF from the database of HRTFs, in some embodiments, corresponds to a predefined machine learning or statistical model or a statistical metric. The generic HRTF exhibits average frequency characteristics for varying sound source directions over the population of users.

[0026] In some embodiments, the template HRTF can be considered to retain mean angle-dependent ITDs and ILDs for a general population of users. However, the template HRTF does not exhibit any individualized frequency characteristics (e.g., notches in specific locations). A notch may be viewed as the result of resonances in the acoustic energy as it arrives at the head of a listener and bounces around the head and pinna, undergoing cancellations before reaching the entrance of the ear canal. Notches (e.g., the number of notches, the location of notches, the width of notches, etc.) in an HRTF act to customize/individualize that HRTF for a particular user. Thus, the template HRTF is a generic, non-individualized, parameterized frequency transfer function that has been modified to remove individualized notches in the frequency spectrum, particularly those between 5 kHz and 10 kHz. In some embodiments, the removed notches may also be located below 5 kHz and above 10 kHz.

[0027] A fully individualized “true” HRTF for a user is a high-dimensional data set depending on tens of thousands of parameters to provide a listener with a realistic sound source elevation perception. Features such as the geometry of the user’s head, the shape of the pinnae, the geometry of the ear canal, the density of the head, and environmental characteristics all transform the audio content as it travels from the source location, and influence how audio is perceived by the individual user (e.g., attenuating or amplifying frequencies of the generated audio content). In short, individualized true HRTFs for a user include individualized notches in the frequency spectrum.

[0028] FIG. 2 illustrates an example depiction of three HRTFs as parameterized by sound source elevation for a user, in accordance with one or more embodiments. The three HRTFs include a true HRTF 210 for a user, a template HRTF 220, and an individualized HRTF 230. Each of the three HRTFs depicts a color-scale-coded energy value in decibels (over a range of -20 dB to 20 dB), parameterized over frequency in kilohertz (over a range of 0.0 kHz to 16.0 kHz) and elevation angle in degrees (over a range of -90 to 90 degrees); they are further discussed below. Note that while not shown, there would also be plots for each of these HRTFs as a function of azimuth.

[0029] The true HRTF 210 describes the true frequency attenuation characteristics that impact how an ear receives a sound from a point in space, across the illustrated elevation range. Note that at a frequency range of approximately 5.0 kHz-16.0 kHz, the true HRTF 210 exhibits frequency attenuation characteristics over the range of elevations. This is depicted visually as notches 240. This means that, for audio content within the frequency band of 5.0 kHz-16.0 kHz to provide the user with a true immersive experience with respect to sound source elevation, the generated audio content may ideally be convolved with an HRTF that is as close as possible to the true HRTF 210 for the illustrated elevation ranges.

[0030] The template HRTF 220 represents an example of frequency attenuation characteristics displayed by a generic centroid HRTF that retains mean angle-dependent ITDs and ILDs for a general population of users. Note that the template HRTF 220 exhibits similar characteristics to the true HRTF 210 at a frequency range of approximately 0.0 kHz-5.0 kHz. However, at a frequency range of approximately 5.0 kHz-16.0 kHz, unlike the true HRTF 210, the template HRTF 220 exhibits diminished frequency attenuation characteristics across the illustrated range of elevations.

[0031] The individualized HRTF 230 is a version of the template HRTF 220 that has been individualized for the user. As discussed below with regard to FIGS. 3-7, the individualization applies one or more filters to the template HRTF. The one or more filters may act to introduce one or more notches into the template HRTF. In the illustrated example, two notches 250 are added to the template HRTF 220 to form the individualized HRTF 230. Note that the individualized HRTF 230 exhibits similar characteristics to the true HRTF 210 over the frequency range of 0.0 kHz-16.0 kHz, due in part to the notches 250 approximating the notches 240 in the true HRTF 210.

System Overview

[0032] FIG. 3 is a schematic diagram of a high-level system environment 300 for determining an individualized HRTF for a user 310, in accordance with one or more embodiments. A headset 320 communicates with a server 330 through a network 340. The headset 320 may be worn by the user 310.

[0033] The server 330 receives acoustic feature data. For example, the user 310 may provide the acoustic features data to the server 330 via the network 340. Acoustic features data describes features of a head of the user 310 and/or the headset 320. Acoustic features data may include, for example, one or more images of a head and/or ears of the user 310, one or more videos of the head and/or ears of the user 310, anthropometric features of the head and/or ears of the user 310, one or more images of the head wearing the headset 320, one or more images of the headset 320 in isolation, one or more videos of the head wearing the headset 320, one or more videos of the headset 320 in isolation, or some combination thereof. Anthropometric features of the user 310 are measurements of the head and/or ears of the user 310. In some embodiments, the anthropometric features may be measured using measuring instruments like a measuring tape and/or ruler. In some embodiments, images and/or videos of the head and/or ears of the user 310 are captured using an imaging device (not shown). The imaging device may be a camera on the headset 320, a depth camera assembly (DCA) that is part of the headset 320, an external camera (e.g., part of a mobile device), an external DCA, some other device configured to capture images and/or depth information, or some combination thereof. In some embodiments, the imaging device is also used to capture images of the headset 320. The data may be provided through the network 340 to the server 330.

[0034] To capture the user’s head more accurately, the user 310 (or some other party) positions an imaging device in different positions relative to their head, such that the captured images cover different portions of the head of the user 310. The user 310 may hold the imaging device at different angles and/or distances relative to the user 310. For example, the user 310 may hold the imaging device at arm’s length directly in front of the user’s 310 face and use the imaging device to capture images of the user’s 310 face. The user 310 may also hold the imaging device at a distance shorter than arm’s length with the imaging device pointed towards the side of the head of the user 310 to capture an image of the ear and/or shoulder of the user 310. In some embodiments, the imaging device may run feature recognition software and capture an image automatically when features of interest (e.g., ear, shoulder) are recognized, or receive an input from the user to capture the image. In some embodiments, the imaging device may have an application with a graphical user interface (GUI) that guides the user 310 to capture the plurality of images of the head of the user 310 from specific angles and/or distances relative to the user 310. For example, the GUI may request a front-facing image of a face of the user 310, an image of a right ear of the user 310, and an image of a left ear of the user 310. In some embodiments, anthropometric features are determined by the imaging device using the images and/or videos captured by the imaging device.

[0035] In the illustrated example, the data is provided from the headset 320 via the network 340 to the server 330. However, in alternate embodiments, some other device (e.g., a mobile device (e.g., smartphone, tablet, etc.), a desktop computer, an external camera, etc.) may be used to upload the data to the server 330. In some embodiments, the data may be directly provided to the server 330.

[0036] The network 340 may be any suitable communications network for data transmission. The network 340 is typically the Internet, but may be any network, including but not limited to a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a mobile wired or wireless network, a private network, or a virtual private network. In some example embodiments, network 340 is the Internet and uses standard communications technologies and/or protocols. Thus, network 340 can include links using technologies such as Ethernet, 802.11, worldwide interoperability for microwave access (WiMAX), 3G, 4G, digital subscriber line (DSL), asynchronous transfer mode (ATM), InfiniBand, PCI express Advanced Switching, etc. In some example embodiments, the entities use custom and/or dedicated data communications technologies instead of, or in addition to, the ones described above.

[0037] The server 330 uses the acoustic features data of the user along with a template HRTF to generate individualized HRTFs for the user 310. In some embodiments, there is a single template HRTF for all users. However, in alternate embodiments, there are a plurality of different template HRTFs, and each template HRTF is directed to different groups that have one or more common characteristics (e.g., head size, ear shape, men, women, etc.). In some embodiments, each template HRTF is associated with specific characteristics. The characteristics may be, e.g., head size, head shape, ear size, gender, age, some other characteristic that affects how a person perceives sound, or some combination thereof. For example, there may be different template HRTFs based on variation in head size and/or age (e.g., there may be a template HRTF for children and a different template HRTF for adults), as ITD may scale with head diameter. In some embodiments, the server 330 uses the acoustic features data to determine one or more characteristics (e.g., ear size, ear shape, head size, etc.) that describe the head of the user 310. The server 330 may then select a template HRTF based on the one or more characteristics.

[0038] The server 330 uses a trained machine learning system on the acoustic features data to obtain filters that are customized to the user. The filters can be applied to a template HRTF to create an individualized HRTF. A filter may be, e.g., a band pass (e.g., describes a peak), a band stop (e.g., describes a notch), a high pass (e.g., describes a high frequency shelf), a low pass (e.g., describes a low frequency shelf), or some combination thereof. A filter may be described by one or more parameter values. Parameter values may include, e.g., a frequency location, a width of a frequency band centered around the frequency location (e.g., determined by a quality factor and/or filter order), and a depth at the frequency location (e.g., gain). Depth at the frequency location refers to a value of attenuation in the frequency band at the frequency location. A single filter or combinations of filters may be used to describe one or more notches. In some embodiments, the server 330 uses a trained machine learning (ML) model to determine filter parameter values for one or more individualized filters using the acoustic features data of the user 310. The ML model may determine the filters based in part on ITDs and/or ILDs that are estimated from the acoustic features data. As noted above, ITDs may affect, e.g., elevation, and ILDs can have some effect on lateralization. The one or more individualized filters are each applied to the template HRTF based on the corresponding filter parameter values to modify the template HRTF (e.g., adding one or more notches), thereby generating individualized HRTFs (e.g., at least one for each ear) for the user 310. The individualized HRTFs may be parameterized by elevation and azimuth angles. In some embodiments, when multiple users may operate the headset 320, the ML model may determine parameter values for individualized notches to be applied to the template HRTF for each particular individual user, to generate individualized HRTFs for each of the multiple users.
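
To make the filter parameterization concrete, the sketch below designs a single band-stop filter from a (frequency location, quality factor) pair, such as an ML model might predict, and applies it to a stand-in template head-related impulse response (HRIR). The use of scipy’s iirnotch and every variable name are assumptions for illustration.

```python
import numpy as np
from scipy import signal

fs = 48_000  # sample rate in Hz

# Stand-in time-domain template HRIR (a unit impulse).
template_hrir = np.zeros(256)
template_hrir[0] = 1.0

# Hypothetical predicted parameter values for one notch filter.
center_hz, q_factor = 7_000.0, 8.0

# Design the band-stop (notch) filter and apply it to the template.
b, a = signal.iirnotch(center_hz, q_factor, fs=fs)
individualized_hrir = signal.lfilter(b, a, template_hrir)
```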

[0039] In some embodiments, the server 330 provides the individualized HRTFs to the headset 320 via the network 340. The audio system (not shown) in the headset 320 stores the individualized HRTFs. The headset 320 may then use the individualized HRTFs to render audio content to the user 310 such that it appears to originate from a specific location relative to the user (e.g., in front of, behind, from a virtual object in the room, etc.). For example, the headset 320 may convolve audio data with one or more individualized HRTFs to generate spatialized audio content that, when presented, appears to originate from the specific location.
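
A minimal sketch of that rendering step, assuming the individualized HRTFs are available in the time domain as per-ear impulse responses (HRIRs) for the target source direction:

```python
import numpy as np
from scipy.signal import fftconvolve


def spatialize(mono: np.ndarray, hrir_left: np.ndarray,
               hrir_right: np.ndarray) -> np.ndarray:
    """Convolve mono audio data with per-ear individualized HRIRs,
    producing a binaural signal of shape (2, n_samples)."""
    return np.stack([
        fftconvolve(mono, hrir_left),
        fftconvolve(mono, hrir_right),
    ])
```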

[0040] In some embodiments, the server 330 provides the generated individualized sets of filter parameter values to the headset 320. In this embodiment, the audio system (not shown) in the headset 320 applies the individualized sets of filter parameter values to a template HRTF to generate one or more individualized HRTFs. The template HRTF may be stored locally on the headset 320 and/or retrieved from some other location (e.g., the server 330).

[0041] FIG. 4 is a block diagram of a server 400, in accordance with one or more embodiments. The server 330 is an embodiment of the server 400. The server 400 includes various components, including, e.g., a data store 410, a communication module 420, a template HRTF generating module 430, and an HRTF individualization module 440. Some embodiments of the server 400 have different components than those described here. Similarly, the functions can be distributed among the components in a different manner than is described here. And in some embodiments, one or more functions of the server 400 may be performed by other components (e.g., an audio system of a headset).

[0042] The data store 410 stores data for use by the server 400. Data in the data store 410 may include, e.g., one or more template HRTFs, one or more individualized HRTFs, individualized filters (e.g., individualized sets of filter parameter values), user profiles, acoustic features data, audio data, other data relevant for use by the server 400, or some combination thereof. In some embodiments, the data store 410 stores one or more template HRTFs from the template HRTF generating module 430, stores individualized HRTFs from the HRTF individualization module 440, stores individualized sets of filter parameter values from the HRTF individualization module 440, or some combination thereof. In some embodiments, the data store 410 may periodically receive and store updated time-stamped template HRTFs from the template HRTF generating module 430. In some embodiments, periodically updated individualized HRTFs for the user may be received from the HRTF individualization module 440, time-stamped, and stored in the data store 410. In some embodiments, the data store 410 may receive and store time-stamped individualized sets of filter parameter values from the HRTF individualization module 440.

[0043] The communication module 420 communicates with one or more headsets (e.g., the headset 320). In some embodiments, the communication module 420 may also communicate with one or more other devices (e.g., an imaging device, a smartphone, etc.). The communication module 420 may communicate via, e.g., the network 340 and/or some direct coupling (e.g., Universal Serial Bus (USB), WIFI, etc.). The communication module 420 may receive a request from a headset for individualized HRTFs for a particular user, acoustic features data (from the headset and/or some other device), or some combination thereof. The communication module 420 may also provide one or more individualized HRTFs, one or more individualized sets of filter parameter values, one or more template HRTFs, or some combination thereof, to a headset.

[0044] The template HRTF generating module 430 generates a template HRTF. The generated template HRTF may be stored in the data store 410, and may also be sent to a headset for storage at the headset. In some embodiments, the template HRTF generating module 430 generates a template HRTF from a generic HRTF. The generic HRTF is associated with some population of users and may include one or more notches. A notch in the generic HRTF corresponds to a change in amplitude over a frequency window or band. A notch is described by the following parameters: a frequency location, a width of a frequency band centered around the frequency location, and a value of attenuation in the frequency band at the frequency location. In some embodiments, a notch in an HRTF is identified as the frequency location where the change in amplitude is above a predefined threshold. Accordingly, notches in a generic HRTF can be thought of as representing average attenuation characteristics as a function of frequency and direction for the population of users.
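
The threshold rule above suggests a simple detector: scan the magnitude response for dips whose depth exceeds a predefined value. The sketch below applies scipy’s find_peaks to the negated response; the 6 dB default threshold and the names are assumptions.

```python
import numpy as np
from scipy.signal import find_peaks


def find_notches(freqs_hz: np.ndarray, mag_db: np.ndarray,
                 threshold_db: float = 6.0) -> np.ndarray:
    """Return the frequency locations of dips deeper than threshold_db."""
    idx, _ = find_peaks(-mag_db, prominence=threshold_db)
    return freqs_hz[idx]
```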

[0045] The template HRTF generating module 430 removes notches in the generic HRTF over some or all of the entire audible frequency band (the range of sounds that humans can perceive) to form a template HRTF. The template HRTF generating module 430 may also smooth the template HRTF such that some or all of it is a smooth and continuous function. In some embodiments, the template HRTF is generated to be a smooth and continuous function lacking notches over some frequency ranges, but not necessarily lacking notches outside of those frequency ranges. In some embodiments, the template HRTF is such that there are no notches within a frequency range of 5 kHz-10 kHz. This may be significant because notches in this frequency range tend to vary between different users. This means that, at a frequency range of approximately 5 kHz-10 kHz, notch number, notch size, and notch location may have strong effects on how acoustic energy is received at the entry of the ear canal (and thus can affect user perception). Thus, having a template HRTF that is a smooth and continuous function with no notches in the frequency range of approximately 5 kHz-10 kHz makes it a suitable template that can then be individualized for different users. In some embodiments, the template HRTF generating module 430 generates a template HRTF that is a smooth and continuous function lacking notches at all frequency ranges. In some embodiments, the template HRTF generating module 430 generates a template HRTF that is a smooth and continuous function over one or more bands of frequencies, but may include notches outside of these one or more bands of frequencies. For example, the template HRTF generating module 430 may generate a template HRTF that lacks notches over a frequency range (e.g., approximately 5 kHz-10 kHz), but may include one or more notches outside of this range.
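
One simple way to realize this notch removal, sketched under the assumption that the HRTF is given as a log-magnitude response on a uniform frequency grid, is to replace the 5 kHz-10 kHz region with a moving average of itself, which suppresses narrow notches while preserving the overall spectral trend.

```python
import numpy as np


def smooth_band(freqs_hz: np.ndarray, mag_db: np.ndarray,
                lo_hz: float = 5_000.0, hi_hz: float = 10_000.0,
                win: int = 31) -> np.ndarray:
    """Moving-average the response inside [lo_hz, hi_hz] to remove notches."""
    smoothed = np.convolve(mag_db, np.ones(win) / win, mode="same")
    band = (freqs_hz >= lo_hz) & (freqs_hz <= hi_hz)
    out = mag_db.copy()
    out[band] = smoothed[band]
    return out
```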

[0046] Note that the generic HRTF used to generate the template HRTF is based on a population of users. In some embodiments, the population may be selected such that it is representative of most users, and a single template HRTF is generated from the population and is used to generate some or all individualized HRTFs.

[0047] In other embodiments, multiple populations are used to generate different generic HRTFs, and the populations are such that each are associated with one or more common characteristics. The characteristics may be, e.g., head size, head shape, ear size, ear shape, age, gender, some other feature that affects how a person perceives sound, or some combination thereof. For example, one population may be for adults, one population for children, one population for men, one population for women, etc. The template HRTF generating module 430 may generate a template HRTF for one or more of the plurality of generic HRTFs. Accordingly, there may be a plurality of different template HRTFs, and each template HRTF is directed to different groups that share some common set of characteristics.

[0048] In some embodiments, the template HRTF generating module 430 may periodically generate a new template HRTF and/or modify a previously generated template HRTF as more population HRTF data is obtained. The template HRTF generating module 430 may store each newly generated template HRTF and/or each update to a template HRTF in the data store 410. In some embodiments, the server 400 may send a newly generated template HRTF and/or an update to a template HRTF to the headset.

[0049] The HRTF individualization module 440 determines filters that are individualized to the user based at least in part on acoustic features data associated with the user. The filters may include, e.g., one or more filter parameter values that are individualized to the user. The HRTF individualization module 440 employs a trained machine learning (ML) model on the acoustic features data of a user to determine individualized filter parameter values for one or more individualized filters (e.g., notches) that are customized to the user. In some embodiments, the individualized filter parameter values are parameterized by sound source elevation and azimuth angles. The ML model is first trained using data collected from a population of users. The collected data may include, e.g., image data, anthropometric features, and acoustic data. The training may include supervised or unsupervised learning algorithms including, but not limited to, linear and/or logistic regression models, neural networks, classification and regression trees, k-means clustering, vector quantization, or any other machine learning algorithms. The acoustic data may include HRTFs measured using audio measurement apparatus and/or simulated via numerical analysis from three-dimensional scans of a head.
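
As one hedged illustration of such a model, the sketch below fits a regression-tree ensemble mapping anthropometric feature vectors to filter parameter values (here, two notches of center frequency, quality factor, and gain each). The feature count, target layout, and choice of random forests are assumptions; the patent leaves the model family open, and random data stands in for a measured training set.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))  # 10 anthropometric features per subject
y = rng.normal(size=(500, 6))   # 2 notches x (center_hz, Q, gain_db)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, y)

# Predict individualized filter parameter values for a new user.
predicted_params = model.predict(X[:1])
```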

[0050] In some embodiments, the filters and/or filter parameter values are derived via machine learning directly from image data of a user corresponding to single or multiple snapshots of the left and right ears taken by a camera (in a phone or otherwise). In some embodiments, the filters and/or filter parameter values are derived via machine learning from single or multiple videos of the left and right ears captured by a camera (in a phone or otherwise). In some embodiments, the filters and/or filter parameter values are derived from anthropometric features of a user that correspond to physical characteristics of the left and right ears. These anthropometric features include the height of the left and right ear, the width of the left and right ear, left and right ear cavum concha height, left and right ear cavum concha width, left and right ear cymba height, left and right ear fossa height, left and right ear pinna height and width, left and right ear intertragal incisure width, and other related physical measurements. In some embodiments, the filters and/or filter parameter values are derived from weighted combinations of photos, video, and anthropometric measurements.

[0051] In some embodiments, the ML model uses a convolutional neural network model with layers of nodes, in which values at nodes of a current layer are a transformation of values at nodes of a previous layer. A transformation in the model is determined through a set of weights and parameters connecting the current layer and the previous layer. In some examples, the transformation may also be determined through a set of weights and parameters used to transform between previous layers in the model.

[0052] The input to the neural network model may be some or all of the acoustic features data of a user, along with a template HRTF, encoded onto the first convolutional layer; the output of the neural network model is filter parameter values for one or more individualized notches to be applied to the template HRTF, as parameterized by elevation and azimuth angles for the user, decoded from the output layer of the neural network. The weights and parameters for the transformations across the multiple layers of the neural network model may indicate relationships between information contained in the starting layer and the information obtained from the final output layer. For example, the weights and parameters can be a quantization of user characteristics, etc. included in the information in the user image data. The weights and parameters may also be based on historical user data.
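
A toy network in the spirit of this paragraph is sketched below: an ear image goes in, filter parameter values come out. The layer sizes, the 64x64 grayscale input, and the six-value output (two notches x center frequency, Q, gain) are assumptions for illustration, not the patent’s architecture.

```python
import torch
import torch.nn as nn


class FilterParamNet(nn.Module):
    """Maps a single-channel ear image to a vector of filter parameters."""

    def __init__(self, n_params: int = 6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(32 * 16 * 16, n_params))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))


net = FilterParamNet()
params = net(torch.randn(1, 1, 64, 64))  # one 64x64 ear image -> 6 parameters
```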

[0053] The ML model can include any number of machine learning algorithms. Some other ML models that can be employed are linear and/or logistic regression, classification and regression trees, k-means clustering, vector quantization, etc. In some embodiments, the ML model includes deterministic methods that have been trained with reinforcement learning (thereby creating a reinforcement learning model). The model is trained to increase the quality of the individualized sets of filter parameter values generated using measurements from a monitoring system within the audio system at the headset.

[0054] The HRTF individualization module 440 selects a template HRTF for use in generating one or more individualized HRTFs for the user. In some embodiments, the HRTF individualization module 440 simply retrieves the single template HRTF (e.g., from the data store 410). In other embodiments, the HRTF individualization module 440 determines one or more characteristics associated with the user from the acoustic features data, and uses the determined one or more characteristics to select a template HRTF from a plurality of template HRTFs.

……
……
……
