Patent: Dynamic Adjustment Of Signal Enhancement Filters For A Microphone Array
Publication Number: 10638252
Publication Date: 2020-04-28
Applicants: Facebook
Abstract
An audio assembly includes a microphone assembly, a controller, and a speaker assembly. The microphone assembly detects audio signals. The detected audio signals originate from audio sources located within a local area. Each audio source is associated with a respective beamforming filter. The controller determines beamformed data using the beamforming filters associated with each audio source, and determines a relative acoustic contribution of each of the audio sources using the beamformed data. The controller generates updated beamforming filters for each of the audio sources based in part on the relative acoustic contribution of the audio source, a current location of the audio source, and a transfer function associated with audio signals produced by the audio source. The controller generates updated beamformed data using the updated beamforming filters and performs an action (e.g., via the speaker assembly) based in part on the updated beamformed data.
BACKGROUND
This disclosure relates generally to signal enhancement filters, and specifically to adaptively updating signal enhancement filters.
Conventional signal enhancement algorithms operate under certain assumptions: for example, that the layout of the environment surrounding an audio assembly and one or more audio sources is known, that the layout of the environment does not change over a period of time, and that statistics describing certain acoustic attributes are already available or can be readily determined. However, in most practical applications the layout of the environment is dynamic with regard to the positions of audio sources and the devices that receive signals from those audio sources. Additionally, given the dynamically changing nature of audio sources in most environments, noisy signals received from the audio sources often need to be enhanced by signal enhancement algorithms.
SUMMARY
An audio assembly dynamically adjusts beamforming filters for a microphone array (e.g., of an artificial reality headset). The audio assembly may include a microphone array, a speaker assembly, and a controller. The microphone array detects audio signals originating from one or more audio sources within a local area. The controller generates updated enhanced signal data for each of the one or more audio sources using signal enhancement filters. The speaker assembly performs an action, for example presenting content to the user operating the audio assembly, based in part on the updated enhanced signal data.
In some embodiments, the audio assembly includes a microphone assembly, a controller, and a speaker assembly. The microphone assembly is configured to detect audio signals with a microphone array. The audio signals originate from one or more audio sources located within a local area, and each audio source is associated with a respective set of signal enhancement filters that enhance audio signals captured by a set of microphones. In some embodiments, an audio signal is processed using one of a variety of signal enhancement processes, for example a filter-and-sum process. The controller is configured to determine enhanced signal data using the signal enhancement filters associated with each audio source. The controller is configured to determine a relative acoustic contribution of each of the one or more audio sources using the enhanced signal data. The controller is configured to generate updated signal enhancement filters for each of the one or more audio sources. The generation for each audio source is based in part on an estimate of the relative acoustic contribution of the audio source, an estimate of a current location of the audio source, and an estimate of a transfer function associated with audio signals produced by the audio source. In some embodiments, the relative acoustic contribution, the current location, and the transfer function may be characterized by exact values rather than estimates. The controller is configured to generate updated enhanced signal data using the updated signal enhancement filters. The speaker assembly is configured to perform an action based in part on the updated enhanced signal data. In some embodiments, the audio assembly may be a part of a headset (e.g., an artificial reality headset).
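To make the filter-and-sum process above concrete, the following is a minimal sketch, assuming frequency-domain (STFT) processing with one complex filter per microphone per frequency bin. The function name, array shapes, and use of NumPy are illustrative assumptions, not details specified in the patent.

```python
# Hypothetical filter-and-sum sketch; shapes and names are assumptions.
import numpy as np

def filter_and_sum(mic_stft: np.ndarray, filters: np.ndarray) -> np.ndarray:
    """Apply per-microphone filters and sum across the array.

    mic_stft: complex STFT of shape (num_mics, num_bins, num_frames)
    filters:  complex weights of shape (num_mics, num_bins)
    returns:  enhanced signal of shape (num_bins, num_frames)
    """
    # Weight each microphone channel by its conjugated filter, then sum
    # over the microphone axis -- the classic filter-and-sum form.
    return np.einsum("mf,mft->ft", np.conj(filters), mic_stft)
```

In this reading, each audio source would have its own `filters` array, and the controller would run the same summation once per source to obtain per-source enhanced signal data.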
In some embodiments, a method is described. The method comprises detecting audio signals with a microphone array, where the audio signals originate from one or more audio sources located within a local area. Enhanced signal data is determined using the signal enhancement filters associated with each audio source. A relative acoustic contribution of each of the one or more audio sources is determined using the enhanced signal data. An updated signal enhancement filter for each of the one or more audio sources is generated, where the generation for each audio source is based in part on the relative acoustic contribution of the audio source, a current location of the audio source, and a transfer function associated with audio signals produced by the audio source. Updated enhanced signal data is generated using the updated signal enhancement filters. An action is performed based in part on the updated enhanced signal data.
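The patent does not fix a formula for the relative acoustic contribution; one simple, hypothetical reading is each source's share of the total energy across all enhanced outputs, as sketched below. The dictionary-based interface is an illustrative assumption.

```python
import numpy as np

def relative_contributions(enhanced_outputs: dict) -> dict:
    """Estimate each source's relative acoustic contribution as its
    fraction of the total energy over all enhanced (beamformed) outputs.

    enhanced_outputs maps a source id to that source's enhanced signal
    (e.g., the STFT array produced by its filter-and-sum beamformer).
    """
    energies = {src: float(np.sum(np.abs(sig) ** 2))
                for src, sig in enhanced_outputs.items()}
    total = sum(energies.values())
    if total == 0.0:
        # No acoustic energy detected; report zero contribution for all.
        return {src: 0.0 for src in energies}
    return {src: energy / total for src, energy in energies.items()}
```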
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is an example diagram illustrating a headset including an audio assembly, according to one or more embodiments.
FIG. 2 illustrates an example audio assembly within a local area, according to one or more embodiments.
FIG. 3 is a block diagram of an audio assembly, according to one or more embodiments.
FIG. 4 is a flowchart illustrating the process of determining enhanced signal data using an audio assembly, according to one or more embodiments.
FIG. 5 is a system environment including a headset, according to one or more embodiments.
The figures depict various embodiments of the present disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles, or benefits touted, of the disclosure described herein.
DETAILED DESCRIPTION
Configuration Overview
An audio assembly updates signal enhancement filters in environments in which a microphone array embedded in the audio assembly and at least one audio source may be moving relative to each other. The audio assembly is configured to include a microphone array, a controller, and a speaker assembly, all of which may be components of headsets (e.g., near-eye displays, head-mounted displays) worn by a user. The audio assembly detects an audio signal using one or more microphone arrays. An audio source, which may be a person in the environment different from the user operating the audio assembly, a speaker, an animal, or a mechanical device, emits a sound near the user operating the assembly. An audio source may also be any other sound source beyond those described above. The embedded microphone array detects the emitted sound. Additionally, the microphone array may record the detected sound and store the recording for subsequent processing and analysis of the sound.
Depending on its position, an audio assembly may be surrounded by multiple audio sources which collectively produce sounds that may be incoherent when listened to all at once. Among these audio sources, a user of the audio assembly may want to tune into a particular audio source. Typically, the audio signal of the audio source that the user wants to tune into needs to be enhanced to distinguish it from the signals of other audio sources. Additionally, at a first timestamp, an audio signal emitted from an audio source may travel directly to a user, but at a second timestamp, the same audio source may have changed position, such that an audio signal emitted from the source travels a longer distance to the user. At the first timestamp, the audio assembly may not need to enhance the signal, but at the second timestamp, the audio assembly may need to enhance the signal. Hence, embodiments described herein adaptively generate signal enhancement filters to reflect the most recent position of each audio source in the surrounding environment. As referenced herein, the environment surrounding a user and the local area surrounding an audio assembly operated by the user are referenced synonymously.
Depending on its position, an audio assembly may receive audio signals from various directions of arrival and at various levels of strength; for example, audio signals may travel directly from an audio source to the audio assembly or reflect off of surfaces within the environment. Audio signals that reflect off of surfaces may experience decreases in signal strength as a result. Accordingly, the audio assembly may need to perform signal enhancement techniques to improve the strength of such signals. Additionally, given that the position of audio sources may change over time, the strength of signals emitted by the audio sources at each time may also vary. The signal enhancement filters may therefore be updated to accommodate the strength of the emitted signal at each position.
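To illustrate how path length affects the received signal, the following is a minimal sketch of a direct-path (free-field) transfer function between a source and a single microphone. The spherical 1/r attenuation model and all names here are assumptions for illustration; the patent does not specify this model.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # meters per second, near room temperature

def direct_path_transfer(source_pos, mic_pos, freqs_hz):
    """Free-field transfer function from a source to one microphone.

    Models the direct path only: a propagation-delay phase term plus
    spherical (1/r) attenuation. A reflected path is longer, so it
    arrives later and weaker, which is why reflected signals may need
    additional enhancement.
    """
    r = float(np.linalg.norm(np.asarray(source_pos, dtype=float)
                             - np.asarray(mic_pos, dtype=float)))
    delay_s = r / SPEED_OF_SOUND
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    # Phase rotates with frequency and delay; amplitude falls off as 1/r.
    return np.exp(-2j * np.pi * freqs_hz * delay_s) / max(r, 1e-3)
```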
At a first timestamp (e.g., an initial use of the audio assembly), the controller determines enhanced signal data using the signal enhancement filters associated with each audio source and determines a relative acoustic contribution of each of the one or more audio sources using the enhanced signal data. At a second timestamp, by which each audio source may have adjusted its position, the controller generates updated signal enhancement filters for each audio source based on one or more of the relative acoustic contribution of the audio source, a current location of the audio source, and a transfer function associated with audio signals produced by the audio source. The controller generates updated enhanced signal data using the updated signal enhancement filters. Based in part on the updated enhanced signal data, a speaker assembly performs an action.
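As a hedged sketch of the filter-update step, one possible instantiation recomputes delay-and-sum steering weights from a source's estimated current location, as below. The patent does not commit to this particular beamformer, and the geometry, shapes, and names are assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def update_steering_filters(source_pos, mic_positions, freqs_hz):
    """Recompute per-microphone steering weights for a source's current
    location (a simple delay-and-sum update).

    source_pos:    (3,) estimated source location in meters
    mic_positions: (num_mics, 3) microphone locations in meters
    freqs_hz:      (num_bins,) STFT bin center frequencies in Hz
    returns:       (num_mics, num_bins) complex filter weights
    """
    mic_positions = np.asarray(mic_positions, dtype=float)
    dists = np.linalg.norm(mic_positions - np.asarray(source_pos, dtype=float),
                           axis=1)
    # Relative direct-path delays across the array.
    delays = (dists - dists.min()) / SPEED_OF_SOUND
    # Steering phases; when conjugated at application time (as in
    # filter-and-sum), they align the direct-path arrivals before summing.
    weights = np.exp(-2j * np.pi * np.outer(delays,
                                            np.asarray(freqs_hz, dtype=float)))
    return weights / len(mic_positions)
```

Rerunning this update at each timestamp, with the source's newest location estimate, is one way to keep the filters aligned with a moving source.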
Embodiments of the invention may include or be implemented in conjunction with an artificial reality system. Artificial reality is a form of reality that has been adjusted in some manner before presentation to a user, which may include, e.g., a virtual reality (VR), an augmented reality (AR), a mixed reality (MR), a hybrid reality, or some combination and/or derivatives thereof. Artificial reality content may include completely generated content or generated content combined with captured (e.g., real-world) content. The artificial reality content may include video, audio, haptic feedback, or some combination thereof, any of which may be presented in a single channel or in multiple channels (such as stereo video that produces a three-dimensional effect to the viewer). Additionally, in some embodiments, artificial reality may also be associated with applications, products, accessories, services, or some combination thereof, that are used to, e.g., create content in an artificial reality and/or are otherwise used in (e.g., perform activities in) an artificial reality. The artificial reality system that provides the artificial reality content may be implemented on various platforms, including a head-mounted display (HMD) connected to a host computer system, a standalone HMD, a mobile device or computing system, or any other hardware platform capable of providing artificial reality content to one or more viewers.
Headset Configuration
FIG. 1 is an example diagram illustrating a headset 100 including an audio assembly, according to one or more embodiments. The headset 100 presents media to a user. In one embodiment, the headset 100 may be a near-eye display (NED). In another embodiment, the headset 100 may be a head-mounted display (HMD). In general, the headset may be worn on the face of a user such that content (e.g., media content) is presented using one or both lenses 110 of the headset. However, the headset 100 may also be used such that media content is presented to a user in a different manner. Examples of media content presented by the headset 100 include one or more images, video, audio, or some combination thereof. The headset 100 includes the audio assembly, and may include, among other components, a frame 105, a lens 110, and a sensor device 115. While FIG. 1 illustrates the components of the headset 100 in example locations on the headset 100, the components may be located elsewhere on the headset 100, on a peripheral device paired with the headset 100, or some combination thereof.
The headset 100 may correct or enhance the vision of a user, protect the eye of a user, or provide images to a user. The headset 100 may be eyeglasses which correct for defects in a user’s eyesight. The headset 100 may be sunglasses which protect a user’s eye from the sun. The headset 100 may be safety glasses which protect a user’s eye from impact. The headset 100 may be a night vision device or infrared goggles to enhance a user’s vision at night. The headset 100 may be a near-eye display that produces artificial reality content for the user. Alternatively, the headset 100 may not include a lens 110 and may be a frame 105 with an audio system that provides audio content (e.g., music, radio, podcasts) to a user.
The frame 105 includes a front part that holds the lens 110 and end pieces to attach to the user. The front part of the frame 105 bridges the top of a nose of the user. The end pieces (e.g., temples) are portions of the frame 105 that hold the headset 100 in place on a user (e.g., each end piece extends over a corresponding ear of the user). The length of the end piece may be adjustable to fit different users. The end piece may also include a portion that curls behind the ear of the user (e.g., temple tip, ear piece).
The lens 110 provides or transmits light to a user wearing the headset 100. The lens 110 may be a prescription lens (e.g., single vision, bifocal and trifocal, or progressive) to help correct for defects in a user’s eyesight. The prescription lens transmits ambient light to the user wearing the headset 100. The transmitted ambient light may be altered by the prescription lens to correct for defects in the user’s eyesight. The lens 110 may be a polarized lens or a tinted lens to protect the user’s eyes from the sun. The lens 110 may be one or more waveguides as part of a waveguide display in which image light is coupled through an end or edge of the waveguide to the eye of the user. The lens 110 may include an electronic display for providing image light and may also include an optics block for magnifying image light from the electronic display. Additional detail regarding the lens 110 is discussed with regard to FIG. 5. The lens 110 is held by a front part of the frame 105 of the headset 100.