Facebook Patent | Optimization Of Microphone Array Geometry For Direction Of Arrival Estimation

Patent: Optimization Of Microphone Array Geometry For Direction Of Arrival Estimation

Publication Number: 10638222

Publication Date: 2020-04-28

Applicants: Facebook

Abstract

A system performs an optimization algorithm to optimize the configuration of two or more acoustic sensors of a microphone array. The system obtains an array transfer function (ATF) for a plurality of combinations of the acoustic sensors of the microphone array. In a first embodiment, the algorithm optimizes an active set of acoustic sensors on an eyewear device. The plurality of combinations may be all possible combinations of subsets of the acoustic sensors that may be active. In a second embodiment, the algorithm optimizes a placement of two or more acoustic sensors on an eyewear device during manufacturing of the eyewear device. Each combination of acoustic sensors may represent a different arrangement of the acoustic sensors in the microphone array. In each embodiment, the system evaluates the obtained ATFs and, based on the evaluation, selects a combination of acoustic sensors for the microphone array.

BACKGROUND

The present disclosure generally relates to microphone arrays and specifically to optimization of microphone array geometries for direction of arrival estimation.

A sound perceived at two ears can be different, depending on a direction and a location of a sound source with respect to each ear as well as on the surroundings of a room in which the sound is perceived. Humans can determine a location of the sound source by comparing the sound perceived at each ear. In a “surround sound” system, a plurality of speakers reproduce the directional aspects of sound using acoustic transfer functions. An acoustic transfer function represents the relationship between a sound at its source location and how the sound is detected, for example, by a microphone array or by a person. A single microphone array (or a person wearing a microphone array) may have several associated acoustic transfer functions for several different source locations in a local area surrounding the microphone array (or surrounding the person wearing the microphone array). In addition, acoustic transfer functions for the microphone array may differ based on the position and/or orientation of the microphone array in the local area. Furthermore, the acoustic sensors of a microphone array can be arranged in a large number of possible combinations, and, as such, the associated acoustic transfer functions are unique to the microphone array. Determining an optimal set of acoustic sensors for each microphone array can require direct evaluation, which can be a lengthy and expensive process in terms of time and resources needed.

SUMMARY

Embodiments relate to a method for selecting a combination of acoustic sensors of a microphone array. The method may be performed during and/or prior to manufacturing of the microphone array to determine an optimal set of acoustic sensors in the microphone array. In some embodiments, at least some of the acoustic sensors of the microphone array are coupled to a near-eye display (NED). In one embodiment, a system obtains an array transfer function (ATF) for a plurality of combinations of acoustic sensors of the microphone array. An ATF characterizes how the microphone array receives a sound from a point in space. Each combination of acoustic sensors may be a subset of the acoustic sensors of the microphone array or may represent a different arrangement of the acoustic sensors in the microphone array. The system computes a Euclidean norm of each obtained ATF. The system computes an average of the Euclidean norms over a target source range and a target frequency range and then ranks each computed average. The system selects a combination of acoustic sensors for the microphone array based in part on the ranking. In some embodiments, the system activates the selected combination of acoustic sensors. In some embodiments, a computer-readable medium may store instructions that, when executed, cause a processor to perform the steps of the method.
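
As a concrete illustration of this selection metric, the following is a minimal sketch in Python/NumPy. It assumes the ATF of one candidate combination is available as a complex array indexed by (sensor, source direction, frequency bin); the function name, array layout, and index arguments are illustrative and are not taken from the patent.

```python
import numpy as np

def combination_score(atf, source_idx, freq_idx):
    """Score one sensor combination from its array transfer function (ATF).

    atf        : complex ndarray, shape (num_sensors, num_sources, num_freqs),
                 ATF values for the acoustic sensors of this combination.
    source_idx : indices of source directions inside the target source range.
    freq_idx   : indices of frequency bins inside the target frequency range.
    """
    # Euclidean (l2) norm across the sensor dimension for every
    # (source direction, frequency bin) pair.
    norms = np.linalg.norm(atf, axis=0)           # shape (num_sources, num_freqs)

    # Restrict to the target source range and target frequency range,
    # then average the norms over both.
    target = norms[np.ix_(source_idx, freq_idx)]
    return float(target.mean())
```

Ranking then amounts to sorting these scores across all candidate combinations and selecting a combination based in part on that ranking.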

In some embodiments, an audio system for selecting a combination of acoustic sensors of a microphone array is described. A microphone array monitors sounds in a local area surrounding the microphone array. The microphone array includes a plurality of acoustic sensors. At least some of the plurality of acoustic sensors are coupled to a near-eye display (NED). The audio system also includes a controller that is configured to obtain an array transfer function (ATF) for a plurality of combinations of acoustic sensors of the microphone array. The controller computes a Euclidean norm of each obtained ATF. The controller computes an average of the Euclidean norms over a target source range and a target frequency range and then ranks each computed average. The controller selects a combination of acoustic sensors for the microphone array based in part on the ranking. In some embodiments, the controller activates the selected combination of acoustic sensors.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an example illustrating an eyewear device including a microphone array, in accordance with one or more embodiments.

FIG. 2 is an example illustrating a portion of the eyewear device including an acoustic sensor that is a microphone on an ear of a user, in accordance with one or more embodiments.

FIG. 3 is an example illustrating an eyewear device including a neckband, in accordance with one or more embodiments.

FIG. 4 is a block diagram of an audio system, in accordance with one or more embodiments.

FIG. 5 is a flowchart illustrating a process of generating and updating a head-related transfer function of an eyewear device including an audio system, in accordance with one or more embodiments.

FIG. 6 is a flowchart illustrating a process of optimizing acoustic sensors on an eyewear device, in accordance with one or more embodiments.

FIG. 7 is a system environment of an eyewear device including an audio system, in accordance with one or more embodiments.

The figures depict embodiments of the present disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles, or benefits touted, of the disclosure described herein.

DETAILED DESCRIPTION

Microphone arrays are sometimes employed in spatial sound applications that require sound source localization and directional filtering. One of the major concerns of using microphone arrays is the choice of array geometry or, more generally, the problem of mutual microphone positioning to optimize certain acoustic characteristics of the array. Some considerations for choosing parameters of a microphone array may include choosing the distance between adjacent sensors, the number of sensors, and the overall aperture of the array. In addition, some methods outline general differences between the spatial abilities of linear, planar, and volumetric array geometries. However, some methods may be limited when it comes to designing microphone arrays for specific applications. Direct evaluation of performance of a large number of different possible microphone array geometries may be performed, but it is extremely expensive in terms of time and resource requirements.

A method for selecting a combination of acoustic sensors of a microphone array may be performed. The method may be performed during and/or prior to manufacturing of the microphone array or during use of the microphone array to determine an optimal set of acoustic sensors in the microphone array. In some embodiments, prior to manufacturing of the microphone array, the optimal set of acoustic sensors may designate a set of parameters for placement of the acoustic sensors configured to be coupled to a near-eye display (NED). The set of parameters may include a number of acoustic sensors, a location of each acoustic sensor on the NED, an arrangement of the acoustic sensors, or some combination thereof. In some embodiments, the NED may be coupled with a neckband, on which some of the acoustic sensors of the microphone array may be located. After the microphone array is manufactured (e.g., coupled to the NED and/or neckband), the optimal set of acoustic sensors may designate a subset of the acoustic sensors that are active or inactive. In one embodiment, a system obtains an array transfer function (ATF) for a plurality of combinations of acoustic sensors of the microphone array. An ATF characterizes how the microphone array receives a sound from a point in space. Each combination of acoustic sensors may be a subset of the acoustic sensors of the microphone array or may represent a different arrangement of the acoustic sensors in the microphone array. The system computes a Euclidean norm of each obtained ATF. The system computes an average of the Euclidean norms over a target source range and a target frequency range and then ranks each computed average. The system selects a combination of acoustic sensors for the microphone array based in part on the ranking. In some embodiments, the system activates the selected combination of acoustic sensors. In some embodiments, a computer-readable medium may store instructions that, when executed, cause a processor to perform the steps of the method.
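
The "active subset" case of this method can be sketched end to end as follows. This is a hypothetical sketch, not the patent's implementation: it assumes the ATF of a subset can be taken as the corresponding rows of the full-array ATF, and it ranks the averages from highest to lowest, whereas the disclosure only states that the computed averages are ranked.

```python
from itertools import combinations
import numpy as np

def select_active_subset(full_atf, subset_size, source_idx, freq_idx):
    """Enumerate sensor subsets, score each one, and pick the top-ranked subset.

    full_atf    : complex ndarray, shape (num_sensors, num_sources, num_freqs),
                  the ATF of the complete microphone array.
    subset_size : number of sensors kept active in each candidate combination.
    source_idx  : indices of the target source range.
    freq_idx    : indices of the target frequency range.

    Returns the selected combination (tuple of sensor indices) and the full
    ranking of all evaluated combinations.
    """
    num_sensors = full_atf.shape[0]
    scores = {}
    for combo in combinations(range(num_sensors), subset_size):
        # ATF rows of the sensors that would be active in this combination.
        atf = full_atf[list(combo)]
        # Euclidean norm across the active sensors for each
        # (source direction, frequency bin) pair, averaged over the
        # target source and frequency ranges.
        norms = np.linalg.norm(atf, axis=0)
        scores[combo] = norms[np.ix_(source_idx, freq_idx)].mean()

    # Rank the computed averages and select the top-ranked combination.
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking[0], ranking
```

For the placement-optimization embodiment, the same loop would instead iterate over candidate arrangements of the sensors, each with its own measured or simulated ATF.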

Embodiments of the present disclosure may include or be implemented in conjunction with an artificial reality system. Artificial reality is a form of reality that has been adjusted in some manner before presentation to a user, which may include, e.g., a virtual reality (VR), an augmented reality (AR), a mixed reality (MR), a hybrid reality, or some combination and/or derivatives thereof. Artificial reality content may include completely generated content or generated content combined with captured (e.g., real-world) content. The artificial reality content may include video, audio, haptic feedback, or some combination thereof, and any of which may be presented in a single channel or in multiple channels (such as stereo video that produces a three-dimensional effect to the viewer). Additionally, in some embodiments, artificial reality may also be associated with applications, products, accessories, services, or some combination thereof, that are used to, e.g., create content in an artificial reality and/or are otherwise used in (e.g., perform activities in) an artificial reality. The artificial reality system that provides the artificial reality content may be implemented on various platforms, including a head-mounted display (HMD) connected to a host computer system, a standalone HMD, a mobile device or computing system, or any other hardware platform capable of providing artificial reality content to one or more viewers.

Eyewear Device Configuration

FIG. 1 is an example illustrating an eyewear device 100 including an audio system, in accordance with one or more embodiments. The eyewear device 100 presents media to a user. In one embodiment, the eyewear device 100 may be a near-eye display (NED). Examples of media presented by the eyewear device 100 include one or more images, video, audio, or some combination thereof. The eyewear device 100 may include, among other components, a frame 105, a lens 110, a sensor device 115, and an audio system. The audio system may include, among other components, a microphone array of one or more acoustic sensors 120 and a controller 125. While FIG. 1 illustrates the components of the eyewear device 100 in example locations on the eyewear device 100, the components may be located elsewhere on the eyewear device 100, on a peripheral device paired with the eyewear device 100, or some combination thereof.

The eyewear device 100 may correct or enhance the vision of a user, protect the eye of a user, or provide images to a user. The eyewear device 100 may be eyeglasses which correct for defects in a user’s eyesight. The eyewear device 100 may be sunglasses which protect a user’s eye from the sun. The eyewear device 100 may be safety glasses which protect a user’s eye from impact. The eyewear device 100 may be a night vision device or infrared goggles to enhance a user’s vision at night. The eyewear device 100 may be a near-eye display that produces VR, AR, or MR content for the user. Alternatively, the eyewear device 100 may not include a lens 110 and may be a frame 105 with an audio system that provides audio (e.g., music, radio, podcasts) to a user.

The frame 105 includes a front part that holds the lens 110 and end pieces to attach to the user. The front part of the frame 105 bridges the top of a nose of the user. The end pieces (e.g., temples) are portions of the frame 105 that hold the eyewear device 100 in place on a user (e.g., each end piece extends over a corresponding ear of the user). The length of the end piece may be adjustable to fit different users. The end piece may also include a portion that curls behind the ear of the user (e.g., temple tip, ear piece).

The lens 110 provides or transmits light to a user wearing the eyewear device 100. The lens 110 may be a prescription lens (e.g., single vision, bifocal and trifocal, or progressive) to help correct for defects in a user’s eyesight. The prescription lens transmits ambient light to the user wearing the eyewear device 100. The transmitted ambient light may be altered by the prescription lens to correct for defects in the user’s eyesight. The lens 110 may be a polarized lens or a tinted lens to protect the user’s eyes from the sun. The lens 110 may be one or more waveguides as part of a waveguide display in which image light is coupled through an end or edge of the waveguide to the eye of the user. The lens 110 may include an electronic display for providing image light and may also include an optics block for magnifying image light from the electronic display. Additional detail regarding the lens 110 is discussed with regard to FIG. 7. The lens 110 is held by a front part of the frame 105 of the eyewear device 100.

In some embodiments, the eyewear device 100 may include a depth camera assembly (DCA) that captures data describing depth information for a local area surrounding the eyewear device 100. In one embodiment, the DCA may include a structured light projector, an imaging device, and a controller. The captured data may be images captured by the imaging device of structured light projected onto the local area by the structured light projector. In one embodiment, the DCA may include two or more cameras that are oriented to capture portions of the local area in stereo and a controller. The captured data may be images captured by the two or more cameras of the local area in stereo. The controller computes the depth information of the local area using the captured data. Based on the depth information, the controller determines absolute positional information of the eyewear device 100 within the local area. The DCA may be integrated with the eyewear device 100 or may be positioned within the local area external to the eyewear device 100. In the latter embodiment, the controller of the DCA may transmit the depth information to the controller 125 of the eyewear device 100.

The sensor device 115 generates one or more measurement signals in response to motion of the eyewear device 100. The sensor device 115 may be located on a portion of the frame 105 of the eyewear device 100. The sensor device 115 may include a position sensor, an inertial measurement unit (IMU), or both. Some embodiments of the eyewear device 100 may or may not include the sensor device 115 or may include more than one sensor device 115. In embodiments in which the sensor device 115 includes an IMU, the IMU generates fast calibration data based on measurement signals from the sensor device 115. Examples of sensor devices 115 include: one or more accelerometers, one or more gyroscopes, one or more magnetometers, another suitable type of sensor that detects motion, a type of sensor used for error correction of the IMU, or some combination thereof. The sensor device 115 may be located external to the IMU, internal to the IMU, or some combination thereof.

Based on the one or more measurement signals, the sensor device 115 estimates a current position of the eyewear device 100 relative to an initial position of the eyewear device 100. The estimated position may include a location of the eyewear device 100, an orientation of the eyewear device 100 or of the head of the user wearing the eyewear device 100, or some combination thereof. The orientation may correspond to a position of each ear relative to the reference point. In some embodiments, the sensor device 115 uses the depth information and/or the absolute positional information from a DCA to estimate the current position of the eyewear device 100. The sensor device 115 may include multiple accelerometers to measure translational motion (forward/back, up/down, left/right) and multiple gyroscopes to measure rotational motion (e.g., pitch, yaw, roll). In some embodiments, an IMU rapidly samples the measurement signals and calculates the estimated position of the eyewear device 100 from the sampled data. For example, the IMU integrates the measurement signals received from the accelerometers over time to estimate a velocity vector and integrates the velocity vector over time to determine an estimated position of a reference point on the eyewear device 100. Alternatively, the IMU provides the sampled measurement signals to the controller 125, which determines the fast calibration data. The reference point is a point that may be used to describe the position of the eyewear device 100. While the reference point may generally be defined as a point in space, in practice the reference point is defined as a point within the eyewear device 100.
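
As a rough illustration of the double integration described above, the sketch below dead-reckons a reference-point position from sampled accelerometer data. It is a simplified, hypothetical example: gravity removal, gyroscope-based orientation tracking, and IMU error correction are all omitted, and simple rectangular integration is used purely for brevity.

```python
import numpy as np

def estimate_position(accel_samples, dt, initial_velocity=None, initial_position=None):
    """Dead-reckon a reference-point position from sampled accelerometer data.

    accel_samples : ndarray, shape (num_samples, 3), linear acceleration of the
                    reference point (gravity already removed), in m/s^2.
    dt            : sampling interval in seconds.
    """
    velocity = np.zeros(3) if initial_velocity is None else np.asarray(initial_velocity, dtype=float)
    position = np.zeros(3) if initial_position is None else np.asarray(initial_position, dtype=float)

    for accel in np.asarray(accel_samples, dtype=float):
        velocity = velocity + accel * dt      # first integral: velocity vector
        position = position + velocity * dt   # second integral: position estimate

    return position, velocity
```

For example, `estimate_position(samples, dt=1.0 / 1000.0)` would correspond to an IMU sampled at 1 kHz.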

The audio system detects sound to generate one or more acoustic transfer functions for a user. An acoustic transfer function characterizes how a sound is received from a point in space. The acoustic transfer functions may be array transfer functions (ATFs), head-related transfer functions (HRTFs), other types of acoustic transfer functions, or some combination thereof. The one or more acoustic transfer functions may be associated with the eyewear device 100, the user wearing the eyewear device 100, or both. The audio system may then use the one or more acoustic transfer functions to generate audio content for the user. The audio system of the eyewear device 100 includes a microphone array and the controller 125.

The microphone array detects sounds within a local area surrounding the microphone array. The microphone array includes a plurality of acoustic sensors. The acoustic sensors detect air pressure variations induced by a sound wave. Each acoustic sensor is configured to detect sound and convert the detected sound into an electronic format (analog or digital). The acoustic sensors may be acoustic wave sensors, microphones, sound transducers, or similar sensors that are suitable for detecting sounds. For example, in FIG. 1, the microphone array includes eight acoustic sensors: acoustic sensors 120a, 120b, which may be designed to be placed inside a corresponding ear of the user, and acoustic sensors 120c, 120d, 120e, 120f, 120g, 120h, which are positioned at various locations on the frame 105. The acoustic sensors 120a-120h may be collectively referred to herein as “acoustic sensors 120.” Additional detail regarding the audio system is discussed with regard to FIG. 4.
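
For illustration only, a sensor layout like the one in FIG. 1 might be represented as a mapping from sensor identifiers to positions in a head-centered coordinate frame, which the combination-selection sketches above could then index into. The coordinates below are hypothetical placeholders, not values from the patent.

```python
# Hypothetical layout of the eight acoustic sensors 120a-120h (meters,
# head-centered frame: +x right, +y up, +z forward). Values are placeholders.
SENSOR_POSITIONS = {
    "120a": (-0.07, 0.00, 0.00),   # inside the left ear
    "120b": (+0.07, 0.00, 0.00),   # inside the right ear
    "120c": (-0.08, 0.02, 0.08),   # left temple, near the hinge
    "120d": (-0.03, 0.03, 0.09),   # left side of the front frame
    "120e": (+0.03, 0.03, 0.09),   # right side of the front frame
    "120f": (+0.08, 0.02, 0.08),   # right temple, near the hinge
    "120g": (-0.08, 0.01, 0.02),   # left temple arm
    "120h": (+0.08, 0.01, 0.02),   # right temple arm
}

# A candidate "combination" is then simply a subset of these identifiers,
# e.g. the four frame-mounted sensors nearest the lenses:
candidate = ("120c", "120d", "120e", "120f")
```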

The microphone array detects sounds within the local area surrounding the microphone array. The local area is the environment that surrounds the eyewear device 100. For example, the local area may be a room that a user wearing the eyewear device 100 is inside, or the user wearing the eyewear device 100 may be outside and the local area is an outside area in which the microphone array is able to detect sounds. Detected sounds may be uncontrolled sounds or controlled sounds. Uncontrolled sounds are sounds that are not controlled by the audio system and happen in the local area. Examples of uncontrolled sounds may be naturally occurring ambient noise. In this configuration, the audio system may be able to calibrate the eyewear device 100 using the uncontrolled sounds that are detected by the audio system. Controlled sounds are sounds that are controlled by the audio system. Examples of controlled sounds may be one or more signals output by an external system, such as a speaker, a speaker assembly, a calibration system, or some combination thereof. While the eyewear device 100 may be calibrated using uncontrolled sounds, in some embodiments, the external system may be used to calibrate the eyewear device 100 during a calibration process. Each detected sound (uncontrolled and controlled) may be associated with a frequency, an amplitude, a duration, or some combination thereof.
