Facebook Patent | Determination Of Acoustic Parameters For A Headset Using A Mapping Server

Publication Number: 20200314583

Publication Date: 20201001

Applicants: Facebook

Abstract

Determination of a set of acoustic parameters for a headset is presented herein. The set of acoustic parameters can be determined based on a virtual model of physical locations stored at a mapping server. The virtual model describes a plurality of spaces and acoustic properties of those spaces, wherein a location in the virtual model corresponds to a physical location of the headset. A location in the virtual model for the headset is determined based on information describing at least a portion of the local area received from the headset. The set of acoustic parameters associated with the physical location of the headset is determined based in part on the determined location in the virtual model and any acoustic parameters associated with the determined location. The headset presents audio content using the set of acoustic parameters received from the mapping server.

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation of co-pending U.S. application Ser. No. 16/366,484, filed Mar. 27, 2019, which is incorporated by reference in its entirety.

BACKGROUND

[0002] The present disclosure relates generally to presentation of audio at a headset, and specifically relates to determination of acoustic parameters for a headset using a mapping server.

[0003] A sound perceived at the ears of two users can be different, depending on a direction and a location of a sound source with respect to each user as well as on the surroundings of a room in which the sound is perceived. Humans can determine a location of the sound source by comparing the sound perceived at each set of ears. In an artificial reality environment, simulating sound propagation from an object to a listener may use knowledge about the acoustic parameters of the room, for example a reverberation time or the direction of incidence of the strongest early reflections. One technique for determining the acoustic parameters of a room includes placing a loudspeaker in a desired source location, playing a controlled test signal, and de-convolving the test signal from what is recorded at a listener location. However, such a technique generally requires a measurement laboratory or dedicated equipment in-situ.

[0004] To seamlessly place a virtual sound source in an environment, sound signals to each ear are determined based on sound propagation paths from the source, through an environment, to a listener (receiver). Various sound propagation paths can be represented based on a set of frequency dependent acoustic parameters used at a headset for presenting audio content to the receiver (user of the headset). A set of frequency dependent acoustic parameters is typically unique for a specific acoustic configuration of a local environment (room) that has a unique acoustic property. However, storing and updating various sets of acoustic parameters at the headset for all possible acoustic configurations of the local environment is impractical. Various sound propagation paths within a room between a source and a receiver represent a room impulse response, which depends on specific locations of the source and receiver. It is however memory intensive to store measured or simulated room impulse responses for a dense network of all possible source and receiver locations in a space, or even a relatively small subset of the most common arrangements. Moreover, determining a room impulse response in real time becomes increasingly computationally intensive as the required accuracy increases.

SUMMARY

[0005] Embodiments of the present disclosure support a method, computer readable medium, and apparatus for determining a set of acoustic parameters for presenting audio content at a headset. In some embodiments, the set of acoustic parameters is determined based on a virtual model of physical locations stored at a mapping server connected with the headset via a network. The virtual model describes a plurality of spaces and acoustic properties of those spaces, wherein a location in the virtual model corresponds to a physical location of the headset. The mapping server determines a location in the virtual model for the headset, based on information describing at least a portion of the local area received from the headset. The mapping server determines a set of acoustic parameters associated with the physical location of the headset, based in part on the determined location in the virtual model and any acoustic parameters associated with the determined location. The headset presents audio content to a listener using the set of acoustic parameters received from the mapping server.

BRIEF DESCRIPTION OF THE DRAWINGS

[0006] FIG. 1 is a block diagram of a system environment for a headset, in accordance with one or more embodiments.

[0007] FIG. 2 illustrates effects of surfaces in a room on the propagation of sound between a sound source and a user of a headset, in accordance with one or more embodiments.

[0008] FIG. 3A is a block diagram of a mapping server, in accordance with one or more embodiments.

[0009] FIG. 3B is a block diagram of an audio system of a headset, in accordance with one or more embodiments.

[0010] FIG. 3C is an example of a virtual model describing physical spaces and acoustic properties of the physical spaces, in accordance with one or more embodiments.

[0011] FIG. 4 is a perspective view of a headset including an audio system, in accordance with one or more embodiments.

[0012] FIG. 5A is a flowchart illustrating a process for determining acoustic parameters for a physical location of a headset, in accordance with one or more embodiments.

[0013] FIG. 5B is a flowchart illustrating a process for obtaining acoustic parameters from a mapping server, in accordance with one or more embodiments.

[0014] FIG. 5C is a flowchart illustrating a process for reconstructing a room impulse response at a headset, in accordance with one or more embodiments.

[0015] FIG. 6 is a block diagram of a system environment that includes a headset and a mapping server, in accordance with one or more embodiments.

[0016] The figures depict embodiments of the present disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles, or benefits touted, of the disclosure described herein.

DETAILED DESCRIPTION

[0017] Embodiments of the present disclosure may include or be implemented in conjunction with an artificial reality system. Artificial reality is a form of reality that has been adjusted in some manner before presentation to a user, which may include, e.g., a virtual reality (VR), an augmented reality (AR), a mixed reality (MR), a hybrid reality, or some combination and/or derivatives thereof. Artificial reality content may include completely generated content or generated content combined with captured (e.g., real-world) content. The artificial reality content may include video, audio, haptic feedback, or some combination thereof, and any of which may be presented in a single channel or in multiple channels (such as stereo video that produces a three-dimensional effect to the viewer). Additionally, in some embodiments, artificial reality may also be associated with applications, products, accessories, services, or some combination thereof, that are used to, e.g., create content in an artificial reality and/or are otherwise used in (e.g., perform activities in) an artificial reality. The artificial reality system that provides the artificial reality content may be implemented on various platforms, including a headset, a head-mounted display (HMD) connected to a host computer system, a standalone HMD, a near-eye display (NED), a mobile device or computing system, or any other hardware platform capable of providing artificial reality content to one or more viewers.

[0018] A communication system for room acoustic matching is presented herein. The communication system includes a headset with an audio system communicatively coupled to a mapping server. The audio system is implemented on the headset and may include speakers, an array of acoustic sensors, a plurality of imaging sensors (cameras), and an audio controller. The imaging sensors determine visual information in relation to at least a portion of the local area (e.g., depth information, color information, etc.). The headset communicates (e.g., via a network) the visual information to a mapping server. The mapping server maintains a virtual model of the world that includes acoustic properties for spaces within the real world. The mapping server determines a location in the virtual model that corresponds to the physical location of the headset using the visual information from the headset, e.g., images of at least the portion of the local area. The mapping server determines a set of acoustic parameters (e.g., a reverberation time, a reverberation level, etc.) associated with the determined location and provides the acoustic parameters to the headset. The headset uses (e.g., via the audio controller) the set of acoustic parameters to present audio content to a user of the headset. The array of acoustic sensors mounted on the headset monitors sound in the local area. The headset may selectively provide some or all of the monitored sound as an audio stream to the mapping server, responsive to determining that a change in room configuration has occurred (e.g., a change of human occupancy level, windows are open after being closed, curtains are open after being closed, etc.). The mapping server may update the virtual model by re-computing acoustic parameters based on the audio stream received from the headset.
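The round trip described above can be sketched as follows. This is a minimal illustration only: the fingerprint key, the parameter fields, and the in-memory store are hypothetical stand-ins, and the patent's actual model matching and network transport are elided.

```python
from dataclasses import dataclass


@dataclass
class AcousticParameters:
    """Hypothetical subset of the parameters named in the text."""
    reverberation_time_s: float
    reverberation_level_db: float


# Hypothetical server-side store: visual "fingerprint" -> acoustic parameters.
_MODEL = {"room_102": AcousticParameters(0.6, -20.0)}


def server_handle_request(visual_fingerprint: str):
    """Mapping-server side: resolve the headset's visual information to a
    model location and return the stored acoustic parameters, if any."""
    return _MODEL.get(visual_fingerprint)


def headset_round_trip(visual_fingerprint: str):
    """Headset side: send visual information, receive the parameters used
    when rendering audio content (network transport elided)."""
    return server_handle_request(visual_fingerprint)
```

A headset in an unmodeled space would receive no parameters here; the patent handles that case by having the server derive a new set from the visual information instead.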

[0019] In some embodiments, the headset obtains information about a set of acoustic parameters that parametrize an impulse response for a local area where the headset is located. The headset may obtain the set of acoustic parameters from the mapping server. Alternatively, the set of acoustic parameters are stored at the headset. The headset may reconstruct an impulse response for a specific spatial arrangement of the headset and a sound source (e.g., a virtual object) by extrapolating the set of acoustic parameters. The reconstructed impulse response may be represented by an adjusted set of acoustic parameters, wherein one or more acoustic parameters from the adjusted set are obtained by dynamically adjusting one or more corresponding acoustic parameters from the original set. The headset presents (e.g., via the audio controller) audio content using the reconstructed impulse response, i.e., the adjusted set of acoustic parameters.
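As one illustrative (non-authoritative) example of such a reconstruction, a late-reverberation envelope can be rebuilt from a single reverberation-time parameter. A real renderer would also shape the tail with noise excitation and per-band filtering; this sketch shows only the 60 dB exponential decay that the parameter encodes.

```python
def reverb_tail(rt60_s: float, sample_rate: int, duration_s: float) -> list:
    """Exponentially decaying amplitude envelope whose level drops 60 dB
    over rt60_s seconds (noise excitation and filtering omitted)."""
    decay_db_per_s = -60.0 / rt60_s
    n = int(duration_s * sample_rate)
    # Amplitude at sample i: 10^(dB(t_i) / 20), with dB(t) = decay_db_per_s * t.
    return [10.0 ** (decay_db_per_s * (i / sample_rate) / 20.0) for i in range(n)]
```

At t = rt60_s the envelope has fallen to 10^(-3) of its initial amplitude, i.e., exactly 60 dB down.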

[0020] The headset may be, e.g., a NED, HMD, or some other type of headset. The headset may be part of an artificial reality system. The headset further includes a display and an optical assembly. The display of the headset is configured to emit image light. The optical assembly of the headset is configured to direct the image light to an eye box of the headset corresponding to a location of a wearer’s eye. In some embodiments, the image light may include depth information for a local area surrounding the headset.

[0021] FIG. 1 is a block diagram of a system 100 for a headset 110, in accordance with one or more embodiments. The system 100 includes the headset 110 that can be worn by a user 106 in a room 102. The headset 110 is connected to a mapping server 130 via a network 120.

[0022] The network 120 connects the headset 110 to the mapping server 130. The network 120 may include any combination of local area and/or wide area networks using both wireless and/or wired communication systems. For example, the network 120 may include the Internet, as well as mobile telephone networks. In one embodiment, the network 120 uses standard communications technologies and/or protocols. Hence, the network 120 may include links using technologies such as Ethernet, 802.11, worldwide interoperability for microwave access (WiMAX), 2G/3G/4G mobile communications protocols, digital subscriber line (DSL), asynchronous transfer mode (ATM), InfiniBand, PCI Express Advanced Switching, etc. Similarly, the networking protocols used on the network 120 can include multiprotocol label switching (MPLS), the transmission control protocol/Internet protocol (TCP/IP), the User Datagram Protocol (UDP), the hypertext transport protocol (HTTP), the simple mail transfer protocol (SMTP), the file transfer protocol (FTP), etc. The data exchanged over the network 120 can be represented using technologies and/or formats including image data in binary form (e.g. Portable Network Graphics (PNG)), hypertext markup language (HTML), extensible markup language (XML), etc. In addition, all or some of links can be encrypted using conventional encryption technologies such as secure sockets layer (SSL), transport layer security (TLS), virtual private networks (VPNs), Internet Protocol security (IPsec), etc. The network 120 may also connect multiple headsets located in the same or different rooms to the same mapping server 130.

[0023] The headset 110 presents media to a user. In one embodiment, the headset 110 may be a NED. In another embodiment, the headset 110 may be a HMD. In general, the headset 110 may be worn on the face of a user such that content (e.g., media content) is presented using one or both lenses of the headset. However, the headset 110 may also be used such that media content is presented to a user in a different manner. Examples of media content presented by the headset 110 include one or more images, video, audio, or some combination thereof.

[0024] The headset 110 may determine visual information describing at least a portion of the room 102, and provide the visual information to the mapping server 130. For example, the headset 110 may include at least one depth camera assembly (DCA) that generates depth image data for at least the portion of the room 102. The headset 110 may further include at least one passive camera assembly (PCA) that generates color image data for at least the portion of the room 102. In some embodiments, the DCA and the PCA of the headset 110 are part of simultaneous localization and mapping (SLAM) sensors mounted on the headset 110 for determining visual information of the room 102. Thus, the depth image data captured by the at least one DCA and/or the color image data captured by the at least one PCA can be referred to as visual information determined by the SLAM sensors of the headset 110.

[0025] The headset 110 may communicate the visual information via the network 120 to the mapping server 130 for determining a set of acoustic parameters for the room 102. In another embodiment, the headset 110 provides its location information (e.g., Global Positioning System (GPS) location of the room 102) to the mapping server 130 in addition to the visual information for determining the set of acoustic parameters. Alternatively, the headset 110 provides only the location information to the mapping server 130 for determining the set of acoustic parameters. A set of acoustic parameters can be used to represent various acoustic properties of a particular configuration in the room 102 that together define an acoustic condition in the room 102. The configuration in the room 102 is thus associated with a unique acoustic condition in the room 102. A configuration in the room 102 and an associated acoustic condition may change based on at least one of, e.g., a change in location of the headset 110 in the room 102, a change in location of a sound source in the room 102, a change of human occupancy level in the room 102, a change of one or more acoustic materials of surfaces in the room 102, the opening/closing of windows in the room 102, the opening/closing of curtains, the opening/closing of a door in the room 102, etc.

[0026] The set of acoustic parameters may include some or all of: a reverberation time from the sound source to the headset 110 for each of a plurality of frequency bands, a reverberant level for each frequency band, a direct to reverberant ratio for each frequency band, a direction of a direct sound from the sound source to the headset 110 for each frequency band, an amplitude of the direct sound for each frequency band, a time of early reflection of a sound from the sound source to the headset, an amplitude of early reflection for each frequency band, a direction of early reflection, room mode frequencies, room mode locations, etc. In some embodiments, the frequency dependence of some of the aforementioned acoustic parameters can be clustered into four frequency bands. In some other embodiments, some of the acoustic parameters can be clustered into more or fewer than four frequency bands. The headset 110 presents audio content to the user 106 using the set of acoustic parameters obtained from the mapping server 130. The audio content is presented so as to appear to originate from an object (i.e., a real object or a virtual object) within the room 102.
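For illustration only, the four-band clustering mentioned above might be organized as follows. The band edges and parameter fields here are assumptions for the sketch, not values taken from the patent.

```python
from dataclasses import dataclass

# Hypothetical band edges for the four-band clustering (Hz).
FREQUENCY_BANDS_HZ = ((20, 250), (250, 1000), (1000, 4000), (4000, 20000))


@dataclass
class BandParameters:
    """Per-band values for a few of the parameters named in the text."""
    reverberation_time_s: float
    reverberant_level_db: float
    direct_to_reverberant_db: float


def band_index(frequency_hz: float) -> int:
    """Return the index of the band containing frequency_hz."""
    for i, (lo, hi) in enumerate(FREQUENCY_BANDS_HZ):
        if lo <= frequency_hz < hi:
            return i
    raise ValueError(f"{frequency_hz} Hz outside supported range")
```

A full parameter set would then be, e.g., a list of four `BandParameters` plus the band-independent values (directions, room mode frequencies, etc.).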

[0027] The headset 110 may further include an array of acoustic sensors for monitoring sound in the room 102. The headset 110 may generate an audio stream based on the monitored sound. The headset 110 may selectively provide the audio stream to the mapping server 130 (e.g., via the network 120) for updating one or more acoustic parameters for the room 102 at the mapping server 130, responsive to a determination that a change in the configuration of the room 102 has occurred and that the acoustic condition in the room 102 has therefore changed. The headset 110 presents audio content to the user 106 using an updated set of acoustic parameters obtained from the mapping server 130.

[0028] In some embodiments, the headset 110 obtains a set of acoustic parameters parametrizing an impulse response for the room 102, either from the mapping server 130 or from a non-transitory computer readable storage device (i.e., a memory) at the headset 110. The headset 110 may selectively extrapolate the set of acoustic parameters into an adjusted set of acoustic parameters representing a reconstructed room impulse response for a specific configuration of the room 102 that differs from a configuration associated with the obtained set of acoustic parameters. The headset 110 presents audio content to the user of the headset 110 using the reconstructed room impulse response. Furthermore, the headset 110 may include position sensors or an inertial measurement unit (IMU) that tracks the position (e.g., location and pose) of the headset 110 within the room. Additional details regarding operations and components of the headset 110 are discussed below in connection with FIG. 3B, FIG. 4, FIGS. 5B-5C and FIG. 6.

[0029] The mapping server 130 facilitates the creation of audio content for the headset 110. The mapping server 130 includes a database that stores a virtual model describing a plurality of spaces and acoustic properties of those spaces, wherein one location in the virtual model corresponds to a current configuration of the room 102. The mapping server 130 receives, from the headset 110 via the network 120, visual information describing at least the portion of the room 102 and/or location information for the room 102. The mapping server 130 determines, based on the received visual information and/or location information, a location in the virtual model that is associated with the current configuration of the room 102. The mapping server 130 determines (e.g., retrieves) a set of acoustic parameters associated with the current configuration of the room 102, based in part on the determined location in the virtual model and any acoustic parameters associated with the determined location. The mapping server 130 may provide information about the set of acoustic parameters to the headset 110 (e.g., via the network 120) for generating audio content at the headset 110. Alternatively, the mapping server 130 may generate an audio signal using the set of acoustic parameters and provide the audio signal to the headset 110 for rendering. In some embodiments, some of the components of the mapping server 130 may be integrated with another device (e.g., a console) connected to the headset 110 via a wired connection (not shown in FIG. 1). Additional details regarding operations and components of the mapping server 130 are discussed below in connection with FIG. 3A, FIG. 3C, and FIG. 5A.

[0030] FIG. 2 illustrates effects of surfaces in a room 200 on the propagation of sound between a sound source and a user of a headset, in accordance with one or more embodiments. A set of acoustic parameters (e.g., parametrizing a room impulse response) represent how a sound is transformed when traveling in the room 200 from a sound source to a user (receiver), and may include effects of a direct sound path and reflection sound paths traversed by the sound. For example, the user 106 wearing the headset 110 is located in the room 200. The room 200 includes walls, such as walls 202 and 204, which provide surfaces for reflecting sound 208 from an object 206 (e.g., virtual sound source). When the object 206 emits the sound 208, the sound 208 travels to the headset 110 through multiple paths. Some of the sound 208 travels along a direct sound path 210 to the (e.g., right) ear of the user 106 without reflection. The direct sound path 210 may result in an attenuation, filtering, and time delay of the sound caused by the propagation medium (e.g., air) for the distance between the object 206 and the user 106.
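A minimal sketch of the direct-path transform: the propagation delay follows from a nominal speed of sound, and the attenuation here assumes simple free-field 1/r spreading (the frequency-dependent filtering by the medium that the text mentions is omitted). The clamp below 1 m is an arbitrary choice for the sketch.

```python
SPEED_OF_SOUND_M_S = 343.0  # in air at roughly 20 degrees C


def direct_path(distance_m: float):
    """Return (delay in seconds, amplitude gain) for the direct sound path.

    Delay is distance over the speed of sound; gain is a hypothetical
    1/r free-field attenuation, clamped so it never exceeds 1.0 near
    the source.
    """
    delay_s = distance_m / SPEED_OF_SOUND_M_S
    gain = 1.0 / max(distance_m, 1.0)
    return delay_s, gain
```

Doubling the distance halves the gain and doubles the delay, which is the qualitative behavior the paragraph describes for the path between the object 206 and the user 106.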

[0031] Other portions of the sound 208 are reflected before reaching the user 106 and represent reflection sounds. For example, another portion of the sound 208 travels along a reflection sound path 212, where the sound is reflected by the wall 202 to the user 106. The reflection sound path 212 may result in an attenuation, filtering, and time delay of the sound 208 caused by the propagation medium for the distance between the object 206 and the wall 202, another attenuation or filtering caused by a reflection off the wall 202, and another attenuation, filtering, and time delay caused by the propagation medium for the distance between the wall 202 and the user 106. The amount of the attenuation at the wall 202 depends on the acoustic absorption of the wall 202, which can vary based on the material of the wall 202. In another example, another portion of the sound 208 travels along a reflection sound path 214, where the sound 208 is reflected by an object 216 (e.g., a table) toward the user 106.

[0032] Various sound propagation paths 210, 212, 214 within the room 200 represent a room impulse response, which depends on specific locations of a sound source (i.e., the object 206) and a receiver (e.g., the headset 110). The room impulse response contains a wide variety of information about the room, including low frequency modes, diffraction paths, transmission through walls, and acoustic material properties of surfaces. The room impulse response can be parametrized using the set of acoustic parameters. Although the reflection sound paths 212 and 214 are examples of first order reflections caused by reflection at a single surface, the set of acoustic parameters (e.g., room impulse response) may incorporate effects from higher order reflections at multiple surfaces or objects. By transforming an audio signal of the object 206 using the set of acoustic parameters, the headset 110 generates audio content for the user 106 that simulates propagation of the audio signal as sound through the room 200 along the direct sound path 210 and the reflection sound paths 212, 214.
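Concretely, the transformation in the last sentence is a convolution of the dry source signal with the room impulse response. A direct-form sketch is shown below; real-time renderers typically use partitioned FFT convolution instead for efficiency.

```python
def convolve(signal: list, impulse_response: list) -> list:
    """Direct-form convolution: rendering a dry source signal through a
    room impulse response yields the sound as heard at the receiver,
    with the direct path and each reflection contributing a delayed,
    scaled copy of the input."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out
```

For a unit impulse as input, the output is the impulse response itself, which is one way to sanity-check a renderer.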

[0033] Note that a propagation path from the object 206 (sound source) to the user 106 (receiver) within the room 200 can be generally divided into three parts: the direct sound path 210, early reflections (e.g., carried by the reflection sound path 214) that correspond to the first order acoustic reflections from nearby surfaces, and late reverberation (e.g., carried by the reflection sound path 212) that corresponds to the first order acoustic reflections from farther surfaces or higher order acoustic reflections. Each sound path has different perceptual requirements affecting rates of updating corresponding acoustic parameters. For example, the user 106 may have very little tolerance for latency in the direct sound path 210, and thus one or more acoustic parameters associated with the direct sound path 210 may be updated at a highest rate. The user 106 may, however, have more tolerance for latency in early reflections. The late reverberation is the least sensitive to changes in head rotation, because in many cases the late reverberation is isotropic and uniform within a room, hence the late reverberation does not change at the ears with rotational or translational movements. It is also very computationally expensive to compute all perceptually important acoustic parameters related to the late reverberation. For this reason, acoustic parameters associated with early reflections and late reverberation may be efficiently computed off-line, e.g., at the mapping server 130, which does not have as stringent energy and computation limitations as the headset 110, but does have a substantial latency. Details regarding operations of the mapping server 130 for determining acoustic parameters are discussed below in connection with FIG. 3A and FIG. 5A.

[0034] FIG. 3A is a block diagram of the mapping server 130, in accordance with one or more embodiments. The mapping server 130 determines a set of acoustic parameters for a physical space (room) where the headset 110 is located. The determined set of acoustic parameters may be used at the headset 110 to transform an audio signal associated with an object (e.g., a virtual or real object) in the room. To add a convincing sound source to the object, the audio signal output from the headset 110 should sound as if it has propagated from the object’s location to the listener in the same way that a natural source in the same position would. The set of acoustic parameters defines a transformation caused by the propagation of sound from the object within the room to the listener (i.e., to the position of the headset within the room), including propagation along a direct path and various reflection paths off surfaces of the room. The mapping server 130 includes a virtual model database 305, a communication module 310, a mapping module 315, and an acoustic analysis module 320. In other embodiments, the mapping server 130 can have any combination of the modules listed with any additional modules. In some other embodiments, the mapping server 130 includes one or more modules that combine functions of the modules illustrated in FIG. 3A. A processor of the mapping server 130 (not shown in FIG. 3A) may run some or all of the virtual model database 305, the communication module 310, the mapping module 315, and the acoustic analysis module 320, as well as one or more other modules or modules combining functions of the modules shown in FIG. 3A.

[0035] The virtual model database 305 stores a virtual model describing a plurality of physical spaces and acoustic properties of those physical spaces. Each location in the virtual model corresponds to a physical location of the headset 110 within a local area having a specific configuration associated with a unique acoustic condition. The unique acoustic condition represents a condition of the local area having a unique set of acoustic properties represented with a unique set of acoustic parameters. A particular location in the virtual model may correspond to a current physical location of the headset 110 within the room 102. Each location in the virtual model is associated with a set of acoustic parameters for a corresponding physical space that represents one configuration of the local area. The set of acoustic parameters describes various acoustic properties of that one particular configuration of the local area. The physical spaces whose acoustic properties are described in the virtual model include, but are not limited to, a conference room, a bathroom, a hallway, an office, a bedroom, a dining room, and a living room. Hence, the room 102 of FIG. 1 may be a conference room, a bathroom, a hallway, an office, a bedroom, a dining room, or a living room. In some embodiments, the physical spaces can be certain outside spaces (e.g., patio, garden, etc.) or a combination of various inside and outside spaces. More details about a structure of the virtual model are discussed below in connection with FIG. 3C.
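One plausible in-memory shape for such a model is a two-level mapping: each physical space keyed to its possible configurations (acoustic conditions), each configuration carrying its own parameter set. All names and values below are hypothetical placeholders for the sketch.

```python
# Hypothetical virtual model: space -> configuration -> acoustic parameters.
virtual_model = {
    "conference_room": {
        "door_closed_occupancy_low": {"rt60_s": 0.7, "drr_db": 2.0},
        "door_open_occupancy_high": {"rt60_s": 0.5, "drr_db": 4.0},
    },
    "living_room": {
        "curtains_closed": {"rt60_s": 0.4, "drr_db": 6.0},
    },
}


def parameters_for(space: str, configuration: str):
    """Look up the parameter set for one configuration of one space;
    return None when the space or configuration is not yet modeled."""
    return virtual_model.get(space, {}).get(configuration)
```

A `None` result corresponds to the unmodeled case described in paragraph [0037], where the server falls back to deriving parameters from the received visual information.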

[0036] The communication module 310 is a module that communicates with the headset 110 via the network 120. The communication module 310 receives, from the headset 110, visual information describing at least the portion of the room 102. In one or more embodiments, the visual information includes image data for at least the portion of the room 102. For example, the communication module 310 receives depth image data captured by the DCA of the headset 110 with information about a shape of the room 102 defined by surfaces of the room 102, such as surfaces of the walls, floor and ceiling of the room 102. The communication module 310 may also receive color image data captured by the PCA of the headset 110. The mapping server 130 may use the color image data to associate different acoustic materials with the surfaces of the room 102. The communication module 310 may provide the visual information received from the headset 110 (e.g., the depth image data and the color image data) to the mapping module 315.

[0037] The mapping module 315 maps the visual information received from the headset 110 to a location of the virtual model. The mapping module 315 determines the location of the virtual model corresponding to a current physical space where the headset 110 is located, i.e., a current configuration of the room 102. The mapping module 315 searches through the virtual model to find a mapping between (i) the visual information, which includes at least, e.g., information about the geometry of surfaces of the physical space and information about acoustic materials of the surfaces, and (ii) a corresponding configuration of the physical space within the virtual model. The mapping is performed by matching the geometry and/or acoustic materials information of the received visual information with geometry and/or acoustic materials information that is stored as part of the configuration of the physical space within the virtual model. The corresponding configuration of the physical space within the virtual model corresponds to a model of the physical space where the headset 110 is currently located. If no match is found, this is an indication that the current configuration of the physical space is not yet modeled within the virtual model. In such a case, the mapping module 315 may inform the acoustic analysis module 320 that no match is found, and the acoustic analysis module 320 determines a set of acoustic parameters based at least in part on the received visual information.
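A toy version of the matching step is sketched below, using only overall room dimensions compared within a tolerance. This is a deliberate simplification: a real mapping module, as described above, would compare full surface geometry and acoustic-material labels, and the tolerance value here is an arbitrary assumption.

```python
def geometry_matches(observed_dims_m: tuple,
                     stored_dims_m: tuple,
                     tolerance_m: float = 0.2) -> bool:
    """Toy criterion: room dimensions agree within a per-axis tolerance."""
    return all(abs(a - b) <= tolerance_m
               for a, b in zip(observed_dims_m, stored_dims_m))


def find_location(observed_dims_m: tuple, model: dict):
    """Return the name of the first model entry whose stored geometry
    matches the observed dimensions, or None when the space is not
    yet modeled (the fall-back case described in the text)."""
    for name, entry in model.items():
        if geometry_matches(observed_dims_m, entry["dims_m"]):
            return name
    return None
```

Returning `None` corresponds to the no-match branch in which the acoustic analysis module derives new parameters from the visual information itself.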

[0038] The acoustic analysis module 320 determines the set of acoustic parameters associated with the physical location of the headset 110, based in part on the determined location in the virtual model obtained from the mapping module 315 and any acoustic parameters in the virtual model associated with the determined location. In some embodiments, the acoustic analysis module 320 retrieves the set of acoustic parameters from the virtual model, as the set of acoustic parameters are stored at the determined location in the virtual model that is associated with a specific space configuration. In some other embodiments, the acoustic analysis module 320 determines the set of acoustic parameters by adjusting a previously determined set of acoustic parameters for a specific space configuration in the virtual model, based at least in part on the visual information received from the headset 110. For example, the acoustic analysis module 320 may run off-line acoustic simulation using the received visual information to determine the set of acoustic parameters.

[0039] In some embodiments, the acoustic analysis module 320 determines that previously generated acoustic parameters are not consistent with an acoustic condition of the current physical location of the headset 110, e.g., by analyzing an ambient sound that is captured at and obtained from the headset 110. The detected mismatch may trigger generation of a new set of acoustic parameters at the mapping server 130. Once re-computed, this new set of acoustic parameters may be entered into the virtual model of the mapping server 130 as a replacement for the previous set of acoustic parameters, or as an additional state for the same physical space. In some embodiments, the acoustic analysis module 320 estimates a set of acoustic parameters by analyzing the ambient sound (e.g., speech) received from the headset 110. In some other embodiments, the acoustic analysis module 320 derives a set of acoustic parameters by running an acoustic simulation (e.g., a wave-based acoustic simulation or a ray tracing acoustic simulation) using the visual information received from the headset 110, which may include the room geometry and estimates of the acoustic material properties. The acoustic analysis module 320 provides the derived set of acoustic parameters to the communication module 310, which communicates the set of acoustic parameters from the mapping server 130 to the headset 110, e.g., via the network 120.

[0040] In some embodiments, as discussed, the communication module 310 receives an audio stream from the headset 110, which may be generated at the headset 110 using sound in the room 102. The acoustic analysis module 320 may determine (e.g., by applying a server-based computational algorithm) one or more acoustic parameters for a specific configuration of the room 102, based on the received audio stream. In some embodiments, the acoustic analysis module 320 estimates the one or more acoustic parameters (e.g., a reverberation time) from the audio stream based on, e.g., a statistical model for sound decay in the audio stream that employs a maximum-likelihood estimator. In some other embodiments, the acoustic analysis module 320 estimates the one or more acoustic parameters based on, e.g., time domain information and/or frequency domain information extracted from the received audio stream.
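As one illustration of estimating a reverberation time, the sketch below uses Schroeder backward integration of an impulse response and a linear fit over the -5 dB to -25 dB decay range (a T20-style estimate extrapolated to 60 dB), rather than the maximum-likelihood estimator mentioned above; the function name and fit range are assumptions.

```python
import numpy as np

def estimate_rt60(impulse_response: np.ndarray, fs: int) -> float:
    """Estimate the reverberation time (RT60) of an impulse response sampled
    at rate fs, via the Schroeder energy decay curve and a linear fit."""
    energy = impulse_response ** 2
    # Schroeder backward integration: remaining energy from time t onward.
    edc = np.cumsum(energy[::-1])[::-1]
    edc_db = 10.0 * np.log10(edc / edc[0])
    t = np.arange(len(edc_db)) / fs
    # Fit the decay slope between -5 dB and -25 dB, avoiding the direct sound
    # and the noisy tail.
    mask = (edc_db <= -5.0) & (edc_db >= -25.0)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)
    # Time for 60 dB of decay at the fitted slope (dB per second).
    return -60.0 / slope
```

Estimating from running ambient sound (rather than a measured impulse response) would require an additional blind-estimation stage, which this sketch omits.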

[0041] In some embodiments, the one or more acoustic parameters determined by the acoustic analysis module 320 represent a new set of acoustic parameters that was not part of the virtual model, as the current configuration of the room 102 and the corresponding acoustic condition of the room 102 were not yet modeled by the virtual model. In such a case, the virtual model database 305 stores the new set of acoustic parameters at a location within the virtual model that is associated with the current configuration of the room 102, modeling the current acoustic condition of the room 102. Some or all of the one or more acoustic parameters (e.g., a frequency dependent reverberation time, a frequency dependent direct to reverberant ratio, etc.) may be stored in the virtual model along with a confidence (weight) and an absolute time stamp associated with each acoustic parameter, which can be used for re-computing some of the acoustic parameters.

[0042] In some embodiments, the current configuration of the room 102 has already been modeled by the virtual model, and the acoustic analysis module 320 re-computes the set of acoustic parameters based on the received audio stream. Alternatively, one or more acoustic parameters in the re-computed set may be determined at the headset 110 based on, e.g., at least sound in the local area monitored at the headset 110, and communicated to the mapping server 130. The virtual model database 305 may update the virtual model by replacing the set of acoustic parameters with the re-computed set of acoustic parameters. In one or more embodiments, the acoustic analysis module 320 compares the re-computed set of acoustic parameters with the previously determined set of acoustic parameters. Based on the comparison, when a difference between any of the re-computed acoustic parameters and any of the previously determined acoustic parameters is above a threshold difference, the virtual model is updated using the re-computed set of acoustic parameters.
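The threshold comparison that gates a model update can be sketched as below; the use of a relative difference, the parameter dictionaries, and the 5% default threshold are illustrative assumptions, not details from the disclosure.

```python
def should_update_model(stored: dict, recomputed: dict, threshold: float = 0.05) -> bool:
    """Return True when any re-computed acoustic parameter differs from its
    stored counterpart by more than a relative threshold, or is not yet stored."""
    for name, new_value in recomputed.items():
        old_value = stored.get(name)
        if old_value is None:
            return True  # parameter not yet modeled for this configuration
        if abs(new_value - old_value) / max(abs(old_value), 1e-9) > threshold:
            return True
    return False
```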

[0043] In some embodiments, the acoustic analysis module 320 combines any of the re-computed acoustic parameters with past estimates of the corresponding acoustic parameter for the same configuration of a local area, if the past estimates are within a threshold value of the re-computed acoustic parameter. The past estimates may be stored in the virtual model database 305 at a location of the virtual model associated with the corresponding configuration of the local area. In one or more embodiments, the acoustic analysis module 320 applies weights to the past estimates (e.g., weights based on time stamps associated with the past estimates, or stored weights), if the past estimates are not within the threshold value of the re-computed acoustic parameter. In some embodiments, the acoustic analysis module 320 applies a material optimization algorithm to estimates of at least one acoustic parameter (e.g., a reverberation time) and geometry information for a physical space where the headset 110 is located, to determine the acoustic materials that would produce the estimates of the at least one acoustic parameter. Information about the acoustic materials, along with the geometry information, may be stored in different locations of the virtual model that model different configurations and acoustic conditions of the same physical space.
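A minimal sketch of combining a re-computed parameter with time-stamped past estimates follows. The exponential age-based discount, the half-life, and the consistency gate are hypothetical choices standing in for the stored weights described above.

```python
def combine_estimates(new_value, past, now, half_life=86_400.0, gate=0.2):
    """Combine a re-computed acoustic parameter with past (value, timestamp)
    estimates. Past estimates within `gate` (relative) of the new value keep
    full weight; inconsistent ones are discounted by the age of their time
    stamp, with the given half-life in seconds."""
    weighted_sum, weight_total = new_value, 1.0
    for value, timestamp in past:
        if abs(value - new_value) / max(abs(new_value), 1e-9) <= gate:
            weight = 1.0                                   # consistent estimate
        else:
            weight = 0.5 ** ((now - timestamp) / half_life)  # decay with age
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total
```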

[0044] In some embodiments, the acoustic analysis module 320 may perform acoustic simulations to generate spatially dependent pre-computed acoustic parameters (e.g., a spatially dependent reverberation time, a spatially dependent direct to reverberant ratio, etc.). The spatially dependent pre-computed acoustic parameters may be stored in appropriate locations of the virtual model at the virtual model database 305. The acoustic analysis module 320 may re-compute spatially dependent acoustic parameters using the pre-computed acoustic parameters whenever the geometry and/or acoustic materials of a physical space change. The acoustic analysis module 320 may use various inputs for the acoustic simulations, such as but not limited to: information about a room geometry, acoustic material property estimates, and/or information about a human occupancy level (e.g., empty, partially full, full). The acoustic parameters may be simulated for various occupancy levels and various states of a room (e.g., open windows, closed windows, curtains open, curtains closed, etc.). If a state of the room changes, the mapping server 130 may determine and communicate to the headset 110 an appropriate set of acoustic parameters for presenting audio content to the user. Otherwise, if the appropriate set of acoustic parameters is not available, the mapping server 130 (e.g., via the acoustic analysis module 320) would calculate a new set of acoustic parameters (e.g., via the acoustic simulations) and communicate the new set of acoustic parameters to the headset 110.

[0045] In some embodiments, the mapping server 130 stores a full (measured or simulated) room impulse response for a given configuration of the local area. For example, the configuration of the local area may be based on a specific spatial arrangement of the headset 110 and a sound source. The mapping server 130 may reduce the room impulse response into a set of acoustic parameters suitable for a defined bandwidth of network transmission (e.g., a bandwidth of the network 120). The set of acoustic parameters representing a parametrized version of a full impulse response may be stored, e.g., in the virtual model database 305 as part of the virtual model, or in a separate non-transitory computer readable storage medium of the mapping server 130 (not shown in FIG. 3A).

[0046] FIG. 3B is a block diagram of an audio system 330 of the headset 110, in accordance with one or more embodiments. The audio system 330 includes a transducer assembly 335, an acoustic assembly 340, an audio controller 350, and a communication module 355. In one embodiment, the audio system 330 further comprises an input interface (not shown in FIG. 3B) for, e.g., controlling operations of different components of the audio system 330. In other embodiments, the audio system 330 can have any combination of the components listed with any additional components.

[0047] The transducer assembly 335 produces sound for the user's ears, e.g., based on audio instructions from the audio controller 350. In some embodiments, the transducer assembly 335 is implemented as a pair of air conduction transducers (e.g., one for each ear) that produce sound by generating an airborne acoustic pressure wave in the user's ears, e.g., in accordance with the audio instructions from the audio controller 350. Each air conduction transducer of the transducer assembly 335 may include one or more transducers to cover different parts of a frequency range. For example, a piezoelectric transducer may be used to cover a first part of a frequency range and a moving coil transducer may be used to cover a second part of a frequency range. In some other embodiments, each transducer of the transducer assembly 335 is implemented as a bone conduction transducer that produces sound by vibrating a corresponding bone in the user's head. Each bone conduction transducer may be placed behind an auricle, coupled to a portion of the user's bone, to vibrate that portion of the bone and generate a tissue-borne acoustic pressure wave propagating toward the user's cochlea, thereby bypassing the eardrum.

[0048] The acoustic assembly 340 may include a plurality of acoustic sensors, e.g., one acoustic sensor for each ear. Alternatively, the acoustic assembly 340 includes an array of acoustic sensors (e.g., microphones) mounted at various locations of the headset 110. One or more acoustic sensors of the acoustic assembly 340 may be positioned at an entrance of each ear and configured to detect the airborne acoustic pressure waves formed there. In one embodiment, the acoustic assembly 340 provides information regarding the produced sound to the audio controller 350. In another embodiment, the acoustic assembly 340 transmits feedback information about the detected acoustic pressure waves to the audio controller 350, and the feedback information may be used by the audio controller 350 for calibration of the transducer assembly 335.

[0049] In one embodiment, the acoustic assembly 340 includes a microphone positioned at an entrance of each ear of a wearer. A microphone is a transducer that converts pressure into an electrical signal. The frequency response of the microphone may be relatively flat in some portions of a frequency range and may be linear in other portions of a frequency range. The microphone may be configured to receive a signal from the audio controller 350 to scale a detected signal from the microphone based on the audio instructions provided to the transducer assembly 335. For example, the signal may be adjusted based on the audio instructions to avoid clipping of the detected signal or for improving a signal to noise ratio in the detected signal.

[0050] In another embodiment, the acoustic assembly 340 includes a vibration sensor. The vibration sensor is coupled to a portion of the ear. In some embodiments, the vibration sensor and the transducer assembly 335 couple to different portions of the ear. The vibration sensor is similar to an air transducer used in the transducer assembly 335, except that the signal flows in reverse: instead of an electrical signal producing a mechanical vibration in a transducer, a mechanical vibration generates an electrical signal in the vibration sensor. A vibration sensor may be made of piezoelectric material that generates an electrical signal when deformed. The piezoelectric material may be a polymer (e.g., PVC, PVDF), a polymer-based composite, ceramic, or crystal (e.g., SiO.sub.2, PZT). By applying a pressure on the piezoelectric material, the piezoelectric material changes in polarization and produces an electrical signal. The piezoelectric sensor may be coupled to a material (e.g., silicone) that attaches well to the back of the ear. A vibration sensor can also be an accelerometer. The accelerometer may be piezoelectric or capacitive. In one embodiment, the vibration sensor maintains good surface contact with the back of the wearer's ear and maintains a steady amount of application force (e.g., 1 Newton) to the ear. The vibration sensor may be integrated in an IMU integrated circuit. The IMU is further described with relation to FIG. 6.

[0051] The audio controller 350 generates audio content using a set of acoustic parameters (e.g., a room impulse response) and provides corresponding audio instructions to the transducer assembly 335 for generating sound. The audio controller 350 presents the audio content so that it appears to originate from an object (e.g., a virtual object or a real object) within a local area of the headset 110. In an embodiment, the audio controller 350 presents the audio content to appear to originate from a virtual sound source by transforming a source audio signal using the set of acoustic parameters for a current configuration of the local area, which may parametrize the room impulse response for the current configuration of the local area.

[0052] The audio controller 350 may obtain information describing at least a portion of the local area, e.g., from one or more cameras of the headset 110. The information may include depth image data, color image data, location information of the local area, or combination thereof. The depth image data may include geometry information about a shape of the local area defined by surfaces of the local area, such as surfaces of the walls, floor and ceiling of the local area. The color image data may include information about acoustic materials associated with surfaces of the local area. The location information may include GPS coordinates or some other positional information of the local area.

[0053] In some embodiments, the audio controller 350 generates an audio stream based on sound in the local area monitored by the acoustic assembly 340 and provides the audio stream to the communication module 355 to be selectively communicated to the mapping server 130. In some embodiments, the audio controller 350 runs a real-time acoustic ray tracing simulation to determine one or more acoustic parameters (e.g., early reflections, a direct sound occlusion, etc.). To be able to run the real-time acoustic ray tracing simulation, the audio controller 350 requests and obtains, e.g., from the virtual model stored at the mapping server 130, information about geometry and/or acoustic parameters for a configuration of the local area where the headset 110 is currently located. In some embodiments, the audio controller 350 determines one or more acoustic parameters for a current configuration of the local area using sound in the local area monitored by the acoustic assembly 340 and/or vision information determined at the headset 110, e.g., by one or more of the SLAM sensors mounted on the headset 110.

[0054] The communication module 355 (e.g., a transceiver) is coupled to the audio controller 350 and may be integrated as a part of the audio controller 350. The communication module 355 may communicate the information describing at least the portion of the local area to the mapping server 130 for determination of a set of acoustic parameters at the mapping server 130. The communication module 355 may selectively communicate the audio stream obtained from the audio controller 350 to the mapping server 130 for updating the virtual model of physical spaces at the mapping server 130. For example, the communication module 355 communicates the audio stream to the mapping server 130 responsive to a determination (e.g., by the audio controller 350 based on the monitored sound) that a change of an acoustic condition of the local area over time is above a threshold change due to a change of a configuration of the local area, which requires a new or updated set of acoustic parameters. In some embodiments, the audio controller 350 determines that the change of the acoustic condition of the local area is above the threshold change by periodically analyzing the ambient audio stream, e.g., by periodically estimating a reverberation time from the audio stream that is changing over time. For example, the change of acoustic condition can be caused by a changing human occupancy level (e.g., empty, partially full, full) in the room 102, by opening or closing windows in the room 102, opening or closing a door of the room 102, opening or closing curtains on the windows, changing a location of the headset 110 in the room 102, changing a location of a sound source in the room 102, changing some other feature in the room 102, or combination thereof.
In some embodiments, the communication module 355 communicates the one or more acoustic parameters determined by the audio controller 350 to the mapping server 130 for comparing with a previously determined set of acoustic parameters associated with the current configuration of the local area to possibly update the virtual model at the mapping server 130.
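The threshold-change trigger described above can be sketched as a small detector over periodic reverberation-time estimates; the class name, the 15% relative threshold, and the five-sample baseline window are illustrative assumptions.

```python
from collections import deque

class AcousticChangeDetector:
    """Track periodic reverberation-time estimates and flag when the acoustic
    condition drifts beyond a relative threshold, which would trigger sending
    the audio stream to the mapping server for re-computation."""

    def __init__(self, threshold: float = 0.15, window: int = 5):
        self.threshold = threshold
        self.history = deque(maxlen=window)  # recent estimates form the baseline

    def observe(self, rt60_estimate: float) -> bool:
        """Add one estimate; return True when it deviates from the running
        baseline by more than the relative threshold."""
        if self.history:
            baseline = sum(self.history) / len(self.history)
            changed = abs(rt60_estimate - baseline) / baseline > self.threshold
        else:
            changed = False  # first observation establishes the baseline
        self.history.append(rt60_estimate)
        return changed
```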

[0055] In one embodiment, the communication module 355 receives a set of acoustic parameters for a current configuration of the local area from the mapping server 130. In another embodiment, the audio controller 350 determines the set of acoustic parameters for the current configuration of the local area based on, e.g., visual information of the local area determined by one or more of the SLAM sensors mounted on the headset 110, sound in the local area monitored by the acoustic assembly 340, information about a position of the headset 110 in the local area determined by the position sensor 440, information about a position of a sound source in the local area, etc. In yet another embodiment, the audio controller 350 obtains the set of acoustic parameters from a computer-readable data storage (i.e., memory) coupled to the audio controller 350 (not shown in FIG. 3B). The memory may store different sets of acoustic parameters (room impulse responses) for a limited number of configurations of physical spaces. The set of acoustic parameters may represent a parametrized form of a room impulse response for the current configuration of the local area.

[0056] The audio controller 350 may selectively extrapolate the set of acoustic parameters into an adjusted set of acoustic parameters (i.e., a reconstructed room impulse response), responsive to a change over time in a configuration of the local area that causes a change in an acoustic condition of the local area. The change of acoustic condition of the local area over time can be determined by the audio controller 350 based on, e.g., visual information of the local area, monitored sound in the local area, information about a change in position of the headset 110 in the local area, information about a change in position of the sound source in the local area, etc. As some acoustic parameters in the set are changing in a systematic manner as a configuration of the local area changes (e.g., due to moving of the headset 110 and/or the sound source in the local area), the audio controller 350 may apply an extrapolation scheme to dynamically adjust some of the acoustic parameters.

[0057] In one embodiment, the audio controller 350 dynamically adjusts, using an extrapolation scheme, e.g., an amplitude and direction of a direct sound, a delay between a direct sound and early reflections, and/or a direction and amplitude of early reflections, based on information about room geometry and pre-calculated image sources (e.g., in one iteration). In another embodiment, the audio controller 350 dynamically adjusts some of the acoustic parameters based on, e.g., a data-driven approach. In such a case, the audio controller 350 may train a model with measurements of a defined number of rooms and source/receiver locations, and the audio controller 350 may predict an impulse response for a specific novel room and source/receiver arrangement based on that a priori knowledge. In yet another embodiment, the audio controller 350 dynamically adjusts some of the acoustic parameters by interpolating acoustic parameters associated with two rooms as a listener nears the connection between the rooms. A parametrized representation of a room impulse response, represented with a set of acoustic parameters, can therefore be adapted dynamically. The audio controller 350 may generate audio instructions for the transducer assembly 335 based at least in part on the dynamically adapted room impulse response.
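The two-room interpolation described above can be sketched as a distance-weighted blend; the use of linear interpolation and distance-based weights is a simplifying assumption, not a detail from the disclosure.

```python
def interpolate_room_parameters(params_a: dict, params_b: dict,
                                distance_a: float, distance_b: float) -> dict:
    """Blend the acoustic parameters of two connected rooms as the listener
    nears the opening between them, weighting each room by the listener's
    proximity to it (distances measured to each room's reference point)."""
    total = distance_a + distance_b
    weight_b = distance_a / total  # closer to room A -> less weight on room B
    return {
        name: (1.0 - weight_b) * value_a + weight_b * params_b.get(name, value_a)
        for name, value_a in params_a.items()
    }
```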

[0058] The audio controller 350 may reconstruct a room impulse response for a specific configuration of the local area by applying an extrapolation scheme on the set of acoustic parameters received from the mapping server 130. Acoustic parameters that represent a parametrized form of a room impulse response and are related to perceptually relevant room impulse response features may include some or all of: a reverberation time from the sound source to the headset 110 for each of a plurality of frequency bands, a reverberant level for each frequency band, a direct to reverberant ratio for each frequency band, a direction of a direct sound from the sound source to the headset 110 for each frequency band, an amplitude of the direct sound for each frequency band, a time of early reflection of a sound from the sound source to the headset, an amplitude of early reflection for each frequency band, a direction of early reflection, room mode frequencies, room mode locations, one or more other acoustic parameters, or combination thereof.

[0059] The audio controller 350 may perform a spatial extrapolation on the received set of acoustic parameters to obtain an adjusted set of acoustic parameters that represents a reconstructed room impulse response for a current configuration of the local area. When performing the spatial extrapolation, the audio controller 350 may adjust multiple acoustic parameters, such as: a direction of direct sound, an amplitude of direct sound relative to reverberation, a direct sound equalization according to source directivity, a timing of early reflection, an amplitude of early reflection, a direction of early reflection, etc. Note that the reverberation time may remain constant within a room, and may need to be adjusted at the intersection of rooms.

[0060] In one embodiment, to adjust early reflection timing/amplitude/direction, the audio controller 350 performs extrapolation based on a direction of arrival (DOA) per sample or reflection. In such case, the audio controller 350 may apply an offset to the entire DOA vector. Note that the DOA of early reflections may be determined by processing audio data obtained by the array of microphones mounted on the headset 110. The DOA of early reflections may be then adjusted based on, e.g., a user’s position in the room 102 and information about the room geometry.

[0061] In another embodiment, when the room geometry and the source/listener positions are known, the audio controller 350 may identify low order reflections based on an image source model (ISM). As the listener moves, the timing and direction of the identified reflections are modified by re-running the ISM. In such a case, an amplitude can be adjusted, whereas a coloration may not be manipulated. Note that an ISM represents a simulation model that determines a source position of early reflections independent of a listener's position. The early reflection directions can then be calculated by tracing from an image source to the listener. Storing and utilizing image sources for a given source yields early reflection directions for any listener position in the room 102.
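For a rectangular ("shoebox") room, the first-order image sources and the resulting reflection delays can be computed directly, as sketched below. The shoebox restriction and the function names are illustrative assumptions; a general ISM mirrors the source across arbitrary planar surfaces.

```python
def first_order_image_sources(source, room_dims):
    """Return the six first-order image sources of a point source in a shoebox
    room with opposite corners at the origin and at `room_dims`. Each image is
    the source mirrored across one wall; tracing from an image to any listener
    position yields the direction of the corresponding early reflection."""
    sx, sy, sz = source
    lx, ly, lz = room_dims
    return [
        (-sx, sy, sz), (2 * lx - sx, sy, sz),   # x = 0 and x = lx walls
        (sx, -sy, sz), (sx, 2 * ly - sy, sz),   # y = 0 and y = ly walls
        (sx, sy, -sz), (sx, sy, 2 * lz - sz),   # floor (z = 0) and ceiling
    ]

def reflection_delay(image, listener, speed_of_sound=343.0):
    """Arrival time of a reflection: the distance from the image source to the
    listener divided by the speed of sound (m/s)."""
    distance = sum((a - b) ** 2 for a, b in zip(image, listener)) ** 0.5
    return distance / speed_of_sound
```

Because the image positions depend only on the source and the room, they can be stored once and re-traced to the listener as the listener moves, matching the note above.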
