Facebook Patent | Determination Of Acoustic Parameters For A Headset Using A Mapping Server

Patent: Determination Of Acoustic Parameters For A Headset Using A Mapping Server

Publication Number: 10674307

Publication Date: 20200602

Applicants: Facebook

Abstract

Determination of a set of acoustic parameters for a headset is presented herein. The set of acoustic parameters can be determined based on a virtual model of physical locations stored at a mapping server. The virtual model describes a plurality of spaces and acoustic properties of those spaces, wherein a location in the virtual model corresponds to a physical location of the headset. A location in the virtual model for the headset is determined based on information, received from the headset, describing at least a portion of the local area. The set of acoustic parameters associated with the physical location of the headset is determined based in part on the determined location in the virtual model and any acoustic parameters associated with the determined location. The headset presents audio content using the set of acoustic parameters received from the mapping server.

BACKGROUND

The present disclosure relates generally to presentation of audio at a headset, and specifically relates to determination of acoustic parameters for a headset using a mapping server.

A sound perceived at the ears of two users can be different, depending on the direction and location of a sound source with respect to each user as well as on the surroundings of the room in which the sound is perceived. Humans can determine the location of a sound source by comparing the sound perceived at each ear. In an artificial reality environment, simulating sound propagation from an object to a listener may use knowledge about the acoustic parameters of the room, for example a reverberation time or the direction of incidence of the strongest early reflections. One technique for determining the acoustic parameters of a room includes placing a loudspeaker at a desired source location, playing a controlled test signal, and de-convolving the test signal from what is recorded at a listener location. However, such a technique generally requires a measurement laboratory or dedicated equipment in-situ.
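
The de-convolution step can be illustrated with a minimal sketch, assuming the test signal and the recording are available as mono sample arrays at the same sample rate; a regularized frequency-domain division is one common way to recover the impulse response (the function and its regularization constant are illustrative, not from the patent):

```python
import numpy as np

def estimate_impulse_response(recorded, test_signal, eps=1e-8):
    """Estimate a room impulse response by de-convolving a known test
    signal from a recording captured at the listener position."""
    n = len(recorded) + len(test_signal) - 1
    rec_f = np.fft.rfft(recorded, n)
    test_f = np.fft.rfft(test_signal, n)
    # Regularized spectral division avoids blow-up at frequencies where
    # the test signal carries little energy.
    ir_f = rec_f * np.conj(test_f) / (np.abs(test_f) ** 2 + eps)
    return np.fft.irfft(ir_f, n)
```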

To seamlessly place a virtual sound source in an environment, sound signals at each ear are determined based on sound propagation paths from the source, through the environment, to a listener (receiver). Various sound propagation paths can be represented by a set of frequency dependent acoustic parameters used at a headset for presenting audio content to the receiver (user of the headset). A set of frequency dependent acoustic parameters is typically unique to a specific acoustic configuration of a local environment (room), since each configuration has its own acoustic properties. However, storing and updating sets of acoustic parameters at the headset for all possible acoustic configurations of the local environment is impractical. The various sound propagation paths within a room between a source and a receiver together form a room impulse response, which depends on the specific locations of the source and the receiver. It is, however, memory intensive to store measured or simulated room impulse responses for a dense network of all possible source and receiver locations in a space, or even for a relatively small subset of the most common arrangements. Moreover, determining a room impulse response in real time becomes increasingly computationally intensive as the required accuracy increases.
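
As context for why a full room impulse response is central here (and costly to store per source/receiver pair), a minimal sketch of rendering a dry source signal through one response is shown below; the binaural case repeats this per ear with a different response:

```python
import numpy as np

def render_with_impulse_response(dry_signal, impulse_response):
    """Render a dry (anechoic) source signal through a room impulse
    response; the output is what a receiver at the corresponding
    location would hear for that specific source/receiver arrangement."""
    return np.convolve(dry_signal, impulse_response)
```

A one-second response sampled at 48 kHz is already 48,000 samples per source/receiver pair, which is why a dense grid of stored responses quickly becomes impractical.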

SUMMARY

Embodiments of the present disclosure support a method, computer readable medium, and apparatus for determining a set of acoustic parameters for presenting audio content at a headset. In some embodiments, the set of acoustic parameters is determined based on a virtual model of physical locations stored at a mapping server connected with the headset via a network. The virtual model describes a plurality of spaces and acoustic properties of those spaces, wherein a location in the virtual model corresponds to a physical location of the headset. The mapping server determines a location in the virtual model for the headset based on information describing at least a portion of the local area received from the headset. The mapping server determines a set of acoustic parameters associated with the physical location of the headset based in part on the determined location in the virtual model and any acoustic parameters associated with the determined location. The headset presents audio content to a listener using the set of acoustic parameters received from the mapping server.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a system environment for a headset, in accordance with one or more embodiments.

FIG. 2 illustrates effects of surfaces in a room on the propagation of sound between a sound source and a user of a headset, in accordance with one or more embodiments.

FIG. 3A is a block diagram of a mapping server, in accordance with one or more embodiments.

FIG. 3B is a block diagram of an audio system of a headset, in accordance with one or more embodiments.

FIG. 3C is an example of a virtual model describing physical spaces and acoustic properties of the physical spaces, in accordance with one or more embodiments.

FIG. 4 is a perspective view of a headset including an audio system, in accordance with one or more embodiments.

FIG. 5A is a flowchart illustrating a process for determining acoustic parameters for a physical location of a headset, in accordance with one or more embodiments.

FIG. 5B is a flowchart illustrating a process for obtaining acoustic parameters from a mapping server, in accordance with one or more embodiments.

FIG. 5C is a flowchart illustrating a process for reconstructing a room impulse response at a headset, in accordance with one or more embodiments.

FIG. 6 is a block diagram of a system environment that includes a headset and a mapping server, in accordance with one or more embodiments.

The figures depict embodiments of the present disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles, or benefits touted, of the disclosure described herein.

DETAILED DESCRIPTION

Embodiments of the present disclosure may include or be implemented in conjunction with an artificial reality system. Artificial reality is a form of reality that has been adjusted in some manner before presentation to a user, which may include, e.g., a virtual reality (VR), an augmented reality (AR), a mixed reality (MR), a hybrid reality, or some combination and/or derivatives thereof. Artificial reality content may include completely generated content or generated content combined with captured (e.g., real-world) content. The artificial reality content may include video, audio, haptic feedback, or some combination thereof, any of which may be presented in a single channel or in multiple channels (such as stereo video that produces a three-dimensional effect for the viewer). Additionally, in some embodiments, artificial reality may also be associated with applications, products, accessories, services, or some combination thereof, that are used to, e.g., create content in an artificial reality and/or are otherwise used in (e.g., perform activities in) an artificial reality. The artificial reality system that provides the artificial reality content may be implemented on various platforms, including a headset, a head-mounted display (HMD) connected to a host computer system, a standalone HMD, a near-eye display (NED), a mobile device or computing system, or any other hardware platform capable of providing artificial reality content to one or more viewers.

A communication system for room acoustic matching is presented herein. The communication system includes a headset with an audio system communicatively coupled to a mapping server. The audio system is implemented on the headset, which may include speakers, an array of acoustic sensors, a plurality of imaging sensors (cameras), and an audio controller. The imaging sensors determine visual information relating to at least a portion of the local area (e.g., depth information, color information, etc.). The headset communicates (e.g., via a network) the visual information to the mapping server. The mapping server maintains a virtual model of the world that includes acoustic properties for spaces within the real world. The mapping server determines a location in the virtual model that corresponds to the physical location of the headset using the visual information from the headset, e.g., images of at least the portion of the local area. The mapping server determines a set of acoustic parameters (e.g., a reverberation time, a reverberation level, etc.) associated with the determined location and provides the acoustic parameters to the headset. The headset uses (e.g., via the audio controller) the set of acoustic parameters to present audio content to a user of the headset. The array of acoustic sensors mounted on the headset monitors sound in the local area. The headset may selectively provide some or all of the monitored sound as an audio stream to the mapping server, responsive to determining that a change in room configuration has occurred (e.g., a change of human occupancy level, windows are opened after being closed, curtains are opened after being closed, etc.). The mapping server may update the virtual model by re-computing acoustic parameters based on the audio stream received from the headset.
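
A minimal sketch of this server-side flow is given below, assuming simple placeholder types; the localization step, space identifiers, and field names are illustrative assumptions rather than details taken from the patent:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class AcousticParameters:
    # One value per frequency band (the band layout is an assumption).
    reverberation_time_s: List[float]
    reverberant_level_db: List[float]
    direct_to_reverberant_db: List[float]

@dataclass
class VirtualModel:
    # space id -> configuration id -> acoustic parameters for that
    # configuration of the space.
    spaces: Dict[str, Dict[str, AcousticParameters]] = field(default_factory=dict)

def localize(visual_information: bytes) -> Tuple[str, str]:
    # Placeholder for the server's vision-based localization, which would
    # match the headset's depth/color imagery against the virtual model.
    return ("example_room", "default")

def handle_headset_query(model: VirtualModel,
                         visual_information: bytes) -> Optional[AcousticParameters]:
    """Locate the headset within the virtual model from the visual
    information it sent, then look up the acoustic parameters stored for
    that location and configuration (None if the space is unknown)."""
    space_id, config_id = localize(visual_information)
    return model.spaces.get(space_id, {}).get(config_id)
```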

In some embodiments, the headset obtains information about a set of acoustic parameters that parameterize an impulse response for the local area where the headset is located. The headset may obtain the set of acoustic parameters from the mapping server. Alternatively, the set of acoustic parameters is stored at the headset. The headset may reconstruct an impulse response for a specific spatial arrangement of the headset and a sound source (e.g., a virtual object) by extrapolating the set of acoustic parameters. The reconstructed impulse response may be represented by an adjusted set of acoustic parameters, wherein one or more acoustic parameters from the adjusted set are obtained by dynamically adjusting one or more corresponding acoustic parameters from the original set. The headset presents (e.g., via the audio controller) audio content using the reconstructed impulse response, i.e., the adjusted set of acoustic parameters.
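
The kind of parametric reconstruction described here can be sketched as follows; this minimal, single-band version builds a response from a direct-sound delay, a reverberation time, and a reverberant level, whereas a real system would work per frequency band and add discrete early reflections (the function and its defaults are assumptions for illustration):

```python
import numpy as np

def reconstruct_impulse_response(rt60_s, reverb_level_db, direct_delay_s,
                                 direct_gain=1.0, fs=48000, length_s=1.0,
                                 seed=0):
    """Build a single-band impulse response from a few acoustic
    parameters: a direct-sound impulse followed by an exponentially
    decaying noise tail whose decay reaches -60 dB after rt60_s."""
    n = int(length_s * fs)
    d = int(direct_delay_s * fs)
    ir = np.zeros(n)
    ir[d] = direct_gain                                 # direct sound
    rng = np.random.default_rng(seed)
    t = np.arange(n - d - 1) / fs
    decay = 10.0 ** (-3.0 * t / rt60_s)                 # -60 dB at t = rt60_s
    tail = rng.standard_normal(n - d - 1) * decay
    ir[d + 1:] = tail * 10.0 ** (reverb_level_db / 20.0)  # reverberant level
    return ir
```

For example, `reconstruct_impulse_response(0.5, -20.0, 0.005)` yields a response with 5 ms of direct-path delay and a half-second reverberation tail.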

The headset may be, e.g., a NED, HMD, or some other type of headset. The headset may be part of an artificial reality system. The headset further includes a display and an optical assembly. The display of the headset is configured to emit image light. The optical assembly of the headset is configured to direct the image light to an eye box of the headset corresponding to a location of a wearer’s eye. In some embodiments, the image light may include depth information for a local area surrounding the headset.

FIG. 1 is a block diagram of a system 100 for a headset 110, in accordance with one or more embodiments. The system 100 includes the headset 110 that can be worn by a user 106 in a room 102. The headset 110 is connected to a mapping server 130 via a network 120.

The network 120 connects the headset 110 to the mapping server 130. The network 120 may include any combination of local area and/or wide area networks using both wireless and/or wired communication systems. For example, the network 120 may include the Internet, as well as mobile telephone networks. In one embodiment, the network 120 uses standard communications technologies and/or protocols. Hence, the network 120 may include links using technologies such as Ethernet, 802.11, worldwide interoperability for microwave access (WiMAX), 2G/3G/4G mobile communications protocols, digital subscriber line (DSL), asynchronous transfer mode (ATM), InfiniBand, PCI Express Advanced Switching, etc. Similarly, the networking protocols used on the network 120 can include multiprotocol label switching (MPLS), the transmission control protocol/Internet protocol (TCP/IP), the User Datagram Protocol (UDP), the hypertext transport protocol (HTTP), the simple mail transfer protocol (SMTP), the file transfer protocol (FTP), etc. The data exchanged over the network 120 can be represented using technologies and/or formats including image data in binary form (e.g., Portable Network Graphics (PNG)), hypertext markup language (HTML), extensible markup language (XML), etc. In addition, all or some of the links can be encrypted using conventional encryption technologies such as secure sockets layer (SSL), transport layer security (TLS), virtual private networks (VPNs), Internet Protocol security (IPsec), etc. The network 120 may also connect multiple headsets located in the same or different rooms to the same mapping server 130.
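
As one concrete, entirely hypothetical illustration of such a link, the headset could POST its visual information to the mapping server as JSON over HTTPS; the endpoint path and payload fields below are assumptions, not part of the disclosure:

```python
import json
import urllib.request

def send_visual_information(server_url: str, depth_frame_png: bytes,
                            gps_location: tuple) -> dict:
    """Sketch of one possible transport: send the headset's visual
    information to the mapping server over HTTPS (TLS)."""
    payload = {
        "gps": gps_location,                       # optional location hint
        "depth_png_hex": depth_frame_png.hex(),    # PNG-encoded depth image
    }
    req = urllib.request.Request(
        url=server_url + "/acoustic-parameters",   # hypothetical endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```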

The headset 110 presents media to a user. In one embodiment, the headset 110 may be a NED. In another embodiment, the headset 110 may be an HMD. In general, the headset 110 may be worn on the face of a user such that content (e.g., media content) is presented using one or both lenses of the headset. However, the headset 110 may also be used such that media content is presented to a user in a different manner. Examples of media content presented by the headset 110 include one or more images, video, audio, or some combination thereof.

The headset 110 may determine visual information describing at least a portion of the room 102, and provide the visual information to the mapping server 130. For example, the headset 110 may include at least one depth camera assembly (DCA) that generates depth image data for at least the portion of the room 102. The headset 110 may further include at least one passive camera assembly (PCA) that generates color image data for at least the portion of the room 102. In some embodiments, the DCA and the PCA of the headset 110 are part of simultaneous localization and mapping (SLAM) sensors mounted on the headset 110 for determining visual information of the room 102. Thus, the depth image data captured by the at least one DCA and/or the color image data captured by the at least one PCA can be referred to as visual information determined by the SLAM sensors of the headset 110.
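
A minimal sketch of how this bundle of visual information might be organized as a data structure is given below; the field names and shapes are assumptions based on the DCA/PCA/SLAM description rather than details from the patent:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

import numpy as np

@dataclass
class VisualInformation:
    """Visual information the headset might send to the mapping server."""
    depth_image: np.ndarray    # DCA depth image, e.g. metres per pixel
    color_image: np.ndarray    # PCA color (RGB) image
    headset_pose: Tuple[float, float, float, float, float, float]  # x, y, z, yaw, pitch, roll
    gps_location: Optional[Tuple[float, float]] = None             # optional room-level location
```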

The headset 110 may communicate the visual information via the network 120 to the mapping server 130 for determining a set of acoustic parameters for the room 102. In another embodiment, the headset 110 provides its location information (e.g., the Global Positioning System (GPS) location of the room 102) to the mapping server 130 in addition to the visual information for determining the set of acoustic parameters. Alternatively, the headset 110 provides only the location information to the mapping server 130 for determining the set of acoustic parameters. A set of acoustic parameters can be used to represent various acoustic properties of a particular configuration in the room 102 that together define an acoustic condition in the room 102. The configuration in the room 102 is thus associated with a unique acoustic condition in the room 102. A configuration in the room 102 and the associated acoustic condition may change based on at least one of, e.g., a change in location of the headset 110 in the room 102, a change in location of a sound source in the room 102, a change of human occupancy level in the room 102, a change of one or more acoustic materials of surfaces in the room 102, opening/closing windows in the room 102, opening/closing curtains, or opening/closing a door in the room 102.

The set of acoustic parameters may include some or all of: a reverberation time from the sound source to the headset 110 for each of a plurality of frequency bands, a reverberant level for each frequency band, a direct to reverberant ratio for each frequency band, a direction of a direct sound from the sound source to the headset 110 for each frequency band, an amplitude of the direct sound for each frequency band, a time of early reflection of a sound from the sound source to the headset, an amplitude of early reflection for each frequency band, a direction of early reflection, room mode frequencies, room mode locations, etc. In some embodiments, the frequency dependence of some of the aforementioned acoustic parameters can be clustered into four frequency bands. In some other embodiments, some of the acoustic parameters can be clustered into more or fewer than four frequency bands. The headset 110 presents audio content to the user 106 using the set of acoustic parameters obtained from the mapping server 130. The audio content is presented so that it appears to originate from an object (i.e., a real object or a virtual object) within the room 102.
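
The clustering into a small number of frequency bands can be sketched as a simple within-band average over a finely sampled frequency-dependent parameter; the band edges below are illustrative assumptions, as the patent does not specify them:

```python
import numpy as np

def cluster_into_bands(freqs_hz, values, band_edges_hz=(250.0, 1000.0, 4000.0)):
    """Average a finely sampled frequency-dependent parameter (e.g. RT60
    per third-octave band) into a small number of coarse bands. The
    default edges yield four bands: low, low-mid, high-mid, and high."""
    freqs_hz = np.asarray(freqs_hz)
    values = np.asarray(values)
    edges = np.concatenate(([0.0], band_edges_hz, [np.inf]))
    banded = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs_hz >= lo) & (freqs_hz < hi)
        banded.append(float(values[mask].mean()) if mask.any() else float("nan"))
    return banded
```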

The headset 110 may further include an array of acoustic sensors for monitoring sound in the room 102. The headset 110 may generate an audio stream based on the monitored sound. The headset 110 may selectively provide the audio stream to the mapping server 130 (e.g., via the network 120) for updating one or more acoustic parameters for the room 102 at the mapping server 130, responsive to a determination that a change in the configuration of the room 102 has occurred and has changed the acoustic condition in the room 102. The headset 110 then presents audio content to the user 106 using an updated set of acoustic parameters obtained from the mapping server 130.
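
One plausible trigger for this selective upload, sketched under the assumption that the headset can estimate a reverberation time from the monitored sound, is a simple tolerance test against the stored parameter; the threshold and the trigger itself are assumptions for illustration:

```python
def should_upload_audio_stream(measured_rt60_s: float,
                               stored_rt60_s: float,
                               tolerance: float = 0.15) -> bool:
    """Return True if the reverberation time estimated from monitored
    sound deviates from the stored parameter by more than a relative
    tolerance, suggesting the room configuration has changed and the
    audio stream should be sent to the mapping server for re-computation."""
    return abs(measured_rt60_s - stored_rt60_s) > tolerance * stored_rt60_s
```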

In some embodiments, the headset 110 obtains a set of acoustic parameters parametrizing an impulse response for the room 102, either from the mapping server 130 or from a non-transitory computer readable storage device (i.e., a memory) at the headset 110. The headset 110 may selectively extrapolate the set of acoustic parameters into an adjusted set of acoustic parameters representing a reconstructed room impulse response for a specific configuration of the room 102 that differs from a configuration associated with the obtained set of acoustic parameters. The headset 110 presents audio content to the user of the headset 110 using the reconstructed room impulse response. Furthermore, the headset 110 may include position sensors or an inertial measurement unit (IMU) that tracks the position (e.g., location and pose) of the headset 110 within the room. Additional details regarding operations and components of the headset 110 are discussed below in connection with FIG. 3B, FIG. 4, FIGS. 5B-5C and FIG. 6.
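
One simple example of such an extrapolation, sketched with textbook approximations rather than the patent's specific method, adjusts the direct-sound amplitude and the direct-to-reverberant ratio when the source-to-listener distance changes, assuming the diffuse reverberant level stays roughly constant:

```python
import math

def adjust_for_new_distance(direct_amplitude, direct_to_reverberant_db,
                            old_distance_m, new_distance_m):
    """Adjust two acoustic parameters for a new source-to-listener
    distance: the direct sound falls off roughly as 1/distance while the
    reverberant level stays roughly constant, so the direct amplitude and
    the direct-to-reverberant ratio both change with the distance ratio."""
    gain = old_distance_m / new_distance_m
    new_amplitude = direct_amplitude * gain
    new_d2r_db = direct_to_reverberant_db + 20.0 * math.log10(gain)
    return new_amplitude, new_d2r_db
```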
