Patent: Determination of an acoustic filter for incorporating local effects of room modes
Publication Number: 20210044916
Publication Date: 2021-02-11
Applicant: Facebook
Abstract
Determination of an acoustic filter for incorporating local effects of room modes within a target area is presented herein. A model of the target area is determined based in part on a three-dimensional virtual representation of the target area. In some embodiments, the model is selected from a group of candidate models. Room modes of the target area are determined based on a shape and/or dimensions of the model. Room mode parameters are determined based on at least one of the room modes and the position of a user within the target area. The room mode parameters describe an acoustic filter that, as applied to audio content, simulates acoustic distortion at the position of the user and at frequencies associated with the at least one room mode. The acoustic filter is generated at a headset based on the room mode parameters and is used to present audio content.
Claims
1. A method comprising: determining one or more room mode parameters associated with a target area based on a position of a user within the target area and a model of the target area, wherein the one or more room mode parameters describe an acoustic filter that is used to present audio content to the user and the acoustic filter, as applied to audio content, simulates acoustic distortion at the position of the user.
2. The method of claim 1, wherein the model of the target area is determined based on a three-dimensional virtual representation of the target area.
3. The method of claim 2, wherein the three-dimensional virtual representation of the target area is generated by using depth information of at least a portion of the target area.
4. The method of claim 3, further comprising: receiving, from a headset, the depth information, the headset configured to present the audio content to the user.
5. The method of claim 1, further comprising: determining the model of the target area based in part on a three-dimensional virtual representation of the target area by: comparing the three-dimensional virtual representation with a plurality of candidate models; and identifying one of the plurality of candidate models that matches the three-dimensional virtual representation as the model of the target area.
6. The method of claim 1, further comprising: receiving image data of at least a portion of the target area; determining material composition of surfaces in the portion of the target area using the image data; determining an attenuation parameter for each surface based on the material composition of the surface; and updating the model with the attenuation parameter of each surface.
7. The method of claim 1, wherein determining the one or more room mode parameters associated with the target area comprises: determining the room modes based on a shape of the model of the target area.
8. The method of claim 1, wherein the acoustic distortion describes amplification as a function of frequency.
9. The method of claim 1, further comprising: transmitting parameters describing the acoustic filter to the headset for rendering the audio content at the headset.
10. The method of claim 8, wherein the target area is different from a physical environment of the user.
11. A system, comprising: a computer processor; and a non-transitory computer-readable storage medium storing executable computer program instructions, the computer program instructions comprising instructions that when executed cause the computer processor to perform steps, comprising: determining one or more room mode parameters associated with a target area based on a position of a user within the target area and a model of the target area, wherein the one or more room mode parameters describe an acoustic filter that is used to present audio content to the user and the acoustic filter, as applied to audio content, simulates acoustic distortion at the position of the user.
12. The system of claim 11, wherein the model of the target area is determined based on a three-dimensional virtual representation of the target area.
13. The system of claim 12, wherein the three-dimensional virtual representation of the target area is generated by using depth information of at least a portion of the target area.
14. The system of claim 13, wherein the steps further comprise: receiving, from a headset, the depth information, the headset configured to present the audio content to the user.
15. The system of claim 11, wherein the steps further comprise: determining the model of the target area based in part on a three-dimensional virtual representation of the target area by: comparing the three-dimensional virtual representation with a plurality of candidate models; and identifying one of the plurality of candidate models that matches the three-dimensional virtual representation as the model of the target area.
16. The system of claim 11, wherein the steps further comprise: receiving image data of at least a portion of the target area; determining material composition of surfaces in the portion of the target area using the image data; determining an attenuation parameter for each surface based on the material composition of the surface; and updating the model with the attenuation parameter of each surface.
17. The system of claim 11, wherein determining the one or more room mode parameters associated with the target area comprises: determining the room modes based on a shape of the model of the target area.
18. The system of claim 11, wherein the acoustic distortion describes amplification as a function of frequency.
19. The system of claim 11, wherein the steps further comprise: transmitting parameters describing the acoustic filter to the headset for rendering the audio content at the headset.
20. The system of claim 11, wherein the target area is different from a physical environment of the user.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. application Ser. No. 16/418,426, filed May 21, 2019, which is incorporated by reference in its entirety.
BACKGROUND
[0002] The present disclosure relates generally to presentation of audio, and specifically relates to determination of an acoustic filter for incorporating local effects of room modes.
[0003] A physical area (e.g., a room) may have one or more room modes. Room modes are caused by sound reflecting off of various room surfaces. A room mode can cause both antinodes (peaks) and nodes (dips) in a frequency response of the room. The nodes and antinodes of these standing waves result in the loudness of the resonant frequency being different at different locations of the room. Moreover, effects of room modes can be especially prominent in small rooms, such as bathrooms, offices, and small conference rooms. Conventional virtual reality systems fail to account for room modes that would be associated with a particular virtual reality environment. They generally rely on geometrical acoustics simulations, which are unreliable at low frequencies, or on artistic renderings unrelated to physical modelling of the environment. Accordingly, audio presented by conventional virtual reality systems can lack a sense of realism associated with virtual reality environments (e.g., small rooms).
SUMMARY
[0004] Embodiments of the present disclosure support a method, computer readable medium, and apparatus for determining an acoustic filter for incorporating local effects of room modes. In some embodiments, a model of a target area (e.g., a virtual area, a physical environment of the user, etc.) is determined based in part on a three-dimensional (3D) virtual representation of the target area. Room modes of the target area are determined using the model. One or more room mode parameters are determined based on at least one of the room modes and a position of a user within the target area. The one or more room mode parameters describe an acoustic filter. The acoustic filter can be generated based on the one or more room mode parameters. The acoustic filter simulates acoustic distortion at frequencies associated with the at least one room mode. Audio content is presented based in part on the acoustic filter. The audio content is presented such that it appears to originate from an object (e.g., a virtual object) in the target area.
BRIEF DESCRIPTION OF THE DRAWINGS
[0005] FIG. 1 illustrates local effects of room modes in a room, in accordance with one or more embodiments.
[0006] FIG. 2 illustrates axial modes, tangential modes, and oblique modes of a cube room, in accordance with one or more embodiments.
[0007] FIG. 3 is a block diagram of an audio system, in accordance with one or more embodiments.
[0008] FIG. 4 is a block diagram of an audio server, in accordance with one or more embodiments.
[0009] FIG. 5 is a flowchart illustrating a process for determining room mode parameters that describe an acoustic filter, in accordance with one or more embodiments.
[0010] FIG. 6 is a block diagram of an audio assembly, in accordance with one or more embodiments.
[0011] FIG. 7 is a flowchart illustrating a process of presenting audio content based in part on an acoustic filter, in accordance with one or more embodiments.
[0012] FIG. 8 is a block diagram of a system environment that includes a headset and an audio server, in accordance with one or more embodiments.
[0013] FIG. 9 is a perspective view of a headset including an audio assembly, in accordance with one or more embodiments.
[0014] The figures depict embodiments of the present disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles, or benefits touted, of the disclosure described herein.
DETAILED DESCRIPTION
[0015] Embodiments of the present disclosure may include or be implemented in conjunction with an artificial reality system. Artificial reality is a form of reality that has been adjusted in some manner before presentation to a user, which may include, e.g., a virtual reality (VR), an augmented reality (AR), a mixed reality (MR), a hybrid reality, or some combination and/or derivatives thereof. Artificial reality content may include completely generated content or generated content combined with captured (e.g., real-world) content. The artificial reality content may include video, audio, haptic feedback, or some combination thereof, and any of which may be presented in a single channel or in multiple channels (such as stereo video that produces a three-dimensional effect to the viewer). Additionally, in some embodiments, artificial reality may also be associated with applications, products, accessories, services, or some combination thereof, that are used to, e.g., create content in an artificial reality and/or are otherwise used in (e.g., perform activities in) an artificial reality. The artificial reality system that provides the artificial reality content may be implemented on various platforms, including a headset, a head-mounted display (HMD) connected to a host computer system, a standalone HMD, a near-eye display (NED), a mobile device or computing system, or any other hardware platform capable of providing artificial reality content to one or more viewers.
[0016] An audio system for determination of an acoustic filter to incorporate local effects of room modes is presented herein. Audio content presented by the audio system is filtered using the acoustic filter such that acoustic distortion (e.g., amplification as a function of frequency and position) that would be caused by room modes associated with a target area of the user becomes part of the presented audio content. Note that amplification as used herein may describe an increase or a decrease in signal strength. The target area can be a local area occupied by the user or a virtual area. A virtual area may be based on the local area, some other virtual area, or some combination thereof. For example, the local area may be a living room that is occupied by the user of the audio system, and a virtual area may be a virtual concert stadium or a virtual conference room.
[0017] The audio system includes an audio assembly communicatively coupled to an audio server. The audio assembly may be implemented on a headset worn by the user. The audio assembly may request (e.g., over a network) one or more room mode parameters from the audio server. The request may include, e.g., visual information (depth information, color information, etc.) of at least a part of the target area, location information of the user, location information of a virtual sound source, visual information of a local area occupied by the user, or some combination thereof.
[0018] The audio server determines one or more room mode parameters. The audio server identifies and/or generates a model of the target area using the information in the request. In some embodiments, the audio server develops a 3D virtual representation of at least a portion of the target area based on the visual information of the target area in the request. The audio server uses the 3D virtual representation to select the model from a plurality of candidate models. The audio server determines room modes of the target area by using the model. For example, the audio server determines the room modes based on a shape or dimensions of the model. The room modes may include one or more types of room modes. Types of room modes may include, e.g., axial modes, tangential modes, and oblique modes. For each type, the room modes may include a first order mode, higher order modes, or some combination thereof. The audio server determines the one or more room mode parameters (e.g., Q factor, gain, amplitude, modal frequencies, etc.) based on at least one of the room modes and the position of the user. The audio server may also use the location information of the virtual sound source to determine the room mode parameters. For example, the audio server uses the location information of the virtual sound source to determine whether a room mode is excited or not. The audio server may determine that the room mode is not excited based on the virtual sound source being located at a node of that mode.
[0019] The room mode parameters describe an acoustic filter that, as applied to the audio content, simulates acoustic distortion at a position of the user within the target area. The acoustic distortion may represent amplification at frequencies associated with the at least one room mode. The audio server transmits one or more of the room mode parameters to the headset.
[0020] The audio assembly generates an acoustic filter using the one or more room mode parameters from the audio server. The audio assembly presents audio content using the generated acoustic filter. In some embodiments, the audio assembly dynamically detects changes in the position of the user and/or changes of relative position between the user and virtual objects, and updates the acoustic filter based on the changes.
[0021] In some embodiments, the audio content is spatialized audio content. Spatialized audio content is audio content that is presented in a manner such that it appears to originate from one or more points in an environment surrounding the user (e.g., from a virtual object in the target area).
[0022] In some embodiments, the target area can be a local area of the user. For example, the target area is an office room where the user sits. As the target area is the actual office, the audio assembly generates an acoustic filter that causes the presented audio content to be spatialized in a manner consistent with how a real sound source would sound from a particular location in the office room.
[0023] In some other embodiments, the target area is a virtual area that is being presented to the user (e.g., via a headset). For instance, the target area may be a virtual conference room. As the target area is the virtual conference room, the audio assembly generates an acoustic filter that causes the presented audio content to be spatialized in a manner consistent with how a real sound source would sound from a particular location in the virtual conference room. For example, the user may be presented virtual content that makes it appear as if he/she is seated with a virtual audience watching a virtual speaker give a speech. The presented audio content, as modified by the acoustic filter, would make it sound to the user as if the speaker were talking in a conference room, even though the user is actually in the office room (which would have significantly different acoustic properties than a large conference room).
[0024] FIG. 1 illustrates local effects of room modes in a room 100, in accordance with one or more embodiments. A sound source 105 is located in the room 100 and emits a sound wave into the room 100. The sound wave excites fundamental resonances of the room 100, and room modes occur in the room 100. FIG. 1 shows a first order mode 110 at a first modal frequency of the room and a second order mode 120 at a second modal frequency that is twice the first modal frequency. Though not shown in FIG. 1, room modes of higher orders can exist in the room 100. The first order mode 110 and the second order mode 120 can both be axial modes.
[0025] The room modes depend on the shape, dimensions, and/or acoustic properties of the room 100. Room modes cause different amounts of acoustic distortion at different positions within the room 100. The acoustic distortion can be positive amplification (i.e., increase in amplitude) or negative amplification (i.e., attenuation) of the audio signal at the modal frequencies (and multiples of the modal frequencies).
[0026] The first order mode 110 and second order mode 120 have peaks and dips at different positions of the room 100, which cause different levels of amplification of the sound wave as a function of frequency and position within the room 100. FIG. 1 shows three different positions 130, 140, and 150 within the room 100. At the position 130, the first order mode 110 and the second order mode 120 each have a peak. Moving to the position 140, both the first order mode 110 and the second order mode 120 decrease, and the second order mode 120 has a dip. Moving further to the position 150, there is a null at the first order mode 110 and a peak at the second order mode 120. Combining the effects of the first order mode 110 and second order mode 120, the amplification of the audio signal is the highest at the position 130 and lowest at the position 150. Accordingly, sound perceived by a user can vary dramatically based on what room they are in and where they are in the room. As described below, a system simulates room modes for a target area occupied by a user and presents audio content that accounts for the room modes, providing an added level of realism to the user.
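To make the position dependence concrete, the pressure magnitude of a one-dimensional axial standing wave between rigid parallel walls varies as |cos(nπx/L)|, which is enough to reproduce the peak and null pattern described above. Below is a minimal sketch in Python; the room length and the sampled positions are illustrative assumptions, not values from the disclosure.

```python
import numpy as np

def axial_mode_amplitude(x: float, length: float, order: int) -> float:
    # Pressure magnitude of an axial standing wave at position x along a
    # room dimension of the given length (rigid walls assumed, so the
    # pressure antinodes sit at the walls).
    return abs(np.cos(order * np.pi * x / length))

L = 5.0  # hypothetical room length in meters
for x in (0.0, L / 4, L / 2):  # loosely analogous to positions 130, 140, 150
    a1 = axial_mode_amplitude(x, L, order=1)  # first order mode 110
    a2 = axial_mode_amplitude(x, L, order=2)  # second order mode 120
    print(f"x = {x:.2f} m: first order {a1:.2f}, second order {a2:.2f}")
```

At x = 0 both modes peak, at x = L/4 the second order mode dips to zero, and at x = L/2 the first order mode has a null while the second order mode peaks, mirroring positions 130, 140, and 150.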
[0027] FIG. 2 illustrates axial modes 210, tangential modes 220, and oblique modes 230 of a cube room, in accordance with one or more embodiments. Room modes are caused by sound reflecting off of various room surfaces. The room in FIG. 2 has the shape of a cube and includes six surfaces: four walls, a ceiling, and a floor. There are three types of modes in the room: the axial modes 210, tangential modes 220, and oblique modes 230, which are represented by dashed lines in FIG. 2. An axial mode 210 involves resonance between two parallel surfaces of the room. Three axial modes 210 occur in the room: one involves the ceiling and the floor, and the other two each involve a pair of parallel walls. For rooms of other shapes, different numbers of axial modes 210 may occur. A tangential mode 220 involves two pairs of parallel surfaces: all four walls, or two walls together with the ceiling and the floor. An oblique mode 230 involves all six surfaces of the room.
[0028] The axial room modes 210 are the strongest of the three types of modes. The tangential room modes 220 can be half as strong as the axial room modes 210, and the oblique room modes 230 can be one quarter as strong as the axial room modes 210. In some embodiments, an acoustic filter that, as applied to audio content, simulates acoustic distortion in the room is determined based on the axial room modes 210. In some other embodiments, the tangential room modes 220 and/or oblique room modes 230 are also used to determine the acoustic filter. Each of the axial room modes 210, tangential room modes 220, and oblique room modes 230 can occur at a series of modal frequencies. The modal frequencies of the three types of room modes can be different.
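For a rigid rectangular room, the modal frequencies and the three mode types follow from the standard mode equation f = (c/2)·sqrt((nx/Lx)² + (ny/Ly)² + (nz/Lz)²): a mode with one nonzero index is axial, with two is tangential, and with three is oblique. The sketch below illustrates this under that textbook assumption; the room dimensions are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at room temperature

def mode_frequency(nx, ny, nz, dims):
    # Modal frequency of mode (nx, ny, nz) for a rigid rectangular room.
    lx, ly, lz = dims
    return (SPEED_OF_SOUND / 2.0) * np.sqrt(
        (nx / lx) ** 2 + (ny / ly) ** 2 + (nz / lz) ** 2)

def mode_type(nx, ny, nz):
    # Axial: one nonzero index; tangential: two; oblique: three.
    return {1: "axial", 2: "tangential", 3: "oblique"}[
        sum(n > 0 for n in (nx, ny, nz))]

dims = (5.0, 4.0, 3.0)  # hypothetical room dimensions in meters
for n in [(1, 0, 0), (1, 1, 0), (1, 1, 1)]:
    print(n, mode_type(*n), f"{mode_frequency(*n, dims):.1f} Hz")
```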
[0029] FIG. 3 is a block diagram of an audio system 300, in accordance with one or more embodiments. The audio system 300 includes a headset 310 connected to an audio server 320 via a network 330. The headset 310 can be worn by a user 340 in a room 350.
[0030] The network 330 connects the headset 310 to the audio server 320. The network 330 may include any combination of local area and/or wide area networks using both wireless and/or wired communication systems. For example, the network 330 may include the Internet, as well as mobile telephone networks. In one embodiment, the network 330 uses standard communications technologies and/or protocols. Hence, the network 330 may include links using technologies such as Ethernet, 802.11, worldwide interoperability for microwave access (WiMAX), 2G/3G/4G mobile communications protocols, digital subscriber line (DSL), asynchronous transfer mode (ATM), InfiniBand, PCI Express Advanced Switching, etc. Similarly, the networking protocols used on the network 330 can include multiprotocol label switching (MPLS), the transmission control protocol/Internet protocol (TCP/IP), the User Datagram Protocol (UDP), the hypertext transport protocol (HTTP), the simple mail transfer protocol (SMTP), the file transfer protocol (FTP), etc. The data exchanged over the network 330 can be represented using technologies and/or formats including image data in binary form (e.g., Portable Network Graphics (PNG)), hypertext markup language (HTML), extensible markup language (XML), etc. In addition, all or some of the links can be encrypted using conventional encryption technologies such as secure sockets layer (SSL), transport layer security (TLS), virtual private networks (VPNs), Internet Protocol security (IPsec), etc. The network 330 may also connect multiple headsets located in the same or different rooms to the same audio server 320.
[0031] The headset 310 presents media content to a user. In one embodiment, the headset 310 may be, e.g., an NED or an HMD. In general, the headset 310 may be worn on the face of a user such that media content is presented using one or both lenses of the headset 310. However, the headset 310 may also be used such that media content is presented to a user in a different manner. Examples of media content presented by the headset 310 include one or more images, video content, audio content, or some combination thereof. The headset 310 includes an audio assembly, and may also include at least one depth camera assembly (DCA) and/or at least one passive camera assembly (PCA). As described in detail below with regard to FIG. 8, a DCA generates depth image data that describes the 3D geometry for some or all of the target area (e.g., the room 350), and a PCA generates color image data for some or all of the target area. In some embodiments, the DCA and the PCA of the headset 310 are part of simultaneous localization and mapping (SLAM) sensors mounted on the headset 310 for determining visual information of the room 350. Thus, the depth image data captured by the at least one DCA and/or the color image data captured by the at least one PCA can be referred to as visual information determined by the SLAM sensors of the headset 310. Furthermore, the headset 310 may include position sensors or an inertial measurement unit (IMU) that tracks the position (e.g., location and pose) of the headset 310 within the target area. The headset 310 may also include a Global Positioning System (GPS) receiver to further track the location of the headset 310 within the target area. The position (including orientation) of the headset 310 within the target area is referred to as location information of the headset 310. The location information of the headset may indicate a position of the user 340 of the headset 310.
[0032] The audio assembly presents audio content to the user 340. The audio content can be presented in a manner such that it appears to originate from an object (real or virtual) in the target area, also known as spatialized audio content. The target area can be a physical environment of the user, such as the room 350, or a virtual area. For example, the audio content presented by the audio assembly may appear to originate from a virtual speaker in a virtual conference room (both of which are being presented to the user 340 via the headset 310). In some embodiments, local effects of room modes associated with a position of the user 340 within a target area are incorporated into the audio content. The local effects of the room modes are represented by acoustic distortion (of specific frequencies) that occurs at a position of the user 340 within the target area. The acoustic distortion may change as the position of the user within the target area changes. In some embodiments, the target area is the room 350. In some other embodiments, the target area is a virtual area. The virtual area may be based on a real room that is different from the room 350. For instance, the room 350 is an office, and the target area is a virtual area based on a conference room. The audio content presented by the audio assembly can be a speech from a speaker located in the conference room. A position within the conference room corresponds to the user's position within the target area. The audio content is rendered so that it appears to originate from the speaker in the conference room and to be received at that position within the conference room.
[0033] The audio assembly uses acoustic filters to incorporate the local effects of room modes. The audio assembly requests an acoustic filter by sending a room mode query to the audio server 320. A room mode query is a request for one or more room mode parameters, based on which the audio assembly can generate an acoustic filter that, as applied to the audio content, simulates acoustic distortion (e.g., amplification as a function of frequency and position) that would be caused by the room modes. The room mode query may include visual information describing some or all of the target area (e.g., the room 350 or a virtual area), location information of the user, information of the audio content, or some combination thereof. Visual information describes a 3D geometry of some or all of the target area and may also include color image data of some or all of the target area. In some embodiments, the visual information of the target area can be captured by the headset 310 (e.g., in embodiments where the target area is the room 350) and/or a different device. Location information of the user indicates a position of the user 340 within the target area and may include location information of the headset 310 or information describing a position of the user 340. Information of the audio content includes, e.g., information describing a location of a virtual sound source of the audio content. The virtual sound source of the audio content can be a real object in the target area and/or a virtual object. The headset 310 may communicate the room mode query via the network 330 to the audio server 320.
[0034] In some embodiments, the headset 310 obtains one or more room mode parameters describing an acoustic filter from the audio server 320. Room mode parameters are parameters that describe an acoustic filter that, as applied to audio content, simulates acoustic distortion caused by one or more room modes in a target area. The room mode parameters include Q factor, gain, amplitude, modal frequencies of the room modes, some other feature that describes an acoustic filter, or some combination thereof. The headset 310 uses the room mode parameters to generate filters to render the audio content. For example, the headset 310 generates infinite impulse response filters and/or all-pass filters. The infinite impulse response filters and/or all-pass filters include a Q value and gain corresponding to each modal frequency. Additional details regarding operations and components of the headset 310 are discussed below in connection with FIG. 4, FIG. 8, and FIG. 9.
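The disclosure does not fix a particular filter topology, but one plausible realization maps each (modal frequency, Q, gain) triple onto a peaking-EQ biquad section and cascades the sections. The sketch below uses the widely known RBJ audio-EQ-cookbook coefficients; the sample rate and mode parameters are hypothetical, not values from the disclosure.

```python
import numpy as np
from scipy.signal import lfilter

def peaking_biquad(f0, q, gain_db, fs):
    # Peaking-EQ biquad (RBJ audio EQ cookbook): boosts or cuts by gain_db
    # around f0, with bandwidth set by q.
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1 + alpha * a_lin, -2 * np.cos(w0), 1 - alpha * a_lin])
    a = np.array([1 + alpha / a_lin, -2 * np.cos(w0), 1 - alpha / a_lin])
    return b / a[0], a / a[0]

fs = 48000.0
# Hypothetical room mode parameters received from the audio server:
# (modal frequency in Hz, Q factor, gain in dB at the user's position).
modes = [(34.3, 18.0, 6.0), (68.6, 15.0, -4.5)]

audio = np.random.randn(48000)  # stand-in for one second of audio content
for f0, q, gain_db in modes:
    b, a = peaking_biquad(f0, q, gain_db, fs)
    audio = lfilter(b, a, audio)  # cascade one section per room mode
```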
[0035] The audio server 320 determines one or more room mode parameters based on the room mode query received from the headset 310. The audio server 320 determines a model of the target area. In some embodiments, the audio server 320 determines the model based on the visual information of the target area. For example, the audio server 320 obtains a 3D virtual representation of at least a portion of the target area based on the visual information. The audio server 320 compares the 3D virtual representation with a group of candidate models and identifies a candidate model that matches the 3D virtual representation as the model. In some embodiments, a candidate model is a model of a room that includes a shape of the room, one or more dimensions of the room, or material acoustic parameters (e.g., an attenuation parameter) of surfaces within the room. The group of candidate models can include models of rooms having different shapes, different dimensions, and different surfaces. The 3D virtual representation of the target area includes a 3D mesh of the target area that defines a shape and/or dimensions of the target area. The 3D virtual representation may use one or more material acoustic parameters (e.g., an attenuation parameter) to describe acoustic properties of surfaces within the target area. The audio server 320 determines that a candidate model matches the 3D virtual representation based on a determination that a difference between the candidate model and the 3D virtual representation is below a threshold. The difference may include differences in shape, dimensions, acoustic properties of surfaces, etc. In some embodiments, the audio server 320 uses a fit metric to determine the difference between the candidate model and the 3D virtual representation. The fit metric can be based on one or more geometric features, such as squared errors in Hausdorff distance, openness (e.g., indoors vs. outdoors), volume, etc. The threshold may be based on perceptual just noticeable differences (JNDs) in room mode changes. For example, if the user can detect a 10% change in modal frequency, geometric deviations that would result in a modal frequency change of up to 10% are tolerated, and the threshold can be set to the geometric deviation that would result in a modal frequency change of 10%.
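As a rough illustration of such matching, assume each candidate model is reduced to its box dimensions and the fit threshold is a 10% JND: since axial modal frequencies scale inversely with a room dimension, a 10% dimension error corresponds to roughly a 10% modal frequency error. The candidate set, names, and values below are hypothetical.

```python
import numpy as np

# Hypothetical candidate models, reduced here to box dimensions in meters.
CANDIDATE_MODELS = {
    "shoebox_office": (5.0, 4.0, 3.0),
    "small_conference_room": (8.0, 6.0, 3.0),
    "bathroom": (3.0, 2.5, 2.4),
}

def match_model(measured_dims, jnd=0.10):
    # Pick the candidate whose dimensions deviate least from the measured
    # 3D virtual representation; accept it only if the worst relative
    # deviation stays below the JND-based threshold.
    best_name, best_err = None, np.inf
    for name, dims in CANDIDATE_MODELS.items():
        err = max(abs(d - m) / m for d, m in zip(dims, measured_dims))
        if err < best_err:
            best_name, best_err = name, err
    return best_name if best_err < jnd else None

print(match_model((5.2, 3.9, 3.0)))  # -> shoebox_office (within 10%)
```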
[0036] The audio server 320 determines room modes of the target area using the model. For example, the audio server 320 uses conventional techniques, such as numerical simulation techniques (e.g., finite element method, boundary element method, finite difference time domain method, etc.), to determine the room modes. In some embodiments, the audio server 320 determines the room modes based on the shape, dimensions, and/or material acoustic parameters of the model. The room modes may include one or more of axial modes, tangential modes, and oblique modes. In some embodiments, the audio server 320 determines the room modes based on the position of the user. For example, the audio server 320 identifies the target area based on the position of the user and retrieves the room modes of the target area based on the identification.
[0037] The audio server 320 determines the one or more room mode parameters based on at least one of the room modes and the position of a user within the target area. The room mode parameters describe an acoustic filter that, as applied to the audio content, simulates acoustic distortion that occurs at the position of the user within the target area for frequencies associated with the at least one room mode. The audio server 320 transmits the room mode parameters to the headset 310 for rendering audio content. In some embodiments, the audio server 320 may generate the acoustic filter based on the room mode parameters and transmit the acoustic filter to the headset 310.
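The sketch below shows how such parameters could be derived for a single mode of a rigid rectangular model: the modal frequency from the mode equation, Q from the modal bandwidth relation Q ≈ f·RT60/2.2, and a relative gain from the mode shape evaluated at the source and user positions (zero when the source sits at a node, consistent with the excitation check described above). This is an illustration under textbook assumptions, not the server's actual algorithm; all values are hypothetical.

```python
import numpy as np

def room_mode_parameters(mode, dims, user_pos, source_pos, rt60=0.5):
    # One (frequency, Q, gain) triple for a mode of a rigid rectangular
    # room. Q follows the modal bandwidth relation Q = f * RT60 / 2.2;
    # the gain follows the mode shape at the user's position, scaled by
    # how strongly the source excites the mode (zero at a node).
    c = 343.0
    f = (c / 2.0) * np.sqrt(sum((n / l) ** 2 for n, l in zip(mode, dims)))
    def shape(pos):
        return np.prod([np.cos(n * np.pi * p / l)
                        for n, p, l in zip(mode, pos, dims)])
    q = f * rt60 / 2.2
    gain = shape(source_pos) * shape(user_pos)  # relative linear amplification
    return f, q, gain

# Hypothetical first-order axial mode, room dimensions, and positions.
f, q, g = room_mode_parameters((1, 0, 0), (5.0, 4.0, 3.0),
                               user_pos=(0.5, 2.0, 1.5),
                               source_pos=(4.5, 2.0, 1.5))
print(f"{f:.1f} Hz, Q = {q:.1f}, relative gain = {g:.2f}")
```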
[0038] FIG. 4 is a block diagram of an audio server 400, in accordance with one or more embodiments. An embodiment of the audio server 400 is the audio server 320. The audio server 400 determines one or more room mode parameters of a target area in response to a room mode query from an audio assembly. The audio server 400 includes a database 410, a mapping module 420, a matching module 430, a room mode module 440, and an acoustic filter module 450. In other embodiments, the audio server 400 can have any combination of the modules listed with any additional modules. One or more processors of the audio server 400 (not shown) may run some or all of the modules within the audio server 400.
[0039] The database 410 stores data for the audio server 400. The stored data may include a virtual model, candidate models, room modes, room mode parameters, acoustic filters, audio data, visual information (depth information, color information, etc.), room mode queries, other information that may be used by the audio server 400, or some combination thereof.
[0040] The virtual model describes one or more areas and acoustic properties (e.g., room modes) of those areas. Each location in the virtual model is associated with acoustic properties (e.g., room modes) for a corresponding area. The areas whose acoustic properties are described in the virtual model include virtual areas, physical areas, or some combination thereof. A physical area is a real area (e.g., an actual physical room), as opposed to a virtual area. Examples of the physical areas include a conference room, a bathroom, a hallway, an office, a bedroom, a dining room, an outdoor space (e.g., patio, garden, park, etc.), a living room, an auditorium, some other real area, or some combination thereof. A virtual area describes a space that may be entirely fictional and/or based on a real physical area (e.g., rendering a physical room as a virtual area). For example, a virtual area could be a fictionalized dungeon, a rendering of a virtual conference room, etc. Note that the virtual area can be based on real places. For example, the virtual conference room could be based on a real conference center. A particular location in the virtual model may correspond to a current physical location of the headset 310 within the room 350. Acoustic properties of the room 350 can be retrieved from the virtual model based on a location within the virtual model obtained from the mapping module 420.
[0041] A room mode query is a request for room mode parameters that describe an acoustic filter used for incorporating effects of room modes of a target area for a position of a user within the target area. The room mode query includes target area information, user information, audio content information, some other information that the audio server 400 can use to determine the acoustic filter, or some combination thereof. Target area information is information that describes the target area (e.g., its geometry, objects within it, materials, colors, etc.). It may include depth image data of the target area, color image data of the target area, or some combination thereof. User information is information that describes the user. It may include information describing a position of the user within the target area, information of a physical area where the user is physically located, or some combination thereof. Audio content information is information that describes the audio content. It may include location information of a virtual sound source of the audio content, location information of a physical sound source of the audio content, or some combination thereof.
[0042] The candidate models can be models of rooms having different shapes and/or dimensions. The audio server 400 uses the candidate models to determine a model of the target area.
[0043] The mapping module 420 maps information in the room mode query to a location within the virtual model. The mapping module 420 determines the location within the virtual model corresponding to the target area. In some embodiments, the mapping module 420 searches the virtual model to identify a mapping between (i) the information of the target area and/or information of the position of the user and (ii) a corresponding configuration of an area within the virtual model. The area within the virtual model may describe a physical area and/or virtual area. In one embodiment, the mapping is performed by matching a geometry of visual information of the target area with a geometry associated with a location within the virtual model. In another embodiment, the mapping is performed by matching information of the position of the user with a location within the virtual model. For example, in embodiments where the target area is a virtual area, the mapping module 420 identifies a location associated with the virtual area in the virtual model based on information indicating the position of the user. A match suggests that the location within the virtual model is a representation of the target area.
[0044] If a match is found, the mapping module 420 retrieves the room modes that are associated with the location within the virtual model and sends the room modes to the acoustic filter module 450 for determining room mode parameters. In some embodiments, the virtual model does not include room modes associated with the location within the virtual model that matches the target area but includes a candidate model associated with the location. The mapping module 420 may retrieve the candidate model and send it to the room mode module 440 to determine room modes of the target area. In some embodiments, the virtual model includes neither room modes nor candidate models associated with the location within the virtual model that matches the target area. The mapping module 420 may then retrieve a 3D representation of the location and send it to the matching module 430 to determine a model of the target area.
[0045] If no match is found, this is an indication that a configuration of the target area is not yet described by the virtual model. In such case, the mapping module 420 may develop a 3D virtual representation of the target area based on the visual information in the room mode query and update the virtual model with the 3D virtual representation. The 3D virtual representation of the target area may include a 3D mesh of the target area. The 3D mesh includes points and/or lines that represent boundaries of the target area. The 3D virtual representation may also include virtual representation of surfaces within the target area, such as walls, ceiling, floor, surfaces of furniture, surfaces of appliances, surfaces of other types of objects, and so on. In some embodiments, the virtual model uses one or more material acoustic parameters (e.g., attenuation parameter) to describe acoustic properties of the surfaces within the virtual area. In some embodiments, the mapping module 420 may develop a new model that includes the 3D virtual representation and uses one or more material acoustic parameters to describe acoustic properties of the surfaces within the virtual area. The new model can be saved in the database 410.
[0046] The mapping module 420 may also inform at least one of the matching module 430 and the room mode module 440 that no match is found, so that the matching module 430 can determine a model of the target area and the room mode module 440 can determine room modes of the target area by using the model.
[0047] In some embodiments, the mapping module 420 may also determine a location within the virtual model corresponding to a local area where the user is physically located (e.g., the room 350).
[0048] The target area may be different from the local area. For example, the local area is an office room where the user sits, but the target area is a virtual area (e.g., a virtual conference room).
[0049] If a match is found, the mapping module 420 retrieves the room modes that are associated with the location within the virtual model corresponding to the target area and sends the room modes to the acoustic filter module 450 for determining room mode parameters. If no match is found, the mapping module 420 may develop a 3D virtual representation of the target area based on the visual information in the room mode query and update the virtual model with the 3D virtual representation of the target area. The mapping module 420 may also inform at least one of the matching module 430 and the room mode module 440 that no match is found, so that the matching module 430 can determine a model of the target area and the room mode module 440 can determine room modes of the target area by using the model.
[0050] The matching module 430 determines a model of the target area based on the 3D virtual representation of the target area. In some embodiments, the matching module 430 selects the model from a plurality of candidate models. A candidate model can be a model of a room that includes information about the shape, dimensions, or surfaces within the room. The group of candidate models can include models of rooms having different shapes (e.g., square, round, triangular, etc.), different dimensions (e.g., shoebox, big conference room, etc.), and different surfaces. The matching module 430 compares the 3D virtual representation of the target area with each candidate model and determines whether the candidate model matches the 3D virtual representation. The matching module 430 determines that a candidate model matches the 3D virtual representation based on a determination that a difference between the candidate model and the 3D virtual representation is below a threshold. The difference may include differences in shape, dimensions, acoustic properties of surfaces, etc. In some embodiments, the matching module 430 may determine that the 3D virtual representation matches multiple candidate models. The matching module 430 then selects the candidate model with the best match, i.e., the candidate model having the least difference from the 3D virtual representation.
[0051] In some embodiments, the matching module 430 compares the shape of a candidate model with the shape of the 3D mesh included in the 3D virtual representation. For example, the matching module 430 traces rays in a number of directions from a center of the 3D mesh of the target area and determines the points where the rays intersect the 3D mesh. The matching module 430 identifies a candidate model that matches these points. The matching module 430 may shrink or expand the candidate model to exclude any differences in size between the candidate model and the target area from the comparison.
[0052] The room mode module 440 determines room modes of the target area using the model of the target area. The room modes may include at least one of three types of room mode: axial modes, tangential modes, and oblique modes. In some embodiments, for each type of room mode, the room mode module 440 determines a first order mode and may also determine modes of higher orders. The room mode module 440 determines the room modes based on the shape and/or dimensions of the model. For example, in embodiments where the model has a rectangular homogeneous shape, the room mode module 440 determines axial, tangential, and oblique modes of the model. In some embodiments, the room mode module 440 uses the dimensions of the model to calculate room modes that fall within a range from a lower frequency in an audible or reproducible frequency range (e.g., 63 Hz) to a Schroeder frequency of the target area. The Schroeder frequency of the target area can be a frequency at which room modes are too densely overlapped in frequency to be individually distinguishable. The room mode module 440 may determine the Schroeder frequency based on a volume of the target area and a reverberation time (e.g., RT60) of the target area. The room mode module 440 may use, e.g., numerical simulation techniques (such as the finite element method, boundary element method, finite difference time domain method, etc.) to determine the room modes.
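A short sketch of that frequency range follows, assuming the common Schroeder frequency estimate f_S ≈ 2000·sqrt(RT60/V) (RT60 in seconds, V in cubic meters) and a box-shaped model; the dimensions and reverberation time are hypothetical.

```python
import numpy as np

def schroeder_frequency(rt60, volume):
    # Common estimate of the frequency above which room modes overlap too
    # densely to be individually distinguishable (RT60 in s, volume in m^3).
    return 2000.0 * np.sqrt(rt60 / volume)

c = 343.0
dims = (5.0, 4.0, 3.0)  # hypothetical model dimensions in meters
f_s = schroeder_frequency(0.5, np.prod(dims))  # ~183 Hz for RT60 = 0.5 s

# Enumerate all modes whose frequencies fall within [63 Hz, f_s].
modes_in_band = []
for nx in range(5):
    for ny in range(5):
        for nz in range(5):
            if (nx, ny, nz) == (0, 0, 0):
                continue
            f = (c / 2.0) * np.sqrt((nx / dims[0]) ** 2
                                    + (ny / dims[1]) ** 2
                                    + (nz / dims[2]) ** 2)
            if 63.0 <= f <= f_s:
                modes_in_band.append(((nx, ny, nz), round(f, 1)))
print(modes_in_band)
```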
[0053] In some embodiments, the room mode module 440 uses material acoustic parameters (such as an attenuation parameter) of surfaces within the 3D virtual representation of the target area to determine the room modes. For example, the room mode module 440 determines the material composition of the surfaces using the color image data of the target area. The room mode module 440 determines an attenuation parameter for each surface based on the material composition of the surface and updates the model with the material compositions and attenuation parameters.
[0054] In one embodiment, the room mode module 440 uses machine learning techniques to determine the material composition of the surfaces. The room mode module 440 can input image data of the target area (or a part of the image data that is related to a surface) and/or audio data into a machine learning model, and the machine learning model outputs the material composition of each surface. The machine learning model can be trained with different machine learning techniques, such as linear support vector machine (linear SVM), boosting for other algorithms (e.g., AdaBoost), neural networks, logistic regression, naive Bayes, memory-based learning, random forests, bagged trees, decision trees, boosted trees, or boosted stumps. As part of the training of the machine learning model, a training set is formed. The training set includes image data and/or audio data of a group of surfaces and the material composition of the surfaces in the group.
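A toy sketch of that classification step follows, using a random forest over made-up per-surface image features and a hypothetical material-to-attenuation table; none of the features, labels, or parameter values come from the disclosure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Made-up training set: per-surface image features (mean R, G, B and a
# texture statistic) paired with material labels.
features = np.array([
    [0.82, 0.80, 0.78, 0.02],   # painted drywall
    [0.55, 0.35, 0.20, 0.15],   # wood panelling
    [0.40, 0.42, 0.45, 0.30],   # carpet
    [0.75, 0.77, 0.80, 0.05],   # glass
])
labels = ["drywall", "wood", "carpet", "glass"]

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(features, labels)

# Hypothetical mapping from predicted material to an attenuation parameter
# used to update the model of the target area.
ATTENUATION = {"drywall": 0.10, "wood": 0.15, "carpet": 0.45, "glass": 0.05}

surface = np.array([[0.41, 0.43, 0.44, 0.28]])  # features of one surface
material = clf.predict(surface)[0]
print(material, ATTENUATION[material])
```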
……
……
……