Facebook Patent | Determination Of An Acoustic Filter For Incorporating Local Effects Of Room Modes

Patent: Determination Of An Acoustic Filter For Incorporating Local Effects Of Room Modes

Publication Number: 20200374648

Publication Date: 20201126

Applicants: Facebook

Abstract

Determination of an acoustic filter for incorporating local effects of room modes within a target area is presented herein. A model of the target area is determined based in part on a three-dimensional virtual representation of the target area. In some embodiments, the model is selected from a group of candidate models. Room modes of the target area are determined based on a shape and/or dimensions of the model. Room mode parameters are determined based on at least one of the room modes and the position of a user within the target area. The room mode parameters describe an acoustic filter that, as applied to audio content, simulates acoustic distortion at the position of the user and at frequencies associated with the at least one room mode. The acoustic filter is generated at a headset based on the room mode parameters and is used to present audio content.

BACKGROUND

[0001] The present disclosure relates generally to presentation of audio, and specifically relates to determination of an acoustic filter for incorporating local effects of room modes.

[0002] A physical area (e.g., a room) may have one or more room modes. Room modes are caused by sound reflecting off of various room surfaces. A room mode can cause both antinodes (peaks) and nodes (dips) in a frequency response of the room. The nodes and antinodes of these standing waves result in the loudness of the resonant frequency being different at different locations of the room. Moreover, effects of room modes can be especially prominent in small rooms, such as bathrooms, offices, and small conference rooms. Conventional virtual reality systems fail to account for room modes that would be associated with a particular virtual reality environment. They generally rely on geometrical acoustics simulations that are unreliable at low frequencies, or on artistic renders unrelated to physical modelling of the environment. Accordingly, audio presented by conventional virtual reality systems can lack a sense of realism associated with virtual reality environments (e.g., small rooms).

SUMMARY

[0003] Embodiments of the present disclosure support a method, computer readable medium, and apparatus for determining an acoustic filter for incorporating local effects of room modes. In some embodiments, a model of a target area (e.g., a virtual area, a physical environment of the user, etc.) is determined based in part on a three-dimensional (3D) virtual representation of the target area. Room modes of the target area are determined using the model. One or more room mode parameters are determined based on at least one of the room modes and a position of a user within the target area. The one or more room mode parameters describe an acoustic filter. The acoustic filter can be generated based on the one or more room mode parameters. The acoustic filter simulates acoustic distortion at frequencies associated with the at least one room mode. Audio content is presented based in part on the acoustic filter. The audio content is presented such that it appears to originate from an object (e.g., a virtual object) in the target area.

BRIEF DESCRIPTION OF THE DRAWINGS

[0004] FIG. 1 illustrates local effects of room modes in a room, in accordance with one or more embodiments.

[0005] FIG. 2 illustrates axial modes, tangential modes, and oblique modes of a cube room, in accordance with one or more embodiments.

[0006] FIG. 3 is a block diagram of an audio system, in accordance with one or more embodiments.

[0007] FIG. 4 is a block diagram of an audio server, in accordance with one or more embodiments.

[0008] FIG. 5 is a flowchart illustrating a process for determining room mode parameters that describe an acoustic filter, in accordance with one or more embodiments.

[0009] FIG. 6 is a block diagram of an audio assembly, in accordance with one or more embodiments.

[0010] FIG. 7 is a flowchart illustrating a process of presenting audio content based in part on an acoustic filter, in accordance with one or more embodiments.

[0011] FIG. 8 is a block diagram of a system environment that includes a headset and an audio server, in accordance with one or more embodiments.

[0012] FIG. 9 is a perspective view of a headset including an audio assembly, in accordance with one or more embodiments.

[0013] The figures depict embodiments of the present disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles, or benefits touted, of the disclosure described herein.

DETAILED DESCRIPTION

[0014] Embodiments of the present disclosure may include or be implemented in conjunction with an artificial reality system. Artificial reality is a form of reality that has been adjusted in some manner before presentation to a user, which may include, e.g., a virtual reality (VR), an augmented reality (AR), a mixed reality (MR), a hybrid reality, or some combination and/or derivatives thereof. Artificial reality content may include completely generated content or generated content combined with captured (e.g., real-world) content. The artificial reality content may include video, audio, haptic feedback, or some combination thereof, and any of which may be presented in a single channel or in multiple channels (such as stereo video that produces a three-dimensional effect to the viewer). Additionally, in some embodiments, artificial reality may also be associated with applications, products, accessories, services, or some combination thereof, that are used to, e.g., create content in an artificial reality and/or are otherwise used in (e.g., perform activities in) an artificial reality. The artificial reality system that provides the artificial reality content may be implemented on various platforms, including a headset, a head-mounted display (HMD) connected to a host computer system, a standalone HMD, a near-eye display (NED), a mobile device or computing system, or any other hardware platform capable of providing artificial reality content to one or more viewers.

[0015] An audio system for determination of an acoustic filter to incorporate local effects of room modes is presented herein. Audio content presented by the audio assembly is filtered using the acoustic filter such that acoustic distortion (e.g., amplification as a function of frequency and position) that would be caused by room modes associated with a target area of the user may be part of the presented audio content. Note that amplification as used herein may be used to describe an increase or a decrease in signal strength. The target area can be a local area occupied by the user or a virtual area. A virtual area may be based on the local area, some other virtual area, or some combination thereof. For example, the local area may be a living room that is occupied by the user of the audio system, and a virtual area may be a virtual concert stadium or a virtual conference room.

[0016] The audio system includes an audio assembly communicatively coupled to an audio server. The audio assembly may be implemented on a headset worn by the user. The audio assembly may request (e.g., over a network) one or more room mode parameters from the audio server. The request may include, e.g., visual information (depth information, color information, etc.) of at least a part of the target area, location information of the user, location information of a virtual sound source, visual information of a local area occupied by the user, or some combination thereof.

[0017] The audio server determines one or more room mode parameters. The audio server identifies and/or generates a model of the target area using the information in the request. In some embodiments, the audio server develops a 3D virtual representation of at least a portion of the target area based on the visual information of the target area in the request. The audio server uses the 3D virtual representation to select the model from a plurality of candidate models. The audio server determines room modes of the target area by using the model. For example, the audio server determines the room modes based on a shape or dimensions of the model. The room modes may include one or more types of room modes. Types of room modes may include, e.g., axial modes, tangential modes, and oblique modes. For each type, the room modes may include a first order mode, higher order modes, or some combination thereof. The audio server determines the one or more room mode parameters (e.g., Q factor, gain, amplitude, modal frequencies, etc.) based on at least one of the room modes and the position of the user. The audio server may also use the location information of the virtual sound source to determine the room mode parameters. For example, the audio server uses the location information of the virtual sound source to determine whether a room mode is excited or not. The audio server may determine that the room mode is not excited based on a determination that the virtual sound source is located at an antinode position.

[0018] The room mode parameters describe an acoustic filter that, as applied to the audio content, simulates acoustic distortion at a position of the user within the target area. The acoustic distortion may represent amplification at frequencies associated with the at least one room mode. The audio server transmits one or more of the room mode parameters to the headset.

[0019] The audio assembly generates an acoustic filter using the one or more room mode parameters from the audio server. The audio assembly presents audio content using the generated acoustic filter. In some embodiments, the audio assembly dynamically detects changes in the position of the user and/or changes of relative position between the user and virtual objects, and updates the acoustic filter based on the changes.

[0020] In some embodiments, the audio content is spatialized audio content. Spatialized audio content is audio content that is presented in a manner such that it appears to originate from one or more points in an environment surrounding the user (e.g., from a virtual object in the target area).

[0021] In some embodiments, the target area can be a local area of the user. For example, the target area is an office room where the user sits. As the target area is the actual office, the audio assembly generates an acoustic filter that causes the presented audio content to be spatialized in a manner consistent with how a real sound source would sound from a particular location in the office room.

[0022] In some other embodiments, the target area is a virtual area that is being presented to the user (e.g., via a headset). For instance, the target area may be a virtual conference room. As the target area is the virtual conference room, the audio assembly generates an acoustic filter that causes the presented audio content to be spatialized in a manner consistent with how a real sound source would sound from a particular location in the virtual conference room. For example, the user may be presented virtual content that makes it appear as if he/she is seated with a virtual audience watching a virtual speaker give a speech. The presented audio content as modified by the acoustic filter would make it sound to the user as if the speaker was talking in a conference room, and this is despite the user actually being in the office room (which would have significantly different acoustic properties than a large conference room).

[0023] FIG. 1 illustrates local effects of room modes in a room 100, in accordance with one or more embodiments. A sound source 105 is located in the room 100 and emits a sound wave into the room 100. The sound wave causes fundamental resonances of the room 100, and room modes occur in the room 100. FIG. 1 shows a first order mode 110 at a first modal frequency of the room and a second order mode 120 at a second modal frequency that is twice the first modal frequency. Even though not shown in FIG. 1, room modes of higher orders can exist in the room 100. The first order mode 110 and second order mode 120 can both be axial modes.

[0024] The room modes depend on the shape, dimensions, and/or acoustic properties of the room 100. Room modes cause different amounts of acoustic distortion at different positions within the room 100. The acoustic distortion can be positive amplification (i.e., increase in amplitude) or negative amplification (i.e., attenuation) of the audio signal at the modal frequencies (and multiples of the modal frequencies).

[0025] The first order mode 110 and second order mode 120 have peaks and dips at different positions of the room 100, which cause different levels of amplification of the sound wave as a function of frequency and position within the room 100. FIG. 1 shows three different positions 130, 140, and 150 within the room 100. At the position 130, the first order mode 110 and the second order mode 120 each have a peak. Moving to the position 140, both the first order mode 110 and the second order mode 120 decrease and the second order mode 120 has a dip. Moving further to the position 150, there is a null at the first order mode 110 and a peak at the second order mode 120. Combining the effects of the first order mode 110 and second order mode 120, the amplification of the audio signal is the highest at the position 130 and lowest at the position 150. Accordingly, sound perceived by a user can vary dramatically based on what room they are in and where they are in the room. As described below, a system simulates room modes for a target area occupied by a user and presents audio content to the user that takes the room modes into account, providing an added level of realism.
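The position dependence described for FIG. 1 can be illustrated with a minimal sketch. It assumes an idealized room with rigid parallel walls, in which the pressure of the n-th axial mode varies as cos(n·π·x/L) along a room of length L. That textbook model, the room length, and the sample positions are illustrative assumptions and are not taken from the disclosure.

```python
import math


def axial_mode_amplitude(order: int, x: float, length: float) -> float:
    """Relative pressure magnitude of an axial mode at distance x from a wall.

    Assumes rigid parallel walls, so the n-th axial mode varies as
    cos(n * pi * x / L). This is an idealization, not the disclosed method.
    """
    return abs(math.cos(order * math.pi * x / length))


ROOM_LENGTH_M = 4.0  # hypothetical room length

# Sample positions: at a wall, a quarter of the way in, and at the room center.
for x in (0.0, 1.0, 2.0):
    first = axial_mode_amplitude(1, x, ROOM_LENGTH_M)
    second = axial_mode_amplitude(2, x, ROOM_LENGTH_M)
    print(f"x = {x:.1f} m: first order {first:.2f}, second order {second:.2f}")
```

Under these assumptions, both modes peak at the wall, the second order mode has a dip a quarter of the way into the room, and the first order mode has a null at the room center where the second order mode peaks again.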

[0026] FIG. 2 illustrates axial modes 210, tangential modes 220, and oblique modes 230 of a cube room, in accordance with one or more embodiments. Room modes are caused by sound reflecting off of various room surfaces. The room in FIG. 2 has a shape of a cube and includes six surfaces: four walls, a ceiling, and a floor. There are three types of modes in the room: the axial modes 210, tangential modes 220, and oblique modes 230, which are represented by dashed lines in FIG. 2. An axial mode 210 involves resonance between two parallel surfaces of the room. Three axial modes 210 occur in the room: one involves the ceiling and the floor, and the other two each involve a pair of parallel walls. For rooms of other shapes, different numbers of axial modes 210 may occur. A tangential mode 220 involves two sets of parallel surfaces: either all four walls, or two walls together with the ceiling and the floor. An oblique room mode 230 involves all six surfaces of the room.

[0027] The axial room modes 210 are the strongest out of the three types of modes. The tangential room modes 220 can be half as strong as the axial room modes 210, and the oblique room modes 230 can be one quarter as strong as the axial room modes 210. In some embodiments, an acoustic filter that, as applied to audio content, simulates acoustic distortion in the room is determined based on the axial room modes 210. In some other embodiments, the tangential room modes 220 and/or oblique room modes 230 are also used to determine the acoustic filter. Each of the axial room modes 210, tangential room modes 220, and oblique room modes 230 can occur at a series of modal frequencies. The modal frequencies of the three types of room modes can be different.
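As a rough illustration of how modal frequencies and mode types relate to room dimensions, the sketch below uses the standard textbook formula for a rigid-walled rectangular room, f = (c/2)·sqrt((nx/Lx)² + (ny/Ly)² + (nz/Lz)²). The formula, the speed of sound value, the 3 m cube, and all function names are assumptions made for illustration; the disclosure does not specify this calculation.

```python
import math
from itertools import product

SPEED_OF_SOUND_M_S = 343.0  # approximate value for air at room temperature


def mode_frequency(indices, dims, c=SPEED_OF_SOUND_M_S):
    """Modal frequency in Hz for mode indices (nx, ny, nz) in a box of size dims (m)."""
    return (c / 2.0) * math.sqrt(sum((n / d) ** 2 for n, d in zip(indices, dims)))


def mode_type(indices):
    """Classify a mode by how many of its indices are non-zero."""
    nonzero = sum(1 for n in indices if n > 0)
    return {1: "axial", 2: "tangential", 3: "oblique"}[nonzero]


def room_modes(dims, max_order=2):
    """List (frequency, type, indices) for all modes up to max_order, sorted by frequency."""
    modes = []
    for indices in product(range(max_order + 1), repeat=3):
        if any(indices):  # skip (0, 0, 0), which is not a mode
            modes.append((mode_frequency(indices, dims), mode_type(indices), indices))
    return sorted(modes)


# Hypothetical cube room with 3 m sides, first order modes only.
for freq, kind, indices in room_modes((3.0, 3.0, 3.0), max_order=1):
    print(f"{indices} {kind:10s} {freq:6.1f} Hz")
```

In this idealized model the three mode types fall at different frequencies even in a cube, which is consistent with the statement above that the modal frequencies of the three types can differ.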

[0028] FIG. 3 is a block diagram of an audio system 300, in accordance with one or more embodiments. The audio system 300 includes a headset 310 that is connected to an audio server 320 via a network 330. The headset 310 can be worn by a user 340 in a room 350.

[0029] The network 330 connects the headset 310 to the audio server 320. The network 330 may include any combination of local area and/or wide area networks using both wireless and/or wired communication systems. For example, the network 330 may include the Internet, as well as mobile telephone networks. In one embodiment, the network 330 uses standard communications technologies and/or protocols. Hence, the network 330 may include links using technologies such as Ethernet, 802.11, worldwide interoperability for microwave access (WiMAX), 2G/3G/4G mobile communications protocols, digital subscriber line (DSL), asynchronous transfer mode (ATM), InfiniBand, PCI Express Advanced Switching, etc. Similarly, the networking protocols used on the network 330 can include multiprotocol label switching (MPLS), the transmission control protocol/Internet protocol (TCP/IP), the User Datagram Protocol (UDP), the hypertext transfer protocol (HTTP), the simple mail transfer protocol (SMTP), the file transfer protocol (FTP), etc. The data exchanged over the network 330 can be represented using technologies and/or formats including image data in binary form (e.g., Portable Network Graphics (PNG)), hypertext markup language (HTML), extensible markup language (XML), etc. In addition, all or some of the links can be encrypted using conventional encryption technologies such as secure sockets layer (SSL), transport layer security (TLS), virtual private networks (VPNs), Internet Protocol security (IPsec), etc. The network 330 may also connect multiple headsets located in the same or different rooms to the same audio server 320.

[0030] The headset 310 presents media content to a user. In one embodiment, the headset 310 may be, e.g., a NED or an HMD. In general, the headset 310 may be worn on the face of a user such that media content is presented using one or both lenses of the headset 310. However, the headset 310 may also be used such that media content is presented to a user in a different manner. Examples of media content presented by the headset 310 include one or more images, video content, audio content, or some combination thereof. The headset 310 includes an audio assembly, and may also include at least one depth camera assembly (DCA) and/or at least one passive camera assembly (PCA). As described in detail below with regard to FIG. 8, a DCA generates depth image data that describes the 3D geometry for some or all of the target area (e.g., the room 350), and a PCA generates color image data for some or all of the target area. In some embodiments, the DCA and the PCA of the headset 310 are part of simultaneous localization and mapping (SLAM) sensors mounted on the headset 310 for determining visual information of the room 350. Thus, the depth image data captured by the at least one DCA and/or the color image data captured by the at least one PCA can be referred to as visual information determined by the SLAM sensors of the headset 310. Furthermore, the headset 310 may include position sensors or an inertial measurement unit (IMU) that tracks the position (e.g., location and pose) of the headset 310 within the target area. The headset 310 may also include a Global Positioning System (GPS) receiver to further track the location of the headset 310 within the target area. The position (including orientation) of the headset 310 within the target area is referred to as location information of the headset 310. The location information of the headset may indicate a position of the user 340 of the headset 310.

[0031] The audio assembly presents audio content to the user 340. The audio content can be presented in a manner such that it appears to originate from an object (real or virtual) in the target area, also known as spatialized audio content. The target area can be a physical environment of the user, such as the room 350, or a virtual area. For example, the audio content presented by the audio assembly may appear to originate from a virtual speaker in a virtual conference room (which are being presented to the user 340 via the headset 310). In some embodiments, local effects of room modes associated with a position of the user 340 within a target area are incorporated into the audio content. The local effects of the room modes are represented by acoustic distortion (of specific frequencies) that occurs at a position of the user 340 within the target area. The acoustic distortion may change as the position of the user in the target area changes. In some embodiments, the target area is the room 350. In some other embodiments, the target area is a virtual area. The virtual area may be based on a real room that is different from the room 350. For instance, the room 350 is an office. The target area is a virtual area based on a conference room. The audio content presented by the audio assembly can be a speech from a speaker located in the conference room. A position within the conference room corresponds to the user’s position within the target area. The audio content is rendered so that it appears to originate from the speaker in the conference room and to be received at the position within the conference room.

[0032] The audio assembly uses acoustic filters to incorporate the local effects of room modes. The audio assembly requests an acoustic filter by sending a room mode query to the audio server 320. A room mode query is a request for one or more room mode parameters, based on which the audio assembly can generate an acoustic filter that as applied to the audio content simulates acoustic distortion (e.g., amplification as a function of frequency and position) that would be caused by the room modes. The room mode query may include visual information describing some or all of the target area (e.g., the room 350 or a virtual area), location information of the user, information of the audio content, or some combination thereof. Visual information describes a 3D geometry of some or all of the target area and may also include color image data of some or all of the target area. In some embodiments, the visual information of the target area can be captured by the headset 310 (e.g., in embodiments where the target area is the room 350) and/or a different device. Location information of the user indicates a position of the user 340 within the target area and may include location information of the headset 310 or information describing a position of the user 340. Information of the audio content includes, e.g., information describing a location of a virtual sound source of the audio content. The virtual sound source of the audio content can be a real object in the target area and/or a virtual object. The headset 310 may communicate the room mode query via the network 330 to the audio server 320.

[0033] In some embodiments, the headset 310 obtains one or more room mode parameters describing an acoustic filter from the audio server 320. Room mode parameters are parameters that describe an acoustic filter that, as applied to audio content, simulates acoustic distortion caused by one or more room modes in a target area. The room mode parameters include Q factor, gain, amplitude, modal frequencies of the room modes, some other feature that describes an acoustic filter, or some combination thereof. The headset 310 uses the room mode parameters to generate filters to render the audio content. For example, the headset 310 generates infinite impulse response filters and/or all-pass filters. The infinite impulse response filters and/or all-pass filters include a Q value and gain corresponding to each modal frequency. Additional details regarding operations and components of the headset 310 are discussed below in connection with FIG. 4, FIG. 8, and FIG. 9.
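One plausible way to turn a modal frequency, gain, and Q value into an infinite impulse response filter is a second-order peaking filter. The sketch below uses the widely published peaking-EQ biquad formulas from the Audio EQ Cookbook; this filter design, the sample rate, and the example parameter values are assumptions chosen for illustration, and the disclosure does not state which filter structure the headset uses.

```python
import cmath
import math


def peaking_biquad(freq_hz, gain_db, q, sample_rate_hz):
    """Return normalized (b, a) coefficients of a peaking biquad (Audio EQ Cookbook)."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * freq_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin]
    a = [1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def apply_biquad(b, a, samples):
    """Filter a list of samples with a direct form I biquad (a[0] must be 1)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def gain_db_at(b, a, freq_hz, sample_rate_hz):
    """Magnitude response of the biquad in dB at a single frequency."""
    z = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate_hz)
    h = (b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z)
    return 20.0 * math.log10(abs(h))


SAMPLE_RATE_HZ = 48000  # hypothetical rendering sample rate
# Hypothetical room mode parameters: (modal frequency in Hz, gain in dB, Q).
room_mode_parameters = [(57.2, 6.0, 8.0), (114.3, -9.0, 8.0)]

audio = [1.0] + [0.0] * 255  # unit impulse standing in for audio content
for freq_hz, gain_db, q in room_mode_parameters:
    b, a = peaking_biquad(freq_hz, gain_db, q, SAMPLE_RATE_HZ)
    audio = apply_biquad(b, a, audio)
```

Cascading one biquad per modal frequency keeps each room mode independently adjustable, so a change in the position of the user would only require updating the gain of the affected sections.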

[0034] The audio server 320 determines one or more room mode parameters based on the room mode query received from the headset 310. The audio server 320 determines a model of the target area. In some embodiments, the audio server 320 determines the model based on the visual information of the target area. For example, the audio server 320 obtains a 3D virtual representation of at least a portion of the target area based on the visual information. The audio server 320 compares the 3D virtual representation with a group of candidate models and identifies a candidate model that matches the 3D virtual representation as the model. In some embodiments, a candidate model is a model of a room that includes a shape of the room, one or more dimensions of the room, or material acoustic parameters (e.g., attenuation parameter) of surfaces within the room. The group of candidate models can include models of rooms having different shapes, different dimensions, and different surfaces. The 3D virtual representation of the target area includes a 3D mesh of the target area that defines a shape and/or dimensions of the target area. The 3D virtual representation may use one or more material acoustic parameters (e.g., attenuation parameter) to describe acoustic properties of surfaces within the target area. The audio server 320 determines that a candidate model matches the 3D virtual representation based on a determination that a difference between the candidate model and the 3D virtual representation is below a threshold. The difference may include difference in shapes, dimensions, acoustic properties of surfaces, etc. In some embodiments, the audio server 320 uses a fit metric to determine the difference between the candidate model and the 3D virtual representation. The fit metric can be based on one or more geometric features, such as square errors in Hausdorff distance, openness (e.g., indoors vs outdoors), volume, etc. The threshold may be based on perceptual just noticeable differences (JNDs) in room mode changes. For example, if the user can detect a 10% change in modal frequency, geometric deviations that would result in a modal frequency change of up to 10% would be tolerated. The threshold can be the geometric deviations that would result in a modal frequency change of 10%.
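The threshold-based matching can be sketched as follows. For simplicity, this example replaces the geometric fit metric with a much cruder stand-in: each room is reduced to box dimensions, and the difference is the largest relative change in first axial modal frequency, compared against a 10% JND. The box simplification, the metric, the candidate names, and the dimensions are all hypothetical and are not the fit metric of the disclosure.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate value for air at room temperature


def first_axial_frequencies(dims):
    """First axial modal frequency (Hz) along each dimension of a box-shaped room."""
    return [SPEED_OF_SOUND_M_S / (2.0 * d) for d in dims]


def modal_deviation(candidate_dims, measured_dims):
    """Largest relative change in first axial modal frequency between two boxes.

    Dimensions are sorted first so that the comparison ignores room orientation.
    """
    cand = first_axial_frequencies(sorted(candidate_dims))
    meas = first_axial_frequencies(sorted(measured_dims))
    return max(abs(c - m) / m for c, m in zip(cand, meas))


def select_model(candidates, measured_dims, jnd=0.10):
    """Return the best candidate name if its deviation is within the JND, else None."""
    best_name, best_dev = None, None
    for name, dims in candidates.items():
        dev = modal_deviation(dims, measured_dims)
        if best_dev is None or dev < best_dev:
            best_name, best_dev = name, dev
    if best_dev is not None and best_dev <= jnd:
        return best_name
    return None


# Hypothetical candidate models (dimensions in meters).
candidates = {
    "small office": (3.0, 4.0, 2.5),
    "conference room": (6.0, 9.0, 3.0),
}
# Hypothetical dimensions estimated from the 3D virtual representation.
print(select_model(candidates, (3.1, 4.1, 2.5)))
```

Returning `None` when no candidate falls within the JND leaves room for the fallback path, in which a new model is built from the 3D virtual representation instead of reusing a candidate.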

[0035] The audio server 320 determines room modes of the target area using the model. For example, the audio server 320 uses conventional techniques, such as numerical simulation techniques (e.g., finite element method, boundary element method, finite difference time domain method, etc.), to determine the room modes. In some embodiments, the audio server 320 determines the room modes based on the shape, dimensions, and/or material acoustic parameters of the model. The room modes may include one or more of axial modes, tangential modes, and oblique modes. In some embodiments, the audio server 320 determines the room modes based on the position of the user. For example, the audio server 320 identifies the target area based on the position of the user and retrieves the room modes of the target area based on the identification.

[0036] The audio server 320 determines the one or more room mode parameters based on at least one of the room modes and the position of a user within the target area. The room mode parameters describe an acoustic filter that, as applied to the audio content, simulates acoustic distortion that occurs at the position of the user within the target area for frequencies associated with the at least one room mode. The audio server 320 transmits the room mode parameters to the headset 310 for rendering audio content. In some embodiments, the audio server 320 may generate the acoustic filter based on the room mode parameters and transmit the acoustic filter to the headset 310.

[0037] FIG. 4 is a block diagram of an audio server 400, in accordance with one or more embodiments. An embodiment of the audio server 400 is the audio server 320. The audio server 400 determines one or more room mode parameters of a target area in response to a room mode query from an audio assembly. The audio server 400 includes a database 410, a mapping module 420, a matching module 430, a room mode module 440, and an acoustic filter module 450. In other embodiments, the audio server 400 can have any combination of the modules listed with any additional modules. One or more processors of the audio server 400 (not shown) may run some or all of the modules within the audio server 400.

[0038] The database 410 stores data for the audio server 400. The stored data may include a virtual model, candidate models, room modes, room mode parameters, acoustic filters, audio data, visual information (depth information, color information, etc.), room mode queries, other information that may be used by the audio server 400, or some combination thereof.

[0039] The virtual model describes one or more areas and acoustic properties (e.g., room modes) of those areas. Each location in the virtual model is associated with acoustic properties (e.g., room modes) for a corresponding area. The areas whose acoustic properties are described in the virtual model include virtual areas, physical areas, or some combination thereof. A physical area is a real area (e.g., an actual physical room), as opposed to a virtual area. Examples of the physical areas include a conference room, a bathroom, a hallway, an office, a bedroom, a dining room, an outdoor space (e.g., patio, garden, park, etc.), a living room, an auditorium, some other real area, or some combination thereof. A virtual area describes a space that may be entirely fictional and/or based on a real physical area (e.g., rendering a physical room as a virtual area). For example, a virtual area could be a fictionalized dungeon, a rendering of a virtual conference room, etc. Note that the virtual area can be based on real places. For example, the virtual conference room could be based on a real conference center. A particular location in the virtual model may correspond to a current physical location of the headset 310 within the room 350. Acoustic properties of the room 350 can be retrieved from the virtual model based on a location within the virtual model obtained from the mapping module 420.

[0040] A room mode query is a request for room mode parameters that describe an acoustic filter used for incorporating effects of room modes of a target area for a position of a user within the target area. The room mode query includes target area information, user information, audio content information, some other information that the audio server 320 can use to determine the acoustic filter, or some combination thereof. Target area information is information that describes the target area (e.g., its geometry, objects within it, materials, colors, etc.). It may include depth image data of the target area, color image data of the target area, or some combination thereof. User information is information that describes the user. It may include information describing a position of the user within the target area, information of a physical area where the user is physically located, or some combination thereof. Audio content information is information that describes the audio content. It may include location information of a virtual sound source of the audio content, location information of a physical sound source of the audio content, or some combination thereof.

[0041] The candidate models can be models of rooms having different shapes and/or dimensions. The audio server 400 uses the candidate models to determine a model of the target area.

[0042] The mapping module 420 maps information in the room mode query to a location within the virtual model. The mapping module 420 determines the location within the virtual model corresponding to the target area. In some embodiments, the mapping module 420 searches the virtual model to identify a mapping between (i) the information of the target area and/or information of the position of the user and (ii) a corresponding configuration of an area within the virtual model. The area within the virtual model may describe a physical area and/or virtual area. In one embodiment, the mapping is performed by matching a geometry of visual information of the target area with a geometry associated with a location within the virtual model. In another embodiment, the mapping is performed by matching information of the position of the user with a location within the virtual model. For example, in embodiments where the target area is a virtual area, the mapping module 420 identifies a location associated with the virtual area in the virtual model based on information indicating the position of the user. A match suggests that the location within the virtual model is a representation of the target area.

[0043] If a match is found, the mapping module 420 retrieves the room modes that are associated with the location within the virtual model and sends the room modes to the acoustic filter module 450 for determining room mode parameters. In some embodiments, the virtual model does not include room modes associated with the location within the virtual model that matches the target area but includes a candidate model associated with the location. The mapping module 420 may retrieve the candidate model and send it to the room mode module 440 to determine room modes of the target area. In some embodiments, the virtual model does not include room modes or candidate models associated with the location within the virtual model that matches the target area. The mapping module 420 may retrieve a 3D representation of the location and send it to the matching module 430 to determine a model of the target area.

[0044] If no match is found, this is an indication that a configuration of the target area is not yet described by the virtual model. In such a case, the mapping module 420 may develop a 3D virtual representation of the target area based on the visual information in the room mode query and update the virtual model with the 3D virtual representation. The 3D virtual representation of the target area may include a 3D mesh of the target area. The 3D mesh includes points and/or lines that represent boundaries of the target area. The 3D virtual representation may also include virtual representations of surfaces within the target area, such as walls, ceiling, floor, surfaces of furniture, surfaces of appliances, surfaces of other types of objects, and so on. In some embodiments, the virtual model uses one or more material acoustic parameters (e.g., attenuation parameter) to describe acoustic properties of the surfaces within the virtual area. In some embodiments, the mapping module 420 may develop a new model that includes the 3D virtual representation and uses one or more material acoustic parameters to describe acoustic properties of the surfaces within the virtual area. The new model can be saved in the database 410.

[0045] The mapping module 420 may also inform at least one of the matching module 430 and the room mode module 440 that no match is found, so that the matching module 430 can determine a model of the target area and the room mode module 440 can determine room modes of the target area by using the model.
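The branching described in paragraphs [0043] to [0045] can be sketched roughly as follows. This is an illustrative sketch only: the `Location` record, its field names, and the `route_query` function are hypothetical and do not appear in the disclosure; only the module names and reference numerals are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Location:
    """Hypothetical record for one location within the virtual model."""
    room_modes: Optional[list] = None         # stored room modes, if any
    candidate_model: Optional[str] = None     # associated candidate model, if any
    representation_3d: Optional[str] = None   # 3D representation of the location


def route_query(match: Optional[Location]) -> Tuple[str, object]:
    """Return (destination module, payload) for a room mode query."""
    if match is None:
        # No match: a model of the target area still has to be determined.
        return ("matching module 430", None)
    if match.room_modes is not None:
        # Room modes are already stored for this location.
        return ("acoustic filter module 450", match.room_modes)
    if match.candidate_model is not None:
        # A model is known, but its room modes are not.
        return ("room mode module 440", match.candidate_model)
    # Only a 3D representation is available.
    return ("matching module 430", match.representation_3d)
```

For example, `route_query(Location(candidate_model="shoebox"))` hands the stored candidate model to the room mode module 440.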

[0046] In some embodiments, the mapping module 420 may also determine a location within the virtual model corresponding to a local area where the user is physically located (e.g., the room 350).

[0047] The target area may be different from the local area. For example, the local area is an office room where the user sits, but the target area is a virtual area (e.g., a virtual conference room).

[0048] If a match is found, the mapping module 420 retrieves the room modes that are associated with the location within the virtual model corresponding to the target area and sends the room modes to the acoustic filter module 450 for determining room mode parameters. If no match is found, the mapping module 420 may develop a 3D virtual representation of the target area based on the visual information in the room mode query and update the virtual model with the 3D virtual representation of the target area. The mapping module 420 may also inform at least one of the matching module 430 and the room mode module 440 that no match is found, so that the matching module 430 can determine a model of the target area and the room mode module 440 can determine room modes of the target area by using the model.

[0049] The matching module 430 determines a model of the target area based on the 3D virtual representation of the target area. Taking the target area as an example, in some embodiments, the matching module 430 selects the model from a plurality of candidate models. A candidate model can be a model of a room that includes information about shape, dimensions, or surfaces within the room. The group of candidate models can include models of rooms having different shapes (e.g., square, round, triangle, etc.), different dimensions (e.g., shoebox, big conference room, etc.), and different surfaces. The matching module 430 compares the 3D virtual representation of the target area with each candidate model and determines whether the candidate model matches the 3D virtual representation. The matching module 430 determines that a candidate model matches the 3D virtual representation based on a determination that a difference between the candidate model and the 3D virtual representation is below a threshold. The difference may include difference in shapes, dimensions, acoustic properties of surfaces, etc. In some embodiments, the matching module 430 may determine that the 3D virtual representation matches multiple candidate models. The matching module 430 selects the candidate model with the best match, i.e., the candidate model having the least difference from the 3D virtual representation.
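The threshold test and best-match selection in paragraph [0049] can be sketched as below. The function name `select_model`, the dictionary of precomputed differences, and the example values are hypothetical; how the difference itself is measured is left open here, as it is in the text.

```python
from typing import Dict, Optional


def select_model(differences: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the candidate with the least difference, or None if none matches.

    `differences` maps a candidate model name to its difference from the
    3D virtual representation (smaller means more similar).
    """
    # A candidate matches only if its difference is below the threshold.
    matches = {name: d for name, d in differences.items() if d < threshold}
    if not matches:
        return None
    # Among several matches, keep the one with the least difference.
    return min(matches, key=matches.get)
```

With differences `{"shoebox": 0.12, "round": 0.40, "large conference room": 0.08}` and a threshold of 0.2, two candidates match and the sketch selects `"large conference room"`.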

[0050] In some embodiments, the matching module 430 compares the shape of a candidate model and the shape of the 3D mesh included in the 3D virtual representation. For example, the matching module 430 traces rays in a number of directions from a center of the 3D mesh of the target area and determines the points where the rays intersect the 3D mesh. The matching module 430 identifies a candidate model that matches these points. The matching module 430 may shrink or expand the candidate model to exclude any differences in sizes of the candidate model and the target area from the comparison.
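One way to picture the ray-based comparison is sketched below. It is a simplified illustration under stated assumptions, not the disclosed implementation: both shapes are stood in for by axis-aligned boxes so that the ray hit distance has a closed form (a real system would intersect the rays with the 3D mesh), the ray directions come from a Fibonacci sphere, and size differences are removed by dividing each distance profile by its mean. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def fibonacci_directions(n: int) -> List[Vec]:
    """Return n roughly uniformly spread unit vectors on a sphere."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    dirs = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        dirs.append((r * math.cos(golden_angle * i), r * math.sin(golden_angle * i), z))
    return dirs


def box_hit_distance(d: Vec, dims: Vec) -> float:
    """Distance from the center of an axis-aligned box to its wall along unit vector d."""
    return min(0.5 * length / abs(c) for c, length in zip(d, dims) if abs(c) > 1e-12)


def normalized_profile(distances: Sequence[float]) -> List[float]:
    """Divide by the mean distance so that overall size drops out of the comparison."""
    mean = sum(distances) / len(distances)
    return [x / mean for x in distances]


def shape_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """RMS difference between two size-normalized distance profiles."""
    pa, pb = normalized_profile(a), normalized_profile(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)) / len(pa))
```

Under these assumptions, a 2 x 3 x 4 box and a 4 x 6 x 8 box yield a difference of essentially zero, because only shape is compared, while the same box compared against a cube yields a clearly nonzero difference.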

[0051] The room mode module 440 determines room modes of the target area using the model of the target area. The room modes may include at least one of three types of room mode: axial modes, tangential modes, and oblique modes. In some embodiments, for each type of room mode, the room mode module 440 determines a first order mode and may also determine modes of higher orders. The room mode module 440 determines the room modes based on the shape and/or dimensions of the model. For example, in embodiments where the model has a rectangular homogeneous shape, the room mode module 440 determines axial, tangential, and oblique modes of the model. In some embodiments, the room mode module 440 uses the dimensions of the model to calculate room modes that fall within a range from a lower frequency in an audible or reproducible frequency range (e.g., 63 Hz) to a Schroeder frequency of the target area. The Schroeder frequency of the target area can be a frequency at which room modes are too densely overlapped in frequency to be individually distinguishable. The room mode module 440 may determine the Schroeder frequency based on a volume of the target area and a reverberation time (e.g., RT60) of the target area. The room mode module 440 may use, e.g., numerical simulation techniques (such as finite element method, boundary element method, finite difference time domain method, etc.) to determine the room modes.
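For a rectangular model, the calculation can be illustrated with the textbook closed-form expressions rather than a numerical simulation. The sketch below assumes an idealized rigid-walled rectangular room, a speed of sound of 343 m/s, and the common approximation f_s ≈ 2000·sqrt(RT60 / V) for the Schroeder frequency (V in cubic meters); none of these specifics are stated in the disclosure, and the function names are hypothetical.

```python
import math
from itertools import product
from typing import List, Tuple

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def schroeder_frequency(volume_m3: float, rt60_s: float) -> float:
    """Common approximation of the Schroeder frequency in Hz."""
    return 2000.0 * math.sqrt(rt60_s / volume_m3)


def mode_type(n: Tuple[int, int, int]) -> str:
    """Classify a mode by how many of its three indices are nonzero."""
    nonzero = sum(1 for k in n if k > 0)
    return {1: "axial", 2: "tangential", 3: "oblique"}[nonzero]


def room_modes(dims: Tuple[float, float, float], f_low: float, f_high: float,
               max_order: int = 10) -> List[Tuple[float, Tuple[int, int, int], str]]:
    """List (frequency, indices, type) for modes with f_low <= f <= f_high."""
    modes = []
    for n in product(range(max_order + 1), repeat=3):
        if n == (0, 0, 0):
            continue  # not a standing wave
        # f = (c / 2) * sqrt((nx/Lx)^2 + (ny/Ly)^2 + (nz/Lz)^2)
        f = 0.5 * SPEED_OF_SOUND * math.sqrt(sum((k / l) ** 2 for k, l in zip(n, dims)))
        if f_low <= f <= f_high:
            modes.append((f, n, mode_type(n)))
    return sorted(modes)
```

For a 5 m x 4 m x 3 m room, the first order axial mode along the 5 m dimension falls at 343 / (2 x 5) = 34.3 Hz. Calling `room_modes((5.0, 4.0, 3.0), 63.0, schroeder_frequency(60.0, 0.6))` lists the modes between 63 Hz and the approximate Schroeder frequency of 200 Hz.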

[0052] In some embodiments, the room mode module 440 uses material acoustic parameters (such as attenuation parameter) of surfaces within the 3D virtual representation of the target area to determine the room modes. For example, the room mode module 440 determines material composition of the surfaces using the color image data of the target area. The room mode module 440 determines an attenuation parameter for each surface based on the material composition of the surface and updates the model with the material compositions and attenuation parameters.

[0053] In one embodiment, the room mode module 440 uses machine learning techniques to determine the material composition of the surfaces. The room mode module 440 can input image data of the target area (or a part of the image data that is related to the surface) and/or audio data into a machine learning model, and the machine learning model outputs the material composition of each surface. The machine learning model can be trained with different machine learning techniques, such as linear support vector machine (linear SVM), boosting for other algorithms (e.g., AdaBoost), neural networks, logistic regression, naive Bayes, memory-based learning, random forests, bagged trees, decision trees, boosted trees, or boosted stumps. As part of the training of the machine learning model, a training set is formed. The training set includes image data and/or audio data of a group of surfaces and material composition of the surfaces in the group.

[0054] For each room mode or a combination of multiple room modes, the room mode module 440 determines amplification as a function of frequency and position. The amplification includes increase or decrease in signal strength caused by the corresponding room mode(s).
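To make "amplification as a function of frequency and position" concrete, the sketch below uses a simplified model that is assumed here and not taken from the disclosure: the spatial pattern of a mode in a rigid-walled rectangular room is a product of cosines, and the frequency dependence is approximated by a second-order resonance with a hypothetical quality factor `q`. The function names are illustrative.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def mode_shape(n: Tuple[int, int, int], dims: Vec, pos: Vec) -> float:
    """Standing-wave pattern in [-1, 1]; pos is measured from one corner of the room."""
    return math.prod(math.cos(k * math.pi * x / l) for k, x, l in zip(n, pos, dims))


def resonance_gain(f: float, f_mode: float, q: float) -> float:
    """Second-order resonance magnitude: about q at f == f_mode, 1 as f approaches 0."""
    r = f / f_mode
    return 1.0 / math.sqrt((1.0 - r * r) ** 2 + (r / q) ** 2)


def mode_amplification(f: float, pos: Vec, n: Tuple[int, int, int],
                       dims: Vec, f_mode: float, q: float) -> float:
    """Relative amplification of one mode at frequency f and position pos."""
    return abs(mode_shape(n, dims, pos)) * resonance_gain(f, f_mode, q)
```

In this model, the (1, 0, 0) mode of a 5 m long room is strongest at the end walls (an anti-node) and vanishes halfway along the room at x = 2.5 m (a node), which is why the same frequency can sound louder or quieter depending on where the user stands.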

[0055] The acoustic filter module 450 determines one or more room mode parameters of the target area based on at least one of the room modes and the position of the user within the target area. In some embodiments, the acoustic filter module 450 determines the room mode parameters based on amplification as a function of frequency and position (e.g., position of the user) within the target area. The room mode parameters describe acoustic distortion caused by the at least one of the room modes at the position of the user. In some embodiments, the acoustic filter module 450 also uses the position of a sound source of the audio content to determine the acoustic distortion.

[0056] In some embodiments, the audio content is rendered by one or more speakers that are external to the headset. The acoustic filter module 450 determines one or more room mode parameters of a local area of the user. In some embodiments, the target area is different from the local area. For instance, the local area of the user is an office room where the user sits, and the target area is a virtual conference room including a virtual sound source (e.g., a speaker). The room mode parameters of the local area describe an acoustic filter of the local area that can be used to render audio content from a speaker external to the headset (e.g., on or coupled to a console). The acoustic filter of the local area mitigates room modes of the local area at the position of the user in the local area. In some embodiments, the acoustic filter module 450 determines the room mode parameters of the local area based on one or more room modes of the local area determined by the room mode module 440. The room modes of the local area can be determined based on a model of the local area determined by either the mapping module 420 or the matching module 430.

[0057] FIG. 5 is a flowchart illustrating a process 500 for determining room mode parameters that describe an acoustic filter, in accordance with one or more embodiments. The process 500 of FIG. 5 may be performed by the components of an apparatus, e.g., the audio server 400 of FIG. 4. Other entities (e.g., portions of a headset and/or console) may perform some or all of the steps of the process in other embodiments. Likewise, embodiments may include different and/or additional steps, or perform the steps in different orders.

[0058] The audio server 400 determines 510 a model of a target area based in part on a 3D virtual representation of the target area. The target area can be a local area or a virtual area. The virtual area may be based on a real room. In some embodiments, the audio server 400 determines the model by retrieving the model from a database based on a position of a user within the target area. For example, the database stores a virtual model that describes one or more areas and includes models of those areas. Each area corresponds to a location within the virtual model. The areas include virtual areas, physical areas, or some combination thereof. The audio server 400 can identify a location associated with the target area in the virtual model, e.g., based on the position of the user within the target area. The audio server 400 retrieves the model associated with the identified location. In some other embodiments, the audio server 400 receives, e.g., from a headset, depth information describing at least a portion of the target area. In some embodiments, the audio server 400 generates at least a part of the 3D virtual representation using the depth information. The audio server 400 compares the 3D virtual representation with a plurality of candidate models. The audio server 400 identifies one of the plurality of candidate models that matches the 3D virtual representation as the model of the target area. In some embodiments, the audio server 400 determines that a candidate model matches the 3D virtual representation based on a determination that a difference between the shape of the candidate model and the 3D virtual representation is below a threshold. The audio server 400 may shrink or expand the candidate model during comparison to eliminate any differences in dimensions of the candidate model and the 3D virtual representation. In some embodiments, the audio server 400 determines an attenuation parameter for each surface in the 3D virtual representation and updates the model with the attenuation parameter.

[0059] The audio server 400 determines 520 room modes of the target area using the model. In some embodiments, the audio server 400 determines the room modes based on a shape of the model. Room modes may be calculated using conventional techniques. The audio server 400 can also use dimensions of the model and/or attenuation parameters of the surfaces in the 3D virtual representation to determine the room modes. The room modes may include axial modes, tangential modes, or oblique modes. In some embodiments, the room modes fall within a range from a lower frequency of the audible frequency range (e.g., 63 Hz) to a Schroeder frequency of the target area. The room modes describe amplification of sounds at specific frequencies as a function of position within the target area. The audio server 400 may determine amplification corresponding to a combination of multiple room modes.

[0060] The audio server 400 determines 530 one or more room mode parameters (e.g., Q factor, etc.) based on at least one of the room modes and a position of a user within the target area. A room mode is represented by amplification of signal strength as a function of frequency and position. In some embodiments, the audio server 400 combines the amplification associated with more than one room mode to more fully describe amplification as a function of frequency and position. The audio server 400 determines amplification as a function of frequency at the position of the user. Based on the amplification as a function of frequency at the position of the user, the audio server 400 determines the room mode parameters. The room mode parameters describe an acoustic filter that, as applied to audio content, simulates acoustic distortion at the position of the user at frequencies associated with the at least one room mode. In some embodiments, the at least one room mode is a first order axial mode. In some embodiments, the audio server 400 determines the one or more room mode parameters based on amplification corresponding to the at least one room mode at the position of the user within the target area. The acoustic filter can be used by a headset to present audio content to the user.
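As a rough illustration of how a small set of room mode parameters could describe a filter, the sketch below turns a center frequency, a gain in dB, and a Q factor into a second-order peaking filter using the widely used Audio EQ Cookbook formulas. The disclosure does not state that the acoustic filter takes this form; the peaking biquad, the function names, and the example values are assumptions made for illustration only.

```python
import cmath
import math
from typing import List, Tuple


def peaking_biquad(f0: float, gain_db: float, q: float,
                   fs: float) -> Tuple[List[float], List[float]]:
    """Return normalized (b, a) coefficients of a peaking filter.

    f0: mode frequency in Hz; gain_db: boost (positive) or cut (negative)
    at f0; q: quality factor; fs: sample rate in Hz.
    """
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp]
    a = [1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp]
    # Normalize so that a[0] == 1.
    return [x / a[0] for x in b], [x / a[0] for x in a]


def magnitude_db(b: List[float], a: List[float], f: float, fs: float) -> float:
    """Filter magnitude response in dB at frequency f."""
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 on the unit circle
    h = (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20.0 * math.log10(abs(h))
```

With these assumptions, `peaking_biquad(34.3, 6.0, 8.0, 48000.0)` boosts a narrow band around 34.3 Hz by 6 dB and leaves frequencies far from the mode essentially unchanged; a negative gain would model a dip at a node instead.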

[0061] FIG. 6 is a block diagram of an audio assembly 600, in accordance with one or more embodiments. Some or all of the audio assembly 600 may be part of a headset (e.g., the headset 310). The audio assembly 600 includes a speaker assembly 610, a microphone assembly 620, and an audio controller 630. In one embodiment, the audio assembly 600 further comprises an input interface (not shown in FIG. 6) for, e.g., controlling operations of different components of the audio assembly 600. In other embodiments, the audio assembly 600 can have any combination of the components listed with any additional components. In some embodiments, one or more of the functions of the audio server 400 may be performed by the audio assembly 600.

……
……
……
