Microsoft Patent | Bidirectional propagation of sound

Patent: Bidirectional propagation of sound

Publication Number: 20210058730

Publication Date: 2021-02-25

Applicant: Microsoft

Assignee: Microsoft Technology Licensing

Abstract

The description relates to rendering directional sound. One implementation includes receiving directional impulse responses corresponding to a scene. The directional impulse responses can correspond to multiple sound source locations and a listener location in the scene. The implementation can also include encoding the directional impulse responses to obtain encoded departure direction parameters for individual sound source locations. The implementation can also include outputting the encoded departure direction parameters, the encoded departure direction parameters providing sound departure directions from the individual sound source locations for rendering of sound.

Claims

  1. A system, comprising: a processor; and storage storing computer-readable instructions which, when executed by the processor, cause the system to: receive an input sound signal for a directional sound source having a source location and a source orientation in a scene; identify an encoded departure direction parameter corresponding to the source location of the directional sound source in the scene, the encoded departure direction parameter specifying a departure direction of initial sound on a sound path in which sound travels from the source location to a listener location around an occlusion in the scene; and based at least on the encoded departure direction parameter and the input sound signal, render a directional sound at the listener location in a manner that accounts for the source location and the source orientation of the directional sound source.

  2. The system of claim 1, wherein the computer-readable instructions, when executed by the processor, cause the system to: identify the encoded departure direction parameter from a precomputed departure direction field based at least on the source location and the listener location.

  3. The system of claim 2, wherein the computer-readable instructions, when executed by the processor, cause the system to: compute the departure direction field from a representation of the scene.

  4. The system of claim 2, wherein the computer-readable instructions, when executed by the processor, cause the system to: obtain directivity characteristics of the directional sound source; and render the initial sound accounting for the directivity characteristics and the source orientation of the directional sound source.

  5. The system of claim 4, wherein the computer-readable instructions, when executed by the processor, cause the system to: obtain directional hearing characteristics of a listener at the listener location and a listener orientation of the listener; and render the initial sound as binaural output that accounts for the directional hearing characteristics of the listener and the listener orientation.

  6. The system of claim 5, wherein the directivity characteristics of the directional sound source comprise a source directivity function, and the directional hearing characteristics of the listener comprise a head-related transfer function.

  7. A system, comprising: a processor; and storage storing computer-readable instructions which, when executed by the processor, cause the system to: receive an input sound signal for a directional sound source having a source location and a source orientation in a scene; identify encoded directional reflection parameters that are associated with the source location of the directional sound source and a listener location, wherein the encoded directional reflection parameters comprise aggregate directional loudness components of reflection energy from corresponding combinations of departure and arrival directions, and the aggregate directional loudness components are aggregated from decomposed directional loudness components of reflections emitted from the source location and arriving at the listener location; and based at least on the input sound signal and the encoded directional reflection parameters, render directional sound reflections at the listener location that account for the source location and the source orientation of the directional sound source.

  8. The system of claim 7, wherein the computer-readable instructions, when executed by the processor, cause the system to: encode the directional reflection parameters for the source location and the listener location prior to receiving the input sound signal.

  9. The system of claim 8, wherein the computer-readable instructions, when executed by the processor, cause the system to: perform reflection simulations in the scene and decompose reflection loudness values obtained during the reflection simulations to obtain the aggregate directional loudness components.

  10. The system of claim 7, wherein the computer-readable instructions, when executed by the processor, cause the system to: obtain directivity characteristics of the directional sound source; obtain directional hearing characteristics of a listener at the listener location; and render the directional sound reflections accounting for the directivity characteristics of the directional sound source, the source orientation of the directional sound source, the directional hearing characteristics of the listener, and a listener orientation of the listener.

  11. The system of claim 10, wherein the encoded directional reflection parameters comprise a reflections transfer matrix associated with the source location and the listener location.

  12. The system of claim 7, provided in a gaming console configured to execute video games or a virtual reality device configured to execute virtual reality applications.

  13. A method comprising: receiving impulse responses corresponding to a scene, the impulse responses corresponding to multiple sound source locations and a listener location in the scene; encoding the impulse responses to obtain encoded departure direction parameters for individual sound source locations and the listener location, the encoded departure direction parameters providing sound departure directions from the individual sound source locations to the listener location; encoding the impulse responses to obtain encoded aggregate representations of reflection energy for corresponding combinations of departure and arrival directions of reflections traveling from the individual sound source locations to the listener location, the encoded aggregate representations of reflection energy being obtained by decomposing reflections in the impulse responses into directional loudness components and aggregating the directional loudness components; and outputting the encoded departure direction parameters and the encoded aggregate representations of reflection energy.

  14. The method of claim 13, wherein the encoded departure direction parameters convey respective directions of initial sound emitted from the individual sound source locations to the listener location.

  15. The method of claim 13, further comprising: encoding initial loudness parameters for the individual sound source locations; and outputting the encoded initial loudness parameters with the encoded departure direction parameters.

  16. The method of claim 15, further comprising: determining the encoded departure direction parameters for initial sound during a first time period; and determining the initial loudness parameters during a second time period that encompasses the first time period.

17-18. (canceled)

  19. The method of claim 13, wherein a particular encoded aggregate representation for a particular source location includes at least: aggregate loudness of reflections arriving at the listener location from a first direction and departing from the particular source location in the first direction, a second direction, a third direction, and a fourth direction; aggregate loudness of reflections arriving at the listener location from the second direction and departing from the particular source location in the first direction, the second direction, the third direction, and the fourth direction; aggregate loudness of reflections arriving at the listener location from the third direction and departing from the particular source location in the first direction, the second direction, the third direction, and the fourth direction; and aggregate loudness of reflections arriving at the listener location from the fourth direction and departing from the particular source location in the first direction, the second direction, the third direction, and the fourth direction.

  20. The method of claim 19, wherein the particular encoded aggregate representation comprises a reflections transfer matrix.

  21. The method of claim 20, further comprising: generating and outputting multiple reflections transfer matrices for multiple source/listener location pairs in the scene.

  22. The method of claim 13, further comprising: rendering sound emitted from a particular directional sound source at a particular source location to a listener at a particular listener location based at least on a particular encoded departure direction parameter, a particular encoded arrival direction parameter, and a particular encoded aggregate representation of reflection energy for the particular source location and the particular listener location.

Description

BACKGROUND

[0001] Practical modeling and rendering of real-time directional acoustic effects (e.g., sound, audio) for video games and/or virtual reality applications can be prohibitively complex. Conventional methods constrained by reasonable computational budgets have been unable to render authentic, convincing sound with true-to-life directionality of initial sounds and/or multiply-scattered sound reflections, particularly in cases with occluders (e.g., sound obstructions). Room acoustic modeling (e.g., concert hall acoustics) does not account for free movement of either sound sources or listeners. Further, source-to-listener line of sight is usually unobstructed in such applications. Conventional real-time path tracing methods demand enormous sampling to produce smooth results, greatly exceeding reasonable computational budgets. Other methods are limited to oversimplified scenes with few occlusions, such as an outdoor space that contains only 10-20 explicitly separated objects (e.g., building facades, boulders).

BRIEF DESCRIPTION OF THE DRAWINGS

[0002] The accompanying drawings illustrate implementations of the concepts conveyed in the present document. Features of the illustrated implementations can be more readily understood by reference to the following description taken in conjunction with the accompanying drawings. Like reference numbers in the various drawings are used wherever feasible to indicate like elements. In some cases, parentheticals are utilized after a reference number to distinguish like elements. Use of the reference number without the associated parenthetical is generic to the element. Further, the left-most numeral of each reference number conveys the FIG. and associated discussion where the reference number is first introduced.

[0003] FIGS. 1A and 1B illustrate scenarios related to propagation of initial sound, consistent with some implementations of the present concepts.

[0004] FIG. 2 illustrates an example of a field of departure direction indicators, consistent with some implementations of the present concepts.

[0005] FIG. 3 illustrates an example of a field of arrival direction indicators, consistent with some implementations of the present concepts.

[0006] FIG. 4 illustrates a scenario related to propagation of sound reflections, consistent with some implementations of the present concepts.

[0007] FIG. 5 illustrates an example of an aggregate representation of directional reflection energy, consistent with some implementations of the present concepts.

[0008] FIG. 6A illustrates a scenario related to propagation of initial sound and sound reflections, consistent with some implementations of the present concepts.

[0009] FIG. 6B illustrates an example time domain representation of initial sound and sound reflections, consistent with some implementations of the present concepts.

[0010] FIGS. 7A, 7B, and 7C illustrate scenarios related to rendering initial sound and reflections by adjusting power balance based on source directivity, consistent with some implementations of the present concepts.

[0011] FIGS. 8 and 13 illustrate example systems that are consistent with some implementations of the present concepts.

[0012] FIG. 9 illustrates a specific implementation of rendering circuitry that can be employed consistent with some implementations of the present concepts.

[0013] FIGS. 10A, 10B, and 10C show examples of equalized pulses, consistent with some implementations of the present concepts.

[0014] FIGS. 11A and 11B show examples of initial delay processing, consistent with some implementations of the present concepts.

[0015] FIGS. 12A-12F show examples of reflection magnitude fields for a scene, consistent with some implementations of the present concepts.

[0016] FIGS. 14-17 are flowcharts of example methods in accordance with some implementations of the present concepts.

DETAILED DESCRIPTION

Overview

[0017] As noted above, modeling and rendering of real-time directional acoustic effects can be very computationally intensive. As a consequence, it can be difficult to render realistic directional acoustic effects without sophisticated and expensive hardware. Some methods have attempted to account for moving sound sources and/or listeners but are unable to also account for scene acoustics while working within a reasonable computational budget. Still other methods neglect sound directionality entirely.

[0018] The disclosed implementations can generate convincing sound for video games, animations, and/or virtual reality scenarios even in constrained resource scenarios. For instance, the disclosed implementations can model source directivity by rendering sound that accounts for the orientation of a directional source. In addition, the disclosed implementations can model listener directivity by rendering sound that accounts for the orientation of a listener. Taken together, these techniques allow for rendering of sound that accounts for the relationship between source and listener orientation for both initial sounds and sound reflections, as described more below.

[0019] Source and listener directivity can provide important sound cues for a listener. With respect to source directivity, speech, audio speakers, and many musical instruments are directional sources, e.g., these sound sources can emit directional sound that tends to be concentrated in a particular direction. As a consequence, the way that a directional sound source is perceived depends on the orientation of the sound source. For instance, a listener can detect when a speaker turns toward the listener and this tends to draw the listener’s attention. As another example, human beings naturally face toward an open door when communicating with a listener in another room, which causes the listener to perceive a louder sound than were the speaker to face in another direction.

[0020] Listener directivity also conveys important information to listeners. Listeners can perceive the direction at which incoming sound arrives, and this is also an important audio cue that varies with the orientation of the listener. For example, standing outside a meeting hall, a listener is able to locate an open door by listening for the chatter of a crowd in the meeting hall streaming through the door. This is because the listener can perceive the arrival direction of the sound as arriving from the door, allowing the listener to locate the crowd even when sight of the crowd is obscured to the listener. If the listener’s orientation changes, the listener perceives that the arrival direction of the sound changes accordingly.

[0021] In addition to source and listener directivity, the time at which sound waves are received at the listener conveys important information. For instance, for a given wave pulse introduced by a sound source into a scene, the pressure response or “impulse response” at the listener arrives as a series of peaks, each of which represents a different path that the sound takes from the source to the listener. Listeners tend to perceive the direction of the first-arriving peak in the impulse response as the arrival direction of the sound, even when nearly-simultaneous peaks arrive shortly thereafter from different directions. This is known as the “precedence effect.” This initial sound takes the shortest path through the air from a sound source to a listener in a given scene. After the initial sound, subsequent reflections are received that generally take longer paths through the scene and become attenuated over time.

[0022] Thus, humans tend to perceive sound as an initial sound followed by reflections and then subsequent reverberations. As a result of the precedence effect, initial sounds tend to enable listeners to perceive where the sound is coming from, whereas reflections and/or reverberations tend to provide listeners with information about the scene because they convey how the impulse response travels along many different paths within the scene.

[0023] Considering reflections specifically, they can be perceived differently by the listener depending on properties of the scene. For instance, when a sound source and listener are close (e.g., within a few footsteps), a delay between arrival of the initial sound and corresponding first reflections can become audible. The delay between the initial sound and the reflections can strengthen the perception of distance to walls.

[0024] Moreover, reflections can be perceived differently based on the orientation of both the source and listener. For instance, the orientation of a directional sound source can affect how reflections are perceived by a listener. When a directional sound source is oriented directly toward a listener, the initial sound tends to be relatively loud and the reflections and/or reverberations tend to be somewhat quiet. Conversely, if the directional sound source is oriented away from the listener, the power balance between the initial sound and the reflections and/or reverberations can change, so that the initial sound is somewhat quieter relative to the reflections.

[0025] The disclosed implementations offer computationally efficient mechanisms for modeling and rendering of directional acoustic effects. Generally, the disclosed implementations can model a given scene using perceptual parameters that represent how sound is perceived at different source and listener locations within the scene. Once perceptual parameters have been obtained for a given scene as described herein, the perceptual parameters can be used for rendering of arbitrary source and listener positions as well as arbitrary source and listener orientations in the scene.

Initial Sound Propagation

[0026] FIGS. 1A and 1B are provided to introduce the reader to concepts relating to the directionality of initial sound using a relatively simple scene 100. FIG. 1A illustrates a scenario 102A and FIG. 1B illustrates a scenario 102B, each of which conveys certain concepts relating to how initial sound emitted by a sound source 104 is perceived by a listener 106 based on acoustic properties of scene 100.

[0027] For instance, scene 100 can have acoustic properties based on geometry 108, which can include structures such as walls 110 that form a room 112 with a portal 114 (e.g., doorway), an outside area 116, and at least one exterior corner 118. As used herein, the term “geometry” can refer to an arrangement of structures (e.g., physical objects) and/or open spaces in a scene. Generally, the term “scene” is used herein to refer to any environment in which real or virtual sound can travel. In some implementations, structures such as walls can cause occlusion, reflection, diffraction, and/or scattering of sound, etc. Some additional examples of structures that can affect sound are furniture, floors, ceilings, vegetation, rocks, hills, ground, tunnels, fences, crowds, buildings, animals, stairs, etc. Additionally, shapes (e.g., edges, uneven surfaces), materials, and/or textures of structures can affect sound. Note that structures do not have to be solid objects. For instance, structures can include water, other liquids, and/or types of air quality that might affect sound and/or sound travel.

[0028] Generally, the sound source 104 can generate sound pulses that create corresponding impulse responses. The impulse responses depend on properties of the scene 100 as well as the locations of the sound source and listener. As discussed more below, the first-arriving peak in the impulse response is typically perceived by the listener 106 as an initial sound, and subsequent peaks in the impulse response tend to be perceived as reflections. FIGS. 1A and 1B convey how this initial peak tends to be perceived by the listener, and subsequent examples describe how the reflections are perceived by the listener. Note that this document adopts the convention that the top of the page faces north for the purposes of discussing directions.

[0029] A given sound pulse can result in many different sound wavefronts that propagate in all directions from the source. For simplicity, FIG. 1A shows a single such wavefront, initial sound wavefront 120A, that is perceived by listener 106 as the first-arriving sound. Because of the acoustic properties of scene 100 and the respective positions of the sound source and the listener, the listener perceives initial sound wavefront 120A as arriving from the northeast. For instance, in a virtual reality world based on scenario 102A, a person (e.g., listener) looking at a wall with a doorway to their right would likely expect to hear a sound coming from their right side, as walls 110 attenuate the sound energy that travels directly along the line of sight between the sound source 104 and the listener 106. In general, the concepts disclosed herein can be used for rendering initial sound with realistic directionality, such as coming from the doorway in this instance.

[0030] In some cases, the sound source 104 can be mobile. For example, scenario 102B depicts the sound source 104 in a different location than scenario 102A. In scenario 102B, both the sound source 104 and listener are in outside area 116, but the sound source is around the exterior corner 118 from the listener 106. Once again, the walls 110 obstruct a line of sight between the listener and the sound source. Thus, in this example, the listener perceives initial sound wavefront 120B as the first-arriving sound coming from the northeast.

[0031] The directionality of sound wavefronts can be represented using departure direction indicators that convey the direction from which sound energy departs the source 104, and arrival direction indicators that indicate the direction from which sound energy arrives at the listener 106. For instance, referring back to FIG. 1A, note that initial sound wavefront 120A leaves the sound source 104 in a generally southeast direction as conveyed by departure direction indicator 122(1), and arrives at the listener 106 from a generally northeast direction as conveyed by arrival direction indicator 124(1). Likewise, considering FIG. 1B, initial sound wavefront 120B leaves the sound source in a south-southwest direction as conveyed by departure direction indicator 122(2) and arrives at the listener from an east-northeast direction as conveyed by arrival direction indicator 124(2). By convention, this document uses departure direction indicators that point in the direction of travel of sound from the source toward the listener, and arrival direction indicators that point in the direction that sound is received from the listener toward the source.

Initial Sound Encoding

[0032] Consider a pair of source and listener locations in a given scene, with a sound source located at the source location and a listener located at the listener location. The direction of initial sound perceived by the listener is generally a function of acoustic properties of the scene as well as the location of the source and listener. Thus, the first sound wavefront perceived by the listener will generally leave the source in a particular direction and arrive at the listener in a particular direction. This is the case even for directional sound sources, irrespective of the orientation of the source and the listener. As a consequence, it is possible to encode departure and arrival direction parameters for initial sounds in a scene using an isotropic sound pulse without sampling different source and listener orientations, as discussed more below.

[0033] One way to represent the departure direction of initial sound in a given scene is to fix a listener location and encode departure directions from different potential source locations for sounds that travel from the potential source locations to the fixed listener location. FIG. 2 depicts an example scene 200 and a corresponding departure direction field 202 with respect to a listener location 204. The encoded departure direction field includes many departure direction indicators, each of which is located at a potential source location from which a source can emit sounds. Each departure direction indicator conveys that initial sound travels from that source location to the listener location 204 in the direction indicated by that departure direction indicator. In other words, for any source placed at a given departure direction indicator, initial sounds perceived at listener location 204 will leave that source location in the direction indicated by that departure direction indicator.

[0034] One way to represent the arrival directions of initial sound in a given scene is to use a similar approach as discussed above with respect to departure directions. FIG. 3 depicts example scene 200 with an arrival direction field 302 with respect to listener location 204. Similar to the departure direction field discussed above, the arrival direction field includes many arrival direction indicators, each of which is located at a source location from which a source can emit sounds. Each individual arrival direction indicator conveys that initial sound emitted from the corresponding source location is received at the listener location 204 in the direction indicated by that arrival direction indicator. As noted previously with respect to FIGS. 1A and 1B, the arrival direction indicators point away from the listener in the direction of incoming sound by convention.

[0035] Taken together, departure direction field 202 and arrival direction field 302 provide a bidirectional representation of initial sound travel in scene 200 for a specific listener location. Note that each of these fields can represent a horizontal “slice” within scene 200. Thus, different arrival and departure direction fields can be generated for different vertical heights within scene 200 to create a volumetric representation of initial sound directionality for the scene with respect to the listener location.

[0036] As discussed more below, different departure and arrival direction fields and/or volumetric representations can be generated for different potential listener locations in scene 200 to provide a relatively compact bidirectional representation of initial sound directionality in scene 200. In particular, departure direction fields and arrival direction fields allow for rendering of initial sound with arbitrary source and listener locations and orientations. For instance, each departure direction indicator can represent an encoded departure direction parameter for a specific source/listener location pair, and each arrival direction indicator can represent an encoded arrival direction parameter for that specific source/listener location pair. Generally, the relative density of each encoded field can be a configurable parameter that varies based on various criteria, where denser fields can be used to obtain more accurate directionality and sparser fields can be employed to obtain computational efficiency and/or more compact representations.
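To make the encoded fields concrete, the following Python sketch shows one possible way to store and query a departure/arrival direction field for a single listener probe. It assumes a uniform two-dimensional grid of candidate source locations and a nearest-cell lookup; the class name, parameters, and example values are illustrative placeholders rather than details taken from the implementations above.

import numpy as np

class DirectionField:
    """Encoded departure/arrival directions for one fixed listener location."""

    def __init__(self, origin, cell_size, departure_dirs, arrival_dirs):
        # origin: world-space (x, y) position of grid cell (0, 0)
        # cell_size: spacing between candidate source locations, in meters
        # departure_dirs / arrival_dirs: arrays of shape (nx, ny, 2) holding
        # the unit vectors encoded at each candidate source location
        self.origin = np.asarray(origin, dtype=float)
        self.cell_size = float(cell_size)
        self.departure_dirs = departure_dirs
        self.arrival_dirs = arrival_dirs

    def lookup(self, source_pos):
        """Return (departure_dir, arrival_dir) for the nearest encoded cell."""
        idx = np.round((np.asarray(source_pos, dtype=float) - self.origin)
                       / self.cell_size).astype(int)
        ix = np.clip(idx[0], 0, self.departure_dirs.shape[0] - 1)
        iy = np.clip(idx[1], 0, self.departure_dirs.shape[1] - 1)
        return self.departure_dirs[ix, iy], self.arrival_dirs[ix, iy]

# Toy example: a 4x4 field in which initial sound from every candidate source
# location leaves toward the southeast and arrives from the northeast
# (+x = east, +y = north).
dep = np.tile(np.array([0.707, -0.707]), (4, 4, 1))
arr = np.tile(np.array([0.707, 0.707]), (4, 4, 1))
field = DirectionField(origin=(0.0, 0.0), cell_size=1.0,
                       departure_dirs=dep, arrival_dirs=arr)
departure_dir, arrival_dir = field.lookup((2.3, 1.8))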

Reflection Encoding

[0037] As noted previously, reflections tend to convey information about a scene to a listener. Like initial sound, the paths taken by reflections from a given source location to a given listener location within a scene generally do not vary based on the orientation of the source or listener. As a consequence, it is possible to encode source and listener directionality for reflections for source/listener location pairs in a given scene without sampling different source and listener orientations. However, in practice, there are often many, many reflections and it may be impractical to encode source and listener directionality for each reflection path. Thus, the disclosed implementations offer mechanisms for compactly representing directional reflection characteristics in an aggregate manner, as discussed more below.

[0038] FIG. 4 will now be used to introduce concepts relating to reflections of sound. FIG. 4 shows another scene 400 and introduces a scenario 402. Scene 400 is similar to scene 100 with the addition of walls 404, 406, and 408. In this case, FIG. 4 includes reflection wavefronts 410 and omits a representation of any initial sound wavefront for clarity. Only a few reflection wavefronts 410 are designated to avoid clutter on the drawing page. In practice, many more reflection wavefronts may be present in the impulse response for a given sound.

[0039] Note that the reflection wavefronts are emitted from sound source 104 in many different directions and arrive at the listener 106 in many different directions. Each reflection wavefront carries a particular amount of sound energy (e.g., loudness) when leaving the source 104 and arriving at the listener 106. Consider reflection wavefront 410(1), designated by a dashed line in FIG. 4. Sound energy carried by reflection wavefront 410(1) leaves sound source 104 to the southeast of the sound source and arrives at listener 106 from the southeast. One way to represent the sound energy leaving source 104 for reflection wavefront 410(1) is to decompose the sound energy into a first directional loudness component for sound energy emitted to the south, and a second directional loudness component for sound energy emitted to the east. Likewise, the sound energy arriving at listener 106 for reflection wavefront 410(1) can be decomposed into a first directional loudness component for sound energy received from the south, and a second directional loudness component for sound energy received from the east.

[0040] Now, consider reflection wavefront 410(2), designated by a dotted line in FIG. 4. Sound energy carried by reflection wavefront 410(2) leaves sound source 104 to the northwest of the sound source and arrives at listener 106 from the southwest. One way to represent the sound energy leaving source 104 for reflection wavefront 410(2) is to decompose the sound energy into a first directional loudness component for sound energy emitted to the north, and a second directional loudness component for sound energy emitted to the west. Likewise, the sound energy arriving at listener 106 for reflection wavefront 410(2) can be decomposed into a first directional loudness component for sound energy arriving from the south, and a second directional loudness component for sound energy arriving from the west.

[0041] The disclosed implementations can decompose reflection wavefronts into directional loudness components as discussed above for different potential source and listener locations. Subsequently, the directional loudness components can be used to encode directional reflection characteristics associated with pairs of source and listener locations. In some cases, the directional reflection characteristics can be encoded by aggregating the directional loudness components into an aggregate representation of bidirectional reflection loudness, as discussed more below.
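One simple way to carry out such a decomposition is to project each wavefront's direction onto the canonical directions, keep only the non-negative projections, and normalize them so the components sum to the wavefront's energy. The Python sketch below does this for the four compass directions; the rectified-projection rule is an illustrative choice, since the description above does not commit to a particular decomposition.

import numpy as np

# Canonical directions as unit vectors: +y = north, +x = east.
CANONICAL = {
    "N": np.array([0.0, 1.0]),
    "E": np.array([1.0, 0.0]),
    "S": np.array([0.0, -1.0]),
    "W": np.array([-1.0, 0.0]),
}

def decompose_energy(direction, energy):
    """Split energy carried along a direction into non-negative components for
    the canonical directions, normalized to sum back to the input energy."""
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    raw = {k: max(float(d @ v), 0.0) for k, v in CANONICAL.items()}
    total = sum(raw.values())
    return {k: energy * w / total for k, w in raw.items()}

# Reflection wavefront 410(1) departs the source toward the southeast, so its
# departure energy splits evenly between the south and east components.
departure_components = decompose_energy([0.707, -0.707], energy=1.0)
# -> {"N": 0.0, "E": 0.5, "S": 0.5, "W": 0.0}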

[0042] FIG. 5 illustrates one mechanism for compact encoding of reflection directionality. FIG. 5 shows reflection loudness parameters in four sets: a first reflection parameter set 452 representing loudness of reflections arriving at a listener from the north, a second reflection parameter set 454 representing loudness of reflections arriving at a listener from the east, a third reflection parameter set 456 representing loudness of reflections arriving at a listener from the south, and a fourth reflection parameter set 458 representing loudness of reflections arriving at a listener from the west. Each reflection parameter set includes four reflection loudness parameters, each of which can be a corresponding weight that represents relative loudness of reflections arriving at the listener for sounds emitted by the source in one of these four canonical directions. For instance, each reflection loudness parameter in first reflection parameter set 452 represents an aggregate reflection energy arriving at the listener from the north for a corresponding departure direction at the source. Thus, reflection loudness parameter w(N, N) represents the aggregate reflection energy arriving at the listener from the north for sounds departing north from the source, reflection loudness parameter w(N, E) represents the aggregate reflection energy received by the listener from the north for sounds departing east from the source, and so on.

[0043] Likewise, each reflection loudness parameter in second reflection parameter set 454 represents an aggregate reflection energy arriving at the listener from the east and departing from the source in one of the four directions. Weight w(E, S) represents the aggregate reflection energy arriving at the listener from the east for sounds departing south from the source, weight w(E, W) represents the aggregate reflection energy arriving at the listener from the east for sounds departing west from the source, and so on. Reflection parameter sets 456 and 458 represent aggregate reflection energy arriving at the listener from the south and west, respectively, with similar individual parameters in each set for each departure direction from the source.

[0044] Generally, reflection parameter sets 452, 454, 456, and 458 can be obtained by decomposing each individual reflection wavefront into constituent directional loudness components as discussed above and aggregating those values for each reflection wavefront. For instance, as previously noted, reflection wavefront 410(1) arrives at the listener 106 from the south and the east, and thus can be decomposed into a directional loudness component for energy received from the south and a directional loudness component for energy received from the east. Furthermore, reflection wavefront 410(1) includes energy departing the source to the south and to the east. Thus, the directional loudness component for energy arriving at the listener from the south can be further decomposed into a directional loudness component for sound departing south from the source, shown in FIG. 5 as w(S, S) in reflection parameter set 456, and another directional loudness component for sound departing east from the source, shown in FIG. 5 as w(S, E) in reflection parameter set 456. Similarly, the directional loudness component for energy arriving at the listener from the east can be further decomposed into a directional loudness component for sound departing south from the source, shown in FIG. 5 as w(E, S) in reflection parameter set 454, and another directional loudness component for sound departing east from the source, shown in FIG. 5 as w(E, E) in reflection parameter set 454.

[0045] Likewise, considering reflection wavefront 410(2), this reflection wavefront arrives at the listener 106 from the south and the west and departs the source to the north and the west. Energy from reflection wavefront 410(2) can be decomposed into directional loudness components for both the source and listener and aggregated as discussed above for reflection wavefront 410(1). Specifically, four directional loudness components can be obtained and aggregated into w(S, N) for energy arriving at the listener from the south and departing north from the source, w(S, W) for energy arriving at the listener from the south and departing west from the source, w(W, N) for energy arriving at the listener from the west and departing north from the source, and w(W, W) for energy arriving at the listener from the west and departing west from the source.

[0046] The above process can be repeated for each reflection wavefront to obtain a corresponding aggregate directional reflection loudness for each combination of canonical directions with respect to both the source and the listener. As discussed more below, such an aggregate representation of directional reflection energy can be used at runtime to efficiently render reflections that account for both source and listener location and orientation, including scenarios with directional sound sources. Taken together, realistic directionality of both initial sound arrivals and sound reflections can improve sensory immersion in virtual environments.
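The Python sketch below illustrates how per-wavefront components of this kind could be accumulated into the 16-weight w(arrival, departure) representation of FIG. 5. Splitting each wavefront's energy across direction pairs as the product of its arrival-side and departure-side fractions is an illustrative assumption, and the helper repeats the rectified-projection decomposition from the earlier sketch.

import numpy as np

DIRS = ["N", "E", "S", "W"]
CANONICAL = {"N": (0.0, 1.0), "E": (1.0, 0.0), "S": (0.0, -1.0), "W": (-1.0, 0.0)}

def direction_fractions(direction):
    """Rectified projections onto the canonical directions, normalized to sum
    to 1 (the same illustrative decomposition as the earlier sketch)."""
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    raw = {k: max(float(d @ np.asarray(v)), 0.0) for k, v in CANONICAL.items()}
    total = sum(raw.values())
    return {k: w / total for k, w in raw.items()}

def accumulate_reflection_matrix(wavefronts):
    """Aggregate reflection wavefronts into a 4x4 matrix of w(arrival,
    departure) loudness weights. Each wavefront is a (departure_dir,
    arrival_dir, energy) tuple; rows index arrival, columns index departure."""
    matrix = np.zeros((len(DIRS), len(DIRS)))
    for departure_dir, arrival_dir, energy in wavefronts:
        dep = direction_fractions(departure_dir)
        arr = direction_fractions(arrival_dir)
        for i, a in enumerate(DIRS):
            for j, d in enumerate(DIRS):
                matrix[i, j] += energy * arr[a] * dep[d]
    return matrix

# Wavefront 410(1): departs southeast, arrives from the southeast.
# Wavefront 410(2): departs northwest, arrives from the southwest.
wavefronts = [
    ((0.707, -0.707), (0.707, -0.707), 0.8),
    ((-0.707, 0.707), (-0.707, -0.707), 0.5),
]
W = accumulate_reflection_matrix(wavefronts)
# W[DIRS.index("S"), DIRS.index("E")] is w(S, E): reflection energy arriving
# from the south for sound that departed the source toward the east.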

[0047] Note that FIG. 5 illustrates four compass directions and thus a total of 16 weights, one for each possible combination of departure and arrival directions. Examples introduced below can also account for up and down directions in addition to the four compass directions previously discussed, yielding 6 canonical directions and potentially 36 reflection loudness parameters, one for each possible combination of departure and arrival directions.

[0048] In addition, note that aggregate reflection energy representations can be generated as fields for a given scene, as described above for arrival and departure directions. Likewise, a volumetric representation of a scene can be generated by “stacking” fields of reflection energy representations vertically above one another, to account for how reflection energy may vary depending on the vertical height of a source and/or listener.

Time Representation

[0049] As discussed above, FIGS. 2 and 3 illustrate mechanisms for encoding departure and arrival direction parameters for a specific source/listener location pair in a scene. Likewise, FIG. 5 illustrates a mechanism for representing aggregate reflection energy parameters for various combinations of arrival and departure directions for a specific source/listener location pair in a scene. The following provides some additional discussion of these parameters as well as some additional parameters that can be used to encode bidirectional propagation characteristics of a scene.

[0050] FIG. 6A shows scene 100 with two initial sound wavefronts 602(1) and 602(2) and two reflection wavefronts 604(1) and 604(2). Initial sound wavefronts 602(1) and 602(2) are shown in relatively heavy lines to convey that these sound wavefronts typically carry more sound energy to the listener 106 than reflection wavefronts 604(1) and 604(2). Initial sound wavefront 602(1) is shown as a solid heavy line and initial sound wavefront 602(2) is shown as a dotted heavy line. Reflection wavefront 604(1) is shown as a solid lightweight line and reflection wavefront 604(2) is shown as a dotted lightweight line.

[0051] FIG. 6B shows a time-domain representation 650 of the sound wavefronts shown in FIG. 6A, as well as how individual encoded parameters can be represented in the time domain. Note that time-domain representation 650 is somewhat simplified for clarity, and actual time-domain representations of sound are typically more complex than illustrated in FIG. 6B.

[0052] Time-domain representation 650 includes time-domain representations of initial sound wavefronts 602(1) and 602(2), as well as time-domain representations of reflection wavefronts 604(1) and 604(2). In the time domain, each wavefront appears as a “spike” in impulse response area 652. Thus, in physical space, each spike corresponds to a particular path through the scene from the source to the listener. The corresponding departure direction of each wavefront is shown in area 654, and the corresponding arrival direction of each wavefront is shown in area 656.

[0053] Time-domain representation 650 also includes an initial or onset delay period 658, which represents the time period after sound is emitted from sound source 104 before the first-arriving wavefront reaches listener 106, which in this example is initial sound wavefront 602(1). The initial delay period parameter can be determined for each source/listener location pair in the scene, and encodes the amount of time before a listener at a specific listener location hears initial sound from a specific source location.

[0054] Time domain representation 650 also includes an initial loudness period 660 and an initial directionality period 662. The initial loudness period 660 can correspond to a period of time starting at the arrival of the first wavefront to the listener and continuing for a predetermined period during which an initial loudness parameter is determined. The initial directionality period 662 can correspond to a period of time starting at the arrival of the first wavefront to the listener and continuing for a predetermined period during which initial source and listener directions are determined.

[0055] Note that the initial directionality period 662 is illustrated as being somewhat shorter than the initial loudness period 660, for the following reasons. Generally, the first-arriving wavefront to a listener has a strong effect on the listener’s sense of direction. Subsequent wavefronts arriving shortly thereafter tend to contribute to the listener’s perception of initial loudness, but generally contribute less to the listener’s perception of initial direction. Thus, in some implementations, the initial loudness period is longer than the initial directionality period.

[0056] Referring back to FIG. 6A, initial sound wavefront 602(1) has the shortest path to the listener 106 and thus arrives at the listener first, after the onset delay period 658. The corresponding impulse response for this initial wavefront occurs within the initial directionality period 662. Consider next initial sound wavefront 602(2). This wavefront has a somewhat longer path to the listener and arrives within the initial loudness period 660, but outside of the initial directionality period 662. Thus, in this example, initial sound wavefront 602(2) contributes to an initial loudness parameter but does not contribute to the initial departure and arrival direction parameters, whereas initial sound wavefront 602(1) contributes to the initial loudness parameter, the initial departure direction parameter, and the initial arrival direction parameter. Each of these parameters can be determined for each source/listener location pair in the scene. The initial loudness parameter encodes the relative loudness of initial sound that a listener at a specific listener location hears from a given source location. As discussed above, the initial departure and arrival direction parameters encode the directions in which initial sound leaves the source location and arrives at the listener location, respectively.

[0057] Time-domain representation 650 also includes a reflection aggregation period 664, which represents a period of time during which reflection loudness is aggregated. Referring back to FIG. 6A, reflection wavefronts 604(1) and 604(2) arrive some time after initial sound wavefronts 602(1) and 602(2) arrive at the listener. These reflection wavefronts can contribute to an aggregate reflection energy representation such as described above with respect to FIG. 5. One such aggregate reflection energy representation can be determined for each source/listener location pair in the scene (e.g., a 4×4 or 6×6 matrix), and each entry (e.g., weight) in the aggregate reflection energy representation can constitute a different loudness parameter. Thus, each parameter in the aggregate reflection energy representation encodes reflection loudness for a specific combination of the following: source location, departure direction, listener location, and arrival direction. Reflection delay period 666 represents the amount of time after the first sound wavefront arrives until the listener hears the first reflection. The reflection delay period is another parameter that can be determined for each source/listener location pair in the scene.

[0058] Time-domain representation 650 also includes a reverberation decay period 668, which represents an amount of time during which sound wavefronts continue to reverberate and decay in scene 100. In some implementations, additional wavefronts that arrive after the reflection aggregation period 664 are used to determine a reverberation decay time. The reverberation decay period is another parameter that can be determined for each source/listener location pair in the scene.

[0059] Generally, the durations of the initial loudness period 660, the initial directionality period 662, and/or reflection aggregation period 664 can be configurable. For instance, the initial directionality period can last for 1 millisecond after the onset delay period 658. The initial loudness period can last for 10 milliseconds after the onset delay period. The reflection aggregation period can last for 80 milliseconds after the first-detected reflection wavefront.
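Putting these windows together, the following Python sketch extracts the initial-sound and reflection parameters for one source/listener location pair from a list of simulated path arrivals. Representing each arrival as a (time, energy, departure direction, arrival direction) tuple, averaging the early directions by energy, and treating everything after the initial loudness window as a reflection are illustrative choices rather than requirements of the description above.

import numpy as np

def encode_time_domain_parameters(arrivals,
                                  directionality_window=0.001,  # ~1 ms
                                  loudness_window=0.010,        # ~10 ms
                                  reflection_window=0.080):     # ~80 ms
    """Extract onset delay, initial loudness, initial departure/arrival
    directions, reflection delay, and the reflection arrivals to aggregate.
    Each arrival is a (time_s, energy, departure_dir, arrival_dir) tuple."""
    arrivals = sorted(arrivals, key=lambda a: a[0])
    onset_delay = arrivals[0][0]

    # Initial loudness: total energy arriving within the loudness window.
    initial = [a for a in arrivals if a[0] <= onset_delay + loudness_window]
    initial_loudness = sum(a[1] for a in initial)

    # Initial directions: energy-weighted averages over the (shorter)
    # directionality window, then normalized to unit vectors.
    early = [a for a in arrivals if a[0] <= onset_delay + directionality_window]
    total = sum(a[1] for a in early)
    dep = sum(a[1] * np.asarray(a[2], dtype=float) for a in early) / total
    arr = sum(a[1] * np.asarray(a[3], dtype=float) for a in early) / total
    departure_dir = dep / np.linalg.norm(dep)
    arrival_dir = arr / np.linalg.norm(arr)

    # Reflections: arrivals after the loudness window, aggregated for up to
    # reflection_window seconds after the first-detected reflection.
    late = [a for a in arrivals if a[0] > onset_delay + loudness_window]
    if late:
        reflection_delay = late[0][0] - onset_delay
        reflections = [a for a in late if a[0] <= late[0][0] + reflection_window]
    else:
        reflection_delay, reflections = None, []

    return {"onset_delay": onset_delay,
            "initial_loudness": initial_loudness,
            "initial_departure_dir": departure_dir,
            "initial_arrival_dir": arrival_dir,
            "reflection_delay": reflection_delay,
            "reflection_arrivals": reflections}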

Rendering Examples

[0060] The aforementioned parameters can be employed for realistic rendering of directional sound. FIGS. 7A, 7B, and 7C illustrate how source directionality can affect how individual sound wavefronts are perceived. In particular, FIGS. 7A-7C illustrate how the power balance between initial wavefronts and reflection wavefronts can change as a function of the orientation of a directional source. In FIG. 7A, initial sound wavefront 700 is shown as well as reflection wavefronts 702 and 704. In FIGS. 7A-7C, weighted lines are used, where the relative weight of each line is roughly proportional to the energy carried by the corresponding sound wavefront.

[0061] FIG. 7A illustrates a directional sound source 706 in a scenario 708A, where the directional sound source is facing toward portal 114. In this case, initial sound wavefront 700 is relatively loud and reflection wavefronts 702 and 704 are relatively quiet, due to the directivity of directional sound source 706.

[0062] FIG. 7B illustrates a scenario 708B, where directional sound source 706 is facing to the northeast. In this case, reflection wavefront 702 is somewhat louder than in scenario 708A, and initial sound wavefront 700 is somewhat quieter. Note that the initial sound wavefront still likely carries the most energy to the user and is still shown with the heaviest line weight, but the line weight is somewhat lighter than in scenario 708A to reflect the relative decrease in sound energy of the initial sound wavefront as compared to the previous scenario. Likewise, reflection wavefront 702 is illustrated as being somewhat heavier than in scenario 708A but still not as heavy as the initial sound wavefront, to show that this reflection wavefront has increased in sound energy relative to the previous scenario.

[0063] FIG. 7C illustrates a scenario 708C, where directional sound source 706 is facing to the northwest. In this case, reflection wavefront 704 is somewhat louder than was the case in scenarios 708A and 708B, and initial sound wavefront 700 is somewhat quieter than in scenario 708A. In a similar manner as discussed above with respect to scenario 708B, the initial sound wavefront still likely carries the most energy to the user but now reflection wavefront 704 carries somewhat more energy than was shown previously.

[0064] In general, the disclosed implementations allow for efficient rendering of initial sound and sound reflections to account for the orientation of a directional source. For instance, the disclosed implementations can render sounds that account for the change in power balance between initial sounds and reflections that occurs when a directional sound source changes orientation. In addition, the disclosed implementations can also account for how listener orientation can affect how the sounds are perceived, as described more below.
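As a small illustration of this power-balance effect, the Python sketch below scales the encoded initial-sound departure direction and the four canonical reflection departure directions by a cardioid-shaped gain. The cardioid stands in for a measured source directivity function, and the specific directions are taken from the FIG. 7A scenario; both are illustrative assumptions.

import numpy as np

def cardioid_gain(source_facing, departure_dir):
    """Gain of a cardioid-like directional source for sound leaving along
    departure_dir, given the direction the source faces: 1.0 on-axis, 0.0
    directly behind. The cardioid shape is purely illustrative."""
    f = np.asarray(source_facing, dtype=float)
    d = np.asarray(departure_dir, dtype=float)
    f, d = f / np.linalg.norm(f), d / np.linalg.norm(d)
    return 0.5 * (1.0 + float(f @ d))

# Encoded initial-sound departure direction for this source/listener pair:
# toward the portal (southeast), per the scenario of FIG. 7A.
initial_departure = (0.707, -0.707)
canonical = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}

for name, facing in {"facing portal (7A)": (0.707, -0.707),
                     "facing northeast (7B)": (0.707, 0.707)}.items():
    init_gain = cardioid_gain(facing, initial_departure)
    refl_gains = {k: cardioid_gain(facing, v) for k, v in canonical.items()}
    # As the source turns away from the portal, init_gain drops while gains
    # for some reflection departure directions rise, shifting the power
    # balance between the initial sound and the reflections.
    print(name, round(init_gain, 2),
          {k: round(g, 2) for k, g in refl_gains.items()})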

First Example System

[0065] In general, note that FIGS. 1-5, 6A, and 6B illustrate examples of acoustic parameters that can be encoded for various scenes. Further, note that these parameters can be generated using isotropic sound sources. At rendering time, directional sound sources can be accounted for when rendering sound as shown in FIGS. 7A-7C. Thus, as discussed more below, the disclosed implementations offer the ability to encode perceptual parameters using isotropic sources that nevertheless allow for runtime rendering of directional sound sources.

[0066] A first example system 800 is illustrated in FIG. 8. In this example, system 800 can include a parameterized acoustic component 802. The parameterized acoustic component 802 can operate on a scene such as a virtual reality (VR) space 804. In system 800, the parameterized acoustic component 802 can be used to produce realistic rendered sound 806 for the virtual reality space 804. In the example shown in FIG. 8, functions of the parameterized acoustic component 802 can be organized into three Stages. For instance, Stage One can relate to simulation 808, Stage Two can relate to perceptual encoding 810, and Stage Three can relate to rendering 812. Also shown in FIG. 8, the virtual reality space 804 can have associated virtual reality space data 814. The parameterized acoustic component 802 can also operate on and/or produce impulse responses 816, perceptual acoustic parameters 818, and sound event input 820, which can include sound source data 822 and/or listener data 824 associated with a sound event in the virtual reality space 804. In this example, the rendered sound 806 can include rendered initial sound(s) 826 and/or rendered sound reflections 828.

[0067] As illustrated in the example in FIG. 8, at simulation 808 (Stage One), parameterized acoustic component 802 can receive virtual reality space data 814. The virtual reality space data 814 can include geometry (e.g., structures, materials of objects, etc.) in the virtual reality space 804, such as geometry 108 indicated in FIG. 1A. For instance, the virtual reality space data 814 can include a voxel map for the virtual reality space 804 that maps the geometry, including structures and/or other aspects of the virtual reality space 804. In some cases, simulation 808 can include directional acoustic simulations of the virtual reality space 804 to precompute sound wave propagation fields. More specifically, in this example simulation 808 can include generation of impulse responses 816 using the virtual reality space data 814. The impulse responses 816 can be generated for initial sounds and/or sound reflections. Stated another way, simulation 808 can include using a precomputed wave-based approach (e.g., pre-computed wave technique) to capture the complexity of the directionality of sound in a complex scene.

[0068] In some cases, the simulation 808 of Stage One can include producing relatively large volumes of data. For instance, the impulse responses 816 can be represented as an 11-dimensional (11D) function associated with the virtual reality space 804. Specifically, the 11 dimensions can include 3 dimensions relating to the position of a sound source, 3 dimensions relating to the position of a listener, a time dimension, 2 dimensions relating to the arrival direction of incoming sound from the perspective of the listener, and 2 dimensions relating to departure direction of outgoing sound from the perspective of the source. Thus, the simulation can be used to obtain an impulse response at each potential source and listener location in the scene. As discussed more below, perceptual acoustic parameters can be encoded from these impulse responses for subsequent rendering of sound in the scene.
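For illustration, one sample of such an 11-dimensional function could be represented as follows; the field names are placeholders, and the actual values would come from the wave simulation.

from dataclasses import dataclass

@dataclass
class BidirectionalImpulseResponseSample:
    """One sample of the 11D function: 3 source coordinates, 3 listener
    coordinates, a time, 2 arrival angles, and 2 departure angles map to a
    pressure value. Field names are illustrative."""
    source_x: float
    source_y: float
    source_z: float
    listener_x: float
    listener_y: float
    listener_z: float
    time_s: float
    arrival_azimuth: float
    arrival_elevation: float
    departure_azimuth: float
    departure_elevation: float
    pressure: float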

[0069] One approach to encoding perceptual acoustic parameters 818 for virtual reality space 804 would be to generate impulse responses 816 for every combination of possible source and listener locations, e.g., every pair of voxels. While ensuring completeness, capturing the complexity of a virtual reality space in this manner can lead to generation of petabyte-scale wave fields. This can create a technical problem related to data processing and/or data storage. The techniques disclosed herein provide solutions for computationally efficient encoding and rendering using relatively compact representations.

[0070] For example, impulse responses 816 can be generated based on potential listener locations or “probes” scattered at particular locations within virtual reality space 804, rather than at every potential listener location (e.g., every voxel). The probes can be automatically laid out within the virtual reality space 804 and/or can be adaptively sampled. For instance, probes can be located more densely in spaces where scene geometry is locally complex (e.g., inside a narrow corridor with multiple portals), and located more sparsely in a wide-open space (e.g., outdoor field or meadow). In addition, vertical dimensions of the probes can be constrained to account for the height of human listeners, e.g., the probes may be instantiated with vertical dimensions that roughly account for the average height of a human being. Similarly, potential sound source locations for which impulse responses 816 are generated can be located more densely or sparsely as scene geometry permits. Reducing the number of locations within the virtual reality space 804 for which the impulse responses 816 are generated can significantly reduce data processing and/or data storage expenses in Stage One.
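As a rough sketch of the adaptive layout idea, the Python function below keeps a denser subset of candidate probe locations where a local geometric-complexity score is high and a sparser subset elsewhere. The complexity callback, spacings, and threshold are placeholders; the implementations above do not prescribe this particular selection rule.

import numpy as np

def place_probes(candidate_points, complexity,
                 dense_spacing=1.0, sparse_spacing=4.0, threshold=0.5):
    """Greedy probe selection: a candidate is kept if it is at least
    `spacing` away from every probe kept so far, where the spacing is small
    in geometrically complex regions and large in open regions.
    `complexity` maps a point to a score in [0, 1]."""
    probes = []
    for p in candidate_points:
        spacing = dense_spacing if complexity(p) >= threshold else sparse_spacing
        if all(np.linalg.norm(np.asarray(p) - np.asarray(q)) >= spacing
               for q in probes):
            probes.append(p)
    return probes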

[0071] In some cases, virtual reality space 804 can have dynamic geometry. For example, a door in virtual reality space 804 might be opened or closed, or a wall might be blown up, changing the geometry of virtual reality space 804. In such examples, simulation 808 can receive virtual reality space data 814 that provides different geometries for the virtual reality space under different conditions, and impulse responses 816 can be computed for each of these geometries. For instance, opening and/or closing a door could be a regular occurrence in virtual reality space 804, and therefore representative of a situation that warrants modeling of both the opened and closed cases.

[0072] As shown in FIG. 8, at Stage Two, perceptual encoding 810 can be performed on the impulse responses 816 from Stage One. In some implementations, perceptual encoding 810 can work cooperatively with simulation 808 to perform streaming encoding. In this example, the perceptual encoding process can receive and compress individual impulse responses as they are being produced by simulation 808. For instance, values can be quantized (e.g., 3 dB for loudness) and techniques such as delta encoding can be applied to the quantized values. Unlike impulse responses, perceptual parameters tend to be relatively smooth, which enables more compact compression using such techniques. Taken together, encoding parameters in this manner can significantly reduce storage expense.
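The Python sketch below illustrates the quantize-then-delta-encode idea on a one-dimensional scan of loudness values, using the 3 dB step mentioned above. The entropy coding that would normally follow the delta step is omitted, and the function names are illustrative.

import numpy as np

def quantize_delta_encode(loudness_db, step_db=3.0):
    """Quantize a scan of loudness values to step_db bins and delta-encode the
    result, so slowly varying fields become runs of small integers that
    compress well. The first delta stores the absolute quantized value."""
    q = np.round(np.asarray(loudness_db, dtype=float) / step_db).astype(int)
    return np.diff(q, prepend=0)

def delta_decode_dequantize(deltas, step_db=3.0):
    """Invert quantize_delta_encode, up to the quantization error."""
    return np.cumsum(deltas) * step_db

encoded = quantize_delta_encode([-20.1, -19.8, -19.0, -16.2, -16.0])
decoded = delta_decode_dequantize(encoded)   # each value within 1.5 dB of input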

[0073] Generally, perceptual encoding 810 can involve extracting perceptual acoustic parameters 818 from the impulse responses 816. These parameters generally represent how sound from different source locations is perceived at different listener locations. Example parameters are discussed above with respect to FIGS. 2, 3, 5, and 6B. For example, the perceptual acoustic parameters for a given source/listener location pair can include initial sound parameters such as an initial delay period, initial departure direction from the source location, initial arrival direction at the listener location, and/or initial loudness. The perceptual acoustic parameters for a given source/listener location pair can also include reflection parameters such as a reflection delay period and an aggregate representation of bidirectional reflection loudness, as well as reverberation parameters such as a decay time. Encoding perceptual acoustic parameters in this manner can yield a manageable data volume for the perceptual acoustic parameters, e.g., in a relatively compact data file that can later be used for computationally efficient rendering.
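As a concrete picture of the encoded output, the sketch below defines a minimal record for one source/listener location pair holding the parameters listed above. The field names, units, and the 6x6 matrix shape are illustrative placeholders.

from dataclasses import dataclass
import numpy as np

@dataclass
class EncodedPairParameters:
    """Perceptual acoustic parameters for one source/listener location pair."""
    onset_delay_s: float                 # initial delay before the first arrival
    initial_loudness_db: float           # loudness of the initial sound
    initial_departure_dir: np.ndarray    # unit vector leaving the source
    initial_arrival_dir: np.ndarray      # unit vector arriving at the listener
    reflection_delay_s: float            # gap between initial sound and reflections
    reflection_matrix: np.ndarray        # e.g., 6x6 w(arrival, departure) weights
    decay_time_s: float                  # 60 dB reverberation decay time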

[0074] With respect specifically to the aggregate representation of bidirectional reflection loudness, one approach is to define several coarse directions such as north, east, west, and south as shown in FIG. 5, as well as potentially up and down, as discussed more below. Generally, such a representation can convey, for each pair of source departure and listener arrival directions, the aggregate loudness of reflections for that direction pair. In the example of FIG. 5, each such representation has 16 total fields, e.g., a north-north field for reflection energy arriving at the north of the listener and emitted north of the source, a north-south field for reflection energy arriving at the north of the listener and emitted south of the source, and so on. In a case where the directions also include up and down, the representation can have 36 fields. Thus, for any pair of source and listener locations in a given scene, there can be 36 corresponding reflection loudness parameters, each of which accounts for a different combination of source departure direction and listener arrival direction.
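
A small sketch of how such a 36-field representation might be indexed is shown below; the axis convention (rows for source departure directions, columns for listener arrival directions) and the loudness values are illustrative choices for the sketch.

```python
import numpy as np

DIRECTIONS = ["north", "east", "south", "west", "up", "down"]

def reflection_loudness(matrix_db, depart, arrive):
    """Look up aggregate reflection loudness for one departure/arrival pair."""
    return matrix_db[DIRECTIONS.index(depart), DIRECTIONS.index(arrive)]

# Hypothetical encoded matrix for one source/listener location pair.
matrix_db = np.full((6, 6), -35.0)
matrix_db[DIRECTIONS.index("south"), DIRECTIONS.index("north")] = -22.0

# Reflection energy emitted south of the source and arriving north of the listener.
print(reflection_loudness(matrix_db, "south", "north"))  # -22.0 dB
```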

[0075] The parameters for encoding reflections can also include a decay time of the reflections. For instance, the decay time can be a 60 dB decay time of sound response energy after an onset of sound reflections. In some cases, a single decay time is used for each source/listener location pair. In other words, the reflection parameters for a given location pair can include a single decay time together with a 36-field representation of reflection loudness.
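
One common way to estimate a 60 dB decay time from a simulated response, sketched below, is Schroeder backward integration followed by a linear fit of the energy decay in decibels. This is a standard acoustics technique offered for illustration only and is not necessarily the method contemplated here; the fit range and test signal are arbitrary choices.

```python
import numpy as np

def decay_time_60db(impulse_response, sample_rate, onset_sample=0):
    h = np.asarray(impulse_response, dtype=float)[onset_sample:]
    energy = np.cumsum(h[::-1] ** 2)[::-1]            # Schroeder backward integral
    energy_db = 10.0 * np.log10(energy / energy[0] + 1e-12)
    t = np.arange(len(h)) / sample_rate
    # Fit the well-behaved -5 dB to -35 dB portion and extrapolate to -60 dB.
    mask = (energy_db <= -5.0) & (energy_db >= -35.0)
    slope, _ = np.polyfit(t[mask], energy_db[mask], 1)  # dB per second
    return -60.0 / slope

# Synthetic decaying response with a known decay time of about 1.0 second.
fs = 16000
t = np.arange(fs) / fs
h = np.random.randn(fs) * 10 ** (-3.0 * t)   # -60 dB-per-second energy decay
print(round(decay_time_60db(h, fs), 2))      # approximately 1.0
```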

[0076] Additional parameters that could be considered in perceptual encoding 810 are contemplated. For example, frequency dependence, density of echoes (e.g., reflections) over time, directional detail in early reflections, independently directional late reverberations, and/or other parameters could be considered. An example of frequency dependence is the material of a surface affecting the sound response when sound hits the surface (e.g., changing properties of the resultant reflections).

[0077] As shown in FIG. 8, at Stage Three, rendering 812 can utilize the perceptual acoustic parameters 818 to render sound from a sound event. As mentioned above, the perceptual acoustic parameters 818 can be obtained in advance and stored, such as in the form of a data file. Rendering 812 can include decoding the data file. When a sound event in the virtual reality space 804 is received, it can be rendered using the decoded perceptual acoustic parameters 818 to produce rendered sound 806. The rendered sound 806 can include initial sound(s) 826 and/or sound reflections 828, for example.

[0078] In general, the sound event input 820 shown in FIG. 8 can be related to any event in the virtual reality space 804 that creates a response in sound. For example, some sounds may be roughly isotropic, e.g., a detonating grenade or a firehouse siren may tend to radiate more or less equally in all directions. Other sounds, such as the human voice, an audio speaker, or a brass or woodwind instrument, tend to be directional.

[0079] The sound source data 822 for a given sound event can include an input sound signal for a runtime sound source, a location of the runtime sound source, and an orientation of the runtime sound source. For clarity, the term “runtime sound source” is used to refer to the sound source being rendered, to distinguish the runtime sound source from sound sources discussed above with respect to simulation and encoding of parameters. The sound source data can also convey directional characteristics of the runtime sound source, e.g., via a source directivity function (SDF).

[0080] Similarly, the listener data 824 can convey a location of a runtime listener and an orientation of the runtime listener. The term “runtime listener” is used to refer to the listener of the rendered sound at runtime, to distinguish the runtime listener from listeners discussed above with respect to simulation and encoding of parameters. The listener data can also convey directional hearing characteristics of the listener, e.g., in the form of a head-related transfer function (HRTF).
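
For illustration, the runtime source and listener data described in the two preceding paragraphs could be carried in containers like the following; the field names and the SDF/HRTF callables are assumptions made for the sketch, not an actual engine API.

```python
from dataclasses import dataclass
from typing import Callable, Tuple
import numpy as np

@dataclass
class RuntimeSource:
    signal: np.ndarray                   # time-domain input samples
    position: np.ndarray                 # world-space location
    orientation: np.ndarray              # 3x3 rotation matrix (world -> source frame)
    sdf: Callable[[np.ndarray], np.ndarray]   # local direction -> per-octave gains

@dataclass
class RuntimeListener:
    position: np.ndarray
    orientation: np.ndarray              # 3x3 rotation matrix (world -> listener frame)
    hrtf: Callable[[np.ndarray], Tuple[np.ndarray, np.ndarray]]  # -> (left, right) HRIRs
```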

[0081] In some implementations, rendering 812 can include use of a lightweight signal processing algorithm. The lightweight signal processing algorithm can render sound in a manner whose computational cost is largely insensitive to the number of sound sources and/or sound events. For example, the parameters used in Stage Two can be selected such that processing expense in Stage Three does not grow linearly with the number of sound sources.

[0082] With respect to rendering initial loudness, the rendering can render an initial sound from the input sound signal that accounts for both runtime source and runtime listener location and orientation. For instance, given the runtime source and listener locations, the rendering can involve identifying the following encoded parameters that were precomputed in Stage Two for that location pair: initial delay time, initial loudness, departure direction, and arrival direction. The directivity characteristics of the sound source (e.g., the SDF) can encode frequency-dependent, directionally-varying characteristics of sound radiation patterns from the source. Similarly, the directional hearing characteristics of the listener (e.g., the HRTF) can encode frequency-dependent, directionally-varying characteristics of sound reception patterns at the listener.

[0083] The sound source data for the input event can include an input signal, e.g., a time-domain representation of a sound such as a series of samples of signal amplitude (e.g., 44100 samples per second). The input signal can have multiple frequency components and corresponding magnitudes and phases. In some implementations, the input time-domain signal is processed using an equalizer filter bank into different octave bands (e.g., nine bands) to obtain an equalized input signal.
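
A minimal sketch of such a nine-band equalizer filter bank follows, using Butterworth band-pass filters; the center frequencies and filter order are illustrative choices rather than values specified here.

```python
import numpy as np
from scipy.signal import butter, lfilter

CENTERS_HZ = [31.25, 62.5, 125, 250, 500, 1000, 2000, 4000, 8000]  # nine octaves

def octave_bands(signal, fs=44100, order=2):
    """Split a time-domain signal into nine octave-band signals."""
    bands = []
    for fc in CENTERS_HZ:
        lo, hi = fc / np.sqrt(2), fc * np.sqrt(2)      # one-octave band edges
        b, a = butter(order, [lo, hi], btype="bandpass", fs=fs)
        bands.append(lfilter(b, a, signal))
    return np.stack(bands)        # shape: (9, num_samples)

x = np.random.randn(44100)        # one second of test input
print(octave_bands(x).shape)      # (9, 44100)
```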

[0084] Next, a lookup into the SDF can be performed by taking the encoded departure direction and rotating it into the local coordinate frame of the runtime source. This yields a runtime-adjusted sound departure direction that can be used to look up a corresponding set of octave-band loudness values (e.g., nine loudness values) in the SDF. Those loudness values can be applied to the corresponding octave bands in the equalized input signal, yielding nine distinct signals that can then be recombined into a single SDF-adjusted time-domain signal representing the initial sound emitted from the runtime source. Then, the encoded initial loudness value can be added to the SDF-adjusted time-domain signal.
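
The following sketch walks through this SDF step under simplifying assumptions: the SDF is a small nearest-direction table, the source orientation is a 3x3 rotation matrix, and the encoded loudness is applied as a linear gain at the end. All names and values are hypothetical.

```python
import numpy as np

def apply_sdf(band_signals, departure_dir_world, source_rotation, sdf_table):
    """band_signals: (9, N) octave-band signals; source_rotation: world->source
    3x3 matrix; sdf_table: dict mapping coarse local directions to nine gains."""
    local_dir = source_rotation @ departure_dir_world
    # Pick the tabulated direction closest to the rotated departure direction.
    key = max(sdf_table, key=lambda d: np.dot(np.asarray(d), local_dir))
    gains = np.asarray(sdf_table[key])            # nine octave-band loudness gains
    shaped = band_signals * gains[:, None]        # scale each octave band
    return shaped.sum(axis=0)                     # recombine into one signal

# Hypothetical SDF: louder toward local +y ("front"), quieter toward the back.
sdf_table = {(0.0, 1.0, 0.0): [1.0] * 9, (0.0, -1.0, 0.0): [0.3] * 9}
bands = np.random.randn(9, 1024)
out = apply_sdf(bands, np.array([0.0, 1.0, 0.0]), np.eye(3), sdf_table)

initial_loudness_db = -14.0                       # hypothetical encoded value
out = out * 10 ** (initial_loudness_db / 20.0)    # apply encoded loudness as gain
print(out.shape)                                  # (1024,)
```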

[0085] The resulting loudness-adjusted time-domain signal can be input to a spatialization process to generate a binaural output signal that represents what the listener will hear in each ear. For instance, the spatialization process can utilize the HRTF to account for the relative difference between the encoded arrival direction and the runtime listener orientation. This can be accomplished by rotating the encoded arrival direction into the coordinate frame of the runtime listener’s orientation and using the resulting angle to do an HRTF lookup. The loudness-adjusted time-domain signal can be convolved with the result of the HRTF lookup to obtain the binaural output signal. For instance, the HRTF lookup can yield two different time-domain signals, one for each ear, each of which can be convolved with the loudness-adjusted time-domain signal to obtain an output for each ear. The encoded delay time can be used to determine the time when the listener receives the individual signals of the binaural output.
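
A sketch of this spatialization step is shown below; the two-entry HRIR table stands in for a real HRTF, and the nearest-direction lookup, delay handling, and names are illustrative assumptions.

```python
import numpy as np

def spatialize(mono, arrival_dir_world, listener_rotation, hrir_table,
               delay_s, fs=44100):
    """Rotate the encoded arrival direction into the listener frame, look up a
    left/right HRIR pair, convolve, and prepend the encoded initial delay."""
    local_dir = listener_rotation @ arrival_dir_world
    key = max(hrir_table, key=lambda d: np.dot(np.asarray(d), local_dir))
    left_hrir, right_hrir = hrir_table[key]
    left = np.convolve(mono, left_hrir)
    right = np.convolve(mono, right_hrir)
    delay = np.zeros(int(round(delay_s * fs)))    # encoded initial delay
    return np.concatenate([delay, left]), np.concatenate([delay, right])

# Toy HRIRs: sound arriving from the listener's left is louder in the left ear.
hrir_table = {(-1.0, 0.0, 0.0): (np.array([0.9]), np.array([0.4])),
              ( 1.0, 0.0, 0.0): (np.array([0.4]), np.array([0.9]))}
mono = np.random.randn(1024)
L, R = spatialize(mono, np.array([-1.0, 0.0, 0.0]), np.eye(3), hrir_table, 0.032)
print(L.shape, R.shape)
```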

[0086] Using the approach discussed above, the SDF and source orientation can be used to determine the amount of energy emitted by the runtime source for the initial path. For instance, for a source with an SDF that emits relatively concentrated sound energy, the initial path might be louder relative to the reflections than for a source with a more diffuse SDF. The HRTF and listener orientation can be used to determine how the listener perceives the arriving sound energy, e.g., the balance of the initial sound perceived for each ear.

[0087] The rendering can also render reflections from the input sound signal that account for both runtime source and runtime listener location and orientation. For instance, given the runtime source and listener locations, the rendering can involve identifying the reflection delay period, the reverberation decay period, and the encoded directional reflection parameters (e.g., a matrix or other aggregate representation) for that specific source/listener location pair. These can be used to render reflections as follows.

[0088] The directivity characteristics of the source provided by the SDF convey loudness characteristics radiating in each axial direction, e.g., north, south, east, west, up, and down, and these can be adjusted to account for runtime source orientation. For instance, the SDF can include octave-band gains that vary as a function of direction relative to the runtime sound source. Each axial direction can be rotated into the local frame of the runtime sound source, and a lookup can be done into the smoothed SDF to obtain, for each octave, one gain per axial direction. These gains can be used to modify the input sound signal, yielding six time-domain signals, one per axial direction.
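
The following sketch illustrates this step under the same simplifying assumptions as above: each world axial direction is rotated into the source frame, nine octave-band gains are looked up for that direction, and the octave-band input is shaped into six per-direction time-domain signals. The sdf_gains callable and its values are hypothetical.

```python
import numpy as np

AXES = {"east": (1, 0, 0), "west": (-1, 0, 0), "north": (0, 1, 0),
        "south": (0, -1, 0), "up": (0, 0, 1), "down": (0, 0, -1)}

def directional_feeds(band_signals, source_rotation, sdf_gains):
    """band_signals: (9, N) octave-band input; sdf_gains(direction) returns nine
    octave gains for a direction expressed in the source's local frame."""
    feeds = {}
    for name, axis in AXES.items():
        local_axis = source_rotation @ np.asarray(axis, dtype=float)
        gains = np.asarray(sdf_gains(local_axis))    # one gain per octave
        feeds[name] = (band_signals * gains[:, None]).sum(axis=0)
    return feeds                                     # six time-domain signals

# Hypothetical SDF: stronger radiation toward local +y, uniform across octaves.
sdf_gains = lambda d: np.full(9, 0.5 + 0.5 * max(d[1], 0.0))
feeds = directional_feeds(np.random.randn(9, 1024), np.eye(3), sdf_gains)
print(sorted(feeds), feeds["north"].shape)
```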

……
……
……
