Patent: Interaural Time Difference Crossfader For Binaural Audio Rendering
Publication Number: US 2020/0112817
Publication Date: 2020-04-09
Applicants: Magic Leap
Abstract
Examples of the disclosure describe systems and methods for presenting an audio signal to a user of a wearable head device. According to an example method, a first input audio signal is received, the first input audio signal corresponding to a source location in a virtual environment presented to the user via the wearable head device. The first input audio signal is processed to generate a left output audio signal and a right output audio signal. The left output audio signal is presented to the left ear of the user via a left speaker associated with the wearable head device. The right output audio signal is presented to the right ear of the user via a right speaker associated with the wearable head device. Processing the first input audio signal comprises applying a delay process to the first input audio signal to generate a left audio signal and a right audio signal; adjusting a gain of the left audio signal; adjusting a gain of the right audio signal; applying a first head-related transfer function (HRTF) to the left audio signal to generate the left output audio signal; and applying a second HRTF to the right audio signal to generate the right output audio signal. Applying the delay process to the first input audio signal comprises applying an interaural time delay (ITD) to the first input audio signal, the ITD determined based on the source location.
REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Provisional Application No. 62/742,254, filed on Oct. 5, 2018, to U.S. Provisional Application No. 62/812,546, filed on Mar. 1, 2019, and to U.S. Provisional Application No. 62/742,191, filed on Oct. 5, 2018, the contents of which are incorporated by reference herein in their entirety.
FIELD
[0002] This disclosure relates generally to systems and methods for audio signal processing, and in particular to systems and methods for presenting audio signals in a mixed reality environment.
BACKGROUND
[0003] Immersive and believable virtual environments require the presentation of audio signals in a manner consistent with a user’s expectations: for example, the expectation that an audio signal corresponding to an object in a virtual environment will be consistent with that object’s location in the virtual environment, and with the visual presentation of that object. Creating rich and complex soundscapes (sound environments) in virtual reality, augmented reality, and mixed reality environments requires the efficient presentation of a large number of digital audio signals, each appearing to come from a different location, distance, and/or direction in the user’s environment. A listener’s brain is adapted to recognize differences in the time of arrival of a sound between the listener’s two ears (e.g., by detecting a phase shift between the two ears) and to infer the spatial origin of the sound from that time difference. Accordingly, for a virtual environment, accurately presenting an interaural time difference (ITD) between the user’s left ear and right ear can be critical to the user’s ability to localize an audio source in the virtual environment. However, adjusting a soundscape to believably reflect the positions and orientations of objects and of the user can require rapid changes to audio signals, and such changes can produce undesirable sonic artifacts, such as “clicking” sounds, that compromise the immersiveness of the virtual environment. It is therefore desirable for systems and methods of presenting soundscapes to a user of a virtual environment to accurately present interaural time differences to the user’s ears while minimizing sonic artifacts and remaining computationally efficient.
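For background context only (this formula is not part of the disclosure), the ITD of a far-field source is often approximated with Woodworth’s spherical-head model, ITD ≈ (a/c)(sin θ + θ), where a is the head radius, c is the speed of sound, and θ is the source azimuth. The sketch below assumes nominal values for a and c and an azimuth within ±90° of straight ahead; the function name and defaults are illustrative assumptions.

```python
import math

def woodworth_itd(azimuth_rad, head_radius_m=0.0875, speed_of_sound_mps=343.0):
    """Approximate the interaural time difference (seconds) of a far-field
    source using Woodworth's spherical-head model. Illustrative assumption:
    azimuth is measured from straight ahead, positive toward the right ear,
    and limited to +/- pi/2."""
    return (head_radius_m / speed_of_sound_mps) * (math.sin(azimuth_rad) + azimuth_rad)

# A source 45 degrees to the right reaches the right ear roughly 0.4 ms
# before the left ear.
print(woodworth_itd(math.radians(45)))  # ~3.8e-4 s
```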
BRIEF SUMMARY
[0004] Examples of the disclosure describe systems and methods for presenting an audio signal to a user of a wearable head device. According to an example method, a first input audio signal is received, the first input audio signal corresponding to a source location in a virtual environment presented to the user via the wearable head device. The first input audio signal is processed to generate a left output audio signal and a right output audio signal. The left output audio signal is presented to the left ear of the user via a left speaker associated with the wearable head device. The right output audio signal is presented to the right ear of the user via a right speaker associated with the wearable head device. Processing the first input audio signal comprises applying a delay process to the first input audio signal to generate a left audio signal and a right audio signal; adjusting a gain of the left audio signal; adjusting a gain of the right audio signal; applying a first head-related transfer function (HRTF) to the left audio signal to generate the left output audio signal; and applying a second HRTF to the right audio signal to generate the right output audio signal. Applying the delay process to the first input audio signal comprises applying an interaural time delay (ITD) to the first input audio signal, the ITD determined based on the source location.
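As a purely illustrative sketch of the signal flow summarized above, and not the claimed implementation, the following assumes that the ITD and per-ear gains have already been derived from the source location, that the two HRTFs are supplied as FIR impulse responses, and that a positive ITD means the source is to the listener’s right (so the left ear is delayed); the function name, signature, and sign convention are assumptions.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(x, itd_seconds, gain_l, gain_r, hrtf_l, hrtf_r, fs=48000):
    """Sketch of the summarized pipeline: apply an ITD delay to split the input
    into left and right signals, adjust each gain, then filter each ear with its
    HRTF (given here as FIR impulse responses of equal length)."""
    x = np.asarray(x, dtype=float)
    delay = int(round(abs(itd_seconds) * fs))   # ITD expressed in whole samples
    pad = np.zeros(delay)
    if itd_seconds >= 0:                        # source to the right: delay the left ear
        left, right = np.concatenate([pad, x]), np.concatenate([x, pad])
    else:                                       # source to the left: delay the right ear
        left, right = np.concatenate([x, pad]), np.concatenate([pad, x])
    left, right = gain_l * left, gain_r * right
    out_l = fftconvolve(left, hrtf_l)           # left output audio signal
    out_r = fftconvolve(right, hrtf_r)          # right output audio signal
    return out_l, out_r
```

For example, an ITD of a few hundred microseconds with slightly unequal gains is perceived as a source displaced toward the earlier, louder ear; a practical renderer would additionally use fractional-sample delays and crossfading to avoid the clicking artifacts discussed in the Background.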
BRIEF DESCRIPTION OF THE DRAWINGS
[0005] FIG. 1 illustrates an example audio spatialization system, according to some embodiments of the disclosure.
[0006] FIGS. 2A-2C illustrate example delay modules, according to some embodiments of the disclosure.
[0007] FIGS. 3A-3B illustrate an example virtual sound source with respect to a listener, and an example corresponding delay module, respectively, according to some embodiments of the disclosure.
[0008] FIGS. 4A-4B illustrate an example virtual sound source with respect to a listener, and an example corresponding delay module, respectively, according to some embodiments of the disclosure.
[0009] FIGS. 5A-5B illustrate an example virtual sound source with respect to a listener, and an example corresponding delay module, respectively, according to some embodiments of the disclosure.
[0010] FIG. 6A illustrates an example cross-fader, according to some embodiments of the disclosure.
[0011] FIGS. 6B-6C illustrate example control signals for a cross-fader, according to some embodiments of the disclosure.
[0012] FIGS. 7A-7B illustrate an example virtual sound source with respect to a listener, and an example corresponding delay module including cross-faders, respectively, according to some embodiments of the disclosure.
[0013] FIGS. 8A-8B illustrate an example virtual sound source with respect to a listener, and an example corresponding delay module including cross-faders, respectively, according to some embodiments of the disclosure.
[0014] FIGS. 9A-9B illustrate an example virtual sound source with respect to a listener, and an example corresponding delay module including a cross-fader, respectively, according to some embodiments of the disclosure.
[0015] FIGS. 10A-10B illustrate an example virtual sound source with respect to a listener, and an example corresponding delay module including a cross-fader, respectively, according to some embodiments of the disclosure.
[0016] FIGS. 11A-11B illustrate an example virtual sound source with respect to a listener, and an example corresponding delay module including a cross-fader, respectively, according to some embodiments of the disclosure.
[0017] FIGS. 12A-12B illustrate an example virtual sound source with respect to a listener, and an example corresponding delay module including a cross-fader, respectively, according to some embodiments of the disclosure.
[0018] FIGS. 13A-13B illustrate an example virtual sound source with respect to a listener, and an example corresponding delay module including a cross-fader, respectively, according to some embodiments of the disclosure.
[0019] FIGS. 14A-14B illustrate an example virtual sound source with respect to a listener, and an example corresponding delay module including a cross-fader, respectively, according to some embodiments of the disclosure.
[0020] FIGS. 15A-15B illustrate an example virtual sound source with respect to a listener, and an example corresponding delay module including a cross-fader, respectively, according to some embodiments of the disclosure.
[0021] FIGS. 16A-16B illustrate an example virtual sound source with respect to a listener, and an example corresponding delay module including a cross-fader, respectively, according to some embodiments of the disclosure.
[0022] FIG. 17 illustrates an example delay module, according to some embodiments of the disclosure.
[0023] FIGS. 18A-18E illustrate example delay modules, according to some embodiments of the disclosure.
[0024] FIGS. 19-22 illustrate example processes for transitioning between delay modules, according to some embodiments of the disclosure.
[0025] FIG. 23 illustrates an example wearable system, according to some embodiments of the disclosure.
[0026] FIG. 24 illustrates an example handheld controller that can be used in conjunction with an example wearable system, according to some embodiments of the disclosure.
[0027] FIG. 25 illustrates an example auxiliary unit that can be used in conjunction with an example wearable system, according to some embodiments of the disclosure.
[0028] FIG. 26 illustrates an example functional block diagram for an example wearable system, according to some embodiments of the disclosure.
DETAILED DESCRIPTION
[0029] In the following description of examples, reference is made to the accompanying drawings which form a part hereof, and in which it is shown by way of illustration specific examples that can be practiced. It is to be understood that other examples can be used and structural changes can be made without departing from the scope of the disclosed examples.
[0030] Example Wearable System
[0031] FIG. 23 illustrates an example wearable head device 2300 configured to be worn on the head of a user. Wearable head device 2300 may be part of a broader wearable system that includes one or more components, such as a head device (e.g., wearable head device 2300), a handheld controller (e.g., handheld controller 2400 described below), and/or an auxiliary unit (e.g., auxiliary unit 2500 described below). In some examples, wearable head device 2300 can be used for virtual reality, augmented reality, or mixed reality systems or applications. Wearable head device 2300 can include one or more displays, such as displays 2310A and 2310B (which may include left and right transmissive displays, and associated components for coupling light from the displays to the user’s eyes, such as orthogonal pupil expansion (OPE) grating sets 2312A/2312B and exit pupil expansion (EPE) grating sets 2314A/2314B); left and right acoustic structures, such as speakers 2320A and 2320B (which may be mounted on temple arms 2322A and 2322B, and positioned adjacent to the user’s left and right ears, respectively); one or more sensors such as infrared sensors, accelerometers, GPS units, inertial measurement units (IMUs, e.g., IMU 2326), acoustic sensors (e.g., microphones 2350); orthogonal coil electromagnetic receivers (e.g., receiver 2327 shown mounted to the left temple arm 2322A); left and right cameras (e.g., depth (time-of-flight) cameras 2330A and 2330B) oriented away from the user; and left and right eye cameras (e.g., eye cameras 2328A and 2328B) oriented toward the user (e.g., for detecting the user’s eye movements). However, wearable head device 2300 can incorporate any suitable display technology, and any suitable number, type, or combination of sensors or other components without departing from the scope of the invention. In some examples, wearable head device 2300 may incorporate one or more microphones 150 configured to detect audio signals generated by the user’s voice; such microphones may be positioned adjacent to the user’s mouth. In some examples, wearable head device 2300 may incorporate networking features (e.g., Wi-Fi capability) to communicate with other devices and systems, including other wearable systems. Wearable head device 2300 may further include components such as a battery, a processor, a memory, a storage unit, or various input devices (e.g., buttons, touchpads); or may be coupled to a handheld controller (e.g., handheld controller 2400) or an auxiliary unit (e.g., auxiliary unit 2500) that includes one or more such components. In some examples, sensors may be configured to output a set of coordinates of the head-mounted unit relative to the user’s environment, and may provide input to a processor performing a Simultaneous Localization and Mapping (SLAM) procedure and/or a visual odometry algorithm. In some examples, wearable head device 2300 may be coupled to a handheld controller 2400 and/or an auxiliary unit 2500, as described further below.
[0032] FIG. 24 illustrates an example mobile handheld controller component 2400 of an example wearable system. In some examples, handheld controller 2400 may be in wired or wireless communication with wearable head device 2300 and/or auxiliary unit 2500 described below. In some examples, handheld controller 2400 includes a handle portion 2420 to be held by a user, and one or more buttons 2440 disposed along a top surface 2410. In some examples, handheld controller 2400 may be configured for use as an optical tracking target; for example, a sensor (e.g., a camera or other optical sensor) of wearable head device 2300 can be configured to detect a position and/or orientation of handheld controller 2400–which may, by extension, indicate a position and/or orientation of the hand of a user holding handheld controller 2400. In some examples, handheld controller 2400 may include a processor, a memory, a storage unit, a display, or one or more input devices, such as described above. In some examples, handheld controller 2400 includes one or more sensors (e.g., any of the sensors or tracking components described above with respect to wearable head device 2300). In some examples, sensors can detect a position or orientation of handheld controller 2400 relative to wearable head device 2300 or to another component of a wearable system. In some examples, sensors may be positioned in handle portion 2420 of handheld controller 2400, and/or may be mechanically coupled to the handheld controller. Handheld controller 2400 can be configured to provide one or more output signals, corresponding, for example, to a pressed state of the buttons 2440; or a position, orientation, and/or motion of the handheld controller 2400 (e.g., via an IMU). Such output signals may be used as input to a processor of wearable head device 2300, to auxiliary unit 2500, or to another component of a wearable system. In some examples, handheld controller 2400 can include one or more microphones to detect sounds (e.g., a user’s speech, environmental sounds), and in some cases provide a signal corresponding to the detected sound to a processor (e.g., a processor of wearable head device 2300).
[0033] FIG. 25 illustrates an example auxiliary unit 2500 of an example wearable system. In some examples, auxiliary unit 2500 may be in wired or wireless communication with wearable head device 2300 and/or handheld controller 2400. The auxiliary unit 2500 can include a battery to provide energy to operate one or more components of a wearable system, such as wearable head device 2300 and/or handheld controller 2400 (including displays, sensors, acoustic structures, processors, microphones, and/or other components of wearable head device 2300 or handheld controller 2400). In some examples, auxiliary unit 2500 may include a processor, a memory, a storage unit, a display, one or more input devices, and/or one or more sensors, such as described above. In some examples, auxiliary unit 2500 includes a clip 2510 for attaching the auxiliary unit to a user (e.g., a belt worn by the user). An advantage of using auxiliary unit 2500 to house one or more components of a wearable system is that doing so may allow large or heavy components to be carried on a user’s waist, chest, or back–which are relatively well suited to support large and heavy objects–rather than mounted to the user’s head (e.g., if housed in wearable head device 2300) or carried by the user’s hand (e.g., if housed in handheld controller 2400). This may be particularly advantageous for relatively heavy or bulky components, such as batteries.
[0034] FIG. 26 shows an example functional block diagram that may correspond to an example wearable system 2600, which may include the example wearable head device 2300, handheld controller 2400, and auxiliary unit 2500 described above. In some examples, the wearable system 2600 could be used for virtual reality, augmented reality, or mixed reality applications. As shown in FIG. 26, wearable system 2600 can include example handheld controller 2600B, referred to here as a “totem” (and which may correspond to handheld controller 2400 described above); the handheld controller 2600B can include a totem-to-headgear six degree of freedom (6DOF) totem subsystem 2604A. Wearable system 2600 can also include example headgear device 2600A (which may correspond to wearable head device 2300 described above); the headgear device 2600A includes a totem-to-headgear 6DOF headgear subsystem 2604B. In the example, the 6DOF totem subsystem 2604A and the 6DOF headgear subsystem 2604B cooperate to determine six coordinates (e.g., offsets in three translation directions and rotations about three axes) of the handheld controller 2600B relative to the headgear device 2600A. The six degrees of freedom may be expressed relative to a coordinate system of the headgear device 2600A. The three translation offsets may be expressed as X, Y, and Z offsets in such a coordinate system, as a translation matrix, or as some other representation. The rotation degrees of freedom may be expressed as a sequence of yaw, pitch, and roll rotations; as vectors; as a rotation matrix; as a quaternion; or as some other representation. In some examples, one or more depth cameras 2644 (and/or one or more non-depth cameras) included in the headgear device 2600A, and/or one or more optical targets (e.g., buttons 2440 of handheld controller 2400 as described above, or dedicated optical targets included in the handheld controller), can be used for 6DOF tracking. In some examples, the handheld controller 2600B can include a camera, as described above, and the headgear device 2600A can include an optical target for optical tracking in conjunction with the camera. In some examples, the headgear device 2600A and the handheld controller 2600B each include a set of three orthogonally oriented solenoids that are used to wirelessly send and receive three distinguishable signals. By measuring the relative magnitude of the three distinguishable signals received in each of the coils used for receiving, the 6DOF of the handheld controller 2600B relative to the headgear device 2600A may be determined. In some examples, 6DOF totem subsystem 2604A can include an Inertial Measurement Unit (IMU) that is useful for providing improved accuracy and/or more timely information about rapid movements of the handheld controller 2600B.
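As an illustration of one of the representations mentioned above (not mandated by the disclosure), the sketch below stores the 6DOF pose as three translation offsets plus a unit quaternion and uses it to map a point from the totem frame into the headgear coordinate system; function names and conventions are assumptions.

```python
import numpy as np

def quat_to_matrix(w, x, y, z):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix,
    one of the equivalent rotation representations listed above."""
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def totem_point_in_headgear(translation_xyz, quat_wxyz, point_in_totem):
    """Map a point expressed in the handheld controller ("totem") frame into
    headgear coordinates using a 6DOF pose (three translations plus a rotation)."""
    rotation = quat_to_matrix(*quat_wxyz)
    return rotation @ np.asarray(point_in_totem) + np.asarray(translation_xyz)
```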
[0035] In some examples involving augmented reality or mixed reality applications, it may be desirable to transform coordinates from a local coordinate space (e.g., a coordinate space fixed relative to headgear device 2600A) to an inertial coordinate space, or to an environmental coordinate space. For instance, such transformations may be necessary for a display of headgear device 2600A to present a virtual object at an expected position and orientation relative to the real environment (e.g., a virtual person sitting in a real chair, facing forward, regardless of the position and orientation of headgear device 2600A), rather than at a fixed position and orientation on the display (e.g., at the same position in the display of headgear device 2600A). This can maintain an illusion that the virtual object exists in the real environment (and does not, for example, appear positioned unnaturally in the real environment as the headgear device 2600A shifts and rotates). In some examples, a compensatory transformation between coordinate spaces can be determined by processing imagery from the depth cameras 2644 (e.g., using a Simultaneous Localization and Mapping (SLAM) and/or visual odometry procedure) in order to determine the transformation of the headgear device 2600A relative to an inertial or environmental coordinate system. In the example shown in FIG. 26, the depth cameras 2644 can be coupled to a SLAM/visual odometry block 2606 and can provide imagery to block 2606. The SLAM/visual odometry block 2606 implementation can include a processor configured to process this imagery and determine a position and orientation of the user’s head, which can then be used to identify a transformation between a head coordinate space and a real coordinate space. Similarly, in some examples, an additional source of information on the user’s head pose and location is obtained from an IMU 2609 of headgear device 2600A. Information from the IMU 2609 can be integrated with information from the SLAM/visual odometry block 2606 to provide improved accuracy and/or more timely information on rapid adjustments of the user’s head pose and position.
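A minimal sketch of such a compensatory transformation, assuming the SLAM/visual odometry estimate is available as a 4x4 homogeneous matrix mapping head coordinates to world (environment) coordinates (a representation the disclosure does not specify):

```python
import numpy as np

def world_point_in_head(T_world_from_head, p_world):
    """Map a virtual object's world-space position into head coordinates so it
    can be rendered at a fixed place in the real environment regardless of how
    the headgear moves. T_world_from_head is the 4x4 head pose from SLAM/VO."""
    T_head_from_world = np.linalg.inv(T_world_from_head)
    p = np.append(np.asarray(p_world, dtype=float), 1.0)  # homogeneous point
    return (T_head_from_world @ p)[:3]
```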
[0036] In some examples, the depth cameras 2644 can supply 3D imagery to a hand gesture tracker 2611, which may be implemented in a processor of headgear device 2600A. The hand gesture tracker 2611 can identify a user’s hand gestures, for example by matching 3D imagery received from the depth cameras 2644 to stored patterns representing hand gestures. Other suitable techniques of identifying a user’s hand gestures will be apparent.
[0037] In some examples, one or more processors 2616 may be configured to receive data from headgear subsystem 2604B, the IMU 2609, the SLAM/visual odometry block 2606, depth cameras 2644, microphones 2650, and/or the hand gesture tracker 2611. The processor 2616 can also send and receive control signals from the 6DOF totem system 2604A. The processor 2616 may be coupled to the 6DOF totem system 2604A wirelessly, such as in examples where the handheld controller 2600B is untethered. Processor 2616 may further communicate with additional components, such as an audio-visual content memory 2618, a Graphical Processing Unit (GPU) 2620, and/or a Digital Signal Processor (DSP) audio spatializer 2622. The DSP audio spatializer 2622 may be coupled to a Head Related Transfer Function (HRTF) memory 2625. The GPU 2620 can include a left channel output coupled to the left source of imagewise modulated light 2624 and a right channel output coupled to the right source of imagewise modulated light 2626. GPU 2620 can output stereoscopic image data to the sources of imagewise modulated light 2624, 2626. The DSP audio spatializer 2622 can output audio to a left speaker 2612 and/or a right speaker 2614. The DSP audio spatializer 2622 can receive input from processor 2616 indicating a direction vector from the user to a virtual sound source (which may be moved by the user, e.g., via the handheld controller 2600B). Based on the direction vector, the DSP audio spatializer 2622 can determine a corresponding HRTF (e.g., by accessing an HRTF, or by interpolating between multiple HRTFs). The DSP audio spatializer 2622 can then apply the determined HRTF to an audio signal, such as an audio signal corresponding to a virtual sound generated by a virtual object. This can enhance the believability and realism of the virtual sound by incorporating the position and orientation of the user relative to the virtual sound in the mixed reality environment; that is, by presenting a virtual sound that matches the user’s expectations of what that sound would sound like if it were a real sound in a real environment.
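The disclosure does not specify an interpolation scheme, but as one hedged illustration, the sketch below linearly blends the two measured impulse responses that bracket the requested azimuth. The bank layout is an assumption, and practical systems typically also index HRTFs by elevation and interpolate minimum-phase components rather than raw impulse responses.

```python
import numpy as np

def interpolate_hrtf(azimuth_deg, hrtf_bank):
    """Linearly interpolate between the two measured HRTF impulse responses that
    bracket the requested azimuth. hrtf_bank is an assumed structure: a list of
    (azimuth_deg, impulse_response) pairs sorted by azimuth over 0-360 degrees."""
    angles = np.array([a for a, _ in hrtf_bank], dtype=float)
    irs = np.array([ir for _, ir in hrtf_bank], dtype=float)
    az = azimuth_deg % 360.0
    j = int(np.searchsorted(angles, az) % len(angles))  # next measured angle (wraps)
    i = (j - 1) % len(angles)                           # previous measured angle
    span = float((angles[j] - angles[i]) % 360.0) or 360.0
    w = ((az - angles[i]) % 360.0) / span               # blend weight toward angle j
    return (1.0 - w) * irs[i] + w * irs[j]
```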
[0038] In some examples, such as shown in FIG. 26, one or more of processor 2616, GPU 2620, DSP audio spatializer 2622, HRTF memory 2625, and audio/visual content memory 2618 may be included in an auxiliary unit 2600C (which may correspond to auxiliary unit 2500 described above). The auxiliary unit 2600C may include a battery 2627 to power its components and/or to supply power to headgear device 2600A and/or handheld controller 2600B. Including such components in an auxiliary unit, which can be mounted to a user’s waist, can limit the size and weight of headgear device 2600A, which can in turn reduce fatigue of a user’s head and neck.
[0039] While FIG. 26 presents elements corresponding to various components of an example wearable system 2600, various other suitable arrangements of these components will become apparent to those skilled in the art. For example, elements presented in FIG. 26 as being associated with auxiliary unit 2600C could instead be associated with headgear device 2600A or handheld controller 2600B. Furthermore, some wearable systems may forgo entirely a handheld controller 2600B or auxiliary unit 2600C. Such changes and modifications are to be understood as being included within the scope of the disclosed examples.
[0040] Audio Rendering
[0041] The systems and methods described below can be implemented in an augmented reality or mixed reality system, such as described above. For example, one or more processors (e.g., CPUs, DSPs) of an augmented reality system can be used to process audio signals or to implement steps of computer-implemented methods described below; sensors of the augmented reality system (e.g., cameras, acoustic sensors, IMUs, LIDAR, GPS) can be used to determine a position and/or orientation of a user of the system, or of elements in the user’s environment; and speakers of the augmented reality system can be used to present audio signals to the user.
[0042] In augmented reality or mixed reality systems such as described above, one or more processors (e.g., DSP audio spatializer 2622) can process one or more audio signals for presentation to a user of a wearable head device via one or more speakers (e.g., left and right speakers 2612/2614 described above). In some embodiments, the one or more speakers may belong to a unit separate from the wearable head device (e.g., headphones). Processing of audio signals requires tradeoffs between the authenticity of a perceived audio signal–for example, the degree to which an audio signal presented to a user in a mixed reality environment matches the user’s expectations of how an audio signal would sound in a real environment–and the computational overhead involved in processing the audio signal. Realistically spatializing an audio signal in a virtual environment can be critical to creating immersive and believable user experiences.
[0043] FIG. 1 illustrates an example spatialization system 100, according to some embodiments. The system 100 creates a soundscape (sound environment) by spatializing input sounds/signals. The system 100 includes an encoder 104, a mixer 106, and a decoder 110.
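As a rough structural sketch only (class and method names are assumptions, not terminology from the patent), the encoder/mixer/decoder flow of system 100 can be pictured as:

```python
class Spatializer:
    """Structural sketch of FIG. 1: an encoder spatially encodes each input
    signal according to its source position, a mixer sums the encoded channels,
    and a decoder renders the mix to the output (e.g., binaural) format."""
    def __init__(self, encoder, mixer, decoder):
        self.encoder, self.mixer, self.decoder = encoder, mixer, decoder

    def render(self, sources):
        # sources: iterable of (signal, position) pairs to place in the soundscape
        encoded = [self.encoder(signal, position) for signal, position in sources]
        return self.decoder(self.mixer(encoded))
```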