Sony Patent | Audio control based on room correction and head related transfer function

Editor: 映维 (Nweon) | Category: Sony | May 14, 2021

Patent: Audio control based on room correction and head related transfer function

Publication Number: 20210144475

Publication Date: 20210513

Applicant: Sony

Abstract

An audio reproduction device and method for audio control based on room correction (RC) and a head related transfer function (HRTF). The audio reproduction device includes a speaker configured to reproduce a first audio signal. The audio reproduction device receives a plurality of second audio signals, captured by a plurality of audio capturing devices positioned on a head wearable device of a user present within an enclosed physical space, which indicate frequency responses captured based on the first audio signal. The audio reproduction device determines an RC preset for one or more RC filters associated with the speaker, based on the captured frequency responses. The audio reproduction device further determines an HRTF associated with the user based on the captured frequency responses and user-specific information of the user. The audio reproduction device further controls audio reproduction of the speaker based on the determined RC preset and the determined HRTF.

Claims

1. An audio reproduction device, comprising: a speaker configured to reproduce a first audio signal; and circuitry coupled with the speaker, wherein the circuitry is configured to: receive a plurality of second audio signals captured by a plurality of audio capturing devices which are positioned on a head wearable device of a first user present within an enclosed physical space, wherein each of the plurality of second audio signals indicates a frequency response captured based on the first audio signal reproduced by the speaker; determine a room-correction (RC) preset for one or more RC filters associated with the speaker, based on the frequency responses captured in the received plurality of second audio signals, wherein the determined RC preset corresponds to a location of the first user within the enclosed physical space; determine a first head related transfer function (HRTF) associated with the first user based on the frequency responses captured in the received plurality of second audio signals and user-specific information corresponding to the first user, wherein the first HRTF is determined for one or more HRTF filters associated with the speaker; and control the audio reproduction of the speaker based on the determined RC preset corresponding to the location within the enclosed physical space, and further based on the determined first HRTF corresponding to the first user present within the enclosed physical space.
2. The audio reproduction device according to claim 1, wherein the circuitry is further configured to: determine an average value of the frequency responses captured in the received plurality of second audio signals; and determine the RC preset for the one or more RC filters associated with the speaker, based on the determined average value of the frequency responses captured in the received plurality of second audio signals.
3. The audio reproduction device according to claim 1, wherein the RC preset comprises one or more filter coefficients associated with the one or more RC filters of the speaker.
4. The audio reproduction device according to claim 1, further comprising: a memory configured to store the determined RC preset corresponding to the location within the enclosed physical space, and store the determined first HRTF corresponding to the first user present within the enclosed physical space.
5. The audio reproduction device according to claim 1, wherein the user-specific information comprises at least one of dimensions of a head of the first user, dimensions of ears of the first user, dimensions of ear canals of the first user, dimensions of a shoulder of the first user, dimensions of a torso of the first user, a density of the head of the first user, or an orientation of the head of the first user.
6. The audio reproduction device according to claim 1, wherein the circuitry is further configured to: determine an interaural time difference (ITD) and an interaural level difference (ILD) for the first user based on the frequency responses captured in the received plurality of second audio signals; and determine the first HRTF associated with the first user based on the determined ITD and the determined ILD.
7. The audio reproduction device according to claim 1, wherein the circuitry is further configured to: determine a first value of at least one coefficient of the one or more HRTF filters of the speaker; determine a second value of at least one coefficient of the one or more RC filters of the speaker; and control the audio reproduction of the speaker based on the determined first value of the at least one coefficient of the one or more HRTF filters and the determined second value of the at least one coefficient of the one or more RC filters.
8. The audio reproduction device according to claim 7, further comprising an Input-Output (I/O) interface, wherein the circuitry is further configured to: receive a user input via the I/O interface; and determine the first value of the at least one coefficient of the one or more HRTF filters and the second value of the at least one coefficient of the one or more RC filters based on the received user input.
9. The audio reproduction device according to claim 1, wherein the circuitry is further configured to receive occupancy information from at least one sensor communicably coupled to the audio reproduction device, wherein the occupancy information indicates a number of users of a set of users present within the enclosed physical space and the set of users includes the first user.
10. The audio reproduction device according to claim 9, wherein the circuitry is further configured to: determine whether a second HRTF is calibrated for a second user of the set of users, wherein the second user is different from the first user; and determine a first value of at least one coefficient of the one or more HRTF filters and a second value of at least one coefficient of the one or more RC filters based on the determination.
11. The audio reproduction device according to claim 10, wherein the circuitry is further configured to: set the second value of the at least one coefficient of the one or more RC filters to be higher than the first value of the at least one coefficient of the one or more HRTF filters based on the received occupancy information indicating that the number of users is more than one; and control the audio reproduction of the speaker based on the determined first value of the at least one coefficient of the one or more HRTF filters and the determined second value of the at least one coefficient of the one or more RC filters.
12. A method, comprising: in an audio reproduction device that includes a speaker configured to reproduce a first audio signal: receiving a plurality of second audio signals captured by a plurality of audio capturing devices which are positioned on a head wearable device of a first user present within an enclosed physical space, wherein each of the plurality of second audio signals indicates a frequency response captured based on the first audio signal reproduced by the speaker; determining a room-correction (RC) preset for one or more RC filters associated with the speaker, based on the frequency responses captured in the received plurality of second audio signals, wherein the determined RC preset corresponds to a location of the first user within the enclosed physical space; determining a first head related transfer function (HRTF) associated with the first user based on the frequency responses captured in the received plurality of second audio signals and user-specific information corresponding to the first user, wherein the first HRTF is determined for one or more HRTF filters associated with the speaker; and controlling the audio reproduction of the speaker based on the determined RC preset corresponding to the location within the enclosed physical space, and further based on the determined first HRTF corresponding to the first user present within the enclosed physical space.
13. The method according to claim 12, further comprising: determining an average value of the frequency responses captured in the received plurality of second audio signals; and determining the RC preset for the one or more RC filters associated with the speaker, based on the determined average value of the frequency responses captured in the received plurality of second audio signals.
14. The method according to claim 12, further comprising: determining an interaural time difference (ITD) and an interaural level difference (ILD) for the first user based on the frequency responses captured in the received plurality of second audio signals; and determining the first HRTF associated with the first user based on the determined ITD and the determined ILD.
15. The method according to claim 12, further comprising: determining a first value of at least one coefficient of the one or more HRTF filters of the speaker; determining a second value of at least one coefficient of the one or more RC filters of the speaker; and controlling the audio reproduction of the speaker based on the determined first value of the at least one coefficient of the one or more HRTF filters and the determined second value of the at least one coefficient of the one or more RC filters.
16. The method according to claim 15, further comprising: receiving a user input via an Input-Output (I/O) interface of the audio reproduction device; and determining the first value of the at least one coefficient of the one or more HRTF filters and determining the second value of the at least one coefficient of the one or more RC filters based on the received user input.
17. The method according to claim 12, further comprising: receiving occupancy information from at least one sensor communicably coupled to the audio reproduction device, wherein the occupancy information indicates a number of users of a set of users present within the enclosed physical space and the set of users includes the first user.
18. The method according to claim 17, further comprising: determining whether a second HRTF is calibrated for a second user of the set of users, wherein the second user is different from the first user; and determining a first value of at least one coefficient of the one or more HRTF filters and a second value of at least one coefficient of the one or more RC filters based on the determination.
19. The method according to claim 18, further comprising: setting the second value of the at least one coefficient of the one or more RC filters to be higher than the first value of the at least one coefficient of the one or more HRTF filters based on the received occupancy information indicating that the number of users is more than one; and controlling the audio reproduction of the speaker based on the determined first value of the at least one coefficient of the one or more HRTF filters and the determined second value of the at least one coefficient of the one or more RC filters.
20. A non-transitory computer-readable medium having stored thereon computer-executable instructions that, when executed by an audio reproduction device, cause the audio reproduction device to execute operations, the operations comprising: receiving a plurality of second audio signals captured by a plurality of audio capturing devices which are positioned on a head wearable device of a first user present within an enclosed physical space, wherein each of the plurality of second audio signals indicates a frequency response captured based on a first audio signal reproduced by a speaker in the audio reproduction device; determining a room-correction (RC) preset for one or more RC filters associated with the speaker, based on the frequency responses captured in the received plurality of second audio signals, wherein the determined RC preset corresponds to a location of the first user within the enclosed physical space; determining a first head related transfer function (HRTF) associated with the first user based on the frequency responses captured in the received plurality of second audio signals and user-specific information corresponding to the first user, wherein the first HRTF is determined for one or more HRTF filters associated with the speaker; and controlling the audio reproduction of the speaker based on the determined RC preset corresponding to the location within the enclosed physical space, and further based on the determined first HRTF corresponding to the first user present within the enclosed physical space.
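Claims 2 and 13 (the claims that average the captured frequency responses) derive the RC preset from the average of the responses captured by the head-worn microphones, but the patent gives no formula. The following is only a minimal sketch under stated assumptions: each captured response is a list of magnitudes in dB on a shared set of frequency bands, the RC preset is a hypothetical list of per-band correction gains, the target is a flat 0 dB response, and boosts and cuts are clipped to an illustrative limit of +/-12 dB. The names `average_response` and `rc_preset` are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence


def average_response(responses_db: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-band magnitudes (dB) captured by each microphone."""
    if not responses_db:
        raise ValueError("at least one captured response is required")
    n_bands = len(responses_db[0])
    if any(len(r) != n_bands for r in responses_db):
        raise ValueError("all responses must cover the same bands")
    return [fmean(r[band] for r in responses_db) for band in range(n_bands)]


def rc_preset(responses_db: Sequence[Sequence[float]],
              target_db: float = 0.0,
              max_gain_db: float = 12.0) -> List[float]:
    """Hypothetical RC preset: per-band gains (dB) that move the averaged
    response toward target_db, clipped to +/- max_gain_db."""
    gains = []
    for level in average_response(responses_db):
        correction = target_db - level
        gains.append(max(-max_gain_db, min(max_gain_db, correction)))
    return gains


# Two microphones (e.g. left and right ear cups), four bands each.
left = [3.0, -2.0, 0.0, -20.0]
right = [1.0, -4.0, 2.0, -18.0]
print(rc_preset([left, right]))  # [-2.0, 3.0, -1.0, 12.0]
```

The clip on the last band reflects the fact that deep dips in a measured room response are often left partly uncorrected rather than boosted in full; the specific limit here is an arbitrary choice for illustration.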

Description

CROSS-REFERENCE TO RELATED APPLICATIONS/INCORPORATION BY REFERENCE

[0001] This application claims priority to U.S. Provisional Patent Application Ser. No. 62/931,946 filed on Nov. 7, 2019, the entire content of which is hereby incorporated herein by reference.

FIELD

[0002] Various embodiments of the disclosure relate to audio reproduction. More specifically, various embodiments of the disclosure relate to an audio reproduction device and method for audio control based on room correction and head related transfer function.

BACKGROUND

[0003] Recent advancements in the field of audio reproduction devices (such as televisions and speakers) have led to the development of various technologies and systems to enhance the reproduction of audio content. Typically, when an audio reproduction device located within an enclosed physical space (such as a room or a cinema hall) reproduces audio content, a user (such as one or more persons present within the enclosed physical space) may hear sound with different audio frequency responses for the same reproduced audio content. Such variation of the sound may be due to various factors, such as the distance of the user from the audio reproduction device, absorption and/or reflection of the audio by the surrounding environment (such as furniture, curtains, and walls) of the enclosed physical space, or user-specific parameters (such as a head size or an ear size). In certain situations, the audio reproduction device may employ conventional techniques to match the audio frequency response heard by the user with the audio frequency response of the reproduced audio content, to provide an optimum sound experience in the enclosed physical space. However, when the number of users present in the enclosed physical space or the user-specific parameters of the user vary, the optimum sound experience may be affected.

[0004] Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of described systems with some aspects of the present disclosure, as set forth in the remainder of the present application and with reference to the drawings.

SUMMARY

[0005] An audio reproduction device and method for audio control based on room correction and head related transfer function is provided substantially as shown in, and/or described in connection with, at least one of the figures, as set forth more completely in the claims.

[0006] These and other features and advantages of the present disclosure may be appreciated from a review of the following detailed description of the present disclosure, along with the accompanying figures in which like reference numerals refer to like parts throughout.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] FIG. 1 is a block diagram that illustrates an exemplary network environment for an audio reproduction device for audio control based on room correction and head related transfer function, in accordance with an embodiment of the disclosure.

[0008] FIG. 2 is a block diagram that illustrates the exemplary audio reproduction device of FIG. 1, in accordance with an embodiment of the disclosure.

[0009] FIGS. 3A and 3B collectively depict exemplary operations of the audio reproduction device for audio control based on room correction and head related transfer function, in accordance with an embodiment of the disclosure.

[0010] FIG. 4 is a diagram that illustrates an exemplary interface associated with the audio reproduction device to provide room correction and head related transfer function related values, in accordance with an embodiment of the disclosure.

[0011] FIGS. 5A and 5B are diagrams that depict exemplary scenarios for audio control based on room correction and head related transfer function, in accordance with an embodiment of the disclosure.

[0012] FIG. 6 depicts a flowchart that illustrates an exemplary method for audio control based on room correction and head related transfer function, in accordance with an embodiment of the disclosure.

DETAILED DESCRIPTION

[0013] The following described implementations may be found in the disclosed audio reproduction device and method for audio control based on room correction and head related transfer function. Exemplary aspects of the disclosure provide an audio reproduction device (for example, a television (TV), an audio video receiver (AVR), or another audio reproduction device) that may control audio reproduction of a speaker (such as a loudspeaker, a soundbar, a woofer, and the like) in an enclosed physical space (such as a room, a hall, and the like). The audio reproduction device may be configured to receive a plurality of audio signals captured by a plurality of audio capturing devices (such as recorders, dynamic microphones, or other microphones). The reception of the plurality of audio signals may allow the audio reproduction device to dynamically determine the frequency responses captured for audio signals reproduced by the speaker of the audio reproduction device. The plurality of audio capturing devices may be positioned on a head wearable device (such as a headphone) of a user present within the enclosed physical space. Based on the determined frequency responses, the audio reproduction device may be further configured to determine a room-correction (RC) preset for one or more RC filters associated with the speaker. Such an RC preset may be employed by the speaker to provide dynamic room correction of the sound in the enclosed physical space for a particular location of the user (i.e. listener). The RC preset may be pre-calibrated for the particular location at which the frequency response is captured, to provide the room correction in the enclosed physical space.
The audio reproduction device may be further configured to determine a head related transfer function (HRTF) associated with the user for one or more HRTF filters associated with the speaker, based on the frequency responses determined for the received plurality of audio signals and user-specific information corresponding to the user. In an embodiment, the user-specific information may include, but is not limited to, dimensions of a head of the user, dimensions of ears of the user, dimensions of ear canals of the user, dimensions of a shoulder of the user, dimensions of a torso of the user, a density of the head of the user, or an orientation of the head of the user. The audio reproduction device may be further configured to control the audio reproduction of the speaker based on the determined RC preset corresponding to the location within the enclosed physical space, and further based on the determined HRTF corresponding to the user present within the enclosed physical space. Therefore, the disclosed audio reproduction device achieves audio reproduction control based on a combination of the RC preset and the HRTF, thereby enhancing the sound experience of the user present in the enclosed physical space in real time. In an embodiment, the disclosed audio reproduction device may determine the contributions of room correction (RC) and the head related transfer function (HRTF) to the control of audio reproduction based on factors such as, but not limited to, user inputs, the number of users (i.e. listeners) present in the enclosed physical space, and pre-stored RC presets or HRTF-related values/parameters.
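The last sentence above, together with the claims on user input, HRTF calibration of a second user, and occupancy, says the relative contributions of RC and HRTF may depend on user input, the number of listeners, and whether other listeners have calibrated HRTFs. The claims only fix one rule: when more than one user is present, the RC coefficient value is set higher than the HRTF coefficient value. A minimal sketch of such a decision rule follows; the `BlendWeights` and `blend_weights` names and the weight values 0.6, 0.4, and 0.2 are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BlendWeights:
    hrtf: float  # hypothetical scale for the HRTF filter coefficients ("first value")
    rc: float    # hypothetical scale for the RC filter coefficients ("second value")


def blend_weights(num_users: int,
                  others_calibrated: bool = False,
                  user_hrtf_weight: Optional[float] = None) -> BlendWeights:
    """Choose how strongly HRTF and RC processing contribute to playback."""
    if num_users < 1:
        raise ValueError("at least one listener is required")
    if user_hrtf_weight is not None:
        # An explicit user input (e.g. from an I/O interface) overrides the defaults.
        if not 0.0 <= user_hrtf_weight <= 1.0:
            raise ValueError("user_hrtf_weight must lie in [0, 1]")
        return BlendWeights(hrtf=user_hrtf_weight, rc=1.0 - user_hrtf_weight)
    if num_users == 1:
        # Single listener, assumed to have a calibrated HRTF: favour personalisation.
        return BlendWeights(hrtf=0.6, rc=0.4)
    # Several listeners: RC always gets the larger share; keep a little more
    # HRTF if the other listeners also have calibrated HRTFs.
    hrtf = 0.4 if others_calibrated else 0.2
    return BlendWeights(hrtf=hrtf, rc=1.0 - hrtf)


weights = blend_weights(num_users=3)
print(weights.rc > weights.hrtf)  # True
```

Keeping the two weights summing to one is a design assumption that makes the split easy to expose as a single slider; the patent does not state that the two values are coupled.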

[0014] FIG. 1 is a block diagram that illustrates an exemplary network environment for an audio reproduction device for audio control based on room correction and head related transfer function, in accordance with an embodiment of the disclosure. With reference to FIG. 1, there is shown a diagram of a network environment 100. The network environment 100 may include an audio reproduction device 102, a head wearable device 104, a plurality of audio capturing devices 106A-106B (such as a first audio capturing device 106A and a second audio capturing device 106B) present within an enclosed physical space 108, and a server 110. The audio reproduction device 102, the head wearable device 104, and the plurality of audio capturing devices 106A-106B may be communicatively coupled to the server 110, via a communication network 112. Further, the head wearable device 104 and the plurality of audio capturing devices 106A-106B may be communicatively coupled with the audio reproduction device 102, via the communication network 112. The audio reproduction device 102 may further include a speaker 114 configured to reproduce a first audio signal (i.e. sound). In the network environment 100, there is further shown a first user 116 associated with the audio reproduction device 102 present within the enclosed physical space 108. It may be noted that the speaker 114 included in the audio reproduction device 102 in FIG. 1 is presented merely as an example. In some embodiments, the speaker 114 may be external to the audio reproduction device 102 and may be communicably coupled to the audio reproduction device 102, without deviating from the scope of the disclosure. In such a case, the speaker 114 may be present in the enclosed physical space 108 and the audio reproduction device 102 may be present outside the enclosed physical space 108.

[0015] Further, as shown in FIG. 1, the plurality of audio capturing devices 106A-106B may be positioned on the head wearable device 104 of the first user 116. It may be noted that the head wearable device 104, the first audio capturing device 106A, and the second audio capturing device 106B shown in FIG. 1 are presented merely as an example. The network environment 100 may include other forms of the head wearable device 104, the first audio capturing device 106A, and the second audio capturing device 106B without deviating from the scope of the disclosure. In some embodiments, the plurality of audio capturing devices 106A-106B may include only one audio capturing device or more than one audio capturing device, without deviating from the scope of the disclosure.

[0016] The audio reproduction device 102 may include suitable logic, circuitry, code and/or interfaces that may be configured to control audio reproduction of the speaker 114 in the enclosed physical space 108 based on room correction (RC) and HRTF applied to the speaker 114. The audio reproduction device 102 may be configured to receive a plurality of second audio signals captured by a plurality of audio capturing devices (such as the first audio capturing device 106A and the second audio capturing device 106B) which may be positioned on the head wearable device 104 of the first user 116 present within the enclosed physical space 108. Each of the plurality of second audio signals may indicate a frequency response captured based on the first audio signal reproduced by the speaker 114. Based on the frequency responses captured in the received plurality of second audio signals, the audio reproduction device 102 may be configured to determine a room-correction (RC) preset for one or more RC filters associated with the speaker 114 for a particular location of the first user 116 within the enclosed physical space 108. The audio reproduction device 102 may be further configured to determine a head related transfer function (HRTF) associated with the first user 116 based on the frequency responses captured in the received plurality of second audio signals and user-specific information corresponding to the first user 116. Based on the determined RC preset corresponding to the location within the enclosed physical space 108, and further based on the determined HRTF corresponding to the first user 116 present within the enclosed physical space 108, the audio reproduction device 102 may be configured to control the audio reproduction of the speaker 114.
Examples of the audio reproduction device 102 may include, but are not limited to, a television (TV), an audio video receiver (AVR), a soundbar, a sound system, a home theater system, a radio receiver, a tape recorder with audio reproduction capability, an audio amplifier, an audio mixing console, a loudspeaker, a speaker, or other audio reproduction devices.
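The claims state that the HRTF may be determined from an interaural time difference (ITD) and an interaural level difference (ILD) measured with the head-worn microphones, but they do not say how either is computed. As an illustrative sketch, the following swaps in two textbook estimators: the ITD as the lag that maximises the cross-correlation between the left and right captures, and the ILD as their energy ratio in dB. It assumes two equal-length, time-aligned captures; `estimate_itd` and `estimate_ild` are hypothetical names.

```python
import math
from typing import Sequence


def estimate_itd(left: Sequence[float], right: Sequence[float],
                 sample_rate: float, max_lag: int) -> float:
    """ITD in seconds via brute-force cross-correlation.

    Positive values mean the sound reached the left microphone first
    (the right capture lags the left one).
    """
    if len(left) != len(right):
        raise ValueError("captures must have equal length")
    n = len(left)
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = sum(left[i] * right[i + lag]
                    for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


def estimate_ild(left: Sequence[float], right: Sequence[float]) -> float:
    """ILD in dB; positive when the left capture carries more energy."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    if e_left == 0.0 or e_right == 0.0:
        raise ValueError("both captures need non-zero energy")
    return 10.0 * math.log10(e_left / e_right)


# An impulse reaches the left ear first; on the right it is 3 samples later
# and at half the amplitude (about 6 dB quieter).
fs = 48_000
left = [0.0] * 64
right = [0.0] * 64
left[10] = 1.0
right[13] = 0.5
print(estimate_itd(left, right, fs, max_lag=40))  # 6.25e-05
print(round(estimate_ild(left, right), 2))        # 6.02
```

The brute-force loop costs O(n * max_lag) and is kept only for readability; an FFT-based correlation would be the usual choice for real captures.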

[0017] The head wearable device 104 may include suitable logic, circuitry, and/or interfaces and may be worn by the first user 116 to capture a plurality of second audio signals, via the plurality of audio capturing devices 106A-106B, for the first audio signal reproduced by the speaker 114. In some embodiments, the head wearable device 104 may be configured to control playback of multimedia content and other control functions based on different user inputs received from the first user 116. The user inputs may be received from the first user 116 via the plurality of audio capturing devices 106A-106B. In such a case, the user inputs may correspond to audio inputs (or voice inputs) from the first user 116. In certain embodiments, the user input may correspond to an input other than a voice input (or an audio input) received from the first user 116. Examples of such user input may include, but are not limited to, a button press input, a touch input, a gesture input, a physical tap, or a haptic input. In certain embodiments, the user input may be represented as an instruction (such as an audio input) for the head wearable device 104.

[0018] Examples of the head wearable device 104 may include, but are not limited to, a head mounted device, a head worn device, a headphone, an audio-video (AV) entertainment device, an earphone, smart glasses, a virtual-reality (VR) device, a display device worn on the head of the first user 116, a video-conferencing device worn on the head of the first user 116, a gaming device worn on the head of the first user 116, and/or a consumer electronic (CE) device worn on the head of the first user 116. In accordance with an embodiment, a media player device (not shown) may be integrated with the head wearable device 104. The media player device may be configured to store, decode, and output the multimedia content to different components of the head wearable device 104, for example, a display, a set of speakers, or in-ear speakers. Examples of the media player device may include, but are not limited to, an audio player, a VR player, and an audio/video (A/V) player.

[0019] The plurality of audio capturing devices 106A-106B may include suitable logic, circuitry, code and/or interfaces that may be configured to capture the plurality of second audio signals for the first audio signal (i.e. sound) reproduced by the speaker 114. The plurality of audio capturing devices 106A-106B may further generate a frequency response of the captured plurality of second audio signals. In an embodiment, the frequency responses of the plurality of second audio signals captured by the plurality of audio capturing devices 106A-106B may be different from a frequency response of the first audio signal reproduced by the speaker 114 due to certain factors (such as sound reflections or absorption caused by objects or walls of the enclosed physical space 108). In some embodiments, the plurality of audio capturing devices 106A-106B may be communicatively coupled with the head wearable device 104 and may be positioned on the head wearable device 104. In some embodiments, the plurality of audio capturing devices 106A-106B may be integrated within the head wearable device 104 as a component of the head wearable device 104, and the entire functionality of the plurality of audio capturing devices 106A-106B may be included in the head wearable device 104. Examples of the plurality of audio capturing devices 106A-106B may include, but are not limited to, a recorder, an electret microphone, a dynamic microphone, a stereo microphone, a carbon microphone, a piezoelectric microphone, a fiber microphone, a micro-electro-mechanical-systems (MEMS) microphone, or other microphones.
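Paragraph [0019] notes that the captured frequency responses can differ from the response of the reproduced first audio signal because of reflections and absorption. One common way to express such a difference, assumed here rather than taken from the patent, is the magnitude of the ratio between the captured spectrum and the reference spectrum at the frequencies the test signal excites. The sketch uses a naive single-bin DFT so it needs only the standard library; `dft_bin`, `frequency_response_db`, and the simulated two-tone "room" are hypothetical, and the captures are assumed to be time-aligned and of equal length.

```python
import cmath
import math
from typing import Dict, Iterable, Sequence


def dft_bin(signal: Sequence[float], k: int) -> complex:
    """Single DFT coefficient X[k], computed directly in O(N)."""
    n = len(signal)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n)
               for i, x in enumerate(signal))


def frequency_response_db(reference: Sequence[float],
                          captured: Sequence[float],
                          bins: Iterable[int]) -> Dict[int, float]:
    """Magnitude of captured/reference, in dB, at the requested DFT bins."""
    if len(reference) != len(captured):
        raise ValueError("signals must have the same length")
    response = {}
    for k in bins:
        ref = dft_bin(reference, k)
        if abs(ref) < 1e-12:
            raise ValueError(f"reference has no energy at bin {k}")
        response[k] = 20.0 * math.log10(abs(dft_bin(captured, k) / ref))
    return response


# Two-tone test signal; the simulated room halves bin 4 and doubles bin 8,
# with arbitrary phase shifts that do not affect the magnitude response.
N = 64
reference = [math.cos(2 * math.pi * 4 * i / N) + math.cos(2 * math.pi * 8 * i / N)
             for i in range(N)]
captured = [0.5 * math.cos(2 * math.pi * 4 * i / N - 0.3)
            + 2.0 * math.cos(2 * math.pi * 8 * i / N + 0.7)
            for i in range(N)]
response = frequency_response_db(reference, captured, [4, 8])
print({k: round(v, 2) for k, v in response.items()})  # {4: -6.02, 8: 6.02}
```

In practice an FFT over the whole capture would replace the per-bin loop, and a swept or noise test signal would excite all frequencies rather than two bins.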

[0020] In the network environment 100, the audio reproduction device 102, the head wearable device 104, the plurality of audio capturing devices 106A-106B, and the first user 116 may be located within the enclosed physical space 108. The enclosed physical space 108 may include a three-dimensional physical area that may be surrounded by walls and have a defined physical dimension in a physical environment. Examples of the enclosed physical space 108 may include, but are not limited to, a room, a hall or other enclosed areas.

[0021] The speaker 114 may include suitable logic, circuitry, code and/or interfaces that may be configured to reproduce the first audio signal (for example, a song, a test tone, or a musical tone) associated with the audio reproduction device 102. The speaker 114 may be configured to receive electrical signals or instructions (i.e. related to the first audio signal) from the audio reproduction device 102, and convert the received electrical signals or instructions into an audio output. In some embodiments, the speaker 114 may be integrated with the audio reproduction device 102. The speaker 114 may be an internal component of the audio reproduction device 102 and the entire functionality of the speaker 114 may be included in the audio reproduction device 102. In some embodiments, the speaker 114 may be communicatively coupled with the audio reproduction device 102 and may be positioned within the enclosed physical space 108. Examples of the speaker 114 may include, but are not limited to, an external wireless speaker, a set of internal speakers, an external wired speaker, a woofer, a sub-woofer, a tweeter, a soundbar, a loudspeaker, a monitor speaker, an optical audio device, or other speakers or sound output devices that may be communicatively coupled to the audio reproduction device 102 through the communication network 112 or integrated in the audio reproduction device 102.

[0022] The server 110 may include suitable logic, circuitry, interfaces, and/or code that may be configured to transmit audio content or the multimedia content to the audio reproduction device 102. The server 110 may be further configured to store the audio content. In some embodiments, the server 110 may be configured to store the determined RC preset corresponding to the location within the enclosed physical space 108, and store the determined first HRTF corresponding to the first user 116 present within the enclosed physical space 108. The server 110 may further provide a first value of at least one coefficient of the one or more HRTF filters and a second value of at least one coefficient of the one or more RC filters to the audio reproduction device 102 for audio control based on the room correction (RC) and the head related transfer function (HRTF). The server 110 may be implemented as a cloud server and may execute operations through web applications, cloud applications, HTTP requests, repository operations, file transfer, and the like. Other example implementations of the server 110 may include, but are not limited to, a database server, a file server, a web server, a media server, an application server, a mainframe server, or a cloud computing server.
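Paragraph [0022] (like the memory claim) describes storing the RC preset per location and the HRTF per user. A minimal in-memory sketch of that bookkeeping follows, assuming that a location is a hypothetical (x, y) listener position in metres and that playback at an uncalibrated position falls back to the nearest calibrated one; the patent specifies neither choice, and `CalibrationStore` and the user identifiers are illustrative names, not a Sony API. A `None` result from `hrtf_for` corresponds to a user whose HRTF has not been calibrated.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Location = Tuple[float, float]  # hypothetical (x, y) listener position in metres


@dataclass
class CalibrationStore:
    rc_presets: Dict[Location, List[float]] = field(default_factory=dict)
    hrtfs: Dict[str, List[float]] = field(default_factory=dict)

    def save_rc_preset(self, location: Location, coefficients: List[float]) -> None:
        self.rc_presets[location] = list(coefficients)

    def save_hrtf(self, user_id: str, coefficients: List[float]) -> None:
        self.hrtfs[user_id] = list(coefficients)

    def rc_preset_near(self, location: Location) -> Optional[List[float]]:
        """RC preset calibrated closest to `location`, or None if none is stored."""
        if not self.rc_presets:
            return None
        nearest = min(self.rc_presets, key=lambda loc: math.dist(loc, location))
        return self.rc_presets[nearest]

    def hrtf_for(self, user_id: str) -> Optional[List[float]]:
        """Stored HRTF coefficients for a user, or None if not yet calibrated."""
        return self.hrtfs.get(user_id)


store = CalibrationStore()
store.save_rc_preset((1.0, 2.0), [-2.0, 3.0, -1.0])
store.save_rc_preset((3.5, 2.0), [0.5, 1.0, -4.0])
store.save_hrtf("user-116", [0.9, 0.1])
print(store.rc_preset_near((3.0, 2.2)))  # [0.5, 1.0, -4.0]
print(store.hrtf_for("guest"))           # None
```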

[0023] The communication network 112 may include a communication medium through which the audio reproduction device 102, the head wearable device 104, the plurality of audio capturing devices 106A-106B, the server 110, and the speaker 114 may communicate with each other. The communication network 112 may be a wired connection or a wireless connection. Examples of the communication network 112 may include, but are not limited to, the Internet, a cloud network, a Wireless Fidelity (Wi-Fi) network, a Personal Area Network (PAN), a Local Area Network (LAN), or a Metropolitan Area Network (MAN). Various devices in the network environment 100 may be configured to connect to the communication network 112 in accordance with various wired and wireless communication protocols. Examples of such wired and wireless communication protocols may include, but are not limited to, at least one of a Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), ZigBee, EDGE, IEEE 802.11, light fidelity (Li-Fi), IEEE 802.16, IEEE 802.11s, IEEE 802.11g, multi-hop communication, wireless access point (AP), device to device communication, cellular communication protocols, and Bluetooth (BT) communication protocols.

……
……
……

Article link: https://patent.nweon.com/18875

You may also like...

Sony Patent | Display terminal device

Sony Patent | Varying Effective Resolution By Screen Location In Graphics Processing By Approximating Projection Of Vertices Onto Curved Viewport

Sony Patent | Virtual Character Inter-Reality Crossover
