# Sony Patent | Audio Processing Device And Method

**Patent: **Audio Processing Device And Method

**Publication Number: **10582329

**Publication Date: **20200303

**Applicants: **Sony

**Abstract**

Provided is an audio processing device and method, in which sound can be more efficiently reproduced. An audio processing device includes a matrix generation unit which generates a vector for each time-frequency with a head-related transfer function obtained by spherical harmonic transform by spherical harmonics as an element by using only the element corresponding to a degree of the spherical harmonics determined for the time-frequency or on the basis of the element common to all users and the element dependent on an individual user, and a head-related transfer function synthesis unit which generates a headphone drive signal of a time-frequency domain by synthesizing an input signal of a spherical harmonic domain and the generated vector.

**CROSS REFERENCE TO RELATED APPLICATIONS**

This application is a U.S. National Phase of International Patent Application No. PCT/JP2016/088381 filed on Dec. 22, 2016, which claims priority benefit of Japanese Patent Application No. JP 2016-002168 filed in the Japan Patent Office on Jan. 8, 2016. Each of the above-referenced applications is hereby incorporated herein by reference in its entirety.

**TECHNICAL FIELD**

The present technology relates to an audio processing device and method and a program, and, in particular, relates to an audio processing device and method and a program, in which sound can be more efficiently reproduced.

**BACKGROUND ART**

In recent years, the development and dissemination of systems, which record, transmit, and reproduce spatial information from the entire environment, have been progressing in the field of sound. For example, in Super Hi-Vision, broadcasting is being planned with three-dimensional multi-channel acoustics of 22.2 ch.

Also in the field of virtual reality, ones which reproduce signals surrounding the entire environment for sound in addition to pictures surrounding the entire environment have started to be spread.

Among them, there is a technique called Ambisonics, which expresses three-dimensional audio information flexibly adaptable to an arbitrary recording/reproducing system and is attracting attention. In particular, Ambisonics which has degrees equal to or higher than the second-order is called higher order Ambisonics (HOA) (e.g., see Non-Patent Document 1).

In the three-dimensional multi-channel acoustics, sound information spreads along the spatial axis in addition to the time axis. And in Ambisonics, information is kept by performing frequency transform, that is, spherical harmonic transform on the angular direction of three-dimensional polar coordinates. The spherical harmonic transform can be considered to be equivalent to time-frequency transform on the audio signal about the time axis.

An advantage of this method is that information can be encoded and decoded from an arbitrary microphone array to an arbitrary speaker array without limiting the number of microphones or the number of speakers.

On the other hand, the factors that impede the spread of Ambisonics include the need for a speaker array including a large number of speakers in the reproduction environment, and the narrow range of reproducing the sound space (sweet spot).

For example, to try to increase the spatial resolution of sound, a speaker array including more speakers is necessary, but it is unrealistic to create such a system at home or the like. In addition, in a space like a movie theater, the area where the sound space can be reproduced is narrow, and it is difficult to give desired effects to all the audience.

**CITATION LIST**

**Non-Patent Document**

Non-Patent Document 1: Jerome Daniel, Rozenn Nicol, Sebastien Moreau, “Further Investigations of High Order Ambisonics and Wavefield Synthesis for Holophonic Sound Imaging,” AES 114th Convention, Amsterdam, Netherlands, 2003

**SUMMARY OF THE INVENTION**

**Problems to be Solved by the Invention**

Therefore, it is conceivable to combine Ambisonics and binaural reproduction technology. The binaural reproduction technology is generally called a virtual auditory display (VAD) and is realized by using head-related transfer functions (HRTF).

Herein, the head-related transfer functions express information regarding how sounds are transmitted from every direction surrounding the human head to the binaural eardrums as functions of frequencies and directions of arrival.

In a case of presenting one obtained by synthesizing a target sound and a head-related transfer function from a certain direction with headphones, a listener senses the sound as if the sound comes from the direction of the head-related transfer function used, rather than from the headphones. VAD is a system that utilizes such a principle.

If a plurality of virtual loudspeakers are reproduced by using VAD, it is possible to realize, by headphone presentation, the same effects as Ambisonics in the speaker array system including a large number of speakers, which are difficult in reality.

However, with such a system, the sound cannot be reproduced sufficiently efficiently. For example, in a case where Ambisonics and the binaural reproduction technology are combined, not only the operation amount, such as the convolution operation of the head-related transfer functions, increases, but also the usage amount of the memory used for the operation and the like increases.

The present technology has been made in light of such a situation and can reproduce sound more efficiently.

**Solutions to Problems**

An audio processing device according to one aspect of the present technology includes: a matrix generation unit which generates a vector for each time-frequency with a head-related transfer function obtained by spherical harmonic transform by spherical harmonics as an element by using only the element corresponding to a degree of the spherical harmonics determined for the time-frequency or on the basis of the element common to all users and the element dependent on an individual user; and a head-related transfer function synthesis unit which generates a headphone drive signal of a time-frequency domain by synthesizing an input signal of a spherical harmonic domain and the generated vector.

The matrix generation unit can be caused to generate the vector on the basis of the element common to all the users and the element dependent on the individual user, which are determined for each time-frequency.

The matrix generation unit can be caused to generate the vector including only the element corresponding to the degree determined for the time-frequency on the basis of the element common to all the users and the element dependent on the individual user.

The audio processing device can be further provided with a head direction acquisition unit which acquires a head direction of a user who listens to sound, and the matrix generation unit can be caused to generate, as the vector, a row corresponding to the head direction in a head-related transfer function matrix including the head-related transfer function for each of a plurality of directions.

The audio processing device can be further provided with a head direction acquisition unit which acquires a head direction of a user who listens to sound, and the head-related transfer function synthesis unit can be caused to generate the headphone drive signal by synthesizing a rotation matrix determined by the head direction, the input signal, and the vector.

The head-related transfer function synthesis unit can be caused to generate the headphone drive signal by obtaining a product of the rotation matrix and the input signal and then obtaining a product of the product and the vector.

The head-related transfer function synthesis unit can be caused to generate the headphone drive signal by obtaining a product of the rotation matrix and the vector and then obtaining a product of the product and the input signal.

The audio processing device can be further provided with a rotation matrix generation unit which generates the rotation matrix on the basis of the head direction.

The audio processing device can be further provided with a head direction sensor unit which detects rotation of a head of the user, and the head direction acquisition unit can be caused to acquire the head direction of the user by acquiring a detection result by the head direction sensor unit.

The audio processing device can be further provided with a time-frequency inverse transform unit which performs time-frequency inverse transform on the headphone drive signal.

An audio processing method or a program according to one aspect of the present technology includes steps of: generating a vector for each time-frequency with a head-related transfer function obtained by spherical harmonic transform by spherical harmonics as an element by using only the element corresponding to a degree of the spherical harmonics determined for the time-frequency or on the basis of the element common to all users and the element dependent on an individual user; and generating a headphone drive signal of a time-frequency domain by synthesizing an input signal of a spherical harmonic domain and the generated vector.

According to one aspect of the present technology, a vector for each time-frequency with a head-related transfer function obtained by spherical harmonic transform by spherical harmonics as an element is generated by using only the element corresponding to a degree of the spherical harmonics determined for the time-frequency or on the basis of the element common to all users and the element dependent on an individual user, and a headphone drive signal of a time-frequency domain is generated by synthesizing an input signal of a spherical harmonic domain and the generated vector.

**Effects of the Invention**

According to one aspect of the present technology, it is possible to reproduce sound more efficiently.

Note that the effects described herein are not necessarily limited, and any of the effects described in the present disclosure may be applied.

**BRIEF DESCRIPTION OF THE DRAWINGS**

FIG. 1 is a diagram for explaining simulation of stereophony using head-related transfer functions.

FIG. 2 is a diagram showing the configuration of a general audio processing device.

FIG. 3 is a diagram for explaining the computation of a drive signal by a general technique.

FIG. 4 is a diagram showing the configuration of an audio processing device to which a head tracking function is added.

FIG. 5 is a diagram for explaining the computation of a drive signal in a case where the head tracking function is added.

FIG. 6 is a diagram for explaining the computation of a drive signal by a first proposed technique.

FIG. 7 is a diagram for explaining the operations at the time of computing the drive signals by the first proposed technique and the general technique.

FIG. 8 is a diagram showing a configuration example of an audio processing device to which the present technology is applied.

FIG. 9 is a flowchart for explaining the drive signal generation processing.

FIG. 10** is a diagram for explaining the computation of a drive signal by a second proposed technique**

FIG. 11 is a diagram for explaining the operation amount and necessary memory amount of the second proposed technique.

FIG. 12 is a diagram showing a configuration example of an audio processing device to which the present technology is applied.

FIG. 13 is a flowchart for explaining the drive signal generation processing.

FIG. 14 is a diagram showing a configuration example of an audio processing device to which the present technology is applied.

FIG. 15 is a flowchart for explaining the drive signal generation processing.

FIG. 16 is a diagram for explaining the computation of a drive signal by a third proposed method.

FIG. 17 is a diagram showing a configuration example of an audio processing device to which the present technology is applied.

FIG. 18 is a flowchart for explaining the drive signal generation processing.

FIG. 19 is a diagram showing a configuration example of an audio processing device to which the present technology is applied.

FIG. 20 is a flowchart for explaining the drive signal generation processing.

FIG. 21 is a diagram for explaining reduction in operation amount by degree-truncation.

FIG. 22 is a diagram for explaining reduction in operation amount by degree-truncation.

FIG. 23 is a diagram for explaining the operation amounts and necessary memory amounts of each proposed technique and the general technique.

FIG. 24 is a diagram for explaining the operation amounts and necessary memory amounts of each proposed technique and the general technique.

FIG. 25 is a diagram for explaining the operation amounts and necessary memory amounts of each proposed technique and the general technique.

FIG. 26 is a diagram showing the configuration of a general audio processing device with the MPEG 3D standard.

FIG. 27 is a diagram for explaining the computation of a drive signal by the general audio processing device.

FIG. 28 is a diagram showing a configuration example of an audio processing device to which the present technology is applied.

FIG. 29 is a diagram for explaining the computation of a drive signal by the audio processing device to which the present technology is applied.

FIG. 30 is a diagram for explaining the generation of a matrix of head-related transfer functions.

FIG. 31 is a diagram showing a configuration example of an audio processing device to which the present technology is applied.

FIG. 32 is a flowchart for explaining the drive signal generation processing.

FIG. 33 is a diagram showing a configuration example of an audio processing device to which the present technology is applied.

FIG. 34 is a flowchart for explaining the drive signal generation processing.

FIG. 35 is a diagram showing a configuration example of a computer.

**MODE FOR CARRYING OUT THE INVENTION**

Hereinafter, embodiments, to which the present technology is applied, will be described with reference to the drawings.

**First Embodiment**

According to the present technology, a head-related transfer function itself is taken as a function of the spherical coordinates, similarly spherical harmonic transform is performed to synthesize an input signal, which is the audio signal, and the head-related transfer function in a spherical harmonic domain without decoding the input signal into a speaker array signal, thereby realizing a reproduction system more efficient in the operation amount and memory usage amount.

For example, the spherical harmonic transform on the function f(.theta., .phi.) on the spherical coordinates is expressed by the following Expression (1). [Expression 1] F.sub.n.sup.m=.intg..sub.0.sup..pi..intg..sup.2.pi.f(.theta.,.PHI.)Y.sub.- n.sup.m(.theta.,.PHI.)d.theta.d.PHI. (1)

In Expression (1), .theta. and .phi. are the elevation angle and the horizontal angle in the spherical coordinates, respectively, and Y.sub.n.sup.m(.theta., .phi.) is the spherical harmonics. In addition, one marked with “-” at the top of the spherical harmonics Y.sub.n.sup.m(.theta., .phi.) is the complex conjugate of the spherical harmonics Y.sub.n.sup.m(.theta., .phi.).

Herein, the spherical harmonics Y.sub.n.sup.m(.theta., .phi.) is expressed by the following Expression (2).

.times..times..function..theta..PHI..times..times..times..times..pi..func- tion..times..function..times..times..theta..times..times..times..PHI. ##EQU00001##

In Expression (2), n and m are the degrees of the spherical harmonics Y.sub.n.sup.m(.theta., .phi.), and -n.ltoreq.m.ltoreq.n. In addition, j is a pure imaginary number, and P.sub.n.sup.m(x) is an associated Legendre function.

This associated Legendre function P.sub.n.sup.m(x) is expressed by the following Expression (3) or (4) when n.gtoreq.0 and 0.ltoreq.m.ltoreq.n. Note that Expression (3) is for a case where m=0.

.times..times. ##EQU00002## .function..times..times..times..times..times..times..function..times..tim- es..times..times..function. ##EQU00002.2##

Moreover, in a case where -n.ltoreq.m.ltoreq.0, the associated Legendre function P.sub.n.sup.m(x) is expressed by the following Expression (5).

.times..times. ##EQU00003## .function..times..times..function. ##EQU00003.2##

Furthermore, the inverse transform from the function F.sub.n.sup.m obtained by the spherical harmonic transform into the function f(.theta., .phi.) on the spherical coordinates is as shown in the following Expression (6).

.times..times. ##EQU00004## .function..theta..PHI..infin..times..times..times..times..times..function- ..theta..PHI. ##EQU00004.2##

From the above, the transform from the input signal D’n.sup.m(.omega.) of the sound after the correction in the radial direction, which is kept in the spherical harmonic domain, into a speaker drive signal S(x.sub.i, .omega.) of each of L number of speakers arranged on the spherical surface of the radius R is as shown in the following Expression (7).

.times..times. ##EQU00005## .function..omega..times..times..times..times. ‘.times..function..omega..times..function..beta..alpha. ##EQU00005.2##

Note that, in Expression (7), x.sub.i is the position of the speaker, and .omega. is the time-frequency of the sound signal. The input signal D’.sub.n.sup.m(.omega.) is an audio signal corresponding to each degree n and degree m of the spherical harmonics for the predetermined time-frequency .omega..

In addition, x.sub.i=(R sin .beta..sub.i cos .alpha..sub.i, R sin .beta..sub.i sin .alpha..sub.i, R cos .beta..sub.i), and i is the speaker index for specifying the speaker. Herein, i=1, 2, … , L, and .beta..sub.i and .alpha..sub.i are the elevation angle and the horizontal angle indicating the position of the i-th speaker, respectively.

Such transform shown by Expression (7) is the spherical harmonic inverse transform for Expression (6). In addition, in a case of obtaining the speaker drive signal S(x.sub.i, .omega.) according to Expression (7), the L number of speakers, which is the number of regenerating speakers, and the degree N of the spherical harmonics, that is, the maximum value N of the degree n must meet the relationship shown by the following Expression (8). [Expression 8] L>(N+1).sup.2 (8)

Incidentally, a general technique for simulating stereophony at the ears by headphone presentation is, for example, a method using head-related transfer functions as shown in FIG. 1.

In the example shown in FIG. 1, an inputted Ambisonic signal is decoded, and a speaker drive signal of each of virtual speakers SP11-1 to SP11-8, which are a plurality of virtual speakers, is generated. The signal decoded at this time corresponds to, for example, the aforementioned input signal D’.sub.n.sup.m(.omega.).

Herein, each of the virtual speakers SP11-1 to virtual speakers SP11-8 is annularly disposed and virtually arranged, and the speaker drive signal of each of the virtual speakers is obtained by the calculation of the aforementioned Expression (7). Note that the virtual speakers are simply referred to as the virtual speakers SP11 hereinafter in a case where it is unnecessary to particularly distinguish the virtual speakers SP11-1 to SP11-8.

When the speaker drive signals of the respective virtual speakers SP11 are thus obtained, for each of the virtual speakers SP11, the left and right drive signals (binaural signals) of headphones HD11 which actually reproduce the sound are generated by the convolution operation using the head-related transfer functions. Then, the sum of each of the drive signals of the headphones HD 11 obtained for each of the virtual speakers SP11 is the final drive signal.

Note that such a technique is described in detail in, for example, “ADVANCED SYSTEM OPTIONS FOR BINAURAL RENDERING OF AMBISONIC FORMAT (Gerald Enzner et. al. ICASSP 2013)” and the like.

The head-related transfer function H(x, .omega.) used to generate the left and right drive signals of the headphones HD11 is obtained by normalizing the transfer characteristic H.sub.1(X, .omega.) from the sound source position x in the state in which the head of the user, who is a listener, exists in the free space to the positions of the eardrums of the user by the transfer characteristic H.sub.0(x, .omega.) from the sound source position x in the state in which the head does not exit to the head center O. That is, the head-related transfer function H(x, .omega.) for the sound source position x is obtained by the following Expression (9).

.times..times. ##EQU00006## .function..omega..function..omega..function..omega. ##EQU00006.2##

Herein, by convolving the head-related transfer function H(x, .omega.) with an arbitrary audio signal and presenting the result by headphones or the like, an illusion as if the sound is heard from the direction of the head-related transfer function H(x, .omega.) convolved, that is, the direction of the sound source position x can be given to the listener.

In the example shown in FIG. 1, such a principle is used to generate the left and right drive signals of the headphones HD11.

Specifically, the position of each of the virtual speakers SP11 is set as a position x.sub.i, and the speaker drive signals of these virtual speakers SP11 are set as S(x.sub.i, .omega.).

In addition, the number of virtual speakers SP11 is set as L (herein, L=8), and the final left and right drive signals of the headphones HD11 are set as P.sub.l and P.sub.r, respectively.

In this case, when the speaker drive signals S(x.sub.i, .omega.) are simulated by the presentation of the headphones HD11, the left and right drive signals P.sub.l and P.sub.r of the headphones HD11 can be obtained by calculating the following Expression (10).

.times..times. ##EQU00007## .times..times..function..omega..times..function..omega..times..times..tim- es..times..function..omega..times..function..omega. ##EQU00007.2##

Note that, in Expression (10), H.sub.l(x.sub.i, .omega.) and H.sub.r(x.sub.i, .omega.) are the normalized head-related transfer functions from the position x.sub.i of the virtual speakers SP11 to the left and right eardrum positions of the listener, respectively.

By such operation, it is possible to reproduce the input signal D’.sub.n.sup.m(.omega.) of the spherical harmonic domain finally by the headphone presentation. That is, it is possible to realize, by the headphone presentation, the same effect as Ambisonics.

An audio processing device, which generates the left and right drive signals of the headphones from the input signal by a general technique combining Ambisonics and a binaural reproduction technology as described above (hereinafter also referred to as the general technique), has the configuration as shown in FIG. 2.

That is, an audio processing device 11 shown in FIG. 2 includes a spherical harmonic inverse transform unit 21, a head-related transfer function synthesis unit 22, and a time-frequency inverse transform unit 23.

The spherical harmonic inverse transform unit 21 performs the spherical harmonic inverse transform on the inputted input signal D’.sub.n.sup.m(.omega.) by calculating Expression (7) and supplies the speaker drive signals S(x.sub.i, .omega.) of the virtual speakers SP11 obtained as a result to the head-related transfer function synthesis unit 22.

The head-related transfer function synthesis unit 22 generates the left drive signal P.sub.l and the right drive signal P.sub.r of the headphones HD11 by Expression (10) from the speaker drive signals S(x.sub.i, .omega.) from the spherical harmonic inverse transform unit 21 and the head-related transfer function H.sub.l(x.sub.i, .omega.) and the head-related transfer function H.sub.r(x.sub.i, .omega.), which are prepared in advance, and outputs the drive signals P.sub.l and P.sub.r.

Moreover, the time-frequency inverse transform unit 23 performs time-frequency inverse transform on the drive signal P.sub.l and the drive signal P.sub.r, which are signals in the time-frequency domain outputted from the head-related transfer function synthesis unit 22 and supplies the drive signal p.sub.l(t) and the drive signal p.sub.r(t), which are signals in the time domain and obtained as a result, to the headphones HD11 to reproduce the sound.

Note that, hereinafter, in a case where it is unnecessary to particularly distinguish the drive signal P.sub.l and the drive signal P.sub.r for the time-frequency .omega., they are also simply referred to as drive signals P(.omega.), and in a case where it is unnecessary to particularly distinguish the drive signal p.sub.l(t) and the drive signal p.sub.r(t), they are also simply referred to as drive signals p(t). In addition, in a case where it is unnecessary to particularly distinguish the head-related transfer function H.sub.l(x.sub.i, .omega.) and the head related-transfer function H.sub.r(x.sub.i, .omega.), they are also simply referred to as head-related transfer functions H(x.sub.i, .omega.).

In the audio processing device 11, for example, the operation shown in FIG. 3 is performed in order to obtain the drive signals P(.omega.) of 1.times.1, that is, one row and one column.

In FIG. 3, H(.omega.) is a vector (matrix) of 1.times.L including the L number of head-related transfer functions H(x.sub.i, .omega.). In addition, D’(.omega.) is a vector including the input signals D’.sub.n.sup.m(.omega.), and suppose that the number of input signals D’.sub.n.sup.m(.omega.) of bins of the same time-frequency .omega. is K, then the vector D’(.omega.) becomes K.times.1. Moreover, Y(x) is a matrix including spherical harmonics Y.sub.n.sup.m(.beta..sub.i, .alpha..sub.i) of each degree, and the matrix Y(x) becomes a matrix of L.times.K.

Therefore, in the audio processing device 11, a matrix (vector) S obtained from the matrix operation of the matrix Y(x) of L.times.K and the vector D’(.omega.) of K.times.1 is obtained, and further, the matrix operation of the matrix S and a vector (matrix) H(.omega.) of 1.times.L is performed to obtain one drive signal P(.omega.).

In addition, in a case where the head of the listener wearing the headphones HD11 rotates in a predetermined direction expressed by a rotation matrix g.sub.j (hereinafter also referred to as a direction g.sub.j), for example, the drive signal P.sub.l(g.sub.j, .omega.) of the left headphone of the headphones HD11 is as shown in the following Expression (11).

.times..times. ##EQU00008## .function..omega..times..times..function..omega..times..function..times..- omega. ##EQU00008.2##

Note that the rotation matrix g.sub.j is a three-dimensional rotation matrix expressed by .phi., .theta., and .psi., which are rotation angles of the Euler angle, that is, a rotation matrix of 3.times.3. In addition, in Expression (11), the drive signal P.sub.l(g.sub.j, .omega.) is the aforementioned drive signal P.sub.l and written as the drive signal P.sub.l(g.sub.j, .omega.) herein to clarify the position, that is, the direction g.sub.j and the time-frequency .omega..

By further adding, for example, the configuration for specifying the rotation direction of the head of the listener as shown in FIG. 4, that is, the configuration of the head tracking function to the general audio processing device 11, the sound image position viewed from the listener can be fixed in the space. Note that parts in FIG. 4 corresponding to those in FIG. 2 are denoted by the same reference signs, and the descriptions thereof will be omitted as appropriate.

In an audio processing device 11 shown in FIG. 4, the configuration shown in FIG. 2 is further provided with a head direction sensor unit 51 and a head direction selection unit 52.

The head direction sensor unit 51 detects the rotation of the head of the user, who is a listener, and supplies the detection result to the head direction selection unit 52. On the basis of the detection result from the head direction sensor unit 51, the head direction selection unit 52 obtains the rotation direction of the head of the listener, that is, the direction of the head of the listener after the rotation as the direction g.sub.j and supplies the direction g.sub.j to the head-related transfer function synthesis unit 22.

In this case, on the basis of the direction g.sub.j supplied from the head direction selection unit 52, the head-related transfer function synthesis unit 22 computes the left and right drive signals of the headphones HD11 by using the head-related transfer function of the relative direction g.sub.j.sup.-1x.sub.i of each of the virtual speakers SP11 viewed from the head of the listener from among a plurality of head-related transfer functions prepared in advance. Thus, similarly to the case of using the real speakers, even in the case of reproducing the sound by the headphones HD11, it is possible to fix the sound image position viewed from the listener in the space.

By generating the drive signals of the headphones by the general technique or the technique adding the head tracking function to the general technique described above, the same effects as Ambisonics can be obtained without using the speaker array and without limiting the range of reproducing the sound space. However, with these techniques, not only the operation amount, such as the convolution operation of the head-related transfer function, increases, but also the usage amount of the memory used for the operation and the like increases.

Thereupon, in the present technology, the convolution of the head-related transfer functions performed in the time-frequency domain by the general technique is performed in the spherical harmonic domain. As a result, it is possible to reduce the operation amount of the convolution and the necessary memory amount and to reproduce the sound more efficiently.

Hereinafter, the techniques according to the present technology will be described.

For example, paying attention to the left headphone, the vector P.sub.l(.omega.) including each drive signal P.sub.l(g.sub.j, .omega.) of the left headphone for the full rotation direction of the head of the user (listener), who is a listener, is expressed as shown in the following Expression (12).

.times..times. ##EQU00009## .function..omega..function..omega..times..function..omega..function..omeg- a..times..function..times.’.function..omega. ##EQU00009.2##

Note that, in Expression (12), S(.omega.) is the vector including the speaker drive signal S(x.sub.i, .omega.), and S(.omega.)=Y(x)D’(.omega.). In addition, in Expression (12), Y(x) is a matrix including each degree and the spherical harmonics Y.sub.n.sup.m(x.sub.i) of the position x.sub.i of each virtual speaker as shown in the following Expression (13). Herein, i=1, 2, … , L, and the maximum value (maximum degree) of the degree n is N.

D’(.omega.) is a vector (matrix) including the input signal D’.sub.n.sup.m(.omega.) of the sound corresponding to each degree as shown in the following Expression (14). Each input signal D’.sub.n.sup.m(.omega.) is a signal of a spherical harmonic domain.

Moreover, in Expression (12), H(.omega.) is a matrix including the head-related transfer function H(g.sub.j.sup.-1x.sub.i, .omega.) of the relative direction g.sub.j.sup.-1x.sub.i of each of the virtual speakers viewed from the head of the listener as shown in the following Expression (15) in a case where the direction of the head of the listener is the direction g.sub.j. In this example, the head-related transfer function H(g.sub.j.sup.-1x.sub.i, .omega.) of each of the virtual speakers is prepared for each direction of the total M number of directions g.sub.1 to g.sub.M.

.times..times. ##EQU00010## .function..function..function. .function..function..times..times..times.’.function..omega. ‘.times..function..omega. ‘.times..function..omega..times..times..times..function..omega..function.- .times..omega..function..times..omega. .function..times..omega..function..times..omega. ##EQU00010.2##

To compute the drive signal P.sub.l(g.sub.j, .omega.) of the left headphone when the head of the listener is directed in the direction g.sub.j, the row corresponding to the direction g.sub.j, which is the direction of the head of the listener, that is, the row including the head-related transfer function H(g.sub.j.sup.-1x.sub.i, .omega.) for that direction g.sub.j should be selected from the matrix H(.omega.) of the head-related transfer functions to perform the calculation of Expression (12).

In this case, for example, only necessary rows are calculated as shown in FIG. 5.

In this example, since the head-related transfer function is prepared for each of the M number of directions, the matrix calculation shown in Expression (12) is as shown by the arrow A11.

That is, suppose that the number of input signals D’.sub.n.sup.m(.omega.) of the time-frequency .omega. is K, the vector D’(.omega.) is a matrix of K.times.1, that is, K rows and one column. In addition, the matrix Y(x) of the spherical harmonics is L.times.K, and the matrix H(.omega.) is M.times.L. Therefore, in the calculation of Expression (12), the vector P.sub.l(.omega.) is M.times.1.

Herein, by performing matrix operation (product-sum operation) of the matrix Y(x) and the vector D’(.omega.) in online operation first to obtain the vector S(.omega.), at the time of the computing the drive signal P.sub.l(g.sub.j, .omega.), it is possible to select the row corresponding to the direction g.sub.j of the head of the listener in the matrix H(.omega.) as shown by the arrow A12 and reduce the operation amount. In FIG. 5, the hatched portion in the matrix H(.omega.) is the row corresponding to the direction g.sub.j, the operation of this row and the vector S(.omega.) is performed, and the desired drive signal P.sub.l(g.sub.j, .omega.) of the left headphone is computed.

Herein, when the matrix H’(.omega.) is defined as shown in the following Expression (16), the vector PIN) shown in Expression (12) can be expressed by the following Expression (17). [Expression 16] H’(.omega.)=H(.omega.)Y(x) (16) [Expression 17] P.sub.l(.omega.)=H’(.omega.)D’(.omega.) (17)

In Expression (16), the head-related transfer function, more specifically, the matrix H(.omega.) including the head-related transfer function in the time-frequency domain, is transformed by the spherical harmonic transform using the spherical harmonics into the matrix H’(.omega.) including the head-related transfer function in the spherical harmonic domain.

Therefore, in the calculation of Expression (17), convolution of the speaker drive signal and the head-related transfer function is performed in the spherical harmonic domain. In other words, in the spherical harmonic domain, the product-sum operation of the head-related transfer function and the input signal is performed. Note that the matrix H’(.omega.) can be calculated and kept in advance.

In this case, to compute the drive signal P.sub.l(g.sub.j, .omega.) of the left headphone when the head of the listener is directed in the direction g.sub.j, only the row corresponding to the direction g.sub.j of the head of the listener is selected from the matrix H’(.omega.) kept in advance to calculate Expression (17).

In such a case, the calculation of Expression (17) is calculation shown in the following Expression (18). Thus, it is possible to greatly reduce the operation amount and the necessary memory amount.

.times..times. ##EQU00011## .function..omega..times..times..times..times. ‘.times..function..omega..times. ‘.times..function..omega. ##EQU00011.2##

In Expression (18), H’.sub.n.sup.m(g.sub.j, .omega.) is one element of the matrix H’(.omega.), that is, a head-related transfer function in the spherical harmonic domain, which is a component (element) corresponding to the direction g.sub.j of the head in the matrix H’(.omega.). n and m in the head-related transfer function H’.sub.n.sup.m(g.sub.j, .omega.) are the degree n and the degree m of the spherical harmonics.

In such operation shown in Expression (18), the operation amount is reduced as shown in FIG. 6. That is, the calculation shown in Expression (12) is calculation to obtain a product of the matrix H(.omega.) of M.times.L, the matrix Y(x) of L.times.K, and the vector D’(.omega.) of K.times.1 as indicated by the arrow A21 in FIG. 6.

Herein, since H(.omega.)Y(x) is the matrix H’(.omega.) as defined in Expression (16), the calculation indicated by the arrow A21 eventually becomes as indicated by the arrow A22. In particular, since the calculation for obtaining the matrix H’(.omega.) can be performed offline, that is, in advance, if the matrix H’(.omega.) is obtained and kept in advance, it is possible to reduce the operation amount for obtaining the drive signals of the headphones online by that amount.

When the matrix H’(.omega.) is thus obtained in advance, the calculation indicated by the arrow A22, that is, the calculation of the aforementioned Expression (18) is performed to actually obtain the drive signals of the headphones.

That is, as indicated by the arrow A22, the row corresponding to the direction g.sub.j of the head of the listener in the matrix H’(.omega.) is selected, and the drive signal P.sub.l(g.sub.j, .omega.) of the left headphone is computed by the matrix operation of that selected row and the vector D’(.omega.) including the inputted input signal D’.sub.n.sup.m(.omega.). In FIG. 6, the hatched portion in the matrix H’(.omega.) is the row corresponding to the direction g.sub.j, and the elements constituting this row are the head-related transfer functions H’.sub.n.sup.m(g.sub.j, .omega.) shown in Expression (18).

Herein, referring to FIG. 7, the product-sum amounts and the necessary memory amounts are compared between the technique according to the present technology described above (hereinafter also referred to as a first proposed technique) and the general technique.

For example, suppose that the length of the vector D’(.omega.) is K and the matrix H(.omega.) of the head-related transfer function is M.times.L, then the matrix Y(x) of the spherical harmonics is L.times.K and the matrix H’(.omega.) is M.times.K. In addition, the number of time-frequency bins .omega. is W.

Herein, in the general technique, as indicated by the arrow A31 in FIG. 7, in the process of transforming the vector D’(.omega.) into the time-frequency domain for a bin of each time-frequency .omega. (hereinafter also referred to as time-frequency bin .omega.), the product-sum operation of L.times.K occurs, and the product-sum operation by 2 L occurs by the convolution with the left and right head-related transfer functions.

Therefore, the total calc/W of the number of product-sum operations per time-frequency bin .omega. in the general technique is calc/W=(L.times.K+2 L).

Moreover, suppose that each coefficient of the product-sum operation is one byte, then the memory amount necessary for the operation by the general technique is (the number of directions of the head-related transfer functions to be kept).times.two bytes for each time-frequency bin .omega., and the number of directions of the head-related transfer functions to be kept is M.times.L as indicated by the arrow A31 in FIG. 7. Furthermore, a memory is necessary by L.times.K bytes for the matrix Y(x) of the spherical harmonics common to all the time-frequency bins co.

Therefore, suppose that the number of time-frequency bins .omega. is W, then the necessary memory amount memory in the general technique is memory=(2.times.M.times.L.times.W+L.times.K) bytes in total.

On the other hand, in the first proposed technique, the operation indicated by the arrow A32 in FIG. 7 is performed for each time-frequency bin .omega..

That is, in the first proposed technique, for each time-frequency bin .omega., the product-sum operation by K occurs by the product-sum of the vector D’(.omega.) in the spherical harmonic domain and the matrix H’(.omega.) of the head-related transfer function per one ear.

Therefore, the total calc/W of the number of product-sum operations in the first proposed technique is calc/W=2K.

In addition, since the memory amount necessary for the operation according to the first proposed technique is necessary by the amount to keep the matrix H’(.omega.) of the head-related transfer function for each time-frequency bin .omega., the memory is necessary by M.times.K bytes for the matrix H’(.omega.).

Therefore, suppose that the number of time-frequency bins .omega. is W, then the necessary memory amount memory in the first proposed technique is memory=(2 MKW) bytes in total.

Suppose that the maximum degree of the spherical harmonics is four, K=(4+1).sup.2=25. In addition, since the L number of virtual speakers must to be greater than K, suppose that L=32.

In such a case, the product-sum operation amount of the general technique is calc/W=(32.times.25+2.times.32)=864, whereas the product-sum operation amount of the first proposed technique is only calc/W=2.times.25=50. Thus, it can be seen that the operation amount is greatly reduced.

Moreover, suppose that, for example, W=100 and M=1000, then the memory amount necessary for the operation in the general technique is memory=(2.times.1000.times.32.times.100+32.times.25)=6400800. On the other hand, the memory amount necessary for the operation of the first proposed technique is memory=(2 MKW)=2.times.1000.times.25.times.100=5000000. Thus, it can be seen that the necessary memory amount is greatly reduced.

Next, an audio processing device to which the present technology described above is applied will be described. FIG. 8 is a diagram showing a configuration example of the audio processing device according to one embodiment to which the present technology is applied.

An audio processing device 81 shown in FIG. 8 has a head direction sensor unit 91, a head direction selection unit 92, a head-related transfer function synthesis unit 93, and a time-frequency inverse transform unit 94. Note that the audio processing device 81 may be incorporated in the headphones or may be a device different from the headphones.

The head direction sensor unit 91 includes, for example, an acceleration sensor, an image sensor, and the like attached to the head of the user as necessary, detects the rotation (motion) of the head of the user who is a listener, and supplies the detection result to the head direction selection unit 92. Note that the user herein is a user wearing the headphones, that is, a user who listens to the sound reproduced by the headphones on the basis of the drive signals of the left and right headphones obtained by the time-frequency inverse transform unit 94.

On the basis of the detection result from the head direction sensor unit 91, the head direction selection unit 92 obtains the rotation direction of the head of the listener, that is, the direction g.sub.j of the head of the listener after the rotation and supplies the direction g.sub.j to the head-related transfer function synthesis unit 93. In other words, the head direction selection unit 92 acquires the direction g.sub.j of the head of the user by acquiring the detection result from the head direction sensor unit 91.

An input signal D’.sub.n.sup.m(.omega.) of each degree of spherical harmonics for each time-frequency bin .omega., which is an audio signal in the spherical harmonic domain, is supplied to the head-related transfer function synthesis unit 93 from the outside. Moreover, the head-related transfer function synthesis unit 93 keeps the matrix H’(.omega.) including the head-related transfer function obtained in advance by calculation.

The head-related transfer function synthesis unit 93 performs the convolution operation of the supplied input signal D’.sub.n.sup.m(.omega.) and the kept matrix H’(.omega.) for each of the left and right headphones to synthesize the input signal D’.sub.n.sup.m(.omega.) and the head-related transfer function in the spherical harmonic domain and compute the drive signal P.sub.l(g.sub.j, .omega.) and the drive signal P.sub.r(g.sub.j, .omega.) of the left and right headphones. At this time, the head-related transfer function synthesis unit 93 selects the row corresponding to the direction g.sub.j in the matrix H’(.omega.) supplied from the head direction selection unit 92, that is, for example, the row including the head-related transfer function H’.sub.n.sup.m(g.sub.j, .omega.) of the aforementioned Expression (18) and performs the convolution operation with the input signal D’.sub.n.sup.m(.omega.).

By such operation, in the head-related transfer function synthesis unit 93, the drive signal P.sub.l(g.sub.j, .omega.) of the left headphone in the time-frequency domain and the drive signal P.sub.r(g.sub.j, .omega.) of the right headphone in the time-frequency domain are obtained for each time-frequency bin .omega..

The head-related transfer function synthesis unit 93 supplies the drive signal P.sub.l(g.sub.j, .omega.) and the drive signal P.sub.r(g.sub.j, .omega.) of the left and right headphones obtained to the time-frequency inverse transform unit 94.

The time-frequency inverse transform unit 94 performs the time-frequency inverse transform on the drive signal in the time-frequency domain supplied from the head-related transfer function synthesis unit 93 for each of the left and right headphones to obtain the drive signal p.sub.l(g.sub.j, t) of the left headphone in the time domain and the drive signal p.sub.r(g.sub.j, t) of the right headphone in the time domain and outputs these drive signals to the subsequent part. In the subsequent reproduction device which reproduces the sound by 2 ch, such as headphones, more specifically, headphones including earphones, the sound is reproduced on the basis of the drive signals outputted from the time-frequency inverse transform unit 94.

Next, with reference to the flowchart in FIG. 9, the drive signal generation processing performed by the audio processing device 81 will be described. This drive signal generation processing is started when the input signal D’.sub.n.sup.m(.omega.) is supplied from the outside.

In step S11, the head direction sensor unit 91 detects the rotation of the head of the user, who is a listener, and supplies the detection result to the head direction selection unit 92.

In step S12, on the basis of the detection result from the head direction sensor unit 91, the head direction selection unit 92 obtains the direction g.sub.j of the head of the listener and supplies the direction g.sub.j to the head-related transfer function synthesis unit 93.

In step S13, on the basis of the direction g.sub.j supplied from the head direction selection unit 92, the head-related transfer function synthesis unit 93 convolves the head-related transfer function H’.sub.n.sup.m(g.sub.j, .omega.) constituting the matrix H’(.omega.) kept in advance with the supplied input signal D’.sub.n.sup.m(.omega.).

That is, the head-related transfer function synthesis unit 93 selects the row corresponding to the direction g.sub.j in the matrix H’(.omega.) kept in advance and calculates Expression (18) with the head-related transfer function H’.sub.n.sup.m(g.sub.j, .omega.) constituting the selected row and the input signal D’.sub.n.sup.m(.omega.), thereby computing the drive signal P.sub.l(g.sub.j, .omega.) of the left headphone. In addition, the head-related transfer function synthesis unit 93 performs the operation for the right headphone similarly to the case of the left headphone and computes the drive signal P.sub.r(g.sub.j, .omega.) of the right headphone.

The head-related transfer function synthesis unit 93 supplies the drive signal P.sub.l(g.sub.j, .omega.) and the drive signal P.sub.r(g.sub.j, .omega.) of the left and right headphones thus obtained to the time-frequency inverse transform unit 94.

In step S14, the time-frequency inverse transform unit 94 performs the time-frequency inverse transform on the drive signal in the time-frequency domain supplied from the head-related transfer function synthesis unit 93 for each of the left and right headphones and computes the drive signal p.sub.l(g.sub.j, t) of the left headphone and the drive signal p.sub.r(g.sub.j, t) of the right headphone. For example, inverse discrete Fourier transform is performed as the time-frequency inverse transform.

The time-frequency inverse transform unit 94 outputs the drive signal p.sub.l(g.sub.j, t) and the drive signal p.sub.r(g.sub.j, t) in the time domain thus obtained to the left and right headphones, and the drive signal generation processing ends.

As described above, the audio processing device 81 convolves the head-related transfer functions with the input signals in the spherical harmonic domain and computes the drive signals of the left and right headphones.

By thus convolving the head-related transfer functions in the spherical harmonic domain, it is possible to greatly reduce the operation amount when the drive signals of the headphones are generated as well as to greatly reduce the memory amount necessary for the operation. In other words, it is possible to reproduce sound more efficiently.

**Second Embodiment**

Incidentally, in the first proposed technique described above, although it is possible to greatly reduce the operation amount and the necessary memory amount, it is necessary to keep in the memory all the rotation directions of the head of the listener, that is, the row corresponding to each direction g.sub.j as the matrix H’(.omega.) of the head-related transfer function.

Thereupon, a matrix (vector) including the head-related transfer function of the spherical harmonic domain for one direction g.sub.j may be set as H.sub.s(.omega.)=H’(g.sub.j), and only the matrix H.sub.s(.omega.) including the row corresponding to the one direction g.sub.j of the matrix H’(.omega.) may be kept, and a rotation matrix R’(g.sub.j) for performing rotation corresponding to the head rotation of the listener in the spherical harmonic domain may be kept by the number of the plurality of directions g.sub.j. Hereinafter, such a technique will be referred to as a second proposed technique of the present technology.

The rotation matrix R’(g.sub.j) of each direction g.sub.j is different from the matrix H’(.omega.) and has no time-frequency dependence. Therefore, it is possible to greatly reduce the memory amount as compared with making the matrix H’(.omega.) hold the component of the direction g.sub.j of the rotation of the head.

First, as shown in the following Expression (19), consider a product H’(g.sub.j.sup.-1, .omega.) of a row H(g.sub.j.sup.-1x, .omega.) corresponding to the predetermined direction g.sub.j of the matrix H(.omega.) and the matrix Y(x) of the spherical harmonics. [Expression 19] H’(g.sub.j.sup.-1,.omega.)=H(g.sup.-1x,.omega.)Y(x) (19)

In the aforementioned first proposed technique, the coordinates of the head-related transfer function used are rotated from x to g.sub.j.sup.-1x for the direction g.sub.j of the rotation of the head of the listener. However, the same result can be obtained without changing the coordinates of the position x of the head-related transfer function and by rotating the coordinates of the spherical harmonics from x to g.sub.jx. That is, the following Expression (20) is established. [Expression 20] H’(g.sub.j.sup.-1,.omega.)=H(g.sub.j.sup.-1x,.omega.)Y(x)=H(x,.omega.)Y(g- .sub.jx) (20)

Moreover, the matrix Y(g.sub.jx) of the spherical harmonics is the product of the matrix Y(x) and the rotation matrix R’(g.sub.j.sup.-1) and is as shown by the following Expression (21). Note that the rotation matrix R’(g.sub.j.sup.-1) is a matrix which rotates the coordinates by g.sub.j in the spherical harmonic domain. [Expression 21] Y(g.sub.jx)=Y(x)R’(g.sub.j.sup.-1) (21)

Herein, as for k and m belonging to the set Q shown in the following Expression (22), elements other than the elements in the k rows and m columns of the rotation matrix R’(g.sub.j) are zero. [Expression 22] Q={q|n.sup.2+1.ltoreq.q.ltoreq.(n+1).sup.2,q,n.di-elect cons.{0,1,2 … }} (22)

Therefore, the spherical harmonics Y.sub.n.sup.m(g.sub.jx), which is an element of the matrix Y(g.sub.jx)), can be expressed by the following Expression (23) using an element R’.sup.(n).sub.k, m(g.sub.j) of the k rows and m columns of the rotation matrix R’(g.sub.j).

.times..times. ##EQU00012## .function..times..times..times..function..times. ‘.times..function. ##EQU00012.2##

Herein, the element R’.sup.(n).sub.k, m(g.sub.j) is expressed by the following Expression (24). [Expression 24] R’.sub.k,m.sup.(n)(g.sub.j)=e.sup.-jm.PHI.r.sub.k,m.sup.(n)(.theta.)e.sup- .-jk.psi. (24)

Note that, in Expression (24), .theta., .phi., and .psi. are the rotation angles of the Euler angle of the rotation matrix, and r.sup.(n).sub.k,m(.theta.) is as shown in the following Expression (25).

.times..times. ##EQU00013## .function..theta..times..times..times..sigma..times..sigma..times..sigma.- .times..sigma..times..times..theta..times..sigma..times..times..theta..tim- es..times..sigma. ##EQU00013.2##

From the above, the binaural reproducing signal reflecting the rotation of the head of the listener by using the rotation matrix R’(g.sub.j.sup.-1), for example, the drive signal P.sub.l(g.sub.j, .omega.) of the left headphone can be obtained by calculating the following Expression (26). In addition, in a case where the left and right head-related transfer functions may be considered to be symmetric, by performing inversion using a matrix R.sub.ref making either the input signal D’(.omega.) or the matrix Hs(.omega.) of the left head-related transfer function flip horizontal as the pre-processing of Expression (26), it is possible to obtain the right headphone drive signal by only keeping the matrix Hs(.omega.) of the left head-related transfer function. However, a case where different left and right head-related transfer functions are necessary will be basically described hereinafter.

.times..times. ##EQU00014## .function..omega..function..times..omega..times..function..times.’.functi- on..omega..function..omega..times..function..times.’.function..times.’.fun- ction..omega..function..omega..times.’.function..times.’.function..omega. ##EQU00014.2##

In Expression (26), the drive signal P.sub.l(g.sub.j, .omega.) is obtained by synthesizing the matrix H.sub.s(.omega.), which is the vector, the rotation matrix R’(g.sub.j.sup.-1), and the vector D’(.omega.).

The calculation as described above is, for example, the calculation shown in FIG. 10. That is, the vector P.sub.l(.omega.) including the drive signal P.sub.l(g.sub.j, .omega.) of the left headphone is obtained by the product of the matrix H(.omega.) of M.times.L, the matrix Y(x) of L.times.K, and the vector D’(.omega.) of K.times.1 as indicated by the arrow A41 in FIG. 10. This matrix operation is as shown in the aforementioned Expression (12).

This operation is expressed by using the matrix Y(g.sub.jx) of the spherical harmonics prepared for each of M number of directions g.sub.j as indicated by the arrow A42. That is, the vector P.sub.l(.omega.) including the drive signal P.sub.l(g.sub.j, .omega.) corresponding to each of M number of directions g.sub.j is obtained by the product of the predetermined row H(x, .omega.) of the matrix H(.omega.), the matrix Y(g.sub.jx), and the vector D’(.omega.) from the relationship shown in Expression (20).

Herein, the row H(x, .omega.), which is the vector, is 1.times.L, the matrix Y(g.sub.jx) is L.times.K, and the vector D’(.omega.) is K.times.1. This is further transformed by using the relationships shown in Expressions (17) and (21) and is as indicated by the arrow A43. That is, as shown in Expression (26), the vector P.sub.l(.omega.) is obtained by the product of the matrix H.sub.s(.omega.) of 1.times.K, the rotation matrix R’(g.sub.j.sup.-1) of K.times.K of each of M number of directions g.sub.j, and the vector D’(.omega.) of K.times.1.

Note that, in FIG. 10, the hatched portions of the rotation matrix R’(g.sub.j.sup.-1) are nonzero elements of the rotation matrix R’(g.sub.j.sup.-1).

In addition, the operation amount and the required memory amount in such a second proposed technique are as shown in FIG. 11.

That is, suppose that, as shown in FIG. 11, the matrix H.sub.s(.omega.) of 1.times.K is prepared for each time-frequency bin .omega., the rotation matrix R’(g.sub.j.sup.-1) of K.times.K is prepared for M number of directions g.sub.j, and the vector D’(.omega.) is K.times.1. In addition, suppose that the number of time-frequency bins .omega. is W, and the maximum value of the degree n of the spherical harmonics, that is, the maximum degree is J.

At this time, since the number of nonzero elements of the rotation matrix R’(g.sub.j.sup.-1) is (J+1) (2J+1) (2J+3)/3, the total calc/W of the number of product-sum operations per time-frequency bin .omega. in the second proposed technique is as shown in the following Expression (27).

.times..times. ##EQU00015## .times..times..times..times..times..times..times. ##EQU00015.2##

In addition, for the operation by the second proposed technique, it is necessary to keep the matrix H.sub.s(.omega.) of 1.times.K for each time-frequency bin .omega. for the left and right ears, and further, it is necessary to keep nonzero elements of the rotation matrix R’(g.sub.j.sup.-1) for each of M number of directions. Therefore, the memory amount memory necessary for the operation by the second proposed technique is as shown in the following Expression (28).

.times..times. ##EQU00016## .times..times..times..times..times..times..times. ##EQU00016.2##

Herein, for example, suppose that the maximum degree of the spherical harmonics is J=4, then K=(J+1).sup.2=25. In addition, suppose that W=100 and M=1000.

In this case, the product-sum operation amount in the second proposed technique is calc/W=(4+1) (8+1) (8+3)/3+2.times.25=215. In addition, the memory amount memory necessary for the operation is 1000.times.(4+1) (8+1) (8+3)/3+2.times.25.times.100=170000.

On the other hand, in the aforementioned first proposed technique, the product-sum operation amount under the same condition is calc/W=50, and the memory amount is memory=5000000.

Therefore, according to the second proposed technique, it can be seen that it is possible to greatly reduce the necessary memory amount although the operation amount slightly increases as compared with the aforementioned first proposed technique.

Next, a configuration example of an audio processing device, which computes the drive signals of the headphones by the second proposed technique, will be described. In such a case, the audio processing device is configured, for example, as shown in FIG. 12. Note that parts in FIG. 12 corresponding to those in FIG. 8 are denoted by the same reference signs, and the descriptions thereof will be omitted as appropriate.