Sony Patent | Signal processing apparatus and method, and program

Editor: 映维 | Category: Sony | January 29, 2021

Patent: Signal processing apparatus and method, and program

Publication Number: 20210029485

Publication Date: 20210128

Applicant: Sony

Abstract

The present technology relates to a signal processing apparatus and method, and a program that are capable of reproducing sound at an optional listening position with a high sense of reality. The signal processing apparatus includes a rendering unit that generates reproduction data of sound at an optional listening position in a target space on the basis of recording signals of microphones attached to a plurality of moving bodies in the target space. The present technology can be applied to a reproduction apparatus.

Claims

1. A signal processing apparatus, comprising a rendering unit that generates reproduction data of sound at an optional listening position in a target space on a basis of recording signals of microphones attached to a plurality of moving bodies in the target space.
2. The signal processing apparatus according to claim 1, wherein the rendering unit selects one or a plurality of the recording signals among the recording signals obtained for the respective moving bodies, and generates the reproduction data on a basis of the selected one or plurality of the recording signals.
3. The signal processing apparatus according to claim 2, wherein the rendering unit selects the recording signal to be used for generating the reproduction data on a basis of a priority of the recording signal.
4. The signal processing apparatus according to claim 3, further comprising a priority calculation unit that calculates the priority on a basis of at least one of a sound pressure of the recording signal, a result of interval detection of target sound or non-target sound with respect to the recording signal, a type of noise reduction processing performed on the recording signal, a position of the moving body in the target space, a direction in which the moving body faces, information related to motion of the moving body, the listening position, a listening direction in which a virtual listener at the listening position faces, information related to motion of the listener, or information indicating a specified sound source.
5. The signal processing apparatus according to claim 4, wherein the priority calculation unit calculates the priority such that the recording signal of the moving body closer to the listening position has a higher priority.
6. The signal processing apparatus according to claim 4, wherein the priority calculation unit calculates the priority such that the recording signal of the moving body having a smaller amount of movement has a higher priority.
7. The signal processing apparatus according to claim 4, wherein the priority calculation unit calculates the priority such that the recording signal having less noise has a higher priority, on a basis of the result of the interval detection or the type of the noise reduction processing.
8. The signal processing apparatus according to claim 4, wherein the priority calculation unit calculates the priority such that the recording signal not including the non-target sound has a higher priority on a basis of the result of the interval detection.
9. The signal processing apparatus according to claim 8, wherein the non-target sound is an utterance of a predetermined NG (no-good) word, a rubbing sound of clothing, a vibration sound, a contact sound, a wind noise, or a noise sound.
10. The signal processing apparatus according to claim 4, wherein the rendering unit generates the reproduction data by weighting and adding the selected one or plurality of the recording signals on a basis of at least one of the priority, the sound pressure of the recording signal, the result of the interval detection, the type of the noise reduction processing, the position of the moving body in the target space, the direction in which the moving body faces, the information related to the motion of the moving body, the listening position, the listening direction, the information related to the motion of the listener, or the information indicating the specified sound source.
11. The signal processing apparatus according to claim 10, wherein the rendering unit generates the reproduction data of the listening direction at the listening position.
12. A signal processing method, comprising generating, by a signal processing apparatus, reproduction data of sound at an optional listening position in a target space on a basis of recording signals of microphones attached to a plurality of moving bodies in the target space.
13. A program that causes a computer to execute processing comprising the step of generating reproduction data of sound at an optional listening position in a target space on a basis of recording signals of microphones attached to a plurality of moving bodies in the target space.

Description

TECHNICAL FIELD

[0001] The present technology relates to a signal processing apparatus and method, and a program, and more particularly, to a signal processing apparatus and method, and a program that are capable of reproducing sound at an optional listening position with a high sense of reality.

BACKGROUND ART

[0002] For example, when content associated with a space, such as a soccer match or a concert, is reproduced, reproduction with a high sense of reality can be achieved if the sound heard at an optional listening position in that space, that is, the sound field, can be reproduced.

[0003] Examples of techniques for recording sound over a wide field (space) in general include surround sound collection, in which microphones are disposed at a plurality of fixed positions in a concert hall or the like to perform recording, sound collection from a distance with a gun microphone, and application of beamforming to sound recorded by a microphone array.

[0004] Additionally, a system has been proposed in which, when a plurality of speakers is present in a space, the sound of each speaker is collected by a microphone and recorded in association with positional information of that speaker, to achieve sound image localization corresponding to a listening position in the space (for example, see Patent Literature 1).

[0005] Further, for sound field reproduction at a free viewpoint, such as an omnidirectional view, a bird view, or a walk-through view, known approaches include sound collection by a plurality of surround microphones installed at wide intervals and omnidirectional sound collection using a spherical microphone array in which a plurality of microphones is disposed in a spherical shape. For example, omnidirectional sound collection involves decomposition into, and reconstruction from, Ambisonics. The simplest approach is to collect sound using three microphones provided in a video camera or the like and obtain 5.1-channel surround sound.

CITATION LIST

Patent Literature

[0006] Patent Literature 1: WO 2015/162947

DISCLOSURE OF INVENTION

Technical Problem

[0007] However, with the above-mentioned techniques it has been difficult to reproduce sound at an optional listening position in a space with a high sense of reality.

[0008] For example, in the techniques for recording sound over a general wide field, the distance from a sound source to the sound collection position may be large. In such a case, sound quality deteriorates because of the limited signal-to-noise ratio (SN ratio) performance of the microphone itself, which decreases the sense of reality. In addition, when the distance from the sound source to the sound collection position is large, the loss of clarity caused by reverberation is in some cases not negligible. Although reverberation removal techniques for eliminating reverberation components from recorded sound are also known, they are limited in how far they can eliminate those components.
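As a rough illustration of why distance limits the SN ratio, the sketch below assumes a point source in a free field, where the level falls by 20·log10(r/r_ref) dB (about 6 dB per doubling of distance), and compares that level with a fixed microphone self-noise floor. The function names and numbers are hypothetical, and the model ignores ambient noise and the reverberation discussed above.

```python
import math


def spl_at_distance(spl_ref_db: float, r_ref_m: float, r_m: float) -> float:
    """Free-field level (dB SPL) of a point source at r_m, given its level at r_ref_m.

    Spherical spreading: the level drops by 20*log10(r_m / r_ref_m) dB.
    """
    return spl_ref_db - 20.0 * math.log10(r_m / r_ref_m)


def snr_db(spl_ref_db: float, r_ref_m: float, r_m: float, self_noise_db: float) -> float:
    """SN ratio against a fixed microphone self-noise floor expressed in dB SPL."""
    return spl_at_distance(spl_ref_db, r_ref_m, r_m) - self_noise_db


if __name__ == "__main__":
    # Hypothetical values: a voice at 70 dB SPL measured at 1 m,
    # and a microphone self-noise floor of 15 dB SPL.
    for r in (1.0, 10.0, 50.0):
        print(f"{r:5.1f} m -> SNR {snr_db(70.0, 1.0, r, 15.0):.1f} dB")
```

With these assumed numbers the SN ratio falls from 55 dB at 1 m to 35 dB at 10 m, a loss that a microphone worn close to the sound source largely avoids.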

[0009] Additionally, when a recording engineer manually re-aims a microphone to follow the movement of a sound source, there is a limit to how accurately a person can rotate the microphone to change its sound collection direction. This makes it difficult to achieve sound reproduction with a high sense of reality.

[0010] Further, when beamforming is applied to the sound recorded by a microphone array, its ability to track a moving sound source is limited. This makes it difficult to achieve sound reproduction with a high sense of reality.

[0011] Moreover, in this case, to emphasize a sound source in a predetermined direction by bringing its components into phase through beamforming, the microphone aperture must be made as large as possible in the low frequency range, and the apparatus therefore becomes extremely large. In addition, in a case where beamforming is performed, calibration becomes more complicated as the number of microphones increases, and in practice only a sound source in a fixed direction can be emphasized.
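The phase alignment that beamforming relies on can be sketched as a delay-and-sum beamformer for a uniform linear array: each microphone signal is delayed so that a far-field plane wave from the chosen direction adds up in phase. This is a generic textbook technique shown for illustration, not the specific apparatus of the prior art; the speed of sound, the integer-sample delays, and the function names are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at about 20 degrees C


def steering_delays(num_mics: int, spacing_m: float, angle_deg: float) -> list[float]:
    """Per-microphone delays (s) that align a far-field plane wave from angle_deg.

    angle_deg is measured from broadside; mic m sits at x = m * spacing_m.
    A mic further toward the source hears the wave earlier by x*sin(theta)/c,
    so it is delayed by that amount. Delays are shifted so the smallest is 0.
    """
    theta = math.radians(angle_deg)
    raw = [m * spacing_m * math.sin(theta) / SPEED_OF_SOUND for m in range(num_mics)]
    offset = min(raw)
    return [d - offset for d in raw]


def delay_and_sum(signals: list[list[float]], delays_s: list[float], fs: float) -> list[float]:
    """Delay each signal by its (rounded) sample delay and average the results."""
    shifts = [round(d * fs) for d in delays_s]
    n = len(signals[0])
    out = [0.0] * n
    for sig, k in zip(signals, shifts):
        for i in range(n):
            j = i - k  # output sample i takes input sample i - k (a delay of k samples)
            if 0 <= j < n:
                out[i] += sig[j]
    return [v / len(signals) for v in out]
```

Because an array discriminates direction only when its aperture is comparable to the wavelength, and the wavelength at 100 Hz is about c/f ≈ 3.4 m, low-frequency emphasis calls for a physically large array, which is the enlargement problem described above.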

[0012] Additionally, the technique described in Patent Literature 1 does not assume that a speaker moves, so for content in which a sound source moves, sound cannot be reproduced with a sufficiently high sense of reality.

[0013] Further, in sound field reproduction at a free viewpoint as well, it is difficult to record the sound of a distant sound source because of the limited SN ratio performance of the microphone, as in the above-mentioned case of sound recording for a general wide field. It has therefore been difficult to reproduce sound at an optional listening position with a high sense of reality.

[0014] The present technology has been made in view of such circumstances and allows sound at an optional listening position in a space to be reproduced with a high sense of reality.

Solution to Problem

[0015] A signal processing apparatus according to one aspect of the present technology includes a rendering unit that generates reproduction data of sound at an optional listening position in a target space on the basis of recording signals of microphones attached to a plurality of moving bodies in the target space.

[0016] A signal processing method or a program according to one aspect of the present technology includes the step of generating reproduction data of sound at an optional listening position in a target space on the basis of recording signals of microphones attached to a plurality of moving bodies in the target space.

[0017] In one aspect of the present technology, reproduction data of sound at an optional listening position in the target space is generated on the basis of the recording signals of the microphones attached to the plurality of moving bodies in the target space.

Advantageous Effects of Invention

[0018] According to one aspect of the present technology, the sound at the optional listening position in the space can be reproduced with a high sense of reality.

[0019] Note that the effects described herein are not necessarily limitative, and any of the effects described in the present disclosure may be provided.

BRIEF DESCRIPTION OF DRAWINGS

[0020] FIG. 1 is a diagram showing a configuration example of a sound field reproduction system.

[0021] FIG. 2 is a diagram showing a configuration example of a recording apparatus.

[0022] FIG. 3 is a diagram showing a configuration example of a recording apparatus.

[0023] FIG. 4 is a diagram showing a configuration example of a signal processing unit.

[0024] FIG. 5 is a diagram showing a configuration example of a reproduction apparatus.

[0025] FIG. 6 is a diagram showing a configuration example of a signal processing unit.

[0026] FIG. 7 is a diagram showing a configuration example of a reproduction apparatus.

[0027] FIG. 8 is a flowchart for describing recording processing.

[0028] FIG. 9 is a flowchart for describing reproduction processing.

[0029] FIG. 10 is a flowchart for describing recording processing.

[0030] FIG. 11 is a flowchart for describing reproduction processing.

[0031] FIG. 12 is a diagram showing a configuration example of a sound field reproduction system.

[0032] FIG. 13 is a diagram showing a configuration example of a recording apparatus.

[0033] FIG. 14 is a diagram showing a configuration example of a computer.

MODE(S) FOR CARRYING OUT THE INVENTION

[0034] Hereinafter, embodiments to which the present technology is applied will be described with reference to the drawings.

FIRST EMBODIMENT

[0036] In the present technology, microphones and ranging devices are attached to a plurality of moving bodies in a target space, information regarding the sound, position, direction, and movement (motion) of each moving body is acquired, and the acquired pieces of information are combined on the reproduction side, whereby sound at an optional position in the space serving as a listening position is reproduced in a pseudo manner. In particular, the present technology allows the sound (sound field) that would be heard by a virtual listener at an optional listening position facing in an optional direction to be reproduced in a pseudo manner.

[0037] The present technology can be applied to, for example, a sound field reproduction system such as a virtual reality (VR) free viewpoint service that records sound (sound field) at each position in a space and reproduces sound at an optional listening position in the space in a pseudo manner on the basis of the recorded sound.

[0038] Specifically, in the sound field reproduction system to which the present technology is applied, a plurality of microphones or microphone arrays dispersed in the space for sound field recording is used to record sound at a plurality of positions in the space.

[0039] Here, at least some of the microphones or microphone arrays for sound collection are attached to a moving body that moves in the space.

[0040] Note that in the following description, for simplicity, it is assumed that sound collection at one position in the space is performed by one microphone array and that the microphone array is attached to a moving body. Further, hereinafter, the recording signal, that is, the signal of the sound collected (recorded) by the microphone array attached to a moving body, will also be referred to as an object.

[0041] Each moving body is fitted not only with the microphone array for sound collection but also with a ranging device, such as a global positioning system (GPS) receiver or a 9-axis sensor, so that moving body position information, moving body orientation information, and sound collection position movement information about the moving body are also acquired.

[0042] Here, the moving body position information is information indicating the position of the moving body in a space, and the moving body orientation information is information indicating a direction in which the moving body faces in the space, more particularly, a direction in which the microphone array attached to the moving body faces. For example, the moving body orientation information is an azimuth angle indicating a direction in which the moving body faces when a predetermined direction in the space is set as a reference.

[0043] In addition, the sound collection position movement information is information regarding the motion (movement) of the moving body, such as a movement speed of the moving body or an acceleration at the time of movement. Hereinafter, information including the moving body position information, the moving body orientation information, and the sound collection position movement information will also be referred to as moving body-related information.
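The moving body-related information defined above can be modeled as a small record per moving body, bundled with the object. The field names, units, single-channel object, and azimuth convention (degrees counterclockwise from the +x axis of the space, standing in for the "predetermined direction" used as the reference) are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class MovingBodyRelatedInfo:
    # Moving body position information: position in the target space (m).
    position: tuple[float, float]
    # Moving body orientation information: azimuth (deg) of the facing direction,
    # counterclockwise from the +x axis (assumed reference direction).
    azimuth_deg: float
    # Sound collection position movement information.
    speed_mps: float
    acceleration_mps2: float

    def facing_vector(self) -> tuple[float, float]:
        """Unit vector in the direction the moving body (its microphone array) faces."""
        a = math.radians(self.azimuth_deg)
        return (math.cos(a), math.sin(a))


@dataclass
class ObjectTransmissionData:
    moving_body_id: int            # hypothetical identifier for the moving body
    samples: list[float]           # the object (one channel here, for simplicity)
    info: MovingBodyRelatedInfo    # moving body-related information
```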

[0044] When the object and the moving body-related information are acquired for each moving body, object transmission data including the object and the moving body-related information is generated and transmitted to the reproduction side. On the reproduction side, signal processing or rendering is performed as appropriate on the basis of the received object transmission data, and reproduction data is generated.

[0045] In the rendering, audio data in a predetermined format, such as one with the number of channels specified by a user (listener), is generated as reproduction data. The reproduction data is audio data for reproducing the sound that would be heard by a virtual listener located at an optional listening position in the space and facing in an optional listening direction at that position.

[0046] For example, rendering and reproduction of a recording signal of a stationary microphone, including a microphone attached to a stationary object, is generally known. It is also generally known to render an object prepared for each sound source type as processing on the reproduction side.

[0047] The present technology differs from the rendering and reproduction of recorded signals of these stationary microphones or the rendering for each sound source type, in particular, in that a microphone array is attached to a moving body to collect (record) sound of an object and acquire the moving body-related information.

[0048] In such a manner, it is possible to synthesize a sound field by combining the objects and the pieces of moving body-related information obtained in respective moving bodies.

[0049] Additionally, in the rendering, a priority corresponding to the situation is calculated for each of the objects obtained by the plurality of moving bodies, and the reproduction data can be generated using objects having a higher priority, so that sound at an optional listening position can be reproduced with a higher sense of reality.

[0050] Note that while the generation of the reproduction data based on the priority will be described later, for example, it is conceivable to select an object of a moving body close to the listening position to generate reproduction data, or select an object of a moving body having a small amount of movement to generate reproduction data. For example, in the case of a moving body having a small amount of movement, an object having a small amount of noise caused by vibrations or the like of the moving body, that is, an object having a high signal-to-noise ratio (SN ratio) can be obtained, so that it is possible to obtain high-quality reproduction data.
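The two selection criteria above, proximity to the listening position and a small amount of movement, can be combined into a single score per object. The multiplicative form and the weights below are hypothetical; the document states only the direction of each effect (closer and steadier means higher priority).

```python
import math


def priority(body_pos, speed_mps, listening_pos, w_dist=1.0, w_motion=1.0):
    """Hypothetical priority: larger when the moving body is closer to the
    listening position and when it moves less. Form and weights are assumptions."""
    d = math.dist(body_pos, listening_pos)
    return 1.0 / (1.0 + w_dist * d) / (1.0 + w_motion * speed_mps)


def select_objects(bodies, listening_pos, k):
    """Return the ids of the k objects with the highest priority.

    bodies: list of dicts with keys "id", "pos" (x, y) and "speed" (m/s).
    """
    ranked = sorted(
        bodies,
        key=lambda b: priority(b["pos"], b["speed"], listening_pos),
        reverse=True,
    )
    return [b["id"] for b in ranked[:k]]
```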

[0051] Further, as an example of a moving body to which a microphone array and a ranging device are attached, a player in a sport such as soccer is conceivable. Additionally, as specific targets of the sound collection (recording), that is, content accompanied by sound, the following targets (1) to (4), for example, are conceivable.

[0052] Target (1)

[0053] Recording of team sports

[0054] Target (2)

[0055] Recording for a space where performances such as musicals, operas, and theatrical performances are performed

[0056] Target (3)

[0057] Recording for an optional space in live venues or theme parks

[0058] Target (4)

[0059] Recording for bands such as orchestras and marching bands

[0060] For example, for target (1) above, a player may be regarded as a moving body, and a microphone array and a ranging device may be attached to the player. Similarly, for targets (2) to (4), performers or audience members may be regarded as moving bodies, and microphone arrays and ranging devices may be attached to them. Additionally, for target (3), for example, recording may be performed at a plurality of locations.

[0061] Hereinafter, more specific embodiments of the present technology will be described.

[0062] FIG. 1 is a diagram showing a configuration example of an embodiment of a sound field reproduction system to which the present technology is applied.

[0063] The sound field reproduction system shown in FIG. 1 records sound at each position in a target space, sets an optional position in the space as a listening position, and reproduces the sound (sound field) that would be heard by a virtual listener facing in an optional direction at that listening position.

[0064] Note that, hereinafter, a space in which sound is to be recorded is also referred to as a recording target space, and a direction in which a virtual listener at a listening position faces is also referred to as a listening direction.

[0065] The sound field reproduction system of FIG. 1 includes recording apparatuses 11-1 to 11-5 and a reproduction apparatus 12.

[0066] The recording apparatuses 11-1 to 11-5 each include a microphone array and a ranging device and are each attached to a moving body in the recording target space. The recording apparatuses 11-1 to 11-5 are therefore disposed discretely in the recording target space.

[0067] Each of the recording apparatuses 11-1 to 11-5 records an object and acquires moving body-related information for the moving body to which it is attached, and generates object transmission data including the object and the moving body-related information.

[0068] The recording apparatus 11-1 to the recording apparatus 11-5 each transmit the generated object transmission data to the reproduction apparatus 12 by wireless communication.

[0069] Note that if the recording apparatus 11-1 to the recording apparatus 11-5 do not need to be distinguished from one another hereinafter, the recording apparatus 11-1 to the recording apparatus 11-5 will be simply referred to as recording apparatuses 11. Additionally, an example in which the recording of objects (recording of sound) at the positions of the respective moving bodies is performed by the five recording apparatuses 11 in the recording target space will be described here, but the number of recording apparatuses 11 may be any number.

[0070] The reproduction apparatus 12 receives the object transmission data transmitted from each recording apparatus 11, and generates reproduction data for a specified listening position and a specified listening direction on the basis of the object and the moving body-related information acquired for each moving body. The reproduction apparatus 12 then reproduces the sound in the listening direction at the listening position on the basis of the generated reproduction data. Thus, content is reproduced for a listening position and a listening direction that can be set to any position and direction in the recording target space.
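To make "listening position and listening direction" concrete, the sketch below computes where a moving body lies relative to a virtual listener and converts that into two-channel gains with a sine/cosine (constant-power) pan law. Constant-power panning is a stand-in chosen for illustration; the document does not state that the reproduction apparatus 12 renders this way, and the angle convention and function names are assumptions.

```python
import math


def relative_azimuth_deg(listener_pos, listening_dir_deg, source_pos):
    """Azimuth of source_pos seen from a listener at listener_pos facing listening_dir_deg.

    Angles are in degrees, counterclockwise from the +x axis of the space (assumed
    convention). The result lies in [-180, 180); positive values are to the listener's left.
    """
    dx = source_pos[0] - listener_pos[0]
    dy = source_pos[1] - listener_pos[1]
    a = math.degrees(math.atan2(dy, dx)) - listening_dir_deg
    return (a + 180.0) % 360.0 - 180.0


def constant_power_pan(rel_az_deg):
    """Return (left_gain, right_gain) with left^2 + right^2 == 1.

    Azimuths beyond +/-90 degrees are clamped, so this simple two-channel sketch
    does not distinguish sources in front from sources behind.
    """
    az = max(-90.0, min(90.0, rel_az_deg))
    p = (az + 90.0) / 180.0 * (math.pi / 2)  # 0 -> full right, pi/2 -> full left
    return math.sin(p), math.cos(p)


if __name__ == "__main__":
    # Hypothetical scene: listener at the origin facing +x, one player ahead-left.
    az = relative_azimuth_deg((0.0, 0.0), 0.0, (5.0, 5.0))
    left, right = constant_power_pan(az)
    print(f"azimuth {az:.1f} deg, gains L={left:.3f} R={right:.3f}")
```

Each object could then be scaled by its gains before mixing; a fuller renderer would also account for distance, propagation delay, and elevation.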

[0071] For example, in a case where the sound recording target is a sport, the field or the like on which the sport is played is set as the recording target space, each player is set as a moving body, and a recording apparatus 11 is attached to each player.

[0072] Specifically, a recording apparatus 11 is attached to each player in a team sport played on a wide field, such as soccer, American football, rugby, or hockey, or in a competitive sport held in a wide environment, such as a marathon.

[0073] The recording apparatus 11 includes a small microphone array, a ranging device, and a wireless transmission function. Additionally, in a case where the recording apparatus 11 includes storage, the object transmission data can be read from the storage after the end of the game or competition and supplied to the reproduction apparatus 12.

[0074] For example, in the recording from a position far from the recording target space, such as recording using a gun microphone from the outside of a wide field, it is difficult to collect sound in the vicinity of players due to the SN ratio limit of the microphone, and the sound field cannot be reproduced with a high sense of reality.

[0075] Meanwhile, in the sound field reproduction system to which the present technology is applied, each player is set as a moving body and an object is recorded for each player. In particular, because a recording apparatus 11 is attached to each player, the player's voice, footsteps, ball-kicking sounds, and the like can be recorded at close range with a high SN ratio.

[0076] Therefore, by reproduction of the sound based on the reproduction data, a sound field that is heard by a listener facing in an optional direction (listening direction) at an optional viewpoint (listening position) in the area where the player exists can be artificially reproduced. This allows a sound field experience with a high sense of reality to be provided to a listener as if the listener were one of the players and were in the same field or the like with the players.

[0077] The object, which is the recorded sound acquired for one moving body, i.e., one player, is a mixture of not only the voice and movement sounds of that player but also the sounds and cheers of nearby players.

[0078] Additionally, since the players move within the recording target space over time, the positions of the players, the relative distances between the players, and the directions in which the players are facing constantly fluctuate.

[0079] For that reason, in the recording apparatus 11, time-series data of the moving body position information, the moving body orientation information, and the sound collection position movement information is obtained as moving body-related information about the player (moving body). Such time series data may be smoothed in the time direction as necessary.

[0080] The reproduction apparatus 12 calculates the priority of each object on the basis of the moving body-related information of each moving body thus obtained or the like, and generates reproduction data by, for example, weighting and adding a plurality of objects in accordance with the obtained priority.
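Weighting and adding objects in accordance with their priority can be sketched as a normalized weighted sum. This assumes the objects are single-channel, time-aligned, and of equal length, and that the weights are simply the priorities scaled to sum to one; none of these details is specified in the document.

```python
def mix_by_priority(objects, priorities):
    """Weighted sum of equal-length mono objects.

    Weights are the priorities normalized to sum to 1 (an assumed rule).
    """
    total = sum(priorities)
    if total <= 0:
        raise ValueError("priorities must have a positive sum")
    weights = [p / total for p in priorities]
    n = len(objects[0])
    return [sum(w * obj[i] for w, obj in zip(weights, objects)) for i in range(n)]
```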

[0081] The reproduction data obtained in such a manner is audio data for reproducing in a pseudo manner the sound field that would be heard by a listener facing in an optional listening direction at an optional listening position.

[0082] Note that when the recording apparatus 11, more specifically, its microphone array, is attached to a player serving as a moving body, binaural sound collection is performed if microphones are attached at the positions of both of the player's ears. However, even when the microphones are attached somewhere other than the player's ears, the recording apparatus 11 can record the sound field with substantially the same volume balance and sense of localization of each sound source as the player hears them.

[0083] Additionally, in the sound field reproduction system, a wide space is set as a recording target space, and a sound field is recorded at each of a plurality of positions. That is, sound field recording is performed by a plurality of recording apparatuses 11 located at respective positions in the recording target space.

[0084] Normally, in sound field recording of the recording target space with a single integrated microphone array or the like, if the microphone array comes into contact with another object, noise caused by the contact is mixed into the recording signals of all the microphones constituting the microphone array.

[0085] Similarly, in the sound field reproduction system, for example, if there is contact between players, it is highly likely that noise due to vibrations of the contact is mixed into the objects obtained by the recording apparatuses 11 attached to those players.

[0086] However, in the sound field reproduction system, since the sound field recording is performed by the plurality of recording apparatuses 11, even at the timing when there is contact between players, there is a high possibility that noise due to vibrations of the contact between the players is not mixed into the objects obtained by the recording apparatuses 11 attached to other non-contact players. Thus, in the recording apparatus 11 attached to a player without contact, a high-quality object without contamination of noise sound can be obtained.

[0087] In the sound field reproduction system as described above, attaching the recording apparatuses 11 to a plurality of moving bodies distributes the risk of noise contamination when important target sound is to be recorded. Selecting and using the object in the best state, that is, the object containing the target sound at the best quality, from among the objects obtained by the plurality of recording apparatuses 11 allows sound to be reproduced with high quality and a high sense of reality.
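Selecting the object in the best state can be illustrated per time frame: among the objects whose interval detection reports no non-target sound (for example, contact noise) in that frame, take the one with the highest priority. The data layout and the fall-back rule when every object is flagged are hypothetical.

```python
def select_per_frame(priorities, noisy_flags):
    """Pick one object index per frame.

    priorities: per-object priority values (assumed constant over frames here).
    noisy_flags: noisy_flags[obj][frame] is True when interval detection found
        non-target sound in that frame.
    Returns the highest-priority clean object for each frame, or the
    highest-priority object overall if every object is flagged in that frame.
    """
    num_frames = len(noisy_flags[0])
    order = sorted(range(len(priorities)), key=lambda i: priorities[i], reverse=True)
    chosen = []
    for f in range(num_frames):
        clean = [i for i in order if not noisy_flags[i][f]]
        chosen.append(clean[0] if clean else order[0])
    return chosen
```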

[0088] Further, in the sound field reproduction system, reproduction data of an optional listening position and listening direction is generated on the basis of the objects obtained by the recording apparatuses 11 discretely disposed in the recording target space. The reproduction data does not reproduce a completely physically correct sound field. However, in the sound field reproduction system, it is possible to appropriately reproduce a sound field of an optional listening position and listening direction in accordance with various circumstances in consideration of a priority, a listening position, a listening direction, a position and a direction of a moving body, and the like. In other words, in the sound field reproduction system, since the reproduction data is generated from the objects obtained by the recording apparatuses 11 discretely disposed, a sound field with a high sense of reality can be reproduced with a relatively high degree of freedom.

[0090] Next, specific configuration examples of the recording apparatus 11 and the reproduction apparatus 12 shown in FIG. 1 will be described. First, a configuration example of the recording apparatus 11 will be described.

[0091] The recording apparatus 11 is configured, for example, as shown in FIG. 2.

[0092] In the example shown in FIG. 2, the recording apparatus 11 includes a microphone array 41, a recording unit 42, a ranging device 43, an encoding unit 44, and an output unit 45.

[0093] The microphone array 41 collects ambient sound (sound field) around a moving body to which the recording apparatus 11 is attached, and supplies the resulting recording signal as an object to the recording unit 42.

[0094] The recording unit 42 performs analog-to-digital (AD) conversion or amplification processing on the object supplied from the microphone array 41, and supplies the obtained object to the encoding unit 44.
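The amplification and AD conversion step can be modeled digitally as applying a gain in decibels to a normalized signal and quantizing it to signed 16-bit integers with clipping. This is a simplified software model for illustration only; in the recording unit 42 the conversion itself would be done by hardware, and the bit depth and function name are assumptions.

```python
def amplify_and_quantize(samples, gain_db, bits=16):
    """Apply gain (dB) to samples normalized to [-1, 1] and quantize to signed integers.

    Values are rounded to the nearest integer step and clipped to the
    representable range [-2**(bits-1), 2**(bits-1) - 1].
    """
    g = 10 ** (gain_db / 20)
    full_scale = 2 ** (bits - 1) - 1
    out = []
    for s in samples:
        v = round(s * g * full_scale)
        out.append(max(-full_scale - 1, min(full_scale, v)))
    return out
```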

[0095] The ranging device 43 includes, for example, a position measuring sensor such as a GPS receiver and a 9-axis sensor for measuring the movement speed and acceleration of the recording apparatus 11, i.e., of the moving body, and the direction (orientation) in which the moving body faces, or the like.

[0096] The ranging device 43 measures, for the moving body to which the recording apparatus 11 is attached, moving body position information indicating a position of the moving body, moving body orientation information indicating a direction in which the moving body faces, i.e., an orientation of the moving body, and sound collection position movement information indicating a movement speed of the moving body and an acceleration at the time of movement, and supplies the measurement result to the encoding unit 44.

[0097] Note that the ranging device 43 may include a camera, an acceleration sensor, and the like. For example, in a case where the ranging device 43 includes a camera, the moving body position information, the moving body orientation information, and the sound collection position movement information can also be obtained from a video (image) captured by that camera.
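When positions are available as a time series, for example from GPS or from the camera video mentioned above, the movement speed and acceleration could be estimated by finite differences, as sketched below. This is an assumed alternative to reading them from a 9-axis sensor; no noise filtering is applied, and the function name is hypothetical.

```python
import math


def speed_and_acceleration(times_s, positions):
    """Estimate speed (m/s) and rate of change of speed (m/s^2) from timestamped positions.

    speeds[j] is the average speed between samples j and j+1 (len - 1 values).
    accels[j-1] is the change between speeds[j-1] and speeds[j], divided by the
    spacing of their interval midpoints (len - 2 values).
    """
    speeds = []
    for k in range(1, len(times_s)):
        dt = times_s[k] - times_s[k - 1]
        speeds.append(math.dist(positions[k], positions[k - 1]) / dt)
    accels = []
    for k in range(1, len(speeds)):
        dt = (times_s[k + 1] - times_s[k - 1]) / 2  # midpoint-to-midpoint spacing
        accels.append((speeds[k] - speeds[k - 1]) / dt)
    return speeds, accels
```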

[0098] The encoding unit 44 encodes the object supplied from the recording unit 42 and moving body-related information including the moving body position information, the moving body orientation information, and the sound collection position movement information supplied from the ranging device 43, and generates object transmission data.

[0099] In other words, the encoding unit 44 packs the object and the moving body-related information and generates the object transmission data.

[0100] Note that when the object transmission data is generated, the object and the moving body-related information may be compression-encoded, or may be stored as-is in a packet of the object transmission data or the like.
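Paragraphs [0099] and [0100] leave the packet layout open. The following is a minimal sketch that assumes a hypothetical fixed little-endian layout: position, yaw angle, speed, and acceleration as 32-bit floats, then a sample count, then 16-bit PCM samples. It shows how packing in the encoding unit 44 and the matching unpacking in the decoding unit 132 could look in Python. `MovingBodyInfo`, `pack_object`, and `unpack_object` are illustrative names only.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout (not specified in the patent), little-endian:
# x, y, z position (m), yaw orientation (deg), speed (m/s), acceleration (m/s^2)
# as float32, then the sample count as uint32, then the int16 PCM samples.
HEADER = struct.Struct("<6fI")


@dataclass
class MovingBodyInfo:
    position: tuple      # (x, y, z) in metres
    yaw_deg: float       # direction the moving body faces
    speed: float         # movement speed
    acceleration: float  # acceleration at the time of movement


def pack_object(info, samples):
    """Pack moving body-related information and an int16 object into bytes."""
    header = HEADER.pack(*info.position, info.yaw_deg, info.speed,
                         info.acceleration, len(samples))
    return header + struct.pack(f"<{len(samples)}h", *samples)


def unpack_object(data):
    """Inverse of pack_object: return (MovingBodyInfo, list of samples)."""
    x, y, z, yaw, speed, acc, n = HEADER.unpack_from(data, 0)
    samples = list(struct.unpack_from(f"<{n}h", data, HEADER.size))
    return MovingBodyInfo((x, y, z), yaw, speed, acc), samples
```

This corresponds to the "stored as-is" option. Compression encoding is not shown. Because the metadata is stored as float32, values that are not exactly representable come back slightly rounded.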

[0101] The encoding unit 44 supplies the object transmission data generated by encoding to the output unit 45.

[0102] The output unit 45 outputs the object transmission data supplied from the encoding unit 44.

[0103] For example, in a case where the output unit 45 has a wireless transmission function, the output unit 45 wirelessly transmits the object transmission data to the reproduction apparatus 12.

[0104] Additionally, for example, in a case where the recording apparatus 11 includes storage, i.e., a storage unit such as a non-volatile memory, the output unit 45 outputs the object transmission data to the storage unit and records the object transmission data in the storage unit. In this case, at an optional timing, the object transmission data recorded in the storage unit is directly or indirectly read by the reproduction apparatus 12.

[0105]

[0106] Additionally, in the recording apparatus 11, the object may be subjected to beamforming, which emphasizes the sound of a predetermined desired sound source, that is, target sound or the like, or subjected to noise reduction (NR) processing or the like.

[0107] In such a case, the recording apparatus 11 is configured as shown in FIG. 3, for example. Note that portions in FIG. 3 corresponding to those in FIG. 2 will be denoted by the same reference numerals, and description thereof will be omitted as appropriate.

[0108] The recording apparatus 11 shown in FIG. 3 includes a microphone array 41, a recording unit 42, a signal processing unit 71, a ranging device 43, an encoding unit 44, and an output unit 45.

[0109] The configuration of the recording apparatus 11 shown in FIG. 3 is a configuration in which the signal processing unit 71 is newly provided between the recording unit 42 and the encoding unit 44 of the recording apparatus 11 shown in FIG. 2.

[0110] The signal processing unit 71 performs beamforming or NR processing on the object supplied from the recording unit 42 by using the moving body-related information supplied from the ranging device 43 as necessary, and supplies the resulting object to the encoding unit 44.

[0111] Additionally, the signal processing unit 71 is configured as shown in FIG. 4, for example. That is, the signal processing unit 71 shown in FIG. 4 includes an interval detection unit 101, a beamforming unit 102, and an NR unit 103.

[0112] The interval detection unit 101 performs interval detection on the object supplied from the recording unit 42 by using the moving body-related information supplied from the ranging device 43 as necessary, and supplies the detection result to the beamforming unit 102 and the NR unit 103.

[0113] For example, the interval detection unit 101 includes a detector for a predetermined target sound and a detector for a predetermined non-target sound, and detects an interval of the target sound or the non-target sound in the object by an arithmetic operation based on the detectors.

[0114] The interval detection unit 101 then outputs, as a result of the interval detection, information indicating an interval in which each target sound or non-target sound in the object serving as a time signal is detected, i.e., information indicating an interval of the target sound or an interval of the non-target sound. In such a manner, in the interval detection, the presence or absence of the target sound or the non-target sound in each time interval of the object is detected.
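The patent does not say what the detectors are. Purely as a stand-in, the Python sketch below uses a frame-energy threshold as a "detector" for a loud target sound such as a ball kick. It returns the detected intervals as (start, end) sample indices, which matches the interval-style output described in [0114]. `detect_intervals`, `frame_len`, and `threshold` are hypothetical. A real system could instead use a trained classifier for each sound type.

```python
def detect_intervals(samples, frame_len, threshold):
    """Return (start, end) sample indices (end exclusive) of runs of frames
    whose mean energy exceeds threshold.

    This energy detector is a hypothetical stand-in for the patent's
    unspecified target/non-target sound detectors.
    """
    intervals = []
    start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        active = energy > threshold
        if active and start is None:
            start = i * frame_len                       # interval opens
        elif not active and start is not None:
            intervals.append((start, i * frame_len))    # interval closes
            start = None
    if start is not None:                               # still open at the end
        intervals.append((start, n_frames * frame_len))
    return intervals
```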

[0115] Here, the predetermined target sound is, for example, a ball sound such as the kick sound of a soccer ball, an utterance of a player serving as a moving body, a footstep (walking) sound of the player, or an action sound such as that of a gesture.

[0116] In contrast, the non-target sound is sound that is unfavorable as content sound or the like. Specifically, the non-target sound includes, for example, wind sound (wind noise), the rubbing sound of a player's clothing, certain vibration sounds, contact sound between the player and another player or an object, environmental sound such as cheers, utterances related to competition strategy or privacy, utterances of predetermined inappropriate words such as jeers, and other noise sounds (noises).

[0117] Additionally, when the interval is detected, the moving body-related information is used as necessary.

[0118] For example, by referring to the sound collection position movement information included in the moving body-related information, it is possible to determine whether the moving body is moving or stationary. Accordingly, for example, when the moving body is moving, the interval detection unit 101 detects a specific noise sound, or determines that the interval is an interval of the specific noise sound. Conversely, when the moving body is not moving, the interval detection unit 101 does not detect the specific noise sound, or determines that the interval is not an interval of the specific noise sound.
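The motion-dependent behavior in [0118] can be sketched as a per-frame gate. This version assumes the sound collection position movement information has already been turned into one speed value per frame. The function name and the 0.5 m/s threshold are hypothetical choices, not values from the patent.

```python
def gate_noise_by_motion(noise_flags, speeds, min_speed=0.5):
    """Keep a frame's 'specific noise' decision only while the moving body is moving.

    noise_flags: per-frame raw decisions of a (hypothetical) specific-noise detector.
    speeds: per-frame speed in m/s taken from the sound collection position
        movement information.
    min_speed: hypothetical threshold separating moving from stationary.
    """
    if len(noise_flags) != len(speeds):
        raise ValueError("noise_flags and speeds must have the same length")
    return [bool(flag) and speed >= min_speed
            for flag, speed in zip(noise_flags, speeds)]
```

Gating the detector output, rather than skipping the detector, keeps the sketch simple. The paragraph also allows skipping detection entirely while stationary, which would save computation.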

[0119] Additionally, for example, in a case where the amount of movement or the like of the moving body is included as a parameter of the detectors for detecting the target sound and the non-target sound, the interval detection unit 101 obtains the amount of movement or the like of the moving body from the time-series moving body position information, time-series sound collection position movement information, and the like, and performs an arithmetic operation based on the detectors by using the amount of movement or the like.
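One plausible way to derive the "amount of movement" from time-series moving body position information is to sum the distances between consecutive positions. The sketch below does this and also derives a mean speed. It assumes a fixed sampling interval `dt`. Both function names are hypothetical.

```python
import math


def movement_amount(positions):
    """Total distance travelled along a time series of (x, y, z) positions."""
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))


def average_speed(positions, dt):
    """Mean speed over the series, assuming positions are sampled every dt seconds."""
    if len(positions) < 2:
        return 0.0
    return movement_amount(positions) / (dt * (len(positions) - 1))
```

`math.dist` requires Python 3.8 or later.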

[0120] The beamforming unit 102 performs beamforming on the object supplied from the recording unit 42, by using the result of the interval detection supplied from the interval detection unit 101 and the moving body-related information supplied from the ranging device 43 as necessary.

[0121] That is, for example, the beamforming unit 102 suppresses (reduces) a predetermined directional noise or emphasizes sound arriving from a specific direction by multi-microphone beamforming on the basis of the moving body orientation information or the like serving as the moving body-related information.

[0122] Additionally, in the multi-microphone beamforming, for example, an excessively large target sound such as a loud voice of the player included in the object or an unnecessary non-target sound such as environmental sound can be suppressed by reversing the phases of the components of such sound on the basis of the result of the interval detection. In addition, in the multi-microphone beamforming, for example, necessary target sound such as a kick sound of a ball included in the object can be emphasized by making the phases thereof equal on the basis of the result of the interval detection.
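The patent does not name a specific beamformer. As an illustration of "emphasizing sound arriving from a specific direction", here is a classic delay-and-sum sketch for a linear array with integer sample delays. The array geometry, the far-field assumption, the 343 m/s speed of sound, and the function names are all assumptions made for this example. Suppression by phase reversal (null steering) is not shown.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at about 20 degrees C


def steering_delays(mic_x, angle_deg, fs):
    """Integer arrival delays (in samples) of a far-field source at each microphone.

    mic_x: microphone positions (m) along a linear array on the x axis (assumed).
    angle_deg: source direction measured from the +x axis; microphones farther
    toward the source hear it earlier. Delays are shifted to be non-negative.
    """
    u = math.cos(math.radians(angle_deg))
    taus = [-x * u / SPEED_OF_SOUND for x in mic_x]
    t0 = min(taus)
    return [round((t - t0) * fs) for t in taus]


def delay_and_sum(channels, delays):
    """Advance each channel by its arrival delay so the target lines up, then average."""
    n_out = min(len(ch) - d for ch, d in zip(channels, delays))
    m = len(channels)
    return [sum(ch[n + d] for ch, d in zip(channels, delays)) / m
            for n in range(max(n_out, 0))]
```

On a moving body, `angle_deg` would presumably be the desired world-frame direction minus the body's facing direction taken from the moving body orientation information. That use of the orientation data is an assumption, not something the patent states.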

[0123] The beamforming unit 102 supplies the object, which is obtained by emphasizing or suppressing a predetermined sound source component by beamforming, to the NR unit 103.

[0124] The NR unit 103 performs NR processing on the object supplied from the beamforming unit 102 on the basis of the result of the interval detection supplied from the interval detection unit 101, and supplies the resulting object to the encoding unit 44.

[0125] For example, in the NR processing, among the components included in the object, the components of non-target sound or the like such as a wind sound, a rubbing sound of clothing, a relatively steady and unnecessary environmental sound, and predetermined noises are suppressed.
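Practical NR usually works in the frequency domain, for example with spectral subtraction or Wiener filtering. The much simpler sketch below only shows how an interval detection result can drive suppression: it applies a fixed attenuation inside the detected non-target intervals. It is a simplified stand-in, not the patent's NR method. The function name and the -20 dB default are hypothetical.

```python
def attenuate_intervals(samples, intervals, gain_db=-20.0):
    """Apply a fixed attenuation inside detected non-target intervals.

    intervals: non-overlapping (start, end) sample indices, end exclusive,
        e.g. from an interval detector.
    gain_db: hypothetical attenuation applied inside those intervals.
    """
    gain = 10 ** (gain_db / 20)
    out = list(samples)
    for start, end in intervals:
        for n in range(max(0, start), min(end, len(out))):
            out[n] *= gain
    return out
```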

[0126]

[0127] Subsequently, a configuration example of the reproduction apparatus 12 shown in FIG. 1 will be described.

[0128] For example, the reproduction apparatus 12 is configured as shown in FIG. 5.

[0129] The reproduction apparatus 12 is a signal processing apparatus that generates reproduction data on the basis of the acquired object transmission data. The reproduction apparatus 12 shown in FIG. 5 includes an acquisition unit 131, a decoding unit 132, a signal processing unit 133, a reproduction unit 134, and a speaker 135.

[0130] The acquisition unit 131 acquires the object transmission data output from the recording apparatus 11, and supplies the object transmission data to the decoding unit 132. The acquisition unit 131 acquires the object transmission data from all the recording apparatuses 11 in the recording target space.

[0131] For example, when the object transmission data is transmitted wirelessly from the recording apparatus 11, the acquisition unit 131 receives the object transmission data transmitted from the recording apparatus 11, thus acquiring the object transmission data.

[0132] Additionally, for example, when the object transmission data is recorded in the storage of the recording apparatus 11, the acquisition unit 131 acquires the object transmission data by reading the object transmission data from the recording apparatus 11. Note that in a case where the object transmission data is output from the recording apparatus 11 to an external apparatus or the like and held in the external apparatus, the object transmission data may be acquired by reading the object transmission data from that apparatus or the like.

[0133] The decoding unit 132 decodes the object transmission data supplied from the acquisition unit 131 and supplies the resulting object and moving body-related information to the signal processing unit 133. In other words, the decoding unit 132 extracts the object and the moving body-related information by performing unpacking of the object transmission data and supplies the extracted object and moving body-related information to the signal processing unit 133.

[0134] The signal processing unit 133 performs beamforming or NR processing on the basis of the moving body-related information and the object supplied from the decoding unit 132, generates reproduction data in a predetermined format, and supplies the reproduction data to the reproduction unit 134.

[0135] The reproduction unit 134 performs digital-to-analog (DA) conversion or amplification processing on the reproduction data supplied from the signal processing unit 133, and supplies the resulting reproduction data to the speaker 135. On the basis of the reproduction data supplied from the reproduction unit 134, the speaker 135 reproduces a pseudo sound (simulated sound) of what would be heard at the listening position, facing the listening direction, in the recording target space.

[0136] Note that the speaker 135 may be a single speaker unit or may be a speaker array including a plurality of speaker units.

[0137] Additionally, although the case where the acquisition unit 131 through the speaker 135 are provided in a single apparatus is described here, some of the blocks constituting the reproduction apparatus 12, for example, the acquisition unit 131 through the signal processing unit 133, may be provided in another apparatus.

[0138] For example, the acquisition unit 131 to the signal processing unit 133 may be provided in a server on a network, and reproduction data may be supplied from the server to a reproduction apparatus including the reproduction unit 134 and the speaker 135. Alternatively, the speaker 135 may be provided outside the reproduction apparatus 12.

[0139] Further, the acquisition unit 131 to the signal processing unit 133 may be provided in a personal computer, a game machine, a portable device, or the like, or may be achieved by a cloud on the network.

[0140] Additionally, the signal processing unit 133 is configured, for example, as shown in FIG. 6.

[0141] The signal processing unit 133 shown in FIG. 6 includes a synchronization calculation unit 161, an interval detection unit 162, a beamforming unit 163, an NR unit 164, and a rendering unit 165.

[0142] The synchronization calculation unit 161 performs synchronization detection on the plurality of objects supplied from the decoding unit 132, synchronizes the objects of all the moving bodies on the basis of the detection result, and supplies the synchronized objects of the respective moving bodies to the interval detection unit 162 and the beamforming unit 163.

[0143] For example, in the synchronization detection, an offset between the microphone arrays 41 and a clock drift, which is the difference in clock cycle between the transmission side and the reception side of the object, i.e., the object transmission data, are detected. The synchronization calculation unit 161 synchronizes all the objects on the basis of the detection results of the offsets and the clock drifts.
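Paragraph [0143] says that offsets and clock drift are detected but not how. One common approach is shown here as a hedged Python sketch. It estimates the offset as the lag that maximizes the cross-correlation between two objects, and the drift as the least-squares slope of receiver timestamps against sender timestamps. Both function names are hypothetical. The paired timestamps are also an assumption, since the patent does not state that such timestamps are transmitted.

```python
def estimate_offset(ref, sig, max_lag):
    """Lag (samples) in [-max_lag, max_lag] maximizing sum(ref[n] * sig[n + lag]).

    A positive result means sig lags ref by that many samples.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(ref[n] * sig[n + lag]
                    for n in range(len(ref)) if 0 <= n + lag < len(sig))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def estimate_drift_ppm(sender_ts, receiver_ts):
    """Clock drift in parts per million from paired timestamps (seconds).

    Fits receiver = slope * sender + c by least squares; drift = (slope - 1) * 1e6.
    """
    n = len(sender_ts)
    mx = sum(sender_ts) / n
    my = sum(receiver_ts) / n
    sxx = sum((x - mx) ** 2 for x in sender_ts)
    if sxx == 0:
        raise ValueError("need at least two distinct sender timestamps")
    sxy = sum((x - mx) * (y - my) for x, y in zip(sender_ts, receiver_ts))
    return (sxy / sxx - 1.0) * 1e6
```

The brute-force correlation is O(N × lags), which is fine for a sketch. A longer recording would typically use an FFT-based correlation, and the estimated drift would then drive a resampler (not shown).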

[0144] For example, in the recording apparatus 11, the microphones constituting the microphone array 41 are synchronized with each other, and thus the processing of synchronizing the signals of the respective channels of the object is unnecessary. On the other hand, the reproduction apparatus 12 handles the objects obtained by the plurality of recording apparatuses 11, and thus needs to synchronize the objects.

[0145] The interval detection unit 162 performs interval detection on each object supplied from the synchronization calculation unit 161 on the basis of the moving body-related information supplied from the decoding unit 132, and supplies the detection result to the beamforming unit 163, the NR unit 164, and the rendering unit 165.

[0146] The interval detection unit 162 includes a detector for predetermined target sound or non-target sound and performs interval detection similar to that in the case of the interval detection unit 101 of the recording apparatus 11. In particular, the sound of a sound source to be the target sound or non-target sound in the interval detection unit 162 is the same as the sound of a sound source to be the target sound or non-target sound in the interval detection unit 101.

[0147] The beamforming unit 163 performs beamforming on each object supplied from the synchronization calculation unit 161, by using the result of the interval detection supplied from the interval detection unit 162 and the moving body-related information supplied from the decoding unit 132 as necessary.

[0148] That is, the beamforming unit 163 corresponds to the beamforming unit 102 of the recording apparatus 11 and, like the beamforming unit 102, suppresses or emphasizes the sound or the like of a predetermined sound source by beamforming.

[0149] Note that in the beamforming unit 163, basically, a sound source component similar to that in the case of the beamforming unit 102 is suppressed or emphasized. However, in the beamforming unit 163, the moving body-related information of another moving body can also be used in beamforming for an object of a predetermined moving body.

[0150] Specifically, for example, when there is another moving body near a moving body to be processed, a sound component of the other moving body, which is included in the object of the moving body to be processed, may be suppressed. In this case, for example, when a distance from the moving body to be processed to the other moving body obtained from the moving body position information of each moving body is equal to or smaller than a predetermined threshold value, the sound component of the other moving body may be suppressed by suppressing the sound arriving from a direction of the other moving body viewed from the moving body to be processed.
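The distance test and "direction of the other moving body viewed from the moving body to be processed" in [0150] can be sketched as follows. The sketch assumes 2-D positions in metres and a yaw angle measured counterclockwise from the +x axis. For each neighbor within `max_dist`, which plays the role of the predetermined threshold, it returns that neighbor's direction relative to the processed body's facing. Those directions could then be passed to a null-steering beamformer (not shown). The function name and conventions are hypothetical.

```python
import math


def suppression_directions(self_pos, self_yaw_deg, others, max_dist):
    """Directions (deg, relative to the processed body's facing) of nearby bodies.

    self_pos, others: (x, y) positions in metres (2-D assumption).
    self_yaw_deg: facing direction, counterclockwise from the +x axis.
    max_dist: distance threshold below which another body's sound is suppressed.
    Returned angles are wrapped to [-180, 180).
    """
    dirs = []
    for ox, oy in others:
        dx, dy = ox - self_pos[0], oy - self_pos[1]
        if math.hypot(dx, dy) <= max_dist:
            bearing = math.degrees(math.atan2(dy, dx))
            dirs.append((bearing - self_yaw_deg + 180.0) % 360.0 - 180.0)
    return dirs
```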

……
……
……

Article link: https://patent.nweon.com/17649

