Sony Patent | Information Processing Apparatus, Information Processing Method And Program

Patent: Information Processing Apparatus, Information Processing Method And Program

Publication Number: 20200221245

Publication Date: 20200709

Applicants: Sony

Abstract

An information processing apparatus includes a calculation section that calculates a relative position of a sound source of a virtual object to a user, the virtual object allowing the user to perceive that the virtual object exists in a real space by sound image localization, on the basis of a position of a sound image of the virtual object and a position of the user, a sound image localization section that performs a sound signal process of the sound source such that the sound image is localized at the calculated localization position, and a sound image position holding section that holds the position of the sound image. When sound to be emitted from the virtual object is to be changed over, the calculation section may refer to the position of the sound image held in the sound image position holding section to calculate the position of the sound image.

TECHNICAL FIELD

[0001] The present technique relates to an information processing apparatus, an information processing method and a program, and particularly to an information processing apparatus, an information processing method and a program suitable for application, for example, to an AR (Augmented Reality) game and so forth.

BACKGROUND ART

[0002] Together with the progress of information processing and information communication technologies, computers have spread widely and are actively utilized also for support of daily life and amusement. Recently, computer processing is utilized also in the field of entertainment, and such entertainment as just described is not only utilized by a user who works in a specific place such as an office or a home but also demanded by a user who is moving.

[0003] Regarding entertainment during movement, for example, PTL 1 specified below proposes an information processing apparatus in which an interaction of a character displayed on a screen is controlled in response to a rhythm of the body of a user during movement to get a sense of intimacy of the user to allow the user to enjoy the movement itself as entertainment.

CITATION LIST

Patent Literature

[0004] [PTL 1]

[0005] Japanese Patent Laid-Open No. 2003-305278

SUMMARY

Technical Problem

[0006] However, in PTL 1 mentioned above, since an image of a character is displayed on a display screen, the entertainment cannot be enjoyed in a case where it is difficult to watch a screen image during walking or running. Further, it is desired to make it possible for a user to enjoy an information processing apparatus for entertaining the user for a longer period of time.

[0007] The present technique has been made in view of such a situation as described above and makes it possible to entertain a user.

Solution to Problem

[0008] An information processing apparatus of one aspect of the present technique includes a calculation section that calculates a relative position of a sound source of a virtual object to a user, the virtual object allowing the user to perceive such that the virtual object exists in a real space by sound image localization, on the basis of a position of a sound image of the virtual object and a position of the user, a sound image localization section that performs a sound signal process of the sound source such that the sound image is localized at the calculated localization position, and a sound image position holding section that holds the position of the sound image, and in which, when sound to be emitted from the virtual object is to be changed over, in a case where a position of a sound image of sound after the changeover is to be set to a position that takes over a position of the sound image of the sound before the changeover, the calculation section refers to the position of the sound image held in the sound image position holding section to calculate the position of the sound image.

[0009] An information processing method of the one aspect of the present technique includes the steps of calculating a relative position of a sound source of a virtual object to a user, the virtual object allowing the user to perceive such that the virtual object exists in a real space by sound image localization, on the basis of a position of a sound image of the virtual object and a position of the user, performing a sound signal process of the sound source such that the sound image is localized at the calculated localization position, and updating the position of the held sound image, and in which, when sound to be emitted from the virtual object is to be changed over, in a case where a position of a sound image of sound after the changeover is to be set to a position that takes over a position of the sound image of the sound before the changeover, the held position of the sound image is referred to to calculate the position of the sound image.

[0010] A program of the one aspect of the present technique is for causing a computer to execute a process including the steps of calculating a relative position of a sound source of a virtual object to a user, the virtual object allowing the user to perceive such that the virtual object exists in a real space by sound image localization, on the basis of a position of a sound image of the virtual object and a position of the user, performing a sound signal process of the sound source such that the sound image is localized at the calculated localization position, and updating the position of the held sound image, and in which, when sound to be emitted from the virtual object is to be changed over, in a case where a position of a sound image of sound after the changeover is to be set to a position that takes over a position of the sound image of the sound before the changeover, the held position of the sound image is referred to to calculate the position of the sound image.

[0011] In the information processing apparatus, information processing method and program of the one aspect of the present technique, a relative position of a sound source of a virtual object, which allows a user to perceive such that the virtual object exists in a real space by sound image localization, to the user is calculated on the basis of a position of a sound image of the virtual object and a position of the user, and a sound signal process of the sound source is performed such that the sound image is localized at the calculated localization position and the held position of the sound image is updated. Further, when sound to be emitted from the virtual object is to be changed over, in a case where a position of a sound image of sound after the changeover is to be set to a position that takes over a position of the sound image of the sound before the changeover, the position of the sound image held in the sound image position holding section is referred to to calculate the position of the sound image.
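The takeover behavior described above can be sketched as follows. This is an illustrative sketch only, not the patented implementation: the class and method names are assumptions, and the "holding section" is modeled as a simple object storing the last localized position, which the calculation consults when a changeover specifies takeover.

```python
class SoundImagePositionHolder:
    """Holds the most recently localized position of the sound image."""

    def __init__(self):
        self.position = None

    def update(self, position):
        self.position = position


class RelativePositionCalculator:
    """Decides the starting position of a sound image on changeover."""

    def __init__(self, holder):
        self.holder = holder

    def start_position(self, default_position, take_over):
        # When the sound after the changeover is to take over the position
        # of the sound before the changeover, refer to the held position;
        # otherwise use the new sound's own default position.
        if take_over and self.holder.position is not None:
            return self.holder.position
        return default_position
```

In this sketch, the sound image localization section would call `start_position` each time the emitted sound is switched, and write every localized position back through `update`.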

[0012] It is to be noted that the information processing apparatus may be an independent apparatus or may be an internal block configuring one apparatus.

[0013] Further, the program can be provided by transmission through a transmission medium or as a recording medium on which it is recorded.

Advantageous Effect of Invention

[0014] With the one aspect of the present technique, a user can be entertained.

[0015] It is to be noted that the advantageous effect described herein is not necessarily restrictive and may be any advantageous effect disclosed in the present disclosure.

BRIEF DESCRIPTION OF DRAWINGS

[0016] FIG. 1 is a view illustrating an outline of an information processing apparatus to which the present technique is applied.

[0017] FIG. 2 is a perspective view depicting an example of an appearance configuration of the information processing apparatus to which the present technique is applied.

[0018] FIG. 3 is a block diagram depicting an example of an internal configuration of the information processing apparatus.

[0019] FIG. 4 is a view illustrating physique data of a user.

[0020] FIG. 5 is a flow chart illustrating operation of the information processing apparatus.

[0021] FIG. 6 is a view illustrating a sound image.

[0022] FIG. 7 is a view illustrating sound image animation.

[0023] FIG. 8 is a view illustrating sound image animation.

[0024] FIG. 9 is a view illustrating sound image animation.

[0025] FIG. 10 is a view illustrating sound image animation.

[0026] FIG. 11 is a view illustrating content.

[0027] FIG. 12 is a view illustrating a configuration of a node.

[0028] FIG. 13 is a view illustrating a configuration of a key frame.

[0029] FIG. 14 is a view illustrating interpolation between key frames.

[0030] FIG. 15 is a view illustrating sound image animation.

[0031] FIG. 16 is a view illustrating sound image animation.

[0032] FIG. 17 is a view illustrating takeover of sound.

[0033] FIG. 18 is a view illustrating takeover of sound.

[0034] FIG. 19 is a view illustrating takeover of sound.

[0035] FIG. 20 is a view illustrating a configuration of a control section.

[0036] FIG. 21 is a flow chart illustrating operation of the control section.

[0037] FIG. 22 is a flow chart illustrating operation of the control section.

[0038] FIG. 23 is a view illustrating a recording medium.

DESCRIPTION OF EMBODIMENT

[0039] In the following, a mode for carrying out the present technique (hereinafter referred to as an embodiment) is described.

Outline of Information Processing Apparatus According to Embodiment of Present Disclosure

[0040] First, an outline of an information processing apparatus according to the embodiment of the present disclosure is described with reference to FIG. 1. As depicted in FIG. 1, the information processing apparatus 1 according to the present embodiment is, for example, a neckband type information processing apparatus capable of being worn on the neck of a user A, and includes a speaker and various sensors (an acceleration sensor, a gyroscope sensor, a geomagnetism sensor, an absolute position measurement section and so forth). Such an information processing apparatus 1 as just described has a function for allowing the user to sense such that a virtual character 20 really exists in the real space by a sound image localization technique for disposing sound information spatially. It is to be noted that the virtual character 20 is an example of a virtual object. The virtual object may be an object such as a virtual radio or a virtual musical instrument, or an object that generates noise in the city (for example, sound of a car, sound of a railway crossing, chat sound in a crowd or the like).

[0041] Therefore, the information processing apparatus 1 according to the present embodiment makes it possible to suitably calculate a relative three-dimensional position for positioning sound for causing a virtual character to be sensed on the basis of a state of a user and information of a virtual character and then present the presence of the virtual object in the real space with a higher degree of reality. In particular, for example, the information processing apparatus 1 can calculate a relative height for positioning voice of a virtual character to perform sound image localization on the basis of a height and a state of the user A (standing, sitting or the like) and height information of the virtual character such that the size of the virtual character is actually sensed by the user.

[0042] Further, the information processing apparatus 1 can vary sound of the virtual character in response to a state or movement of the user A to give reality to the movement of the virtual character. At this time, the information processing apparatus 1 performs control so as to localize sound at a corresponding portion of the virtual character on the basis of a type of sound, such that the voice of the virtual character is localized at the mouth (head) of the virtual character while footsteps of the virtual character are localized at the feet of the virtual character.

[0043] An outline of the information processing apparatus 1 according to the present embodiment has been described. Now, a configuration of the information processing apparatus 1 according to the present embodiment is described with reference to FIGS. 2 and 3.

[0044]

[0045] FIG. 2 is a perspective view depicting an example of an appearance configuration of the information processing apparatus 1 according to the present embodiment. The information processing apparatus 1 is a so-called wearable terminal. As depicted in FIG. 2, the neckband type information processing apparatus 1 has a mounting unit having a shape over one half circumference from both sides of the neck to the rear side (back side) (housing configured for mounting), and is mounted on the user by being worn on the neck of the user. In FIG. 2, a perspective view in a state in which the user wears the mounting unit is depicted.

[0046] It is to be noted that, although words indicating directions such as upward, downward, leftward, rightward, forward and rearward are used in the present document, it is assumed that the directions individually indicate directions as viewed from the center of the body of the user (for example, a position of the pit of the stomach) in an uprightly standing posture of the user. For example, it is assumed that “right” indicates the direction of the right half body side of the user and “left” indicates the direction of the left half body side of the user, and “up” indicates the direction of the head side of the user and “down” indicates the direction of the foot side of the user. Further, it is assumed that “front” indicates the direction in which the body of the user is directed and “rear” indicates the direction of the back side of the user.

[0047] As depicted in FIG. 2, the mounting unit may be worn in a closely contacting relationship with the neck of the user or may be worn in a spaced relationship from the neck of the user. It is to be noted that, as a different shape of a neck wearing type mounting unit, for example, a pendant type worn by the user through a neck strap or a headset type having a neck band passing the rear side of the neck in place of a headband to be worn on the head is conceivable.

[0048] Further, a usage of the mounting unit may be a mode in which it is used in a state directly mounted on the human body. The mode in which the mounting unit is used in a state directly mounted signifies a mode in which the mounting unit is used in a state in which no object exists between the mounting unit and the human body. For example, a mode in which the mounting unit depicted in FIG. 2 is mounted so as to contact with the skin of the neck of the user is applicable as the mode described above. Further, various other modes such as a headset type directly mounted on the head or a glass type are conceivable.

[0049] Alternatively, the usage of the mounting unit may be a mode in which the mounting unit is used in an indirectly mounted relationship on the human body. The mode in which the mounting unit is used in an indirectly mounted state signifies a mode in which the mounting unit is used in a state in which some object exists between the mounting unit and the human body. For example, the case where the mounting unit is mounted so as to contact with the user through clothes as in a case in which the mounting unit depicted in FIG. 2 is mounted so as to hide under a collar of a shirt is applicable as the present mode. Further, various modes such as a pendant type mounted on the user by a neck strap or a brooch type fixed by a fastener on clothes are conceivable.

[0050] Further, as depicted in FIG. 2, the information processing apparatus 1 includes a plurality of microphones 12 (12A, 12B), cameras 13 (13A, 13B) and speakers 15 (15A, 15B). The microphones 12 acquire sound data such as user sound or peripheral environment sound. The cameras 13 capture images of the surroundings and acquire image data. Further, the speakers 15 perform reproduction of sound data. Especially, the speakers 15 according to the present embodiment reproduce a sound signal after a sound image localization process of a virtual character for allowing a user to sense such that the virtual character actually exists in the real space.

[0051] In this manner, the information processing apparatus 1 is configured such that it at least includes a housing that incorporates a plurality of speakers for reproducing a sound signal after the sound image localization process and is configured for mounting on part of the body of the user.

[0052] It is to be noted that, while FIG. 2 depicts a configuration in which two microphones 12, two cameras 13 and two speakers 15 are provided on the information processing apparatus 1, the present embodiment is not limited to this. For example, the information processing apparatus 1 may include one microphone 12 and one camera 13 or may include three or more microphones 12, three or more cameras 13 and three or more speakers 15.

[0053]

[0054] Now, an internal configuration of the information processing apparatus 1 according to the present embodiment is described referring to FIG. 3. FIG. 3 is a block diagram depicting an example of an internal configuration of the information processing apparatus 1 according to the present embodiment. As depicted in FIG. 3, the information processing apparatus 1 includes a control section 10, a communication section 11, a microphone 12, a camera 13, a nine-axis sensor 14, a speaker 15, a position measurement section 16 and a storage section 17.

[0055] The control section 10 functions as an arithmetic operation processing apparatus and a control apparatus, and controls overall operation in the information processing apparatus 1 in accordance with various programs. The control section 10 is implemented by electronic circuitry such as, for example, a CPU (Central Processing Unit) or a microprocessor. Further, the control section 10 may include a ROM (Read Only Memory) that stores programs, arithmetic operation parameters and so forth to be used and a RAM (Random Access Memory) that temporarily stores parameters and so forth that change suitably.

[0056] Further, as depicted in FIG. 3, the control section 10 according to the present embodiment functions as a state-behavior detection section 10a, a virtual character behavior determination section 10b, a scenario updating section 10c, a relative position calculation section 10d, a sound image localization section 10e, a sound output controlling section 10f and a reproduction history-feedback storage controlling section 10g.

[0057] The state-behavior detection section 10a performs detection of a state of a user and recognition of a behavior based on the detected state and outputs the detected state and the recognized behavior to the virtual character behavior determination section 10b. In particular, the state-behavior detection section 10a acquires such information as position information, a moving speed, an orientation, a height of the ear (or head) as information relating to the state of the user. The user state is information that can be uniquely specified at the detected timing and can be calculated and acquired as numerical values from various sensors.

[0058] For example, the position information is acquired from the position measurement section 16. Further, the moving speed is acquired from the position measurement section 16, the acceleration sensor included in the nine-axis sensor 14, the camera 13 or the like. The orientation is acquired from the gyro sensor, acceleration sensor and geomagnetic sensor included in the nine-axis sensor 14 or from the camera 13. The height of the ear (or the head) is acquired from physique data of the user, the acceleration sensor and the gyro sensor. Further, the moving speed and the orientation may be acquired using SLAM (Simultaneous Localization and Mapping) for calculating a movement on the basis of a change of feature points in videos when the surroundings are successively imaged using the camera 13.

[0059] Meanwhile, the height of the ear (or the head) can be calculated on the basis of physique data of the user. As the physique data of the user, the stature H1, sitting height H2 and distance H3 from the ear to the top of the head are set, for example, as depicted in a left view in FIG. 4 and stored into the storage section 17. The state-behavior detection section 10a calculates the height of the ear, for example, in the following manner. It is to be noted that “E1 (inclination of the head)” can be detected as an inclination of the upper body as depicted in a right view in FIG. 4 by the acceleration sensor, the gyro sensor or the like.

[0060] (Expression 1) In a case where the user stands uprightly:

Height of ear = stature − sitting height + (sitting height − distance from ear to top of head) × E1 (inclination of head)

[0061] (Expression 2) In a case where the user is sitting/lying:

Height of ear = (sitting height − distance from ear to top of head) × E1 (inclination of head)

[0062] The height of the ear of the user may also be calculated by other formulae.
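Expressions 1 and 2 above can be sketched in code as follows. Note that the patent does not specify the exact form of the inclination factor E1; this sketch assumes E1 = cos(inclination), so that E1 = 1 when the head is upright, and the function and parameter names are illustrative.

```python
import math


def ear_height(stature, sitting_height, ear_to_top, inclination_rad, standing=True):
    """Estimate the height of the user's ear from physique data.

    stature, sitting_height, ear_to_top correspond to H1, H2, H3 in FIG. 4.
    E1 is assumed here to be cos(inclination_rad): 1.0 for an upright head.
    """
    e1 = math.cos(inclination_rad)
    upper = (sitting_height - ear_to_top) * e1
    if standing:
        # Expression 1: stature - sitting height + (sitting height - H3) x E1
        return stature - sitting_height + upper
    # Expression 2 (sitting/lying): (sitting height - H3) x E1
    return upper
```

For an upright standing user this reduces to stature minus the ear-to-top distance, which matches the intuition behind Expression 1.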

[0063] Also it is possible for the state-behavior detection section 10a to recognize a user behavior by referring to the preceding and succeeding states. As the user behavior, for example, “stopping,” “walking,” “running,” “sitting,” “lying,” “in a car,” “riding a bicycle,” “oriented to a character” and so forth are supposed. Also it is possible for the state-behavior detection section 10a to recognize a user behavior using a predetermined behavior recognition engine on the basis of information detected by the nine-axis sensor 14 (acceleration sensor, gyro sensor and geomagnetic sensor) and the position information detected by the position measurement section 16.
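A behavior recognition engine as mentioned above could be arbitrarily complex; as a toy illustration only, the moving speed and posture alone already separate several of the listed behaviors. The thresholds below are assumptions for illustration, not values from the patent.

```python
def recognize_behavior(speed_mps, seated):
    """Classify a user behavior from a moving speed and a seated flag.

    Thresholds are illustrative assumptions; a real behavior recognition
    engine would fuse acceleration, gyro, geomagnetic and position data.
    """
    if seated:
        return "sitting"
    if speed_mps < 0.2:
        return "stopping"
    if speed_mps < 2.5:
        return "walking"
    return "running"
```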

[0064] The virtual character behavior determination section 10b determines a virtual behavior in the real space of the virtual character 20 in response to the user behavior recognized by the state-behavior detection section 10a (or including also selection of a scenario) and selects a sound content corresponding to the determined behavior from a scenario.

[0065] For example, the virtual character behavior determination section 10b can present the presence of the virtual character by causing the virtual character to take the same action as the user: for example, when the user is walking, the virtual character behavior determination section 10b causes the virtual character 20 also to walk, and when the user is running, it causes the virtual character 20 to run in such a manner as to follow the user.

[0066] Further, after a behavior of the virtual character is determined, the virtual character behavior determination section 10b selects, from within a sound source list (sound contents) stored in advance as scenarios of contents, a sound source corresponding to the behavior of the virtual character. Thereupon, in regard to a sound source having a limited number of reproductions, the virtual character behavior determination section 10b decides permission/inhibition of reproduction on the basis of a reproduction log. Further, the virtual character behavior determination section 10b may select a sound source that corresponds to the behavior of the virtual character and meets preferences of the user (a sound source of a favorite virtual character or the like) or a sound source of a specific virtual character tied with the present location (place).

[0067] For example, in a case where the determined behavior of the virtual character is that the virtual character is stopping, the virtual character behavior determination section 10b selects a sound content of voice (for example, lines, breath or the like), but in a case where the determined behavior is that the virtual character is walking, the virtual character behavior determination section 10b selects a sound content of voice and another sound content of footsteps. Further, in a case where the determined behavior of the virtual character is that the virtual character is running, the virtual character behavior determination section 10b selects shortness of breath or the like as a sound content. In this manner, a sound content is selected and selective sounding according to the behavior is executed (in other words, a sound content that does not correspond to the behavior is not selected and not reproduced).
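The selective sounding of paragraph [0067] amounts to a mapping from the determined behavior to the sound contents to be reproduced; anything not in the mapping is simply never selected. The dictionary below is a hypothetical sketch of that mapping with assumed content names.

```python
# Hypothetical mapping from a determined virtual-character behavior to
# the sound contents to reproduce; names are illustrative assumptions.
SOUND_CONTENTS_BY_BEHAVIOR = {
    "stopping": ["voice"],
    "walking": ["voice", "footsteps"],
    "running": ["voice", "shortness_of_breath"],
}


def select_sound_contents(behavior):
    # A sound content that does not correspond to the behavior is not
    # selected and therefore not reproduced.
    return SOUND_CONTENTS_BY_BEHAVIOR.get(behavior, [])
```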

[0068] Since the scenario progresses through selection of a sound content corresponding to the behavior of the virtual character determined by the virtual character behavior determination section 10b from within the scenario, the scenario updating section 10c performs updating of the scenario. The scenario is stored, for example, in the storage section 17.

[0069] The relative position calculation section 10d calculates a relative three-dimensional position (xy coordinate positions and height) for localizing a sound source of the virtual character (sound content) selected by the virtual character behavior determination section 10b. In particular, the relative position calculation section 10d first sets a position of a portion of a virtual character corresponding to a type of a sound source by referring to the behavior of the virtual character determined by the virtual character behavior determination section 10b. The relative position calculation section 10d outputs the calculated sound localization position (three-dimensional position) for each sound content to the sound image localization section 10e.
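The part-dependent localization in paragraph [0069] (voice at the mouth/head, footsteps at the feet) can be sketched as below. The geometry and the fractional heights are assumptions for illustration; the patent only specifies that the localization position depends on the type of sound source and the behavior of the virtual character.

```python
def sound_localization_position(user_xy, character_xy, character_height, source_type):
    """Return a relative (x, y, z) localization position as seen from the user.

    The z component places the source at the body part matching the sound
    type; the 0.9 and 0.5 fractions are illustrative assumptions.
    """
    ux, uy = user_xy
    cx, cy = character_xy
    if source_type == "voice":
        z = character_height * 0.9   # near the mouth/head
    elif source_type == "footsteps":
        z = 0.0                      # at the feet
    else:
        z = character_height * 0.5   # body center as a fallback
    # Relative xy position of the character as seen from the user.
    return (cx - ux, cy - uy, z)
```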

[0070] The sound image localization section 10e performs a sound signal process for a sound content such that a corresponding sound content (sound source) selected by the virtual character behavior determination section 10b is localized at the sound image localization position for each content calculated by the relative position calculation section 10d.
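The sound signal process itself is not detailed in the patent; a production system would typically apply HRTF filtering. As a minimal stand-in, the sketch below uses constant-power stereo panning plus 1/r distance attenuation to place a sample at a relative position, only to illustrate the idea of localizing at the calculated position.

```python
import math


def localize(sample, rel_x, rel_y):
    """Crude stereo localization: rel_x is to the user's right, rel_y ahead."""
    azimuth = math.atan2(rel_x, rel_y)             # 0 rad = straight ahead
    distance = max(math.hypot(rel_x, rel_y), 1.0)  # clamp to avoid blow-up
    pan = (azimuth / math.pi + 1.0) / 2.0          # 0 = full left, 1 = full right
    gain = 1.0 / distance                          # simple 1/r attenuation
    left = sample * math.cos(pan * math.pi / 2.0) * gain
    right = sample * math.sin(pan * math.pi / 2.0) * gain
    return left, right
```

A source straight ahead yields equal left and right levels, while a source to the right yields a stronger right channel; real sound image localization would add interaural time differences and spectral cues on top of this.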

[0071] The sound output controlling section 10f performs control such that a sound signal processed by the sound image localization section 10e is reproduced by the speaker 15. Consequently, the information processing apparatus 1 according to the present embodiment can localize a sound image of a sound content, which corresponds to a movement of the virtual character according to a state and behavior of the user, at an appropriate position, distance and height to the user, present reality in the movement and size of the virtual character, and increase the presence of the virtual character in the real space.

[0072] The reproduction history-feedback storage controlling section 10g controls such that a sound source (sound content) outputted in sound from the sound output controlling section 10f is stored as a history (reproduction log) into the storage section 17. Further, the reproduction history-feedback storage controlling section 10g controls such that, when sound is outputted by the sound output controlling section 10f, such a reaction of the user that the user turns to a direction of the voice or stops and listens to a story is stored as feedback into the storage section 17. Consequently, the control section 10 is enabled to learn user’s tastes and the virtual character behavior determination section 10b described above can select a sound content according to the user’s tastes.

[0073] The communication section 11 is a communication module for performing transmission and reception of data to and from a different apparatus by wired/wireless communication. The communication section 11 communicates with an external apparatus directly or through a network access point by such a method as, for example, a wired LAN (Local Area Network), a wireless LAN, Wi-Fi (Wireless Fidelity, registered trademark), infrared communication, Bluetooth (registered trademark) or a near field/contactless communication method.

[0074] For example, in a case where the functions of the control section 10 described above are included in a different apparatus such as a smartphone or a server on the cloud, the communication section 11 may transmit data acquired by the microphone 12, camera 13 or nine-axis sensor 14. In this case, behavior determination of a virtual character, selection of a sound content, calculation of a sound image localization position, a sound image localization process and so forth are performed by the different apparatus. Further, in a case where, for example, the microphone 12, camera 13 or nine-axis sensor 14 is provided in the different apparatus, the communication section 11 may receive data acquired by them and output the data to the control section 10. Further, the communication section 11 may receive a sound content selected by the control section 10 from a different apparatus such as a server on the cloud.

[0075] The microphone 12 collects voice of the user and sound of an ambient environment and outputs them as sound data to the control section 10.

[0076] The camera 13 includes a lens system configured from an imaging lens, a diaphragm, a zoom lens, a focusing lens and so forth, a driving system for causing the lens system to perform focusing operation and zooming operation, a solid-state imaging element array for photoelectrically converting imaging light obtained by the lens system to generate an imaging signal, and so forth. The solid-state imaging element array may be implemented, for example, by a CCD (Charge Coupled Device) sensor array or a CMOS (Complementary Metal Oxide Semiconductor) sensor array.

[0077] For example, the camera 13 may be provided so as to image an area in front of the user in a state in which the information processing apparatus 1 (mounting unit) is mounted on the user. In this case, the camera 13 can image a movement of the surrounding landscape, for example, according to the movement of the user. Further, the camera 13 may be provided so as to image the face of the user in a state in which the information processing apparatus 1 is mounted on the user. In this case, the information processing apparatus 1 can specify the position of the ear or the facial expression of the user from the captured image. Further, the camera 13 outputs data of the captured image in the form of a digital signal to the control section 10.

[0078] The nine-axis sensor 14 includes a three-axis gyro sensor (detection of angular velocities (rotational speeds)), a three-axis acceleration sensor (also called G sensor: detection of accelerations upon movement) and a three-axis geomagnetism sensor (compass: detection of an absolute direction (orientation)). The nine-axis sensor 14 has a function of sensing a state of a user who mounts the information processing apparatus 1 thereon or a surrounding situation. It is to be noted that the nine-axis sensor 14 is an example of a sensor section and the present embodiment is not limited to this; for example, a velocity sensor, a vibration sensor or the like may further be used, or at least one of an acceleration sensor, a gyro sensor or a geomagnetism sensor may be used.

[0079] Further, the sensor section may be provided in an apparatus different from the information processing apparatus 1 (mounting unit) or may be provided dispersedly in a plurality of apparatus. For example, the acceleration sensor, gyro sensor and geomagnetism sensor may be provided on a device mounted on the head (for example, an earphone) and the acceleration sensor or the vibration sensor may be provided on a smartphone. The nine-axis sensor 14 outputs information indicative of a sensing result to the control section 10.

[0080] The speaker 15 reproduces an audio signal processed by the sound image localization section 10e under the control of the sound output controlling section 10f. Further, also it is possible for the speaker 15 to convert a plurality of sound sources of arbitrary positions/directions into stereo sound and output the stereo sound.

[0081] The position measurement section 16 has a function for detecting the present position of the information processing apparatus 1 on the basis of an acquisition signal from the outside. In particular, for example, the position measurement section 16 is implemented by a GPS (Global Positioning System) measurement section, and receives radio waves from GPS satellites to detect the position at which the information processing apparatus 1 exists and outputs the detected position information to the control section 10. Further, the information processing apparatus 1 may detect the position, in addition to the GPS, by transmission and reception, for example, by Wi-Fi (registered trademark), Bluetooth (registered trademark), a portable telephone set, a PHS, a smartphone and so forth or by near field communication or the like.

[0082] The storage section 17 stores programs and parameters for allowing the control section 10 to execute the functions described above. Further, the storage section 17 according to the present embodiment stores scenarios (various sound contents), setting information of virtual characters (shape, height and so forth) and user information (name, age, home, occupation, workplace, physique data, hobbies and tastes, and so forth). It is to be noted that at least part of information stored in the storage section 17 may be stored in a different apparatus such as a server on the cloud or the like.

[0083] The configuration of the information processing apparatus 1 according to the present embodiment has been described above in detail.

[0084]

[0085] Subsequently, a sound process of the information processing apparatus 1 according to the present embodiment is described with reference to FIG. 5. FIG. 5 is a flow chart depicting the sound process according to the present embodiment.

[0086] As depicted in FIG. 5, first at step S101, the state-behavior detection section 10a of the information processing apparatus 1 detects a user state and behavior on the basis of information detected by various sensors (microphone 12, camera 13, nine-axis sensor 14 or position measurement section 16).

[0087] At step S102, the virtual character behavior determination section 10b determines a behavior of a virtual character to be reproduced in response to the detected state and behavior of the user. For example, the virtual character behavior determination section 10b determines a behavior same as the detected behavior of the user (for example, such that, if the user walks, then the virtual character walks together, if the user runs, then the virtual character runs together, if the user sits, then the virtual character sits, if the user lies, then the virtual character lies, or the like).

[0088] At step S103, the virtual character behavior determination section 10b selects a sound source (sound content) corresponding to the determined behavior of the virtual character from a scenario.

[0089] At step S104, the relative position calculation section 10d calculates a relative position (three-dimensional position) of the selected sound source on the basis of the detected user state and user behavior, physique data of the stature or the like of the user registered in advance, determined behavior of the virtual character, setting information of the stature of the virtual character registered in advance and so forth.

[0090] At step S105, the scenario updating section 10c updates the scenario in response to the determined behavior of the virtual character and the selected sound content (namely, advances to the next event).

[0091] At step S106, the sound image localization section 10e performs a sound image localization process for the corresponding sound content such that the sound image is localized at the calculated relative position for the sound image.

[0092] At step S107, the sound output controlling section 10f controls such that the sound signal after the sound image localization process is reproduced from the speaker 15.

[0093] At step S108, a history of the reproduced (namely outputted in sound) sound content and feedback of the user to the sound content are stored into the storage section 17 by the reproduction history-feedback storage controlling section 10g.

[0094] Steps S103 to S108 described above are repeated until the event of the scenario comes to an end at step S109. For example, if one game comes to an end, then the scenario ends.
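As an illustrative aid (not part of the disclosed implementation), the loop of steps S101 to S109 can be sketched as follows; the detector, the behavior rule and the placeholder positions are simplified stand-ins for the sections described above:

```python
# A highly simplified, runnable sketch of the flow of FIG. 5.
def detect_state(sensor_value):                    # S101: state-behavior detection
    return "walking" if sensor_value > 0 else "sitting"

def determine_behavior(user_state):                # S102: the character mirrors the user
    return user_state

def sound_process(sensor_values, scenario):
    history = []                                   # stands in for the storage section 17
    for value in sensor_values:                    # repeated until the event ends (S109)
        state = detect_state(value)                # S101
        behavior = determine_behavior(state)       # S102
        content = scenario[behavior]               # S103: select sound content
        # S104: relative position (angle, distance) — placeholder values
        position = (-45.0, 1.0) if behavior == "walking" else (0.0, 1.0)
        history.append((content, position))        # S106-S108 collapsed: localize, play, store
    return history

scenario = {"walking": "footsteps", "sitting": "sit_voice"}
out = sound_process([1, 1, 0], scenario)
```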

[0095] As described above, the information processing system according to the embodiment of the present disclosure makes it possible to appropriately calculate a relative three-dimensional position for localizing sound, which allows a virtual character (an example of a virtual object) to be perceived, on the basis of the state of the user and information of the virtual character and present the presence of the virtual character in the real space with a higher degree of reality.

[0096] Meanwhile, the information processing apparatus 1 according to the present embodiment may be implemented by an information processing system including a headphone (or an earphone, eyewear or the like) in which the speaker 15 is provided and a mobile terminal (smartphone or the like) having functions principally of the control section 10. On this occasion, the mobile terminal transmits a sound signal subjected to a sound localization process to the headphone so as to be reproduced. Further, the speaker 15 is not limited to being incorporated in an apparatus mounted on the user but may be implemented, for example, by an environmental speaker installed around the user, and in this case, the environmental speaker can localize a sound image at an arbitrary position around the user.

[0097] Now, sound to be emitted by execution of the processes described above is described. First, an example of a three-dimensional position including xy coordinate positions and height is described with reference to FIG. 6.

[0098] FIG. 6 is a view illustrating an example of sound image localization according to a behavior and the stature of the virtual character 20 and a state of a user according to the present embodiment. Here, a scenario is assumed in which, for example, when the user A returns from school or work to a station in the neighborhood of the user's home and is walking toward the home, a virtual character 20 finds and speaks to the user A and returns home together with the user A.

[0099] The virtual character behavior determination section 10b starts an event (provision of a sound content) triggered by detection, by the state-behavior detection section 10a, that the user A has arrived at the nearest station, exited the ticket gate and begun to walk.

[0100] First, such an event is performed that the virtual character 20 finds and speaks to the walking user A as depicted in FIG. 6. In particular, the relative position calculation section 10d calculates a localization direction at an angle F1 with respect to the ear of the user, a few meters behind the user A, as the xy coordinate positions of the sound source of a sound content V1 (“oh!”) of a voice to be reproduced first as depicted at an upper part of FIG. 6.

[0101] Then, the relative position calculation section 10d calculates the xy coordinate positions of the sound source of a sound content V2 of footsteps chasing the user A such that the xy coordinate positions gradually approach the user A (localization direction at an angle F2 with respect to the ear of the user). Then, the relative position calculation section 10d calculates the localization direction at an angle F3 with respect to the ear of the user at a position just behind the user A as the xy coordinate positions of the sound source of a sound content V3 of voice (“welcome back”).

[0102] By calculating the sound image localization position (localization direction and distance with respect to the user) in accordance with the behavior and the lines of the virtual character 20 such that there is no sense of incongruity in a case where it is assumed that the virtual character 20 actually exists and is behaving in the real space in this manner, it is possible to allow a movement of the virtual character 20 to be felt with a higher degree of reality.

[0103] Further, the relative position calculation section 10d calculates the height of the sound image localization position in response to a part of the virtual character 20 corresponding to the type of the sound content. For example, in a case where the height of the ear of the user is higher than the head of the virtual character 20, the heights of the sound contents V1 and V3 of the voice of the virtual character 20 are lower than the height of the ear of the user as depicted at a lower part of FIG. 6 (lower by an angle G1 with respect to the ear of the user).

[0104] Further, since the sound source of the sound content V2 of the footsteps of the virtual character 20 is the feet of the virtual character 20, the height of the sound source is lower than that of the sound source of the voice (lower by an angle G2 with respect to the ear of the user). In a case where it is supposed that the virtual character 20 actually exists in the real space, by calculating the height of the sound image localization position taking the state (standing, sitting) and the magnitude (stature) of the virtual character 20 into consideration in this manner, it is possible to allow the presence of the virtual character 20 to be felt with a higher degree of reality.
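As a rough sketch of this height calculation, depression angles such as G1 and G2 can be derived from the ear height of the user, the height of the sounding part of the virtual character and the horizontal distance; the numerical figures below are assumptions for illustration, not values given in the embodiment:

```python
import math

def elevation_angle_deg(ear_height_m, source_height_m, horizontal_distance_m):
    """Angle of a sound source relative to the height of the user's ear
    (negative means the source is localized below the ear)."""
    return math.degrees(math.atan2(source_height_m - ear_height_m,
                                   horizontal_distance_m))

# Assumed figures: the user's ear at 1.5 m, the mouth of the virtual
# character at 1.1 m, its feet at 0.0 m, standing 1.0 m behind the user.
voice_angle = elevation_angle_deg(1.5, 1.1, 1.0)   # corresponds to the downward angle G1
steps_angle = elevation_angle_deg(1.5, 0.0, 1.0)   # corresponds to the downward angle G2
assert steps_angle < voice_angle < 0   # the footsteps localize lower than the voice
```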

[0105] Where the sound provided to the user moves in this manner, the sound reaches the user as if the virtual character 20 actually existed there and were performing an action, allowing the user to feel the presence of the virtual character 20. Here, such movement of sound, in other words, animation by sound, is suitably referred to as sound image animation.

[0106] The sound image animation is a representation for allowing the user to recognize the existence of the virtual character 20 through sound by providing a movement (animation) to the position of the sound image as described hereinabove, and as implementation means of this, a technique called key frame animation or the like can be applied.

[0107] By the sound image animation, the series of animation that the virtual character 20 gradually approaches the user from behind (angle F1) of the user and the lines “welcome back” are emitted at the angle F3 as depicted in FIG. 6 is provided to the user.

[0108] The sound image animation is described below. In the following description, only animation relating to the xy coordinates is described and description of animation relating to the heightwise direction is omitted; however, processing similar to that for the xy coordinates can be applied also to the heightwise direction.

[0109] The sound image animation is described further with reference to FIG. 7. In the description given with reference to figures beginning with FIG. 7, it is assumed that the front of the user A is the angle zero degrees and the left side of the user A is the negative side while the right side of the user A is the positive side.

[0110] At time t=0, the virtual character 20 is positioned at -45 degrees and the distance of 1 m and is emitting predetermined voice (lines or the like). From time t=0 to time t=3, the virtual character 20 moves to the front of the user A in such a manner as to draw an arc. At time t=3, the virtual character 20 is positioned at zero degrees and the distance of 1 m and is emitting predetermined voice (lines or the like).

[0111] From time t=3 to time t=5, the virtual character 20 moves to the right side of the user A. At time t=5, the virtual character 20 is positioned at 45 degrees and the distance of 1.5 m and is emitting predetermined voice (lines or the like).

[0112] In a case where such sound image animation is provided to the user A, information relating to the position of the virtual character 20 at each time t is described as a key frame. The following description is given assuming that the key frame here is information relating to the position of the virtual character 20 (sound image position information).

[0113] In particular, as depicted in FIG. 7, information of the key frame [0]={t=0, -45 degrees, distance 1 m}, key frame [1]={t=3, zero degrees, distance 1 m} and key frame [2]={t=5, +45 degrees, distance 1.5 m} is set and subjected to an interpolation process such that sound image animation exemplified in FIG. 7 is executed.
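The interpolation process described above can be sketched, for example, as linear interpolation of (angle, distance) pairs between key frames; the function name and the choice of linear interpolation are illustrative assumptions:

```python
def interpolate_keyframes(keyframes, t):
    """Linearly interpolate (angle_deg, distance_m) between key frames.

    keyframes: list of (time, angle_deg, distance_m) tuples sorted by time.
    Times outside the key frame range are clamped to the end positions.
    """
    if t <= keyframes[0][0]:
        return keyframes[0][1:]
    for (t0, a0, d0), (t1, a1, d1) in zip(keyframes, keyframes[1:]):
        if t <= t1:
            f = (t - t0) / (t1 - t0)
            return (a0 + f * (a1 - a0), d0 + f * (d1 - d0))
    return keyframes[-1][1:]

# The key frames of FIG. 7: [0]={t=0, -45 deg, 1 m}, [1]={t=3, 0 deg, 1 m},
# [2]={t=5, +45 deg, 1.5 m}.
kf = [(0, -45.0, 1.0), (3, 0.0, 1.0), (5, 45.0, 1.5)]
assert interpolate_keyframes(kf, 0) == (-45.0, 1.0)
assert interpolate_keyframes(kf, 4) == (22.5, 1.25)
```

Note that interpolating the angle at a constant distance, as between the key frames [0] and [1], naturally produces the arc-shaped movement depicted in FIG. 7.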

[0114] The sound image animation depicted in FIG. 7 is animation when the lines A are emitted; emission of the lines B thereafter is described with reference to FIG. 8.

[0115] A view depicted on the left side in FIG. 8 is similar to the view depicted in FIG. 7 and depicts an example of sound image animation when the lines A are emitted. After the lines A are emitted, the lines B are emitted successively or after lapse of a predetermined period of time. At the starting point of time (time t=0) of the lines B, information of the key frame [0]={t=0, +45 degrees, distance 1.5 m} is processed, and as a result, the virtual character 20 exists at 45 degrees on the right of the user and at the distance 1.5 m and the utterance of the lines B is started.

[0116] At the ending point of time (time t=10) of the lines B, information of the key frame [1]={t=10, +135 degrees, distance 3 m} is processed, and as a result, the virtual character 20 exists at 135 degrees right of and 3 m away from the user and the utterance of the lines B is ended. Since such sound image animation is executed, the virtual character 20 who is uttering the lines B while moving from the right front to the right rear of the user A can be expressed.

[0117] Incidentally, if the user A does not move, especially if the head does not move, then the sound image moves in accordance with the intention of the creator who created the sound image animation: utterance of the lines B is started from the ending position of the lines A, and such a sense that the virtual character 20 is moving can be provided to the user A. Here, referring back to FIGS. 1 and 2, the information processing apparatus 1 to which the present technique is applied is mounted on the head (neck) of the user A and moves together with the user A, so that it can implement a situation in which the user A enjoys entertainment on the information processing apparatus 1 for a long period of time while moving around a wide area.

[0118] Therefore, when the information processing apparatus 1 is mounted, it is supposed that the head of the user moves, and where the head of the user moves, there is the possibility that the sound image animation described with reference to FIG. 7 or 8 may not be able to be provided as intended by the creator. This is described with reference to FIGS. 9 and 10.

[0119] It is assumed that the lines B are started after the head of the user A moves in the leftward direction by an angle F11, at the ending time of the lines A, from a state in which the sound image is positioned at the angle F10 (+45 degrees) with respect to the user A as depicted in a left upper view in FIG. 9. In this case, the sound image is localized to the direction of +45 degrees with respect to zero degrees given by the front of the user A on the basis of the information of the key frame [0], and the lines B are started.

[0120] This is described with reference to a lower view in FIG. 9 in regard to the position of the virtual character 20 in the real space (space in which the user actually is) assuming that the virtual character 20 is in the real space. It is to be noted that, in the following description, the position of the virtual character 20 with respect to the user is referred to as relative position, and the position of the virtual character 20 in the real space is referred to as absolute position.

[0121] The following description is given assuming that a coordinate system for a relative position (hereinafter referred to suitably as relative coordinate system) is a coordinate system where the center of the head of the user A is x=y=0 (hereinafter referred to as center point) and the front direction of the user A (direction in which the nose exists) is the y axis and is a coordinate system fixed to the head of the user A. Therefore, in the relative coordinate system, even if the user A moves its head, the front direction of the user A is the coordinate system having the angle of zero degrees.

[0122] The coordinate system for an absolute position (hereinafter referred to suitably as absolute coordinate system) is a coordinate system where the center of the head of the user A at a certain point of time is x=y=0 (hereinafter referred to as center point) and the front direction of the user A (direction in which the nose exists) is the y axis. However, description is given assuming that the absolute coordinate system is a coordinate system that is not fixed to the head of the user A but is fixed to the real space. Therefore, in the absolute coordinate system, the absolute coordinate system set at a certain point of time is a coordinate system in which, even if the user A moves its head, the axial directions do not change in accordance with the movement of the head but are fixed in the real space.

[0123] Referring to a left lower view in FIG. 9, the absolute position of the virtual character 20 at the time of the end of the lines A is the direction of an angle F10 when the head of the user A is the center point. Referring to a right lower view in FIG. 9, the absolute position of the virtual character 20 at the time of the start of the lines B is the direction of an angle F12 from the center point (x=y=0) on the same absolute coordinate system as that at the time of the end of the lines A.

[0124] For example, in a case where the angle F10 is +45 degrees and the angle F11 over which the head of the user moves is 80 degrees, the position (angle F12) of the virtual character 20 on the absolute coordinate system is the difference of 35 degrees on the negative side, as viewed in a right lower view in FIG. 9; in other words, the position is -35 degrees.

[0125] In this case, although, at the time of the end of the lines A, the virtual character 20 was at the place of the angle F10 (=45 degrees) on the absolute coordinate system, at the time of the start of the lines B, the virtual character 20 is at the angle F12 (=-35 degrees) on the absolute coordinate system. Therefore, the user A recognizes that the virtual character 20 has momentarily moved from the angle F10 (=45 degrees) to the angle F12 (=-35 degrees).
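The conversion from the relative position to the absolute position can be sketched as simple angle addition; the leftward head turn of 80 degrees below is an assumed figure chosen to be consistent with the -35 degrees of the worked example:

```python
def relative_to_absolute(relative_deg, head_orientation_deg):
    """Convert a sound image angle relative to the user's front into the
    absolute coordinate system fixed to the real space.  A leftward head
    turn gives a negative head orientation (left of the user is negative)."""
    return head_orientation_deg + relative_deg

F10 = 45.0              # relative angle defined by the key frame [0]
head_turn_left = 80.0   # assumed amount of the head movement F11
F12 = relative_to_absolute(F10, -head_turn_left)
assert F12 == -35.0     # the sound image jumps to the left of the user
```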

[0126] Furthermore, in a case where sound image animation is set at the time of utterance of the lines B, for example, in a case where sound image animation for such lines B as described hereinabove with reference to FIG. 8 is set, sound image animation is executed in which the virtual character 20 moves from the angle F10 at the relative position (angle F12 at the absolute position) to the relative position defined by the key frame [1] as viewed in a left upper view in FIG. 9.

[0127] In this manner, in a case where the creator of the sound image animation intends that the lines B are to be emitted from the direction of right +45 degrees of the user A irrespective of the direction of the face of the user A, such processes as described above are executed. In other words, the creator of sound image animation can create a program such that a sound image is localized at an intended position specified as a relative position.

[0128] On the other hand, in a case where it is desired to provide the user A with such a recognition that the lines B are emitted while the virtual character 20 does not move from the ending spot of the lines A, in other words, in a case where it is desired to provide the user A with such a recognition that the lines B are emitted in a state in which the virtual character 20 is fixed (not moved) in the real space, a process following up a movement of the head of the user A is performed as described with reference to FIG. 10.

[0129] It is assumed that, as depicted in a left upper view in FIG. 10, the lines B are started when the head of the user A moves in the leftward direction by the angle F11 from a state in which the sound image is positioned at the angle F10 (+45 degrees) with respect to the user A at the time of the end of the lines A. During a period after the time of the end of the lines A to the time of the start of the lines B (while the voice changes over from the lines A to the lines B), the movement of the head of the user A is detected and the amount and the direction of the movement are detected. It is to be noted that, also during utterance of the lines A and the lines B, the amount of movement of the user A is detected.

[0130] Upon starting of the utterance of the lines B, the position of the sound image of the virtual character 20 is set on the basis of the amount of movement of the user A and the information of the key frame [0] at that point of time. Referring to a right upper view in FIG. 10, in a case where the user A changes its orientation by the angle F11, such setting of the sound image that the virtual character 20 is at the position of an angle F13 by a relative position is performed. The angle F13 is the sum of the angle that cancels the angle F11, which is the amount of movement of the user A, and the angle defined by the key frame [0].

[0131] Referring to a right lower view in FIG. 10, the virtual character 20 is at the position of the angle F10 in the real space (absolute coordinate system). This angle F10 is the same position as that at the point of time of the end of the lines A depicted in the left lower view in FIG. 10, as a result of the addition of the value for cancelling the amount of movement of the user A. In this case, the relationship angle F13 - angle F11 = angle F10 is satisfied.

[0132] By detecting the amount of movement of the user A and performing a process for cancelling the amount of movement in this manner, such a sense that the virtual character 20 is fixed in the real space can be provided to the user A. It is to be noted that, although details are hereinafter described, in a case where it is desired that the end position of the lines A become the start position of the lines B in this manner, the key frame [0] at time t=0 of the lines B is defined as key frame [0] {t=0, (end position of lines A)} as depicted in FIG. 10.
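The cancellation process can likewise be sketched as angle addition: the head turn is added back to the key frame angle so that the absolute position of the sound image does not change. The 80-degree head turn below is an assumed figure for illustration:

```python
def cancel_head_movement(keyframe_angle_deg, head_turn_left_deg):
    """Relative localization angle F13 that keeps the sound image fixed
    in the real space: the angle cancelling the leftward head turn F11
    is added to the key frame angle F10."""
    return keyframe_angle_deg + head_turn_left_deg

F10, F11 = 45.0, 80.0   # key frame [0] angle and assumed head turn
F13 = cancel_head_movement(F10, F11)
assert F13 == 125.0
assert F13 - F11 == F10   # the relationship angle F13 - angle F11 = angle F10
```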

[0133] In a case where a key frame is not set after time t=0 at the time of the start of the lines B, the virtual character 20 continues the utterance of the lines B at the position at the point of the start of the lines B.

[0134] In a case where a key frame is set after time t=0 at the time of the start of the lines B, in other words, in a case where sound image animation is set at the time of utterance of the lines B (for example, in a case where sound image animation same as that for the lines B described hereinabove with reference to FIG. 8 is set), sound image animation is executed in which the virtual character 20 moves from the angle F13 at the relative position (angle F10 at the absolute position) to the relative position defined in the key frame [1] as depicted in a left upper view in FIG. 10.

[0135] In a case where the creator of the sound image animation intends that the position of the virtual character 20 in the real space be fixed and the lines B be emitted irrespective of the orientation of the face of the user A, such processes as described above are performed. In other words, the creator of the sound image animation can create a program such that the sound image is localized at an intended position specified as an absolute position.

[0136]

[0137] Here, content is described. FIG. 11 is a view depicting a configuration of content.

[0138] Content includes a plurality of scenes. Although FIG. 11 depicts the content as including only one scene for the convenience of description, a plurality of scenes are prepared for each piece of content.

[0139] When a predetermined ignition condition is satisfied, a scene is started. The scene is a series of processing flows that occupy the time of the user. One scene includes one or more nodes. The scene depicted in FIG. 11 is an example that includes four nodes N1 to N4. A node is a minimum execution processing unit.

[0140] If a predetermined ignition condition is satisfied, then processing by the node N1 is started. For example, the node N1 is a node that performs a process for emitting the lines A. After the node N1 is executed, transition conditions are set, and depending upon the satisfied condition, the processing advances to the node N2 or the node N3. For example, in a case where the transition condition is that the user turns to the right and this condition is satisfied, the processing transits to the node N2; in a case where the transition condition is that the user turns to the left and this condition is satisfied, the processing transits to the node N3.

[0141] For example, the node N2 is a node for performing a process for emitting the lines B, and the node N3 is a node for performing a process for emitting the lines C. In this case, after the lines A are emitted by the node N1, a state of waiting for an instruction from the user (waiting until the user satisfies a transition condition) is entered, and when an instruction from the user is obtained, a process by the node N2 or the node N3 is executed on the basis of the instruction. When a node changes over in this manner, changeover of lines (voice) occurs.

[0142] After the process by the node N2 or the node N3 ends, the processing transits to the node N4 and a process by the node N4 is executed. In this manner, a scene is executed while the node changes over successively.
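The successive changeover of nodes within a scene can be sketched as follows; the Node structure, the event names and the lines assigned to the node N4 are illustrative assumptions, not part of the disclosure:

```python
class Node:
    """Minimum execution processing unit of a scene (FIG. 11)."""
    def __init__(self, node_id, lines, branches=None):
        self.id = node_id
        self.lines = lines
        self.branches = branches or {}   # transition condition -> next node id

def run_scene(nodes, start_id, get_user_event):
    """Execute nodes one after another, branching on satisfied transition
    conditions (e.g. the user turning to the right or to the left)."""
    emitted = []
    node_id = start_id
    while node_id is not None:
        node = nodes[node_id]
        emitted.append(node.lines)       # e.g. reproduce the voice of the node
        node_id = node.branches.get(get_user_event()) if node.branches else None
    return emitted

# The scene of FIG. 11: N1 emits lines A, then branches to N2 or N3,
# both of which transit to N4 ("lines D" is an assumed name).
nodes = {
    "N1": Node("N1", "lines A", {"turn_right": "N2", "turn_left": "N3"}),
    "N2": Node("N2", "lines B", {"done": "N4"}),
    "N3": Node("N3", "lines C", {"done": "N4"}),
    "N4": Node("N4", "lines D"),
}
events = iter(["turn_right", "done"])
assert run_scene(nodes, "N1", lambda: next(events)) == ["lines A", "lines B", "lines D"]
```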

[0143] A node has an element as an execution factor in the inside thereof, and for the element, for example, “voice is reproduced,” “a flag is set” and “a program is controlled (ended or the like)” are prepared.

[0144] Here, description is given taking an element that generates voice as an example.

[0145] FIG. 12 is a view illustrating a setting method of a parameter or the like configuring a node. In the node (Node), “id,” “type,” “element” and “branch” are set as the parameters.

[0146] “id” is an identifier allocated for identifying the node and is information to which “string” is set as a data type. In a case where the data type is “string,” this indicates that the type of the parameter is a character string type.
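A hypothetical representation of one node with the four parameters of FIG. 12 may look as follows; only “id” and its “string” data type are specified in the text above, so the remaining values are placeholders for illustration:

```python
node = {
    "id": "N1",                                          # string: identifier of the node
    "type": "voice",                                     # assumed: kind of the node
    "element": {"play": "lines_A.wav"},                  # assumed: execution factor
    "branch": {"turn_right": "N2", "turn_left": "N3"},   # assumed: transition conditions
}
assert isinstance(node["id"], str)
```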

……
……
……
