Sony Patent | Information Processing Apparatus, Information Processing Method, And Program
Patent: Information Processing Apparatus, Information Processing Method, And Program
Publication Number: 20200279110
Publication Date: 20200903
Applicants: Sony
Abstract
[Problem] To provide an information processing apparatus, an information processing method, and a program capable of preventing a target object outside a field of view from being overlooked. [Solution] An information processing apparatus includes: a control unit that extracts a target object and a basic point object from an image corresponding to a user’s field of view, stores basic point object information on the basic point object in a storage unit, determines whether the target object is included in an image corresponding to a current field of view when the user is guided to the target object, and performs a process of presenting a position of the target object using the stored basic point object information when the target object is not included in an image corresponding to the current field of view.
FIELD
[0001] The present disclosure relates to an information processing apparatus, an information processing method, and a program.
BACKGROUND
[0002] Conventionally, technologies have been proposed for using an information processing apparatus to appropriately present information that guides or leads a user.
[0003] For example, Patent Literature 1 below discloses an advertisement presentation server that detects the line-of-sight direction of a customer in a store, identifies the product being gazed at from the line-of-sight direction, determines attributes of the customer in the store, acquires information on the product based on both determination results, and reads the corresponding content and plays it using a signage apparatus.
[0004] In addition, Patent Literature 2 below discloses a head-mounted display system that filters augmented reality (AR) objects superimposed and displayed in real space according to priorities such as mode, preference, and a proximity level.
CITATION LIST
Patent Literature
[0005] Patent Literature 1: JP 2016-38877 A
[0006] Patent Literature 2: JP 2016-507833 A
SUMMARY
Technical Problem
[0007] However, the conventional technology is effective when an object that the user is likely to be interested in enters the user’s line-of-sight direction, but it is difficult for the system side to notify the user of an object that does not enter the user’s line-of-sight direction. As a result, the user may overlook an object that the user is likely to be interested in.
[0008] Therefore, the present disclosure proposes an information processing apparatus, an information processing method, and a program capable of preventing a target object outside a field of view from being overlooked.
Solution to Problem
[0009] According to the present disclosure, an information processing apparatus is provided that includes: a control unit that extracts a target object and a basic point object from an image corresponding to a user’s field of view, stores basic point object information on the basic point object in a storage unit, determines whether the target object is included in an image corresponding to a current field of view when the user is guided to the target object, and performs a process of presenting a position of the target object using the stored basic point object information when the target object is not included in an image corresponding to the current field of view.
[0010] According to the present disclosure, an information processing method is provided that includes: extracting, by a processor, a target object and a basic point object from an image corresponding to a user’s field of view; storing, by the processor, basic point object information on the basic point object in a storage unit; judging, by the processor, whether the target object is included in an image corresponding to a current field of view when guiding the user to the target object; and performing, by the processor, a process of presenting the position of the target object using the stored basic point object information when the target object is not included in the image corresponding to the current field of view.
[0011] According to the present disclosure, a program is provided that causes a computer to function as a control unit that extracts a target object and a basic point object from an image corresponding to a user’s field of view, stores basic point object information on the basic point object in a storage unit, judges whether the target object is included in an image corresponding to a current field of view when guiding the user to the target object, and performs a process of presenting the position of the target object using the stored basic point object information when the target object is not included in the image corresponding to the current field of view.
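The extract, store, judge, and present steps of the method can be illustrated with a toy sketch. It assumes, purely for illustration, that object extraction has already reduced each field-of-view image to a set of object labels, that the storage unit is a plain dictionary, and that the stored basic point object information is a hypothetical label-plus-relation record. It is not the disclosed implementation.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class BasicPointInfo:
    """Hypothetical record kept in the storage unit for one basic point object."""
    label: str      # the basic point object, e.g. "red signboard"
    relation: str   # where the target lies relative to it, e.g. "left of"


def register(frame: Set[str], target: str, basic_point: str, relation: str,
             storage: Dict[str, BasicPointInfo]) -> bool:
    """Store basic point object information when both the target object and the
    basic point object were extracted from the field-of-view image."""
    if target in frame and basic_point in frame:
        storage[target] = BasicPointInfo(basic_point, relation)
        return True
    return False


def guide(current_frame: Set[str], target: str,
          storage: Dict[str, BasicPointInfo]) -> str:
    """Judge whether the target is in the current field of view; if it is not,
    present its position using the stored basic point object information."""
    if target in current_frame:
        return f"{target} is in view"
    info = storage.get(target)
    if info is None:
        return f"{target} is out of view and no basic point object is stored"
    return f"{target} is {info.relation} the {info.label}"
```

For example, after `register({"shelf", "red signboard", "milk"}, "milk", "red signboard", "left of", storage)`, calling `guide({"cashier"}, "milk", storage)` returns `"milk is left of the red signboard"`. A real system would present this through AR display or voice rather than a string.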
Advantageous Effects of Invention
[0012] As described above, according to the present disclosure, it is possible to prevent the target object outside the field of view from being overlooked.
[0013] Note that the above effects are not necessarily limiting, and, along with or instead of the above effects, any of the effects described in the present specification or other effects that can be understood from the present specification may be exhibited.
BRIEF DESCRIPTION OF DRAWINGS
[0014] FIG. 1 is a diagram for describing an outline of an information processing terminal used in an information processing system according to an embodiment of the present disclosure.
[0015] FIG. 2 is a block diagram illustrating a configuration example of an information processing system according to the present embodiment.
[0016] FIG. 3 is a block diagram illustrating a configuration example of an information processing terminal according to the present embodiment.
[0017] FIG. 4 is a block diagram illustrating a configuration example of an information processing server according to the present embodiment.
[0018] FIG. 5 is a block diagram illustrating a configuration example of a response information generation unit of an information processing server according to a first embodiment.
[0019] FIG. 6 is a diagram for explaining an example of an information processing terminal according to a first embodiment.
[0020] FIG. 7 is a flowchart illustrating an operation process of collecting request information according to the first embodiment.
[0021] FIG. 8 is a flowchart illustrating an operation process of notifying request information according to the first embodiment.
[0022] FIG. 9 is a diagram illustrating an example of displaying a purchase request item as an AR image according to the first embodiment.
[0023] FIG. 10 is a diagram illustrating an example of presenting the request information according to the first embodiment on a smartphone screen.
[0024] FIG. 11 is a block diagram illustrating a configuration example of a response information generation unit of an information processing server according to a second embodiment.
[0025] FIG. 12 is a flowchart illustrating a process of registering a target object and a basic point object according to the second embodiment.
[0026] FIG. 13 is a diagram for explaining a situation of a user according to the second embodiment.
[0027] FIG. 14 is a diagram illustrating an example of extracting the target object and the basic point object from an image corresponding to a field of view of a user according to the second embodiment.
[0028] FIG. 15 is a flowchart illustrating a process of guidance to the target object according to the second embodiment.
[0029] FIG. 16 is a diagram illustrating an example of marking the target object in AR according to the second embodiment.
[0030] FIG. 17 is a block diagram illustrating a hardware configuration example of the information processing terminal and the information processing server according to the embodiment of the present disclosure.
DESCRIPTION OF EMBODIMENTS
[0031] Hereinafter, preferred embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are denoted by the same reference numerals, and duplicate description thereof is omitted.
[0032] In addition, the description will be made in the following order.
[0033] 1. Overview
[0034] 2. Configuration
[0035] 2-1. System Configuration Example
[0036] 2-2. Configuration of Information Processing Terminal 1
[0037] 2-3. Configuration of Information Processing Server 2
[0038] 3. First Embodiment (Product Purchase Request)
[0039] 3-1. Configuration
[0040] 3-2. Operation Process
[0041] (3-2-1. Process of Collecting Request Information)
[0042] (3-2-2. Process of Notifying Request Information)
[0043] 3-3. Effect
[0044] 4. Second Embodiment (Guidance to Target Object)
[0045] 4-1. Configuration
[0046] 4-2. Operation Process
[0047] (4-2-1. Process of Registering Target Object and Basic Point Object)
[0048] (4-2-2. Process of Guidance to Target Object)
[0049] 4-3. Effect
[0050] 5. Hardware Configuration
[0051] 6. Summary
1. Overview
[0052] FIG. 1 is a diagram illustrating an overview of an information processing terminal 1 used in an information processing system according to an embodiment of the present disclosure. As illustrated in FIG. 1, the information processing terminal 1 used in the information processing system according to the present embodiment is realized by, for example, a glasses-type head mounted display (HMD) worn on the head of a user U. A display unit 13, which corresponds to the spectacle lens portion located in front of the eyes of the user U when the terminal is worn, may be transmissive or non-transmissive. The information processing terminal 1 can present a virtual object in the field of view of the user U by displaying the virtual object on the display unit 13. Further, the HMD, which is an example of the information processing terminal 1, is not limited to one that presents an image to both eyes, and may present an image to only one eye. For example, the HMD may be a one-eye type provided with a display unit 13 that presents an image to one eye.
[0053] In addition, the information processing terminal 1 is provided with an outward camera 110 that, when the terminal is worn, captures the line-of-sight direction of the user U, that is, the field of view of the user U. Further, although not illustrated in FIG. 1, the information processing terminal 1 is provided with various sensors such as an inward camera that captures the eyes of the user U when the terminal is worn and a microphone (hereinafter referred to as “mike”). A plurality of outward cameras 110 and inward cameras may be provided.
[0054] Further, the shape of the information processing terminal 1 is not limited to the example illustrated in FIG. 1. For example, the information processing terminal 1 may be a headband-type HMD (worn with a band that goes around the entire circumference of the head, possibly together with a band passing over the crown of the head as well as the temporal regions) or a helmet-type HMD (in which the visor portion of the helmet corresponds to the display). In addition, the information processing terminal 1 may be realized by a wearable device such as a wristband type (for example, a smart watch, with or without a display), a headphone type (without a display), or a neckphone type (with or without a neck-type display).
[0055] Here, for example, when the display unit 13 is a transmissive type, the information processing terminal 1 can perform display control to place a virtual object in the real space based on information on the real space (for example, an image corresponding to the user’s field of view) captured by the outward camera 110.
[0056] However, although such display control that places a virtual object in the real space can make the user aware of an object within the field of view of the user U, it is difficult to make the user aware of an object outside the field of view of the user U.
[0057] Therefore, the information processing system according to the present embodiment prevents a target object outside the field of view from being overlooked by guiding the user to the target object using a basic point object that easily attracts the user’s visual attention.
[0058] The target object outside the field of view is assumed to be a real object that the user is likely to be interested in, a predetermined real object that should be notified to the user, or the like.
[0059] Further, the information processing terminal 1 according to the present embodiment can, for example, notify the user of a product purchase request from another user. For example, when the information processing terminal 1 is realized by a glasses-type HMD that the user wears every day, convenience can be further enhanced by allowing family members or others at home or elsewhere to make a purchase request at an appropriate time while the user is out.
[0060] Hereinafter, in the present specification, first, a basic configuration of an information processing system according to the present embodiment will be described, and then each function of the information processing system according to the present embodiment will be described in detail with reference to examples.
2. Configuration
2-1. System Configuration Example
[0061] Next, a configuration example of the information processing system according to the present embodiment will be described. FIG. 2 is a block diagram illustrating a configuration example of the information processing system according to the present embodiment. Referring to FIG. 2, the information processing system according to the present embodiment includes an information processing terminal 1 and an information processing server 2, which are connected via a network 3 so that they can communicate with each other.
[0062] (Information Processing Terminal 1)
[0063] The information processing terminal 1 according to the present embodiment is an information processing apparatus having a function of guiding a user to a target object based on control by the information processing server 2. Further, the information processing terminal 1 according to the present embodiment may have a function of collecting various information on user behavior.
[0064] (Information Processing Server 2)
[0065] The information processing server 2 according to the present embodiment is an information processing apparatus having a function of controlling guidance to a target object by the information processing terminal 1. Specifically, for example, the information processing server 2 has an agent function of interacting with the user, and can guide the user to a target object as one form of information presentation by the agent. The agent function assists the user through natural language, and is sometimes called a digital assistant function, an artificial intelligence (AI) assistant, an intelligent personal assistant, or the like.
[0066] (Network 3)
[0067] The network 3 has a function of connecting the information processing terminal 1 and the information processing server 2. The network 3 may include a public line network such as the Internet, a telephone line network, a satellite communication network, various local area networks (LANs) including Ethernet (registered trademark), a wide area network (WAN), and the like. The network 3 may include dedicated line networks such as an internet protocol-virtual private network (IP-VPN). The network 3 may include wireless communication networks such as Wi-Fi (registered trademark) or Bluetooth (registered trademark).
[0068] The system configuration example of the information processing system according to the present embodiment has been described above. Note that the above-described configuration described with reference to FIG. 2 is merely an example, and the configuration of the information processing system according to the present embodiment is not limited to the example. For example, the functions of the information processing terminal 1 and the information processing server 2 according to the present embodiment may be realized by a single device. The configuration of the information processing system according to the present embodiment can be flexibly modified according to specifications and operations.
2-2. Configuration of Information Processing Terminal 1
[0069] FIG. 3 is a block diagram illustrating a configuration example of the information processing terminal 1 according to the present embodiment. As illustrated in FIG. 3, the information processing terminal 1 includes a sensor unit 11, a control unit 12, a display unit 13, a speaker 14, a communication unit 15, an operation input unit 16, and a storage unit 17.
[0070] (Sensor Unit 11)
[0071] The sensor unit 11 has a function of acquiring various types of information on the user or the surrounding environment. For example, the sensor unit 11 includes an outward camera 110, an inward camera 111, a mike 112, a gyro sensor 113, an acceleration sensor 114, an orientation sensor 115, a location positioning unit 116, and a biometric sensor 117. Note that these sensors are merely specific examples of the sensor unit 11, and the present embodiment is not limited thereto. In addition, a plurality of each sensor may be provided.
[0072] Further, the specific examples of the sensor unit 11 illustrated in FIG. 3 are given as preferable examples, but it is not essential to have all of them. For example, the configuration may include only some of them, such as the outward camera 110, the acceleration sensor 114, and the location positioning unit 116, or may include another sensor.
[0073] The outward camera 110 and the inward camera 111 each include a lens system that includes an imaging lens, an aperture, a zoom lens, a focus lens, and the like, a drive system that causes the lens system to perform a focus operation and a zoom operation, a solid-state image device array that photoelectrically converts imaging light obtained by the lens system to generate an imaging signal, and the like. The solid-state image device array may be realized by, for example, a charge coupled device (CCD) sensor array or a complementary metal oxide semiconductor (CMOS) sensor array.
[0074] In the present embodiment, it is preferable that the outward camera 110 is set with an angle of view and an orientation so as to capture an area corresponding to a field of view of a user in a real space.
[0075] The mike 112 collects the user’s voice and surrounding environmental sounds and outputs them to the control unit 12 as voice data.
[0076] The gyro sensor 113 is realized by, for example, a three-axis gyro sensor, and detects an angular velocity (rotational speed).
[0077] The acceleration sensor 114 is realized by, for example, a three-axis acceleration sensor (also referred to as a G sensor), and detects acceleration during movement.
[0078] The orientation sensor 115 is realized by, for example, a three-axis geomagnetic sensor (compass), and detects an absolute direction (azimuth).
[0079] The location positioning unit 116 has a function of detecting the current position of the information processing terminal 1 based on a signal acquired from the outside. Specifically, for example, the location positioning unit 116 is realized by a global positioning system (GPS) positioning unit, receives radio waves from GPS satellites, detects the location of the information processing terminal 1, and outputs the detected location information to the control unit 12. In addition to GPS, the location positioning unit 116 may detect the location by transmitting and receiving signals via, for example, Wi-Fi (registered trademark), Bluetooth (registered trademark), a mobile phone, a PHS, a smartphone, or the like, or by near field communication.
[0080] The biometric sensor 117 detects biometric information on the user. Specifically, for example, a heart rate, a body temperature, sweating, a blood pressure, a pulse, breathing, blinking, an eye movement, a gaze time, a pupil size, a brain wave, a body movement, a body position, a skin temperature, a skin electrical resistance, microvibration (MV), a myoelectric potential, SpO2 (blood oxygen saturation), and the like can be detected.
[0081] (Control Unit 12)
[0082] The control unit 12 functions as an arithmetic processing device and a control device, and controls the overall operation of the information processing terminal 1 according to various programs. The control unit 12 is realized by an electronic circuit such as a central processing unit (CPU) or a microprocessor, for example. The control unit 12 may include a read only memory (ROM) that stores programs to be used, calculation parameters, and the like, and a random access memory (RAM) that temporarily stores parameters varying as appropriate.
[0083] The control unit 12 according to the present embodiment controls, for example, starting and stopping of each component. Further, the control unit 12 can input a control signal generated by the information processing server 2 to the display unit 13 or the speaker 14.
[0084] Further, the control unit 12 according to the present embodiment may function as a recognition unit 120, a response information acquisition unit 121, and an output control unit 122, as illustrated in FIG. 3.
[0085] Recognition Unit 120
[0086] The recognition unit 120 has a function of recognizing (including detecting) information on the user or information on the surrounding situation using various sensor information sensed by the sensor unit 11. For example, the recognition unit 120 can perform voice recognition based on the user’s utterance sensed by the sensor unit 11, and can recognize a request from the user and the user’s response. The recognition unit 120 can also recognize the user’s behavior from the image and voice sensed by the sensor unit 11, position information, motion information, and the like. The recognition unit 120 outputs the recognition result to the response information acquisition unit 121.
[0087] Note that the recognition processing performed by the recognition unit 120 according to the present embodiment may be simple, and advanced recognition processing may be performed by an external device, for example, the information processing server 2. That is, by appropriately sharing the processing between the recognition unit 120 of the information processing terminal 1 and a recognition unit 201 of the information processing server 2, the distributed processing makes it possible to reduce the processing burden, improve real-time performance, and ensure security. Alternatively, the information processing terminal 1 may not include the recognition unit 120, and all recognition processes may be performed by an external device, for example, the information processing server 2. Alternatively, the recognition unit 120 according to the present embodiment may have a function equivalent to that of the recognition unit 201 of the information processing server 2 described later.
[0088] Response Information Acquisition Unit 121
[0089] Based on the recognition result by the recognition unit 120, the response information acquisition unit 121 acquires information to be presented to the user (herein referred to as response information) and outputs the information to the output control unit 122. The response information includes a wide variety of output information, such as an answer to the user’s request, guidance information corresponding to the user’s behavior, a notification of a predetermined target object, a reaction to the user’s murmuring, and dialogue with the user according to the situation. The response information may be, for example, voice data or image data (a still image, a moving image, or a virtual object, also referred to as an AR image).
[0090] The response information may be acquired from the storage unit 17 or may be acquired from the information processing server 2 via the communication unit 15. For example, the response information acquisition unit 121 may transmit the recognition result by the recognition unit 120 from the communication unit 15 to the information processing server 2 and acquire response information generated based on the recognition result in the information processing server 2.
[0091] Further, the response information acquisition unit 121 is not limited to the case based on the recognition result by the recognition unit 120, and may acquire the response information based on various sensor information sensed by the sensor unit 11. For example, the response information acquisition unit 121 may transmit various sensor information sensed by the sensor unit 11 from the communication unit 15 to the information processing server 2, and acquire response information generated based on recognition processing based on the various sensor information performed in the information processing server 2.
[0092] Alternatively, the response information acquisition unit 121 may acquire response information based on the recognition result and various sensor information. For example, the response information acquisition unit 121 may transmit a recognition result and various sensor information from the communication unit 15 to the information processing server 2 and acquire the response information generated based on the recognition result and the various sensor information in the information processing server 2.
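The three acquisition paths (from the storage unit 17, or from the server based on the recognition result, the sensor information, or both) can be sketched as follows. The lookup key `"request"`, the local-first policy, and the `request_server` callable standing in for the communication unit 15 are all hypothetical choices made for this illustration.

```python
from typing import Callable, Optional


def acquire_response(recognition: Optional[dict],
                     sensor_info: Optional[dict],
                     local_store: dict,
                     request_server: Callable[[dict], str]) -> str:
    """Return response information for the output control unit.

    Hypothetical policy: use a response held locally (standing in for the
    storage unit 17) when the recognition result maps to one; otherwise send
    whatever is available (recognition result, sensor information, or both)
    to the server and return the response it generates.
    """
    if recognition is not None:
        cached = local_store.get(recognition.get("request"))
        if cached is not None:
            return cached
    payload = {}
    if recognition is not None:
        payload["recognition"] = recognition
    if sensor_info is not None:
        payload["sensor_info"] = sensor_info
    if not payload:
        raise ValueError("need a recognition result or sensor information")
    return request_server(payload)
```

Checking local storage first is one way to trade server round trips for latency; the disclosure leaves the choice of source open.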
[0093] Output Control Unit 122
[0094] The output control unit 122 performs control to output various types of information from the display unit 13 or the speaker 14. The output control unit 122 according to the present embodiment performs control to output, for example, the response information acquired by the response information acquisition unit 121 by voice, by display, or by both. For example, when the response information is voice data, the output control unit 122 controls voice output from the speaker 14, and when the response information is a virtual object, it executes display control of the display unit 13 so that the virtual object is displayed within the field of view of the user.
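The routing described above (voice data to the speaker, virtual objects to the display, or both) might look like the sketch below. The `VoiceData` and `VirtualObject` types, the normalized display coordinates, and clamping as the way of keeping an object within the field of view are all assumptions for illustration; lists stand in for the speaker 14 and the display unit 13.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class VoiceData:
    text: str  # stand-in for synthesized audio (hypothetical)


@dataclass
class VirtualObject:
    label: str
    # Normalized display coordinates: (0, 0) and (1, 1) are opposite corners
    # of the display area (hypothetical convention).
    x: float
    y: float


def clamp01(v: float) -> float:
    return min(max(v, 0.0), 1.0)


def output_response(items: Sequence[Union[VoiceData, VirtualObject]],
                    speaker: List[str], display: List[VirtualObject]) -> None:
    """Route each piece of response information to the speaker or the display;
    passing both kinds in one call outputs by voice and display together."""
    for info in items:
        if isinstance(info, VoiceData):
            speaker.append(info.text)
        elif isinstance(info, VirtualObject):
            # Keep the object inside the displayable part of the field of view.
            display.append(VirtualObject(info.label, clamp01(info.x), clamp01(info.y)))
        else:
            raise TypeError(f"unsupported response information: {type(info).__name__}")
```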
[0095] (Display Unit 13)
[0096] The display unit 13 is realized by, for example, a lens unit (an example of a transmissive display unit) that performs display using, for example, a hologram optical technique, a liquid crystal display (LCD) device, an organic light emitting diode (OLED) device, and the like. In addition, the display unit 13 may be a transmissive type, a transflective type, or a non-transmissive type.
[0097] (Speaker 14)
[0098] The speaker 14 plays a voice signal according to the control of the control unit 12.
[0099] (Communication Unit 15)
[0100] The communication unit 15 is a communication module for transmitting and receiving data to and from other devices in a wired or wireless manner. The communication unit 15 communicates with external devices directly or via a network access point by, for example, a wired local area network (LAN), a wireless LAN, wireless fidelity (Wi-Fi (registered trademark)), infrared communication, Bluetooth (registered trademark), near field/non-contact communication, or a mobile communication network (long term evolution (LTE) or a third-generation (3G) mobile communication system).
[0101] (Operation Input Unit 16)
[0102] The operation input unit 16 is realized by an operation member having a physical structure such as a switch, a button, or a lever.
[0103] (Storage Unit 17)
[0104] The storage unit 17 is realized by a read only memory (ROM) that stores programs and calculation parameters used for the processing of the control unit 12 described above, and a random access memory (RAM) that temporarily stores parameters varying as appropriate. For example, the various sensor information, the recognition results, the response information, the user information, and the like may be stored in the storage unit 17 according to the present embodiment.
[0105] The configuration of the information processing terminal 1 according to the present embodiment has been specifically described above. Note that the above-described configuration described with reference to FIG. 3 is merely an example, and the functional configuration of the information processing terminal 1 according to the present embodiment is not limited to this example. For example, the information processing terminal 1 according to the present embodiment does not necessarily have all of the configurations illustrated in FIG. 3. The information processing terminal 1 can be configured not to include the mike 112, the biometric sensor 117, and the like. In addition, the information processing terminal 1 may be configured by a plurality of communication-connected devices (a wearable device separately worn by a user, a device attached to glasses, and the like). Further, for example, at least a part of the functions of the control unit 12 of the information processing terminal 1 may exist in another device connected via the communication unit 15. The functional configuration of the information processing terminal 1 according to the present embodiment can be flexibly modified according to specifications and operations.
2-3. Configuration of Information Processing Server 2
[0106] FIG. 4 is a block diagram illustrating an example of the configuration of the information processing server 2 according to the present embodiment. As illustrated in FIG. 4, the information processing server 2 (an example of an information processing apparatus) includes a control unit 20, a communication unit 21, and a storage unit 22.
[0107] (Control Unit 20)
[0108] The control unit 20 functions as an arithmetic processing device and a control device, and controls the overall operation of the information processing server 2 according to various programs. The control unit 20 is realized by an electronic circuit such as a central processing unit (CPU) or a microprocessor. In addition, the control unit 20 may include a read only memory (ROM) that stores programs to be used, calculation parameters, and the like, and a random access memory (RAM) that temporarily stores parameters varying as appropriate.
[0109] Further, the control unit 20 according to the present embodiment also functions as a recognition unit 201, a response information generation unit 202, a voice synthesis unit 203, and an output control unit 204, as illustrated in FIG. 4.
[0110] Recognition Unit 201
[0111] The recognition unit 201 has a function of recognizing (including detecting) information on a user or information on a surrounding situation based on various sensor information received from the information processing terminal 1.
[0112] For example, as recognition of information on the user, the recognition unit 201 can identify the user by comparing the user’s utterance or an image collected by the information processing terminal 1 with the voice characteristics or images of users stored in advance in a user information DB 221.
[0113] In addition, the recognition unit 201 can recognize the user’s behavior based on sound information, an image, and sensor information collected by the information processing terminal 1. For example, the recognition unit 201 can perform voice recognition based on the user’s utterance collected by the information processing terminal 1, and can recognize a user’s request, instruction, response, or the like. The recognition unit 201 can also recognize the user’s hobbies, preferences, schedule, or the like based on the user’s requests, instructions, responses, or the like. Further, for example, the recognition unit 201 can recognize the state of the user (running, walking, riding a train, eating, sleeping, and the like, that is, where the user is and what he/she is doing) based on the image and sensor information collected by the information processing terminal 1.
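The comparison against the user information DB 221 is not specified further. One common way to do such matching is sketched below, under the assumption that voice characteristics or face images have already been turned into fixed-length feature vectors; cosine similarity and the 0.8 acceptance threshold are hypothetical choices, not the disclosed method.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_user(query: Sequence[float],
                  user_db: Dict[str, Sequence[float]],
                  threshold: float = 0.8) -> Optional[str]:
    """Return the enrolled user whose stored feature vector is most similar to
    the query features, or None if no user reaches the threshold."""
    best_user, best_score = None, threshold
    for user_id, enrolled in user_db.items():
        score = cosine_similarity(query, enrolled)
        if score >= best_score:
            best_user, best_score = user_id, score
    return best_user
```

Returning `None` below the threshold lets an unknown speaker be treated as unidentified rather than forced onto the nearest enrolled user.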
[0114] Further, for example, the recognition unit 201 may recognize a position and posture of a user’s head (including an orientation or inclination of a face with respect to a body), a user’s line-of-sight, a user’s gazing point, and the like as the recognition related to the user. The recognition unit 201 may detect the user’s gazing point based on the user’s line-of-sight. For example, when the user’s line-of-sight stays in a certain range for a predetermined time or longer, the recognition unit 201 may detect a point (three-dimensional position) ahead of the user’s line-of-sight as the gazing point. Note that a method for detecting a user’s gazing point by the recognition unit 201 is not limited to this example, and the detection may be performed by various known methods.
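The dwell-based rule in [0114] can be sketched as follows. This is a minimal illustration under assumed inputs, not the disclosed implementation: it assumes the terminal already supplies timestamped 3D points ahead of the user’s line of sight, and the names `GazeSample` and `detect_gazing_point` as well as the `radius` and `min_duration` thresholds are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]

@dataclass
class GazeSample:
    t: float          # timestamp in seconds (assumed input)
    point: Point3D    # 3D point ahead of the user's line of sight (assumed input)

def detect_gazing_point(samples: List[GazeSample], radius: float = 0.05,
                        min_duration: float = 0.5) -> Optional[Point3D]:
    """Return a gazing point once the line of sight has stayed within `radius`
    of where it entered the range for at least `min_duration` seconds, else None."""
    start = 0
    for i, sample in enumerate(samples):
        # The line of sight left the current range: restart the dwell window here.
        if math.dist(sample.point, samples[start].point) > radius:
            start = i
        if sample.t - samples[start].t >= min_duration:
            window = samples[start:i + 1]
            # Use the mean position of the dwell window as the 3D gazing point.
            return tuple(sum(s.point[k] for s in window) / len(window) for k in range(3))
    return None
```

Taking the mean of the window is one simple choice; the disclosure leaves the detection method open, so a median or a fixation filter would serve equally well.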
[0115] Further, the recognition unit 201 may recognize a three-dimensional shape in the field of view of the user as information on the surrounding situation. For example, when the information processing terminal 1 is provided with a plurality of outward cameras 110, the recognition unit 201 may obtain a depth image (distance image) from parallax information and recognize the three-dimensional shape in the field of view of the user. In addition, even when the information processing terminal 1 has only one outward camera 110, the recognition unit 201 can recognize the three-dimensional shape in the field of view of the user from images acquired in time series.
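For the multi-camera case in [0115], a common way to turn parallax into a depth image, assuming a rectified stereo pair, is the pinhole relation Z = f * B / d (focal length f in pixels, baseline B in meters, disparity d in pixels). The sketch below applies it per pixel. The function name and parameters are hypothetical, and computing the disparity map itself (stereo matching) is assumed to happen elsewhere.

```python
import math
from typing import List

def disparity_to_depth(disparity: List[List[float]], focal_px: float,
                       baseline_m: float) -> List[List[float]]:
    """Convert a disparity map (pixels) from a rectified stereo pair into depth
    (meters) with Z = f * B / d. Pixels without a valid match (d <= 0) get inf."""
    return [[focal_px * baseline_m / d if d > 0 else math.inf for d in row]
            for row in disparity]
```

For example, with f = 800 px and B = 0.5 m, a disparity of 40 px corresponds to a depth of 10 m; larger disparities mean closer surfaces.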
[0116] Further, the recognition unit 201 may detect a real object (object) in the field of view of the user as the information on the surrounding situation. Specifically, the detection of the real object may be realized, for example, by detecting a boundary surface of the real object. In this specification, the “boundary surface” is used as an expression including, for example, a surface between a real object and another real object, or a surface between a space where no real object exists and the real object, and the like. In addition, the boundary surface may be a curved surface. The recognition unit 201 may detect the real object from the image acquired by the outward camera 110, or may detect the boundary surface of the real object based on the recognized three-dimensional shape in the field of view of the user. For example, when the three-dimensional shape in the field of view of the user is expressed as point cloud data, the recognition unit 201 can detect the boundary surface by clustering the point cloud data. Note that the method for detecting a boundary surface by the recognition unit 201 is not limited to this example, and the detection may be performed by various known methods.
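One simple way to realize the point-cloud clustering mentioned in [0116] is Euclidean (single-linkage) clustering; this is a swapped-in technique chosen for illustration, since the disclosure does not specify one. Each resulting cluster can be treated as a candidate real object, separated from its neighbors by empty space, that is, by a boundary surface in the sense of [0116]. The function name and the `eps` and `min_size` parameters are hypothetical.

```python
import math
from collections import deque
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

def euclidean_clusters(points: Sequence[Point], eps: float = 0.1,
                       min_size: int = 3) -> List[List[int]]:
    """Group point indices into clusters. Two points share a cluster when a chain
    of neighbors, each pair closer than `eps`, connects them. Clusters smaller
    than `min_size` are dropped as noise."""
    unvisited = set(range(len(points)))
    clusters: List[List[int]] = []
    while unvisited:
        seed = unvisited.pop()
        queue = deque([seed])
        cluster = [seed]
        while queue:
            i = queue.popleft()
            # Brute-force neighbor search; a k-d tree would scale better.
            neighbors = [j for j in unvisited if math.dist(points[i], points[j]) < eps]
            for j in neighbors:
                unvisited.remove(j)
                queue.append(j)
                cluster.append(j)
        if len(cluster) >= min_size:
            clusters.append(sorted(cluster))
    return clusters
```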
[0117] Further, the recognition unit 201 may perform object recognition on the detected real object. The algorithm for object recognition is not particularly limited; for example, general object recognition, which recognizes an object by extracting features from an input image and classifying them with a trained classifier, or specific object recognition, which extracts features from an input image and identifies the object by comparing the extracted features with a database generated in advance, may be used.
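As a rough illustration of the specific object recognition mentioned in [0117], the sketch below matches each extracted feature vector to its nearest entry in a prebuilt database and lets close matches vote for an object label. It assumes the features have already been extracted by some descriptor; the voting scheme, the function name, and `max_distance` are hypothetical choices, not the method of the disclosure.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Feature = Sequence[float]

def recognize_specific_object(query: List[Feature],
                              database: List[Tuple[str, Feature]],
                              max_distance: float = 0.5) -> Optional[str]:
    """Match each query feature to its nearest database feature and let matches
    closer than `max_distance` vote for their object label."""
    votes: Dict[str, int] = {}
    for q in query:
        label, ref = min(database, key=lambda entry: math.dist(q, entry[1]))
        if math.dist(q, ref) <= max_distance:
            votes[label] = votes.get(label, 0) + 1
    # The label with the most votes wins; no votes means the object is unknown.
    return max(votes, key=votes.get) if votes else None
```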
[0118] The various recognition processes performed by the recognition unit 201 have been described above, but at least a part of the recognition processes may be performed by the recognition unit 120 of the information processing terminal 1 or the external device. For example, the recognition unit 120 of the information processing terminal 1 may perform the recognition of the posture, line-of-sight, and gazing point of the user described above, and the recognition of the three-dimensional shape in the field of view of the user.
[0119] In addition, various recognition results recognized by the recognition unit 201 may be stored in the storage unit 22.
[0120] Response Information Generation Unit 202
[0121] The response information generation unit 202 generates, in real time, information to be presented to a user based on the information on the user recognized by the recognition unit 201 or on the situation around the user. As described above, the “response information” includes a wide variety of output information, such as an answer to a user’s request, guidance information corresponding to a user’s behavior, notification of a predetermined target object, a reaction to a user’s murmur, and dialogue with the user according to the situation.
[0122] In addition, when generating the response information, the response information generation unit 202 may use user information (user profile, behavior history, hobbies and preferences, schedule, and the like) accumulated in the storage unit 22, response generation information (fixed response phrases, answer sentence patterns, and the like corresponding to predetermined keywords), content (news, weather forecasts, moving images, music, games, and the like), and the like, or may use information acquired from an external device (such as another server) connected via the communication unit 21. Specific contents of the response information generated according to the present embodiment will be described in each example below.
[0123] Further, the response information generated by the response information generation unit 202 can be presented to the user by the information processing terminal 1 as a visual expression or an auditory expression. Specifically, a visual expression is information in a form such as text data, image data (still image or moving image), or an AR object, which is output using the display unit 13 of the information processing terminal 1. An auditory expression is voice data output using the speaker 14 of the information processing terminal 1; the artificial voice is synthesized by the voice synthesis unit 203 described below.
[0124] Voice Synthesis Unit 203
[0125] The voice synthesis unit 203 has a function of synthesizing artificial voice output from the information processing terminal 1. Specifically, the voice synthesis unit 203 synthesizes an artificial voice corresponding to the response information generated by the response information generation unit 202.
[0126] Output Control Unit 204
[0127] The output control unit 204 transmits various types of response information such as the synthesized artificial voice or the generated visual information to the information processing terminal 1 and controls the information processing terminal 1 to output the response information.
[0128] The control unit 20 according to the present embodiment has been described above. Note that the function of the control unit 20 according to the present embodiment is not limited to the example illustrated in FIG. 4, and the control of various devices (switch ON/OFF, operation control, and the like) or the use of Internet services (Internet shopping, accommodation, reservation of seats, and the like) can also be performed, for example, according to the recognized information on the user or the situation around the user.
[0129] (Communication Unit 21)
[0130] The communication unit 21 is connected to the network 3 in a wired or wireless manner, and transmits and receives data to and from the external device via the network 3. The communication unit 21 is communication-connected to the network 3 through, for example, a wired/wireless local area network (LAN) or wireless fidelity (Wi-Fi (registered trademark)). Specifically, the communication unit 21 according to the present embodiment receives sound information, image information, and sensor information from the information processing terminal 1. In addition, the communication unit 21 transmits the response information generated by the response information generation unit 202 or the artificial voice (voice data of response information) synthesized by the voice synthesis unit 203 to the information processing terminal 1 according to the control of the output control unit 204.
[0131] (Storage Unit 22)
[0132] The storage unit 22 is realized by a ROM that stores programs, calculation parameters, or the like used for the processing of the control unit 20, and a RAM that temporarily stores parameters varying as appropriate. For example, the storage unit 22 according to the present embodiment stores a user information database (DB) 221, a response generation information DB 222, and a content DB 223.
[0133] The user information DB 221 stores a user profile, a behavior history, a hobby preference, a schedule, and the like. These may be registered in advance, or may be automatically recognized and accumulated by the recognition unit 201 from the user’s behavior or dialogue. In addition, the response generation information DB 222 stores an algorithm or the like used when generating response information. For example, the response fixed phrase and the response sentence pattern corresponding to the predetermined keywords are stored.
[0134] In addition, the content DB 223 stores content such as news, weather forecast, moving image, music, game, and the like. Such content may be accumulated by periodically acquiring latest information from the outside by the communication unit 21.
[0135] The functional configuration example of the information processing server 2 according to the present embodiment has been described above. Note that the above-described functional configuration described with reference to FIG. 4 is merely an example, and the functional configuration of the information processing server 2 according to the present embodiment is not limited to this example. For example, the information processing server 2 does not necessarily have all of the configurations illustrated in FIG. 4. The recognition unit 201, the response information generation unit 202, the voice synthesis unit 203, the output control unit 204, and the storage unit 22 can be provided in another device different from the information processing server 2. The functional configuration of the information processing server 2 according to the present embodiment can be flexibly modified according to specifications or operations.
[0136] For example, at least a part of the recognition unit 201, the response information generation unit 202, the voice synthesis unit 203, the output control unit 204, and the storage unit 22 may be provided in an external device, or at least a part of each function of the control unit 20 may be realized by the information processing terminal 1 or by an information processing apparatus located at a relatively short communication distance from the information processing terminal 1 (for example, a so-called edge server). Distributing the components of the information processing server 2 appropriately in this way makes it possible to improve real-time performance, reduce the processing load, and further ensure security.
[0137] In addition, all of the components of the control unit 20 and the storage unit 22 illustrated in FIG. 4 may be provided in the information processing terminal 1, and the information processing system according to the present embodiment may be executed by an application on the information processing terminal 1.
3. First Embodiment
[0138] Next, an example of the functions of the information processing system according to the present embodiment will be described with reference to FIGS. 5 to 10.
[0139] The information processing system according to the first embodiment includes an information processing terminal 1a that collects product purchase request information from a first user (purchase requester), an information processing terminal 1b that presents the product purchase request information as appropriate to a second user (proxy purchaser), and an information processing server 2 that generates response information for each user. In the present embodiment, it is assumed, for example, that the first user and the second user are members of the same family and share an agent function provided by the system.
[0140] The information processing server 2 (virtual agent) can converse with each user via the information processing terminal 1a and the information processing terminal 1b. Although the form of the information processing terminal 1 is not particularly limited, it is assumed, for example, that the information processing terminal 1a is a device placed at home (such as a stationary dedicated device; see FIG. 6) and that the information processing terminal 1b is worn by the second user (such as a glasses-type HMD).
[0141] <3-1. Configuration>
[0142] The basic configuration of the information processing server 2 is as described with reference to FIG. 4; in this embodiment, the response information generation unit 202 of the control unit 20 in particular generates, as appropriate, response information for collecting product purchase request information and response information for making the purchase request. A configuration of a response information generation unit 202-1 according to the present embodiment, which generates response information for the users involved in such a request, will be described with reference to FIG. 5.
[0143] FIG. 5 is a block diagram illustrating a configuration example of the response information generation unit 202-1 of the information processing server 2 according to the present embodiment. As illustrated in FIG. 5, the response information generation unit 202-1 functions as a request information collection response generation unit 300, a request contents determination unit 301, a priority calculation unit 302, a stepwise notification determination unit 303, an abstraction level determination unit 304, and a request response generation unit 305.
[0144] (Determination Function of Request Contents)
[0145] The response information generation unit 202-1 collects and determines the request contents using the request information collection response generation unit 300 and the request contents determination unit 301.
[0146] Specifically, the request information collection response generation unit 300 generates, as appropriate, response information (question sentences) for collecting request information on a user’s product purchase, and presents the generated response information to the user through the information processing terminal 1. Specific examples of such question sentences will be described later; through interaction with the user, specific information related to the purchase, such as the item, quantity, price, and how to obtain it (where to purchase it, or whether to ask someone), is acquired.
[0147] The request contents determination unit 301 narrows the request content (in this case, the shopping content) down to a single content through the conversation with the user. In this embodiment, the agent keeps asking the user questions until the request contents determination unit 301 determines that the request content has been narrowed down to one, and when the necessary information is available, the request contents determination unit 301 confirms that the request content is correct by obtaining the user’s approval.
[0148] (Request Function)
[0149] In addition, the response information generation unit 202-1 makes a purchase request to a user through the priority calculation unit 302, the stepwise notification determination unit 303, the abstraction level determination unit 304, and the request response generation unit 305. When requesting a user to make a purchase, it is preferable to present the purchase request information step by step, starting from items with a high priority and a high abstraction level, in a natural conversation flow with the user. This makes it possible, for example, to avoid confusing the user, which could happen if detailed information were presented all at once from the beginning, and to avoid presenting excessive request information to the user when the request is refused.
[0150] Specifically, the priority calculation unit 302 calculates a priority for each item of the determined request contents. The priority calculation algorithm is not particularly limited; for example, an item that is estimated to have a large influence on the proxy purchaser’s decision whether or not to make the proxy purchase may be given a high priority. For example, information such as what to purchase, how much trouble the purchase requires, and whether the item is needed immediately is given a high priority. The priority calculation unit 302 may also calculate the priority in consideration of the current state of the proxy purchaser. For example, when the proxy purchaser is moving by bicycle or on foot, information on carrying, such as the number, size, and weight of the purchased items, is given a high priority. Further, when the travel time to the purchase location, estimated from the current location and transportation means of the proxy purchaser, exceeds a predetermined value, information on the purchase location is given a high priority. Further, the priority of the price of the purchased item may be calculated according to the amount of money the proxy purchaser has.
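The context-dependent rules in [0150] can be sketched as base scores that are raised according to the proxy purchaser’s state. This is a hypothetical rule set for illustration only: the item keys (“effort” for how much trouble the purchase requires, “urgency” for whether the item is needed immediately), the base scores, the `boost` value, and the 15-minute travel limit are all assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PurchaserState:
    transport: str          # "walking", "bicycle", "car", ...
    travel_minutes: float   # estimated travel time to the purchase location
    cash: float             # money the proxy purchaser currently has

# Hypothetical base priorities: what to buy, effort, and urgency rank high.
BASE_PRIORITY = {"product": 5, "effort": 4, "urgency": 4,
                 "carrying": 1, "purchase_location": 2, "price": 1}

def calculate_priorities(state: PurchaserState, expected_price: float,
                         travel_limit_min: float = 15.0, boost: int = 3) -> List[str]:
    """Return request items ordered from highest to lowest priority."""
    scores = dict(BASE_PRIORITY)
    if state.transport in ("walking", "bicycle"):
        scores["carrying"] += boost           # number, size, weight matter without a car
    if state.travel_minutes > travel_limit_min:
        scores["purchase_location"] += boost  # a distant store makes the location important
    if state.cash < expected_price:
        scores["price"] += boost              # the purchaser may not have enough money
    # sorted() is stable even with reverse=True, so ties keep the BASE_PRIORITY order.
    return sorted(scores, key=scores.get, reverse=True)
```

For a proxy purchaser on foot near the store with enough cash, this ordering puts the product first and lifts carrying information above the purchase location.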
[0151] As the items of the request contents, the following items can be considered, for example. The items listed below are merely examples and the present embodiment is not limited thereto; it is not always necessary to acquire information on all of the items below when collecting the request information from the purchase requester described above, and the request may include items other than those listed below.
[0152] Purchase request product: item, brand (seller/manufacturer), product name, product number, product image, size, weight, color, and the like
[0153] Purchase reason (purpose)
[0154] Quantity
[0155] Budget
[0156] Purchase location: store name, address (map), store image, inventory status, price, discount information, and the like
[0157] Desired time of purchase (sometime today, by midnight, by tomorrow, and the like)
[0158] Delivery address (family, friend, and the like)
[0159] Delivery method (when returning home, and the like)
[0160] Payment method (wallet shared by the family, and the like)
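The item list in [0151] to [0160] maps naturally onto a record whose fields may stay empty, since not every item has to be collected. The sketch below is an assumed representation, not a structure from the disclosure: the class name, field names, and the default set of required items are hypothetical, and a helper reports which required items are still missing.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

@dataclass
class PurchaseRequest:
    item: Optional[str] = None               # purchase request product (item, brand, name, ...)
    purpose: Optional[str] = None            # purchase reason
    quantity: Optional[int] = None
    budget: Optional[int] = None
    purchase_location: Optional[str] = None  # store name, address, inventory, ...
    desired_time: Optional[str] = None       # e.g. "by midnight"
    delivery_address: Optional[str] = None
    delivery_method: Optional[str] = None
    payment_method: Optional[str] = None

def missing_items(request: PurchaseRequest,
                  required: Sequence[str] = ("item", "quantity")) -> List[str]:
    """List the required items that have not been collected yet."""
    return [name for name in required if getattr(request, name) is None]
```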
[0161] The stepwise notification determination unit 303 determines, according to the environment of the user (proxy purchaser), whether to notify the request information stepwise, and determines the stepwise notification method when stepwise notification is performed. The user’s environment is information on the user recognized by the recognition unit 201 and includes, for example, the user’s situation (where the user is and what the user is doing, or the timing), the usage status of the output device (information processing terminal 1), its output characteristics, and the like. Various devices with different output characteristics are assumed as the information processing terminal 1, such as a device capable of presenting auditory information, a device capable of presenting visual information, a device capable of presenting both auditory and visual information, and a device capable of presenting a virtual object (AR display) as visual information. As the stepwise notification method, for example, stepwise output notification using visual and auditory information, stepwise output notification using only auditory information, stepwise output notification using only visual information, and the like are assumed.
[0162] For example, when the user is running while wearing a glasses-type HMD (an example of the information processing terminal 1), the stepwise notification determination unit 303 judges that stepwise notification is possible and determines a method of notifying the request information stepwise by voice and images. Likewise, when the user is driving while wearing the glasses-type HMD, it is judged that stepwise notification is possible, and a method is determined in which the request information is notified stepwise by voice while the user is driving and by images when the vehicle is stopped. On the other hand, when the user is operating a smartphone terminal, the request information is assumed to be presented as an image; in this case, since the user can understand the specific request information more easily when it is displayed all at once than when it is notified stepwise, it may be determined that stepwise notification is not performed.
[0163] The abstraction level determination unit 304 determines the abstraction level of the notification information at each step determined by the stepwise notification determination unit 303 according to the output means. For example, the abstraction level determination unit 304 sets a higher abstraction level for a step presented by voice than for a step presented by an image. When detailed information is presented by voice, the user is likely to be confused and to have difficulty remembering the contents, so it is preferable to present information with a high abstraction level in voice presentation. In image presentation, on the other hand, information on the purchase can easily be conveyed with text, diagrams, photographs, and the like, so it is preferable to present information with a low abstraction level (high concreteness).
[0164] The request response generation unit 305 generates response information (a response (utterance) sentence, an image, and the like) for notifying the user of the request information at the abstraction level determined by the abstraction level determination unit 304. The items of the request information to be notified may be determined in a predetermined order set in advance, may be determined randomly, or may be determined based on the priority calculated by the priority calculation unit 302, the contents of the user’s utterances (a question about the request from the user, the flow of the dialogue with the user), or the like.
[0165] For example, when generating the response information based on the priority, the request response generation unit 305 first generates response information that asks whether a proxy purchase is possible, together with information on the item with the highest priority among the items of the request information (for example, “purchase location”). At this time, the request response generation unit 305 presents the information on the high-priority item at the level corresponding to the abstraction level determined by the abstraction level determination unit 304. For example, when the item of the request information is “purchase location”, the “purchase location name (store name)” can be said to have a high abstraction level, and the “address (map)” and “store image” a low abstraction level (high concreteness). Similarly, when the item of the request information is “purchase request product”, the “item” has a high abstraction level, the “product name” and “brand” have the next highest abstraction level, and the “product number” and “product image” have a low abstraction level. Note that a plurality of items of request information may be notified; for example, in the case of voice presentation, response information asking whether a proxy purchase is possible may be generated together with high-abstraction-level purchase location information and product information.
[0166] Further, the request response generation unit 305 may basically determine the item of request information to be notified based on the predetermined order set in advance or the calculated priority, and, when the user asks a question, may preferentially generate (as an interrupt) response information that includes information on the item asked about.
[0167] In addition, the request response generation unit 305 may estimate the knowledge that the user (proxy purchaser) has of the purchase request item from, for example, the purchase history of the user or the family, and use it to remind the user of the purchase request item when notifying the request information.
[0168] Further, the request response generation unit 305 according to the present embodiment is not limited to generating response information for stepwise notification of the request information, and may generate response information for notifying the request information as appropriate according to the user environment. For example, when the information processing terminal 1 used by the user has characteristics that make stepwise notification unsuitable, such as a smartphone or a tablet terminal, the request response generation unit 305 may generate, as the response information, screen data that presents the specific request information all at once.
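The examples in [0161] and [0162] amount to a small decision table from the recognized environment to a list of notification steps. The sketch below encodes only those examples plus two assumed fallbacks (an auditory-only “smart_speaker” and a visual-only default); the class, the device and activity labels, and the step format are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Step = Tuple[str, str]  # (output modality, when to output it)

@dataclass
class UserEnvironment:
    activity: str   # recognized situation, e.g. "running", "driving", "at_home"
    device: str     # e.g. "glasses_hmd", "smartphone", "tablet", "smart_speaker"

def decide_stepwise_notification(env: UserEnvironment) -> Optional[List[Step]]:
    """Return the ordered notification steps, or None when the request
    information should be shown all at once instead of stepwise."""
    if env.device in ("smartphone", "tablet"):
        return None  # a screen the user is operating: show all details at once
    if env.device == "glasses_hmd":
        if env.activity == "driving":
            # Voice while driving, image once the vehicle has stopped.
            return [("voice", "while_driving"), ("image", "when_stopped")]
        return [("voice", "now"), ("image", "now")]
    if env.device == "smart_speaker":
        return [("voice", "now")]  # assumed auditory-only device
    return [("image", "now")]      # assumed visual-only device otherwise
```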
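Combining [0163] and [0165], one way to pick what to say or show is to grade each item’s attributes by abstraction level and map each output modality to a target level. The numeric levels (3 = most abstract, 1 = most concrete), the dictionary keys, and the closest-level fallback are assumptions made for this sketch; only the attribute groupings follow the examples in [0165].

```python
from typing import Dict, List

# Attributes of each request item grouped by abstraction level, following the
# examples in [0165]; the numeric levels themselves are hypothetical.
ABSTRACTION: Dict[str, Dict[int, List[str]]] = {
    "purchase_location": {3: ["store name"], 1: ["address (map)", "store image"]},
    "purchase_request_product": {3: ["item"], 2: ["product name", "brand"],
                                 1: ["product number", "product image"]},
}

# Voice steps get a higher abstraction level than image steps ([0163]).
LEVEL_FOR_MODALITY = {"voice": 3, "image": 1}

def attributes_to_present(item: str, modality: str) -> List[str]:
    """Pick the attributes of `item` at the level chosen for `modality`,
    falling back to the closest available level when there is no exact match."""
    levels = ABSTRACTION[item]
    target = LEVEL_FOR_MODALITY[modality]
    closest = min(levels, key=lambda level: abs(level - target))
    return levels[closest]
```

In a first voice step, for instance, this yields only the store name and the product item, which matches the idea of asking “can you buy it?” before going into details.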
[0169] The response information generation unit 202-1 according to the present embodiment has been specifically described above. Subsequently, an operation process of this embodiment will be described.
[0170] <3-2. Operation Process>
[0171] (3-2-1. Process of Collecting Request Information)
[0172] First, a collection process of request information from a purchase requester will be described with reference to FIGS. 6 and 7.
[0173] The request information from the purchase requester can be collected by voice dialogue between the user and the agent via the information processing terminal 1, for example. FIG. 6 is a diagram for explaining an example of the information processing terminal according to the present embodiment. As illustrated in FIG. 6, for example, the information processing terminal 1a may be realized by a stationary dedicated device, and the request information may be collected through a voice dialogue with user A. At this time, analysis of the contents of user A’s utterances and generation of the agent’s voice responses can be performed, for example, by the information processing server 2 connected to the information processing terminal 1a via the network 3.
[0174] The information processing terminal 1a can present information by a voice response or, if necessary, by projecting an image on a wall surface with a projector (for example, a small single focus projector) provided in the information processing terminal 1a. Here, a stationary dedicated device is illustrated as an example, but the present embodiment is not limited thereto; for example, the information processing terminal 1a may be a smartphone. In this case, the agent’s voice responses are output from the speaker of the smartphone, the agent interacts with user A, and the request information is collected.
[0175] FIG. 7 is a flowchart illustrating an operation process for collecting request information according to this embodiment. As illustrated in FIG. 7, first, the information processing server 2 uses the request information collection response generation unit 300 to acquire the request contents while interacting with the user A (requester) via the information processing terminal 1a (step S103).
[0176] Next, the information processing server 2 repeats the dialogue with the requester until the request contents determination unit 301 can determine the request target (step S106). As the request information to be collected, for example, information sufficient to at least identify the product, such as the item, brand, product name, and product number of the product to be purchased, as well as information such as the quantity, budget, and desired purchase time, is assumed.
[0177] Next, when the target has been determined (step S106/Yes), the request contents determination unit 301 searches for candidate acquisition means (step S109). Examples of the acquisition means include purchase on the Internet, purchase at an actual store, a purchase request at an actual store (asking someone else to buy the product there), and the like.
[0178] Next, the request contents determination unit 301 proposes the acquisition means to the requester (step S112) and determines the acquisition means approved by the requester (step S115).
[0179] The information processing server 2 performs the product acquisition process based on the collected request information, using the acquisition means approved by the requester. In the case of Internet purchase, a predetermined mail-order site or the like is displayed on the wall surface by the information processing terminal 1a, and the purchase process is performed according to instructions from user A (or automatically). Further, when user A purchases the product at the actual store himself/herself, the information processing server 2 displays a map to the actual store on the wall surface using the information processing terminal 1a, or performs navigation or the like when user A starts moving. Alternatively, when the purchase request at the actual store is approved, the information processing server 2 presents the purchase request and the request information to the proxy purchaser, who is the person being asked. The process of presenting the request information to the proxy purchaser will be described later with reference to FIG. 8.
[0180] The example of the request information collection process according to the present embodiment has been described above. Note that the operation process illustrated in FIG. 7 is an example, and the present disclosure is not limited to it. For example, the present disclosure is not limited to the order of the steps illustrated in FIG. 7; at least some of the steps may be processed in parallel or in the reverse order. For example, the request contents acquisition process of step S103 may be performed to determine the target after the acquisition means has been determined in step S115.
[0181] Further, not all of the processes illustrated in FIG. 7 need to be executed. For example, the acquisition means may be determined automatically by skipping the approval process of step S115.
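Steps S103 to S115 of FIG. 7 can be sketched as a simple loop, assuming scripted answers stand in for the voice dialogue and a callback stands in for the requester’s approval. All names, the required slots, and the order in which acquisition means are proposed are hypothetical; the real system would generate questions and interpret free-form utterances instead.

```python
from typing import Dict, Iterator, List, Tuple

REQUIRED_SLOTS = ("item", "quantity")  # hypothetical minimum to identify the target

def collect_request(answers: Iterator[Tuple[str, object]]) -> Dict[str, object]:
    """S103/S106: keep the dialogue going until every required slot is filled."""
    request: Dict[str, object] = {}
    while any(slot not in request for slot in REQUIRED_SLOTS):
        slot, value = next(answers)  # stands in for one question/answer turn
        request[slot] = value
    return request

def search_acquisition_means(request, online_catalog, proxy_available) -> List[str]:
    """S109: list candidate acquisition means for the determined target."""
    candidates = []
    if request["item"] in online_catalog:
        candidates.append("internet purchase")
    candidates.append("purchase at an actual store")
    if proxy_available:
        candidates.append("purchase request at an actual store")
    return candidates

def run_collection(answers, online_catalog, proxy_available, approve):
    """Run S103 to S115; return the request and the approved means (or None)."""
    request = collect_request(iter(answers))
    for means in search_acquisition_means(request, online_catalog, proxy_available):
        if approve(means):  # S112: propose; S115: the requester approves or declines
            return request, means
    return request, None
```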
[0182] Further, all the processes illustrated in FIG. 7 do not necessarily have to be performed by a single device. For example, the processing from step S103 to step S106 may be performed by the information processing terminal 1a, and the processing from step S109 to step S115 may be performed by the information processing server 2.
[0183] Also, the processes illustrated in FIG. 7 do not necessarily have to be performed continuously in time. For example, after the processes illustrated in steps S103 to S109 are performed, the process illustrated in steps S112 to S115 may be performed at a predetermined timing (for example, when there is a request from the user, when the user is not busy, or when multiple purchase requests are accumulated).
[0184] An example of a dialogue (via the information processing terminal 1a) between user A (purchase requester) and the agent collecting the request information is shown below. In this example, user A is consulting the agent about purchasing a present to take to a farewell party that she is attending today.
Dialogue Example
[0185] Agent: “How about this dish?” (product presentation; information processing terminal 1a projects an image of a dish of a present candidate on a wall with a projector)
[0186] User A (wife): “Good, where can I buy it?”
[0187] Agent: “It is sold at the World Kitchen AA store near User A’s office or at a store called CC miscellaneous goods near B Park.” (Presentation of purchase location information)
[0188] User A (wife): “Ah, CC miscellaneous goods is selling it?”
[0189] Agent: “Yes, this store has started to deal recently.”
[0190] User A (wife): “CC miscellaneous goods is not too far, but will we make it in time for the party?”
[0191] Agent: “User A, right now, your husband should be walking in the vicinity of Park B. Why don’t you ask him to buy?”
(Search for an acquisition means and propose proxy purchase)
[0192] User A (wife): “Okay, good idea. Can I ask you?”
[0193] Agent: “Yes, I can. How many do you need?” (collection of request information; product information is already collected because it was recommended by the agent)
[0194] User A (wife): “Can I ask for two? You’ve saved me.”
[0195] Agent: “Yes” (collection of request information is finished)
[0196] Here, the information processing server 2 proposes a proxy purchase by user B as a candidate acquisition means. At this time, the information processing server 2 may take the state of the proxy purchaser into account and propose the proxy purchase when it is highly likely to be possible. Here, “the state of the proxy purchaser” means, for example, the current location, the amount of money held (or whether an alternative means such as a credit card is held), the means of carrying the purchased product assumed from the current transportation means (walking, bicycle, car), and the availability of time for the proxy purchase (obtainable from the schedule, from dialogue with the proxy purchaser, or by context analysis by the recognition unit 201).
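The factors listed in [0196] can be combined into a simple feasibility check that decides whether proposing a proxy purchase is worthwhile. This is a hypothetical all-conditions rule for illustration; the class, the thresholds (`max_distance_km`, `needed_minutes`), and the `bulky` flag are assumptions, and a real system might weigh the factors instead of requiring all of them.

```python
from dataclasses import dataclass

@dataclass
class ProxyPurchaserState:
    distance_to_store_km: float  # derived from the current location
    cash: float                  # amount of money held
    has_card: bool               # alternative payment means such as a credit card
    transport: str               # "walking", "bicycle", "car", ...
    free_minutes: float          # from schedule, dialogue, or context analysis

def proxy_purchase_likely(state: ProxyPurchaserState, unit_price: float, quantity: int,
                          bulky: bool = False, max_distance_km: float = 1.0,
                          needed_minutes: float = 20.0) -> bool:
    """Return True when every assumed condition for a proxy purchase holds."""
    can_pay = state.has_card or state.cash >= unit_price * quantity
    can_carry = not bulky or state.transport == "car"
    is_near = state.distance_to_store_km <= max_distance_km
    has_time = state.free_minutes >= needed_minutes
    return can_pay and can_carry and is_near and has_time
```

In the dialogue above, a husband walking near Park B with enough money and free time would pass this check, so the agent would suggest asking him.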
……
……
……