Sony Patent | Information Processing Apparatus And Information Processing Method

Patent: Information Processing Apparatus And Information Processing Method

Publication Number: 20200183496

Publication Date: 20200611

Applicants: Sony

Abstract

The present disclosure relates to an information processing apparatus and an information processing method for performing processes on a target of interest corresponding to an operation input more accurately. Processing is performed on a target of interest on the basis of the target of interest and one of a first recognizer or a second recognizer, the target of interest being identified on the basis of status information regarding a user including at least either action information or position information regarding the user, the first recognizer being configured to recognize an operation input of the user, the second recognizer being configured to be different from the first recognizer and to recognize the operation input of the user. The present disclosure may be applied, for example, to information processing apparatuses, image processing apparatuses, control apparatuses, information processing systems, information processing methods, or information processing programs.

TECHNICAL FIELD

[0001] The present disclosure relates to an information processing apparatus and an information processing method. More particularly, the disclosure relates to an information processing apparatus and an information processing method for more accurately performing processes on a target of interest corresponding to an operation input.

BACKGROUND ART

[0002] Heretofore, there have been devices and systems which accept an operation input performed by a user such as by voice or gesture (action) and which perform processes on a target of the user’s interest in a manner corresponding to the operation input (e.g., see PTL 1).

CITATION LIST

Patent Literature

[PTL 1]

[0003] Japanese Patent Laid-open No. 2014-186361

SUMMARY

Technical Problem

[0004] However, it has not always been the case that given the operation input by the user, the target of the user’s interest is processed exactly as intended by the user. Methods of performing processes on the target of interest corresponding to the operation input more accurately have therefore been sought after.

[0005] The present disclosure has been devised in view of the above circumstances. An object of the disclosure is to perform processes more accurately on a target of interest corresponding to an operation input.

Solution to Problem

[0006] According to one aspect of the present technology, there is provided an information processing apparatus including a control section performing a process on a target of interest on the basis of the target of interest and one of a first recognizer or a second recognizer, the target of interest being identified on the basis of status information regarding a user including at least either action information or position information regarding the user, the first recognizer being configured to recognize an operation input of the user, the second recognizer being configured to be different from the first recognizer and to recognize the operation input of the user.

[0007] Also according to one aspect of the present technology, there is provided an information processing method including, by an information processing apparatus, performing a process on a target of interest on the basis of the target of interest and one of a first recognizer or a second recognizer, the target of interest being identified on the basis of status information regarding a user including at least either action information or position information regarding the user, the first recognizer being configured to recognize an operation input of the user, the second recognizer being configured to be different from the first recognizer and to recognize the operation input of the user.

[0008] Thus, according to one aspect of the present technology, there are provided an information processing apparatus and an information processing method by which a process is performed on a target of interest on the basis of the target of interest and one of a first recognizer or a second recognizer. The target of interest is identified on the basis of status information regarding a user including at least either action information or position information regarding the user. The first recognizer is configured to recognize an operation input of the user. The second recognizer is configured to be different from the first recognizer and to recognize the operation input of the user.

Advantageous Effects of Invention

[0009] According to the present disclosure, it is possible to process information. More particularly, it is possible to perform processes on a target of interest more accurately corresponding to an operation input.

BRIEF DESCRIPTION OF DRAWINGS

[0010] FIG. 1 is a view depicting examples of external appearances of an optical see-through HMD.

[0011] FIG. 2 is a block diagram depicting a principal configuration example of the optical see-through HMD.

[0012] FIG. 3 is a view explaining examples of how recognizers perform control corresponding to different operation targets.

[0013] FIG. 4 is a view explaining examples of how the recognizers perform control corresponding to different states.

[0014] FIG. 5 is a view explaining other examples of how the recognizers perform control corresponding to different states.

[0015] FIG. 6 is a view depicting examples of functions implemented by the optical see-through HMD.

[0016] FIG. 7 is a flowchart explaining an example of the flow of a control process.

[0017] FIG. 8 is a view explaining examples of gesture-related rules.

[0018] FIG. 9 is a view explaining another example of the gesture-related rules.

[0019] FIG. 10 is a view explaining other examples of the gesture-related rules.

[0020] FIG. 11 is a view explaining other examples of the gesture-related rules.

[0021] FIG. 12 is a view depicting examples of other functions implemented by the optical see-through HMD.

[0022] FIG. 13 is a flowchart explaining an example of the flow of another control process.

[0023] FIG. 14 is a flowchart explaining an example of the flow of a narrowing-down process.

DESCRIPTION OF EMBODIMENTS

[0024] Some preferred embodiments for implementing the present disclosure (referred to as the embodiments) are described below. The description will be given under the following headings:

  1. Execution of processes corresponding to the operation input
  2. First embodiment (optical see-through HMD)
  3. Second embodiment (utilization of rules of the operation input)
  4. Other examples of application
  5. Others

<1. Execution of Processes Corresponding to the Operation Input>

[0025] Heretofore, there have been devices and systems which accept an operation input performed by a user such as by voice or gesture (action) and which perform processes on a target of the user’s interest in a manner corresponding to the operation input. For example, the HMD (Head Mounted Display) described in PTL 1 recognizes and accepts gestures of the user relative to a virtual UI (User Interface) as an operation input. Such devices and systems detect information of images and sounds including the user’s voice and gestures by use of a camera and a microphone, for example. On the basis of the detected information, these devices and systems recognize and accept the operation input of the user.

[0026] However, it has not always been the case that given the operation input by the user, the target of interest is processed exactly as intended by the user. Methods of processing the target of interest corresponding to the operation input more accurately have therefore been sought after.

[0027] What is proposed herein involves performing a process on a target of interest based on one of a first recognizer or a second recognizer, the first recognizer being configured to recognize the target of interest identified on the basis of status information regarding a user including at least either action information or position information regarding the user, the first recognizer being further configured to recognize an operation input of the user, the second recognizer being configured to be different from the first recognizer and to recognize the operation input of the user.

[0028] For example, there is provided an information processing apparatus including: a first recognizer configured to recognize a target of interest identified on the basis of status information regarding a user including at least either action information or position information regarding the user, the first recognizer being further configured to recognize an operation input of the user; a second recognizer configured to be different from the first recognizer and to recognize the operation input of the user; and a control section configured to perform a process on the target of interest on the basis of one of the first recognizer or the second recognizer.

[0029] The action information regarding the user refers to the information related to the user’s actions. Here, the user’s actions may include the operation input performed by the user having recourse to visual line direction, focal point distance, degree of pupil dilation, ocular fundus pattern, and opening and closing of eyelids (the operation input may also be referred to as the visual line input hereunder). For example, the visual line input includes the user moving the direction of his or her visual line and fixing it in a desired direction. In another example, the visual line input includes the user varying his or her focal point distance and fixing it to a desired distance. In yet another example, the visual line input includes the user varying the degree of his or her pupil dilation (dilating and contracting the pupils). In a further example, the visual line input includes the user opening and closing his or her eyelids. In a yet further example, the visual line input includes the user inputting the user’s identification information such as his or her ocular fundus pattern.

[0030] Also, the user’s actions may include an operation input performed by the user moving his or her body (such physical motion or movement may be referred to as the gesture hereunder; and such operation input may be referred to as the gesture input hereunder). In another example, the user’s actions may include an operation input based on the user’s voice (referred to as the voice input hereunder). Obviously, the user’s actions may include other actions than those mentioned above.

[0031] The gestures may include motions such as varying the orientation of the neck (head (face); referred to as the head-bobbing gesture or as the head gesture hereunder). In another example, the gestures may include moving the hand (shoulder, arm, palm, or fingers) or bringing it into a predetermined posture (referred to as the hand gesture hereunder). Obviously, the gestures may include motions or movements other than those mentioned above. The operation input performed by head gesture is also referred to as the head gesture input. Further, the operation input performed by hand gesture is also referred to as the hand gesture input.

[0032] The position information regarding the user means information regarding the position of the user. This position information may be given either as an absolute position in a predetermined coordinate system or as a relative position in reference to a predetermined object.

[0033] The status information regarding the user means user-related information including at least either the action information or the position information regarding the user. The target of interest means the target of the user’s interest. As described above, the target of interest is identified on the basis of the status information regarding the user.

[0034] For example, the user performs an operation input as an instruction to execute a certain process on the target of interest. The control section mentioned above recognizes the operation input using a recognizer, identifies the process on the target of interest corresponding to the operation input (i.e., the process desired by the user), and performs the identified process. At this point, the control section performs the process on the target of interest on the basis of the target of interest and on one of the first and the second recognizers that are different from each other. Thus, the control section can perform the process more accurately on the target of interest corresponding to the operation input.
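The control flow in paragraph [0034] can be sketched as follows. This is a minimal illustrative model, not the patent's implementation: the type and function names (`StatusInfo`, `identify_target`, the `"virtual_ui"`/`"real_object"` target labels) are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

@dataclass
class StatusInfo:
    action: Optional[str]                     # action information, e.g. "hand_point"
    position: Optional[Tuple[float, float]]   # position information regarding the user

def identify_target(status: StatusInfo) -> str:
    # Stand-in for identifying the target of interest from the user's
    # status information (action and/or position).
    return "virtual_ui" if status.action == "hand_point" else "real_object"

def control_step(status: StatusInfo,
                 first_recognizer: Callable[[StatusInfo], str],
                 second_recognizer: Callable[[StatusInfo], str],
                 processes: Dict[str, Callable[[str], str]]) -> str:
    # 1) Identify the target of interest from the status information.
    target = identify_target(status)
    # 2) Select one of the two mutually different recognizers per target.
    recognizer = first_recognizer if target == "virtual_ui" else second_recognizer
    # 3) Recognize the operation input and run the corresponding process
    #    on the target of interest.
    operation = recognizer(status)
    return processes[operation](target)
```

For instance, a hand-pointing action would identify a virtual-UI target, route recognition through the first recognizer, and apply the resulting process to that target.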

[0035] As described above, the first and the second recognizers are each configured to recognize the operation input of the user and are different from each other. The first and the second recognizers may each be configured with a single recognizer or with multiple recognizers. That is, the first and the second recognizers may each be configured to recognize either a single operation input type (e.g., hand gesture input alone or voice input alone) or multiple operation input types (e.g., hand gesture input and voice input, or head gesture input and visual line input).

[0036] If the recognizer or recognizers (recognizable operation input types) constituting the first recognizer are not completely identical to the recognizer or recognizers (recognizable operation input types) making up the second recognizer, then the first and the second recognizers may each have any desired configuration or configurations (recognizable operation input types). For example, the first recognizer may include a recognizer not included in the second recognizer, and the second recognizer may include a recognizer not included in the first recognizer. This enables the control section to select one of the first recognizer or the second recognizer in order to accept (recognize) different types of operation input. That is, the control section can accept the appropriate type of operation input depending on the circumstances (e.g., the target of interest), thereby accepting the user’s operation input more accurately. As a result, the control section can perform processes more accurately on the target of interest corresponding to the operation input.

[0037] Alternatively, the first recognizer may include a recognizer not included in the second recognizer. As another alternative, the second recognizer may include a recognizer not included in the first recognizer.

[0038] As a further alternative, the number of recognizers (number of recognizable operation input types) constituting the first recognizer need not be the same as the number of recognizers (number of recognizable operation input types) making up the second recognizer. For example, the first recognizer may include a single recognizer, and the second recognizer may include multiple recognizers.
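The constraint running through paragraphs [0036] to [0038] can be stated compactly: modeling each recognizer as a set of recognizable operation-input types, the only requirement is that the two sets not be completely identical; they may overlap, and they may differ in size. The function below is an illustrative sketch under that reading, with assumed type labels.

```python
def valid_recognizer_pair(first: set, second: set) -> bool:
    # Paragraphs [0036]-[0038] require only that the two sets of recognizable
    # operation-input types are not completely identical; the sets may
    # overlap, and may contain different numbers of recognizers.
    return bool(first) and bool(second) and first != second
```

So `{"voice"}` paired with `{"voice", "hand_gesture"}` is a valid configuration (overlapping, different sizes), while `{"voice"}` paired with `{"voice"}` is not.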

<2. First Embodiment>

[0039] <Incorrect Recognition or Non-Recognition of the User’s Operation Input>

[0040] For example, it is not always the case that the operation input of the user is correctly recognized by any given method. Some methods of recognition are easy and others difficult depending on the circumstances. For this reason, if only difficult methods of recognition are available under the circumstances, there can be a case where the user’s operation input is not recognized (missed, hence the fear of non-recognition). Conversely, if too many easy methods of recognition are available, there can be a case where an action that is not an operation input is falsely recognized as one (hence the fear of incorrect recognition).

[0041]

[0042] In order to reduce the occurrence of the above-mentioned incorrect recognition or non-recognition, a first embodiment uses the more appropriate of the recognizers depending on the circumstances. For example, the above-mentioned control section activates one of the first recognizer or the second recognizer and deactivates the other recognizer on the basis of the identified target of interest so as to carry out processes on the target of interest in accordance with the activated recognizer.

[0043] In this manner, the recognizer to be used is selected more appropriately depending on the circumstances (target of interest). The control section is thus able to recognize the user’s operation input more accurately. On the basis of the result of the recognition, the control section can perform processes on the target of interest more precisely.
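The activate-one/deactivate-the-other behavior of paragraph [0042] can be sketched as below. The `Recognizer` class, its `active` flag, and the `"virtual_ui"` target label are illustrative assumptions; the patent does not specify this interface.

```python
class Recognizer:
    # Hypothetical recognizer with an activation flag.
    def __init__(self, name: str):
        self.name = name
        self.active = False

def select_recognizer(target: str, first: Recognizer,
                      second: Recognizer) -> Recognizer:
    # Per paragraph [0042]: activate the recognizer suited to the identified
    # target of interest and deactivate the other, so that only one
    # recognition method is listening at a time. Restricting the active
    # methods this way reduces both non-recognition and incorrect recognition.
    chosen, other = (first, second) if target == "virtual_ui" else (second, first)
    chosen.active = True
    other.active = False
    return chosen
```

Because exactly one recognizer is active after each call, a later change in the target of interest simply flips which recognizer listens.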

[0044]

[0045] FIG. 1 depicts examples of external appearances of an optical see-through HMD as an information processing apparatus to which the present technology is applied. As illustrated in A in FIG. 1, a housing 111 of an optical see-through HMD 100 has a so-called eyeglass shape. As with eyeglasses, the housing 111 is attached to the user’s face in such a manner that its end parts are hooked on the user’s ears.

[0046] The parts corresponding to the lenses of eyeglasses constitute a display section 112 (including a right-eye display section 112A and a left-eye display section 112B). When the user wears the optical see-through HMD 100, the right-eye display section 112A is positioned near the front of the user’s right eye and the left-eye display section 112B is positioned near the front of the user’s left eye.

[0047] The display section 112 is a transmissive display that lets light pass through. That means the user’s right eye can see the backside of the right-eye display section 112A, i.e., a real-world scene (see-through image) in front of the right-eye display section 112A as viewed therethrough. Likewise, the user’s left eye can see the backside of the left-eye display section 112B, i.e., a real-world scene (see-through image) in front of the left-eye display section 112B as viewed therethrough. Therefore, the user sees the image displayed on the display section 112 in a manner superimposed in front of the real-world scene beyond the display section 112.

[0048] The right-eye display section 112A displays an image for the user’s right eye to see (right-eye image), and the left-eye display section 112B displays an image for the user’s left eye to see (left-eye image). That is, the display section 112 may display a different image on each of the right-eye display section 112A and the left-eye display section 112B. This makes it possible for the display section 112 to display a three-dimensional image, for example.

[0049] Further, as illustrated in FIG. 1, a hole 113 is formed near the display section 112 of the housing 111. Inside the housing 111 near the hole 113 is an imaging section for capturing a subject. Through the hole 113, the imaging section captures the subject in the real space in front of the optical see-through HMD 100 (the subject in the real space beyond the optical see-through HMD 100 as seen from the user wearing the optical see-through HMD 100). More specifically, the imaging section captures the subject in the real space positioned within a display region of the display section 112 (right-eye display section 112A and left-eye display section 112B) as viewed from the user. Capturing the subject generates image data of the captured image. The generated image data is stored on a predetermined storage medium or transmitted to another device, for example.

[0050] Incidentally, the hole 113 (i.e., the imaging section) may be positioned where desired. The hole 113 may be formed in a position other than that depicted in A in FIG. 1. Also, a desired number of holes 113 (i.e., imaging sections) may be provided. There may be a single hole 113 as in A in FIG. 1, or there may be multiple holes 113.

[0051] Further, the housing 111 may be shaped as desired as long as the housing 111 can be attached to the user’s face (head) in such a manner that the right-eye display section 112A is positioned near the front of the user’s right eye and the left-eye display section 112B is positioned near the front of the user’s left eye. For example, the optical see-through HMD 100 may be shaped as illustrated in B in FIG. 1.

[0052] In the case of the example in B in FIG. 1, a housing 131 of the optical see-through HMD 100 is shaped to grip the user’s head from behind so as to stay in a fixed position. A display section 132 in this case is also a transmissive display as with the display section 112. That is, the display section 132 also has a right-eye display section 132A and a left-eye display section 132B. When the user wears the optical see-through HMD 100, the right-eye display section 132A is positioned near the front of the user’s right eye and the left-eye display section 132B is positioned near the front of the user’s left eye.

[0053] The right-eye display section 132A is a display section similar to the right-eye display section 112A. The left-eye display section 132B is a display section similar to the left-eye display section 112B. That is, as with the display section 112, the display section 132 can also display a three-dimensional image.

[0054] In the case of B in FIG. 1, as in the case of A in FIG. 1, a hole 133 similar to the hole 113 is provided near the display section 132 of the housing 131. Inside the housing 131 near the hole 133 is an imaging section for capturing a subject. As in the case of A in FIG. 1, through the hole 133, the imaging section captures the subject in the real space in front of the optical see-through HMD 100 (the subject in the real space beyond the optical see-through HMD 100 as seen from the user wearing the optical see-through HMD 100).

[0055] Obviously, as in the case of A in FIG. 1, the hole 133 (i.e., imaging section) may be positioned where desired. The hole 133 may be formed in a position other than that depicted in B in FIG. 1. Also, a desired number of holes 133 (i.e., imaging sections) may be provided as in the case of A in FIG. 1.

[0056] Further, as in an example depicted in C in FIG. 1, a portion of the optical see-through HMD 100 configured as illustrated in A in FIG. 1 may be configured separately from the housing 111. In the case of the example in C in FIG. 1, the housing 111 is connected with a control box 152 via a cable 151.

[0057] The cable 151, which is a communication channel for predetermined wired communication, electrically connects the circuits in the housing 111 with the circuits in the control box 152. The control box 152 includes a portion of the internal configuration (circuits) of the housing 111 in the case of the example in A in FIG. 1. For example, the control box 152 has a control section and a storage section for storing image data. With communication established between the circuits in the housing 111 and those in the control box 152, the imaging section in the housing 111 may capture an image under control of the control section in the control box 152, before supplying image data of the captured image to the control box 152 for storage into its storage section.

[0058] The control box 152 may be placed in a pocket of the user’s clothes, for example. In such a configuration, the housing 111 of the optical see-through HMD 100 may be formed to be smaller than in the case of A in FIG. 1.

[0059] Incidentally, the communication between the circuits in the housing 111 and those in the control box 152 may be implemented in wired or wireless fashion. In the case of wireless communication, the cable 151 may be omitted.

[0060]

[0061] FIG. 2 is a block diagram depicting an example of the internal configuration of the optical see-through HMD 100. As illustrated in FIG. 2, the optical see-through HMD 100 has a control section 201.

[0062] The control section 201 is configured using a microcomputer that includes a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), a nonvolatile memory section, and an interface section, for example. The control section 201 performs desired processes by executing programs. For example, the control section 201 performs processes based on the recognition of the user’s operation input and on the result of the recognition. Further, the control section 201 controls the components of the optical see-through HMD 100. For example, the control section 201 may drive the components in a manner corresponding to the performed processes such as detecting information regarding the user’s action and outputting the result of the process reflecting the user’s operation input.

[0063] The optical see-through HMD 100 also includes an imaging section 211, a voice input section 212, a sensor section 213, a display section 214, a voice output section 215, and an information presentation section 216.

[0064] The imaging section 211 includes an optical system configured with an imaging lens, a diaphragm, a zoom lens, and a focus lens; a driving system that causes the optical system to perform focus and zoom operations; and a solid-state imaging element that generates an imaging signal by detecting imaging light obtained by the optical system and by subjecting the detected light to photoelectric conversion. The solid-state imaging element is configured as a CCD (Charge Coupled Device) image sensor or a CMOS (Complementary Metal Oxide Semiconductor) image sensor, for example.

[0065] The imaging section 211 may have a desired number of optical systems, a desired number of driving systems, and a desired number of solid-state imaging elements. Each of these components may be provided singly or in multiple numbers. Each optical system, each driving system, and each solid-state imaging element in the imaging section 211 may be positioned where desired in the housing of the optical see-through HMD 100, or may be provided independent of the housing of the optical see-through HMD 100 (as a separate unit or units). There may be a single or multiple directions (field angles) in which the imaging section 211 captures images.

[0066] Under control of the control section 201, the imaging section 211 focuses on a subject, captures the subject, and supplies data of the captured image to the control section 201.

[0067] The imaging section 211 captures the scene in front of the user (real-world subject in front of the user) through the hole 113, for example. Naturally, the imaging section 211 may capture the scene in some other direction such as behind the user. Using such a captured image, the control section 201 may grasp (recognize) the surroundings (environment), for example. The imaging section 211 may supply the captured image as the position information regarding the user to the control section 201, so that the control section 201 can grasp the position of the user on the basis of the captured image. In another example, the imaging section 211 may supply the captured image as the action information regarding the user to the control section 201, so that the control section 201 may grasp (recognize) the head gesture (e.g., direction in which the user faces, visual line direction of the user, or what the head-bobbing gesture looks like) of the user wearing the optical see-through HMD 100.

[0068] Further, the imaging section 211 may capture the head (or face) of the user wearing the optical see-through HMD 100. For example, the imaging section 211 may supply such a captured image as the action information regarding the user to the control section 201, so that the control section 201 can grasp (recognize) the head gesture of the user on the basis of the captured image.

[0069] Furthermore, the imaging section 211 may capture the eyes (eyeball portions) of the user wearing the optical see-through HMD 100. For example, the imaging section 211 may supply such a captured image as the action information regarding the user to the control section 201, so that the control section 201 can grasp (recognize) the user’s visual line input on the basis of the captured image.

[0070] Moreover, the imaging section 211 may capture the hand (shoulder, arm, palm, or fingers) of the user wearing the optical see-through HMD 100. For example, the imaging section 211 may supply such a captured image as the action information regarding the user to the control section 201, so that the control section 201 can grasp (recognize) the user’s hand gesture input on the basis of the captured image.

[0071] Incidentally, the light detected by the solid-state imaging element of the imaging section 211 may be in any wavelength band and is not limited to visible light. The solid-state imaging element may capture visible light and have the captured image displayed on the display section 214, for example.

[0072] The voice input section 212 includes a voice input device such as a microphone. The voice input section 212 may include a desired number of voice input devices. There may be a single or multiple voice input devices. Each voice input device of the voice input section 212 may be positioned where desired in the housing of the optical see-through HMD 100. Alternatively, each voice input device may be provided independent of the housing of the optical see-through HMD 100 (as a separate unit).

[0073] The voice input section 212, under control of the control section 201, for example, collects sounds from the surroundings of the optical see-through HMD 100 and performs signal processing such as A/D conversion on the collected sounds. For example, the voice input section 212 collects the voice of the user wearing the optical see-through HMD 100, performs signal processing on the collected voice to obtain a voice signal (digital data), and supplies the voice signal as the action information regarding the user to the control section 201. The control section 201 may then grasp (recognize) the user’s voice input on the basis of that voice signal.

[0074] The sensor section 213 includes such sensors as an acceleration sensor, a gyro sensor, a magnetic sensor, and an atmospheric pressure sensor. The sensor section 213 may have any number of sensors of any type. There may be a single or multiple sensors. Each of the sensors of the sensor section 213 may be positioned where desired in the housing of the optical see-through HMD 100. Alternatively, the sensors may be provided independent of the housing of the optical see-through HMD 100 (as a separate unit or units).

[0075] The sensor section 213, under control of the control section 201, for example, drives the sensors to detect information regarding the optical see-through HMD 100 as well as information regarding the surroundings thereof. For example, the sensor section 213 may detect an operation input such as visual line input, gesture input, or voice input performed by the user wearing the optical see-through HMD 100. The information detected by the sensor section 213 may be supplied as the action information regarding the user to the control section 201. In turn, the control section 201 may grasp (recognize) the user’s operation input on the basis of the supplied information. The information detected by the sensor section 213 and supplied, for example, as the action information regarding the user to the control section 201 may be used by the latter as the basis for grasping the position of the user, for example.
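Paragraph [0075] describes the sensor section forwarding detected information to the control section, which interprets it as an operation input. One way to picture that hand-off is a mapping from raw detection events to operation-input types; the event names and field layout below are assumptions made for illustration only.

```python
from typing import Optional

def interpret_detection(event: dict) -> Optional[str]:
    # Map a raw detection from the sensor section to the operation-input
    # type the control section can hand to its recognizers; detections
    # that carry no operation input yield None.
    mapping = {
        "gaze_fixation": "visual_line_input",
        "hand_motion": "hand_gesture_input",
        "head_motion": "head_gesture_input",
        "speech": "voice_input",
    }
    return mapping.get(event.get("kind"))
```

The same detected information can also serve double duty as position information (e.g., for grasping where the user is), which is why the control section, not the sensor section, decides how each detection is used.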

[0076] The display section 214 includes the display section 112 as a transmissive display, an image processing section that performs image processing on the image displayed on the display section 112, and control circuits of the display section 112. The display section 214, under control of the control section 201, for example, causes the display section 112 to display the image corresponding to the data supplied from the control section 201. This allows the user to view the information presented as the image.

[0077] The image displayed on the display section 112 is viewed by the user as superimposed in front of the real-space scene. For example, the display section 214 enables the user to view the information corresponding to an object in the real space in a manner superimposed on that object in the real space.

……
……
……
