Sony Patent | Information Processing System And Target Information Acquisition Method

Patent: Information Processing System And Target Information Acquisition Method

Publication Number: 20200279401

Publication Date: 2020-09-03

Applicants: Sony

Abstract

Multiple imaging apparatuses 12a and 12b are arranged to capture images of a space in which an HMD 18 is found. The images captured by the imaging apparatuses are analyzed individually to acquire position and posture information regarding the HMD 18 in each of camera coordinate systems of the apparatuses. The position and posture information is aggregated to one information processing apparatus and transformed thereby into information in a world coordinate system independent of the imaging apparatuses. Relative relations between the positions and postures of the imaging apparatuses are acquired by making use of a period during which the HMD 18 is in a region 186 where the fields of view of the imaging apparatuses overlap with each other. The relative relations are used as the basis for acquiring parameters for transforming coordinates between the imaging apparatuses.

TECHNICAL FIELD

[0001] The present invention relates to an information processing apparatus and a target information acquisition method for acquiring status information regarding a target on the basis of captured images.

BACKGROUND ART

[0002] Games may be played by a user watching a display screen of a head-mounted display (referred to as an HMD hereunder) worn on the head and connected with a game machine (e.g., see PTL 1). If the position and posture of the user’s head are acquired so that images of a virtual world are displayed in such a manner that the field of view is varied in accordance with face orientation, for example, this can create a situation in which the user feels as if he or she were in the virtual world. The position and posture of the user are generally acquired from a result of analyzing visible and infrared light images captured of the user and from measurements taken by motion sensors incorporated in the HMD.

CITATION LIST

Patent Literature

[0003] [PTL 1]

[0004] Japanese Patent No. 5580855

SUMMARY

Technical Problems

[0005] Technology for performing any kind of information processing on the basis of captured images rests on the assumption that a target such as a user is within the angle of view of a camera. However, because the user wearing the HMD cannot view the outside world, the user may become disoriented or may be so immersed in a game as to move to an unexpected location in real space without noticing it. This puts the user out of the angle of view of the camera, disrupting the ongoing information processing or lowering its accuracy. Furthermore, the user may remain unaware of the cause of such aberrations. Regardless of whether or not an HMD is used, in order to implement information processing in more diverse ways with a minimum of stress on the user, it is desirable to acquire status information stably in a more extensive movable range than before.

[0006] The present invention has been made in view of the above problems. An object of the invention is therefore to provide techniques that, in acquiring status information regarding the target by image capture, extend the movable range of the target in an easy and stable manner.

Solution to Problems

[0007] One embodiment of the present invention is an information processing system. The information processing system includes: multiple imaging apparatuses configured to capture images of a target from different points of view at a predetermined rate; and an information processing apparatus configured to analyze each of the images covering the target captured by the multiple imaging apparatuses so as to individually acquire sets of position and posture information regarding the target, the information processing apparatus further using one of the sets of the position and posture information to generate and output final position and posture information at a predetermined rate.

[0008] Another embodiment of the present invention is a target information acquisition method. The information acquisition method includes the steps of: causing multiple imaging apparatuses to capture images of a target from different points of view at a predetermined rate; and causing an information processing apparatus to analyze each of the images covering the target captured by the multiple imaging apparatuses so as to individually acquire sets of position and posture information regarding the target, the information processing apparatus being further caused to use one of the sets of the position and posture information to generate and output final position and posture information at a predetermined rate.

[0009] Incidentally, if other combinations of the above-outlined constituent elements or the above expressions of the present invention are converted between different forms such as a method, an apparatus, a system, a computer program, and a recording medium that records the computer program, they still constitute effective embodiments of this invention.

Advantageous Effect of Invention

[0010] In acquiring status information regarding a target by image capture, the techniques according to the present invention permit extension of the movable range of the target in an easy and stable manner.

BRIEF DESCRIPTION OF DRAWINGS

[0011] FIG. 1 is a view depicting an exemplary configuration of an information processing system to which an embodiment of the present invention may be applied.

[0012] FIG. 2 is a view depicting an exemplary external shape of an HMD according to the embodiment.

[0013] FIG. 3 is a view depicting an internal circuit configuration of an information processing apparatus having main functions according to the embodiment.

[0014] FIG. 4 is a view depicting an internal circuit configuration of the HMD according to the embodiment.

[0015] FIG. 5 is a view depicting configurations of functional blocks in information processing apparatuses according to the embodiment.

[0016] FIG. 6 is a view depicting relations between the arrangement of imaging apparatuses on one hand and the movable range of the HMD on the other hand according to the embodiment.

[0017] FIG. 7 is a view explaining a technique by which a transformation parameter acquisition section according to the present embodiment obtains parameters for transforming local information to global information.

[0018] FIG. 8 is a flowchart depicting a processing procedure in which the information processing apparatuses according to the embodiment acquire position and posture information regarding a target so as to generate and output data reflecting the acquired information.

[0019] FIG. 9 is a view explaining a technique of reciprocal transformation of timestamps between the information processing apparatuses according to the embodiment.

[0020] FIG. 10 is a view depicting an exemplary arrangement of three or more pairs of the imaging apparatus and the information processing apparatus according to the embodiment.

DESCRIPTION OF EMBODIMENTS

[0021] FIG. 1 depicts an exemplary configuration of an information processing system to which an embodiment of the present invention may be applied. The information processing system is configured with multiple pairs 8a and 8b of imaging apparatuses 12a and 12b for capturing images of a target and of information processing apparatuses 10a and 10b for acquiring position and posture information regarding the target using the images captured by the imaging apparatuses. The target is not limited to anything specific. By acquiring the position and posture of an HMD 18, for example, the system identifies the position and motion of a head of a user 1 wearing the HMD 18, and displays images in a field of view in accordance with the user’s line of sight.

[0022] The imaging apparatuses 12a and 12b have cameras for capturing images of the target such as the user at a predetermined frame rate, and mechanisms for generating output data representing captured images obtained by performing common processes such as demosaicing on an output signal from the cameras, before outputting the generated data to the paired information processing apparatuses 10a and 10b with which communication is established. The cameras include visible light sensors such as CCD (Charge Coupled Device) sensors or CMOS (Complementary Metal Oxide Semiconductor) sensors used in common digital cameras and digital video cameras. The imaging apparatuses 12a and 12b may each include either a single camera or what is called a stereo camera having two cameras disposed right and left at a known distance apart as illustrated.

[0023] Alternatively, the imaging apparatuses 12a and 12b may each be constituted by combining a monocular camera with an apparatus that emits reference light such as infrared light to the target and measures reflected light therefrom. In the case where the stereo camera or the reflected light measuring mechanism is installed, it is possible to obtain the position of the target in a three-dimensional space with high accuracy. It is well known that the stereo camera operates by the technique of determining the distance from the camera to the target by the principle of triangulation using stereoscopic images captured from right and left points of view. Also well known is the technique of determining the distance from the camera to the target through measurement of reflected light on a TOF (Time of Flight) basis or by use of a pattern projection method.
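The stereo triangulation mentioned above can be sketched in a few lines. This is an illustrative example, not part of the patent; the names `focal_px`, `baseline_m`, and `disparity_px` are assumptions for the sketch:

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Distance Z from the imaging plane to a point seen in both stereo images.

    By similar triangles, Z = f * B / d, where f is the focal length in
    pixels, B is the baseline between the left and right cameras, and d is
    the horizontal disparity between corresponding points in the two images.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px
```

As the formula shows, a nearer point produces a larger disparity, which is why depth accuracy degrades for distant targets.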

[0024] However, even where the imaging apparatuses 12a and 12b are a monocular camera each, by attaching markers of predetermined sizes and shapes to the target or by having the size and shape of the target made known beforehand, it is possible to identify the position of the target in the real world from the position and size of images captured of the target.
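The monocular case described in this paragraph relies on the marker's real-world size being known beforehand. A minimal sketch, with hypothetical names, of how distance follows from apparent image size under a pinhole camera model:

```python
def distance_from_known_size(focal_px: float, real_size_m: float, image_size_px: float) -> float:
    """Distance to an object of known physical size under a pinhole model.

    A marker of real size S appearing s pixels wide, imaged with focal
    length f (in pixels), lies at distance Z = f * S / s.
    """
    if image_size_px <= 0:
        raise ValueError("image size must be positive")
    return focal_px * real_size_m / image_size_px
```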

[0025] The information processing apparatuses 10a and 10b establish communication with the corresponding imaging apparatuses 12a and 12b, respectively, to acquire information regarding the position and posture of the target using data of its images captured and transmitted by the imaging apparatuses. Generally, the position and posture of the target obtained with the above-described techniques using captured images are given as information in a camera coordinate system that has its origin at the optical center of each imaging apparatus and has the axes oriented in the longitudinal, crosswise, and vertical directions of the imaging plane of that apparatus. With this embodiment, the position and posture information regarding the target is first obtained by the information processing apparatuses 10a and 10b in each camera coordinate system.

[0026] The information in the camera coordinate systems is then transformed to information in a world coordinate system integrating these coordinate systems. This generates the final position and posture information regarding the target. This makes it possible to perform information processing using the position and posture information regardless of the field of view of any imaging apparatus covering the target. That is, the movable range of the target is extended by an amount reflecting the number of configured imaging apparatuses without affecting subsequent processes. Because the information processing apparatuses 10a and 10b acquire and use the position and posture information independently in the camera coordinate systems of the corresponding imaging apparatuses 12a and 12b, the existing pairs 8a and 8b of the imaging apparatuses and information processing apparatuses may be utilized unmodified, which makes system implementation easy to accomplish.
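The transformation from a camera coordinate system into the world coordinate system described above is, in essence, a rigid-body transform applied to each point. A minimal illustrative sketch (the rotation matrix and translation vector would in practice come from the transformation parameters discussed later; the function name is an assumption):

```python
def camera_to_world(rotation, translation, p_cam):
    """Map a point from a camera coordinate system into the world coordinate
    system: p_world = R * p_cam + t.

    rotation    -- 3x3 world-from-camera rotation, row-major nested lists
    translation -- length-3 position of the camera's optical center in world coordinates
    p_cam       -- length-3 point in camera coordinates
    """
    return [
        sum(rotation[i][j] * p_cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
```

Because each imaging apparatus has its own `rotation` and `translation`, local information from any of them maps into the same world coordinate system, which is what makes the subsequent processing independent of which camera currently sees the target.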

[0027] FIG. 1 depicts two pairs 8a and 8b, i.e., the pair 8a of the imaging apparatus 12a and information processing apparatus 10a and the pair 8b of the imaging apparatus 12b and information processing apparatus 10b. However, the number of the pairs is not limited to any specific number. The position and posture information obtained in each of the camera coordinate systems is aggregated by a predetermined information processing apparatus 10a. This information processing apparatus 10a collects the position and posture information acquired on its own and from the other information processing apparatus 10b, thereby generating the position and posture information in the world coordinate system. The information processing apparatus 10a then performs predetermined information processing on the resulting position and posture information so as to generate output data such as images and sounds.

[0028] In the description that follows, the information processing apparatus 10a that collects the position and posture information in the camera coordinate systems, transforms the collected information into the final position and posture information, and performs predetermined information processing using the generated information may be referred to as “information processing apparatus 10a having the main functions,” and any other information processing apparatus as “information processing apparatus having the sub functions.”

[0029] The content of processes performed by the information processing apparatus 10a having the main functions using the position and posture information is not limited to anything specific and may be determined in accordance with the functions or the content of applications desired by the user. The information processing apparatus 10a may acquire the position and posture information regarding the HMD 18 in the manner described above, for example, thereby implementing virtual reality by rendering it in the field of view in accordance with the user’s line of sight. Further, the information processing apparatus 10a may identify the motions of the user’s head and hands in order to advance games in which characters or items reflecting the identified motions appear, or to convert the user’s motions into command input for information processing. The information processing apparatus 10a having the main functions outputs the generated output data to a display apparatus such as the HMD 18.

[0030] The HMD 18 is a display apparatus that presents the user wearing it with images on a display panel such as an organic EL panel positioned in front of the user’s eyes. For example, parallax images acquired from right and left points of view are generated and displayed on right and left display regions bisecting the display screen to let the images be viewed stereoscopically. However, this is not limitative of the embodiment of the present invention. Alternatively, a single image may be displayed over the entire display screen. The HMD 18 may further incorporate speakers or earphones that output sounds to the positions corresponding to the user’s ears.

[0031] Incidentally, the destination to which the information processing apparatus 10a having the main functions outputs data is not limited to the HMD 18. The destination of the data output may alternatively be a flat-screen display, not illustrated.

[0032] The communication between the information processing apparatuses 10a and 10b on one hand and the corresponding imaging apparatuses 12a and 12b on the other hand, between the information processing apparatus 10a having the main functions on one hand and the information processing apparatus 10b having the sub functions on the other hand, and between the information processing apparatus 10a having the main functions on one hand and the HMD 18 on the other hand, may be implemented either by cable such as Ethernet (registered trademark) or wirelessly such as by Bluetooth (registered trademark). The external shapes of these apparatuses are not limited to those illustrated. For example, the imaging apparatus 12a and the information processing apparatus 10a may be integrated into an information terminal, and so may be the imaging apparatus 12b and the information processing apparatus 10b.

[0033] Further, the apparatuses may each be provided with an image display function, and images generated in accordance with the position and posture of the target may be displayed by each apparatus. With this embodiment, as described above, the pairs 8a and 8b of the information processing apparatuses and imaging apparatuses acquire the position and posture information regarding the target in the camera coordinate systems. Whereas the target is not limited to anything specific because the process involved may be implemented using existing techniques, the description that follows assumes that the HMD 18 is the target.

[0034] FIG. 2 depicts an exemplary external shape of the HMD 18. In this example, the HMD 18 is configured with an output mechanism section 102 and a wearing mechanism section 104. The wearing mechanism section 104 includes a wearing band 106 that encircles the head and attaches the apparatus thereto when worn by the user. The wearing band 106 may be made of such a material or have such a structure that its length can be adjusted according to the circumference of the user’s head. For example, the wearing band 106 may be made of an elastic body such as rubber or may be structured using a buckle or gear wheels.

[0035] The output mechanism section 102 includes a housing 108 shaped in such a manner as to cover the right and left eyes when the user wears the HMD 18. Inside the output mechanism section 102 is a display panel directly facing the eyes when worn. Disposed on the outer surface of the housing 108 are markers 110a, 110b, 110c, 110d, and 110e that are lit in a predetermined color. The number of markers, their arrangements, and their shapes are not limited to anything specific. In the illustrated example, approximately rectangular markers are provided in the four corners and at the center of the output mechanism section 102.

[0036] Further, both rear sides of the wearing band 106 are provided with elliptically shaped markers 110f and 110g. On the basis of their number and their positions, the markers thus arranged permit identification of situations in which the user faces sideways or backwards relative to the imaging apparatuses 12a and 12b. Incidentally, the markers 110d and 110e are disposed under the output mechanism section 102 and the markers 110f and 110g are outside the wearing band 116, so that their contours are indicated by dotted lines because the markers are invisible from points of view in FIG. 2. The markers need only have predetermined colors and shapes and be configured to be distinguishable from the other objects in an imaging space. In some cases, the markers need not be lit.

[0037] FIG. 3 depicts an internal circuit configuration of the information processing apparatus 10a having the main functions. The information processing apparatus 10a includes a CPU (Central Processing Unit) 22, a GPU (Graphics Processing Unit) 24, and a main memory 26. These components are interconnected via a bus 30. The bus 30 is further connected with an input/output interface 28. The input/output interface 28 is connected with a peripheral device interface such as a USB (universal serial bus) or IEEE (Institute of Electrical and Electronics Engineers) 1394 port, a communication section 32 including a wired or wireless LAN (local area network) network interface, a storage section 34 including a hard disk drive or a nonvolatile memory, an output section 36 that outputs data to the information processing apparatus 10b having the sub functions and to the HMD 18, an input section 38 that receives input of data from the information processing apparatus 10b, from the imaging apparatus 12, and from the HMD 18, and a recording medium drive section 40 that drives removable recording media such as magnetic disks, optical disks, or semiconductor memories.

[0038] The CPU 22 controls the information processing apparatus 10a as a whole by executing an operating system stored in the storage section 34. Further, the CPU 22 executes various programs read from the removable recording media or downloaded via the communication section 32 and loaded into the main memory 26. The GPU 24 has the functions of both a geometry engine and a rendering processor. Under rendering instructions from the CPU 22, the GPU 24 performs rendering processes and stores the resulting display image in a frame buffer, not depicted.

[0039] The display image stored in the frame buffer is converted to a video signal before being output to the output section 36. The main memory 26 is configured with a RAM (Random Access Memory) that stores programs and data necessary for processing. The information processing apparatus 10b having the sub functions has basically the same internal circuit configuration. It is to be noted, however, that in the information processing apparatus 10b, the input section 38 receives input of data from the information processing apparatus 10a and the output section 36 outputs the position and posture information in the camera coordinate system.

[0040] FIG. 4 depicts an internal circuit configuration of the HMD 18. The HMD 18 includes a CPU 50, a main memory 52, a display section 54, and a sound output section 56. These components are interconnected via a bus 58. The bus 58 is further connected with an input/output interface 60. The input/output interface 60 is connected with a communication section 62 including a wired or wireless LAN network interface, IMU (inertial measurement unit) sensors 64, and a light-emitting section 66.

[0041] The CPU 50 processes information acquired from the components of the HMD 18 via the bus 58, and supplies output data acquired from the information processing apparatus 10a having the main functions to the display section 54 and to the sound output section 56. The main memory 52 stores the programs and data required by the CPU 50 for processing. Depending on the application to be executed or on the design of the apparatus, there may be a case where the information processing apparatus 10a performs almost all processing so that the HMD 18 need only output the data transmitted from the information processing apparatus 10a. In this case, the CPU 50 and the main memory 52 may be replaced with more simplified devices.

[0042] The display section 54, configured with a display panel such as a liquid crystal panel or an organic EL panel, displays images before the eyes of the user wearing the HMD 18. As described above, a pair of parallax images may be displayed on the display regions corresponding to the right and left eyes so as to implement stereoscopic images. The display section 54 may further include a pair of lenses positioned between the display panel and the eyes of the user wearing the HMD 18, the paired lenses serving to extend the viewing angle of the user.

[0043] The sound output section 56 is configured with speakers or earphones positioned corresponding to the ears of the user wearing the HMD 18, the speakers or earphones outputting sounds for the user to hear. The number of channels on which sounds are output is not limited to any specific number. There may be monaural, stereo, or surround channels. The communication section 62 is an interface that transmits and receives data to and from the information processing apparatus 10a, the interface being implemented using known wireless communication technology such as Bluetooth (registered trademark). The IMU sensors 64 include a gyro sensor and an acceleration sensor and acquire angular velocity and acceleration of the HMD 18. The output values of the sensors are transmitted to the information processing apparatus 10a via the communication section 62. The light-emitting section 66 is an element or an aggregate of elements emitting light in a predetermined color. As such, the light-emitting section 66 constitutes the markers disposed at multiple positions on the outer surface of the HMD 18 depicted in FIG. 2.

[0044] FIG. 5 depicts a configuration of functional blocks in the information processing apparatus 10a having the main functions and a configuration of functional blocks in the information processing apparatus 10b having the sub functions. The functional blocks depicted in FIG. 5 may be implemented in hardware using the CPU, GPU, and memory depicted in FIG. 3, for example, or implemented in software using programs that are loaded typically from recording media into memory to provide such functions as data input, data retention, image processing, and input/output. Thus, it will be understood by those skilled in the art that these functional blocks are implemented in hardware alone, in software alone, or by a combination of both in diverse forms and are not limited to any of such forms.

[0045] The information processing apparatus 10a having the main functions includes a captured image acquisition section 130 that acquires data representing captured images from the imaging apparatus 12a, an image analysis section 132 that acquires position and posture information based on the captured images, a sensor value acquisition section 134 that acquires the output values of the IMU sensors 64 from the HMD 18, a sensor value transmission section 136 that transmits the output values of the IMU sensors 64 to the information processing apparatus 10b having the sub functions, and a local information generation section 138 that generates position and posture information in the camera coordinate system by integrating the output values of the IMU sensors 64 and the position and posture information based on the captured images. The information processing apparatus 10a further includes a local information reception section 140 that receives the position and posture information transmitted from the information processing apparatus 10b having the sub functions, a global information generation section 142 that generates position and posture information in the world coordinate system, an output data generation section 150 that generates output data by performing information processing using the position and posture information, and an output section 152 that transmits the output data to the HMD 18.

[0046] The captured image acquisition section 130 is implemented using the input section 38, CPU 22, and main memory 26 in FIG. 3, for example. The captured image acquisition section 130 acquires sequentially the data of captured images output by the imaging apparatus 12a at a predetermined frame rate, and supplies the acquired data to the image analysis section 132. In the case where the imaging apparatus 12a is configured with a stereo camera, the data of images captured by right and left cameras is acquired sequentially. The captured image acquisition section 130 may be arranged to control the start and end of image capture by the imaging apparatus 12a in accordance with processing start/end requests acquired from the user via an input apparatus or the like, not depicted.

[0047] The image analysis section 132 is implemented using the CPU 22, GPU 24, and main memory 26 in FIG. 3, for example. The image analysis section 132 acquires the position and posture information regarding the HMD 18 at a predetermined rate by detecting images of the markers disposed on the HMD 18 from the captured image. In the case where the imaging apparatus 12a is configured with a stereo camera, the distance from the imaging plane to each of the markers is obtained by the principle of triangulation on the basis of the parallax between corresponding points acquired from right and left images. Then, by integrating the information regarding the positions of multiple captured markers in the image and the information regarding the distances to the markers, the image analysis section 132 estimates the position and posture of the HMD 18 as a whole.

[0048] The target is not limited to the HMD 18 as discussed above. The position and posture information regarding the user’s hand as the target may be acquired on the basis of images of light-emitting markers disposed on the input apparatus, not depicted. Further, it is possible to use in combination the techniques of image analysis for tracking a part of the user’s body using contour lines and for recognizing the face or the target having a specific pattern through pattern matching. Depending on the configuration of the imaging apparatus 12a, the distance to the target may be identified by measuring the reflection of infrared rays as described above. That is, the techniques of image analysis are not limited to anything specific as long as they serve to acquire the position and posture of a subject through image analysis.

[0049] The sensor value acquisition section 134 is implemented using the input section 38, communication section 32, and main memory 26 in FIG. 3, for example. The sensor value acquisition section 134 acquires the output values of the IMU sensors 64, i.e., angular velocity and acceleration data, from the HMD 18 at a predetermined rate. The sensor value transmission section 136 is implemented using the output section 36 and communication section 32 in FIG. 3, for example. The sensor value transmission section 136 transmits the output values of the IMU sensors 64 to the information processing apparatus 10b at a predetermined rate, the output values having been acquired by the sensor value acquisition section 134.

[0050] The local information generation section 138 is implemented using the CPU 22 and main memory 26 in FIG. 3, for example. The local information generation section 138 generates the position and posture information regarding the HMD 18 in the camera coordinate system of the imaging apparatus 12a using the position and posture information acquired by the image analysis section 132 and the output values of the IMU sensors 64. In the description that follows, the position and posture information obtained in the camera coordinate system specific to each imaging apparatus will be referred to as “local information.” The acceleration and angular velocity on the three axes represented by the output values of the IMU sensors 64 are integrated for use in obtaining the amounts of change in the position and posture of the HMD 18.
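The integration of IMU output values into changes of position described above can be sketched as one dead-reckoning step. This is a simplified illustration (a real implementation would also rotate the acceleration into the world frame using the angular velocity and subtract gravity, which is omitted here):

```python
def imu_step(position, velocity, accel, dt):
    """One dead-reckoning step of IMU integration.

    Integrates acceleration into velocity, then the updated velocity into
    position (semi-implicit Euler). All three vectors are length-3 lists;
    dt is the sampling interval in seconds.
    """
    velocity = [v + a * dt for v, a in zip(velocity, accel)]
    position = [p + v * dt for p, v in zip(position, velocity)]
    return position, velocity
```

Because integration accumulates sensor error over time, such dead-reckoned estimates drift, which is why the embodiment corrects them against the image-based position and posture at every frame.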

[0051] The local information generation section 138 estimates a subsequent position and posture of the HMD 18 using the position and posture information regarding the HMD 18 identified at the time of the preceding frame and the changes in the position and posture of the HMD 18 based on the output values of the IMU sensors 64. By integrating the estimated position and posture information and the information regarding the position and posture obtained through analysis of captured images, the local information generation section 138 identifies with high accuracy the information regarding the position and posture at the time of the next frame. Techniques for state estimation using the Kalman filter, well known in the field of computer vision, may be applied to this process.
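The fusion of the IMU-based prediction with the image-based measurement follows the standard Kalman filter pattern. A one-dimensional sketch of the update step, purely for illustration (the patent does not prescribe this exact formulation):

```python
def kalman_update(x_pred, p_pred, z, r):
    """Blend a predicted state with a measurement, weighted by uncertainty.

    x_pred -- state predicted from the previous frame plus IMU integration
    p_pred -- variance of that prediction
    z      -- measurement from image analysis
    r      -- variance of the measurement
    """
    k = p_pred / (p_pred + r)          # Kalman gain: 0 trusts prediction, 1 trusts measurement
    x = x_pred + k * (z - x_pred)      # corrected estimate
    p = (1.0 - k) * p_pred             # uncertainty shrinks after fusing the measurement
    return x, p
```

Equal confidence in prediction and measurement yields a gain of 0.5, i.e., their average; a noisy measurement (large `r`) pulls the estimate only slightly away from the prediction.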

[0052] The local information reception section 140 is implemented using the communication section 32 and input section 38 in FIG. 3, for example. The local information reception section 140 receives local information generated by the information processing apparatus 10b having the sub functions. The global information generation section 142 is implemented using the CPU 22 and main memory 26 in FIG. 3, for example. The global information generation section 142 generates the position and posture information regarding the HMD 18 in the world coordinate system independent of the imaging apparatuses 12a and 12b using at least either the local information generated by the local information generation section 138 in the own apparatus or the local information transmitted from the information processing apparatus 10b having the sub functions. In the description that follows, the position and posture information thus generated will be referred to as “global information.”

[0053] More specifically, the global information generation section 142 includes a transformation parameter acquisition section 144, an imaging apparatus switching section 146, and a coordinate transformation section 148. The transformation parameter acquisition section 144 acquires transformation parameters for transforming the position and posture information in each camera coordinate system into the world coordinate system by identifying the position and posture information regarding the imaging apparatuses 12a and 12b in the world coordinate system. The acquisition of the transformation parameters at this time takes advantage of the fact that if the HMD 18 is found in a region where the fields of view of the imaging apparatuses 12a and 12b overlap with each other (the region will be referred to as “field-of-view overlap region” hereunder), the local information obtained in the camera coordinate systems of both imaging apparatuses proves to be the same when transformed into global information.
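The use of the field-of-view overlap region can be illustrated in two dimensions: a single simultaneous observation of the target's position and yaw by both imaging apparatuses is enough to recover the relative rotation and translation between their camera coordinate systems. This is a hypothetical sketch under that 2D simplification; the patent does not specify this particular formulation, and a real system would aggregate many observations to average out error:

```python
import math

def relative_transform_2d(pos_a, yaw_a, pos_b, yaw_b):
    """From one simultaneous sighting of the target in the overlap region,
    recover the transform taking camera-B coordinates into camera-A
    coordinates: rotation angle theta and translation (tx, ty).

    pos_a, yaw_a -- target position and orientation seen by camera A
    pos_b, yaw_b -- the same physical state seen by camera B
    """
    theta = yaw_a - yaw_b                      # relative orientation of the two cameras
    c, s = math.cos(theta), math.sin(theta)
    # The same physical point must satisfy pos_a = R * pos_b + t, so:
    tx = pos_a[0] - (c * pos_b[0] - s * pos_b[1])
    ty = pos_a[1] - (s * pos_b[0] + c * pos_b[1])
    return theta, (tx, ty)

def apply_transform_2d(theta, t, p):
    """Map a point p from camera-B coordinates into camera-A coordinates."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])
```

Once recovered, the transform lets local information from camera B be expressed in camera A's frame (and hence in the world coordinate system), even after the target has left the overlap region.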

[0054] When the transformation parameters are derived using the local information actually obtained during operation, the coordinate transformation is accomplished advantageously by taking into consideration error characteristics that may occur upon generation of the local information by each of the information processing apparatuses 10a and 10b. Another advantage is that there is no need to position the imaging apparatuses 12a and 12b with high precision when they are arranged. Also, the transformation parameter acquisition section 144 gradually corrects the transformation parameters in such a manner that the position and posture information thus obtained regarding the imaging apparatuses 12a and 12b in the world coordinate system will be smoothed in the time direction or that their posture values will approach normal values.

[0055] The imaging apparatus switching section 146 switches between the imaging apparatuses whose fields of view cover the HMD 18 to select the imaging apparatus for use in acquiring global information. In the case where the HMD 18 is found only in the image captured by one imaging apparatus, the global information is obviously generated using the local information generated by the information processing apparatus corresponding to that imaging apparatus. In the case where the HMD 18 is found in the fields of view of multiple imaging apparatuses, one of them is selected in accordance with predetermined rules. For example, the imaging apparatus closest to the HMD 18 is selected, and the global information is generated using the local information generated by the information processing apparatus corresponding to the selected imaging apparatus.
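The example switching rule in this paragraph (select the imaging apparatus closest to the HMD 18) might be sketched as follows; the camera identifiers and data shapes are assumptions for illustration:

```python
import math

def select_camera(local_positions):
    """Pick the camera to use for global information generation.

    local_positions maps a camera identifier to the target's position in
    that camera's coordinate system (origin at the optical center), or
    None when the target is outside that camera's field of view.
    """
    visible = {cid: p for cid, p in local_positions.items() if p is not None}
    if not visible:
        return None  # target is outside every field of view
    # In camera coordinates the distance to the target is simply the norm
    # of the local position, so the nearest camera has the smallest norm.
    return min(visible, key=lambda cid: math.hypot(*visible[cid]))
```

Proximity is only one possible rule; the confidence of the image analysis or hysteresis to avoid rapid back-and-forth switching could equally serve as selection criteria.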

……
……
……
