Sony Patent | Information Processing Device, Information Processing Method, And Recording Medium

Publication Number: 20200211275

Publication Date: 20200702

Applicants: Sony

Abstract

[Problem] To provide a technique to reduce the volume of data for a model reconstructed from an object in real space and to reconstruct the shape of the object in a more preferable manner. [Solution] An information processing device includes: a first estimation unit configured to estimate a first distribution of geometric structure information regarding at least part of a face of an object in real space, in accordance with a result of a polarization sensor detecting each of a plurality of beams of polarized light having different polarization directions from each other; a second estimation unit configured to estimate a second distribution of information related to continuity of a geometric structure in the real space based on an estimation result of the first distribution; and a processing unit configured to determine a size of unit data for simulating three-dimensional space in accordance with the second distribution.

FIELD

[0001] The present disclosure relates to an information processing device, an information processing method, and a recording medium.

BACKGROUND

[0002] In recent years, due to advancement of image identification techniques, it is becoming possible to three-dimensionally estimate (or measure) a position, an orientation, a shape, and the like of an object in real space (hereinafter, will also be referred to as a “real object”) based on an image captured by an imaging unit such as a digital camera. It is also becoming possible to use the position, the orientation, the shape, and the like of the real object estimated to reconstruct (restructure) a three-dimensional shape of the real object as a model, e.g., a polygon model. For example, Non Patent Literature 1 and Non Patent Literature 2 disclose an example of a technique to reconstruct the three-dimensional shape of the real object as a model based on a distance (depth) measured from the real object.

[0003] Further, in application of the technique described above, it is becoming possible to estimate (identify) a position and/or an orientation (i.e., a self-position) of a predetermined viewpoint, such as the imaging unit capturing the image of the real object, in the real space.

CITATION LIST

Non Patent Literature

[0004] Non Patent Literature 1: Matthias Niessner et al., “Real-time 3D Reconstruction at Scale using Voxel Hashing”, ACM Transactions on Graphics (TOG), 2013, [searched on Aug. 11, 2017], Internet <https://graphics.stanford.edu/~niessner/papers/2013/4hashing/niessner2013hashing.pdf>

[0005] Non Patent Literature 2: Frank Steinbruecker et al., “Volumetric 3D Mapping in Real-Time on a CPU”, ICRA, 2014, [searched on Aug. 11, 2017], Internet <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.601.1521&rep=rep1&type=pdf>

SUMMARY

Technical Problem

[0006] When reconstructing the three-dimensional shape of the object in the real space as the model described above, in other words, when reconstructing three-dimensional space, a wider region targeted for modeling tends to require a larger volume of data for the model. Further, when reconstructing the three-dimensional shape of the object at higher accuracy, the volume of the data for the model tends to be even larger.

[0007] In view of the respects described above, the present disclosure provides a technique to reduce the volume of the data for the model reconstructed from the object in the real space and to reconstruct the shape of the object in a more preferable manner.

Solution to Problem

[0008] According to the present disclosure, an information processing device is provided that includes: a first estimation unit configured to estimate a first distribution of geometric structure information regarding at least a part of a face of an object in real space, in accordance with a result of a polarization sensor detecting each of a plurality of beams of polarized light having different polarization directions from each other; a second estimation unit configured to estimate a second distribution of information related to continuity of a geometric structure in the real space based on an estimation result of the first distribution; and a processing unit configured to determine a size of unit data for simulating three-dimensional space in accordance with the second distribution.

[0009] Moreover, according to the present disclosure, an information processing method performed by a computer is provided that includes: estimating a first distribution of geometric structure information regarding at least a part of a face of an object in real space, in accordance with a result of a polarization sensor detecting each of a plurality of beams of polarized light having different polarization directions from each other; estimating a second distribution of information related to continuity of a geometric structure in the real space based on an estimation result of the first distribution; and determining a size of unit data for simulating three-dimensional space in accordance with the second distribution.

[0010] Moreover, according to the present disclosure, a recording medium is provided on which is recorded a program for causing a computer to execute: estimating a first distribution of geometric structure information regarding at least a part of a face of an object in real space, in accordance with a result of a polarization sensor detecting each of a plurality of beams of polarized light having different polarization directions from each other; estimating a second distribution of information related to continuity of a geometric structure in the real space based on an estimation result of the first distribution; and determining a size of unit data for simulating three-dimensional space in accordance with the second distribution.
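The claimed pipeline ends by sizing the unit data (e.g., voxels) according to the estimated continuity of the geometric structure: continuous, near-planar regions can be represented coarsely, while discontinuous regions need finer units. The publication does not give an implementation; the following is only an illustrative sketch, in which the function name, the normal-coherence heuristic, and the size bounds are assumptions rather than the patented method.

```python
import numpy as np

def voxel_size_from_continuity(normals, coarse=0.04, fine=0.01):
    """Pick a voxel edge length for a region from the spread of its surface
    normals: a geometrically continuous (near-planar) region tolerates
    coarse voxels, while a region with varied normals gets finer ones.

    normals: (N, 3) array of unit surface normals sampled in the region,
    e.g. estimated from polarization images.
    """
    # The mean of unit vectors has length 1.0 when all normals agree and
    # shrinks toward 0.0 as the directions disperse.
    coherence = np.linalg.norm(np.asarray(normals).mean(axis=0))
    return fine + (coarse - fine) * coherence

flat = np.tile([0.0, 0.0, 1.0], (100, 1))                 # a planar patch
edge = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]] * 50)  # a sharp corner
print(voxel_size_from_continuity(flat))  # coarsest size for a planar patch
print(voxel_size_from_continuity(edge))  # a finer size near the corner
```

A real system would evaluate such a criterion per region of the second distribution (the geometric continuity map) and merge or split voxels accordingly.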

Advantageous Effects of Invention

[0011] As has been described above, the present disclosure provides a technique to reduce the volume of data for a model reconstructed from an object in real space and to reconstruct the shape of the object in a more preferable manner.

[0012] Note that the effects described above are not necessarily limitative. In addition to or in place of the effects described above, any one of effects described in this specification or other effects grasped from this specification may be encompassed within the present disclosure.

BRIEF DESCRIPTION OF DRAWINGS

[0013] FIG. 1 is an explanatory diagram illustrating a schematic configuration example of an information processing system according to an embodiment of the present disclosure.

[0014] FIG. 2 is an explanatory diagram illustrating a schematic configuration example of an input/output device according to the embodiment.

[0015] FIG. 3 is a block diagram illustrating a functional configuration example of the information processing system according to the embodiment.

[0016] FIG. 4 is an explanatory diagram illustrating an exemplary flow of a process performed in a geometric continuity estimation unit.

[0017] FIG. 5 is an explanatory diagram illustrating an overview of a geometric continuity map.

[0018] FIG. 6 is an explanatory diagram illustrating an overview of the geometric continuity map.

[0019] FIG. 7 is an explanatory diagram illustrating an exemplary flow of a process performed in an integrated processing unit.

[0020] FIG. 8 is an explanatory diagram illustrating an exemplary flow of a process to merge voxels into one and/or split the voxel.

[0021] FIG. 9 is an explanatory diagram illustrating an exemplary result of controlling a size of the voxel.

[0022] FIG. 10 is a flowchart illustrating an exemplary flow of a series of process steps performed in the information processing system according to the embodiment.

[0023] FIG. 11 is a functional block diagram illustrating a configuration example of a hardware configuration in an information processing device included in an information processing system according to an embodiment of the present disclosure.

DESCRIPTION OF EMBODIMENTS

[0024] Hereinafter, a preferred embodiment of the present disclosure will be described in detail with reference to the accompanying drawings. Note that, in this specification and the accompanying drawings, structural elements that have substantially identical functions and structures are denoted with the same reference signs, and repeated explanation of these structural elements is thus omitted.

[0025] Note that the description will be provided in the following order.

[0026] 1. Schematic configuration

[0027] 1.1. System configuration

[0028] 1.2. Configuration of input/output device

[0029] 2. Study of 3D modeling

[0030] 3. Technical feature

[0031] 3.1. Functional configuration

[0032] 3.2. Process

[0033] 4. Hardware configuration

[0034] 5. Conclusion

[0035] 1. Schematic Configuration

[0036] <1.1. System Configuration>

First, a schematic configuration example of an information processing system according to an embodiment of the present disclosure will be described with reference to FIG. 1. FIG. 1 is an explanatory diagram illustrating the schematic configuration example of the information processing system according to the embodiment of the present disclosure, and illustrates an example of displaying various contents to a user based on a typically-called augmented reality (AR) technique.

[0037] In FIG. 1, an object positioned in real space (e.g., a real object) is schematically illustrated with reference sign m111. Additionally, virtual contents (e.g., virtual objects), each displayed to be superimposed in the real space, are schematically illustrated with reference signs v131 and v133. In other words, an information processing system 1 according to this embodiment displays to the user the object in the real space, such as the real object m111, with the virtual object superimposed on the object in the real space by using, for example, the AR technique. Note that FIG. 1 illustrates both the real object and the virtual objects such that the feature of the information processing system according to this embodiment is more easily identified.

[0038] As illustrated in FIG. 1, the information processing system 1 according to this embodiment includes an information processing device 10 and an input/output device 20. The information processing device 10 and the input/output device 20 are configured to transmit/receive information to/from each other via a predetermined network. The type of the network connecting the information processing device 10 with the input/output device 20 is not particularly limited. As a specific example, the network may be a typical wireless network such as a Wi-Fi (registered trademark) standard network. Alternatively, as another example, the network may be the Internet, a leased line, a local area network (LAN), a wide area network (WAN), or the like. Still alternatively, the network may include a plurality of networks or may be at least partially wired.

[0039] The input/output device 20 is configured to acquire various input information and to display various output information for the user holding the input/output device 20. The information processing device 10 is configured to control the input/output device 20 to display the output information based on the input information acquired by the input/output device 20. For example, the input/output device 20 acquires information to identify the real object m111 (e.g., an image of the real space captured) as the input information, and outputs the information acquired to the information processing device 10. The information processing device 10 identifies a position and/or an orientation of the real object m111 in the real space based on the information acquired from the input/output device 20. Then, based on a result of the identification, the information processing device 10 causes the input/output device 20 to display the virtual object v131 and the virtual object v133. Under this control, the input/output device 20 displays to the user the virtual objects v131 and v133 based on the AR technique, in a way that the virtual objects v131 and v133 are superimposed on the real object m111.

[0040] The input/output device 20 is, for example, a typically-called head mounted device that is worn on at least part of a head of the user, and may be configured to detect a viewpoint of the user. With such a configuration, the information processing device 10 identifies, for example, a desired target at which the user gazes (e.g., the real object m111, the virtual object v131, the virtual object v133, or the like) based on the viewpoint of the user detected by the input/output device 20. In this case, the information processing device 10 may specify the desired target as an operational target. Alternatively, the information processing device 10 may regard a predetermined operation of the input/output device 20 input by the user as a trigger to identify a target to which the viewpoint of the user is directed, and specify the target as the operational target. Accordingly, the information processing device 10 may specify the operational target and execute a process related to the operational target, so as to provide various services to the user via the input/output device 20.

[0041] As has been described, the information processing system according to this embodiment identifies the object in the real space (real object), and here, a more specific configuration example of the information processing system will be described. As illustrated in FIG. 1, the input/output device 20 according to this embodiment includes a depth sensor 201 and a polarization sensor 230.

[0042] The depth sensor 201 acquires information to estimate a distance between a predetermined viewpoint and the object positioned in the real space (the real object), and transmits the information acquired to an information processing device 100. Hereinafter, the information that the depth sensor 201 acquires to estimate the distance between the predetermined viewpoint and the real object will also be referred to as “depth information”.

[0043] In the example illustrated in FIG. 1, the depth sensor 201 is a typical stereo camera that includes a plurality of imaging units, i.e., an imaging unit 201a and an imaging unit 201b. The imaging units 201a and 201b capture images of the object positioned in the real space from respective viewpoints that are different from each other. In this case, the depth sensor 201 transmits the image captured by each of the imaging units 201a and 201b to the information processing device 100.

[0044] With this configuration, a plurality of images are captured from the different viewpoints, and based on, for example, parallax between the plurality of images, it is possible to estimate (calculate) the distance between the predetermined viewpoint (e.g., a position of the depth sensor 201) and a subject (i.e., the real object captured in each of the images). Thus, it is also possible, for example, to generate a typically-called depth map where the distance estimated between the predetermined viewpoint and the subject is mapped out on an imaging plane.
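The parallax-based estimate described above follows the standard pinhole stereo relation Z = f·B/d (focal length times baseline, divided by disparity). As a minimal illustration of that relation, not necessarily the exact computation performed by the device:

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Triangulated depth for one matched pixel pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# E.g. a 700 px focal length, a 6 cm baseline between the two imaging
# units, and a 21 px disparity put the subject about 2 m away.
print(depth_from_disparity(700.0, 0.06, 21.0))
```

Applying this per pixel of a disparity image yields the depth map mentioned above.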

[0045] Note that, when it is possible to estimate the distance between the predetermined viewpoint and the object in the real space (real object), the configuration of the part corresponding to the depth sensor 201 and the method to estimate the distance are not particularly limited. As a specific example, the distance between the predetermined viewpoint and the real object may be measured based on a method such as multi-camera stereo, moving parallax, time of flight (TOF), or a structured light system. Here, the TOF measures, for each pixel, the time taken by light, e.g., infrared light, radiated to the subject (i.e., the real object) to return after reflecting from the subject. Based on a result of the measurement, an image including the distance (depth) to the subject, in other words, the depth map, is obtained. In the structured light system, the subject is irradiated with a pattern of light, e.g., infrared light, and an image is captured; based on a change in the pattern obtained from the captured image, the depth map including the distance (depth) to the subject is obtained. The moving parallax is a method of measuring the distance to the subject based on parallax, even with a monocular camera. Specifically, the monocular camera moves to capture images of the subject from different viewpoints, and the distance to the subject is measured based on the parallax between the captured images. Note that, with various sensors that identify the distance and direction of movement of the camera, it is possible to measure the distance to the subject more accurately. The configuration of the depth sensor 201 (e.g., the monocular camera, the stereo camera, or the like) may be changed in accordance with the method of measuring the distance.
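For the TOF method mentioned above, the distance follows directly from the measured round-trip time of light, d = c·t/2 (the light travels to the subject and back). A one-line sketch, for illustration only:

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0

def tof_distance(round_trip_s):
    """Distance from one TOF pixel: the measured time covers the path to
    the subject and back, so halve the round-trip."""
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0

# A 10 ns round trip corresponds to roughly 1.5 m.
print(tof_distance(10e-9))
```

Evaluating this per pixel over a TOF sensor frame produces the depth map described in the paragraph above.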

[0046] The polarization sensor 230 detects light polarized in a predetermined polarization direction (hereinafter, will be simply referred to as “polarized light”) out of light reflecting from the object positioned in the real space, and transmits information corresponding to a result of detecting the polarized light to the information processing device 100. In the information processing system 1 according to this embodiment, the polarization sensor 230 is configured to detect a plurality of beams of polarized light (more preferably, three or more beams of polarized light), each having a different polarization direction from the others. Hereinafter, the information corresponding to the polarized light detected by the polarization sensor 230 will also be referred to as “polarization information”.

[0047] As a specific example, the polarization sensor 230 is a typically-called polarization camera, and captures a polarization image based on the light polarized in the predetermined polarization direction. Here, the polarization image corresponds to the information in which the polarization information is mapped out on the imaging plane (in other words, an image plane) of the polarization camera. In this case, the polarization sensor 230 transmits the polarization image captured to the information processing device 100.
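With intensities measured behind three or more polarization directions, the sinusoidal model I(θ) = I_avg + A·cos(2(θ − φ)) can be fitted per pixel to recover the azimuth φ of the polarization, a standard step in shape-from-polarization. The sketch below assumes measurements at 0°, 45°, and 90°; the function name and the three-angle choice are illustrative, not taken from the publication.

```python
import math

def polarization_params(i0, i45, i90):
    """Fit I(theta) = i_avg + amp * cos(2 * (theta - phi)) to intensities
    measured behind linear polarizers at 0, 45 and 90 degrees, returning
    the mean intensity, degree of linear polarization, and azimuth phi."""
    i_avg = (i0 + i90) / 2.0
    c = (i0 - i90) / 2.0   # amp * cos(2 * phi)
    s = i45 - i_avg        # amp * sin(2 * phi)
    amp = math.hypot(c, s)
    phi = 0.5 * math.atan2(s, c)
    dolp = amp / i_avg if i_avg > 0 else 0.0
    return i_avg, dolp, phi

# Synthetic check: a surface with azimuth 0.4 rad and 30% polarization.
truth_avg, truth_amp, truth_phi = 1.0, 0.3, 0.4
samples = [truth_avg + truth_amp * math.cos(2 * (t - truth_phi))
           for t in (0.0, math.pi / 4, math.pi / 2)]
print(polarization_params(*samples))  # recovers approximately (1.0, 0.3, 0.4)
```

Note that the azimuth recovered this way is ambiguous by 180 degrees, which is one reason polarization information is typically combined with other cues, such as the depth information, when estimating surface normals.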

[0048] Additionally, the polarization sensor 230 may preferably be configured to capture the polarized light coming from a region that at least partially overlaps (ideally, substantially matches) the region in the real space from which the depth sensor 201 acquires the information to estimate the distance. Note that, when each of the depth sensor 201 and the polarization sensor 230 is fixed at a predetermined position, information indicating the position of each of the depth sensor 201 and the polarization sensor 230 in the real space may be obtained in advance and used as known information.

[0049] Further, as illustrated in FIG. 1, the depth sensor 201 and the polarization sensor 230 are preferably held in a shared device (e.g., the input/output device 20). In this case, a relative positional relationship that each of the depth sensor 201 and the polarization sensor 230 has with respect to the shared device may be previously calculated. Thus, based on a position and an orientation of the shared device, it is possible, for example, to estimate a position and an orientation of each of the depth sensor 201 and the polarization sensor 230.
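Estimating each sensor's pose from the shared device's pose amounts to composing rigid transforms: T(world→sensor) = T(world→device) · T(device→sensor), where the second factor is the previously calculated relative positional relationship. A small sketch with hypothetical poses (the helper name and the example numbers are made up for illustration):

```python
import numpy as np

def make_pose(yaw_rad, translation):
    """A 4x4 homogeneous pose: rotation about the z axis plus a translation."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    T[:3, 3] = translation
    return T

# Hypothetical poses: the device in the world, and the sensor's fixed
# mounting offset on the device (the "relative positional relationship").
T_world_device = make_pose(np.pi / 2, [1.0, 0.0, 0.0])
T_device_sensor = make_pose(0.0, [0.1, 0.0, 0.0])

# The sensor's pose in the world is the composition of the two.
T_world_sensor = T_world_device @ T_device_sensor
print(T_world_sensor[:3, 3])  # approximately (1.0, 0.1, 0.0)
```

Because the offset is rigid, only the device's pose needs to be tracked at run time; each sensor's pose then follows from one matrix product.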

[0050] Further, the shared device, in which the depth sensor 201 and the polarization sensor 230 are held (e.g., the input/output device 20) may be configured to be movable. In this case, a technique called self-position estimation may be applied to estimate the position and the orientation of the shared device in the real space.

[0051] Next, as a more specific example of the technique to estimate a position and an orientation of a predetermined device in the real space, a technique called simultaneous localization and mapping (SLAM) will be described. SLAM uses various sensors, an encoder, an imaging unit such as a camera, or the like to concurrently perform the self-position estimation and construct a map of the environment. As a more specific example, based on a moving image captured by the imaging unit, SLAM (particularly visual SLAM) sequentially restores a three-dimensional shape of the scene (or the subject) captured. Then, SLAM correlates the restored result of the captured scene with the detected position and orientation of the imaging unit, so as to construct the map of the environment surrounding the imaging unit and estimate the position and the orientation of the imaging unit in the environment. Note that, with various sensors, such as an acceleration sensor or an angular velocity sensor, provided to the device in which the imaging unit is held, it is possible to estimate the position and the orientation of the imaging unit based on results detected by the various sensors (as relative change information). Naturally, as long as the position and the orientation of the imaging unit can be estimated, the estimation method is not necessarily limited to one based on the results detected by the various sensors, such as the acceleration sensor or the angular velocity sensor.

[0052] Further, at least one of the depth sensor 201 and the polarization sensor 230 may be configured to be movable separately from the other. In this case, the depth sensor 201 configured to be movable or the polarization sensor 230 configured to be movable preferably has its own position and its own orientation in the real space estimated separately, based on, for example, the self-position estimation technique described above, or other techniques.

[0053] The information processing device 100 acquires the depth information and the polarization information from the depth sensor 201 and the polarization sensor 230, but may instead acquire the information above from the input/output device 20. In this case, for example, the information processing device 100 may identify the real object positioned in the real space based on the depth information and the polarization information acquired, so as to generate a model in which the three-dimensional shape of the real object is reconstructed. Further, based on the depth information and the polarization information acquired, the information processing device 100 may correct the model generated. A process to generate the model and a process to correct the model will be separately described in detail later.

[0054] Note that the configurations described above are merely illustrative, and thus the system configuration of the information processing system 1 according to this embodiment is not necessarily limited to the example illustrated in FIG. 1. As a specific example, the input/output device 20 and the information processing device 10 may be integrally formed. A configuration and a process of each of the input/output device 20 and the information processing device 10 will be separately described in detail later.

[0055] The schematic configuration example of the information processing system according to the embodiment of the present disclosure has been described above with reference to FIG. 1.

[0056] <1.2. Configuration of Input/Output Device>

[0057] Next, a schematic configuration example of the input/output device 20 according to this embodiment as illustrated in FIG. 1 will be described with reference to FIG. 2. FIG. 2 is an explanatory diagram illustrating the schematic configuration example of the input/output device according to this embodiment.

[0058] As has been described, the input/output device 20 according to this embodiment is the typically-called head mounted device that is worn on at least part of the head of the user. For example, in the example illustrated in FIG. 2, the input/output device 20 is a typically-called eyewear (eyeglasses) device, and at least one of a lens 293a and a lens 293b is a transmission-type display (a display unit 211). The input/output device 20 includes the imaging unit 201a, the imaging unit 201b, the polarization sensor 230, an operation unit 207, and a holding unit 291, each corresponding to a part of a frame of the eyeglasses. Further, the input/output device 20 may include an imaging unit 203a and an imaging unit 203b. Note that, hereinafter, various descriptions will be provided on an assumption that the input/output device 20 includes the imaging units 203a and 203b. When the input/output device 20 is worn on the head of the user, the holding unit 291 holds each of the display unit 211, the imaging unit 201a, the imaging unit 201b, the polarization sensor 230, the imaging unit 203a, the imaging unit 203b, and the operation unit 207 in a predetermined position with respect to the head of the user. Note that the imaging unit 201a, the imaging unit 201b, and the polarization sensor 230 respectively correspond to the imaging unit 201a, the imaging unit 201b, and the polarization sensor 230 illustrated in FIG. 1. While not illustrated in FIG. 2, the input/output device 20 may also include a sound collecting unit for collecting a voice of the user.

[0059] Here, a more specific configuration of the input/output device 20 will be described. For example, in the example illustrated in FIG. 2, the lens 293a corresponds to a right-eye lens, and the lens 293b corresponds to a left-eye lens. In other words, when the input/output device 20 is worn, the holding unit 291 holds the display unit 211 (i.e., the lenses 293a and 293b) in a way that the display unit 211 is positioned in front of eyes of the user.

[0060] Each of the imaging units 201a and 201b is the typical stereo camera, and is held by the holding unit 291 to face in a direction in which the head of the user faces (i.e., frontward of the user) when the input/output device 20 is worn on the head of the user. In this state, the imaging unit 201a is held in a vicinity of a right eye of the user, and the imaging unit 201b is held in a vicinity of a left eye of the user. With such a configuration, the imaging units 201a and 201b capture the images of the subject positioned frontward of the input/output device 20 (i.e., the real object positioned in the real space) from respective positions that are different from each other. Accordingly, the input/output device 20 acquires the images of the subject positioned frontward of the user; and concurrently, based on the parallax between the images captured by the imaging units 201a and 201b, it is possible to calculate the distance from the input/output device 20 (in addition to the viewpoint of the user) to the subject.

[0061] As has been described, when it is possible to measure the distance between the input/output device 20 and the subject, the configuration or the method to measure the distance is not particularly limited.

[0062] Each of the imaging units 203a and 203b is also held by the holding unit 291 to have an eyeball of the user positioned within the corresponding imaging range when the input/output device 20 is worn on the head of the user. As a specific example, the imaging unit 203a is held to have the right eye of the user positioned in the imaging range. With such a configuration, based on an image of the right eyeball captured by the imaging unit 203a and a positional relationship between the right eye and the imaging unit 203a, it is possible to identify a direction in which a viewpoint from the right eye faces. Similarly, the imaging unit 203b is held to have the left eye of the user positioned within the imaging range. In other words, based on an image of the left eyeball captured by the imaging unit 203b and a positional relationship between the left eye and the imaging unit 203b, it is possible to identify a direction in which a viewpoint from the left eye faces. In the example illustrated in FIG. 2, the input/output device 20 is configured to include both the imaging units 203a and 203b, but alternatively may include only one of the imaging units 203a and 203b.

[0063] The polarization sensor 230 here corresponds to the polarization sensor 230 illustrated in FIG. 1, and is held by the holding unit 291 to face in the direction in which the head of the user faces (i.e., frontward of the user) when the input/output device 20 is worn on the head of the user. With such a configuration, the polarization sensor 230 captures the polarization image in space in front of the eyes of the user wearing the input/output device 20. Note that the position of the polarization sensor 230 illustrated in FIG. 2 is merely illustrative; and when the polarization sensor 230 is capable of capturing the polarization image in the space in front of the eyes of the user wearing the input/output device 20, the position of the polarization sensor 230 is not limited.

[0064] The operation unit 207 is configured to receive the operation of the input/output device 20 input by the user. The operation unit 207 may be an input device such as a touch panel or a button. The operation unit 207 is held by the holding unit 291 at a predetermined position in the input/output device 20. For example, in the example illustrated in FIG. 2, the operation unit 207 is held at a position corresponding to a temple of the eyeglasses.

[0065] The input/output device 20 according to this embodiment may be provided with, for example, an acceleration sensor or an angular velocity sensor (a gyro sensor) to detect a movement of the head of the user wearing the input/output device 20 (in other words, a movement of the input/output device 20 itself). As a specific example of detecting the movement of the head of the user, the input/output device 20 may detect each component in a yaw direction, a pitch direction, and a roll direction, so as to identify a change in at least one of the position and the orientation of the head of the user.
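Detecting the yaw, pitch, and roll components and accumulating them over time yields the change in head orientation. A minimal single-axis sketch of that dead-reckoning step, with a hypothetical gyro reading (the same update applies per axis):

```python
def integrate_angle(angle_rad, rate_rad_s, dt_s):
    """Dead-reckon one orientation component (yaw, pitch, or roll) by
    integrating the corresponding angular-velocity reading over one
    sensor period."""
    return angle_rad + rate_rad_s * dt_s

# A hypothetical gyro reading: 0.5 rad/s about the vertical axis,
# sampled every 0.2 s, turns the estimated yaw by 0.1 rad.
yaw = integrate_angle(0.0, 0.5, 0.2)
print(yaw)
```

In practice such integrated estimates drift, which is why they are combined with the visual estimates (e.g., SLAM) described later.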

[0066] The configuration described above causes the input/output device 20 according to this embodiment to identify a change in its own position and/or orientation in accordance with the movement of the head of the user. The configuration also causes the input/output device 20 to display the virtual content (i.e., the virtual object) on the display unit 211 based on the AR technique in the way that the virtual content is superimposed on the real object positioned in the real space. In this state, the input/output device 20 may estimate its own position and orientation in the real space (i.e., the self-position) based on, for example, the technique called SLAM or the like that has been described above, and use a result of the estimation to display the virtual object.

[0067] Examples of a head mounted display (HMD) device applicable as the input/output device 20 include a see-through HMD, a video see-through HMD, and a retinal projection HMD.

[0068] The see-through HMD uses, for example, a half mirror or a transparent light guide plate in order to hold a virtual image optical system formed of a transparent light guide unit or the like in front of the eyes of the user and display an image inside the virtual image optical system. Thus, when wearing the see-through HMD, the user views the image displayed inside the virtual image optical system, while including an external landscape within a field of view of the user. With such a configuration, the see-through HMD may use, for example, the AR technique to display an image of the virtual object to be superimposed on an optical image of the real object positioned in the real space, in accordance with at least one of a position and an orientation of the see-through HMD that has been identified. A specific example of the see-through HMD includes a typically-called eyeglasses wearable device in which a part corresponding to each of lenses of the eyeglasses is configured as the virtual image optical system. For example, the input/output device 20 illustrated in FIG. 2 corresponds to the example of the see-through HMD.

[0069] The video see-through HMD is worn on the head or face of the user so as to cover the user's eyes, with its display unit (e.g., a display) held in front of the eyes. The video see-through HMD includes an imaging unit configured to capture an image of the surrounding landscape, and displays, on the display unit, the image of the landscape in front of the user captured by the imaging unit. With such a configuration, the user wearing the video see-through HMD has difficulty directly viewing the external landscape, but can confirm the external landscape through the image displayed on the display unit. In this state, the video see-through HMD may use, for example, the AR technique to display the virtual object superimposed on the image of the external landscape, in accordance with at least one of the identified position and orientation of the video see-through HMD.

[0070] The retinal projection HMD holds a projection unit in front of the eyes of the user, and the projection unit projects an image onto each of the eyes of the user such that the image is superimposed on the external landscape. More specifically, in the retinal projection HMD, the projection unit projects the image directly onto the retina of each eye of the user such that the image is formed on the retina. Such a configuration allows the user to view a clearer image even when the user is short-sighted or far-sighted. Additionally, the user wearing the retinal projection HMD views the image projected from the projection unit while keeping the external landscape within his/her field of view. With such a configuration, the retinal projection HMD uses, for example, the AR technique to display the image of the virtual object superimposed on the optical image of the real object positioned in the real space, in accordance with at least one of the identified position and orientation of the retinal projection HMD.

[0071] The configuration example of the input/output device 20 according to this embodiment has been described above on the assumption that the AR technique is applied, but the configuration of the input/output device 20 is not limited thereto. For example, on the assumption that a VR technique is applied, the input/output device 20 according to this embodiment may employ an HMD called an immersive HMD. As with the video see-through HMD, the immersive HMD is worn so as to cover the eyes of the user, with its display unit (e.g., a display) held in front of the eyes. Thus, the user wearing the immersive HMD has difficulty directly viewing the external landscape (i.e., the real-world landscape), and views only the image displayed on the display unit. With such a configuration, the immersive HMD provides a sense of immersion to the user viewing the image.

[0072] Note that the configuration of the input/output device 20 described above is merely illustrative and thus not necessarily limited to the configuration illustrated in FIG. 2. As a specific example, in accordance with a use or a function of the input/output device 20, an additional configuration may be employed for the input/output device 20. As a specific example for the additional configuration, the input/output device 20 may include, as an output unit configured to present information to the user, a sound output unit (e.g., a speaker or the like) for presenting voice or sound, an actuator for providing tactile or force feedback, or others.

[0073] The schematic configuration example of the input/output device according to the embodiment of the present disclosure has been described above with reference to FIG. 2.

[0074] 2. Study of 3D Modeling

[0075] Next, an overview of techniques for 3D modeling to reconstruct three-dimensional space, such as reconstructing the three-dimensional shape of an object in the real space (real object) as a model, e.g., a polygon model, will be described. Then, the technical problem addressed by the information processing system according to this embodiment will be summarized.

[0076] The 3D modeling uses, for example, an algorithm that holds information indicating the position of the object in the three-dimensional space; holds data (hereinafter also referred to as "3D data") such as the distance from the surface of the object or a weight based on the number of observations; and updates the data based on information (e.g., depth) acquired from a plurality of viewpoints. A generally known example of such 3D modeling techniques uses the distance (depth) from the object in the real space detected by a depth sensor or the like.
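The update scheme described above can be illustrated with a minimal sketch, assuming a dict-based voxel grid that stores, per voxel, a signed distance to the nearest observed surface and a weight equal to the number of observations; new observations are merged by a weighted running average. All class and method names here are illustrative, not taken from the patent.

```python
import math

class VoxelGrid:
    def __init__(self, voxel_size):
        self.voxel_size = voxel_size
        self.data = {}  # (i, j, k) -> [distance, weight]

    def _key(self, point):
        # Quantize a 3D point to integer voxel indices.
        return tuple(int(math.floor(c / self.voxel_size)) for c in point)

    def integrate(self, point, distance):
        """Merge one observation (a point and its signed surface distance)."""
        key = self._key(point)
        if key not in self.data:
            self.data[key] = [distance, 1.0]
        else:
            d, w = self.data[key]
            # Weighted running average over all observations so far.
            self.data[key] = [(d * w + distance) / (w + 1.0), w + 1.0]

grid = VoxelGrid(voxel_size=0.1)
# Two viewpoints observe the same voxel with slightly different distances.
grid.integrate((0.05, 0.05, 0.05), 0.02)
grid.integrate((0.06, 0.04, 0.05), 0.04)
print(grid.data[(0, 0, 0)])  # roughly [0.03, 2.0]
```

The weight doubles as the observation count, so later observations shift the stored distance less and less, which is the usual behavior of multi-view integration schemes of this kind.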

[0077] On the other hand, a depth sensor represented by a TOF sensor or the like tends to have low resolution, and as the distance to the object to be detected increases, the detection accuracy tends to degrade and the influence of noise tends to increase. With such characteristics, when performing the 3D modeling based on the detected depth, it is difficult to acquire information related to the geometric structure (in other words, the geometric feature) of the object in the real space (hereinafter also referred to as "geometric structure information") precisely and highly accurately with a relatively small number of observations.

[0078] In view of these circumstances, the information processing system according to this embodiment, as previously described, includes a polarization sensor configured to detect polarized light reflected from the object positioned in the real space, and uses the polarization information corresponding to the detected polarized light for the 3D modeling. Generally, when the geometric structure information is acquired based on a polarization image captured by the polarization sensor, the resolution tends to be higher than when it is based on the depth information acquired by the depth sensor, and even as the distance to the object to be detected increases, the detection accuracy is less prone to degradation. In other words, when performing the 3D modeling based on the polarization information, it is possible to acquire the geometric structure information of the object in the real space precisely and highly accurately with a relatively small number of observations. The 3D modeling using the polarization information will be separately described in detail later.

[0079] When reconstructing the three-dimensional space as a polygon model or the like, a wider region targeted for the 3D modeling tends to require a larger volume of 3D data (in other words, a larger volume of data for the model). This problem may also arise in the case of the 3D modeling using the polarization information.

[0080] In view of these circumstances, the present disclosure provides a technique to reduce the volume of the data for the model reconstructed from the object in the real space and to reconstruct the shape of the object in a more preferable manner. Specifically, in general techniques for the 3D modeling, the 3D data is evenly located on the surface of the object, and a polygon mesh or the like is generated based on the 3D data. However, compared with a complex shape such as an edge, a simple shape such as a plane may be reconstructed from less dense 3D data. Accordingly, by combining the 3D modeling using the polarization information with the characteristics described above, the information processing system according to the present disclosure reduces the volume of the data for the model while still reconstructing the three-dimensional space. Hereinafter, technical features of the information processing system according to this embodiment will be described in further detail.
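The adaptive-density idea above can be sketched as follows: where neighboring surface normals agree (a plane), a larger data unit suffices; where they disagree (an edge or corner), a smaller unit preserves the shape. The normal vectors, threshold, and unit sizes below are hypothetical values for illustration only.

```python
def dot(a, b):
    # Dot product of two 3D vectors (assumed unit length).
    return sum(x * y for x, y in zip(a, b))

def unit_size(normal, neighbor_normals, coarse=0.08, fine=0.01, thresh=0.95):
    """Pick a unit-data (e.g., voxel) size from normal continuity."""
    # Cosine of the angle between the normal and each neighbor;
    # all near 1 means the local geometry is continuous (planar).
    if all(dot(normal, n) > thresh for n in neighbor_normals):
        return coarse  # continuous geometry: sparse 3D data is enough
    return fine        # discontinuity (edge/corner): dense 3D data

up = (0.0, 0.0, 1.0)
side = (1.0, 0.0, 0.0)
print(unit_size(up, [up, up]))    # planar region -> 0.08
print(unit_size(up, [up, side]))  # edge region   -> 0.01
```

This is only a sketch of the principle; the patent's actual criterion is the "second distribution" of geometric-continuity information estimated from the polarization-based normal map.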

[0081] 3. Technical Features

[0082] The technical features of the information processing system according to this embodiment will be described below.

[0083] <3.1. Functional Configuration>

[0084] First, a functional configuration example of the information processing system according to this embodiment will be described with reference to FIG. 3. FIG. 3 is a block diagram illustrating the functional configuration example of the information processing system according to this embodiment. Note that, in the example illustrated in FIG. 3, as with the example described with reference to FIG. 1, the description will be provided on an assumption that the information processing system 1 includes the input/output device 20 and the information processing device 10. In other words, the input/output device 20 and the information processing device 10 illustrated in FIG. 3 respectively correspond to the input/output device 20 and the information processing device 10 illustrated in FIG. 1. Additionally, the input/output device 20 will be described on an assumption that the input/output device 20 described with reference to FIG. 2 is employed.

[0085] As illustrated in FIG. 3, the input/output device 20 includes the depth sensor 201 and the polarization sensor 230. The depth sensor 201 here corresponds to the depth sensor 201 illustrated in FIG. 1 and the imaging units 201a and 201b illustrated in FIG. 2. The polarization sensor 230 here corresponds to the polarization sensor 230 illustrated in each of FIGS. 1 and 2. Each of the depth sensor 201 and the polarization sensor 230 has been described, and thus a detailed description thereof will be omitted.

[0086] Next, a configuration of the information processing device 10 will be described. As illustrated in FIG. 3, the information processing device 10 includes a self-position estimation unit 110, a depth estimation unit 120, a normal estimation unit 130, a geometric continuity estimation unit 140, and an integrated processing unit 150.

[0087] The self-position estimation unit 110 estimates the position of the input/output device 20 (particularly, the polarization sensor 230) in the real space. In this state, the self-position estimation unit 110 estimates the orientation of the input/output device 20 in the real space. Hereinafter, the position and the orientation of the input/output device 20 in the real space will collectively be referred to as the “self-position of the input/output device 20”. In other words, in the following description, the “self-position of the input/output device 20” includes at least one of the position and the orientation of the input/output device 20 in the real space.

[0088] Note that, when the self-position estimation unit 110 is capable of estimating the self-position of the input/output device 20, a technique related to the estimation or a configuration and information used for the estimation is not particularly limited. As a specific example, the self-position estimation unit 110 may estimate the self-position of the input/output device 20 based on the technique called SLAM that has been previously described. In this case, for example, the self-position estimation unit 110 may estimate the self-position of the input/output device 20 based on the depth information acquired by the depth sensor 201 and the change in position and/or orientation of the input/output device 20 detected by a predetermined sensor (e.g., the acceleration sensor, the angular velocity sensor, or the like).
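A full SLAM pipeline is beyond a short sketch, but the inertial side of the estimation described above (integrating the changes detected by the acceleration and angular velocity sensors) can be illustrated with a minimal dead-reckoning step that such an estimator would combine with depth-based corrections. The sketch is 2D and yaw-only for brevity; all variable names are illustrative.

```python
import math

def predict(pose, accel_body, omega, dt):
    """Advance a pose (x, y, yaw, vx, vy) by one inertial sample."""
    x, y, yaw, vx, vy = pose
    # Rotate body-frame acceleration into the world frame.
    ax = accel_body[0] * math.cos(yaw) - accel_body[1] * math.sin(yaw)
    ay = accel_body[0] * math.sin(yaw) + accel_body[1] * math.cos(yaw)
    # Integrate velocity, then position and heading.
    vx, vy = vx + ax * dt, vy + ay * dt
    return (x + vx * dt, y + vy * dt, yaw + omega * dt, vx, vy)

pose = (0.0, 0.0, 0.0, 1.0, 0.0)        # moving along +x at 1 m/s
pose = predict(pose, (0.0, 0.0), 0.0, 0.1)
print(pose)  # x has advanced by 0.1 m
```

In practice this prediction drifts, which is why SLAM-style systems correct it against observations such as the depth information mentioned above.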

[0089] Further, the self-position estimation unit 110 may calculate in advance the relative positional relationship of the polarization sensor 230 to the input/output device 20, so as to calculate the self-position of the polarization sensor 230 based on the estimated self-position of the input/output device 20.
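The calculation above amounts to composing the device's estimated pose with a pre-calibrated device-to-sensor offset. A minimal sketch, using a 2D pose (x, y, theta) for brevity and a hypothetical 10 cm mounting offset:

```python
import math

def compose(device_pose, relative_pose):
    """World pose of the sensor = device pose composed with the offset."""
    dx, dy, dth = device_pose
    rx, ry, rth = relative_pose
    # Rotate the offset into the world frame, then translate and add headings.
    return (
        dx + rx * math.cos(dth) - ry * math.sin(dth),
        dy + rx * math.sin(dth) + ry * math.cos(dth),
        dth + rth,
    )

device = (1.0, 2.0, math.pi / 2)   # estimated self-position of the device
offset = (0.1, 0.0, 0.0)           # sensor mounted 10 cm forward (assumed)
print(compose(device, offset))     # sensor sits at roughly (1.0, 2.1)
```

The same composition applies in 3D with 4x4 homogeneous transforms; only the rotation arithmetic changes.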

[0090] Then, the self-position estimation unit 110 outputs, to the integrated processing unit 150, information corresponding to the estimated self-position of the input/output device 20 (and, in addition, the self-position of the polarization sensor 230).

[0091] The depth estimation unit 120 acquires the depth information from the depth sensor 201, and estimates the distance between the predetermined viewpoint (e.g., the depth sensor 201) and the object positioned in the real space based on the depth information acquired. Note that in the following description, the depth estimation unit 120 estimates the distance between the input/output device 20 in which the depth sensor 201 is held (strictly, a predetermined position as a datum of the input/output device 20) and the object positioned in the real space.

[0092] As a specific example, when the depth sensor 201 is the stereo camera, the depth estimation unit 120 estimates the distance between the input/output device 20 and the subject based on the parallax between the images of the subject captured by the plurality of the imaging units included in the stereo camera (e.g., the imaging units 201a and 201b illustrated in FIGS. 1 and 2). In this state, the depth estimation unit 120 may generate the depth map where the distance estimated is mapped out on the imaging plane. Then, the depth estimation unit 120 outputs, to the geometric continuity estimation unit 140 and the integrated processing unit 150, information (e.g., the depth map) corresponding to the distance estimated between the input/output device 20 and the object positioned in the real space.
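The parallax-based estimate described above follows the standard relation for a rectified stereo pair, Z = f * B / d, where f is the focal length in pixels, B the baseline between the two imaging units, and d the disparity of the subject between the two images. A minimal sketch (the numbers are hypothetical):

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    # Depth in meters from pixel disparity (rectified stereo assumed).
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# A subject shifted by 20 px between cameras 10 cm apart (f = 500 px):
print(depth_from_disparity(500.0, 0.10, 20.0))  # 2.5 m
```

Applying this per pixel over a disparity image yields the depth map mentioned above.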

[0093] The normal estimation unit 130 acquires a polarization image from the polarization sensor 230. Based on the polarization information included in the acquired polarization image, the normal estimation unit 130 estimates information related to the geometric structure (e.g., a normal) of at least part of a face (e.g., the surface) of the object in the real space captured in the polarization image, that is, the geometric structure information.

[0094] The geometric structure information includes, for example, information corresponding to an amplitude and a phase obtained by fitting a cosine curve to the polarization value of each beam of polarized light detected, or information related to the normal of the face of the object calculated based on the obtained amplitude and phase (hereinafter, the information will also be referred to as "normal information"). The normal information includes information expressing the normal vector as a zenith angle and an azimuth angle, information expressing the normal vector in a three-dimensional coordinate system, or the like. The zenith angle may be calculated based on the amplitude of the cosine curve. The azimuth angle may be calculated based on the phase of the cosine curve. Naturally, the zenith angle and the azimuth angle may be converted to a three-dimensional coordinate system, such as an X-Y-Z coordinate system. Here, information regarding a distribution of the normal information, i.e., the normal information mapped out on the image plane of the polarization image, corresponds to a so-called normal map. Further, information related to the polarized light before being subjected to the processing described above, i.e., the polarization information, may be used as the geometric structure information. Note that a distribution of the geometric structure information (for example, the normal information), such as a normal map, corresponds to an example of a "first distribution".
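The cosine fit described above can be sketched assuming intensity samples at the four polarizer angles 0, 45, 90, and 135 degrees, a common polarization-sensor layout (this layout is an assumption, not stated in the text). The amplitude and phase then follow in closed form from the Stokes parameters, and the phase gives the azimuth-related quantity; recovering the zenith angle from the amplitude additionally needs a reflection model (diffuse vs. specular) and is omitted here.

```python
import math

def fit_cosine(i0, i45, i90, i135):
    """Fit I(a) = offset + amplitude * cos(2 * (a - phase)) to four samples."""
    s0 = (i0 + i45 + i90 + i135) / 2.0   # total intensity (Stokes S0)
    s1 = i0 - i90                        # Stokes S1 = 2 * amp * cos(2 * phase)
    s2 = i45 - i135                      # Stokes S2 = 2 * amp * sin(2 * phase)
    offset = s0 / 2.0
    amplitude = 0.5 * math.hypot(s1, s2)
    phase = 0.5 * math.atan2(s2, s1)     # orientation of the polarization
    return offset, amplitude, phase

# Synthetic samples generated from a known curve are recovered exactly:
off, amp, ph = 1.0, 0.3, math.radians(30)
samples = [off + amp * math.cos(2 * (math.radians(a) - ph))
           for a in (0, 45, 90, 135)]
print(fit_cosine(*samples))  # roughly (1.0, 0.3, 0.5236)
```

Applying this fit per pixel of the polarization image yields the amplitude/phase distribution from which the normal map described above is derived.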

……
……
……
