HTC Patent | Object tracking method and host
Patent: Object tracking method and host
Publication Number: 20240062384
Publication Date: 2024-02-22
Assignee: HTC Corporation
Abstract
The embodiments of the disclosure provide an object tracking method and a host. The method includes: determining a reference motion state based on a first predicted motion state and a calibration factor; obtaining a first motion data of the host and a second motion data of a reference object; determining a first relative pose of the reference object relative to the host based on the first motion data, the second motion data, and the reference motion state; and determining a specific pose of the reference object based on the first relative pose.
Claims
What is claimed is:
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
Description
CROSS-REFERENCE TO RELATED APPLICATION
This application claims the priority benefit of U.S. provisional application Ser. No. 63/398,523, filed on Aug. 16, 2022. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of this specification.
BACKGROUND
1. Field of the Invention
The present disclosure generally relates to a tracking mechanism, in particular, to an object tracking method and a host.
2. Description of Related Art
See FIG. 1, which shows a conventional mechanism for tracking a reference object. In FIG. 1, a host (e.g., a head-mounted display (HMD)) may track the pose of the reference object (e.g., a handheld VR controller) by using, for example, the inside-out tracking mechanism, and the obtained pose may be referred to as a visual relative pose visualPoseBC. However, the quality of the visual relative pose visualPoseBC may be affected by problems such as jitter, delay, and/or lost tracking. Therefore, motion data collected by inertial measurement units (IMUs) on the reference object may be used to determine the relative pose PoseBC of the reference object relative to the environment, and the relative pose PoseBC can be fused with the visual relative pose visualPoseBC based on the host pose of the host for improving the tracking performance, wherein the host pose may be determined by the host by using simultaneous localization and mapping (SLAM).
In general, the motion data (e.g., IMU data) is used to characterize the relative pose of the IMU relative to the world and/or the environment. For example, in FIG. 1, the relative pose PoseCG may be the relative pose of the host relative to a reference point (which may be the origin of the coordinate system G of the environment). The relative pose PoseGW may be the relative pose of the reference point relative to the world (which corresponds to the coordinate system W).
To better fuse the relative pose PoseBC with the visual relative pose visualPoseBC, the relative poses PoseCG and PoseGW need to be taken into consideration. However, in the conventional art, the relative pose PoseBC can be better fused with the visual relative pose visualPoseBC only if the relative pose PoseGW stays constant. That is, if the relative pose PoseGW is varying, the relative pose PoseBC cannot be accurately fused with the visual relative pose visualPoseBC, such that the pose of the reference object would not be accurately tracked.
For example, if the host and the reference object are in a car (i.e., the environment where the host and the reference object are located), and the reference point is a particular point on the car, the coordinate system G can be assumed to be the coordinate system used within the car, and the coordinate system W can be assumed to be the coordinate system corresponding to the environment outside of the car (which can be understood as the coordinate system of the world).
In a case where the car is static, since the relative pose PoseGW is constant, the relative pose PoseBC can be accurately fused with the visual relative pose visualPoseBC. However, in a case where the car is moving, since the relative pose PoseGW is varying, the relative pose PoseBC cannot be properly fused with the visual relative pose visualPoseBC, such that the pose of the reference object would not be accurately tracked.
In addition, if the host and the reference object are in an environment with few feature points (e.g., an environment with white walls), since the translation components of the host pose are almost unavailable, the relative pose PoseBC cannot be properly fused with the visual relative pose visualPoseBC either.
SUMMARY OF THE INVENTION
Accordingly, the disclosure is directed to an object tracking method and a host, which may be used to solve the above technical problems.
The embodiments of the disclosure provide an object tracking method, adapted to a host, comprising: determining a reference motion state based on a first predicted motion state and a calibration factor; obtaining a first motion data of the host and a second motion data of a reference object; determining a first relative pose of the reference object relative to the host based on the first motion data, the second motion data, and the reference motion state; and determining a specific pose of the reference object based on the first relative pose.
The embodiments of the disclosure provide a host, comprising a non-transitory storage circuit and a processor. The non-transitory storage circuit stores a program code. The processor is coupled to the non-transitory storage circuit and accesses the program code to perform: determining a reference motion state based on a first predicted motion state and a calibration factor; obtaining a first motion data of the host and a second motion data of a reference object; determining a first relative pose of the reference object relative to the host based on the first motion data, the second motion data, and the reference motion state; and determining a specific pose of the reference object based on the first relative pose.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the invention and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 shows a conventional mechanism for tracking a reference object.
FIG. 2 shows a schematic diagram of a host according to an embodiment of the disclosure.
FIG. 3 shows a schematic diagram of the iterative process of the proposed method according to an embodiment of the disclosure.
FIG. 4 shows an application scenario according to an embodiment of the disclosure.
FIG. 5 shows a flow chart of the object tracking method according to an embodiment of the disclosure.
DESCRIPTION OF THE EMBODIMENTS
Reference will now be made in detail to the present preferred embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts.
See FIG. 2, which shows a schematic diagram of a host according to an embodiment of the disclosure. In FIG. 2, the host 200 can be any device capable of tracking the pose of other to-be-tracked objects (e.g., handheld controllers) by performing inside-out tracking mechanisms, but the disclosure is not limited thereto. In some embodiments, the host 200 can be an HMD that provides AR/VR services/contents or the like.
In FIG. 2, the host 200 includes a storage circuit 202 and a processor 204. The storage circuit 202 is one or a combination of a stationary or mobile random access memory (RAM), read-only memory (ROM), flash memory, hard disk, or any other similar device, and records a plurality of modules that can be executed by the processor 204.
The processor 204 may be coupled with the storage circuit 202, and the processor 204 may be, for example, a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGAs) circuits, any other type of integrated circuit (IC), a state machine, and the like.
In the embodiments of the disclosure, the processor 204 may access the modules stored in the storage circuit 202 to implement the object tracking method provided in the disclosure, which would be further discussed in the following.
See FIG. 3, which shows a schematic diagram of the iterative process of the proposed method according to an embodiment of the disclosure.
In FIG. 3, the iterative process can be regarded as including two sub-processes: (1) the state fusion process; and (2) the state prediction process.
In one embodiment, the processor 204 determines a reference motion state based on a predicted motion state (referred to as XIMUj, wherein j is a stage index associated with the state fusion process) and a calibration factor (referred to as ΔXj), as shown in the lower part of the state fusion process. In one embodiment, the reference motion state can be determined by combining the predicted motion state XIMUj with the calibration factor ΔXj, which is characterized in FIG. 3 as assigning “XIMUj⊕ΔXj” to the reference motion state. How the predicted motion state XIMUj is determined would be explained in the discussions associated with the state prediction process.
In one embodiment, in the procedure of determining the calibration factor ΔXj, the processor 204 obtains a specific gain (referred to as Kgainj), the visual relative pose (i.e., visualjPoseBC, corresponding to the visual relative pose mentioned above) of the reference object relative to the host 200, and a motion relative pose (referred to as IMUjPoseBC) of the reference object relative to the host 200. Afterwards, the processor 204 determines the calibration factor ΔXj based on the specific gain Kgainj, the visual relative pose visualjPoseBC, and the motion relative pose IMUjPoseBC.
In the embodiments of the disclosure, the specific gain Kgainj can be understood as a Kalman gain, which can be determined based on some parameters determined in the state prediction process (which would be discussed later). The visual relative pose visualjPoseBC can be the tracked visual pose of the reference object relative to the host 200, which can be determined by the processor 204 via performing the inside-out tracking mechanism. The motion relative pose IMUjPoseBC can be determined by the processor 204 based on the first motion data collected by a first motion detection circuit (e.g., IMU) on the host 200 and the second motion data collected by a second motion detection circuit (e.g., IMU) on the reference object, and how the motion relative pose IMUjPoseBC is determined would be explained in the following discussions associated with the state prediction process.
In the procedure of determining the calibration factor ΔXj, the processor 204 can firstly determine a pose difference between the visual relative pose visualjPoseBC and the motion relative pose IMUjPoseBC, wherein the pose difference can be represented as “visualjPoseBC⊖IMUjPoseBC”. Next, the processor 204 can determine the calibration factor ΔXj based on the specific gain Kgainj and the pose difference via, for example, multiplying the pose difference by the specific gain Kgainj. In this case, the calibration factor ΔXj can be characterized as “ΔXj=Kgainj(visualjPoseBC⊖IMUjPoseBC)” as exemplarily shown in the state fusion process of FIG. 3, but the disclosure is not limited thereto.
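The fusion step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: poses and states are plain lists of floats, so ⊖ and ⊕ reduce to element-wise subtraction and addition (a real implementation would compose orientations, for example with quaternions), and the gain is given as a matrix in nested lists. All function and variable names are hypothetical.

```python
def pose_difference(visual_pose, imu_pose):
    """visual ⊖ IMU, assumed here to be element-wise subtraction."""
    return [v - m for v, m in zip(visual_pose, imu_pose)]


def calibration_factor(k_gain, visual_pose, imu_pose):
    """ΔX = Kgain (visual ⊖ IMU); k_gain has one row per state entry."""
    diff = pose_difference(visual_pose, imu_pose)
    return [sum(k * d for k, d in zip(row, diff)) for row in k_gain]


def reference_state(x_imu, delta_x):
    """Reference motion state = XIMU ⊕ ΔX, assumed element-wise addition."""
    return [x + dx for x, dx in zip(x_imu, delta_x)]


# Toy numbers (assumptions): the state is [relative position, relative velocity],
# and only the position is observed visually.
x_imu = [1.0, 0.5]          # predicted motion state
imu_pose = [1.0]            # motion relative pose taken from the prediction
visual_pose = [1.2]         # visual relative pose from inside-out tracking
k_gain = [[0.5], [0.1]]     # gain: 2 state entries x 1 measurement

delta_x = calibration_factor(k_gain, visual_pose, imu_pose)
x_ref = reference_state(x_imu, delta_x)
```

With these numbers the visual pose leads the prediction by about 0.2, so the position is pulled halfway toward the visual measurement and the velocity receives a smaller correction.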
In the embodiments of the disclosure, the reference motion state can be used to determine a next predicted motion state in the state prediction process.
In the embodiments of the disclosure, the stage indexes used in the state fusion process and the state prediction process can be different. In FIG. 3, the stage index used in the state prediction process can be i, and the reference motion state determined at the j-th stage of the state fusion process can correspond to the predicted motion state (referred to as Xi) determined at the i-th stage of the state prediction process.
In this case, the above-mentioned next predicted motion state may be understood as the predicted motion state determined at the (i+1)-th stage of the state prediction process (referred to as Xi+1), and the predicted motion state Xi+1 can be determined based on the predicted motion state Xi, the first motion data, and the second motion data.
However, instead of explaining the details of determining the predicted motion state Xi+1, how the predicted motion state XIMUj (i.e., the predicted motion state Xi) is determined would be used as an illustrative example for better understanding the concept of the disclosure.
In FIG. 3, the first motion data and the second motion data used to determine the predicted motion state Xi can be referred to as IMUCi−1 and IMUBi−1, respectively. In some embodiments, the first motion data IMUCi−1 may include raw IMU data (e.g., 3-axis accelerations and 3-axis angular velocities) collected at the (i−1)-th stage of the state prediction process by the first motion detection circuit on the host 200, and the second motion data IMUBi−1 may include raw IMU data (e.g., 3-axis accelerations and 3-axis angular velocities) collected at the (i−1)-th stage of the state prediction process by the second motion detection circuit on the reference object.
In the embodiments of the disclosure, the processor 204 can determine the predicted motion state Xi based on the first motion data IMUCi−1, the second motion data IMUBi−1, and a reference motion state (which can be understood as the predicted motion state Xi−1).
In one embodiment, the processor 204 may determine a dynamic function (referred to as ƒc) used in the coordinate system C, wherein the dynamic function ƒc may consider the reference motion state (i.e., the predicted motion state Xi−1), the first motion data IMUCi−1, the second motion data IMUBi−1, and the time difference (referred to as Δt) between the i-th stage of the state prediction process and the (i−1)-th stage of the state prediction process.
In FIG. 3, the predicted motion state Xi can be characterized as “Xi=ƒc(Xi−1,IMUCi−1,IMUBi−1,Δt)”. In one embodiment, the dynamic function ƒc can output a first relative pose and parameters associated with the first motion data and the second motion data in response to the reference motion state, the first motion data IMUCi−1, the second motion data IMUBi−1, and the time difference Δt. That is, the predicted motion state Xi can include the first relative pose and parameters associated with the first motion data and the second motion data.
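As a rough illustration of the prediction step, the sketch below implements a one-dimensional stand-in for the dynamic function. The disclosure does not spell out the body of ƒc, so everything here is an assumption: motion is restricted to one axis, both IMU readings are bias-free accelerations already expressed along that axis, rotation and gravity are ignored, and the IMU parameters are left out of the state. The names `RelativeState` and `f_c` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RelativeState:
    position: float  # reference object relative to host (m)
    velocity: float  # relative velocity (m/s)


def f_c(prev, accel_host, accel_object, dt):
    """One prediction step driven by the difference of the two IMU readings."""
    rel_accel = accel_object - accel_host  # shared platform acceleration cancels
    return RelativeState(
        position=prev.position + prev.velocity * dt + 0.5 * rel_accel * dt * dt,
        velocity=prev.velocity + rel_accel * dt,
    )


x_prev = RelativeState(position=0.3, velocity=0.0)

# Host and object both feel the same 2 m/s^2 (e.g. a car speeding up):
x_same = f_c(x_prev, accel_host=2.0, accel_object=2.0, dt=0.01)

# The object additionally accelerates by 1 m/s^2 relative to the host:
x_moved = f_c(x_prev, accel_host=2.0, accel_object=3.0, dt=0.01)
```

Because only the difference of the two readings enters the update, an acceleration shared by the host and the reference object leaves the predicted relative state unchanged in this simplified model.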
In the embodiments of the disclosure, the first relative pose can be represented by IMUiPoseBC, which can be understood as an i-th motion relative pose of the reference object relative to the host 200 at the i-th stage of the state prediction process. In some embodiments, the first relative pose can be characterized as “IMUiPoseBC=[TBiCi qBiCi VBiCi WBiCi]”, wherein TBiCi, qBiCi, VBiCi, and WBiCi respectively correspond to the translation, orientation, velocity, and angular velocity of the reference object relative to the host 200 at the i-th stage of the state prediction process, but the disclosure is not limited thereto.
In one embodiment, the i-th motion relative pose (i.e., IMUiPoseBC) can be used to determine the calibration factor for the j-th stage of the state fusion process, and the details of determining the calibration factor for the j-th stage of the state fusion process can be referred to the above descriptions, which would not be repeated herein.
In one embodiment, the parameters associated with the first motion data include intrinsic and extrinsic parameters associated with the first motion detection circuit at the i-th stage of the state prediction process, which can be referred to as StateCi. The parameters associated with the second motion data include intrinsic and extrinsic parameters associated with the second motion detection circuit at the i-th stage of the state prediction process, which can be referred to as StateBi.
Accordingly, the second predicted motion state Xi can be further characterized as “Xi=[IMUiPoseBC,StateCi,StateBi]” as shown in FIG. 3.
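One possible in-memory layout for the predicted motion state is sketched below. The grouping into a relative pose plus two parameter blocks follows the description above, but the concrete field names, the use of a quaternion for the orientation, and the flat-vector helper are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RelativePose:
    """IMUiPoseBC: reference object relative to the host."""
    translation: List[float]        # T, 3 values
    orientation: List[float]        # q, assumed quaternion (w, x, y, z)
    velocity: List[float]           # V, 3 values
    angular_velocity: List[float]   # W, 3 values


@dataclass
class ImuParams:
    """StateC or StateB: parameters of one motion detection circuit."""
    intrinsic: List[float] = field(default_factory=list)  # contents are an assumption
    extrinsic: List[float] = field(default_factory=list)


@dataclass
class PredictedState:
    """Xi = [IMUiPoseBC, StateCi, StateBi]."""
    pose_bc: RelativePose
    state_c: ImuParams
    state_b: ImuParams

    def as_vector(self) -> List[float]:
        """Flatten the state into one list, e.g. for filter arithmetic."""
        p = self.pose_bc
        return (p.translation + p.orientation + p.velocity + p.angular_velocity
                + self.state_c.intrinsic + self.state_c.extrinsic
                + self.state_b.intrinsic + self.state_b.extrinsic)


x_i = PredictedState(
    pose_bc=RelativePose([0.3, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0],
                         [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]),
    state_c=ImuParams(),
    state_b=ImuParams(),
)
state_vector = x_i.as_vector()
```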
In one embodiment, the contents in the predicted motion state Xi can be used to determine the specific gain Kgainj (e.g., the Kalman gain) for the j-th stage of the state fusion process.
In one embodiment, during determining the specific gain Kgainj, the processor 204 may obtain a first reference gain factor (referred to as PIMUj), the predicted motion state XIMUj (which can be characterized by [IMUjPoseBC,StateCj,StateBj] based on the above teachings), and the visual relative pose visualjPoseBC, and accordingly determine the specific gain Kgainj.
At the j-th stage of the state fusion process in FIG. 3, the specific gain Kgainj can be characterized as “$K_{gain,j} = P_{IMU,j}\,H_{IMU,j}^{T}\,(H_{IMU,j}\,P_{IMU,j}\,H_{IMU,j}^{T} + V_{CV,j})^{-1}$”, wherein HIMUj can be understood as being obtained by taking partial derivatives on the contents of the first predicted motion state XIMUj, and VCVj is the noise of the visual relative pose visualjPoseBC, but the disclosure is not limited thereto.
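The gain formula has the shape of the standard Kalman gain, and a small numeric sketch may help. The example below assumes a scalar measurement, so that H is a single row and the matrix inverse becomes a division; the matrices are nested lists and all names and numbers are hypothetical.

```python
def kalman_gain(P, H, V):
    """K = P H^T (H P H^T + V)^-1 for a scalar measurement.

    P: n x n covariance as nested lists
    H: length-n measurement row
    V: scalar measurement noise variance
    """
    n = len(H)
    # P H^T, a column of n values
    PHt = [sum(P[r][c] * H[c] for c in range(n)) for r in range(n)]
    # H P H^T + V, a scalar here, so "inverse" is plain division
    S = sum(H[r] * PHt[r] for r in range(n)) + V
    return [value / S for value in PHt]


# State = [relative position, relative velocity]; only position is observed.
P = [[0.04, 0.0],
     [0.0, 0.01]]
H = [1.0, 0.0]
V = 0.04

K = kalman_gain(P, H, V)
```

Here the prediction and the visual measurement are assumed equally uncertain in position, so the position gain comes out at one half, and the velocity gain is zero because P carries no position-velocity correlation.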
In one embodiment, the reference gain factor PIMUj can be updated based on the specific gain Kgainj and the predicted motion state XIMUj. In FIG. 3, the update can be characterized as assigning “$P_{IMU,j} - K_{gain,j}\,H_{IMU,j}\,P_{IMU,j}$” to the updated reference gain factor, but the disclosure is not limited thereto.
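The update of the reference gain factor can be sketched in the same simplified setting. As before, this assumes a scalar measurement with H given as a single row and the gain as a column of n values; the function name and the numbers are hypothetical.

```python
def update_covariance(P, K, H):
    """Return P - K H P for a scalar measurement (K: length n, H: length n)."""
    n = len(P)
    # H P, a row of n values
    HP = [sum(H[k] * P[k][c] for k in range(n)) for c in range(n)]
    # Subtract the outer product K (H P) from P
    return [[P[r][c] - K[r] * HP[c] for c in range(n)] for r in range(n)]


P = [[0.04, 0.0],
     [0.0, 0.01]]
H = [1.0, 0.0]
K = [0.5, 0.0]   # gain assumed to have been computed beforehand

P_updated = update_covariance(P, K, H)
```

In this example the position variance is halved by the fusion, while the velocity variance is untouched because the gain has no velocity component.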
In one embodiment, the updated reference gain factor can be used to determine a new reference gain factor at the next stage (i.e., the (i+1)-th stage) of the state prediction process.
However, for better understanding the concept of the disclosure, the mechanism for determining the reference gain factor PIMUj would be used as an example, but the disclosure is not limited thereto.
In one embodiment, the reference gain factor PIMUj used at the j-th stage of the state fusion process may correspond (or be mapped) to another reference gain factor determined at the i-th stage of the state prediction process, which can be referred to as Pi.
Specifically, in the procedure of determining the reference gain factor Pi, the processor 204 may obtain an updated reference gain factor (referred to as Pi−1) and the predicted motion state Xi, wherein the updated reference gain factor Pi−1 can be understood as the reference gain factor updated in the previous stage of the state fusion process.
In one embodiment, the processor 204 determines the reference gain factor Pi based on the updated reference gain factor Pi−1 and the predicted motion state Xi. In FIG. 3, the reference gain factor Pi can be characterized as “$P_{i} = F_{C,i-1}\,P_{i-1}\,F_{C,i-1}^{T} + G_{C,i-1}\,Q_{C,i-1}\,G_{C,i-1}^{T} + G_{B,i-1}\,Q_{B,i-1}\,G_{B,i-1}^{T}$”, wherein QCi−1 is the noise of the first motion data IMUCi−1, and QBi−1 is the noise of the second motion data IMUBi−1, but the disclosure is not limited thereto.
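The propagation formula can be exercised with small matrices. The text above does not define the F and G matrices, so this sketch assumes, as in a standard extended Kalman filter, that F is the state-transition Jacobian and that the two G matrices map the noise of each IMU into the state. The one-dimensional constant-acceleration model and all names and numbers are illustrative assumptions.

```python
def matmul(A, B):
    """Product of two matrices given as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def propagate_covariance(P_prev, F, G_c, Q_c, G_b, Q_b):
    """P_i = F P_prev F^T + G_c Q_c G_c^T + G_b Q_b G_b^T."""
    term_state = matmul(matmul(F, P_prev), transpose(F))
    term_host = matmul(matmul(G_c, Q_c), transpose(G_c))
    term_object = matmul(matmul(G_b, Q_b), transpose(G_b))
    return add(add(term_state, term_host), term_object)


dt = 1.0                                 # large step chosen for readable numbers
F = [[1.0, dt], [0.0, 1.0]]              # state = [relative position, velocity]
G = [[0.5 * dt * dt], [dt]]              # how an acceleration error enters the state
Q = [[0.1]]                              # acceleration noise variance of one IMU
P_prev = [[1.0, 0.0], [0.0, 1.0]]

P_i = propagate_covariance(P_prev, F, G, Q, G, Q)
```

Both IMUs contribute their own noise term, which is why the uncertainty of the relative pose grows faster between visual updates than it would with a single IMU.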
Once the reference gain factor Pi is determined, the reference gain factor Pi can be used as the reference gain factor PIMUj at the j-th stage of the state fusion process for determining, for example, the specific gain Kgainj, and the associated details can be referred to the above teachings.
In brief, the predicted motion state Xi, the first relative pose IMUiPoseBC, and the reference gain factor Pi determined at the i-th stage of the state prediction process can be respectively used as the predicted motion state XIMUj, the motion relative pose IMUjPoseBC, and the reference gain factor PIMUj. The predicted motion state XIMUj, the motion relative pose IMUjPoseBC, and the reference gain factor PIMUj can then be used to determine the specific gain Kgainj, the predicted motion state, and the reference gain factor at the j-th stage of the state fusion process.
Once the predicted motion state and the reference gain factor are determined, they can be further used to determine the predicted motion state Xi+1 and the reference gain factor Pi+1 at the (i+1)-th stage of the state prediction process. Accordingly, the processor 204 can continuously perform the iterative process in FIG. 3 for determining the parameters/factors shown in FIG. 3 at different stages of the state fusion process and/or the state prediction process.
In one embodiment, the processor 204 can determine a specific pose (referred to as IMUiPoseBG) of the reference object based on the first relative pose IMUiPoseBC.
In one embodiment, the processor 204 obtains a specific relative pose (e.g., the relative pose PoseCG mentioned in the above) of the host 200 relative to a reference coordinate system (e.g., the coordinate system G mentioned in the above) and determines the specific pose IMUiPoseBG of the reference object via combining the specific relative pose PoseCG with the first relative pose IMUiPoseBC. The details of combining the specific relative pose PoseCG with the first relative pose IMUiPoseBC can be referred to the associated prior art, which would not be further discussed herein.
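The combination of the two poses is left to the prior art in the text above. One common way to do it is rigid-transform composition, sketched here in two dimensions for brevity. The planar (x, y, heading) representation and the function name `compose` are assumptions; a full implementation would work with 3-D translations and orientations.

```python
import math


def compose(pose_cg, pose_bc):
    """Pose of B in frame G from the pose of C in G and the pose of B in C.

    Each pose is (x, y, theta) with theta in radians.
    """
    xc, yc, tc = pose_cg
    xb, yb, tb = pose_bc
    return (
        xc + math.cos(tc) * xb - math.sin(tc) * yb,  # rotate B's offset into G, then shift
        yc + math.sin(tc) * xb + math.cos(tc) * yb,
        tc + tb,                                     # headings add in the plane
    )


pose_cg = (1.0, 2.0, math.pi / 2)   # host at (1, 2) in G, turned 90 degrees
pose_bc = (0.5, 0.0, 0.0)           # controller 0.5 m straight ahead of the host

pose_bg = compose(pose_cg, pose_bc)
```

In this example the host faces along the y axis of G, so a controller half a metre in front of the host ends up half a metre further along y in G.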
In the embodiments of the disclosure, since the first motion data associated with the host 200 is considered in the procedure for determining the first relative pose IMUiPoseBC in the state prediction process, the first relative pose IMUiPoseBC can be better fused with the visual relative pose visualjPoseBC in the state fusion process for determining the reference motion state. Afterwards, the reference motion state can be used to determine the first relative pose corresponding to the next stage of the state prediction process, and so on. In this case, the pose of the reference object can be determined without considering the relative pose PoseGW.
Therefore, in scenarios with varying relative pose PoseGW and/or environments with few feature points, the pose of the reference object can still be properly determined. In addition, since the proposed method can be performed without considering the relative pose PoseGW, the proposed method can be used to determine the pose of the reference object in the environments with no gravity.
See FIG. 4, which shows an application scenario according to an embodiment of the disclosure. In FIG. 4, the host 200 may be an HMD worn by the user on the simulator 499, the to-be-tracked reference object 410 can be the handheld controller connected with the HMD, and the HMD may be used to provide, for example, VR services to the user.
In the embodiment, the coordinate system G can be the coordinate system of the simulator 499, and the relative pose PoseGW between the coordinate systems G and W would be varying since the simulator 499 would move in response to the user's operations to the VR services provided by the HMD. In this case, the proposed method can still properly work to accurately determine the pose of the reference object 410 even if the relative pose PoseGW is varying.
See FIG. 5, which shows a flow chart of the object tracking method according to an embodiment of the disclosure. The method of this embodiment may be executed by the host 200 in FIG. 2, and the details of each step in FIG. 5 will be described below with the components shown in FIG. 2.
In step S510, the processor 204 determines the reference motion state based on the predicted motion state (e.g., XIMUj) and the calibration factor (e.g., ΔXj). In step S520, the processor 204 obtains the first motion data (e.g., IMUCi) of the host 200 and the second motion data (e.g., IMUBi) of the reference object. In step S530, the processor 204 determines the first relative pose (e.g., IMUi+1PoseBC) of the reference object relative to the host 200 based on the first motion data, the second motion data, and the reference motion state. In step S540, the processor 204 determines the specific pose (e.g., IMUi+1PoseBG) of the reference object based on the first relative pose.
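To show how the steps fit together, the following toy loop alternates prediction and fusion for a one-dimensional relative state [position, velocity]. It is a sketch under strong assumptions rather than the method of the disclosure: motion is along one axis, IMU readings are bias-free accelerations, the visual measurement observes the relative position directly (H = [1, 0]) and is noise-free in the simulation, and a visual sample arrives every tenth IMU sample. All names and numbers are hypothetical.

```python
def predict(x, P, a_host, a_obj, dt, q):
    """State prediction: integrate the relative acceleration, propagate P."""
    p, v = x
    a = a_obj - a_host                                   # relative acceleration
    x_new = [p + v * dt + 0.5 * a * dt * dt, v + a * dt]
    g = [0.5 * dt * dt, dt]                              # noise input column
    # F P with F = [[1, dt], [0, 1]]
    FP = [[P[0][0] + dt * P[1][0], P[0][1] + dt * P[1][1]],
          [P[1][0], P[1][1]]]
    # (F P) F^T + g q g^T, where q is the summed noise of both IMUs
    P_new = [[FP[0][0] + dt * FP[0][1] + g[0] * g[0] * q, FP[0][1] + g[0] * g[1] * q],
             [FP[1][0] + dt * FP[1][1] + g[1] * g[0] * q, FP[1][1] + g[1] * g[1] * q]]
    return x_new, P_new


def fuse(x, P, visual_pos, v_noise):
    """State fusion: gain, calibration factor, reference state, updated P."""
    S = P[0][0] + v_noise
    K = [P[0][0] / S, P[1][0] / S]                       # gain for H = [1, 0]
    diff = visual_pos - x[0]                             # visual minus IMU pose
    x_ref = [x[0] + K[0] * diff, x[1] + K[1] * diff]     # apply calibration factor
    P_upd = [[P[0][0] - K[0] * P[0][0], P[0][1] - K[0] * P[0][1]],
             [P[1][0] - K[1] * P[0][0], P[1][1] - K[1] * P[0][1]]]
    return x_ref, P_upd


def run(car_accel, steps=100, dt=0.01):
    """Track a controller held 0.3 m from the host inside an accelerating car."""
    true_rel_pos = 0.3
    x, P = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
    for step in range(steps):
        # Both IMUs sense the car's acceleration; the relative motion is zero.
        x, P = predict(x, P, a_host=car_accel, a_obj=car_accel, dt=dt, q=2e-4)
        if step % 10 == 9:                               # visual sample available
            x, P = fuse(x, P, visual_pos=true_rel_pos, v_noise=0.01)
    return x


estimate_static = run(car_accel=0.0)
estimate_moving = run(car_accel=3.0)
```

In this simplified model the estimate is the same whether the car is parked or accelerating, because the shared acceleration cancels in the prediction step before it can disturb the fusion.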
Details of the steps in FIG. 5 can be referred to the descriptions in the above embodiments, which would not be repeated herein.
To sum up, by considering the motion data associated with the movement of the host, the embodiments of the disclosure provide a solution to properly determine the pose of the to-be-tracked reference object even if the relative pose of the environment relative to the world is varying. Accordingly, the pose of the reference object can be tracked in a novel, flexible, and accurate way.
It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the present invention without departing from the scope or spirit of the disclosure. In view of the foregoing, it is intended that the present disclosure cover modifications and variations of this invention provided they fall within the scope of the following claims and their equivalents.