In this paper, we focus on the problem of motion tracking in unknown environments using
visual and inertial sensors. We term this estimation task visual-inertial odometry (VIO), in analogy
to the well-known visual-odometry problem. We present a detailed study of EKF-based VIO
algorithms, comparing both their theoretical properties and their empirical performance. We show
that an EKF formulation where the state vector comprises a sliding window of poses (the MSCKF
algorithm) attains better accuracy, consistency, and computational efficiency than the SLAM
formulation of the EKF, in which the state vector contains the current pose and the features
seen by the camera. Moreover, we prove that both types of EKF approaches are inconsistent,
due to the way in which Jacobians are computed. Specifically, we show that the observability
properties of the EKF’s linearized system models do not match those of the underlying system,
which causes the filters to underestimate the uncertainty in the state estimates. Based on our
analysis, we propose a novel, real-time EKF-based VIO algorithm, which achieves consistent
estimation by (i) ensuring the correct observability properties of its linearized system model, and
(ii) performing online estimation of the camera-to-IMU calibration parameters. This algorithm,
which we term MSCKF 2.0, is shown to achieve higher accuracy and consistency than even
an iterative, sliding-window fixed-lag smoother, in both Monte-Carlo simulations and real-world
testing.