The purpose of this study was to improve the accuracy of real-time ego-motion tracking through inertial sensor and vision sensor fusion. Due to low sampling rates supported by web-based vision sensor and accumulation of errors in inertial sensors, ego-motion tracking with vision sensors is commonly afflicted by slow updating rates, while motion tracking with inertial sensor suffers from rapid deterioration in accuracy with time. This paper starts with a discussion of developed algorithms for calibrating two relative rotations of the system using only one reference image. Next, stochastic noises associated with the inertial sensor are identified using Allan Variance analysis, and modeled according to their characteristics. Finally, the proposed models are incorporated into an extended Kalman filter for inertial sensor and vision sensor fusion. Compared with results from conventional sensor fusion models, we have shown that ego-motion tracking can be greatly enhanced using the proposed error correction model.